DOI: 10.1145/3643991.3644900

A Large-Scale Empirical Study of Open Source License Usage: Practices and Challenges

Published: 02 July 2024

Abstract

The popularity of open source software (OSS) has led to a significant increase in the number of available licenses, each with its own set of terms and conditions. This proliferation has made it increasingly challenging for developers to select an appropriate license for their projects and to ensure that they comply with the terms of those licenses. As a result, empirical studies are needed to identify current practices and challenges in license usage, both to help developers make informed decisions about license selection and to ensure that OSS is used and distributed in a legal and ethical manner. Moreover, new licenses might be required to better meet the needs of the open source community and to address emerging legal issues.
In this paper, we conduct a large-scale empirical study of license usage across five package management platforms: Maven, NPM, PyPI, RubyGems, and Cargo. Our objective is to examine current trends and potential issues in license usage in the OSS community. In total, we analyze the licenses of 33,710,877 packages across the five platforms. We statistically analyze these licenses from multiple perspectives, including license usage, license incompatibility, license updates, and license evolution. We also compare core packages with common packages on these platforms along several aspects. Our results reveal irregularities in license names and license incompatibilities that require attention. We observe both similarities and differences in license usage across the five platforms, with Cargo being the most standardized among them. Finally, we discuss actionable implications of our findings.
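To make the notion of irregular license names concrete, the following is a minimal sketch of how free-text license strings from package metadata might be mapped to SPDX identifiers before usage is counted. It is an illustration under stated assumptions, not the procedure used in the study: the alias table `ALIASES` and the helpers `normalize_license` and `count_licenses` are hypothetical names introduced here.

```python
import re
from collections import Counter

# Hypothetical alias table mapping cleaned-up free-text names to SPDX
# identifiers. Illustrative only; a real study would need a far larger,
# curated mapping.
ALIASES = {
    "mit": "MIT",
    "mit license": "MIT",
    "apache 2.0": "Apache-2.0",
    "apache-2.0": "Apache-2.0",
    "apache license 2.0": "Apache-2.0",
    "apache license version 2.0": "Apache-2.0",
}


def normalize_license(raw):
    """Return the SPDX identifier for a raw license string, or None."""
    # Lowercase, trim, and collapse whitespace, commas, and underscores.
    key = re.sub(r"[\s,_]+", " ", raw.strip().lower())
    # Drop a leading article such as "The MIT License".
    if key.startswith("the "):
        key = key[4:]
    return ALIASES.get(key)


def count_licenses(raw_names):
    """Count packages per SPDX identifier; unknown names are grouped."""
    counts = Counter()
    for name in raw_names:
        counts[normalize_license(name) or "UNRECOGNIZED"] += 1
    return counts
```

Names that fall into the `UNRECOGNIZED` bucket are the irregular cases that would need manual inspection or a richer matcher.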

Published In

MSR '24: Proceedings of the 21st International Conference on Mining Software Repositories
April 2024
788 pages
ISBN: 9798400705878
DOI: 10.1145/3643991
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. OSS licenses
  2. empirical study
  3. package management platform

Qualifiers

  • Research-article

Conference

MSR '24