DOI: 10.1145/3661167.3661175
research-article
Open access

On the Accuracy of GitHub's Dependency Graph

Published: 18 June 2024

Abstract

GitHub’s dependency graph shows dependency relationships between repositories. Tools such as Dependabot and GitHub’s SBOM (Software Bill of Materials) export feature build on it, and it has also been used in empirical studies. Inaccuracies in the dependency graph might therefore undermine both the effectiveness of these tools and the results of those studies. In this paper, we present the results of a mining study that assesses the accuracy of GitHub’s dependency graph in Java and Python open-source software projects. In particular, on April 16th, 2023, we randomly sampled 297 software projects developed in Java and 338 developed in Python (all hosted on GitHub), each using GitHub’s dependency graph. We then performed three analyses to assess how accurate GitHub’s dependency graph is: (i) backward analysis, focusing on the accuracy of the dependencies of a given repository, as reported in GitHub’s dependency graph; (ii) forward analysis, focusing on the accuracy of the dependents of a given repository, as reported in GitHub’s dependency graph; and (iii) manifest/lock file analysis, focusing on the correspondence between the dependencies reported in the dependency graph of a given repository and those declared in the corresponding manifest/lock files. The results highlight several inaccuracies in GitHub’s dependency graph, which might affect the output of tools based on it (e.g., Dependabot and SBOM generators) as well as the outcomes of past empirical studies. We also provide qualitative insights into these inaccuracies and discuss implications for practitioners and researchers.
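To make the manifest/lock file analysis more concrete, the following is a minimal sketch of how such a comparison could look for a Python project. It is not the authors' tooling (their scripts are in the replication package); the function names `parse_requirements` and `compare` are hypothetical, and the sketch assumes the dependency names reported by the dependency graph have already been collected into a list by some other means.

```python
import re


def normalize(name):
    """Normalize a package name (lowercase, runs of -, _ and . become one -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirements(text):
    """Return the set of normalized package names in requirements.txt-style text.

    Assumption: one requirement per line; pip options (lines starting with "-")
    and comments are ignored. Direct URL requirements are not handled.
    """
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def compare(manifest_names, graph_names):
    """Report dependencies present on only one side of the comparison."""
    graph = {normalize(n) for n in graph_names}
    return {
        "missing_from_graph": sorted(manifest_names - graph),
        "not_in_manifest": sorted(graph - manifest_names),
    }


# Hypothetical inputs, for illustration only.
manifest_text = "requests==2.31.0\nFlask_Login>=0.6  # auth\n\n-r other.txt\nnumpy\n"
reported_by_graph = ["requests", "flask-login", "pandas"]

result = compare(parse_requirements(manifest_text), reported_by_graph)
print(result)
```

Names are normalized before comparison so that spelling variants such as `Flask_Login` and `flask-login` are not counted as mismatches; a real analysis would also need to handle versions, transitive dependencies, and other manifest formats.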


Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering
June 2024
728 pages
ISBN: 9798400717017
DOI: 10.1145/3661167
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Dependency graph
  2. Empirical study
  3. GitHub

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Italian Ministry of University and Research

Conference

EASE 2024

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%
