research-article
Open access

Dependency-Induced Waste in Continuous Integration: An Empirical Study of Unused Dependencies in the npm Ecosystem

Published: 12 July 2024

Abstract

Modern software systems are increasingly dependent upon code from external packages (i.e., dependencies). Building upon external packages allows software reuse to span across projects seamlessly. Package maintainers regularly release updated versions to provide new features, fix defects, and address security vulnerabilities. Due to the potential for regression, managing dependencies is not just a trivial matter of selecting the latest versions. Since it is perceived to be less risky to retain a dependency than remove it, as projects evolve, they tend to accrue dependencies, exacerbating the difficulty of dependency management. It is not uncommon for a considerable proportion of external packages to be unused by the projects that list them as a dependency. Although such unused dependencies are not required to build and run the project, updates to their dependency specifications will still trigger Continuous Integration (CI) builds. The CI builds that are initiated by updates to unused dependencies are fundamentally wasteful. Considering that CI build time is a finite resource that is directly associated with project development and service operational costs, understanding the consequences of unused dependencies within this CI context is of practical importance. In this paper, we study the CI waste that is generated by updates to unused dependencies. We collect a dataset of 20,743 commits that are solely updating dependency specifications (i.e., the package.json file), spanning 1,487 projects that adopt npm for managing their dependencies. Our findings illustrate that 55.88% of the CI build time that is associated with dependency updates is only triggered by unused dependencies. At the project level, the median project spends 56.09% of its dependency-related CI build time on updates to unused dependencies. 
For projects that exceed the budget of free build minutes, we find that the median percentage of billable CI build time that is wasted due to unused-dependency commits is 85.50%. Moreover, we find that automated bots are the primary producers of dependency-induced CI waste, contributing 92.93% of the CI build time that is spent on unused dependencies. The popular Dependabot is responsible for updates to unused dependencies that account for 74.52% of that waste. To mitigate the impact of unused dependencies on CI resources, we introduce Dep-sCImitar, an approach to cut down wasted CI time by identifying and skipping CI builds that are triggered due to unused-dependency commits. A retrospective evaluation of the 20,743 studied commits shows that Dep-sCImitar reduces wasted CI build time by 68.34% by skipping wasteful builds with a precision of 94%.
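The skip decision described above can be illustrated with a minimal sketch. This is not Dep-sCImitar's implementation, which the abstract does not detail; it only shows the general idea under stated assumptions. The function names (`changed_dependencies`, `can_skip_ci`) and the `used_packages` set are hypothetical, and the set of packages the project actually uses is assumed to come from some separate analysis of the source code.

```python
# Hypothetical sketch: decide whether a commit that only edits package.json
# touches nothing but unused dependencies. Not the authors' algorithm.

DEP_SECTIONS = ("dependencies", "devDependencies")


def changed_dependencies(old_manifest, new_manifest):
    """Return names of packages whose version spec was added, removed, or changed."""
    changed = set()
    for section in DEP_SECTIONS:
        old = old_manifest.get(section, {})
        new = new_manifest.get(section, {})
        for name in old.keys() | new.keys():
            if old.get(name) != new.get(name):
                changed.add(name)
    return changed


def non_dependency_fields(manifest):
    """Return the manifest without its dependency sections."""
    return {k: v for k, v in manifest.items() if k not in DEP_SECTIONS}


def can_skip_ci(old_manifest, new_manifest, used_packages):
    """True only if every change is to a dependency outside `used_packages`.

    `used_packages` is an assumed input: the packages the project's code
    actually imports, as determined by some external analysis.
    """
    # Be conservative: any change outside the dependency sections
    # (scripts, engines, ...) means the build should run.
    if non_dependency_fields(old_manifest) != non_dependency_fields(new_manifest):
        return False
    changed = changed_dependencies(old_manifest, new_manifest)
    if not changed:
        return False
    return changed.isdisjoint(used_packages)


old = {"name": "app", "dependencies": {"lodash": "^4.17.20", "left-pad": "^1.2.0"}}
new = {"name": "app", "dependencies": {"lodash": "^4.17.20", "left-pad": "^1.3.0"}}
print(can_skip_ci(old, new, used_packages={"lodash"}))    # True
print(can_skip_ci(old, new, used_packages={"left-pad"}))  # False
```

The sketch errs on the side of running the build: it skips only when the dependency sections are the sole difference and none of the changed packages is in use.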



      Published In

      Proceedings of the ACM on Software Engineering  Volume 1, Issue FSE
      July 2024
      2770 pages
      EISSN:2994-970X
      DOI:10.1145/3554322
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. continuous integration
      2. npm dependencies
      3. unused dependencies

      Qualifiers

      • Research-article
