Abstract
JavaScript is an increasingly popular language for server-side development, thanks in part to the Node.js runtime environment and its vast ecosystem of modules. With the Node.js package manager npm, users are able to easily include external modules as dependencies in their projects. However, npm installs modules with all of their functionality, even if only a fraction is needed, which causes an undue increase in code size. Eliminating this unused functionality from distributions is desirable, but the sound analysis required to find unused code is difficult due to JavaScript’s extreme dynamicity. We present a fully automatic technique that identifies unused code by constructing static or dynamic call graphs from the application’s tests, and replacing code deemed unreachable with either file- or function-level stubs. Due to JavaScript’s highly dynamic nature, call graph construction may suffer from unsoundness, i.e., code identified as unused may in fact be reachable. To handle such cases, if a stub is called, it will fetch and execute the original code on-demand to preserve the application’s behavior. The technique also provides an optional guarded execution mode to guard the application against injection vulnerabilities in untested code that results from stub expansion. This technique is implemented in an open source tool called Stubbifier, designed to help package developers produce a minimal production distribution. Stubbifier supports the ECMAScript 2019 standard. In an empirical evaluation on 15 Node.js applications and 75 clients of these applications, Stubbifier reduced application size by 56% on average while incurring only minor performance overhead. The evaluation also shows that Stubbifier’s guarded execution mode is capable of preventing several known injection vulnerabilities that manifest in stubbed-out code. Finally, Stubbifier can work alongside bundlers, popular JavaScript tools for bundling an application with its dependencies. For the considered subject applications, we measured an average size reduction of 37% in bundled distributions.
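The idea of a function-level stub that fetches and executes the original code on demand can be illustrated with a minimal sketch. This is not Stubbifier's actual implementation: the names `originalSources`, `makeStub`, and `expansions` are hypothetical, the original code is kept in an in-memory map instead of a file, and indirect `eval` is assumed as the way to materialize it.

```javascript
// Hypothetical store standing in for the original code that was removed
// from the distribution; a real tool would more likely read it from disk.
const originalSources = new Map([
  ["add", "(function add(a, b) { return a + b; })"],
]);

let expansions = 0; // counts how often a stub had to be expanded

// Build a function-level stub: it stays small until it is called, then
// fetches the original code, evaluates it once, and forwards the call.
function makeStub(id) {
  let loaded = null;
  return function stub(...args) {
    if (loaded === null) {
      expansions += 1;
      // Indirect eval evaluates the stored source in the global scope.
      loaded = (0, eval)(originalSources.get(id));
    }
    // Preserve the receiver and arguments of the original call.
    return loaded.apply(this, args);
  };
}

const add = makeStub("add");
const first = add(2, 3);   // triggers expansion of the stub
const second = add(10, 5); // reuses the already-loaded function
console.log(first, second, expansions);
```

Caching the expanded function means the cost of fetching the original code is paid at most once per stub, so code that the call graph wrongly classified as unreachable still behaves as before after its first call.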
Notes
Many npm modules rely on additional development dependencies (sometimes referred to as “devDependencies”) that are needed only for development purposes, e.g., for running tests. These dependencies are typically not installed by clients.
There is more unreachable code in css-loader, but we focus on semver for the sake of illustration.
Recall that in JavaScript functions are objects, and can have properties assigned dynamically.
apply calls its receiver as a function, binding its first argument to this inside the function, and passing the elements of its second (array-like) argument as the function’s arguments. arguments is a metavariable available inside a function that refers to the arguments the function was called with.
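The two notes above (functions as objects, and apply with arguments) can be seen together in a short sketch. All names here (`greet`, `callCount`, `forward`, `countArgs`, `receiver`) are illustrative and do not come from the article.

```javascript
// Functions are objects, so properties can be attached at run time.
function greet(name) {
  return this.greeting + ", " + name;
}
greet.callCount = 0; // dynamically added property

// `arguments` is an array-like object holding the actual arguments.
function countArgs() {
  return arguments.length;
}

// A wrapper that forwards its own receiver and arguments via apply:
// the first argument of apply becomes `this` inside greet, and the
// elements of the second become greet's arguments.
function forward() {
  greet.callCount += 1;
  return greet.apply(this, arguments);
}

const receiver = { greeting: "Hello", forward };
const message = receiver.forward("npm");
const n = countArgs(1, "two", 3);
console.log(message, greet.callCount, n);
```

Forwarding with `apply(this, arguments)` is the pattern that lets a wrapper stand in for another function without knowing its parameter list.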
The metrics in the table reflect the project’s own source code (excluding tests) and all its (transitive) production dependencies, but exclude devDependencies.
Of the subject applications reported on in Karim et al. (2018), these were the only two that had a confirmed vulnerability and a test suite with passing tests.
The default behavior of rollup is to ignore dependent modules in node_modules, but the bundle should include all code in which stubs may be introduced, to be able to determine Stubbifier’s effectiveness.
The full data for all applications is included in the supplemental material.
Of all the subject applications considered in Karim et al. (2018), these are the only two that still build, install, and have a test suite with passing tests, as required by Stubbifier.
In general, adapting application test suites to work with a bundled version of the application instead of the original version can be a complex and error-prone process, as test suites may import specific functions (that may be renamed by the bundler) from specific files (that may be combined by the bundler). For the applications mentioned here, this conversion was straightforward.
memfs, fs-nextra, commander.js, redux
memory-fs, serve-favicon
prop-types
References
Abadi M, Budiu M, Erlingsson U, Ligatti J (2009) Control-flow integrity principles, implementations, and applications. ACM Trans Inf Syst Secur 13(1). https://doi.org/10.1145/1609956.1609960
Agesen O, Ungar D (1994) Sifting out the gold: delivering compact applications from an exploratory object-oriented programming environment. In: Proceedings of the ninth annual conference on object-oriented programming systems, languages, and applications (OOPSLA’94), Portland, OR, pp 355–370, ACM SIGPLAN Notices 29(10)
Agesen O, Palsberg J, Schwartzbach MI (1993) Type inference of SELF. In: ECOOP’93—object-oriented programming, 7th European conference, Kaiserslautern, Germany, July 26–30, 1993, Proceedings. https://doi.org/10.1007/3-540-47910-4_14, pp 247–267
Andreasen E, Møller A (2014) Determinacy in static analysis for jQuery. In: Proceedings of the 29th ACM SIGPLAN international conference on object oriented programming systems languages, and applications (OOPSLA)
Avgustinov P, de Moor O, Jones M P, Schäfer M (2016) QL: object-oriented queries on relational data. In: 30th European conference on object-oriented programming, ECOOP 2016, July 18–22, 2016, Rome, Italy. https://doi.org/10.4230/LIPIcs.ECOOP.2016.2, pp 2:1–2:25
Bacon D F, Sweeney P F (1996) Fast static analysis of C++ virtual function calls. In: Proceedings of the 1996 ACM SIGPLAN conference on object-oriented programming systems, languages & applications (OOPSLA ’96), San Jose, California, USA, October 6–10, 1996. https://doi.org/10.1145/236337.236371, pp 324–341
Bhattacharya S, Gopinath K, Nanda M G (2013) Combining concern input with program analysis for bloat detection. In: Proceedings of the 2013 ACM SIGPLAN international conference on object oriented programming systems languages and applications, OOPSLA ’13. https://doi.org/10.1145/2509136.2509522. ACM, New York, pp 745–764
bdistin/fs-nextra (2021) https://github.com/bdistin/fs-nextra. Accessed 25 Oct 2021
Bruce B R, Zhang T, Arora J, Xu G H, Kim M (2020) JShrink: in-depth investigation into debloating modern Java applications. In: Proceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp 135–146
Commit: remove eval (2021a) https://github.com/dougwilson/nodejs-depd/commit/887283b4. Accessed 16 Apr 2021
De Sutter B, De Bus B, De Bosschere K (2002) Sifting out the mud: low level C++ code reuse. SIGPLAN Not 37(11):275–291. https://doi.org/10.1145/583854.582445
Dean J, Grove D, Chambers C (1995) Optimization of object-oriented programs using static class hierarchy analysis. In: ECOOP’95—Object-oriented programming, 9th European conference, Århus, Denmark, August 7–11, 1995, Proceedings. https://doi.org/10.1007/3-540-49538-X_5, pp 77–101
depd issue 20 (2021b) https://github.com/dougwilson/nodejs-depd/issues/20. Accessed 16 Apr 2021
depd issue 22 (2021c) https://github.com/dougwilson/nodejs-depd/issues/22. Accessed 16 Apr 2021
depd issue 24 (2021d) https://github.com/dougwilson/nodejs-depd/issues/24. Accessed 16 Apr 2021
dougwilson/nodejs-depd (2021e) https://github.com/dougwilson/nodejs-depd. Accessed 16 Apr 2021
ECMA International (2019) ECMAScript 2019 language specification. https://262.ecma-international.org/10.0/. Accessed 16 Apr 2021
ECMA International (2021) ECMAScript module system. https://www.ecma-international.org/ecma-262/#sec-modules. Accessed 16 Apr 2021
expressjs/body-parser (2021) https://github.com/expressjs/body-parser. Accessed 25 Oct 2021
expressjs/compression (2021) https://github.com/expressjs/compression. Accessed 25 Oct 2021
expressjs/morgan (2021) https://github.com/expressjs/morgan. Accessed 25 Oct 2021
expressjs/serve-favicon (2021) https://github.com/expressjs/serve-favicon. Accessed 25 Oct 2021
expressjs/serve-static (2021) https://github.com/expressjs/serve-static. Accessed 25 Oct 2021
facebook/prop-types (2021) https://github.com/facebook/prop-types. Accessed 25 Oct 2021
Gauthier F, Hassanshahi B, Jordan A (2018) Affogato: runtime detection of injection attacks for Node.js. In: Companion proceedings for the ISSTA/ECOOP 2018 workshops, ISSTA ’18. https://doi.org/10.1145/3236454.3236502. Association for Computing Machinery, New York, pp 94–99
GitHub (2020) Language trends on GitHub. https://octoverse.github.com/#top-languages
GitHub (2021) CodeQL. https://github.com/github/codeql. Accessed 16 Apr 2021
Hovemeyer D, Pugh W (2001) More efficient network class loading through bundling. In: Proceedings of the 1st Java virtual machine research and technology symposium, April 23–24, 2001, Monterey, CA, USA, pp 127–140
isaacs/node-glob (2021) https://github.com/isaacs/node-glob. Accessed 25 Oct 2021
Istanbul (2021) nyc. https://www.npmjs.com/package/nyc. Accessed 12 Oct 2021
Jensen S H, Madsen M, Møller A (2011) Modeling the HTML DOM and browser API in static analysis of JavaScript web applications. In: SIGSOFT/FSE’11 19th ACM SIGSOFT symposium on the foundations of software engineering (FSE-19) and ESEC’11: 13th European software engineering conference (ESEC-13), Szeged, Hungary, September 5–9, 2011, pp 59–69
Jensen S H, Jonsson P A, Møller A (2012) Remedying the eval that men do. In: Heimdahl MPE, Su Z (eds) International symposium on software testing and analysis, ISSTA 2012, Minneapolis, MN, USA, July 15-20, 2012. https://doi.org/10.1145/2338965.2336758. ACM, pp 34–44
Karim R, Tip F, Sochůrková A, Sen K (2018) Platform-independent dynamic taint analysis for JavaScript. IEEE Trans Softw Eng 46(12):1364–1379
Koishybayev I, Kapravelos A (2020) Mininode: reducing the attack surface of Node.js applications. In: Proceedings of the international symposium on research in attacks, intrusions and defenses (RAID)
Koo H, Ghavamnia S, Polychronakis M (2019) Configuration-driven software debloating. In: Proceedings of 12th European workshop on systems security (EuroSec ’19)
Krintz C, Calder B, Hölzle U (1999) Reducing transfer delay using Java class file splitting and prefetching. In: Proceedings of the 1999 ACM SIGPLAN conference on object-oriented programming systems, languages & applications (OOPSLA ’99), Denver, Colorado, USA, November 1–5, 1999. https://doi.org/10.1145/320384.320412, pp 276–291
kriskowal/q (2021) https://github.com/kriskowal/q. Accessed 25 Oct 2021
Li S, Kang M, Hou J, Cao Y (2021) Detecting Node.js prototype pollution vulnerabilities via object lookup analysis. Association for Computing Machinery, New York, pp 268–279. https://doi.org/10.1145/3468264.3468542
Li S, Kang M, Hou J, Cao Y (2022) Mining Node.js vulnerabilities via object dependence graph and query. In: 31st USENIX Security Symposium (USENIX Security 22). https://www.usenix.org/conference/usenixsecurity22/presentation/li-song. USENIX Association, Boston
Li Y, Tan T, Møller A, Smaragdakis Y (2018a) Precision-guided context sensitivity for pointer analysis. PACMPL 2(OOPSLA):141:1–141:29
Li Y, Tan T, Møller A, Smaragdakis Y (2018b) Scalability-first pointer analysis with self-tuning context-sensitivity. In: Proceedings of the 2018 26th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp 129–140
Livshits V B, Kiciman E (2008) Doloto: code splitting for network-bound Web 2.0 applications. In: Proceedings of the 16th ACM SIGSOFT international symposium on foundations of software engineering, 2008, Atlanta, Georgia, USA, November 9–14, 2008. https://doi.org/10.1145/1453101.1453151, pp 350–360
Lutz M (2013) Learning Python, 5th edn
Madsen M, Tip F, Lhoták O (2015) Static analysis of event-driven Node.js JavaScript applications. In: Aldrich J, Eugster P (eds) Proceedings of the 2015 ACM SIGPLAN international conference on object-oriented programming, systems, languages, and applications, OOPSLA 2015, part of SPLASH 2015, Pittsburgh, PA, USA, October 25–30, 2015. https://doi.org/10.1145/2814270.2814272. ACM, pp 505–519
mapbox/node-blend (2021) https://github.com/mapbox/node-blend. Accessed 16 Apr 2021
MDN (2021) Tree shaking. https://developer.mozilla.org/en-US/docs/Glossary/Tree_shaking. Accessed 11 Oct 2021
Møller A, Torp MT (2019) Model-based testing of breaking changes in Node.js libraries. In: Proceedings of the 2019 27th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, ESEC/FSE 2019. https://doi.org/10.1145/3338906.3338940. Association for Computing Machinery, New York, pp 409–419
Mozilla (2021) Rest parameters. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/rest_parameters. Accessed 16 Apr 2021
Nielsen B B, Hassanshahi B, Gauthier F (2019) Nodest: feedback-driven static analysis of Node.js applications. In: Proceedings of the 2019 27th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, ESEC/FSE 2019. https://doi.org/10.1145/3338906.3338933. Association for Computing Machinery, New York, pp 455–465
Niu B, Tan G (2015) Per-input control-flow integrity. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, CCS ’15. https://doi.org/10.1145/2810103.2813644. Association for Computing Machinery, New York, pp 914–926
npm (2021a) npm. https://www.npmjs.com/. Accessed 16 Apr 2021
npm (2021b) semver. https://www.npmjs.com/package/semver. Accessed 16 Apr 2021
OpenJS Foundation (2021) Node.js. https://nodejs.org/en/. Accessed 16 Apr 2021
pillarjs/send (2021) https://github.com/pillarjs/send. Accessed 25 Oct 2021
Rayside D, Kontogiannis K (2002) Extracting Java library subsets for deployment on embedded systems. Sci Comput Program 45(2):245–270. https://doi.org/10.1016/S0167-6423(02)00059-X
reduxjs/redux (2021) https://github.com/reduxjs/redux. Accessed 25 Oct 2021
Richards G, Hammer C, Burg B, Vitek J (2011) The eval that men do—a large-scale study of the use of eval in JavaScript applications. In: Mezini M (ed) ECOOP 2011—object-oriented programming—25th European conference, Lancaster, UK, July 25–29, 2011 Proceedings, vol 6813. Springer, Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-642-22655-7_4, pp 52–78
rlang (2022) Execute a function. See https://rlang.r-lib.org/reference/exec.html
Rollup (2021) Rollup. https://www.npmjs.com/package/rollup. Accessed 11 Oct 2021
Sharif H, Abubakar M, Gehani A, Zaffar F (2018) TRIMMER: application specialization for code debloating. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, ASE 2018, Montpellier, France, September 3–7, 2018. https://doi.org/10.1145/3238147.3238160, pp 329–339
streamich/memfs (2021) https://github.com/streamich/memfs. Accessed 25 Oct 2021
Sridharan M, Dolby J, Chandra S, Schäfer M, Tip F (2012) Correlation tracking for points-to analysis of JavaScript. In: Noble J (ed) ECOOP 2012—Object-Oriented Programming—26th European Conference, Beijing, China, June 11–16, 2012. Proceedings, vol 7313. Springer, Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-642-31057-7_20, pp 435–458
Stack Overflow (2020) Developer survey. https://insights.stackoverflow.com/survey/2020#most-popular-technologies
Staicu C A, Pradel M, Livshits B (2018) Synode: understanding and automatically preventing injection attacks on Node.js. In: NDSS
Staicu C A, Torp M T, Schäfer M, Møller A, Pradel M (2020) Extracting taint specifications for JavaScript libraries. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, ICSE ’20. https://doi.org/10.1145/3377811.3380390. Association for Computing Machinery, New York, pp 198–209
Stein B, Nielsen B B, Chang B E, Møller A (2019) Static analysis with demand-driven value refinement. Proc ACM Program Lang 3(OOPSLA):140:1–140:29. https://doi.org/10.1145/3360566
Sweeney P F, Tip F (2000) Extracting library-based object-oriented applications. In: ACM SIGSOFT symposium on foundations of software engineering, San Diego, California, USA, November 6–10, 2000, Proceedings, pp 98–107
Tip F, Palsberg J (2000) Scalable propagation-based call graph construction algorithms. In: Proceedings of the 2000 ACM SIGPLAN conference on object-oriented programming systems, languages & applications (OOPSLA 2000), Minneapolis, Minnesota, USA, October 15–19, 2000. https://doi.org/10.1145/353171.353190, pp 281–293
Tip F, Laffra C, Sweeney P F, Streeter D (1999) Practical experience with an application extractor for Java. In: Proceedings of the 1999 ACM SIGPLAN conference on object-oriented programming systems, languages & applications (OOPSLA ’99), Denver, Colorado, USA, November 1–5, 1999. https://doi.org/10.1145/320384.320414, pp 292–305
Tip F, Sweeney P F, Laffra C, Eisma A, Streeter D (2002) Practical extraction techniques for Java. ACM Trans Program Lang Syst 24(6):625–666. https://doi.org/10.1145/586088.586090
tj/commander.js (2021) https://github.com/tj/commander.js. Accessed 25 Oct 2021
Turcotte A, Arteca E, Mishra A, Alimadadi S, Tip F (2021) Stubbifier: debloating dynamic server-side JavaScript applications (artifact). https://doi.org/10.5281/zenodo.5599914
Vasilakis N, Staicu C A, Ntousakis G, Kallas K, Karel B, DeHon A, Pradel M (2021) Preventing dynamic library compromise on Node.js via rwx-based privilege reduction. In: Proceedings of the 2021 ACM SIGSAC conference on computer and communications security, CCS ’21. https://doi.org/10.1145/3460120.3484535. Association for Computing Machinery, New York, pp 1821–1838
VisualWorks User’s Guide (1995) ParcPlace-DigiTalk, software release 2.5 edn, chapter 13: application delivery tools. Available from http://esug.org/data/Old/vw-tutorials/vw25/vw25ug.pdf
VisualAge for Smalltalk Handbook Volume 1: Fundamentals (1997) IBM Corporation, 1st edn. Available from http://www.redbooks.ibm.com/redbooks/4instantiations/sg244828.pdf
Wagner G, Gal A, Franz M (2011) “Slimming” a Java virtual machine by way of cold code removal and optimistic partial program loading. Sci Comput Program 76(11):1037–1053. https://doi.org/10.1016/j.scico.2010.04.008
webpack (2021) webpack. https://www.npmjs.com/package/webpack. Accessed 11 Oct 2021
webpack-contrib (2021) css-loader. https://www.npmjs.com/package/css-loader. Accessed 16 Apr 2021
webpack-contrib/css-loader (2021) https://github.com/webpack-contrib/css-loader. Accessed 25 Oct 2021
webpack/memory-fs (2021) https://github.com/webpack/memory-fs. Accessed 25 Oct 2021
Zhang C, Wei T, Chen Z, Duan L, Szekeres L, McCamant S, Song D, Zou W (2013) Practical control flow integrity and randomization for binary executables. In: 2013 IEEE symposium on security and privacy, pp 559–573
Zimmermann M, Staicu C A, Tenny C, Pradel M (2019) Small world with high risks: a study of security threats in the npm ecosystem. In: Proceedings of the 28th USENIX conference on security symposium, USENIX Association, USA, SEC’19, pp 995–1010
Ethics declarations
Conflict of Interest
The authors declared that they have no conflict of interest.
Additional information
Communicated by: Carlo A. Furia
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research was supported in part by Office of Naval Research (ONR) grants N00014-17-1-2945 and N00014-21-1-2491, and by National Science Foundation grant CCF-1907727. E. Arteca and A. Turcotte are supported in part by the Natural Sciences and Engineering Research Council of Canada.
Alexi Turcotte and Ellen Arteca contributed equally to the work.
About this article
Cite this article
Turcotte, A., Arteca, E., Mishra, A. et al. Stubbifier: debloating dynamic server-side JavaScript applications. Empir Software Eng 27, 161 (2022). https://doi.org/10.1007/s10664-022-10195-6
DOI: https://doi.org/10.1007/s10664-022-10195-6