Minimalist: Semi-automated debloating of PHP web applications through static analysis
Article No.: 311, Pages 5557 - 5573
Abstract
As web applications grow more complicated and rely on third-party libraries to deliver new features to their users, they become bloated with unnecessary code. This unnecessary code increases a web application's attack surface, which can be exploited to steal user data and compromise the underlying web server. One approach to deal with bloated code is the process of selectively removing features that users do not require -- debloating.
In this paper, we identify the current challenges with debloating web applications and propose a semi-automated static debloating scheme. We implement a prototype of our proposed method, called Minimalist, which generates a call-graph for a given PHP web application. Minimalist performs a reachability analysis for the features users require and removes unreachable functions in the analyzed web application. Compared to prior work, Minimalist debloats web applications without relying on heavy runtime instrumentation. Furthermore, the call-graph generated by Minimalist can be reused (in combination with web server logs) to debloat different installations of the same web application. Due to the inherent complexity and highly dynamic nature of the PHP language, Minimalist cannot guarantee the soundness of its call-graph analysis. However, Minimalist follows a best-effort approach to model the majority of PHP features used by popular web applications, such as WordPress, phpMyAdmin, and others.
We evaluated Minimalist on 12 versions of four popular PHP web applications with 45 recent security vulnerabilities. We show that Minimalist reduces the size of web applications in our dataset on average by 18% and removes 38% of known vulnerabilities. Our results demonstrate that the principled debloating of web applications can lead to significant security gains without relying on instrumentation mechanisms that degrade the performance of the server.
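The reachability step described in the abstract can be illustrated with a minimal sketch. It is not Minimalist's implementation: the call-graph representation (a dictionary from caller to callees), the function names `reachable_functions` and `debloat_candidates`, and the toy file and function names are all assumptions made for illustration. The sketch treats the entry points an installation needs (for example, scripts that appear in its web server logs) as roots, walks the call-graph breadth-first, and reports everything that was never reached as a candidate for removal.

```python
from collections import deque


def reachable_functions(call_graph, entry_points):
    """Return every node reachable from the entry points (breadth-first walk)."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def debloat_candidates(call_graph, entry_points):
    """Return nodes that no required entry point can reach."""
    all_nodes = set(call_graph)
    for callees in call_graph.values():
        all_nodes.update(callees)
    return all_nodes - reachable_functions(call_graph, entry_points)


# Hypothetical toy call-graph: keys are callers, values are their callees.
call_graph = {
    "index.php": ["render_page", "load_config"],
    "admin/export.php": ["export_csv", "load_config"],
    "render_page": ["escape_html"],
    "export_csv": ["open_archive"],
}

# Assumed: only index.php is required by this installation's users.
required = ["index.php"]
removable = debloat_candidates(call_graph, required)
print(sorted(removable))
```

In this toy graph, `load_config` is kept because a required entry point still reaches it, even though an unused script also calls it. A real PHP call-graph is harder to build than this dictionary suggests, since dynamic calls and includes have to be resolved first; as the abstract notes, that is why the analysis is best-effort rather than sound.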
Information
Published In
Copyright © 2023 The USENIX Association.
Sponsors
- Meta
- Google Inc.
- NSF
- IBM
- Futurewei Technologies
Publisher
USENIX Association
United States
Publication History
Published: 09 August 2023
Qualifiers
- Research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate 40 of 100 submissions, 40%