research-article

Open access

Input splitting for cloud-based static application security testing platforms

Authors: Maria Christakis, Thomas Cottenier, Antonio Filieri, Muhammad Numair Mansur, Nicolás Rosner, Aritra Sengupta, Willem Visser

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 1367 - 1378

https://doi.org/10.1145/3540250.3558944

Published: 09 November 2022

Abstract

As software development teams adopt DevSecOps practices, application security is increasingly the responsibility of development teams, who are required to set up their own Static Application Security Testing (SAST) infrastructure.

Since development teams often lack the infrastructure and expertise to set up a custom SAST solution, there is a growing need for cloud-based SAST platforms that operate as a service and run a variety of static analyzers. Adding a new static analyzer to such a platform can be challenging because static analyzers vary greatly in complexity, from linters that scale efficiently to interprocedural dataflow engines that use cubic or even more complex algorithms. Careful manual evaluation is needed to decide whether a new analyzer would slow down the overall response time of the platform or time out too often.

We explore whether this evaluation can be simplified by splitting the input to the analyzer into partitions and analyzing the partitions independently. Depending on the complexity of the static analyzer, the partition size can be adjusted to curtail the overall response time. We report on an experiment in which we run different analysis tools with and without splitting the inputs. The experimental results show that simple splitting strategies can effectively reduce the running time and memory usage per partition without significantly affecting the findings produced by the tool.
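To make the idea of size-bounded partitions concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the input is a set of source files with known sizes, that partitions are formed by greedy first-fit-decreasing packing, and that analysis cost grows as a power of partition size. The function names, the `max_partition_size` parameter, and the cubic cost model are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def split_into_partitions(file_sizes: Dict[str, int],
                          max_partition_size: int) -> List[List[str]]:
    """Greedy first-fit-decreasing packing of files into size-bounded partitions.

    Hypothetical strategy for illustration; sizes could be lines of code or bytes.
    """
    partitions: List[List[str]] = []
    loads: List[int] = []
    # Visit larger files first; ties are broken by name for determinism.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, load in enumerate(loads):
            if load + size <= max_partition_size:
                partitions[i].append(name)
                loads[i] += size
                break
        else:
            # No existing partition has room (or the file alone exceeds the bound).
            partitions.append([name])
            loads.append(size)
    return partitions


def estimated_cost(sizes: List[int], exponent: float = 3.0) -> float:
    """Assumed cost model: analysis time grows as size ** exponent."""
    return sum(s ** exponent for s in sizes)


if __name__ == "__main__":
    files = {"a.py": 400, "b.py": 300, "c.py": 200, "d.py": 100}
    parts = split_into_partitions(files, max_partition_size=500)
    print(parts)  # [['a.py', 'd.py'], ['b.py', 'c.py']]

    whole = estimated_cost([sum(files.values())])
    split = estimated_cost([sum(files[f] for f in p) for p in parts])
    print(split / whole)  # 0.25 under the assumed cubic model
```

Under the assumed cubic model, two partitions of half the total size cost a quarter of the unsplit run in total, and each partition can be analyzed independently. The sketch ignores dependencies between files, which is one way in which splitting could change the findings an analyzer reports.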

Cited By

Prates L, Pereira R (2024). DevSecOps practices and tools. International Journal of Information Security 24:1. Online publication date: 5-Nov-2024.
https://doi.org/10.1007/s10207-024-00914-z

Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 2022

1822 pages

ISBN: 9781450394130

DOI: 10.1145/3540250

General Chair: Abhik Roychoudhury (National University of Singapore, Singapore)

Program Chairs: Cristian Cadar (Imperial College London, UK), Miryung Kim (University of California at Los Angeles, USA)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ESEC/FSE '22

ESEC/FSE '22: 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 14 - 18, 2022

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Bibliometrics

Article Metrics

Total Citations: 1
Total Downloads: 362
Downloads (Last 12 months): 203
Downloads (Last 6 weeks): 17

Reflects downloads up to 12 Nov 2024
