

FeatMaker: Automated Feature Engineering for Search Strategy of Symbolic Execution

Published: 12 July 2024

Abstract

We present FeatMaker, a novel technique that automatically generates state features to enhance the search strategy of symbolic execution. Search strategies, designed to address the well-known state-explosion problem, prioritize which program states to explore. These strategies typically depend on a "state feature" that describes a specific property of program states, using this feature to score and rank them. Recently, search strategies employing multiple state features have shown superior performance over traditional strategies that use a single, generic feature. However, the process of designing these features remains largely manual. Moreover, manually crafting state features is both time-consuming and prone to yielding unsatisfactory results. The goal of this paper is to fully automate the process of generating state features for search strategies from scratch. The key idea is to leverage path-conditions, which are basic but vital information maintained by symbolic execution, as state features. A challenge arises when employing all path-conditions as state features, as it results in an excessive number of state features. To address this, we present a specialized algorithm that iteratively generates and refines state features based on data accumulated during symbolic execution. Experimental results on 15 open-source C programs show that FeatMaker significantly outperforms existing search strategies that rely on manually-designed features, both in terms of branch coverage and bug detection. Notably, FeatMaker achieved an average of 35.3% higher branch coverage than state-of-the-art strategies and discovered 15 unique bugs. Of these, six were detected exclusively by FeatMaker.
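
To make the abstract's idea concrete, the sketch below shows, in Python, how a feature-based search strategy can score and rank pending states: each state is summarized by a binary vector recording which of a chosen set of path-condition predicates it contains, and a weight vector turns that summary into a priority. This is a minimal illustration under our own assumptions, not FeatMaker's implementation; all names (State, extract_features, pick_next_state) and the concrete predicates and weights are hypothetical, and in FeatMaker the features themselves are generated and refined automatically from data accumulated during symbolic execution rather than fixed in advance as they are here.

```python
# Hypothetical sketch of a feature-based search strategy for symbolic
# execution. Not FeatMaker's implementation: the feature set, weights,
# and scoring scheme below are illustrative assumptions only.

from dataclasses import dataclass


@dataclass
class State:
    """A symbolic-execution state reduced to what the strategy looks at."""
    path_condition: frozenset  # branch predicates collected along the path


def extract_features(state, feature_set):
    """Binary feature vector: which chosen predicates appear in the
    state's path condition."""
    return [1.0 if f in state.path_condition else 0.0 for f in feature_set]


def score(state, feature_set, weights):
    """Weighted sum of the state's features; higher scores are explored first."""
    return sum(w * x for w, x in zip(weights, extract_features(state, feature_set)))


def pick_next_state(worklist, feature_set, weights):
    """The search strategy proper: rank all pending states, pick the best."""
    return max(worklist, key=lambda s: score(s, feature_set, weights))


# Toy example: two pending states, two hypothetical path-condition features.
# In FeatMaker's setting, features and their scores would be refined
# iteratively from accumulated execution data; here they are fixed.
features = ["x > 0", "y == 42"]
weights = [0.8, -0.3]
worklist = [State(frozenset({"x > 0"})), State(frozenset({"y == 42"}))]
print(pick_next_state(worklist, features, weights).path_condition)  # frozenset({'x > 0'})
```

In practice, a symbolic executor would invoke something like pick_next_state each time it must choose the next state to run; the tiny in-memory worklist above merely stands in for that machinery.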



Information

Published In

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE
July 2024
2770 pages
EISSN: 2994-970X
DOI: 10.1145/3554322
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 July 2024
Published in PACMSE Volume 1, Issue FSE

Author Tags

  1. Software Testing
  2. Symbolic Execution

Qualifiers

  • Research-article

