

FeatMaker: Automated Feature Engineering for Search Strategy of Symbolic Execution

Published: 12 July 2024

Abstract

We present FeatMaker, a novel technique that automatically generates state features to enhance the search strategy of symbolic execution. Search strategies, designed to address the well-known state-explosion problem, prioritize which program states to explore. These strategies typically depend on a "state feature" that describes a specific property of program states, using this feature to score and rank them. Recently, search strategies employing multiple state features have shown superior performance over traditional strategies that use a single, generic feature. However, the process of designing these features remains largely manual. Moreover, manually crafting state features is both time-consuming and prone to yielding unsatisfactory results. The goal of this paper is to fully automate the process of generating state features for search strategies from scratch. The key idea is to leverage path-conditions, which are basic but vital information maintained by symbolic execution, as state features. A challenge arises when employing all path-conditions as state features, as it results in an excessive number of state features. To address this, we present a specialized algorithm that iteratively generates and refines state features based on data accumulated during symbolic execution. Experimental results on 15 open-source C programs show that FeatMaker significantly outperforms existing search strategies that rely on manually-designed features, both in terms of branch coverage and bug detection. Notably, FeatMaker achieved an average of 35.3% higher branch coverage than state-of-the-art strategies and discovered 15 unique bugs. Of these, six were detected exclusively by FeatMaker.
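
To make the abstract's idea concrete, the sketch below shows, in Python, how a feature-based search strategy can score and rank pending states: each state is summarized by a binary vector recording which of a chosen set of path-condition predicates it contains, and a weight vector turns that summary into a priority. This is a minimal illustration under our own assumptions, not FeatMaker's implementation; all names (State, extract_features, pick_next_state) and the concrete predicates and weights are hypothetical, and in FeatMaker the features themselves are generated and refined automatically from data accumulated during symbolic execution rather than fixed in advance as they are here.

```python
# Hypothetical sketch of a feature-based search strategy for symbolic
# execution. Not FeatMaker's implementation: the feature set, weights,
# and scoring scheme below are illustrative assumptions only.

from dataclasses import dataclass


@dataclass
class State:
    """A symbolic-execution state reduced to what the strategy looks at."""
    path_condition: frozenset  # branch predicates collected along the path


def extract_features(state, feature_set):
    """Binary feature vector: which chosen predicates appear in the
    state's path condition."""
    return [1.0 if f in state.path_condition else 0.0 for f in feature_set]


def score(state, feature_set, weights):
    """Weighted sum of the state's features; higher scores are explored first."""
    return sum(w * x for w, x in zip(weights, extract_features(state, feature_set)))


def pick_next_state(worklist, feature_set, weights):
    """The search strategy proper: rank all pending states, pick the best."""
    return max(worklist, key=lambda s: score(s, feature_set, weights))


# Toy example: two pending states, two hypothetical path-condition features.
# In FeatMaker's setting, features and their scores would be refined
# iteratively from accumulated execution data; here they are fixed.
features = ["x > 0", "y == 42"]
weights = [0.8, -0.3]
worklist = [State(frozenset({"x > 0"})), State(frozenset({"y == 42"}))]
print(pick_next_state(worklist, features, weights).path_condition)  # frozenset({'x > 0'})
```

In practice, a symbolic executor would invoke something like pick_next_state each time it must choose the next state to run; the tiny in-memory worklist above merely stands in for that machinery.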



Information

Published In

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE
July 2024
2770 pages
EISSN: 2994-970X
DOI: 10.1145/3554322
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 July 2024
Published in PACMSE Volume 1, Issue FSE

Author Tags

  1. Software Testing
  2. Symbolic Execution

Qualifiers

  • Research-article

