research-article

Grammar-agnostic symbolic execution by token symbolization

Authors:

Ji Wang

ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 374–387

https://doi.org/10.1145/3460319.3464845

Published: 11 July 2021

Abstract

Parsing code is pervasive in software, and symbolic execution of complex parsing programs is challenging. Inputs generated by symbolic execution with byte-level symbolization are usually rejected by the parsing program, which severely limits the effectiveness and efficiency of symbolic execution. Complex parsing programs usually check the input grammar at the token level, where a token sequence represents one case of the input grammar. Based on this observation, we propose grammar-agnostic symbolic execution, which automatically generates token sequences to test complex parsing programs effectively and efficiently. The key idea is to symbolize tokens instead of input bytes to improve the efficiency of symbolic execution. Technically, we propose a novel two-stage algorithm: the first stage collects the byte-level constraints of token values; the second stage uses token symbolization, together with the constraints collected in the first stage, to generate program inputs that are more likely to pass the parsing code.
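
The two-stage idea can be illustrated with a deliberately small Java sketch. It is not the authors' algorithm or their JPF-based engine; everything in it is an assumption made for illustration: the toy expression grammar, the hypothetical class `TokenSymbolizationSketch`, the simplification of first-stage constraints to one concrete witness lexeme per token type, and brute-force enumeration in place of a constraint solver. What it shows is the shift in search space: each input position is a variable ranging over token types rather than raw bytes, and only token sequences the parser accepts are mapped back to concrete input strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of token symbolization (not the paper's implementation).
 * Assumed toy grammar:  E ::= NUM | NUM PLUS E | LPAREN E RPAREN
 */
public class TokenSymbolizationSketch {

    enum Tok { NUM, PLUS, LPAREN, RPAREN }

    // Stage 1 (simplified assumption): instead of collecting and solving the lexer's
    // byte-level path constraints, keep one concrete lexeme known to satisfy them.
    static final Map<Tok, String> WITNESS = Map.of(
            Tok.NUM, "7", Tok.PLUS, "+", Tok.LPAREN, "(", Tok.RPAREN, ")");

    // Byte-level front end: maps characters to tokens, or returns null on a bad byte.
    static List<Tok> lex(String s) {
        List<Tok> ts = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                // A maximal run of digits forms a single NUM token.
                while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
                ts.add(Tok.NUM);
                continue;
            }
            switch (c) {
                case '+': ts.add(Tok.PLUS); break;
                case '(': ts.add(Tok.LPAREN); break;
                case ')': ts.add(Tok.RPAREN); break;
                default: return null; // byte rejected by the lexer
            }
            i++;
        }
        return ts;
    }

    // Recursive-descent recognizer for E; returns the index after E, or -1 on failure.
    static int parseE(List<Tok> ts, int i) {
        if (i >= ts.size()) return -1;
        if (ts.get(i) == Tok.LPAREN) {
            int j = parseE(ts, i + 1);
            if (j < 0 || j >= ts.size() || ts.get(j) != Tok.RPAREN) return -1;
            return j + 1;
        }
        if (ts.get(i) != Tok.NUM) return -1;
        if (i + 1 < ts.size() && ts.get(i + 1) == Tok.PLUS) return parseE(ts, i + 2);
        return i + 1;
    }

    static boolean accepts(List<Tok> ts) {
        return parseE(ts, 0) == ts.size();
    }

    // The whole toy "parsing program": lexing followed by token-level grammar checking.
    static boolean parses(String input) {
        List<Tok> ts = lex(input);
        return ts != null && accepts(ts);
    }

    // Stage 2 (simplified): each of the n positions is a "symbolic token" over Tok;
    // brute-force enumeration stands in for solver-driven path exploration.
    static List<String> generateInputs(int n) {
        List<String> out = new ArrayList<>();
        enumerate(new ArrayList<>(), n, out);
        return out;
    }

    static void enumerate(List<Tok> prefix, int n, List<String> out) {
        if (prefix.size() == n) {
            if (accepts(prefix)) out.add(concretize(prefix));
            return;
        }
        for (Tok t : Tok.values()) {
            prefix.add(t);
            enumerate(prefix, n, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    // Map an accepted token sequence back to concrete input bytes via the witnesses.
    static String concretize(List<Tok> ts) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : ts) sb.append(WITNESS.get(t));
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println("length " + n + ": " + generateInputs(n));
        }
    }
}
```

Running `main` prints the accepted inputs grouped by token-sequence length: `[7]` for length 1, `[7+7, (7)]` for length 3, `[7+7+7, 7+(7), (7+7), ((7))]` for length 5, and empty lists for lengths 2 and 4. Every generated string also passes `parses`, because inputs are built from grammar-valid token sequences rather than from arbitrary bytes. A real engine would replace the fixed witness map with solved byte-level constraints and the enumeration loop with symbolic exploration of the parser.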

We have implemented our method on a Java PathFinder (JPF)-based concolic execution engine. Extensive experiments on real-world Java parsing programs demonstrate the method's effectiveness and efficiency in testing complex parsing programs. Our method detects 6 previously unknown bugs in the benchmark programs and achieves orders-of-magnitude speedups in finding the same bugs.

Cited By

Bettscheider L, Zeller A, d'Amorim M (2024). Look Ma, No Input Samples! Mining Input Grammars from Code with Symbolic Parsing. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 522–526. https://doi.org/10.1145/3663529.3663790 (online publication date: 10 July 2024)

Tu H, Jiang H, Zhou Z, Tang Y, Ren Z, Qiao L, Jiang L (2023). Detecting C++ Compiler Front-End Bugs via Grammar Mutation and Differential Testing. IEEE Transactions on Reliability 72(1), 343–357. https://doi.org/10.1109/TR.2022.3171220 (online publication date: March 2023)

Index Terms

Software and its engineering → Software creation and management → Software verification and validation

Information

Published In

ISSTA 2021: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2021

685 pages

ISBN:9781450384599

DOI:10.1145/3460319

General Chair: Cristian Cadar, Imperial College London, UK

Program Chair: Xiangyu Zhang, Purdue University, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2021

Qualifiers

Research-article

Conference

ISSTA '21

Sponsor:

SIGSOFT

ISSTA '21: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 11 - 17, 2021

Virtual, Denmark

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
