research-article

FormatFuzzer: Effective Fuzzing of Binary File Formats

Authors: Rahul Gopinath, Andreas Zeller

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2

Article No.: 53, Pages 1 - 29

https://doi.org/10.1145/3628157

Published: 22 December 2023

Abstract

Effective fuzzing of programs that process structured binary inputs, such as multimedia files, is a challenging task, since those programs expect a very specific input format. Existing fuzzers, however, are mostly format-agnostic, which makes them versatile, but also ineffective when a specific format is required.

We present FormatFuzzer, a generator for format-specific fuzzers. FormatFuzzer takes as input a binary template (a format specification used by the 010 Editor) and compiles it into C++ code that acts as parser, mutator, and highly efficient generator of inputs conforming to the rules of the language.
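To make the idea of one specification serving as both generator and parser concrete, here is a minimal sketch in Python. It does not use FormatFuzzer or 010 Editor templates; the toy format (a `TOY1` magic, a record count, and length-prefixed records with a CRC32) and the names `generate` and `parse` are assumptions invented for illustration only.

```python
import random
import struct
import zlib
from typing import List

MAGIC = b"TOY1"  # hypothetical magic number of a made-up format


def generate(rng: random.Random) -> bytes:
    """Produce one valid file: magic, record count, then records.

    Each record is: 1-byte length, payload, little-endian CRC32 of the payload.
    """
    out = bytearray(MAGIC)
    count = rng.randint(0, 3)
    out.append(count)
    for _ in range(count):
        payload = bytes(rng.randrange(256) for _ in range(rng.randint(0, 8)))
        out.append(len(payload))
        out += payload
        out += struct.pack("<I", zlib.crc32(payload))
    return bytes(out)


def parse(data: bytes) -> List[bytes]:
    """Parse a file of the toy format; raise ValueError if it is invalid."""
    if len(data) < 5 or data[:4] != MAGIC:
        raise ValueError("bad magic or truncated header")
    count = data[4]
    pos = 5
    records = []
    for _ in range(count):
        if pos >= len(data):
            raise ValueError("truncated record header")
        length = data[pos]
        pos += 1
        payload = data[pos:pos + length]
        pos += length
        crc_bytes = data[pos:pos + 4]
        pos += 4
        if len(payload) != length or len(crc_bytes) != 4:
            raise ValueError("truncated record")
        (crc,) = struct.unpack("<I", crc_bytes)
        if crc != zlib.crc32(payload):
            raise ValueError("checksum mismatch")
        records.append(payload)
    if pos != len(data):
        raise ValueError("trailing bytes")
    return records
```

Every output of `generate` is accepted by `parse`, whereas flipping a payload byte in a generated file typically breaks the checksum. This is the kind of format constraint that a format-agnostic mutator tends to violate.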

The resulting format-specific fuzzer can be used as a standalone producer or mutator in black-box settings, where no guidance from the program is available. In addition, by providing mutable decision seeds, it can be easily integrated with arbitrary format-agnostic fuzzers such as AFL to make them format-aware. In our evaluation on complex formats such as MP4 or ZIP, FormatFuzzer proved to be a highly effective producer of valid inputs and also detected previously unknown memory errors in ffmpeg and timidity.
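The decision-seed idea can be illustrated with a small Python sketch. It assumes that a decision seed is simply a byte string from which the generator reads each of its choices, so that a format-agnostic fuzzer can mutate the seed rather than the file. The toy format and the names `DecisionSource`, `generate`, and `is_valid` are hypothetical and are not FormatFuzzer's actual interface.

```python
MAGIC = b"TOY1"  # hypothetical magic number of a made-up format


class DecisionSource:
    """Feeds generator choices from a seed; yields 0 once the seed runs out."""

    def __init__(self, seed: bytes):
        self.seed = seed
        self.pos = 0

    def next_byte(self) -> int:
        value = self.seed[self.pos] if self.pos < len(self.seed) else 0
        self.pos += 1
        return value

    def choose(self, n: int) -> int:
        """Pick an integer in range(n) from the next seed byte."""
        return self.next_byte() % n


def generate(seed: bytes) -> bytes:
    """Deterministically build: magic, 1-byte length, payload, 1-byte checksum."""
    src = DecisionSource(seed)
    length = src.choose(9)  # payload length between 0 and 8
    payload = bytes(src.next_byte() for _ in range(length))
    checksum = sum(payload) % 256
    return MAGIC + bytes([length]) + payload + bytes([checksum])


def is_valid(data: bytes) -> bool:
    """Check magic, length field, and checksum of the toy format."""
    if len(data) < 6 or data[:4] != MAGIC:
        return False
    length = data[4]
    if len(data) != 6 + length:
        return False
    return data[-1] == sum(data[5:5 + length]) % 256


seed = bytes([3, 10, 20, 30])
original = generate(seed)

# Mutating the seed changes the decisions, yet the output stays valid.
mutated_seed = bytes([1]) + seed[1:]
from_mutated_seed = generate(mutated_seed)

# Mutating the file directly leaves the checksum stale.
raw = bytearray(original)
raw[5] ^= 0xFF
mutated_file = bytes(raw)
```

In this sketch, `is_valid(from_mutated_seed)` holds because the generator recomputes the length field and checksum from the new decisions, whereas `is_valid(mutated_file)` fails. Under the stated assumption, this is why seed-level mutation lets a byte-level fuzzer remain format-aware.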


Index Terms

FormatFuzzer: Effective Fuzzing of Binary File Formats
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software notations and tools
    1. Compilers
      1. Parsers
      2. Translator writing systems and compiler generators
    2. Formal language definitions
      1. Syntax

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2

February 2024

947 pages

EISSN: 1557-7392

DOI: 10.1145/3618077

Editor: Mauro Pezzè, USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 December 2023

Online AM: 17 October 2023

Accepted: 18 September 2023

Revised: 18 August 2023

Received: 13 August 2021

Published in TOSEM Volume 33, Issue 2
