FormatFuzzer: Effective Fuzzing of Binary File Formats

Published: 22 December 2023

Abstract

Effective fuzzing of programs that process structured binary inputs, such as multimedia files, is a challenging task, since those programs expect a very specific input format. Existing fuzzers, however, are mostly format-agnostic, which makes them versatile, but also ineffective when a specific format is required.
We present FormatFuzzer, a generator for format-specific fuzzers. FormatFuzzer takes as input a binary template (a format specification used by the 010 Editor) and compiles it into C++ code that acts as parser, mutator, and highly efficient generator of inputs conforming to the rules of the language.
The resulting format-specific fuzzer can be used as a standalone producer or mutator in black-box settings, where no guidance from the program is available. In addition, because it exposes mutable decision seeds, it can be easily integrated with arbitrary format-agnostic fuzzers such as AFL to make them format-aware. In our evaluation on complex formats such as MP4 and ZIP, FormatFuzzer proved to be a highly effective producer of valid inputs and also detected previously unknown memory errors in ffmpeg and timidity.
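The idea of a mutable decision seed can be illustrated with a minimal sketch. It is not FormatFuzzer's generated C++ code and does not use a real binary template: the `TOY1` chunk format, the `DecisionSeed` class, and the `generate` and `parse` functions are all hypothetical names invented for this example. The sketch assumes only what the abstract states, namely that the generator's choices are driven by a byte string, so a format-agnostic fuzzer can mutate that byte string freely while the generator still emits structurally valid files.

```python
import struct

MAGIC = b"TOY1"  # hypothetical magic number of an invented toy format


class DecisionSeed:
    """Supplies generator decisions from a byte string (0 once exhausted)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def choose(self, n: int) -> int:
        """Return a value in range(n) determined by the next seed byte."""
        byte = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return byte % n


def generate(seed: bytes) -> bytes:
    """Build a valid toy file; every choice is read from the seed."""
    decisions = DecisionSeed(seed)
    out = bytearray(MAGIC)
    n_chunks = decisions.choose(4)            # 0 to 3 chunks
    out.append(n_chunks)
    for _ in range(n_chunks):
        kind = decisions.choose(3)            # chunk type
        length = decisions.choose(8)          # payload length in bytes
        payload = bytes(decisions.choose(256) for _ in range(length))
        out += struct.pack(">BH", kind, length)
        out += payload
        out.append(sum(payload) % 256)        # checksum is computed, never guessed
    return bytes(out)


def parse(blob: bytes) -> list:
    """Validate a toy file and return its (kind, payload) chunks."""
    if len(blob) < 5 or blob[:4] != MAGIC:
        raise ValueError("bad header")
    pos, chunks = 5, []
    for _ in range(blob[4]):
        if pos + 3 > len(blob):
            raise ValueError("truncated chunk header")
        kind, length = struct.unpack_from(">BH", blob, pos)
        pos += 3
        payload = blob[pos:pos + length]
        pos += length
        if pos >= len(blob) or blob[pos] != sum(payload) % 256:
            raise ValueError("truncated chunk or bad checksum")
        pos += 1
        chunks.append((kind, payload))
    if pos != len(blob):
        raise ValueError("trailing bytes")
    return chunks
```

In this sketch, flipping any byte of the seed changes the number, type, or content of the chunks, but the length fields and checksums stay consistent because the generator recomputes them. Flipping a byte of the generated file itself, by contrast, would usually break a checksum or length field and cause `parse` to reject the input.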

Information & Contributors

Information

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2
February 2024
947 pages
EISSN: 1557-7392
DOI: 10.1145/3618077
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 December 2023
Online AM: 17 October 2023
Accepted: 18 September 2023
Revised: 18 August 2023
Received: 13 August 2021
Published in TOSEM Volume 33, Issue 2

Author Tags

  1. Structure-aware fuzzing
  2. file format specifications
  3. binary files
  4. grammars
  5. parser generators
  6. generator-based fuzzing

Qualifiers

  • Research-article

