research-article
Open access

R2I: A Relative Readability Metric for Decompiled Code

Published: 12 July 2024

Abstract

Decompilation is the process of converting low-level machine code back into a high-level programming language such as C. It helps reverse engineers comprehend the contextual semantics of the code. Commercial decompilers like Hex-Rays have made significant strides in improving the readability of decompiled code over time. Previous work has proposed metrics for assessing the readability of source code that draw on identifiers, variable names, function names, and comments. Those metrics are unsuitable for decompiled code, primarily because of i) the lack of rich semantic information in the source and ii) the presence of erroneous syntax or inappropriate expressions. In response, to the best of our knowledge, this work is the first to introduce R2I, the Relative Readability Index, a specialized metric that quantitatively evaluates decompiled code in a relative context. In essence, R2I is computed by i) taking code snippets from different decompilers as input and ii) extracting pre-defined features from an abstract syntax tree. To make R2I robust, we thoroughly investigate the enhancement efforts made by (non-)commercial decompilers and academic research to promote code readability, identifying 31 features that collectively yield a reliable index. In addition, we conducted a user survey to capture subjective factors such as one’s coding style and preferences. Our empirical experiments demonstrate that R2I is a versatile metric that captures the relative quality of decompiled code (e.g., under obfuscation or across decompiler updates) and aligns well with human perception in our survey.
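
The two-step idea in the abstract (take the outputs of several decompilers for the same function, extract features, score each output relative to the others) can be illustrated with a minimal sketch. This is not the R2I formula: the abstract does not list the 31 features or their weighting, so the three features below, the text-based counting (standing in for abstract syntax tree extraction), the equal weights, and the names `extract_features` and `relative_scores` are all assumptions made for illustration.

```python
import re

# Hypothetical stand-ins for the paper's 31 AST features (not listed in the abstract).
def extract_features(code: str) -> dict:
    """Count a few constructs that tend to hurt readability (crude, text-based)."""
    depth = max_depth = 0
    for ch in code:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    # Crude cast pattern: "(int)", "(char *)", ...; it would also match "sizeof(int)".
    cast_pattern = r"\(\s*(?:unsigned\s+)?(?:int|char|long|short)\s*\**\s*\)"
    return {
        "gotos": len(re.findall(r"\bgoto\b", code)),
        "casts": len(re.findall(cast_pattern, code)),
        "max_nesting": max_depth,
    }

def relative_scores(snippets: dict) -> dict:
    """Score each decompiler's output in [0, 1] relative to the others (1 = best)."""
    feats = {name: extract_features(code) for name, code in snippets.items()}
    keys = list(next(iter(feats.values())).keys())
    scores = {name: 0.0 for name in snippets}
    for k in keys:
        vals = [f[k] for f in feats.values()]
        lo, hi = min(vals), max(vals)
        for name, f in feats.items():
            # Lower counts are better; a tie on a feature gives every output full credit.
            scores[name] += 1.0 if hi == lo else (hi - f[k]) / (hi - lo)
    # Equal weighting is an assumption of this sketch.
    return {name: s / len(keys) for name, s in scores.items()}

# Two made-up outputs for the same function, labelled by hypothetical decompilers.
snippets = {
    "decompiler_a": "int f(int a) { if (a) { return 1; } return 0; }",
    "decompiler_b": "int f(int a) { if (a) { goto L1; } return 0; L1: return (int)1; }",
}
scores = relative_scores(snippets)
```

Because each feature is normalized across the inputs, a score here only has meaning relative to the other outputs in the same comparison, which is the sense of "relative" this sketch tries to convey.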



Published In

Proceedings of the ACM on Software Engineering, Volume 1, Issue FSE
July 2024
2770 pages
EISSN:2994-970X
DOI:10.1145/3554322
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Code Metric
  2. Code Readability
  3. Decompiled Code
  4. Decompiler

Qualifiers

  • Research-article
