Khaos: The Impact of Inter-procedural Code Obfuscation on Binary Diffing Techniques

Authors:

Zhe Wang

CGO 2023: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization

Pages 55 - 67

https://doi.org/10.1145/3579990.3580007

Published: 22 February 2023

Abstract

Software obfuscation can protect embedded device software: by obfuscating third-party code, it prevents binary diffing techniques from locating vulnerable code. Binary diffing techniques have advanced rapidly, however, and now match and identify functions with increasing accuracy by extracting features from within each function. As a result, existing software obfuscation techniques, which focus mainly on intra-procedural code obfuscation, are no longer effective.

In this paper, we propose Khaos, a new inter-procedural code obfuscation mechanism that obfuscates functions by moving code across them using compilation optimizations. Khaos introduces two obfuscation primitives: fission, which separates a function, and fusion, which aggregates functions. We implement a prototype of Khaos on the LLVM compiler and evaluate it on a large number of real-world programs, including SPEC CPU 2006 and 2017, CoreUtils, and JavaScript engines, among others. Experimental results show that Khaos outperforms existing code obfuscations: it reduces the accuracy of five state-of-the-art binary diffing techniques to less than 19%, with lower runtime overhead (less than 7%).
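The two primitives can be pictured at the source level with a minimal sketch in C. This is an illustration only, not the Khaos implementation: the abstract states that Khaos works inside the LLVM compiler, whereas the sketch below rewrites functions by hand. All function names, the `selector` parameter, and the choice of which region to separate are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Original function: an input check followed by a checksum loop. */
int checksum_original(const unsigned char *buf, int len) {
    if (buf == NULL || len <= 0) return -1;
    int acc = 0;
    for (int i = 0; i < len; i++) {
        acc = (acc * 31 + buf[i]) % 65521;
    }
    return acc;
}

/* A second, unrelated original function. */
int clamp_original(int v, int lo, int hi) {
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Fission (illustrative): the loop region is separated into its own
 * function, so the features of checksum_original are now spread over two
 * functions. The name checksum_region is hypothetical. */
static int checksum_region(const unsigned char *buf, int len) {
    int acc = 0;
    for (int i = 0; i < len; i++) {
        acc = (acc * 31 + buf[i]) % 65521;
    }
    return acc;
}

int checksum_fission(const unsigned char *buf, int len) {
    if (buf == NULL || len <= 0) return -1;
    return checksum_region(buf, len);
}

/* Fusion (illustrative): two functions are aggregated into one body.
 * The selector argument is an assumption of this sketch; it chooses which
 * original behaviour the call performs. */
int fused(int selector, const unsigned char *buf, int len,
          int v, int lo, int hi) {
    if (selector == 0) {
        if (buf == NULL || len <= 0) return -1;
        int acc = 0;
        for (int i = 0; i < len; i++) {
            acc = (acc * 31 + buf[i]) % 65521;
        }
        return acc;
    }
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Helper: the transformed versions must behave like the originals. */
void check_equivalence(const unsigned char *buf, int len) {
    assert(checksum_fission(buf, len) == checksum_original(buf, len));
    assert(fused(0, buf, len, 0, 0, 0) == checksum_original(buf, len));
    assert(fused(1, NULL, 0, 15, 0, 10) == clamp_original(15, 0, 10));
}
```

In both cases the program's behaviour is unchanged, but a function in the transformed program no longer corresponds one-to-one with a function in the original, which is the property that function-level feature matching relies on.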

Cited By

Zhang P, Wu C, Wang Z (2024). BinCodex: A comprehensive and multi-level dataset for evaluating binary code similarity detection techniques. BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 4:2 (100163). Online publication date: Jun-2024. https://doi.org/10.1016/j.tbench.2024.100163

Geng H, Zhong M, Zhang P, Lv F, Feng X (2023). OPTango: Multi-central Representation Learning against Innumerable Compiler Optimization for Binary Diffing. 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 774-785. Online publication date: 9-Oct-2023. https://doi.org/10.1109/ISSRE59848.2023.00013

Index Terms

1. Security and privacy
  1. Software and application security

Published In

CGO '23: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization

February 2023

262 pages

ISBN:9798400701016

DOI:10.1145/3579990

General Chair: Christophe Dubach (McGill University, Canada)

Program Chairs: Derek Bruening (Google, USA), Ben Hardekopf (University of California at Santa Barbara, USA)

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
the Innovation Funding of ICT, CAS

Conference

CGO '23

CGO '23: 21st ACM/IEEE International Symposium on Code Generation and Optimization

February 25 - March 1, 2023

Montréal, QC, Canada

Acceptance Rates

Overall Acceptance Rate 312 of 1,061 submissions, 29%
