DOI: 10.1145/3579990.3580007
research-article
Open access

Khaos: The Impact of Inter-procedural Code Obfuscation on Binary Diffing Techniques

Published: 22 February 2023

Abstract

Software obfuscation can protect embedded device software by obfuscating third-party code so that binary diffing techniques cannot locate the vulnerable code in it. Binary diffing techniques, however, are developing rapidly: by extracting features within each function, they match and identify functions with increasing accuracy. As a result, existing software obfuscation techniques, which mainly focus on intra-procedural code obfuscation, are no longer effective.
In this paper, we propose Khaos, a new inter-procedural code obfuscation mechanism that uses compilation optimizations to move code across functions and thereby obfuscate them. We propose two obfuscation primitives, fission and fusion, which separate and aggregate functions, respectively. A prototype of Khaos is implemented on the LLVM compiler and evaluated on a large number of real-world programs, including SPEC CPU 2006 & 2017, CoreUtils, and JavaScript engines. Experimental results show that Khaos outperforms existing code obfuscations: it significantly reduces the accuracy rates of five state-of-the-art binary diffing techniques (less than 19%) while incurring lower runtime overhead (less than 7%).
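The idea behind the two primitives can be illustrated at source level with a minimal sketch. This is not the Khaos implementation, which works inside the LLVM compiler; all function names below are hypothetical, and the selector-argument style of merging is an assumption chosen only to show what "aggregating" two functions could look like. Fission moves part of a function body into a new function, and fusion folds two unrelated functions into one, so the function boundaries a binary differ relies on no longer line up with the original ones.

```python
# Illustrative only: hypothetical functions, not code from Khaos.

def original(x):
    """Un-obfuscated function with two regions of code."""
    if x < 0:              # region A: clamp negative input
        x = 0
    return x * 3 + 1       # region B: scale and offset


# --- Fission: region B is separated into its own function. ---
def original_part_b(x):
    return x * 3 + 1


def original_fissioned(x):
    if x < 0:
        x = 0
    return original_part_b(x)   # the body now spans two functions


# --- Fusion: two unrelated functions are aggregated into one. ---
def square(a):
    return a * a


def negate(a):
    return -a


def fused(selector, a):
    """Assumed merge scheme: a selector picks which original body runs."""
    if selector == 0:
        return a * a       # former body of square
    return -a              # former body of negate


# Behaviour is preserved while function boundaries change.
for value in range(-3, 4):
    assert original(value) == original_fissioned(value)
    assert fused(0, value) == square(value)
    assert fused(1, value) == negate(value)
```

In both cases the program computes the same results, but features extracted per function (size, control flow, call edges) differ from those of the original functions.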

    Published In

    CGO '23: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization
    February 2023
    262 pages
    ISBN:9798400701016
    DOI:10.1145/3579990
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Binary Diffing
    2. Obfuscation
    3. Software Protection

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • the Innovation Funding of ICT, CAS

    Conference

    CGO '23

    Acceptance Rates

    Overall Acceptance Rate 312 of 1,061 submissions, 29%
