Memoized Semantics-Based Binary Diffing with Application to Malware Lineage Inference

Jiang Ming,
Dongpeng Xu &
Dinghao Wu

Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 455)

Included in the following conference series:

IFIP International Information Security and Privacy Conference

Abstract

Identifying differences between two executable binaries (binary diffing) has compelling security applications, such as software vulnerability exploration, “1-day” exploit generation, and software plagiarism detection. Recently, binary diffing based on symbolic execution and constraint solving has been proposed to find code pairs that have the same semantics even though they differ syntactically. Such a logic-based method captures intrinsic differences in binary code, making it a natural choice for analyzing highly obfuscated malicious programs. However, semantics-based binary diffing suffers from a significant performance slowdown, which hinders its use on large-scale malware samples. In this paper, we attempt to mitigate the high overhead of semantics-based binary diffing, with application to malware lineage inference. We first study the key obstacles that contribute to the performance bottleneck. We then propose fast basic block matching to speed up semantics-based binary diffing. We introduce a union-find set structure that records semantically equivalent basic blocks. Maintaining the union-find structure across successive comparisons allows direct reuse of previously computed results. Moreover, we propose to concretize symbolic formulas and cache equivalence queries to further reduce the number of constraint solver invocations. We have implemented our technique on top of iBinHunt and evaluated it on 12 malware families, measuring the performance improvement for intra-family comparisons. Our experimental results show that our methods accelerate symbolic execution by 2.8x to 5.3x (4.0x on average) and reduce constraint solver invocations by a factor of 3.0 to 6.0 (4.3 on average).
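The idea of recording equivalent basic blocks in a union-find structure and caching equivalence queries can be illustrated with a small Python sketch. This is not the authors' implementation: the class name `BlockEquivalence`, the callback `solver_check`, the counter `solver_calls`, and the string block identifiers are all hypothetical, and the callback merely stands in for an expensive symbolic-execution and constraint-solver comparison. The sketch also assumes that block equivalence is transitive, so that membership in the same set can replace a solver call.

```python
from typing import Callable, Dict, Hashable, Set


class BlockEquivalence:
    """Union-find over block identifiers plus a cache of negative verdicts.

    All names here are illustrative assumptions, not the paper's API.
    """

    def __init__(self, solver_check: Callable[[Hashable, Hashable], bool]) -> None:
        self._parent: Dict[Hashable, Hashable] = {}
        self._rank: Dict[Hashable, int] = {}
        self._known_different: Set[frozenset] = set()
        self._solver_check = solver_check  # stand-in for the costly solver query
        self.solver_calls = 0

    def find(self, block: Hashable) -> Hashable:
        """Return the representative of the block's set, compressing the path."""
        self._parent.setdefault(block, block)
        self._rank.setdefault(block, 0)
        root = block
        while self._parent[root] != root:
            root = self._parent[root]
        while self._parent[block] != root:
            self._parent[block], block = root, self._parent[block]
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        """Merge the sets containing a and b (union by rank)."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self._rank[ra] < self._rank[rb]:
            ra, rb = rb, ra
        self._parent[rb] = ra
        if self._rank[ra] == self._rank[rb]:
            self._rank[ra] += 1

    def equivalent(self, a: Hashable, b: Hashable) -> bool:
        """Answer from stored results when possible; otherwise ask the solver."""
        if self.find(a) == self.find(b):
            return True  # already known equivalent, possibly by transitivity
        key = frozenset((a, b))
        if key in self._known_different:
            return False  # cached negative verdict
        self.solver_calls += 1
        if self._solver_check(a, b):
            self.union(a, b)
            return True
        self._known_different.add(key)
        return False


# Toy oracle: two blocks are "equivalent" when their labels match.
semantics = {"b1": "x+x", "b2": "x+x", "b3": "x+x", "b4": "x-1"}
eq = BlockEquivalence(lambda a, b: semantics[a] == semantics[b])
results = [
    eq.equivalent("b1", "b2"),  # solver call
    eq.equivalent("b2", "b3"),  # solver call
    eq.equivalent("b1", "b3"),  # answered from the union-find structure
    eq.equivalent("b1", "b4"),  # solver call, verdict cached
    eq.equivalent("b4", "b1"),  # answered from the negative cache
]
print(results, eq.solver_calls)  # [True, True, True, False, False] 3
```

In this toy run, five queries cost three solver calls. The negative cache is keyed by the queried pair for simplicity; keying it by set representatives would allow more reuse but requires updating the keys whenever two sets are merged.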

Author information

Authors and Affiliations

The Pennsylvania State University, University Park, PA, 16802, USA
Jiang Ming, Dongpeng Xu & Dinghao Wu

Corresponding author

Correspondence to Jiang Ming.

Editor information

Editors and Affiliations

Universität Hamburg, Hamburg, Germany
Hannes Federrath
Technische Universität Hamburg-Harburg, Hamburg, Hamburg, Germany
Dieter Gollmann

About this paper

Cite this paper

Ming, J., Xu, D., Wu, D. (2015). Memoized Semantics-Based Binary Diffing with Application to Malware Lineage Inference. In: Federrath, H., Gollmann, D. (eds) ICT Systems Security and Privacy Protection. SEC 2015. IFIP Advances in Information and Communication Technology, vol 455. Springer, Cham. https://doi.org/10.1007/978-3-319-18467-8_28

DOI: https://doi.org/10.1007/978-3-319-18467-8_28
Published: 09 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18466-1
Online ISBN: 978-3-319-18467-8
eBook Packages: Computer Science, Computer Science (R0)
