Abstract
Identifying differences between two executable binaries (binary diffing) has compelling security applications, such as software vulnerability exploration, “1-day” exploit generation and software plagiarism detection. Recently, binary diffing based on symbolic execution and constraint solver has been proposed to look for the code pairs with the same semantics, even though they are ostensibly different in syntactics. Such logical-based method captures intrinsic differences of binary code, making it a natural choice to analyze highly-obfuscated malicious program. However, semantics-based binary diffing suffers from significant performance slowdown, hindering it from analyzing large-scale malware samples. In this paper, we attempt to mitigate the high overhead of semantics-based binary diffing with application to malware lineage inference. We first study the key obstacles that contribute to the performance bottleneck. Then we propose basic blocks fast matching to speed up semantics-based binary diffing. We introduce an union-find set structure that records semantically equivalent basic blocks. Managing the union-find structure during successive comparisons allows direct reuse of previously computed results. Moreover, we purpose to concretize symbolic formulas and cache equivalence queries to further cut down the invocation times of constraint solver. We have implemented our technique on top of iBinHunt and evaluated it on 12 malware families with respect to the performance improvement when performing intra-family comparisons. Our experimental results show that our methods can accelerate symbolic execution from \(2.8\)x to \(5.3\)x (with an average \(4.0\)x), and reduce constraint solver invocation by a factor of \(3.0\)x to \(6.0\)x (with an average \(4.3\)x).
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Brumley, D., Poosankam, P., Song, D., Zheng, J.: Automatic patch-based exploit generation is possible: techniques and implications. In: Proceedings of the 2008 IEEE Symposimu on Security and Privacy, SP 2008 (2008)
Bruschi, D., Martignoni, L., Monga, M.: Using code normalization for fighting self-mutating malware. In: Proceedings of the International Symposium of Secure Software Engineering (2006)
Cadar, C., Dunbar, D., Engler, D.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 2008 USENIX Symposium on Operating Systems Design and Implementation, OSDI 2008 (2008)
Cadar, C., Ganesh, V., Pawlowski, P.M., Dill, D.L., Engler, D.R.: EXE: automatically generating inputs of death. In: Proceedings of the 2006 ACM Conference on Computer and Communications Security, CCS 2006 (2006)
Christodorescu, M., Kinder, J., Jha, S., Katzenbeisser, S., Veith, H.: Malware normalization. Technical Report 1539, University of Wisconsin, Madison (November 2005)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Data structures for disjoint sets, chapter 21. In: Introduction to Algorithms, 2nd edn, pp. 498–524. MIT Press (2001)
Ganesh, V., Dill, D.L.: A decision procedure for bit-vectors and arrays. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 519–531. Springer, Heidelberg (2007)
Gao, D., Reiter, M.K., Song, D.: BinHunt: automatically finding semantic differences in binary programs. In: Chen, L., Ryan, M.D., Wang, G. (eds.) ICICS 2008. LNCS, vol. 5308, pp. 238–255. Springer, Heidelberg (2008)
Jacob, M., Jakubowski, M.H., Naldurg, P., Saw, C.W.N., Venkatesan, R.: The superdiversifier: peephole individualization for software protection. In: Matsuura, K., Fujisaki, E. (eds.) IWSEC 2008. LNCS, vol. 5312, pp. 100–120. Springer, Heidelberg (2008)
Jang, J., Woo, M., Brumley, D.: Towards automatic software lineage inference. In: Presented as part of the 22nd USENIX Security Symposium, USENIX Security 2013 (2013)
Kolter, J.Z., Maloof, M.A.: Learning to detect malicious executables in the wild. In: Proceedings of the 10th ACM SIGKDD Conference, KDD 2004 (2004)
Lindorfer, M., Di Federico, A., Maggi, F., Comparetti, P.M., Zanero, S.: Lines of malicious code: insights into the malicious software industry. In: Proceedings of the 28th Annual Computer Security Applications Conference, ACSAC 2012 (2012)
Liu, L., Ming, J., Wang, Z., Gao, D., Jia, C.: Denial-of-service attacks on host-based generic unpackers. In: Qing, S., Mitchell, C.J., Wang, G. (eds.) ICICS 2009. LNCS, vol. 5927, pp. 241–253. Springer, Heidelberg (2009)
Luo, L., Ming, J., Wu, D., Liu, P., Zhu, S.: Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014 (2014)
Ming, J., Pan, M., Gao, D.: iBinHunt: binary hunting with inter-procedural control flow. In: Proceedings of the 15th Annual International Conference on Information Security and Cryptology, ICISC 2012 (2012)
Ng, B.H., Hu, X., Prakash, A.: A study on latent vulnerabilities. In: Proceedings of the 29th IEEE Symposium on Reliable Distributed Systems, SRDS 2010 (2010)
Ng, B.H., Prakash, A.: Exposé: discovering potential binary code re-use. In: Proceedings of the 37th IEEE Annual Computer Software and Applications Conference, COMPSAC 2013 (2013)
Panda Security. Annual report 2013 summary. http://press.pandasecurity.com/wp-content/uploads/2010/05/PandaLabs-Annual-Report_2013.pdf (last reviewed February 12, 2014)
Roundy, K.A., Miller, B.P.: Binary-code obfuscations in prevalent packer tools. ACM Computing Surveys 46(1) (2013)
Sikorski, M., Honig, A.: Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press (February 2012)
Yang, G., Păsăreanu, C.S., Khurshid, S.: Memoized symbolic execution. In: Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012 (2012)
Yin, H., Song, D.: TEMU: binary code analysis via whole-system layered annotative execution. Technical Report UCB/EECS-2010-3, EECS Department, University of California, Berkeley (January 2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 IFIP International Federation for Information Processing
About this paper
Cite this paper
Ming, J., Xu, D., Wu, D. (2015). Memoized Semantics-Based Binary Diffing with Application to Malware Lineage Inference. In: Federrath, H., Gollmann, D. (eds) ICT Systems Security and Privacy Protection. SEC 2015. IFIP Advances in Information and Communication Technology, vol 455. Springer, Cham. https://doi.org/10.1007/978-3-319-18467-8_28
Download citation
DOI: https://doi.org/10.1007/978-3-319-18467-8_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18466-1
Online ISBN: 978-3-319-18467-8
eBook Packages: Computer ScienceComputer Science (R0)