Abstract
A probabilistic finite state machine approach to statically disassembling x86 machine language programs is presented and evaluated. Static disassembly is a crucial prerequisite for software reverse engineering and has many applications in computer security and binary analysis. The general problem is provably undecidable because of the heavy use of unaligned instruction encodings and dynamically computed control flows in the x86 architecture. Little work in machine learning and data mining has addressed this problem. This paper shows that the semantics of opcode sequences can be leveraged to infer similarities between groups of opcode and operand sequences. This enables a probabilistic finite state machine to learn statistically significant opcode and operand sequences from a training corpus of disassemblies. These similarities reflect the statistical significance of opcodes and operands in their surrounding context, facilitating more accurate disassembly of new binaries. Empirical results demonstrate that the algorithm is more efficient and effective than comparable approaches used by state-of-the-art disassembly tools.
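The abstract does not spell out the model's structure, so the following is only a deliberately simplified sketch of the underlying idea, not the authors' method. It assumes a first-order Markov chain over opcode mnemonics alone (operands are ignored), uses hypothetical function names `train_pfsm` and `log_prob`, and trains on a made-up toy corpus. Transition statistics learned from correct disassemblies should make a plausible decoding score higher than the unusual opcode runs that a misaligned decoding of the same bytes often produces.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start state marking the beginning of a sequence


def train_pfsm(corpus):
    """Count first-order transitions in a corpus of opcode sequences.

    Each state is the previous opcode; each transition emits the next opcode.
    This is a simplified stand-in, not the model described in the paper.
    """
    transitions = defaultdict(Counter)
    vocabulary = set()
    for sequence in corpus:
        previous = START
        for opcode in sequence:
            transitions[previous][opcode] += 1
            vocabulary.add(opcode)
            previous = opcode
    return transitions, vocabulary


def log_prob(sequence, transitions, vocabulary):
    """Log-probability of a sequence under add-one (Laplace) smoothing.

    Smoothing keeps a single unseen transition from zeroing out a whole path.
    """
    size = len(vocabulary) + 1  # +1 reserves probability mass for unseen opcodes
    previous = START
    total = 0.0
    for opcode in sequence:
        row = transitions.get(previous, Counter())  # .get avoids mutating the model
        total += math.log((row[opcode] + 1) / (sum(row.values()) + size))
        previous = opcode
    return total


# Toy training corpus (illustrative only, not data from the paper).
corpus = [
    ["push", "mov", "sub", "call", "mov", "pop", "ret"],
    ["push", "mov", "call", "add", "pop", "ret"],
    ["push", "mov", "mov", "call", "leave", "ret"],
]
model = train_pfsm(corpus)

plausible = ["push", "mov", "call", "pop", "ret"]
# Stand-in for an unusual run a misaligned decoding might yield (not real output).
implausible = ["in", "das", "arpl", "ret"]
print(log_prob(plausible, *model) > log_prob(implausible, *model))  # True
```

In a disassembler built on this idea, each candidate decoding path through the bytes would be scored this way and the higher-probability path kept; the smoothing choice is an assumption made here so that rare but valid opcode pairs are penalized rather than ruled out.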
The research reported herein was supported in part by AFOSR awards FA9550-12-1-0082 & FA9550-10-1-0088, NIH awards 1R0-1LM009989 & 1R01HG006844, NSF awards #1054629, Career-CNS-0845803, CNS-0964350, CNS-1016343, CNS-1111529, & CNS-1228198, ARO award W911NF-12-1-0558, and ONR award N00014-14-1-0030.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wartell, R., Zhou, Y., Hamlen, K.W., Kantarcioglu, M. (2014). Shingled Graph Disassembly: Finding the Undecideable Path. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8443. Springer, Cham. https://doi.org/10.1007/978-3-319-06608-0_23
DOI: https://doi.org/10.1007/978-3-319-06608-0_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06607-3
Online ISBN: 978-3-319-06608-0
eBook Packages: Computer Science, Computer Science (R0)