Abstract
We present a reverse-engineering tool, called Lego, which recovers class hierarchies and composition relationships from stripped binaries. Lego takes a stripped binary as input, and uses information obtained from dynamic analysis to (i) group the functions in the binary into classes, and (ii) identify inheritance and composition relationships between the inferred classes. The software artifacts recovered by Lego can be subsequently used to understand the object-oriented design of software systems that lack documentation and source code, e.g., to enable interoperability. Our experiments show that the class hierarchies recovered by Lego have a high degree of agreement—measured in terms of precision and recall—with the hierarchy defined in the source code.
Supported, in part, by NSF under grants CCF- {0810053, 0904371}; by ONR under grants N00014- 09-1-0510, 11-C-0447; by ARL under grant W911NF-09-1-0413; by AFRL under grants FA9550-09-1-0279 and FA8650-10-C-7088; and by DARPA under cooperative agreement HR0011-12-2-0012. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors, and do not necessarily reflect the views of the sponsoring agencies. T. Reps has an ownership interest in GrammaTech, Inc., which has licensed elements of the technology reported in this publication.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Balakrishnan, G., Reps, T.: DIVINE: DIscovering Variables IN Executables. In: VMCAI 2007. LNCS, vol. 4349, pp. 1–28. Springer, Heidelberg (2007)
Balakrishnan, G., Reps, T.: WYSINWYX: What You See Is Not What You eXecute. TOPLAS 32(6) (2010)
Bardin, S., Herrmann, P., Védrine, F.: Refinement-based CFG reconstruction from unstructured programs. In: Jhala, R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 54–69. Springer, Heidelberg (2011)
Bojic, D., Velasevic, D.: A Use-case driven method of architecture recovery for program understanding and reuse reengineering. In: CSMR (2000)
Cho, C.Y., Babić, D., Poosankam, P., Chen, K.Z., Wu, E.X., Song, D.: MACE: Model inference assisted concolic exploration for protocol and vulnerability discovery. In: USENIX Sec. Symp. (2011)
Cozzie, A., Stratton, F., Xue, H., King, S.T.: Digging for data structures. In: OSDI (2008)
D Programming Language, http://dlang.org
DMCA §1201. Circumvention of Copyright Protection Systems, www.copyright.gov/title17/92chap12.html#1201
Driscoll, E., Burton, A., Reps, T.: Checking compatibility of a producer and a consumer. In: FSE (2011)
ElWazeer, K., Anand, K., Kotha, A., Smithson, M., Barua, R.: Scalable variable and data type detection in a binary rewriter. In: PLDI (2013)
Fokin, A., Derevenetc, E., Chernov, A., Troshina, K.: SmartDec: Approaching C++ decompilation. In: WCRE (2011)
Freecode, www.freecode.com
GNU Software Repository, www.gnu.org/software/software.html
Itanium C++ ABI, refspecs.linux-foundation.org/cxxabi-1.83.html
Jacobson, E.R., Rosenblum, N., Miller, B.P.: Labeling library functions in stripped binaries. In: PASTE (2011)
Lee, J., Avgerinos, T., Brumley, D.: TIE: Principled reverse engineering of types in binary programs. In: NDSS (2011)
Lim, J., Reps, T., Liblit, B.: Extracting output formats from executables. In: WCRE (2006)
Lin, Z., Zhang, X., Xu, D.: Automatic reverse engineering of data structures from binary execution. In: NDSS (2010)
Lindig, C., Snelting, G.: Assessing modular structure of legacy code based on mathematical concept analysis. In: ICSE (1997)
Luk, C.-K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: Building customized program analysis tools with dynamic instrumentation. In: PLDI (2005)
Roundy, K.A., Miller, B.P.: Binary-code obfuscations in prevalent packer tools. ACM Computing Surveys 46(1) (2013)
Schwartz, E.J., Lee, J., Woo, M., Brumley, D.: Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring. In: USENIX Sec. Symp. (2013)
Siff, M., Reps, T.: Identifying modules via concept analysis. TSE 25(6) (1999)
Slowinska, A., Stancescu, T., Bos, H.: Howard: A Dynamic excavator for reverse engineering data structures. In: NDSS (2011)
SourceForge, http://sourceforge.net
Tonella, P.: Concept analysis for module restructuring. TSE 27(4) (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Srinivasan, V., Reps, T. (2014). Recovery of Class Hierarchies and Composition Relationships from Machine Code. In: Cohen, A. (eds) Compiler Construction. CC 2014. Lecture Notes in Computer Science, vol 8409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54807-9_4
Download citation
DOI: https://doi.org/10.1007/978-3-642-54807-9_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54806-2
Online ISBN: 978-3-642-54807-9
eBook Packages: Computer ScienceComputer Science (R0)