Abstract
Function recognition is one of the most critical tasks in binary analysis and reverse engineering. However, recognizing inline functions remains challenging, mainly for two reasons. First, binaries contain no expert patterns, e.g., prologue/epilogue instructions, for inline functions. Second, instruction reordering introduced by compiler optimization makes the address space of instructions from the same inline function discontinuous. The address space of an inline function is often mingled with that of regular functions. This paper proposes FSmell, a graph-theory-based function recognition framework that specifically targets inline functions. FSmell introduces the Instruction Topology Graph (ITG) to represent the data-flow dependencies among instructions in a basic block. With the help of the ITG, the problem of distinguishing inline instructions from caller instructions is transformed into a graph connectivity problem, which is solved by computing the minimum vertex separator. We applied FSmell to 78 binaries compiled by GCC and Clang at 3 different optimization levels. Of the 205,890 inline functions in the 78 binaries, FSmell reports 76,777, with a precision of 67.5% and a recall of 39.2%. With the help of FSmell, 50% of the vulnerabilities missed by other methods are detected and located.
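The graph formulation in the abstract can be illustrated with a small sketch. It is not FSmell's implementation: the instruction encoding (a destination register plus source registers), the def-use rule used to draw edges, and the function names `build_itg` and `min_vertex_separator` are all assumptions made for illustration. The separator is computed with the standard node-splitting max-flow construction (Menger's theorem), which may differ from the algorithm the paper uses.

```python
from collections import defaultdict, deque

def build_itg(block):
    """Build an undirected data-flow graph for one basic block.

    `block` is a list of (dest_reg, [src_regs]) tuples (a hypothetical
    encoding). Instruction i is linked to instruction j when i reads a
    register whose most recent writer in the block is j.
    """
    last_def = {}
    adj = {i: set() for i in range(len(block))}
    for i, (dest, srcs) in enumerate(block):
        for reg in srcs:
            if reg in last_def:
                j = last_def[reg]
                adj[i].add(j)
                adj[j].add(i)
        if dest is not None:
            last_def[dest] = i
    return adj

def min_vertex_separator(adj, s, t):
    """Smallest vertex set (excluding s and t) whose removal disconnects s from t."""
    if t in adj[s]:
        raise ValueError("s and t are adjacent; no vertex separator exists")
    inf = len(adj) + 1          # larger than any possible flow value
    cap = defaultdict(int)      # residual capacities
    nbrs = defaultdict(set)     # residual-graph neighbours

    def add_edge(u, v, c):
        cap[(u, v)] += c
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Split every vertex v into v_in -> v_out with capacity 1, so that
    # cutting that edge corresponds to deleting v from the graph.
    for v in adj:
        add_edge((v, "in"), (v, "out"), inf if v in (s, t) else 1)
        for w in adj[v]:
            add_edge((v, "out"), (w, "in"), inf)

    source, sink = (s, "out"), (t, "in")

    def reachable_from_source():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Augment one unit of flow along shortest paths until none remains.
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    # A vertex is in the cut when its "in" half is reachable but its "out" half is not.
    return {v for v in adj if v not in (s, t)
            and (v, "in") in parent and (v, "out") not in parent}

# Toy block: instruction 1 hands a value from caller code to an inlined body.
block = [
    ("rax", ["rdi"]),          # 0: caller
    ("rbx", ["rax"]),          # 1: boundary (argument passing)
    ("rcx", ["rbx"]),          # 2: inlined body
    ("r8",  ["rax"]),          # 3: caller, interleaved by reordering
    ("rdx", ["rbx"]),          # 4: inlined body
    ("rsi", ["rcx", "rdx"]),   # 5: inlined body
]
itg = build_itg(block)
separator = min_vertex_separator(itg, 0, 5)
```

In this toy block, removing the single boundary instruction splits the graph into a caller component (instructions 0 and 3) and an inlined component (instructions 2, 4 and 5), even though their addresses are interleaved.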
Acknowledgement
This research was supported in part by Key Laboratory of Network Assessment Technology (Chinese Academy of Science) and Beijing Key Laboratory of Network Security and Protection Technology.
Appendix
Table 4 presents information on the vulnerabilities associated with inline functions recognized by FSmell. “Caller Functions” refers to the functions that invoke the inline functions. “CVEs” denotes the CVE numbers of the vulnerabilities in these inline functions. The column labeled “found?” indicates whether FSmell successfully recognized the boundaries of the inline functions.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, W., Guo, Q., Yin, J., Zuo, X., Wang, R., Gong, X. (2024). FSmell: Recognizing Inline Function in Binary Code. In: Tsudik, G., Conti, M., Liang, K., Smaragdakis, G. (eds) Computer Security – ESORICS 2023. ESORICS 2023. Lecture Notes in Computer Science, vol 14345. Springer, Cham. https://doi.org/10.1007/978-3-031-51476-0_24
DOI: https://doi.org/10.1007/978-3-031-51476-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51475-3
Online ISBN: 978-3-031-51476-0
eBook Packages: Computer Science, Computer Science (R0)