FSmell: Recognizing Inline Function in Binary Code

  • Conference paper
Computer Security – ESORICS 2023 (ESORICS 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14345))

Abstract

Function recognition is one of the most critical tasks in binary analysis and reverse engineering. However, recognizing inline functions remains challenging, mainly for two reasons. First, binaries contain no expert patterns, e.g., prologue/epilogue instructions, for inline functions. Second, instruction reordering introduced by compiler optimization makes the addresses of instructions from the same inline function non-contiguous, so the address space of an inline function is often mingled with that of regular functions. This paper proposes FSmell, a graph-theory-based function recognition framework that specifically targets inline functions. FSmell introduces the Instruction Topology Graph (ITG) to represent the data-flow dependencies among the instructions in a basic block. With the help of the ITG, the problem of distinguishing inline instructions from caller instructions is transformed into a graph connectivity problem, which is solved by computing the minimum vertex separator. We have applied FSmell to 78 binaries compiled by GCC and Clang at 3 different optimization levels. Of the 205,890 inline functions in the 78 binaries, FSmell reports 76,777, with a precision of 67.5% and a recall of 39.2%. With the help of FSmell, 50% of the vulnerabilities missed by other methods are detected and located.
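
The reduction described in the abstract can be illustrated with a small sketch. The code below is not FSmell's implementation. It assumes, for illustration only, that the graph links each instruction to the most recent instruction in the block that defines a register it reads, and it computes a minimum vertex separator with a standard node-splitting max-flow construction. The names `build_itg`, `min_vertex_separator`, and `BLOCK`, as well as the toy instruction sequence, are hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")

# Hypothetical basic block: one (defined registers, used registers) pair per instruction.
BLOCK = [
    ({"rax"}, {"rdi"}),         # 0: mov rax, [rdi]
    ({"rcx"}, {"rax"}),         # 1: mov rcx, rax
    ({"rax"}, {"rax"}),         # 2: add rax, 1
    ({"rdx"}, {"rax", "rcx"}),  # 3: lea rdx, [rax+rcx]
    ({"r8"}, {"rdx"}),          # 4: mov r8, rdx
    ({"r9"}, {"rdx"}),          # 5: mov r9, rdx
    ({"r8"}, {"r8", "r9"}),     # 6: add r8, r9
]


def build_itg(instrs):
    """Return an undirected def-use graph {index: set of neighbour indices}.

    Assumption: an edge joins instruction i to the latest earlier
    instruction that defined a register read by i.
    """
    adj = {i: set() for i in range(len(instrs))}
    last_def = {}
    for i, (defs, uses) in enumerate(instrs):
        for reg in uses:  # reads are resolved before this instruction's writes
            if reg in last_def:
                adj[i].add(last_def[reg])
                adj[last_def[reg]].add(i)
        for reg in defs:
            last_def[reg] = i
    return adj


def min_vertex_separator(adj, s, t):
    """Smallest vertex set whose removal disconnects s from t."""
    if t in adj[s]:
        raise ValueError("adjacent vertices cannot be separated")
    # Split every vertex v into (v, 'in') -> (v, 'out') with capacity 1,
    # so that a unit of flow corresponds to one vertex being used.
    cap = defaultdict(lambda: defaultdict(int))
    for v in adj:
        cap[(v, "in")][(v, "out")] = INF if v in (s, t) else 1
        for w in adj[v]:
            cap[(v, "out")][(w, "in")] = INF
    src, dst = (s, "in"), (t, "out")

    def residual_bfs():
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = residual_bfs()
        if dst not in parent:
            break
        path, v = [], dst
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= flow
            cap[b][a] += flow
    # Saturated split edges on the boundary of the reachable set form the cut.
    return {v for v in adj if (v, "in") in parent and (v, "out") not in parent}
```

In this toy block, instruction 3 is the only data-flow link between instructions 0 to 2 and instructions 4 to 6, so `min_vertex_separator(build_itg(BLOCK), 0, 6)` returns `{3}`. How FSmell chooses the two instruction groups to separate, and how it maps a separator back to an inline-function boundary, is not stated in the abstract and is not modelled here.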

Acknowledgement

This research was supported in part by Key Laboratory of Network Assessment Technology (Chinese Academy of Science) and Beijing Key Laboratory of Network Security and Protection Technology.

Author information

Corresponding author

Correspondence to Qingli Guo.

Appendix

Table 4 presents the vulnerabilities associated with inline functions recognized by FSmell. “Caller Functions” are the functions that invoke the inline functions. “CVEs” gives the CVE numbers of the vulnerabilities in these inline functions. The column labeled “found?” indicates whether FSmell successfully recognized the boundaries of the inline functions.

Table 4. Vulnerability distribution. Inline functions are the functions found by FSmell.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lin, W., Guo, Q., Yin, J., Zuo, X., Wang, R., Gong, X. (2024). FSmell: Recognizing Inline Function in Binary Code. In: Tsudik, G., Conti, M., Liang, K., Smaragdakis, G. (eds) Computer Security – ESORICS 2023. ESORICS 2023. Lecture Notes in Computer Science, vol 14345. Springer, Cham. https://doi.org/10.1007/978-3-031-51476-0_24

  • DOI: https://doi.org/10.1007/978-3-031-51476-0_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51475-3

  • Online ISBN: 978-3-031-51476-0

  • eBook Packages: Computer Science, Computer Science (R0)
