research-article

BinAIV: Semantic-enhanced vulnerability detection for Linux x86 binaries

Published: 01 December 2023

Abstract

Binary code vulnerability detection is an important research direction in network security. Because open-source code is widely reused, vulnerabilities that originally affected only a small number of targets have spread to other software that incorporates the same code. Existing vulnerability detection methods mainly rely on binary code similarity analysis, that is, they detect vulnerabilities by comparing the similarity of code embeddings. However, these methods lack a semantic understanding of binary code and cannot distinguish between different functions with similar code structures, which reduces detection accuracy. This paper proposes BinAIV, a binary vulnerability detection method based on function semantics. BinAIV uses a neural network model to define and construct binary function semantics, enabling more accurate similarity analysis. Experimental results show that BinAIV significantly improves binary code similarity analysis over traditional methods that use only function embeddings. In the cross-compiler, cross-optimization, and cross-obfuscation function search experiments, BinAIV's average Recall@1 exceeds that of the best-performing baseline methods by 40.1%, 99.8%, and 184.0%, respectively. In the real-world vulnerability detection experiment, BinAIV achieved the highest detection accuracy for all vulnerabilities, improving on Asm2Vec and SAFE by 155.1% and 97.7%, respectively.
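The abstract frames vulnerability detection as a similarity search over function embeddings and reports results as Recall@1 and relative improvements. The Python sketch below illustrates only that evaluation setting, under stated assumptions: cosine similarity is assumed as the comparison metric, the three-dimensional vectors stand in for learned embeddings, all names (`recall_at_k`, `rank_candidates`, the `*_O0`/`*_O3` functions) are hypothetical, and the reported percentages are assumed to be relative gains over the baseline's score. It is not BinAIV's model or its semantic features, which this page does not describe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(query, pool):
    """Candidate ids sorted from most to least similar to the query."""
    return sorted(pool, key=lambda cid: cosine(query, pool[cid]), reverse=True)


def recall_at_k(queries, pool, ground_truth, k=1):
    """Fraction of queries whose true match appears among the top-k candidates."""
    hits = sum(
        1
        for qid, qvec in queries.items()
        if ground_truth[qid] in rank_candidates(qvec, pool)[:k]
    )
    return hits / len(queries)


def relative_improvement(new, baseline):
    """Relative gain in percent: (new - baseline) / baseline * 100."""
    return (new - baseline) / baseline * 100.0


# Hypothetical toy data: the pool holds functions built at -O0, the queries
# are the same functions built at -O3. "strcpy_O3" is deliberately placed
# nearer to "memcpy_O0" to mimic two functions with similar structure but
# different behavior.
pool = {
    "memcpy_O0": [0.9, 0.1, 0.0],
    "strcpy_O0": [0.1, 0.9, 0.1],
    "parse_O0": [0.0, 0.2, 0.9],
}
queries = {
    "memcpy_O3": [0.8, 0.2, 0.1],
    "strcpy_O3": [0.7, 0.6, 0.0],
    "parse_O3": [0.1, 0.1, 0.8],
}
truth = {"memcpy_O3": "memcpy_O0", "strcpy_O3": "strcpy_O0", "parse_O3": "parse_O0"}

if __name__ == "__main__":
    print("Recall@1:", recall_at_k(queries, pool, truth, k=1))  # 2 of 3 queries hit
    print("Recall@2:", recall_at_k(queries, pool, truth, k=2))  # all 3 queries hit
    print("Gain of 0.5 over 0.25:", relative_improvement(0.5, 0.25), "%")
```

Recall@1 counts a query as a hit only if its true counterpart is ranked first. In this toy data `strcpy_O3` is ranked nearest to `memcpy_O0`, which mimics the failure mode the abstract attributes to embedding-only methods: different functions with similar code structures end up close together.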

References

[1]
S. Ahn, S. Ahn, H. Koo, Y. Paek, Practical binary code similarity detection with BERT-based transferable similarity learning, in: Proceedings of the 38th Annual Computer Security Applications Conference, ACSAC, 2022, pp. 361–374.
[2]
T. Ben-Nun, A.S. Jakobovits, T. Hoefler, Neural Code Comprehension: a learnable representation of code semantics, in: Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, 2018, pp. 3589–3601.
[3]
S. Cesare, Y. Xiang, Malware variant detection using similarity search over sets of control flow graphs, in: Proceedings of the IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom, 2011, pp. 181–189.
[4]
S. Cesare, Y. Xiang, W. Zhou, Control flow-based malware variant detection, IEEE Trans. Dependable Secure Comput. 11 (2014) 307–317.
[5]
L.F. Costa, Further generalizations of the Jaccard index, arXiv preprint arXiv:2110.09619, 2021.
[6]
J. Devlin, M.W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, 2019, pp. 4171–4186.
[7]
S.H.H. Ding, B.C.M. Fung, P. Charland, Kam1n0: MapReduce-based assembly clone search for reverse engineering, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 461–470.
[8]
S.H.H. Ding, B.C.M. Fung, P. Charland, Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization, in: Proceedings of the IEEE Symposium on Security and Privacy, SP, 2019, pp. 472–489.
[9]
L. Dong, N. Yang, W. Wang, F. Wei, X. Liu, Y. Wang, J. Gao, M. Zhou, H.W. Hon, Unified language model pre-training for natural language understanding and generation, in: Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems, NeurIPS, 2019, pp. 13042–13054.
[10]
Y. Duan, X. Li, J. Wang, H. Yin, DeepBinDiff: learning program-wide code representations for binary diffing, in: Proceedings of the 27th Annual Network and Distributed System Security Symposium, NDSS, 2020.
[11]
S. Eschweiler, K. Yakdan, E. Gerhards-Padilla, discovRE: efficient cross-architecture identification of bugs in binary code, in: Proceedings of the 23rd Annual Network and Distributed System Security Symposium, NDSS, 2016.
[12]
Q. Feng, R. Zhou, C. Xu, Y. Cheng, B. Testa, H. Yin, Scalable graph-based bug search for firmware images, in: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS, 2016, pp. 480–491.
[13]
J. Guo, B.O. Zhao, H. Liu, D. Leng, Y. An, G. Shu, DeepDual-SD: deep dual attribute-aware embedding for binary code similarity detection, Int. J. Comput. Intell. Syst. 16 (1) (2023) 35.
[14]
Hex-Rays, IDA Pro disassembler and debugger, 2023. https://hex-rays.com/ida-pro/.
[15]
Y. Hu, Y. Zhang, J. Li, D. Gu, Binary code clone detection across architectures and compiling configurations, in: Proceedings of the 25th International Conference on Program Comprehension, ICPC, 2017, pp. 88–98.
[16]
Y. Ji, L. Cui, H.H. Huang, BugGraph: differentiating source-binary code similarity with graph triplet-loss network, in: Proceedings of the ACM Asia Conference on Computer and Communications Security, ASIA CCS, Virtual Event, 2021, pp. 702–715.
[17]
X. Li, Y. Qu, H. Yin, PalmTree: learning an assembly language model for instruction embedding, in: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, 2021, pp. 3236–3251.
[18]
B. Liu, W. Huo, C. Zhang, W. Li, F. Li, A. Piao, W. Zou, αDiff: cross-version binary code similarity detection with DNN, in: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE, 2018, pp. 667–678.
[19]
L. Massarelli, G.A. Di Luna, F. Petroni, L. Querzoni, R. Baldoni, SAFE: self-attentive function embeddings for binary similarity, in: Proceedings of the 16th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA, Lecture Notes in Computer Science, vol. 11543, 2019, pp. 309–329.
[20]
T. Mikolov, K. Chen, G.S. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Proceedings of the 1st International Conference on Learning Representations, ICLR, 2013.
[21]
K. O'Shea, R. Nash, An introduction to convolutional neural networks, arXiv preprint arXiv:1511.08458, 2015.
[22]
J. Pewny, B. Garmany, R. Gawlik, C. Rossow, T. Holz, Cross-architecture bug search in binary executables, in: Proceedings of the IEEE Symposium on Security and Privacy, SP, 2015, pp. 709–724.
[23]
A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding by generative pre-training, OpenAI Tech. Rep. (2018).
[24]
SecretPatch, Vulnerabilities dataset, 2023. https://github.com/SecretPatch/Dataset.
[25]
J. Su, SimBERT: integrating retrieval and generation into BERT, 2023. https://github.com/ZhuiyiTechnology/simbert.
[26]
C. Tamás, D. Papp, L. Buttyán, SIMBIoTA: similarity-based malware detection on IoT devices, in: Proceedings of the 6th International Conference on Internet of Things, Big Data and Security, IoTBDS, 2021, pp. 58–69.
[27]
S. Ullah, H. Oh, BinDiffNN: learning distributed representation of assembly for robust binary diffing against semantic differences, IEEE Trans. Softw. Eng. 48 (9) (2022) 3442–3466.
[28]
X. Wang, K. Sun, A. Batcheller, S. Jajodia, Detecting “0-Day” vulnerability: an empirical study of secret security patch in OSS, in: Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN, 2019, pp. 485–492.
[29]
H. Wang, W. Qu, G. Katz, W. Zhu, Z. Gao, H. Qiu, J. Zhuge, C. Zhang, jTrans: jump-aware transformer for binary code similarity detection, in: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA, 2022, pp. 1–13.
[30]
Y. Wang, P. Jia, X. Peng, C. Huang, J. Liu, BinVulDet: detecting vulnerability in binary program via decompiled pseudo code and BiLSTM-attention, Comput. Secur. 125 (2023).
[31]
X. Xu, C. Liu, Q. Feng, H. Yin, L. Song, D.X. Song, Neural network-based graph embedding for cross-platform binary code similarity detection, in: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS, 2017, pp. 363–376.
[32]
S.C. Yang, L. Patil, D. Dutta, Function semantic representation (FSR): a rule-based ontology for product functions, J. Comput. Inf. Sci. Eng. 10 (2010) 3.
[33]
S. Yang, L. Cheng, Y. Zeng, Z. Lang, H. Zhu, Z. Shi, Asteria: deep learning-based AST-encoding for cross-platform binary code similarity detection, in: Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN, 2021, pp. 224–236.
[34]
Z. Yu, R. Cao, Q. Tang, S. Nie, J. Huang, S. Wu, Order Matters: semantic-aware neural networks for binary code similarity detection, in: Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI, 2020, pp. 1145–1152.
[35]
F. Zuo, X. Li, Z. Zhang, P. Young, L. Luo, Q. Zeng, Neural machine translation inspired binary code similarity comparison beyond function pairs, in: Proceedings of the 26th Annual Network and Distributed System Security Symposium, NDSS, 2019.

Cited By

  • (2024) BinVuGAL: Binary vulnerability detection method based on graph neural network combined with assembly language model. Proceedings of the 2024 3rd International Conference on Cryptography, Network Security and Communication Technology, pp. 159–163. DOI: 10.1145/3673277.3673305. Online publication date: 19-Jan-2024.

      Information

      Published In

      Computers and Security, Volume 135, Issue C
      Dec 2023
      755 pages

      Publisher

      Elsevier Advanced Technology Publications

      United Kingdom

      Author Tags

      1. Function semantic
      2. Vulnerability detection
      3. Code similarity
      4. Binary code
      5. Deep learning
