research-article

BinAIV: Semantic-enhanced vulnerability detection for Linux x86 binaries

Authors:

Fei Kang

Volume 135, Issue C

https://doi.org/10.1016/j.cose.2023.103508

Published: 01 December 2023

Abstract

Binary code vulnerability detection is an important research direction in network security. Extensive reuse of open-source code has spread vulnerabilities that originally affected only a small number of targets to other software. Existing vulnerability detection methods are mainly based on binary code similarity analysis: they detect vulnerabilities by comparing the similarity of code embeddings. However, these methods lack a semantic understanding of binary code and cannot distinguish between different functions with similar code structures, which reduces detection accuracy. This paper proposes BinAIV, a binary vulnerability detection method based on function semantics. BinAIV is built on a neural network model and defines and constructs binary function semantics to achieve more accurate similarity analysis. Experimental results show that BinAIV significantly improves binary code similarity analysis performance over traditional methods that use only function embeddings. In cross-compiler, cross-optimization, and cross-obfuscation function search experiments, BinAIV's average Recall@1 exceeded that of the best-performing baseline methods by 40.1%, 99.8%, and 184.0%, respectively. In the real-world vulnerability detection experiment, BinAIV achieved the highest detection accuracy for all vulnerabilities, an improvement of 155.1% and 97.7% over Asm2Vec and SAFE, respectively.
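The Recall@1 metric used in the function search experiments can be illustrated with a minimal sketch. It is not BinAIV's implementation: the embedding vectors, the function names, and the helper names `cosine_similarity` and `recall_at_1` are all hypothetical, and cosine similarity is assumed as the comparison measure because it is a common choice for embedding-based search. Each query function is matched against a pool, and a query counts as a hit only if its top-ranked pool entry is the true counterpart.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_1(
    queries: Dict[str, Vector],
    pool: Dict[str, Vector],
    ground_truth: Dict[str, str],
) -> float:
    """Fraction of queries whose most similar pool entry is the true match."""
    if not queries:
        return 0.0
    hits = 0
    for query_name, query_vec in queries.items():
        # Rank the pool by similarity and keep only the top-1 candidate.
        best = max(pool, key=lambda name: cosine_similarity(query_vec, pool[name]))
        if best == ground_truth[query_name]:
            hits += 1
    return hits / len(queries)


# Hypothetical toy embeddings: the same three functions built at two
# optimization levels (names and numbers are made up for illustration).
pool = {
    "memcpy_O2": [0.9, 0.1, 0.0],
    "strlen_O2": [0.1, 0.9, 0.1],
    "parse_hdr_O2": [0.0, 0.2, 0.9],
}
queries = {
    "memcpy_O0": [0.8, 0.2, 0.1],
    "strlen_O0": [0.2, 0.7, 0.3],
    # This embedding lies closer to strlen_O2 than to its true match,
    # mimicking two different functions with similar code structure.
    "parse_hdr_O0": [0.3, 0.8, 0.4],
}
ground_truth = {
    "memcpy_O0": "memcpy_O2",
    "strlen_O0": "strlen_O2",
    "parse_hdr_O0": "parse_hdr_O2",
}

score = recall_at_1(queries, pool, ground_truth)
print(f"Recall@1 = {score:.3f}")
```

In this toy setup two of the three queries retrieve their true counterpart first, so Recall@1 is 2/3. The third query shows the failure mode the abstract describes, where a structurally similar but different function outranks the correct one.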

References

[1]

S. Ahn, S. Ahn, H. Koo, Y. Paek, Practical binary code similarity detection with BERT-based transferable similarity learning, in: Proceedings of the 38th Annual Computer Security Applications Conference, ACSAC, 2022, pp. 361–374.

[2]

T. Ben-Nun, A.S. Jakobovits, T. Hoefler, Neural Code Comprehension: a learnable representation of code semantics, in: Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, 2018, pp. 3589–3601.

[3]

S. Cesare, Y. Xiang, Malware variant detection using similarity search over sets of control flow graphs, in: Proceedings of the IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom, 2011, pp. 181–189.

[4]

S. Cesare, Y. Xiang, W. Zhou, Control flow-based malware variant detection, IEEE Trans. Dependable Secure Comput. 11 (2014) 307–317.

[5]

L.F. Costa, Further generalizations of the Jaccard Index, arXiv preprint arXiv:2110.09619 (2021).

[6]

J. Devlin, M.W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, 2019, pp. 4171–4186.

[7]

S.H.H. Ding, B.C.M. Fung, P. Charland, Kam1n0: mapreduce-based assembly clone search for reverse engineering, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 461–470.

[8]

S.H.H. Ding, B.C.M. Fung, P. Charland, Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization, in: Proceedings of the IEEE Symposium on Security and Privacy, SP, 2019, pp. 472–489.

[9]

Li Dong, N. Yang, W. Wang, F. Wei, X. Liu, Yu Wang, J. Gao, M. Zhou, H.W. Hon, Unified language model pre-training for natural language understanding and generation, in: Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems, NeurIPS, 2019, pp. 13042–13054.

[10]

Y. Duan, X. Li, J. Wang, H. Yin, DeepBinDiff: learning program-wide code representations for binary diffing, in: Proceedings of the 27th Annual Network and Distributed System Security Symposium, NDSS, 2020.

[11]

S. Eschweiler, K. Yakdan, E. Gerhards-Padilla, discovRE: efficient cross-architecture identification of bugs in binary code, in: Proceedings of the 23rd Annual Network and Distributed System Security Symposium, NDSS, 2016.

[12]

Q. Feng, R. Zhou, C. Xu, Y. Cheng, B. Testa, H. Yin, Scalable graph-based bug search for firmware images, in: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS, 2016, pp. 480–491.

[13]

J. Guo, B.O. Zhao, H. Liu, D. Leng, Y. An, G. Shu, DeepDual-SD: deep dual attribute-aware embedding for binary code similarity detection, Int. J. Comput. Intell. Syst. 16 (1) (2023) 35.

[14]

Hex-Rays, IDA Pro disassembler and debugger, https://hex-rays.com/ida-pro/, 2023.

[15]

Y. Hu, Y. Zhang, J. Li, D. Gu, Binary code clone detection across architectures and compiling configurations, in: Proceedings of the 25th International Conference on Program Comprehension, ICPC, 2017, pp. 88–98.

[16]

Y. Ji, L. Cui, H.H. Huang, BugGraph: differentiating source-binary code similarity with graph triplet-loss network, in: Proceedings of the ACM Asia Conference on Computer and Communications Security, ASIA CCS, Virtual Event, 2021, pp. 702–715.

[17]

X. Li, Yu Qu, H. Yin, PalmTree: learning an assembly language model for instruction embedding, in: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, 2021, pp. 3236–3251.

[18]

B. Liu, W. Huo, C. Zhang, W. Li, F. Li, A. Piao, W. Zou, αDiff: cross-version binary code similarity detection with DNN, in: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE, 2018, pp. 667–678.

[19]

L. Massarelli, G. Antonio Di Luna, F. Petroni, L. Querzoni, R. Baldoni, SAFE: self-attentive function embeddings for binary similarity, in: Proceedings of the Detection of Intrusions and Malware, and Vulnerability Assessment - 16th International Conference, DIMVA, 2019, pp. 309–329. Volume 11543 of Lecture Notes in Computer Science.

[20]

T. Mikolov, K. Chen, G.S. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Proceedings of the 1st International Conference on Learning Representations, ICLR, 2013.

[21]

K. O'Shea, R. Nash, An introduction to convolutional neural networks, arXiv preprint arXiv:1511.08458 (2015).

[22]

J. Pewny, B. Garmany, R. Gawlik, C. Rossow, T. Holz, Cross-architecture bug search in binary executables, in: Proceedings of the IEEE Symposium on Security and Privacy, SP, 2015, pp. 709–724.

[23]

A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, Improving language understanding by generative pre-training, OpenAI Tech. Rep. (2018).

[24]

SecretPatch, Vulnerabilities dataset, https://github.com/SecretPatch/Dataset, 2023.

[25]

J. Su, SimBERT: integrating retrieval and generation into BERT, https://github.com/ZhuiyiTechnology/simbert, 2023.

[26]

C. Tamás, D. Papp, L. Buttyán, SIMBIoTA: similarity-based malware detection on IoT devices, in: Proceedings of the 6th International Conference on Internet of Things, Big Data and Security, IoTBDS, 2021, pp. 58–69.

[27]

S. Ullah, H. Oh, BinDiffNN: learning distributed representation of assembly for robust binary diffing against semantic differences, IEEE Trans. Softw. Eng. 48 (9) (2022) 3442–3466.

[28]

X. Wang, K. Sun, A. Batcheller, S. Jajodia, Detecting “0-Day” vulnerability: an empirical study of secret security patch in OSS, in: Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN, 2019, pp. 485–492.

[29]

H. Wang, W. Qu, G. Katz, W. Zhu, Z. Gao, H. Qiu, J. Zhuge, C. Zhang, jTrans: jump-aware transformer for binary code similarity detection, in: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA, 2022, pp. 1–13.

[30]

Y. Wang, P. Jia, Xi Peng, C. Huang, J. Liu, BinVulDet: detecting vulnerability in binary program via decompiled pseudo code and BiLSTM-attention, Comput. Secur. 125 (2023).

[31]

X. Xu, C. Liu, Q. Feng, H. Yin, Le Song, D.X. Song, Neural network-based graph embedding for cross-platform binary code similarity detection, in: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS, 2017, pp. 363–376.

[32]

S.C. Yang, L. Patil, D. Dutta, Function semantic representation (FSR): a rule-based ontology for product functions, J. Comput. Inf. Sci. Eng. 10 (2010) 3.

[33]

S. Yang, L. Cheng, Y. Zeng, Z. Lang, H. Zhu, Z. Shi, Asteria: deep learning-based AST-encoding for cross-platform binary code similarity detection, in: Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN, 2021, pp. 224–236.

[34]

Z. Yu, R. Cao, Q. Tang, S. Nie, J. Huang, S. Wu, Order Matters: semantic-aware neural networks for binary code similarity detection, in: Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI, 2020, pp. 1145–1152.

[35]

F. Zuo, X. Li, Z. Zhang, P. Young, L. Luo, Q. Zeng, Neural machine translation inspired binary code similarity comparison beyond function pairs, in: Proceedings of the 26th Annual Network and Distributed System Security Symposium, NDSS, 2019.

Cited By

Z. Song, J. Xu, BinVuGAL: Binary vulnerability detection method based on graph neural network combined with assembly language model, in: Proceedings of the 2024 3rd International Conference on Cryptography, Network Security and Communication Technology, 2024, pp. 159–163. Online publication date: 19-Jan-2024. https://dl.acm.org/doi/10.1145/3673277.3673305

Index Terms

BinAIV: Semantic-enhanced vulnerability detection for Linux x86 binaries
1. Computing methodologies
2. Social and professional topics

Index terms have been assigned to the content through auto-classification.



Published In


Computers and Security Volume 135, Issue C

Dec 2023

755 pages

ISSN:0167-4048


Elsevier Ltd.

Publisher

Elsevier Advanced Technology Publications

United Kingdom
