DOI: 10.1145/3052973.3052974
research-article

BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection

Published: 02 April 2017

Abstract

Code reuse detection is a key technique in reverse engineering. However, existing source code similarity comparison techniques are not applicable to binary code. Compilers make the problem harder still, because they can generate different assembly code and control flow structures for the same functionality. To address this problem, we present a fuzzy matching approach for comparing two functions. We first obtain an initial mapping between basic blocks by applying longest common subsequence matching at the basic block level and the execution path level. We then extend this mapping using neighborhood exploration. To make our approach applicable to large data sets, we designed an effective filtering process based on MinHashing. Based on the proposed approach, we implemented a tool named BinSequence and conducted extensive experiments with it. Our results show that, given a large assembly code repository with millions of functions, BinSequence is efficient and attains high-quality similarity ranking of assembly functions with an accuracy above 90%. We also present several practical use cases, including patch analysis, malware analysis, and bug search.
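
The two building blocks named in the abstract, longest common subsequence (LCS) comparison and MinHash-based filtering, can be illustrated with a minimal sketch. This is not the BinSequence implementation: representing a basic block as a list of instruction mnemonics, normalizing the LCS length by the longer block, the shingle size, and all function names below are assumptions made for illustration only.

```python
import hashlib
from typing import List, Sequence, Set, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def block_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """LCS length divided by the longer block's length (an assumed normalization)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def shingles(seq: Sequence[str], k: int = 2) -> Set[Tuple[str, ...]]:
    """Set of k-grams over the sequence; the feature choice is an assumption."""
    return {tuple(seq[i:i + k]) for i in range(max(len(seq) - k + 1, 1))}


def minhash_signature(items: Set[Tuple[str, ...]], num_hashes: int = 64) -> List[int]:
    """One minimum per seeded hash function; SHA-256 stands in for a hash family."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha256(f"{seed}:{item!r}".encode()).digest()[:8], "big")
            for item in items))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of agreeing positions, which estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical mnemonic sequences standing in for two basic blocks.
target = ["push", "mov", "sub", "call", "add", "ret"]
candidate = ["push", "mov", "call", "add", "pop", "ret"]

# Cheap filter first, then the more expensive sequence comparison.
coarse = estimated_jaccard(minhash_signature(shingles(target)),
                           minhash_signature(shingles(candidate)))
fine = block_similarity(target, candidate)
print(f"MinHash estimate: {coarse:.2f}, LCS similarity: {fine:.2f}")
```

In a setting like the one the abstract describes, a filter of this kind would discard candidates with a low MinHash estimate before any pairwise LCS comparison is run, since the signature comparison costs far less than the quadratic-time dynamic program.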

Published In

ASIA CCS '17: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security
April 2017
952 pages
ISBN:9781450349444
DOI:10.1145/3052973

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. binary code reuse
  2. binary code similarity comparison
  3. bug search
  4. malware analysis
  5. patch analysis

Qualifiers

  • Research-article

Funding Sources

  • NSERC - Natural Sciences and Engineering Research Council of Canada

Conference

ASIA CCS '17

Acceptance Rates

ASIA CCS '17 Paper Acceptance Rate 67 of 359 submissions, 19%;
Overall Acceptance Rate 418 of 2,322 submissions, 18%
