A survey of binary code similarity

IU Haq, J Caballero - Acm computing surveys (csur), 2021 - dl.acm.org
Acm computing surveys (csur), 2021dl.acm.org
Binary code similarityapproaches compare two or more pieces of binary code to identify their
similarities and differences. The ability to compare binary code enables many real-world
applications on scenarios where source code may not be available such as patch analysis,
bug search, and malware detection and analysis. Over the past 22 years numerous binary
code similarity approaches have been proposed, but the research area has not yet been
systematically analyzed. This article presents the first survey of binary code similarity. It …
Binary code similarityapproaches compare two or more pieces of binary code to identify their similarities and differences. The ability to compare binary code enables many real-world applications on scenarios where source code may not be available such as patch analysis, bug search, and malware detection and analysis. Over the past 22 years numerous binary code similarity approaches have been proposed, but the research area has not yet been systematically analyzed. This article presents the first survey of binary code similarity. It analyzes 70 binary code similarity approaches, which are systematized on four aspects: (1) the applications they enable, (2) their approach characteristics, (3) how the approaches are implemented, and (4) the benchmarks and methodologies used to evaluate them. In addition, the survey discusses the scope and origins of the area, its evolution over the past two decades, and the challenges that lie ahead.
ACM Digital Library