A Survey of Automatic Protocol Reverse Engineering Tools

Authors:

Sandeep K. Shukla,

T. Charles Clancy

ACM Computing Surveys (CSUR), Volume 48, Issue 3

Article No.: 40, Pages 1 - 26

https://doi.org/10.1145/2840724

Published: 09 December 2015

Abstract

Computer network protocols define the rules by which two entities communicate over a network of unique hosts. Many protocol specifications are unknown, unavailable, or minimally documented, which prevents thorough analysis of the protocol for security purposes. For example, modern botnets often use undocumented and unique application-layer communication protocols to maintain command and control over numerous distributed hosts. Inferring the specification of closed protocols has numerous advantages, such as intelligent deep packet inspection, enhanced intrusion detection system algorithms for communications, and integration with legacy software packages. The multitude of closed protocols, coupled with existing time-intensive reverse engineering methodologies, has spawned investigation into automated approaches for reverse engineering of closed protocols. This article summarizes and organizes previously presented automatic protocol reverse engineering tools by approach. Approaches that focus on reverse engineering the finite state machine of a target protocol are separated from those that focus on reverse engineering the protocol format.

Index Terms

1. Computing methodologies
  1. Machine learning
2. Networks
  1. Network protocols
    1. Application layer protocols

Published In

ACM Computing Surveys Volume 48, Issue 3

February 2016

619 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/2856149

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering/University of Florida/Gainesville

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 December 2015

Accepted: 01 September 2015

Revised: 01 July 2015

Received: 01 August 2014

Published in CSUR Volume 48, Issue 3

Qualifiers

Survey
Research
Refereed
