Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

An efficient multicharacter transition string-matching engine based on the aho-corasick algorithm

Published: 01 December 2013 Publication History

Abstract

A string-matching engine capable of inspecting multiple characters in parallel can multiply the throughput. However, the space required for implementing a matching engine that can process multiple characters in parallel generally grows exponentially with respect to the characters to be processed in parallel. Based on the Aho-Corasick algorithm (AC-algorithm), this work presents a novel multicharacter transition Nondeterministic Finite Automaton (NFA) approach, called multicharacter AC-NFA, to allow for the inspection of multiple characters in parallel. This approach first converts an AC-trie to an AC-NFA by allowing for the simultaneous activation of multiple states and then converts the AC-NFA to a k-character AC-NFA by an algorithm with concatenation operations and assistant transitions. Additionally, the alignment problem, which occurs while multiple characters are being inspected in parallel, is solved using assistant transitions. Moreover, a corresponding output is provided for each inspected character by introducing priority multiplexers to determine the final matching outputs during implementation of the multicharacter AC-NFA. Consequently, the number of derived k-character transitions grows linearly with respect to the number k. Furthermore, the derived multicharacter AC-NFA is implemented on FPGAs for evaluation. The resulting throughput grows approximately 14 times and the hardware cost grows about 18 times for 16-character AC-NFA implementation, as compared with that for 1-character AC-NFA implementation. The achievable throughput is 21.4Gbps for the 16-character AC-NFA implementation operating at a 167.36MHz clock.

References

[1]
Aho, A. V. and Corasick, M. J. 1975. Efficient string matching: An aid to bibliographic search. Comm. ACM. 18, 6, 333--340.
[2]
Alicherry, M., Muthuprasanna, M., and Kumar, V. 2006. High speed pattern matching for network IDS/IPS. In Proceedings of the IEEE 14th International Conference on Network Protocols. IEEE, Los Alamitos, CA, 187--196.
[3]
Altera Corporation. 2013. Nios II Performance Benchmarks, DS-N28162004-9.0 Data Sheet, July 2013, 101 Innovation Drive, San Jose, CA 95134.
[4]
Bloom, B. H. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7, 422--426.
[5]
Chen, C.-C. and Wang, S. D. 2012. A multi-character transition string matching architecture based on Aho-Corasick algorithm. Int. J. Innovative Comput. Inf. Control 8, 12, 8367--8386.
[6]
Clark, C. R. and Schimmel, D. E. 2004. Scalable pattern matching for high speed networks. In Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines. 249--257.
[7]
Dharmapurikar, S., Krishnamurthy, P., Sproull, T., and Lockwood, J. 2003. Deep packet inspection using parallel Bloom filters. In Proceedings of the 11th IEEE Symposium on High Performance Interconnects. IEEE, Los Alamitos, CA, 44--51.
[8]
Dimopoulos, V., Papaefstathiou, I., and Pnevmatikatos, D. 2007. A memory-efficient reconfigurable Aho-Corasick FSM implementation for intrusion detection systems. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (IC-SAMOS'07). IEEE, Los Alamitos, CA, 186--193.
[9]
Herath, D., Lakmali, C., and Ragel, R. 2012. Accelerating string matching for bio-computing applications on multi-core CPUs. In Proceedings of the 7th IEEE International Conference on Industrial and Information Systems (ICIIS'12). 1--6.
[10]
Hua, N., Song, H., and Lakshman, T. V. 2009. Variable-stride multi-pattern matching for scalable deep packet inspection. In Proceedings of IEEE INFOCOM 2009. IEEE, Los Alamitos, CA, 415--423.
[11]
Katashita, T., Maeda, A., Toda, K., and Yamaguchi, Y. 2006a. A method of generating highly efficient string matching circuit for intrusion detection. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL'06). IEEE, Los Alamitos, CA, 1--6.
[12]
Katashita, T., Maeda, A., Toda, K., and Yamaguchi, Y. 2006b. Highly efficient string matching circuit for IDS with FPGA. In Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'06). IEEE, Los Alamitos, CA, 285--286.
[13]
Lin, W. and Liu, B. 2008. Pipelined parallel AC-based approach for multi-string matching. In Proceedings of the IEEE 14th International Conference on Parallel and Distributed Systems. IEEE, Los Alamitos, CA, 665--672.
[14]
Pao, D., Lin, W., and Liu, B. 2010. A memory-efficient pipelined implementation of the Aho-Corasick string-matching algorithm. ACM Trans. Archit. Code Optim. 7, 1--27.
[15]
Pao, D. and Wang, X. 2012. Multi-stride string searching for high-speed content inspection. Comput. J. 55, 10, 1216--1231.
[16]
Rahmanzadeh, V. and Ghaznavi-Ghoushchi, M. B. 2009. A multi-Gb/s parallel string matching engine for intrusion detection systems. In H. Sarbazi-Azad, B. Parhami, S.-G. Miremadi, and S. Hessabi, Eds., Advances in Computer Science and Engineering. Springer, Berlin, 847--851.
[17]
Salmela, L., Tarhio, J., and Kytojöki, J. 2006. Multipattern string matching with q-grams. ACM J. Exp. Algorithmics 11, 1.1, 1--19.
[18]
Scarpazza, D. P., Villa, O., and Petrini, F. 2008a. Exact multi-pattern string matching on the Cell/B.E. Processor. In Proceedings of the 5th Conference on Computing Frontiers (CF'08). ACM, New York, 33--42.
[19]
Scarpazza, D. P., Villa, O., and Petrini, F. 2008b. High-speed string searching against large dictionaries on the Cell/B.E. Processor. In Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS'08). IEEE, Los Alamitos, CA, 1--12.
[20]
Sidhu, R. and Prasanna, V. K. 2001. Fast regular expression matching using FPGAs. In Proceedings of the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'01). IEEE, Los Alamitos, CA, 227--238.
[21]
Sugawara, Y., Inaba, M., and Hiraki, K. 2004. Over 10Gbps string matching mechanism for multi-stream packet scanning systems. In J. Becker, M. Platzner, and S. Vernalde, Eds., Field Programmable Logic and Application. Springer, Berlin, 484--493.
[22]
Tripp, G. 2006. A parallel “string matching engine” for use in high speed network intrusion detection systems. J. Comput. Virol. 2, 21--34.
[23]
Tuck, N., Sherwood, T., Calder, B., and Varghese, G. 2004. Deterministic memory-efficient string matching algorithms for intrusion detection. In Proceedings of the 23rd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM'04). IEEE, Los Alamitos, CA, 2628--2639.
[24]
Tumeo, A., Villa, O., and Chavarria-Miranda, D. G. 2012. Aho-Corasick string matching on shared and distributed-memory parallel architectures. IEEE Trans. Parallel Dist. Syst. 23, 3, 436--443.
[25]
Tumeo, A., Villa, O., and Sciuto, O. 2010. Efficient pattern matching on GPUs for intrusion detection systems. In Proceedings of the 7th ACM International Conference on Computing Frontiers. ACM, New York, 87--88.
[26]
Villa, O., Chavarria-Miranda, D., and Maschhoff, K. 2009. Input-independent, scalable and fast string matching on the Cray XMT. In Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS'09). IEEE, Los Alamitos, CA, 1--12.
[27]
Yamagaki, N., Sidhu, R., and Kamiya, S. 2008. High-speed regular expression matching engine using multi-character NFA. In Proceedings of the IEEE International Conference on Field Programmable Logic and Applications. IEEE, Los Alamitos, CA, 131--136.
[28]
Yang, Y.-H. E. and Prasanna, V. 2013. Robust and scalable string pattern matching for deep packet inspection on multi-core processors. IEEE Trans. Parallel Dist. Sys. 24, 11, 2283--2292.
[29]
Yang, Y.-H. E., Prasanna, V. K., and Jiang, C. Q. 2010. Head-body partitioned string matching for deep packet inspection with scalable and attack-resilient performance. In Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS'10). IEEE, Washington, DC, 1--11.
[30]
Zha, X. and Sahni, S. 2008. Highly compressed Aho-Corasick automata for efficient intrusion detection. In Proceedings of the IEEE Symposium on Computers and Communications (ISCC'08). IEEE, Washington, DC, 298--303.

Cited By

View all
  • (2023)An Improved Quad-Array Trie Algorithm for Website Sensitive Word DetectionProceedings of the 2023 9th International Conference on Computing and Artificial Intelligence10.1145/3594315.3594361(484-490)Online publication date: 17-Mar-2023
  • (2023)Bolt: Scalable and Cost-Efficient Multistring Pattern Matching With Programmable SwitchesIEEE/ACM Transactions on Networking10.1109/TNET.2022.320252331:2(846-861)Online publication date: Apr-2023
  • (2022)Comparative performance analysis the Aho-Corasick algorithm for developing a network detection system2022 International Conference on Information Science and Communications Technologies (ICISCT)10.1109/ICISCT55600.2022.10146815(1-7)Online publication date: 28-Sep-2022
  • Show More Cited By

Index Terms

  1. An efficient multicharacter transition string-matching engine based on the aho-corasick algorithm

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image ACM Transactions on Architecture and Code Optimization
        ACM Transactions on Architecture and Code Optimization  Volume 10, Issue 4
        December 2013
        1046 pages
        ISSN:1544-3566
        EISSN:1544-3973
        DOI:10.1145/2541228
        Issue’s Table of Contents
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 01 December 2013
        Accepted: 01 August 2013
        Revised: 01 July 2013
        Received: 01 January 2013
        Published in TACO Volume 10, Issue 4

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. String-matching
        2. deterministic and nondeterministic finite automaton
        3. intrusion detection system

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)114
        • Downloads (Last 6 weeks)16
        Reflects downloads up to 20 Nov 2024

        Other Metrics

        Citations

        Cited By

        View all
        • (2023)An Improved Quad-Array Trie Algorithm for Website Sensitive Word DetectionProceedings of the 2023 9th International Conference on Computing and Artificial Intelligence10.1145/3594315.3594361(484-490)Online publication date: 17-Mar-2023
        • (2023)Bolt: Scalable and Cost-Efficient Multistring Pattern Matching With Programmable SwitchesIEEE/ACM Transactions on Networking10.1109/TNET.2022.320252331:2(846-861)Online publication date: Apr-2023
        • (2022)Comparative performance analysis the Aho-Corasick algorithm for developing a network detection system2022 International Conference on Information Science and Communications Technologies (ICISCT)10.1109/ICISCT55600.2022.10146815(1-7)Online publication date: 28-Sep-2022
        • (2022)A holistic IoT device classification approach through spatial & temporal behaviors modellingTelecommunication Systems10.1007/s11235-021-00867-x79:4(515-528)Online publication date: 23-Jan-2022
        • (2022)Auto implementation of parallel hardware architecture for Aho-Corasick algorithmDesign Automation for Embedded Systems10.1007/s10617-021-09257-726:1(29-53)Online publication date: 1-Mar-2022
        • (2021)Application of the Aho-Corasick algorithm to create a network intrusion detection systemBulletin of TUIT: Management and Communication Technologies10.51348/tuitmct446(46-53)Online publication date: 20-Sep-2021
        • (2021)Making Multi-String Pattern Matching Scalable and Cost-Efficient with Programmable Switching ASICsIEEE INFOCOM 2021 - IEEE Conference on Computer Communications10.1109/INFOCOM42981.2021.9488796(1-10)Online publication date: 10-May-2021
        • (2018)Multi-pattern Matching Algorithm based on Variable Step SizeProceedings of the 8th International Conference on Communication and Network Security10.1145/3290480.3290482(125-128)Online publication date: 2-Nov-2018
        • (2018)Memory-Based Architecture for Multicharacter Aho–Corasick String MatchingIEEE Transactions on Very Large Scale Integration (VLSI) Systems10.1109/TVLSI.2017.275384326:1(143-154)Online publication date: Jan-2018
        • (2017)A Flexible Pattern-Matching Algorithm for Network Intrusion Detection Systems Using Multi-Core ProcessorsAlgorithms10.3390/a1002005810:2(58)Online publication date: 24-May-2017
        • Show More Cited By

        View Options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Login options

        Full Access

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media