Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Efficient string matching: an aid to bibliographic search

Published: 01 June 1975 Publication History

Abstract

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

References

[1]
Aho, A.V.Hopscroft, J,E. and Ullman, D. J., the Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
[2]
Booth, T.U Sequential Machines and Automata Theory. Wiley, New York, 1967.
[3]
Brzozowski, J.A. Derivatives of regular expressions. J. ACM 11:4 (October 1964), 481-494.
[4]
Bullen, R.H., Jr., and Millen, J.K. Microtext - the design of a microprogrammed finite state search machine for full-text retrieval. Proc. Fall Joint Computer Conference, 1972, pp. 479-488.
[5]
Fischer, M.J., and Paterson, M.S. String matching and other products. Technical Report 41, Project MAC, M.I.T., 1974.
[6]
Gimpel, J.A. A theory of discrete.patterns and their implementation in SNOBOL4. Comm. ACM 16:2 (February 1973), 91-100.
[7]
Harrison, M.C. Implementation of the substring test by hashing. Comm. ACM14:12 (December 1971), 777-779.
[8]
Johnson, W.L., Porter, J.H., Ackley, S.I., and Ross, D.T. Automatic generation of efficient lexical processors using finite state techniques. Comm. ACMll:I2 (December 1968), 805-813.
[9]
Kernighan, B.W., and Cherry, L.L. A system for typesetting mathematics. Comm. ACM18:3 (March 1975), 151-156.
[10]
Kleene, S.C. Representation of events in nerve nets. In Automata Studies, C.E. Shannon and J. McCarthy (eds.), Princeton University Press, 1956, pp. 3-40.
[11]
Knuth, D.E. Fundamental Algorithms, second edition, The Art of Computer Programming 1, Addison-Wesley, Reading, Mass., 1973.
[12]
Knuth, D.E. Sorting and Searching, The Art of Computer Prograining 3, Addison-Wesley, Reading, Mass., 1973.
[13]
Knuth, D.E., Morris, J.H., Jr., and Pratt, V.R. Fast pattern matching in strings. TR CS-74-440, Stanford University, Stanford, California, 1974.
[14]
Kohavi, Z. Switching and Finite Automata Theory. McGraw- Hill, New York, 1970.
[15]
McNaughton, R., and Yamada, H. Regular expressions and state graphs for automata. IRE Trans. Electronic Computers 9:1 (1960), 39-47.
[16]
Rabin, M.O., and Scott, D. Finite automata and their decision problems. IBM J. Research and Development 3, (1959), 114-125.
[17]
Thompson, K. Regular search expression algorithm. Comm. ACM 11:6 (June 1968), 419-422.

Cited By

View all
  • (2024)A Study on the Key Applications of MalwareInternational Journal of Advanced Research in Science, Communication and Technology10.48175/IJARSCT-19359(481-485)Online publication date: 14-Aug-2024
  • (2024)Multi-Pattern GPU Accelerated Collision-Less Rabin-Karp for NIDSInternational Journal of Distributed Systems and Technologies10.4018/IJDST.34126915:1(1-16)Online publication date: 27-Mar-2024
  • (2024)Predictive Maintenance with Linguistic Text MiningMathematics10.3390/math1207108912:7(1089)Online publication date: 4-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Communications of the ACM
Communications of the ACM  Volume 18, Issue 6
June 1975
51 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/360825
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1975
Published in CACM Volume 18, Issue 6

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bibliographic search
  2. computational complexity
  3. finite state machines
  4. information retrieval
  5. keywords and phrases
  6. string pattern matching
  7. text-editing

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1,613
  • Downloads (Last 6 weeks)141
Reflects downloads up to 24 Sep 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Study on the Key Applications of MalwareInternational Journal of Advanced Research in Science, Communication and Technology10.48175/IJARSCT-19359(481-485)Online publication date: 14-Aug-2024
  • (2024)Multi-Pattern GPU Accelerated Collision-Less Rabin-Karp for NIDSInternational Journal of Distributed Systems and Technologies10.4018/IJDST.34126915:1(1-16)Online publication date: 27-Mar-2024
  • (2024)Predictive Maintenance with Linguistic Text MiningMathematics10.3390/math1207108912:7(1089)Online publication date: 4-Apr-2024
  • (2024)Finite State Automata on Multi-Word Units for Efficient Text-MiningMathematics10.3390/math1204050612:4(506)Online publication date: 6-Feb-2024
  • (2024)Feasibility of Application Layer Header Parsing in eBPF and P42024 IFIP Networking Conference (IFIP Networking)10.23919/IFIPNetworking62109.2024.10619855(475-481)Online publication date: 3-Jun-2024
  • (2024)CRISPR-TE: a web-based tool to generate single guide RNAs targeting transposable elementsMobile DNA10.1186/s13100-024-00313-015:1Online publication date: 1-Feb-2024
  • (2024)A Large-Scale Empirical Study of Open Source License Usage: Practices and ChallengesProceedings of the 21st International Conference on Mining Software Repositories10.1145/3643991.3644900(595-606)Online publication date: 15-Apr-2024
  • (2024)OptiClass: An Optimized Classifier for Application Layer Protocols Using Bit Level SignaturesACM Transactions on Privacy and Security10.1145/363377727:1(1-23)Online publication date: 10-Jan-2024
  • (2024)Efficient Matching of Regular Expressions with Lookaround AssertionsProceedings of the ACM on Programming Languages10.1145/36329348:POPL(2761-2791)Online publication date: 5-Jan-2024
  • (2024)SmartNIC Security Isolation in the Cloud with S-NICProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650071(851-869)Online publication date: 22-Apr-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media