Improved Single and Multiple Approximate String Matching

Conference paper
Combinatorial Pattern Matching (CPM 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Abstract

We present a new algorithm for multiple approximate string matching. It is based on reading enough ℓ-grams backwards from text windows to prove that no occurrence can contain the part of the window read, and then shifting the window. Three variants of the algorithm are presented, offering different tradeoffs between how much work they do in the window and how far they shift it. We show analytically that two of our algorithms are optimal on average. Compared to the first average-optimal multipattern approximate string matching algorithm [Fredriksson and Navarro, CPM 2003], the new algorithms are much faster and remain optimal up to difference ratios of 1/2, whereas previous work reached at most 1/3. This is also a contribution to single-pattern approximate string matching, since the only previous average-optimal algorithm [Chang and Marr, CPM 1994] also reached a difference ratio of only 1/3. We show experimentally that our algorithms are very competitive, displacing the long-standing best algorithms for this problem. On real-life texts, our algorithms are especially interesting for computational biology applications.
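The backward ℓ-gram filtering idea summarized in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that go beyond the abstract: all patterns share the same length m; k is the maximum number of edit operations allowed; the cost bound D(S) of an ℓ-gram S is taken to be its smallest edit distance to any substring of any pattern, computed lazily and memoized rather than tabulated over every possible ℓ-gram; and a window that cannot be discarded is verified with plain dynamic programming and then shifted by one position. The function names (`search`, `min_distance_to_substring`, `matches_starting_at`) are hypothetical, and the code does not reproduce any of the paper's three variants, only the shared principle: a window of length m − k (the shortest possible occurrence) is read backwards in non-overlapping ℓ-grams, and as soon as their accumulated D values exceed k, no occurrence can start between the window start and the first character of the last ℓ-gram read, so the window moves past that character.

```python
from functools import lru_cache


def min_distance_to_substring(gram: str, pattern: str) -> int:
    """Smallest edit distance between `gram` and any substring of `pattern`
    (the empty substring included). Row 0 is all zeros, so the match may start
    anywhere in `pattern`; taking the minimum of the last row lets it end anywhere."""
    prev = [0] * (len(pattern) + 1)
    for i, g in enumerate(gram, 1):
        cur = [i] + [0] * len(pattern)
        for j, p in enumerate(pattern, 1):
            cur[j] = min(prev[j] + 1,              # g left unmatched
                         cur[j - 1] + 1,           # p left unmatched
                         prev[j - 1] + (g != p))   # match or substitution
        prev = cur
    return min(prev)


def matches_starting_at(text: str, start: int, pattern: str, k: int) -> bool:
    """True if some text[start:end] is within edit distance k of `pattern`.
    col[i] holds the edit distance between pattern[:i] and the text read so far."""
    m = len(pattern)
    col = list(range(m + 1))
    if col[m] <= k:
        return True
    for t in text[start:start + m + k]:   # an occurrence is at most m + k long
        new = [col[0] + 1]
        for i, p in enumerate(pattern, 1):
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + (t != p)))
        col = new
        if col[m] <= k:
            return True
    return False


def search(text: str, patterns: list[str], k: int, ell: int) -> list[int]:
    """Start positions in `text` of substrings within k edits of some pattern."""
    m = len(patterns[0])
    assert all(len(p) == m for p in patterns), "sketch assumes equal-length patterns"
    w = m - k                      # window length: the shortest possible occurrence
    assert 1 <= ell <= w

    @lru_cache(maxsize=None)
    def D(gram: str) -> int:       # lower bound on the cost of aligning `gram`
        return min(min_distance_to_substring(gram, p) for p in patterns)

    hits, pos = [], 0
    while pos + w <= len(text):    # invariant: no occurrence starts before pos
        i, total, discarded = pos + w, 0, False
        while i - ell >= pos:      # read non-overlapping ell-grams backwards
            i -= ell
            total += D(text[i:i + ell])
            if total > k:
                # Any occurrence starting in [pos, i] is at least w long, so it
                # would contain all ell-grams read so far and cost more than k.
                pos, discarded = i + 1, True
                break
        if not discarded:          # filter failed: verify this start, shift by one
            if any(matches_starting_at(text, pos, p, k) for p in patterns):
                hits.append(pos)
            pos += 1
    return hits
```

For instance, `search("TTACGATT", ["ACGT"], 1, 2)` returns a list whose first entry is 2, because `ACGA` at that position is a single substitution away from `ACGT`. Reporting start positions, rather than end positions, keeps each verified window from reporting the same start twice.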

References

  1. Baeza-Yates, R., Navarro, G.: New and faster filters for multiple approximate string matching. Random Structures and Algorithms (RSA) 20, 23–49 (2002)

  2. Chang, W., Marr, T.: Approximate string matching and local similarity. In: Crochemore, M., Gusfield, D. (eds.) CPM 1994. LNCS, vol. 807, pp. 259–273. Springer, Heidelberg (1994)

  3. Crochemore, M., Czumaj, A., Gąsieniec, L., Jarominek, S., Lecroq, T., Plandowski, W., Rytter, W.: Speeding up two string matching algorithms. Algorithmica 12(4/5), 247–267 (1994)

  4. Fredriksson, K., Navarro, G.: Average-optimal multiple approximate string matching. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 109–128. Springer, Heidelberg (2003)

  5. Horspool, R.: Practical fast searching in strings. Software Practice and Experience 10, 501–506 (1980)

  6. Hyyrö, H., Navarro, G.: Faster bit-parallel approximate string matching. In: Apostolico, A., Takeda, M. (eds.) CPM 2002. LNCS, vol. 2373, pp. 203–224. Springer, Heidelberg (2002)

  7. Landau, G.M., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)

  8. Muth, R., Manber, U.: Approximate multiple string search. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 75–86. Springer, Heidelberg (1996)

  9. Myers, E.W.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM 46(3), 395–415 (1999)

  10. Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)

  11. Navarro, G., Raffinot, M.: Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics (JEA) 5(4) (2000)

  12. Sellers, P.: The theory and computation of evolutionary distances: pattern recognition. Journal of Algorithms 1, 359–373 (1980)

  13. Sutinen, E., Tarhio, J.: Filtration with q-samples in approximate string matching. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 50–63. Springer, Heidelberg (1996)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fredriksson, K., Navarro, G. (2004). Improved Single and Multiple Approximate String Matching. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_35

  • DOI: https://doi.org/10.1007/978-3-540-27801-6_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22341-2

  • Online ISBN: 978-3-540-27801-6

  • eBook Packages: Springer Book Archive
