Approximate String Matching over Ziv-Lempel Compressed Text

  • Conference paper
Combinatorial Pattern Matching (CPM 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1848)

Abstract

We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length $u$ compressed into length $n$, and a pattern of length $m$, we report all the $R$ occurrences of the pattern in the text allowing up to $k$ insertions, deletions and substitutions, in $O(mkn + R)$ time. The existence problem needs $O(mkn)$ time. We also show that the algorithm can be adapted to run in $O(k^2 n + \min(mkn, m^2 (m\sigma)^k) + R)$ average time, where $\sigma$ is the alphabet size. The experimental results show a speedup over the basic approach for moderate $m$ and small $k$.
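For orientation, the baseline that such a method competes with is presumably "decompress, then search". The sketch below illustrates that baseline only, not the compressed-domain algorithm of the paper: it assumes LZ78 phrases stored as (parent index, character) pairs and uses the classical column-wise dynamic programming for search with up to k errors, which takes O(mu) time on the uncompressed text. All function names are hypothetical.

```python
def lz78_compress(text):
    """Parse text into LZ78 phrases (parent_index, char); index 0 is the empty phrase."""
    trie = {}       # (parent phrase index, char) -> phrase index
    phrases = []
    node = 0
    for ch in text:
        nxt = trie.get((node, ch))
        if nxt is not None:
            node = nxt                      # keep extending the current phrase
        else:
            phrases.append((node, ch))      # new phrase = known phrase + one char
            trie[(node, ch)] = len(phrases)
            node = 0
    if node != 0:
        phrases.append((node, ""))          # leftover text equals an existing phrase
    return phrases


def lz78_decompress(phrases):
    """Rebuild the text by expanding each phrase from its parent."""
    blocks = [""]
    for parent, ch in phrases:
        blocks.append(blocks[parent] + ch)
    return "".join(blocks[1:])


def approx_search(pattern, text, k):
    """Return 1-based end positions where pattern matches text with at most k edits."""
    m = len(pattern)
    col = list(range(m + 1))    # column for the empty text prefix: col[i] = i
    ends = []
    for j, tc in enumerate(text, start=1):
        prev_diag = col[0]      # value of the cell up-left of the one being computed
        col[0] = 0              # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            if pattern[i - 1] == tc:
                col[i] = prev_diag
            else:
                col[i] = 1 + min(prev_diag, old, col[i - 1])
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def search_compressed_baseline(pattern, phrases, k):
    """Baseline: decompress fully, then scan the uncompressed text."""
    return approx_search(pattern, lz78_decompress(phrases), k)
```

The point of comparison is the cost model: this baseline pays for every one of the u uncompressed characters, whereas the bounds quoted in the abstract depend on the compressed length n.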

Work developed during a postdoctoral stay at the University of Helsinki, partially supported by the Academy of Finland (grant 44449) and Fundación Andes. Also supported by Fondecyt grant 1-990627.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kärkkäinen, J., Navarro, G., Ukkonen, E. (2000). Approximate String Matching over Ziv-Lempel Compressed Text. In: Giancarlo, R., Sankoff, D. (eds) Combinatorial Pattern Matching. CPM 2000. Lecture Notes in Computer Science, vol 1848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45123-4_18

  • DOI: https://doi.org/10.1007/3-540-45123-4_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67633-1

  • Online ISBN: 978-3-540-45123-5

  • eBook Packages: Springer Book Archive
