Abstract
Compressed pattern matching is one of the most active top- ics in string matching. The goal is to find all occurrences of a pattern in a compressed text without decompression. Various algorithms have been proposed depending on underlying compression methods in the last decade. Although some algorithms for multipattern searching on compressed text were also presented very recently, all of them are only for Lempel-Ziv family compressions. In this paper we propose two types of multipattern matching algorithms on collage system, which simulate the AC algorithm and a multipattern version of the BM algorithm, the most important algorithms for searching in uncompressed files. Collage system is a formal framework which is suitable to capture the essence of compressed pattern matching according to various dictionary based compressions. That is, we provide the model of multipattern matching algorithm for any compression method covered by the framework.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
A.V. Aho and M. Corasick. Efficient string matching: An aid to bibliographic search. Comm. ACM, 18(6):333–340, 1975.
A. Amir and G. Benson. Efficient two-dimensional compressed matching. In Proc. Data Compression Conference, page 279, 1992.
M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, New York, 1994.
A.S. Fraenkel and J. Simpson. How many squares can a string contain? J. Combin. Theory Ser. A, 82:112–120, 1998.
L.C.K. Hui. Color set size problem with application to string matching. In Combinatorial Pattern Matching, volume 644 of Lecture Notes in Computer Science, pages 230–243. Springer-Verlag, 1992.
J. Kärkkäinen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Ann. Symp. on Combinatorial Pattern Matching, volume 1848 of Lecture Notes in Computer Science, pages 195–209. Springer-Verlag, 2000.
T. Kida, Y. Shibata, M. Takeda, A. Shinohara, and S. Arikawa. A unifying framework for compressed pattern matching. In Proc. 6th International Symp. on String Processing and Information Retrieval, pages 89–96. IEEE Computer Society, 1999.
T. Kida, M. Takeda, A. Shinohara, M. Miyazaki, and S. Arikawa. Multiple pattern matching in LZW compressed text. Journal of Discrete Algorithms. to appear (previous version in DCC’98 and CPM’99).
T. Kida, M. Takeda, A. Shinohara, M. Miyazaki, and S. Arikawa. Multiple pattern matching in LZW compressed text. In J.A. Storer and M. Cohn, editors, Proc. Data Compression Conference’ 98, pages 103–112. IEEE Computer Society, 1998.
D.E. Knuth, J.H. Morris, and V.R. Pratt. Fast pattern matching in strings. SIAM J. Comput, 6(2):323–350, 1977.
N.J. Larsson and A. Moffat. Offline dictionary-based compression. In Proc. Data Compression Conference’ 99, pages 296–305. IEEE Computer Society, 1999.
T. Matsumoto, T. Kida, M. Takeda, A. Shinohara, and S. Arikawa. Bit-parallel approach to approximate string matching in compressed texts. In Proc. 7th International Symp. on String Processing and Information Retrieval, pages 221–228. IEEE Computer Society, 2000.
G. Navarro, T. Kida, M. Takeda, A. Shinohara, and S. Arikawa. Faster approximate string matching over compressed text. In Proc. Data Compression Conference 2001. IEEE Computer Society, 2001. to appear.
G. Navarro and M. Raffinot. A general practical approach to pattern matching over Ziv-Lempel compressed text. In Proc. 10th Ann. Symp. on Combinatorial Pattern Matching, volume 1645 of Lecture Notes in Computer Science, pages 14–36. Springer-Verlag, 1999.
G. Navarro and J. Tarhio. Boyer-Moore string matching over Ziv-Lempel compressed text. In Proc. 11th Ann. Symp. on Combinatorial Pattern Matching, volume 1848 of Lecture Notes in Computer Science, pages 166–180.Springer-Verlag, 2000.
C.G. Nevill-Manning, I.H. Witten, and D.L. Maulsby. Compression by induction of hierarchical grammars. In Proc. Data Compression Conference’ 94, pages 244–253. IEEE Press, 1994.
W. Ruzzo. Tree-size bounded alternation. Journal of Computer and System Sciences, 21(2):218–235, 1980.
W. Ruzzo. On uniform circuit complexity. Journal of Computer and System Sciences, 22(3):365–383, 1981.
W. Rytter. Algorithms on compressed strings and arrays. In Proc. 26th Ann. Conf. on Current Trends in Theory and Practice of Infomatics. Springer-Verlag, 1999.
Y. Shibata, T. Kida, S. Fukamachi, M. Takeda, A. Shinohara, T. Shinohara, and S. Arikawa. Speeding up pattern matching by text compression. In Proc. 4th Italian Conference on Algorithms and Complexity, volume 1767 of Lecture Notes in Computer Science, pages 306–315. Springer-Verlag, 2000.
Y. Shibata, T. Matsumoto, M. Takeda, A. Shinohara, and S. Arikawa. A Boyer-Moore type algorithm for compressed pattern matching. In Proc. 11th Ann. Symp. on Combinatorial Pattern Matching, volume 1848 of Lecture Notes in Computer Science, pages 181–194. Springer-Verlag, 2000.
I. Sudborough. On the tape complexity of deterministic context-free languages. Journal of ACM, 25:405–414, 1978.
M. Takeda, Y. Shibata, T. Matsumoto, T. Kida, A. Shinohara, S. Fukamachi, T. Shinohara, and S. Arikawa. Speeding up string pattern matching by text compression: The dawn of a new era. Transactions of Information Processing Society of Japan, 2001. to appear.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kida, T., Matsumoto, T., Takeda, M., Shinohara, A., Arikawa, S. (2001). Multiple Pattern Matching Algorithms on Collage System. In: Amir, A. (eds) Combinatorial Pattern Matching. CPM 2001. Lecture Notes in Computer Science, vol 2089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48194-X_18
Download citation
DOI: https://doi.org/10.1007/3-540-48194-X_18
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42271-6
Online ISBN: 978-3-540-48194-2
eBook Packages: Springer Book Archive