

Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching

Authors: Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, Yilin Yang




File

LIPIcs.CPM.2016.2.pdf
  • Filesize: 0.49 MB
  • 12 pages


Cite As

Arnab Ganguly, Wing-Kai Hon, Kunihiko Sadakane, Rahul Shah, Sharma V. Thankachan, and Yilin Yang. Space-Efficient Dictionaries for Parameterized and Order-Preserving Pattern Matching. In 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016). Leibniz International Proceedings in Informatics (LIPIcs), Volume 54, pp. 2:1-2:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016) https://doi.org/10.4230/LIPIcs.CPM.2016.2

Abstract

Let S and S' be two strings of the same length. We consider the following two variants of string matching.


* Parameterized Matching: The characters of S and S' are partitioned into static characters and parameterized characters.
The strings are a parameterized match iff the static characters match exactly and there exists a one-to-one function that renames the parameterized characters in S to those in S'.

* Order-Preserving Matching: The strings are an order-preserving match iff for any two integers i,j in [1,|S|], S[i] <= S[j] iff S'[i] <= S'[j].
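The two definitions above can be checked directly from their statements. The following is a minimal Python sketch, not the indexing method of the paper; the function names and the `static` argument (the set of static characters) are hypothetical, and indices are 0-based rather than the 1-based range [1,|S|] used above.

```python
from typing import Sequence, Set


def is_parameterized_match(s: str, t: str, static: Set[str]) -> bool:
    """True iff static characters agree position by position and a
    one-to-one renaming maps the parameterized characters of s to those of t."""
    if len(s) != len(t):
        return False
    forward = {}   # parameterized character of s -> character of t
    backward = {}  # character of t -> character of s (enforces one-to-one)
    for a, b in zip(s, t):
        if a in static or b in static:
            # A static character may only face the identical character.
            if a != b:
                return False
            continue
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def is_order_preserving_match(s: Sequence[int], t: Sequence[int]) -> bool:
    """True iff s[i] <= s[j] exactly when t[i] <= t[j], for all pairs i, j."""
    if len(s) != len(t):
        return False
    n = len(s)
    return all((s[i] <= s[j]) == (t[i] <= t[j])
               for i in range(n) for j in range(n))
```

For instance, with "x" as the only static character, "axbxa" and "bxaxb" are a parameterized match (rename a to b and b to a), while "aab" and "ccc" are not, because the renaming would not be one-to-one. The order-preserving check compares all pairs and therefore takes quadratic time; it is meant only to mirror the definition.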

Let P be a collection of d patterns {P_1, P_2, ..., P_d} of total length n characters, which are chosen from an alphabet Sigma.
Given a text T, also over Sigma, we consider the dictionary indexing problem under the above definitions of string matching.
Specifically, the task is to index P such that we can report all positions j where at least one of the patterns P_i in P is a parameterized match (resp. order-preserving match) with the same-length substring of T starting at j. Previous best-known indexes occupy O(n * log(n)) bits and can report all occ positions in O(|T| * log(|Sigma|) + occ) time. We present space-efficient indexes that occupy O(n * log(|Sigma|) + d * log(n)) bits and report all occ positions in O(|T| * (log(|Sigma|) + log_{|Sigma|}(n)) + occ) time for parameterized matching and in O(|T| * log(n) + occ) time for order-preserving matching.
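To make the reporting task concrete, the sketch below is a brute-force baseline in Python that tests every pattern at every text position. It is not the index described in the abstract and does not achieve any of the stated space or time bounds; `report_positions`, `order_preserving`, and the `matches` parameter are hypothetical names, and positions are 0-based.

```python
from typing import Callable, List, Sequence


def order_preserving(s: Sequence[int], t: Sequence[int]) -> bool:
    """Order-preserving match, checked pairwise from the definition."""
    n = len(s)
    return n == len(t) and all((s[i] <= s[j]) == (t[i] <= t[j])
                               for i in range(n) for j in range(n))


def report_positions(
    patterns: Sequence[Sequence[int]],
    text: Sequence[int],
    matches: Callable[[Sequence[int], Sequence[int]], bool] = order_preserving,
) -> List[int]:
    """Return every 0-based j such that some non-empty pattern matches
    the same-length substring of text starting at j."""
    positions = []
    for j in range(len(text)):
        for p in patterns:
            fits = len(p) > 0 and j + len(p) <= len(text)
            if fits and matches(p, text[j:j + len(p)]):
                positions.append(j)
                break  # report each position once, whichever pattern matched
    return positions
```

With patterns [1, 2, 3] and [2, 1] and text [5, 7, 9, 4, 6], the call `report_positions([[1, 2, 3], [2, 1]], [5, 7, 9, 4, 6])` reports positions 0 (where 5, 7, 9 is increasing) and 2 (where 9, 4 is decreasing). Passing a different predicate as `matches` would give the parameterized variant of the same problem.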

Keywords
  • Parameterized Matching
  • Order-preserving Matching
  • Dictionary Indexing
  • Aho-Corasick Automaton
  • Sparsification
