Nothing Special   »   [go: up one dir, main page]

skip to main content
Compact pat trees
Publisher:
  • University of Waterloo
  • Computer Science Dept. University Avenue Waterloo, Ont. N2L 3G1
  • Canada
Order Number:UMI Order No. GAXNQ-21335
Reflects downloads up to 24 Nov 2024Bibliometrics
Skip Abstract Section
Abstract

Given a text string $S = \sb{s\sb1s\sb2s\sb3\... s\sb{n}},$ we want to preprocess S such that given a pattern $P = p\sb1p\sb2p\sb3\... p\sb{m},$ we can find $\{i\vert s\sb{i\... s\sb{i+m-1}} = P\}$ as efficiently as possible. Suffix trees are a data structure solution to this problem. Unfortunately, when n is large, the storage required by a suffix tree can be prohibitive. This thesis presents several related new representations for a close relative of the suffix tree, the PAT tree, that retain the functionality of suffix trees while requiring a fraction of the storage used by current methods.

Cited By

  1. ACM
    Arroyuelo D, Campos D, Gómez-Brandón A, Navarro G, Rojas C and Vrgoč D Space & Time Efficient Leapfrog Triejoin Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), (1-9)
  2. ACM
    Arroyuelo D, Bustos B, Gómez-Brandón A, Hogan A, Navarro G and Reutter J (2024). Worst-Case-Optimal Similarity Joins on Graph Databases, Proceedings of the ACM on Management of Data, 2:1, (1-26), Online publication date: 12-Mar-2024.
  3. Jiang S, Feng J, Wang C, Liu J, Xiong Z, Sha C, Zheng W, Liang J and Xiao Y (2023). EASC, Knowledge-Based Systems, 278:C, Online publication date: 25-Oct-2023.
  4. de Bernardo G, Gagie T, Ladra S, Navarro G and Seco D (2023). Faster compressed quadtrees, Journal of Computer and System Sciences, 131:C, (86-104), Online publication date: 1-Feb-2023.
  5. Chakraborty S, Grossi R, Sadakane K and Satti S (2023). Succinct representation for (non)deterministic finite automata, Journal of Computer and System Sciences, 131:C, (1-12), Online publication date: 1-Feb-2023.
  6. Fuentes-Sepúlveda J, Navarro G and Seco D (2023). Navigating planar topologies in near-optimal space and time, Computational Geometry: Theory and Applications, 109:C, Online publication date: 1-Feb-2023.
  7. ACM
    Boffa A, Ferragina P and Vinciguerra G (2022). A Learned Approach to Design Compressed Rank/Select Data Structures, ACM Transactions on Algorithms, 18:3, (1-28), Online publication date: 31-Jul-2022.
  8. Li Z, Li J and Huo H (2022). Optimal in-place suffix sorting, Information and Computation, 285:PB, Online publication date: 1-May-2022.
  9. Fariña A, Gagie T, Grabowski S, Manzini G, Navarro G and Ordóñez A (2022). Efficient and compact representations of some non-canonical prefix-free codes, Theoretical Computer Science, 907:C, (11-25), Online publication date: 12-Mar-2022.
  10. ACM
    Hoffmann M, Monaghan M and Reinert K PriSeT Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, (1-12)
  11. ACM
    Arroyuelo D, Hogan A, Navarro G, Reutter J, Rojas-Ledesma J and Soto A Worst-Case Optimal Graph Joins in Almost No Space Proceedings of the 2021 International Conference on Management of Data, (102-114)
  12. ACM
    Kärkkäinen J and Kempa D (2019). Better External Memory LCP Array Construction, ACM Journal of Experimental Algorithmics, 24, (1-27), Online publication date: 17-Dec-2019.
  13. ACM
    Lasch R, Oukid I, Dementiev R, May N, Demirsoy S and Sattler K Fast & Strong Proceedings of the 15th International Workshop on Data Management on New Hardware, (1-10)
  14. Navarro G (2019). Document listing on repetitive collections with guaranteed performance, Theoretical Computer Science, 772:C, (58-72), Online publication date: 7-Jun-2019.
  15. Gagie T, He M and Navarro G (2019). Path queries on functions, Theoretical Computer Science, 770:C, (34-50), Online publication date: 24-May-2019.
  16. Kammer F, Kratsch D and Laudahn M (2019). Space-Efficient Biconnected Components and Recognition of Outerplanar Graphs, Algorithmica, 81:3, (1180-1204), Online publication date: 1-Mar-2019.
  17. Banerjee N, Chakraborty S, Raman V and Satti S (2018). Space Efficient Linear Time Algorithms for BFS, DFS and Applications, Theory of Computing Systems, 62:8, (1736-1762), Online publication date: 1-Nov-2018.
  18. Arz J and Fischer J (2018). Lempel---Ziv-78 Compressed String Dictionaries, Algorithmica, 80:7, (2012-2047), Online publication date: 1-Jul-2018.
  19. Fischer J, I T, Köppl D and Sadakane K (2018). Lempel---Ziv Factorization Powered by Space Efficient Suffix Trees, Algorithmica, 80:7, (2048-2081), Online publication date: 1-Jul-2018.
  20. ACM
    Gog S, Konow R and Navarro G (2017). Practical Compact Indexes for Top-k Document Retrieval, ACM Journal of Experimental Algorithmics, 22, (1-37), Online publication date: 15-Dec-2017.
  21. Belazzougui D and Cunial F (2017). A Framework for Space-Efficient String Kernels, Algorithmica, 79:3, (857-883), Online publication date: 1-Nov-2017.
  22. ACM
    Pibiri G and Venturini R (2017). Clustered Elias-Fano Indexes, ACM Transactions on Information Systems, 36:1, (1-33), Online publication date: 30-Aug-2017.
  23. ACM
    Grossi R, Iacono J, Navarro G, Raman R and Satti S (2017). Asymptotically Optimal Encodings of Range Data Structures for Selection and Top-k Queries, ACM Transactions on Algorithms, 13:2, (1-31), Online publication date: 29-May-2017.
  24. (2017). On prefix normal words and prefix normal forms, Theoretical Computer Science, 659:C, (1-13), Online publication date: 10-Jan-2017.
  25. R. Brisaboa N, De Bernardo G, Konow R, Navarro G and Seco D (2016). Aggregated 2D range queries on clustered points, Information Systems, 60:C, (34-49), Online publication date: 1-Aug-2016.
  26. ACM
    Gog S and Venturini R Fast and Compact Hamming Distance Index Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, (285-294)
  27. Cordova J and Navarro G Practical Dynamic Entropy-Compressed Bitvectors with Applications Proceedings of the 15th International Symposium on Experimental Algorithms - Volume 9685, (105-117)
  28. Brisaboa N, Cerdeira-Pena A, Fariña A and Navarro G A Compact RDF Store Using Suffix Arrays Proceedings of the 22nd International Symposium on String Processing and Information Retrieval - Volume 9309, (103-115)
  29. ACM
    Belazzougui D and Navarro G (2015). Optimal Lower and Upper Bounds for Representing Sequences, ACM Transactions on Algorithms, 11:4, (1-21), Online publication date: 23-Jun-2015.
  30. ACM
    Navarro G, Puglisi S and Valenzuela D (2015). General Document Retrieval in Compact Space, ACM Journal of Experimental Algorithmics, 19, (1-46), Online publication date: 3-Feb-2015.
  31. ACM
    González R, Navarro G and Ferrada H (2015). Locally Compressed Suffix Arrays, ACM Journal of Experimental Algorithmics, 19, (1.1-1.30), Online publication date: 3-Feb-2015.
  32. Claude F, Navarro G and Ordóñez A (2015). The wavelet matrix, Information Systems, 47:C, (15-32), Online publication date: 1-Jan-2015.
  33. Claude F, Konow R and Navarro G Efficient Indexing and Representation of Web Access Logs Proceedings of the 21st International Symposium on String Processing and Information Retrieval - Volume 8799, (65-76)
  34. Navarro G and Ordóñez A Grammar Compressed Sequences with Rank/Select Support Proceedings of the 21st International Symposium on String Processing and Information Retrieval - Volume 8799, (31-44)
  35. Belazzougui D and Cunial F Indexed Matching Statistics and Shortest Unique Substrings Proceedings of the 21st International Symposium on String Processing and Information Retrieval - Volume 8799, (179-190)
  36. ACM
    Belazzougui D and Navarro G (2014). Alphabet-Independent Compressed Text Indexing, ACM Transactions on Algorithms, 10:4, (1-19), Online publication date: 1-Aug-2014.
  37. ACM
    Belazzougui D Linear time construction of compressed text indices in compact space Proceedings of the forty-sixth annual ACM symposium on Theory of computing, (148-193)
  38. ACM
    Hon W, Shah R, Thankachan S and Vitter J (2014). Space-Efficient Frameworks for Top-k String Retrieval, Journal of the ACM, 61:2, (1-36), Online publication date: 1-Apr-2014.
  39. ACM
    Navarro G (2014). Spaces, Trees, and Colors, ACM Computing Surveys, 46:4, (1-47), Online publication date: 1-Apr-2014.
  40. ACM
    Gog S and Ohlebusch E (2013). Compressed suffix trees, ACM Journal of Experimental Algorithmics, 18, (2.1-2.31), Online publication date: 1-Dec-2013.
  41. ACM
    Hsu B and Ottaviano G Space-efficient data structures for Top-k completion Proceedings of the 22nd international conference on World Wide Web, (583-594)
  42. ACM
    Vigna S Quasi-succinct indices Proceedings of the sixth ACM international conference on Web search and data mining, (83-92)
  43. Brisaboa N, Ladra S and Navarro G (2013). DACs, Information Processing and Management: an International Journal, 49:1, (392-404), Online publication date: 1-Jan-2013.
  44. Barbay J, Fischer J and Navarro G (2012). LRM-Trees, Theoretical Computer Science, 459, (26-41), Online publication date: 1-Nov-2012.
  45. Takabatake Y, Tabei Y and Sakamoto H Variable-Length codes for space-efficient grammar-based compression Proceedings of the 19th international conference on String Processing and Information Retrieval, (398-410)
  46. Hernández C and Navarro G Compressed representation of web and social networks via dense subgraphs Proceedings of the 19th international conference on String Processing and Information Retrieval, (264-276)
  47. Claude F and Navarro G The wavelet matrix Proceedings of the 19th international conference on String Processing and Information Retrieval, (167-179)
  48. Belazzougui D and Navarro G New lower and upper bounds for representing sequences Proceedings of the 20th Annual European conference on Algorithms, (181-192)
  49. Navarro G and Providel E Fast, small, simple rank/select on bitmaps Proceedings of the 11th international conference on Experimental Algorithms, (295-306)
  50. Claude F, Navarro G, Peltola H, Salmela L and Tarhio J (2012). String matching with alphabet sampling, Journal of Discrete Algorithms, 11, (37-50), Online publication date: 1-Feb-2012.
  51. ACM
    Lim H, Fan B, Andersen D and Kaminsky M SILT Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, (1-13)
  52. Claude F, Nicholson P and Seco D Space efficient wavelet tree construction Proceedings of the 18th international conference on String processing and information retrieval, (185-196)
  53. Kimura K, Koike A and Nakai K Seed-set construction by equi-entropy partitioning for efficient and sensitive short-read mapping Proceedings of the 11th international conference on Algorithms in bioinformatics, (151-162)
  54. Claude F and Navarro G (2011). Self-Indexed Grammar-Based Compression, Fundamenta Informaticae, 111:3, (313-337), Online publication date: 1-Aug-2011.
  55. Brodal G, Gfeller B, Jørgensen A and Sanders P (2011). Towards optimal range medians, Theoretical Computer Science, 412:24, (2588-2601), Online publication date: 1-May-2011.
  56. ACM
    Culpepper J and Moffat A (2010). Efficient set intersection for inverted indexing, ACM Transactions on Information Systems, 29:1, (1-25), Online publication date: 1-Dec-2010.
  57. Fernández J, Martínez-Prieto M and Gutierrez C Compact representation of large RDF data sets for publishing and exchange Proceedings of the 9th international semantic web conference on The semantic web - Volume Part I, (193-208)
  58. ACM
    Claude F and Navarro G (2010). Fast and Compact Web Graph Representations, ACM Transactions on the Web, 4:4, (1-31), Online publication date: 1-Sep-2010.
  59. Burcsi P, Cicalese F, Fici G and Lipták Z On table arrangements, scrabble freaks, and jumbled pattern matching Proceedings of the 5th international conference on Fun with algorithms, (89-101)
  60. Barbay J, Claude F and Navarro G Compact rich-functional binary relation representations Proceedings of the 9th Latin American conference on Theoretical Informatics, (170-183)
  61. Claude F and Navarro G Extended compact web graph representations Algorithms and Applications, (77-91)
  62. ACM
    Välimäki N, Mäkinen V, Gerlach W and Dixit K (2010). Engineering a compressed suffix tree implementation, ACM Journal of Experimental Algorithmics, 14, (4.2-4.23), Online publication date: 1-Dec-2009.
  63. Fariòa A, Ladra S, Pedreira O and Places Á (2009). Rank and Select for Succinct Data Structures, Electronic Notes in Theoretical Computer Science (ENTCS), 236, (131-145), Online publication date: 1-Apr-2009.
  64. Farzan A and Munro J A Uniform Approach Towards Succinct Representation of Trees Proceedings of the 11th Scandinavian workshop on Algorithm Theory, (173-184)
  65. ACM
    Lu H and Yeh C (2008). Balanced parentheses strike back, ACM Transactions on Algorithms, 4:3, (1-13), Online publication date: 1-Jun-2008.
  66. Vigna S Broadword implementation of rank/select queries Proceedings of the 7th international conference on Experimental algorithms, (154-168)
  67. Yamanaka K and Nakano S A compact encoding of plane triangulations with efficient query supports Proceedings of the 2nd international conference on Algorithms and computation, (120-131)
  68. Golynski A (2007). Optimal lower bounds for rank and select indexes, Theoretical Computer Science, 387:3, (348-359), Online publication date: 20-Nov-2007.
  69. Mäkinen V and Navarro G (2007). Rank and select revisited and extended, Theoretical Computer Science, 387:3, (332-347), Online publication date: 20-Nov-2007.
  70. ACM
    Raman R, Raman V and Satti S (2007). Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets, ACM Transactions on Algorithms, 3:4, (43-es), Online publication date: 1-Nov-2007.
  71. Culpepper J and Moffat A Compact set representation for information retrieval Proceedings of the 14th international conference on String processing and information retrieval, (137-148)
  72. Na J and Park K (2007). Alphabet-independent linear-time construction of compressed suffix arrays using o(nlogn)-bit working space, Theoretical Computer Science, 385:1-3, (127-136), Online publication date: 1-Oct-2007.
  73. Lee S and Park K Dynamic rank-select structures with applications to run-length encoded texts Proceedings of the 18th annual conference on Combinatorial Pattern Matching, (95-106)
  74. Fredriksson K and Nikitin F Simple compression code supporting random access and fast string matching Proceedings of the 6th international conference on Experimental algorithms, (203-216)
  75. Yamanaka K and Nakano S A Compact Encoding of Rectangular Drawings with Efficient Query Supports Proceedings of the 3rd international conference on Algorithmic Aspects in Information and Management, (68-81)
  76. ACM
    Ferragina P, Manzini G, Mäkinen V and Navarro G (2007). Compressed representations of sequences and full-text indexes, ACM Transactions on Algorithms, 3:2, (20-es), Online publication date: 1-May-2007.
  77. ACM
    Navarro G and Mäkinen V (2007). Compressed full-text indexes, ACM Computing Surveys, 39:1, (2-es), Online publication date: 12-Apr-2007.
  78. Golynski A Optimal lower bounds for rank and select indexes Proceedings of the 33rd international conference on Automata, Languages and Programming - Volume Part I, (370-381)
  79. Mäkinen V and Navarro G Position-Restricted substring searching Proceedings of the 7th Latin American conference on Theoretical Informatics, (703-714)
  80. Na J Linear-Time construction of compressed suffix arrays using o(n log n)-bit working space for large alphabets Proceedings of the 16th annual conference on Combinatorial Pattern Matching, (57-67)
  81. Mäkinen V and Navarro G Succinct suffix arrays based on run-length encoding Proceedings of the 16th annual conference on Combinatorial Pattern Matching, (45-56)
  82. Kim D, Na J, Kim J and Park K Efficient implementation of rank and select functions for succinct representation Proceedings of the 4th international conference on Experimental and Efficient Algorithms, (315-327)
  83. Miltersen P Lower bounds on the size of selection and rank indexes Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, (11-12)
  84. Raman R, Raman V and Rao S Succinct indexable dictionaries with applications to encoding k-ary trees and multisets Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms, (233-242)
  85. Chiang Y, Lin C and Lu H Orderly spanning trees with applications to graph encoding and graph drawing Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, (506-515)
  86. ACM
    Grossi R and Vitter J Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract) Proceedings of the thirty-second annual ACM symposium on Theory of computing, (397-406)
Contributors
  • University of Waterloo
Please enable JavaScript to view thecomments powered by Disqus.

Recommendations