Abstract
The dynamic trie is a fundamental data structure with applications in many areas of computer science. This paper proposes a new technique for maintaining a dynamic trie T of size at most 2w nodes under the unit-cost RAM model with a fixed word size w. It is based on the idea of partitioning T into a set of linked small tries, each of which can be maintained efficiently. Our method is not only space-efficient, but also allows the longest common prefix between any query pattern P and the strings currently stored in T to be computed in o(|P|) time for small alphabets, and allows any leaf to be inserted into or deleted from T in o(log|T|) time. To demonstrate the usefulness of our new data structure, we apply it to LZ-compression. Significantly, we obtain the first algorithm for generating the lz78 encoding of a given string of length n over an alphabet of size σ in sublinear (o(n)) time and sublinear (o(nlogσ) bits) working space for small alphabets (\(\sigma= 2^{o(\log n \frac{\log\log\log n}{(\log\log n)^{2}})}\)). Moreover, the working space for our new algorithm is asymptotically less than or equal to the space for storing the output compressed text, regardless of the alphabet size.
Similar content being viewed by others
Notes
In this paper, log denotes the base-2 logarithm, ln is the natural logarithm, and log σ is the base-σ logarithm. Furthermore, ⌈ and ⌉ symbols have been omitted to increase the readability.
We use the term size of a trie to refer to the number of nodes in the trie, not the number of bits used to represent the trie.
References
Andersson, A., Nilsson, S.: Improved behaviour of tries by adaptive branching. Inf. Process. Lett. 46(6), 295–300 (1993)
Arroyuelo, D.: An improved succinct representation for dynamic k-ary trees. In: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM 2008). Lecture Notes in Computer Science, vol. 5029, pp. 277–289. Springer, Berlin (2008)
Arroyuelo, D., Navarro, G.: Space-efficient construction of Lempel-Ziv compressed text indexes. Inf. Comput. 209(7), 1070–1102 (2011)
Beame, P., Fich, F.E.: Optimal bounds for the predecessor problem and related problems. J. Comput. Syst. Sci. 65(1), 38–72 (2002)
Blandford, D.K., Blelloch, G.E.: Dictionaries using variable-length keys and data, with applications. In: Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2005), pp. 1–10 (2005)
Darragh, J.J., Cleary, J.G., Witten, I.H.: Bonsai: a compact representation of trees. Softw. Pract. Exp. 23(3), 277–291 (1993)
Devroye, L., Szpankowski, W.: Probabilistic behavior of asymmetric level compressed tries. Random Struct. Algorithms 27(2), 185–200 (2005)
Ferragina, P., Grossi, R.: The string B-tree: a new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)
Ferragina, P., Manzini, G.: Indexing compressed texts. J. ACM 52(4), 552–581 (2005)
Fredkin, E.: Trie memory. Commun. ACM 3(9), 490–499 (1960)
Gabow, H.N., Tarjan, R.E.: A linear-time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci. 30(2), 209–221 (1985)
Geary, R.F., Rahman, N., Raman, R., Raman, V.: A simple optimal representation for balanced parentheses. Theor. Comput. Sci. 368(3), 231–246 (2006)
Hagerup, T.: Sorting and searching on the word RAM. In: Procedings of the 15th Annual Symposium on Theoretical Aspects of Computer Science (STACS 1998). Lecture Notes in Computer Science, vol. 1373, pp. 366–398. Springer, Berlin (1998)
Hon, W.-K., Lam, T.-W., Sadakane, K., Sung, W.-K., Yiu, S.-M.: A space and time efficient algorithm for constructing compressed suffix arrays. Algorithmica 48(1), 23–36 (2007)
Kärkkäinen, J., Sutinen, E.: Lempel-Ziv index for q-grams. Algorithmica 21(1), 137–154 (1998)
Kosaraju, S.R., Manzini, G.: Compression of low entropy strings with Lempel-Ziv algorithms. SIAM J. Comput. 29(3), 893–911 (1999)
Lippert, R.A., Mobarry, C.M., Walenz, B.P.: A space-efficient construction of the Burrows-Wheeler transform for genomic data. J. Comput. Biol. 12(7), 943–951 (2005)
Mehlhorn, K., Näher, S.: Bounded ordered dictionaries in O(loglogN) time and O(n) space. Inf. Process. Lett. 35(4), 183–189 (1990)
Morrison, D.R.: PATRICIA—Practical Algorithm To Retrieve Information Coded In Alphanumeric. J. ACM 15(4), 514–534 (1968)
Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM J. Comput. 31(3), 762–776 (2001)
Navarro, G.: Indexing text using the Ziv-Lempel trie. J. Discrete Algorithms 2(1), 87–114 (2004)
Nilsson, S., Karlsson, G.: IP-address lookup using LC-tries. IEEE J. Sel. Areas Commun. 17(6), 1083–1092 (1999)
Overmars, M.H.: The Design of Dynamic Data Structures. Lecture Notes in Computer Science., vol. 156. Springer, Berlin (1983)
Raman, R., Rao, S.S.: Succinct dynamic dictionaries and trees. In: Proceedings of the 30th International Colloquium on Automata, Languages and Programming (ICALP 2003). Lecture Notes in Computer Science, vol. 2719, pp. 357–368. Springer, Berlin (2003)
Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2006), pp. 1230–1239 (2006)
Welch, T.A.: A technique for high-performance data compression. IEEE Comput. 17(6), 8–19 (1984)
Willard, D.E.: Log-logarithmic worst-case range queries are possible in space Θ(N). Inf. Process. Lett. 17(2), 81–84 (1983)
Willard, D.E.: New trie data structures which support very fast search operations. J. Comput. Syst. Sci. 28(3), 379–394 (1984)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)
Acknowledgements
The authors would like to thank the anonymous referees for their helpful comments.
Author information
Authors and Affiliations
Corresponding author
Additional information
Parts of the results presented in this paper appeared in preliminary form in Proceedings of the 27th Annual International Conference on the Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2007), volume 4855 of Lecture Notes in Computer Science, Springer-Verlag, pp. 424–435, 2007.
J.J. was funded by the Hakubi Project at Kyoto University and the Japan Society for the Promotion of Science (JSPS).
K.S. was partially supported by the Japan Society for the Promotion of Science (JSPS) through the “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program)”.
W.-K.S. was partially supported by the MOE AcRF Tier 2 funding R-252-000-444-112.
Rights and permissions
About this article
Cite this article
Jansson, J., Sadakane, K. & Sung, WK. Linked Dynamic Tries with Applications to LZ-Compression in Sublinear Time and Space. Algorithmica 71, 969–988 (2015). https://doi.org/10.1007/s00453-013-9836-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00453-013-9836-6