Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

Linked Dynamic Tries with Applications to LZ-Compression in Sublinear Time and Space

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

The dynamic trie is a fundamental data structure with applications in many areas of computer science. This paper proposes a new technique for maintaining a dynamic trie T of size at most 2w nodes under the unit-cost RAM model with a fixed word size w. It is based on the idea of partitioning T into a set of linked small tries, each of which can be maintained efficiently. Our method is not only space-efficient, but also allows the longest common prefix between any query pattern P and the strings currently stored in T to be computed in o(|P|) time for small alphabets, and allows any leaf to be inserted into or deleted from T in o(log|T|) time. To demonstrate the usefulness of our new data structure, we apply it to LZ-compression. Significantly, we obtain the first algorithm for generating the lz78 encoding of a given string of length n over an alphabet of size σ in sublinear (o(n)) time and sublinear (o(nlogσ) bits) working space for small alphabets (\(\sigma= 2^{o(\log n \frac{\log\log\log n}{(\log\log n)^{2}})}\)). Moreover, the working space for our new algorithm is asymptotically less than or equal to the space for storing the output compressed text, regardless of the alphabet size.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2

Similar content being viewed by others

Notes

  1. In this paper, log denotes the base-2 logarithm, ln is the natural logarithm, and log σ is the base-σ logarithm. Furthermore, ⌈ and ⌉ symbols have been omitted to increase the readability.

  2. We use the term size of a trie to refer to the number of nodes in the trie, not the number of bits used to represent the trie.

References

  1. Andersson, A., Nilsson, S.: Improved behaviour of tries by adaptive branching. Inf. Process. Lett. 46(6), 295–300 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  2. Arroyuelo, D.: An improved succinct representation for dynamic k-ary trees. In: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (CPM 2008). Lecture Notes in Computer Science, vol. 5029, pp. 277–289. Springer, Berlin (2008)

    Chapter  Google Scholar 

  3. Arroyuelo, D., Navarro, G.: Space-efficient construction of Lempel-Ziv compressed text indexes. Inf. Comput. 209(7), 1070–1102 (2011)

    Article  MATH  MathSciNet  Google Scholar 

  4. Beame, P., Fich, F.E.: Optimal bounds for the predecessor problem and related problems. J. Comput. Syst. Sci. 65(1), 38–72 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  5. Blandford, D.K., Blelloch, G.E.: Dictionaries using variable-length keys and data, with applications. In: Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2005), pp. 1–10 (2005)

    Google Scholar 

  6. Darragh, J.J., Cleary, J.G., Witten, I.H.: Bonsai: a compact representation of trees. Softw. Pract. Exp. 23(3), 277–291 (1993)

    Article  Google Scholar 

  7. Devroye, L., Szpankowski, W.: Probabilistic behavior of asymmetric level compressed tries. Random Struct. Algorithms 27(2), 185–200 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  8. Ferragina, P., Grossi, R.: The string B-tree: a new data structure for string search in external memory and its applications. J. ACM 46(2), 236–280 (1999)

    MATH  MathSciNet  Google Scholar 

  9. Ferragina, P., Manzini, G.: Indexing compressed texts. J. ACM 52(4), 552–581 (2005)

    MathSciNet  Google Scholar 

  10. Fredkin, E.: Trie memory. Commun. ACM 3(9), 490–499 (1960)

    Article  Google Scholar 

  11. Gabow, H.N., Tarjan, R.E.: A linear-time algorithm for a special case of disjoint set union. J. Comput. Syst. Sci. 30(2), 209–221 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  12. Geary, R.F., Rahman, N., Raman, R., Raman, V.: A simple optimal representation for balanced parentheses. Theor. Comput. Sci. 368(3), 231–246 (2006)

    Article  MATH  MathSciNet  Google Scholar 

  13. Hagerup, T.: Sorting and searching on the word RAM. In: Procedings of the 15th Annual Symposium on Theoretical Aspects of Computer Science (STACS 1998). Lecture Notes in Computer Science, vol. 1373, pp. 366–398. Springer, Berlin (1998)

    Google Scholar 

  14. Hon, W.-K., Lam, T.-W., Sadakane, K., Sung, W.-K., Yiu, S.-M.: A space and time efficient algorithm for constructing compressed suffix arrays. Algorithmica 48(1), 23–36 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  15. Kärkkäinen, J., Sutinen, E.: Lempel-Ziv index for q-grams. Algorithmica 21(1), 137–154 (1998)

    Article  MATH  MathSciNet  Google Scholar 

  16. Kosaraju, S.R., Manzini, G.: Compression of low entropy strings with Lempel-Ziv algorithms. SIAM J. Comput. 29(3), 893–911 (1999)

    Article  MathSciNet  Google Scholar 

  17. Lippert, R.A., Mobarry, C.M., Walenz, B.P.: A space-efficient construction of the Burrows-Wheeler transform for genomic data. J. Comput. Biol. 12(7), 943–951 (2005)

    Article  Google Scholar 

  18. Mehlhorn, K., Näher, S.: Bounded ordered dictionaries in O(loglogN) time and O(n) space. Inf. Process. Lett. 35(4), 183–189 (1990)

    Article  MATH  Google Scholar 

  19. Morrison, D.R.: PATRICIA—Practical Algorithm To Retrieve Information Coded In Alphanumeric. J. ACM 15(4), 514–534 (1968)

    Google Scholar 

  20. Munro, J.I., Raman, V.: Succinct representation of balanced parentheses and static trees. SIAM J. Comput. 31(3), 762–776 (2001)

    Article  MATH  MathSciNet  Google Scholar 

  21. Navarro, G.: Indexing text using the Ziv-Lempel trie. J. Discrete Algorithms 2(1), 87–114 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  22. Nilsson, S., Karlsson, G.: IP-address lookup using LC-tries. IEEE J. Sel. Areas Commun. 17(6), 1083–1092 (1999)

    Article  Google Scholar 

  23. Overmars, M.H.: The Design of Dynamic Data Structures. Lecture Notes in Computer Science., vol. 156. Springer, Berlin (1983)

    MATH  Google Scholar 

  24. Raman, R., Rao, S.S.: Succinct dynamic dictionaries and trees. In: Proceedings of the 30th International Colloquium on Automata, Languages and Programming (ICALP 2003). Lecture Notes in Computer Science, vol. 2719, pp. 357–368. Springer, Berlin (2003)

    Chapter  Google Scholar 

  25. Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds. In: Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2006), pp. 1230–1239 (2006)

    Google Scholar 

  26. Welch, T.A.: A technique for high-performance data compression. IEEE Comput. 17(6), 8–19 (1984)

    Article  Google Scholar 

  27. Willard, D.E.: Log-logarithmic worst-case range queries are possible in space Θ(N). Inf. Process. Lett. 17(2), 81–84 (1983)

    Article  MATH  MathSciNet  Google Scholar 

  28. Willard, D.E.: New trie data structures which support very fast search operations. J. Comput. Syst. Sci. 28(3), 379–394 (1984)

    Article  MATH  MathSciNet  Google Scholar 

  29. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Acknowledgements

The authors would like to thank the anonymous referees for their helpful comments.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jesper Jansson.

Additional information

Parts of the results presented in this paper appeared in preliminary form in Proceedings of the 27th Annual International Conference on the Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2007), volume 4855 of Lecture Notes in Computer Science, Springer-Verlag, pp. 424–435, 2007.

J.J. was funded by the Hakubi Project at Kyoto University and the Japan Society for the Promotion of Science (JSPS).

K.S. was partially supported by the Japan Society for the Promotion of Science (JSPS) through the “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program)”.

W.-K.S. was partially supported by the MOE AcRF Tier 2 funding R-252-000-444-112.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Jansson, J., Sadakane, K. & Sung, WK. Linked Dynamic Tries with Applications to LZ-Compression in Sublinear Time and Space. Algorithmica 71, 969–988 (2015). https://doi.org/10.1007/s00453-013-9836-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00453-013-9836-6

Keywords

Navigation