Abstract
A space-efficient linear-time approximation algorithm for the grammar-based compression problem, which requests for a given string to find a smallest context-free grammar deriving the string, is presented. The algorithm consumes only O(g * log g *) space and achieves the worst-case approximation ratio O(log g * log n), with the size n of an input and the optimum grammar size g *. Experimental results for typical benchmarks demonstrate that our algorithm is practical and efficient.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., Marchetti-Spaccamela, A., Protasi, M.: Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, Heidelberg (1999)
De Agostino, S., Storer, J.A.: On-Line versus Off-Line Computation in Dynamic Text Compression. Inform. Process. Lett. 59, 169–174 (1996)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Rasala, A., Sahai, A., Shelat, A.: Approximating the Smallest Grammar: Kolmogorov Complexity in Natural Models. In: Proc. 29th Ann. Sympo. on Theory of Computing, pp. 792–801 (2002)
Farach, M.: Optimal Suffix Tree Construction with Large Alphabets. In: Proc. 38th Ann. Sympo. on Foundations of Computer Science, pp. 137–143 (1997)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. In: Computer Science and Computational Biology, Cambridge University Press, Cambridge (1997)
Kida, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage System: a Unifying Framework for Compressed Pattern Matching. Theoret. Comput. Sci. (to appear)
Kieffer, J.C., Yang, E.-H.: Grammar-Based Codes: a New Class of Universal Lossless Source Codes. IEEE Trans. on Inform. Theory 46(3), 737–754 (2000)
Kieffer, J.C., Yang, E.-H., Nelson, G., Cosman, P.: Universal Lossless Compression via Multilevel Pattern Matching. IEEE Trans. Inform. Theory IT-46(4), 1227–1245 (2000)
Knuth, D.: Seminumerical Algorithms, pp. 441–462. Addison-Wesley, Reading (1981)
Larsson, N.J., Moffat, A.: Offline Dictionary-Based Compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)
Lehman, E.: Approximation Algorithms for Grammar-Based Compression. PhD thesis, MIT (2002)
Lehman, E., Shelat, A.: Approximation Algorithms for Grammar-Based Compression. In: Proc. 20th Ann. ACM-SIAM Sympo. on Discrete Algorithms, pp. 205–212 (2002)
Lothaire, M.: Combinatorics on Words. Encyclopedia of Mathematics and Its Applications, vol. 17. Addison-Wesley, Reading (1983)
Nevill-Manning, C., Witten, I.: Compression and Explanation Using Hierarchical Grammars. Computer Journal 40(2/3), 103–116 (1997)
Nevill-Manning, C., Witten, I.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artificial Intelligence Research 7, 67–82 (1997)
Rytter, W.: Application of Lempel-Ziv Factorization to the Approximation of Grammar-Based Compression. In: Proc. 13th Ann. Sympo. Combinatorial Pattern Matching, pp. 20–31 (2002)
Sakamoto, H.: A Fully Linear-Time Approximation Algorithm for Grammar-Based Compression. Journal of Discrete Algorithms (to appear)
Salomon, D.: Data compression: the complete reference, 2nd edn. Springer, Heidelberg (1998)
Storer, J., Szymanski, T.: Data compression via textual substitution. J. Assoc. Comput. Mach. 29(4), 928–951 (1982)
Storer, J.A., Szymanski, T.G.: The Macro Model for Data Compression. In: Proc. 10th Ann. Sympo. on Theory of Computing, San Diego, California, pp. 30–39. ACM Press, New York (1978)
Welch, T.A.: A Technique for High Performance Data Compression. IEEE Comput. 17, 8–19 (1984)
Yang, E.-H., Kieffer, J.C.: Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform–Part One: without Context Models. IEEE Trans. on Inform. Theory 46(3), 755–777 (2000)
Ziv, J., Lempel, A.: A Universal Algorithm for Sequential Data Compression. IEEE Trans. on Inform. Theory IT-23(3), 337–349 (1977)
Ziv, J., Lempel, A.: Compression of Individual Sequences via Variable-Rate Coding. IEEE Trans. on Inform. Theory 24(5), 530–536 (1978)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sakamoto, H., Kida, T., Shimozono, S. (2004). A Space-Saving Linear-Time Algorithm for Grammar-Based Compression. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_33
Download citation
DOI: https://doi.org/10.1007/978-3-540-30213-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive