Abstract
Though many compression methods are based on the use of variable length codes, there has recently been a trend to search for alternatives in which the lengths of the codewords are more restricted, which can be useful for fast decoding and compressed searches. This paper explores the construction of variable-to-fixed length codes, which have been suggested long ago by Tunstall. Using a new heuristic based on suffix trees, the performance of Tunstall codes could be improved by more than 30%.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Abrahams, J.: Code and parse trees for lossless source encoding. Comm. in Information and Systems 1(2), 113–146 (2001)
Brisaboa, N.R., Fariña, A., Navarro, G., Esteller, M.F. (s,c)-dense coding: an optimized compression code for natural language text databases. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 122–136. Springer, Heidelberg (2003)
Brisaboa, N.R., Iglesias, E.L., Navarro, G., Paramá, J.R.: An efficient compression code for text databases. In: Sebastiani, F. (ed.) ECIR 2003. LNCS, vol. 2633, pp. 468–481. Springer, Heidelberg (2003)
Crochemore, M., Ilie, L., Smyth, W.F.: A Simple Algorithm for Computing the Lempel Ziv Factorization. In: Proc. Data Compression Conference DCC 2008, Snowbird, Utah, pp. 482–488 (2008)
Fraenkel, A.S., Klein, S.T.: Complexity Aspects of Guessing Prefix Codes. Algorithmica 12, 409–419 (1994)
Chrobak, M., Kolman, P., Sgall, J.: The greedy algorithm for the minimum common string partition problem. ACM Transactions on Algorithms 1(2), 350–366 (2005)
Fraenkel, A.S., Mor, M., Perl, Y.: Is text compression by prefixes and suffixes practical? Acta Informatica 20, 371–389 (1983)
Huffman, D.: A method for the construction of minimum redundancy codes. Proc. of the IRE 40, 1098–1101 (1952)
Klein, S.T.: Improving static compression schemes by alphabet extension. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 210–221. Springer, Heidelberg (2000)
Klein, S.T., Kopel Ben-Nissan, M.: Using Fibonacci compression codes as alternatives to dense codes. In: Proc. Data Compression Conference DCC 2008, Snowbird, Utah, pp. 472–481 (2008)
Moffat, A.: Word-based text compression. Software – Practice & Experience 19, 185–198 (1989)
de Moura, E.S., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast and flexible word searching on compressed text. ACM Trans. on Information Systems 18, 113–139 (2000)
Savari, S.A., Gallager, R.G.: Generalized Tunstall codes for sources with memory. IEEE Trans. Info. Theory IT–43, 658–668 (1997)
Tjalkens, T.J., Willems, F.M.J.: Variable to fixed length codes for Markov sources. IEEE Trans. Info. Theory IT–33, 246–257 (1987)
Tunstall, B.P.: Synthesis of noiseless compression codes, Ph.D dissertation, Georgia Institute of Technology, Atlanta, GA (1967)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
Véronis, J., Langlais, P.: Evaluation of parallel text alignment systems: The arcade project. In: Véronis, J. (ed.) Parallel Text Processing, pp. 369–388. Kluwer Academic Publishers, Dordrecht (2000)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klein, S.T., Shapira, D. (2008). Improved Variable-to-Fixed Length Codes. In: Amir, A., Turpin, A., Moffat, A. (eds) String Processing and Information Retrieval. SPIRE 2008. Lecture Notes in Computer Science, vol 5280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89097-3_6
Download citation
DOI: https://doi.org/10.1007/978-3-540-89097-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89096-6
Online ISBN: 978-3-540-89097-3
eBook Packages: Computer ScienceComputer Science (R0)