Abstract
Text compression algorithms based on the Burrows–Wheeler transform (BWT) typically achieve a good compression ratio but are slow compared to Lempel–Ziv type compression algorithms. The main culprit is the time needed to compute the BWT during compression and its inverse during decompression. We propose to speed up BWT-based compression by performing a grammar-based precompression before the transform. The idea is to reduce the amount of data that BWT and its inverse have to process. We have developed a very fast grammar precompressor using pair replacement. Experiments show a substantial speed up in practice without a significant effect on compression ratio.
Supported by Academy of Finland grant 118653 (ALGODAN).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Abel, J.: Post BWT stages of the Burrows-Wheeler compression algorithm. Softw., Pract. Exper. 40(9), 751–777 (2010)
Adjeroh, D., Bell, T., Mukherjee, A.: The Burrows–Wheeler Transform: Data Compression Suffix Arrays, and Pattern Matching. Springer (2008)
Bentley, J.L., McIlroy, M.D.: Data compression with long repeated strings. Inf. Sci. 135(1-2), 1–11 (2001)
Cannane, A., Williams, H.E.: General-purpose compression for efficient retrieval. JASIST 52(5), 430–437 (2001)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
Fariña, A., Brisaboa, N.R., Navarro, G., Claude, F., Places, Á.S., Rodríguez, E.: Word-based self-indexes for natural language text. ACM Trans. Inf. Syst. 30(1), 1 (2012)
Ferragina, P., Manzini, G.: On compressing the textual web. In: Proc. 3rd Conference on Web Search and Web Data Mining (WSMD), pp. 391–400. ACM (2010)
Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Slashing the time for BWT inversion. In: Proc. Data Compression Conference, pp. 99–108. IEEE CS (2012)
Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proc. IEEE 88, 1722–1732 (2000)
Mahoney, M.: Large text compression benchmark (July 10, 2012), http://mattmahoney.net/dc/text.html
Manber, U.: A text compression scheme that allows fast searching directly in the compressed file. ACM Trans. Inf. Syst. 15(2), 124–136 (1997)
Skibinski, P., Grabowski, S., Deorowicz, S.: Revisiting dictionary-based compression. Softw., Pract. Exper. 35(15), 1455–1476 (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kärkkäinen, J., Mikkola, P., Kempa, D. (2012). Grammar Precompression Speeds Up Burrows–Wheeler Compression. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_34
Download citation
DOI: https://doi.org/10.1007/978-3-642-34109-0_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34108-3
Online ISBN: 978-3-642-34109-0
eBook Packages: Computer ScienceComputer Science (R0)