

Modeling for text compression

Authors: Timothy Bell, Ian H. Witten, John G. Cleary

ACM Computing Surveys (CSUR), Volume 21, Issue 4

Pages 557 - 591

https://doi.org/10.1145/76894.76896

Published: 01 December 1989


Abstract

The best schemes for text compression use large models to help them predict which characters will come next. The actual next characters are coded with respect to the prediction, resulting in compression of information. Models are best formed adaptively, based on the text seen so far. This paper surveys successful strategies for adaptive modeling that are suitable for use in practical text compression systems.
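To make the predict-then-code idea concrete, here is a minimal sketch of an adaptive finite-context model, the first class of model described in the next paragraph. It assumes a 256-symbol byte alphabet and a simple add-one (Laplace) estimator; the methods the survey covers (for example PPM, which blends several context orders using escape probabilities) are more elaborate, and the function name `adaptive_code_length` is hypothetical. Rather than producing a bit stream, it totals the ideal code length, -log2 p, of each actual character under the model's prediction, a figure that an arithmetic coder can approach closely. The model is updated only after each character has been coded, so a decoder that has seen the same text can reproduce every prediction.

```python
import math
from collections import defaultdict

ALPHABET_SIZE = 256  # assumption: one symbol per byte


def adaptive_code_length(text: bytes, order: int = 2) -> float:
    """Ideal code length, in bits, of `text` under an adaptive
    order-`order` finite-context model with add-one (Laplace) estimates.

    Hypothetical helper for illustration; it sums -log2(p) rather than
    emitting an actual compressed bit stream.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                        # context -> total count
    bits = 0.0
    for i, sym in enumerate(text):
        # Condition on the last `order` characters (fewer near the start).
        ctx = text[max(0, i - order):i]
        # Predict: estimated probability of the character that actually occurs.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + ALPHABET_SIZE)
        bits -= math.log2(p)
        # Adapt after coding, so a decoder can make identical predictions.
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


if __name__ == "__main__":
    data = b"the cat sat on the mat; the rat sat on the hat. " * 10
    for k in (0, 1, 2, 3):
        print(f"order {k}: {adaptive_code_length(data, k) / len(data):.2f} bits/char")
```

Dividing the result by the text length gives bits per character, which makes it easy to compare context orders. On text with strong local structure a higher order usually predicts better once enough text has been seen, although each new context starts out knowing nothing, which is the problem that escape-based blending is designed to address.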

The strategies fall into three main classes: finite-context modeling, in which the last few characters are used to condition the probability distribution for the next one; finite-state modeling, in which the distribution is conditioned by the current state (and which subsumes finite-context modeling as an important special case); and dictionary modeling, in which strings of characters are replaced by pointers into an evolving dictionary. A comparison of different methods on the same sample texts is included, along with an analysis of future research directions.
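Dictionary modeling can be illustrated with the LZ78 parse of Ziv and Lempel (1978), one of the evolving-dictionary schemes in the literature the survey draws on. The sketch below is an illustration under stated assumptions rather than the survey's own code: the function names are hypothetical, and it emits (index, character) pairs without packing them into bits. Each pair points to the longest dictionary phrase that matches the upcoming text and appends one new character, and every pair adds a new phrase, so the dictionary grows with the text seen so far.

```python
def lz78_parse(text: str) -> list[tuple[int, str]]:
    """Split `text` into LZ78 phrases, returned as (index, char) pairs.

    Index 0 is the empty phrase; index k > 0 is the k-th phrase added to
    the dictionary. Each pair extends a known phrase by one character.
    """
    dictionary = {"": 0}
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch  # keep extending the longest dictionary match
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # the dictionary evolves
            phrase = ""
    if phrase:
        # Text ended inside a known phrase: emit a bare pointer to it.
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the text, growing the same dictionary as the encoder."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


if __name__ == "__main__":
    pairs = lz78_parse("abababa")
    print(pairs)  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
    print(lz78_decode(pairs))  # abababa
```

Because the decoder adds one phrase for every pair it reads, it rebuilds exactly the encoder's dictionary without the dictionary ever being transmitted; this is what makes the dictionary adaptive.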

References

[1] ABRAMSON, D. M. 1989. An adaptive dependency source model for data compression. Commun. ACM 32, 1 (Jan.), 77-83.

[2] ANGLUIN, D., AND SMITH, C. H. 1983. Inductive inference: Theory and methods. Comput. Surv. 15, 3 (Sept.), 237-269.

[3] AUSLANDER, M., HARRISON, W., MILLER, V., AND WEGMAN, M. 1985. PCTERM: A terminal emulator using compression. In Proceedings of the IEEE Globecom '85. IEEE Press, pp. 860-862.

[4] BAUM, L. E., PETRIE, T., SOULES, G., AND WEISS, N. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41, 164-171.

[5] BELL, T. C. 1986. Better OPM/L text compression. IEEE Trans. Commun. COM-34, 12 (Dec.), 1176-1182.

[6] BELL, T. C. 1987. A unifying theory and improvements for existing approaches to text compression. Ph.D. dissertation, Dept. of Computer Science, Univ. of Canterbury, New Zealand.

[7] BELL, T. C. 1989. Longest match string searching for Ziv-Lempel compression. Res. Rept. 6/89, Dept. of Computer Science, Univ. of Canterbury, New Zealand.

[8] BELL, T. C., AND MOFFAT, A. M. 1989. A note on the DMC data compression scheme. Computer J. 32, 1 (Feb.), 16-20.

[9] BELL, T. C., AND WITTEN, I. H. 1987. Greedy macro text compression. Res. Rept. 87/285/33, Dept. of Computer Science, Univ. of Calgary.

[10] BENTLEY, J. L., SLEATOR, D. D., TARJAN, R. E., AND WEI, V. K. 1986. A locally adaptive data compression scheme. Commun. ACM 29, 4 (Apr.), 320-330.

[11] BOOKSTEIN, A., AND FOUTY, G. 1976. A mathematical model for estimating the effectiveness of bigram coding. Inf. Process. Manage. 12.

[12] BRENT, R. P. 1987. A linear algorithm for data compression. Aust. Comput. J. 19, 2, 64-68.

[13] CAMERON, R. D. 1986. Source encoding using syntactic information source models. LCCR Tech. Rept. 86-7, Simon Fraser Univ.

[14] CLEARY, J. G. 1980. An associative and impressible computer. Ph.D. dissertation, Univ. of Canterbury, Christchurch, New Zealand.

[15] CLEARY, J. G., AND WITTEN, I. H. 1984a. A comparison of enumerative and adaptive codes. IEEE Trans. Inf. Theory IT-30, 2 (Mar.), 306-315.

[16] CLEARY, J. G., AND WITTEN, I. H. 1984b. Data compression using adaptive coding and partial string matching. IEEE Trans. Commun. COM-32, 4 (Apr.), 396-402.

[17] COOPER, D., AND LYNCH, M. F. 1982. Text compression using variable-to-fixed-length encodings. J. Am. Soc. Inf. Sci. (Jan.), 18-31.

[18] CORMACK, G. V., AND HORSPOOL, R. N. 1984. Algorithms for adaptive Huffman codes. Inf. Process. Lett. 18, 3 (Mar.), 159-166.

[19] CORMACK, G. V., AND HORSPOOL, R. N. 1987. Data compression using dynamic Markov modelling. Comput. J. 30, 6 (Dec.), 541-550.

[20] CORTESI, D. 1982. An effective text-compression algorithm. Byte 7, 1 (Jan.), 397-403.

[21] COVER, T. M., AND KING, R. C. 1978. A convergent gambling estimate of the entropy of English. IEEE Trans. Inf. Theory IT-24, 4 (Jul.), 413-421.

[22] DARRAGH, J. J., WITTEN, I. H., AND CLEARY, J. G. 1983. Adaptive text compression to enhance a modem. Res. Rept. 83/132/21, Computer Science Dept., Univ. of Calgary.

[23] ELIAS, P. 1975. Universal codeword sets and representations of the integers. IEEE Trans. Inf. Theory IT-21, 2 (Mar.), 194-203.

[24] ELIAS, P. 1987. Interval and recency rank source coding: Two on-line adaptive variable-length schemes. IEEE Trans. Inf. Theory IT-33, 1 (Jan.), 3-10.

[25] EL GAMAL, A. A., HEMACHANDRA, L. A., SHPERLING, I., AND WEI, V. K. 1987. Using simulated annealing to design good codes. IEEE Trans. Inf. Theory IT-33, 1, 116-123.

[26] EVANS, T. G. 1971. Grammatical inference techniques in pattern analysis. In Software Engineering, J. Tou, Ed. Academic Press, New York, pp. 183-202.

[27] FALLER, N. 1973. An adaptive system for data compression. Record of the 7th Asilomar Conference on Circuits, Systems and Computers. Naval Postgraduate School, Monterey, CA, pp. 593-597.

[28] FIALA, E. R., AND GREENE, D. H. 1989. Data compression with finite windows. Commun. ACM 32, 4 (Apr.), 490-505.

[29] FLAJOLET, P. 1985. Approximate counting: A detailed analysis. BIT 25, 113-134.

[30] GAINES, B. R. 1976. Behaviour/structure transformations under uncertainty. Int. J. Man-Mach. Stud. 8, 337-365.

[31] GAINES, B. R. 1977. System identification, approximation and complexity. Int. J. General Syst. 3, 145-174.

[32] GALLAGER, R. G. 1978. Variations on a theme by Huffman. IEEE Trans. Inf. Theory IT-24, 6 (Nov.), 668-674.

[33] GOLD, E. M. 1978. On the complexity of automaton identification from given data. Inf. Control 37, 302-320.

[34] GONZALEZ-SMITH, M. E., AND STORER, J. A. 1985. Parallel algorithms for data compression. J. ACM 32, 2, 344-373.

[35] GOTTLIEB, D., HAGERTH, S. A., LEHOT, P. G. H., AND RABINOWITZ, H. S. 1975. A classification of compression methods and their usefulness for a large data processing center. National Comput. Conf. 44, 453-458.

[36] GUAZZO, M. 1980. A general minimum-redundancy source-coding algorithm. IEEE Trans. Inf. Theory IT-26, 1 (Jan.), 15-25.

[37] HELD, G. 1983. Data Compression: Techniques and Applications, Hardware and Software Considerations. Wiley, New York.

[38] HELMAN, D. R., AND LANGDON, G. G. 1988. Data compression. IEEE Potentials (Feb.), 25-28.

[39] HORSPOOL, R. N., AND CORMACK, G. V. 1983. Data compression based on token recognition. Unpublished.

[40] HORSPOOL, R. N., AND CORMACK, G. V. 1986. Dynamic Markov modelling--A prediction technique. In Proceedings of the International Conference on the System Sciences, Honolulu, HI, pp. 700-707.

[41] HUFFMAN, D. A. 1952. A method for the construction of minimum-redundancy codes. In Proceedings of the Institute of Electrical and Radio Engineers 40, 9 (Sept.), pp. 1098-1101.

[42] HUNTER, R., AND ROBINSON, A. H. 1980. International digital facsimile coding standards. In Proceedings of the Institute of Electrical and Electronic Engineers 68, 7 (Jul.), pp. 854-867.

[43] JAGGER, D. 1989. Fast Ziv-Lempel decoding using RISC architecture. Res. Rept., Dept. of Computer Science, Univ. of Canterbury, New Zealand.

[44] JAKOBSSON, M. 1985. Compression of character strings by an adaptive dictionary. BIT 25, 4, 593-603.

[45] JAMISON, D., AND JAMISON, K. 1968. A note on the entropy of partially-known languages. Inf. Control 12, 164-167.

[46] JEWELL, G. C. 1976. Text compaction for information retrieval systems. IEEE Syst., Man and Cybernetics Soc. Newsletter 5, 47.

[47] JONES, D. W. 1988. Application of splay trees to data compression. Commun. ACM 31, 8 (Aug.), 996-1007.

[48] KATAJAINEN, J., AND RAITA, T. 1987a. An approximation algorithm for space-optimal encoding of a text. Res. Rept., Dept. of Computer Science, Univ. of Turku, Turku, Finland.

[49] KATAJAINEN, J., AND RAITA, T. 1987b. An analysis of the longest match and the greedy heuristics for text encoding. Res. Rept., Dept. of Computer Science, Univ. of Turku, Turku, Finland.

[50] KATAJAINEN, J., PENTTONEN, M., AND TEUHOLA, J. 1986. Syntax-directed compression of program files. Software--Practice and Experience 16, 3, 269-276.

[51] KNUTH, D. E. 1973. The Art of Computer Programming. Vol. 3, Sorting and Searching. Addison-Wesley, Reading, MA.

[52] KNUTH, D. E. 1985. Dynamic Huffman coding. J. Algorithms 6, 163-180.

[53] LANGDON, G. G. 1983. A note on the Ziv-Lempel model for compressing individual sequences. IEEE Trans. Inf. Theory IT-29, 2 (Mar.), 284-287.

[54] LANGDON, G. G. 1984. An introduction to arithmetic coding. IBM J. Res. Dev. 28, 2 (Mar.), 135-149.

[55] LANGDON, G. G., AND RISSANEN, J. 1981. Compression of black-white images with arithmetic coding. IEEE Trans. Commun. COM-29, 6 (Jun.), 858-867.

[56] LANGDON, G. G., AND RISSANEN, J. J. 1982. A simple general binary source code. IEEE Trans. Inf. Theory IT-28 (Sept.), 800-803.

[57] LANGDON, G. G., AND RISSANEN, J. J. 1983. A doubly-adaptive file compression algorithm. IEEE Trans. Commun. COM-31, 11 (Nov.), 1253-1255.

[58] LELEWER, D. A., AND HIRSCHBERG, D. S. 1987. Data compression. Comput. Surv. 19, 3 (Sept.), 261-296.

[59] LEMPEL, A., AND ZIV, J. 1976. On the complexity of finite sequences. IEEE Trans. Inf. Theory IT-22, 1 (Jan.), 75-81.

[60] LEVINSON, S. E., RABINER, L. R., AND SONDHI, M. 1983. An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell Syst. Tech. J. 62, 4 (Apr.), 1035-1074.

[61] LLEWELLYN, Z. A. 1987. Data compression for a source with Markov characteristics. Comput. J. 30, 2, 149-156.

[62] LYNCH, M. F. 1973. Compression of bibliographic files using an adaptation of run-length coding. Inf. Storage Retrieval 9, 207-214.

[63] LYNCH, T. J. 1985. Data Compression--Techniques and Applications. Lifetime Learning Publications, Belmont, CA.

[64] MAYNE, A., AND JAMES, E. B. 1975. Information compression by factorizing common strings. Comput. J. 18, 2, 157-160.

[65] G. & C. MERRIAM COMPANY 1963. Webster's Seventh New Collegiate Dictionary. Springfield, MA.

[66] MILLER, V. S., AND WEGMAN, M. N. 1984. Variations on a theme by Ziv and Lempel. In Combinatorial Algorithms on Words, A. Apostolico and Z. Galil, Eds. NATO ASI Series, Vol. F12. Springer-Verlag, Berlin, pp. 131-140.

[67] MOFFAT, A. 1987. Word based text compression. Res. Rept., Dept. of Computer Science, Univ. of Melbourne, Victoria, Australia.

[68] MOFFAT, A. 1988a. A data structure for arithmetic encoding on large alphabets. In Proceedings of the 11th Australian Computer Science Conference. Brisbane, Australia (Feb.), pp. 309-317.

[69] MOFFAT, A. 1988b. A note on the PPM data compression algorithm. Res. Rept. 88/7, Dept. of Computer Science, Univ. of Melbourne, Victoria, Australia.

[70] MORRIS, R. 1978. Counting large numbers of events in small registers. Commun. ACM 21, 10 (Oct.), 840-842.

[71] MORRISON, D. R. 1968. PATRICIA--Practical Algorithm To Retrieve Information Coded In Alphanumeric. J. ACM 15, 514-534.

[72] OZEKI, K. 1974a. Optimal encoding of linguistic information. Systems, Computers, Controls 5, 3, 96-103. Translated from Denshi Tsushin Gakkai Ronbunshi, Vol. 57-D, No. 6, June 1974, pp. 361-368.

[73] OZEKI, K. 1974b. Stochastic context-free grammar and Markov chain. Systems, Computers, Controls 5, 3, 104-110. Translated from Denshi Tsushin Gakkai Ronbunshi, Vol. 57-D, No. 6, June 1974, pp. 369-375.

[74] OZEKI, K. 1975. Encoding of linguistic information generated by a Markov chain which is associated with a stochastic context-free grammar. Systems, Computers, Controls 6, 3, 75-80. Translated from Denshi Tsushin Gakkai Ronbunshi, Vol. 58-D, No. 6, June 1975, pp. 322-327.

[75] PASCO, R. 1976. Source coding algorithms for fast data compression. Ph.D. dissertation, Dept. of Electrical Engineering, Stanford Univ.

[76] PIKE, J. 1981. Text compression using a 4 bit coding system. Comput. J. 24, 4.

[77] RABINER, L. R., AND JUANG, B. H. 1986. An Introduction to Hidden Markov Models. IEEE ASSP Mag. (Jan.).

[78] RAITA, T., AND TEUHOLA, J. 1987. Predictive text compression by hashing. ACM Conference on Information Retrieval, New Orleans.

[79] RISSANEN, J. J. 1976. Generalized Kraft inequality and arithmetic coding. IBM J. Res. Dev. 20 (May), 198-203.

[80] RISSANEN, J. J. 1979. Arithmetic codings as number representations. Acta Polytechnica Scandinavica, Math 31 (Dec.), 44-51.

[81] RISSANEN, J. 1983. A universal data compression system. IEEE Trans. Inf. Theory IT-29, 5 (Sept.), 656-664.

[82] RISSANEN, J., AND LANGDON, G. G. 1979. Arithmetic coding. IBM J. Res. Dev. 23, 2 (Mar.), 149-162.

[83] RISSANEN, J., AND LANGDON, G. G. 1981. Universal modeling and coding. IEEE Trans. Inf. Theory IT-27, 1 (Jan.), 12-23.

[84] ROBERTS, M. G. 1982. Local order estimating Markovian analysis for noiseless source coding and authorship identification. Ph.D. dissertation, Stanford Univ.

[85] RODEH, M., PRATT, V. R., AND EVEN, S. 1981. Linear algorithm for data compression via string matching. J. ACM 28, 1 (Jan.), 16-24.

[86] RUBIN, F. 1976. Experiments in text file compression. Commun. ACM 19, 11, 617-623.

[87] RUBIN, F. 1979. Arithmetic stream coding using fixed precision registers. IEEE Trans. Inf. Theory IT-25, 6 (Nov.), 672-675.

[88] RYABKO, B. Y. 1980. Data compression by means of a "book stack." Problemy Peredachi Informatsii 16, 4.

[89] SCHIEBER, W. D., AND THOMAS, G. W. 1971. An algorithm for compaction of alphanumeric data. J. Library Automation 4, 198-206.

[90] SCHUEGRAF, E. J., AND HEAPS, H. S. 1973. Selection of equifrequent word fragments for information retrieval. Inf. Storage Retrieval 9, 697-711.

[91] SCHUEGRAF, E. J., AND HEAPS, H. S. 1974. A comparison of algorithms for data-base compression by use of fragments as language elements. Inf. Storage Retrieval 10, 309-319.

[92] SHANNON, C. E. 1948. A mathematical theory of communication. Bell Syst. Tech. J. 27 (Jul.), 398-403.

[93] SHANNON, C. E. 1951. Prediction and entropy of printed English. Bell Syst. Tech. J. (Jan.), 50-64.

[94] SNYDERMAN, M., AND HUNT, B. 1970. The myriad virtues of text compaction. Datamation 1 (Dec.), 36-40.

[95] STORER, J. A. 1977. NP-completeness results concerning data compression. Tech. Rept. 234, Dept. of Electrical Engineering and Computer Science, Princeton Univ., Princeton, NJ.

[96] STORER, J. A. 1988. Data Compression: Methods and Theory. Computer Science Press, Rockville, MD.

[97] STORER, J. A., AND SZYMANSKI, T. G. 1982. Data compression via textual substitution. J. ACM 29, 4 (Oct.), 928-951.

[98] SVANKS, M. I. 1975. Optimizing the storage of alphanumeric data. Can. Datasyst. (May), 38-40.

[99] TAN, C. P. 1981. On the entropy of the Malay language. IEEE Trans. Inf. Theory IT-27, 3 (May), 383-384.

[100] THOMAS, S. W., MCKIE, J., DAVIES, S., TURKOWSKI, K., WOODS, J. A., AND OROST, J. W. 1985. Compress (Version 4.0) program and documentation. Available from joe@petsd.UUCP.

[101] TISCHER, P. 1987. A modified Lempel-Ziv-Welch data compression scheme. Aust. Comp. Sci. Commun. 9, 1, 262-272.

[102] TODD, S., LANGDON, G. G., AND RISSANEN, J. 1985. Parameter reduction and context selection for compression of grey-scale images. IBM J. Res. Dev. 29, 2 (Mar.), 188-193.

[103] TROPPER, R. 1982. Binary-coded text, a compression method. Byte 7, 4 (Apr.), 398-413.

[104] VITTER, J. S. 1987. Design and analysis of dynamic Huffman codes. J. ACM 34, 4 (Oct.), 825-845.

[105] VITTER, J. S. 1989. Dynamic Huffman coding. ACM Trans. Math. Softw. 15, 2 (Jun.), 158-167.

[106] WAGNER, R. A. 1973. Common phrase and minimum-space text storage. Commun. ACM 16, 3, 148-152.

[107] WALKER, D. E., AND AMSLER, R. A. 1986. The use of machine-readable dictionaries in sublanguage analysis. In Analysing languages in restricted domains: Sublanguage description and processing, R. Grishman and R. Kittridge, Eds. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 69-83.

[108] WELCH, T. A. 1984. A technique for high-performance data compression. IEEE Computer 17, 6 (Jun.), 8-19.

[109] WHITE, H. E. 1967. Printed English compression by dictionary encoding. In Proceedings of the Institute of Electrical and Electronic Engineers 55, 3, 390-396.

[110] WILLIAMS, R. 1988. Dynamic-history predictive compression. Inf. Syst. 13, 1, 129-140.

[111] WITTEN, I. H. 1979. Approximate, non-deterministic modelling of behaviour sequences. Int. J. General Systems 5 (Jan.), 1-12.

[112] WITTEN, I. H. 1980. Probabilistic behaviour/structure transformations using transitive Moore models. Int. J. General Syst. 6, 3, 129-137.

[113] WITTEN, I. H., AND CLEARY, J. 1983. Picture coding and transmission using adaptive modelling of quad trees. In Proceedings of the International Electrical, Electronics Conference 1, Toronto, ON, pp. 222-225.

[114] WITTEN, I. H., AND CLEARY, J. G. 1988. On the privacy afforded by adaptive text compression. Computers and Security 7, 4 (Aug.), 397-408.

[115] WITTEN, I. H., NEAL, R., AND CLEARY, J. G. 1987. Arithmetic coding for data compression. Commun. ACM 30, 6 (Jun.), 520-540.

[116] WOLFF, J. G. 1978. Recoding of natural language for economy of transmission or storage. Comput. J. 21, 1, 42-44.

[117] YOUNG, D. M. 1985. MacWrite file formats. Wheels for the Mind (Newsletter of the Australian Apple University Consortium), University of Western Australia, Nedlands, WA 6009, Australia, p. 34.

[118] ZIV, J., AND LEMPEL, A. 1977. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory IT-23, 3 (May), 337-343.

[119] ZIV, J., AND LEMPEL, A. 1978. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory IT-24, 5 (Sept.), 530-536.

Cited By

Zhu W, Tong W, Ge H, Zhang Z, Zhang M, Zhou W (2024). LpaqHP: A High-Performance FPGA Accelerator for LPAQ Compression. Proceedings of the 53rd International Conference on Parallel Processing, pp. 898-907. Online publication date: 12-Aug-2024.
https://dl.acm.org/doi/10.1145/3673038.3673051

Yuan Y, Wang R, Ranganathan N, Rao N, Kumar S, Lantz P, Sanjeepan V, Cabrera J, Kwatra A, Sankaran R, Jeong I, Kim N (2024). Intel Accelerators Ecosystem: An SoC-Oriented Perspective: Industry Product. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 848-862. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00066

Alapatt B, Poonia R, Singh C (2024). Dictionary-Based BPT Compression with Trimodal Encryption for Efficient Fiber-Optic Data Management and Security. Smart Systems: Innovations in Computing, pp. 641-658. Online publication date: 30-Sep-2024.
https://doi.org/10.1007/978-981-97-3690-4_48

Recommendations

Compressed Context Modeling for Text Compression
DCC '11: Proceedings of the 2011 Data Compression Conference

In text compression, statistical context modeling aims to construct a model to calculate the probability distribution of a character based upon its context. The order-k context of a symbol is defined as the string formed by its preceding k ...
Dictionary design for text image compression with JBIG2

The JBIG2 standard for lossy and lossless bilevel image coding is a very flexible encoding strategy based on pattern matching techniques. This paper addresses the problem of compressing text images with JBIG2. For text image compression, JBIG2 allows ...
Double-byte text compression

Reviews

Reviewer: Glen G. Langdon

The first part of this survey follows the modeling and coding approach described by Rissanen and Langdon [1], and the second part surveys dictionary approaches, including the Ziv-Lempel approach. The work describes general techniques for compressing data composed of a sequence of characters. The novice will have some difficulty with the ambiguous use of “model” in place of a more specific term. Fixed-context and finite-context models apparently mean the same thing, which I would guess is a shift-register finite state machine (FSM) for a finite history of previous symbols. A classification of concepts in Section 1 (“Context Modeling Techniques”) and Section 2 (“Other Statistical Modeling Techniques”) fails to note that model structure (the context FSM) and model statistics (the probability estimation FSM) are independent. The doubly adaptive file compression (DAFC) reference [2] illustrates the point: both the structure and the statistics are adaptive. DAFC starts with a single context and adaptively grows additional contexts, thus anticipating the basic idea of dynamic Markov compression [3]. The survey refers to a valuable corpus of text files. The concluding section on future research shows superb insight into the field. The set of references is excellent.


Information

Published In

ACM Computing Surveys Volume 21, Issue 4

Dec. 1989

107 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/76894


Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Article


Bibliometrics

Article Metrics

Total Citations: 161
Total Downloads: 3,956
Downloads (last 12 months): 415
Downloads (last 6 weeks): 54

Reflects downloads up to 21 Nov 2024


Citations

Saini P, Agarwal S (2023). Modified Huffman based Text Compression Scheme for VLF Communication System. 2023 First International Conference on Microwave, Antenna and Communication (MAC), pp. 1-4. Online publication date: 24-Mar-2023.
https://doi.org/10.1109/MAC58191.2023.10177081

Jeong G, Sharma B, Terrell N, Dhanotia A, Zhao Z, Agarwal N, Kejariwal A, Krishna T (2023). Characterization of Data Compression in Datacenters. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 1-12. Online publication date: Apr-2023.
https://doi.org/10.1109/ISPASS57527.2023.00010

Liu P, Wei Z, Yu C, Chen S (2022). HybriDC: A Resource-Efficient CPU-FPGA Heterogeneous Acceleration System for Lossless Data Compression. Micromachines 13, 11 (2029). Online publication date: 19-Nov-2022.
https://doi.org/10.3390/mi13112029

Marjai P, Lehotay-Kéry P, Kiss A (2022). A Novel Dictionary-Based Method to Compress Log Files with Different Message Frequency Distributions. Applied Sciences 12, 4 (2044). Online publication date: 16-Feb-2022.
https://doi.org/10.3390/app12042044

Jeong G, Sharma B, Terrell N, Dhanotia A, Zhao Z, Agarwal N, Kejariwal A, Krishna T (2022). Understanding Data Compression in Warehouse-Scale Datacenter Services. 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 221-223. Online publication date: May-2022.
https://doi.org/10.1109/ISPASS55109.2022.00028

Sahu S, Pal S (2022). Effect of stopwords in Indian language IR. Sādhanā 47, 1. Online publication date: 10-Jan-2022.
https://doi.org/10.1007/s12046-021-01731-z

Goyal M, Tatwawadi K, Chandak S, Ochoa I (2021). DZip: improved general-purpose lossless compression based on novel neural network modeling. 2021 Data Compression Conference (DCC), pp. 153-162. Online publication date: Mar-2021.
https://doi.org/10.1109/DCC50243.2021.00023
