Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/789085.789587guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

Linear-Time, Incremental Hierarchy Inference for Compression

Published: 25 March 1997 Publication History

Abstract

Data compression and learning are, in some sense, two sides of the same coin. If we paraphrase Occam's razor by saying that a small theory is better than a larger theory with the same explanatory power, we can characterize data compression as a preoccupation with small, and learning as a preoccupation with better. Nevill-Manning et al. (see Proc. Data Compression Conference, Los Alamitos, CA, p.244-253, 1994) presented an algorithm, since dubbed SEQUITUR, that presents both faces of the compression/learning coin. Its performance as a data compression scheme outstrips other dictionary schemes, and the structures that it learns from sequences as diverse as DNA and music are intuitively compelling. We present three new results that characterize SEQUITUR's computational and compression performance. First, we prove that SEQUITUR operates in time linear in n, the length of the input sequence, despite its ability to build a hierarchy as deep as log(n). Second, we show that a sequence can be compressed incrementally, improving on the non-incremental algorithm that was described by Nevill-Manning et al., and making on-line compression feasible. Third, we present an intriguing result that emerged during benchmarking; whereas PPMC outperforms SEQUITUR on most files in the Calgary corpus, SEQUITUR regains the lead when tested on multimegabyte sequences. We make some tentative conclusions about the underlying reasons for this phenomenon, and about the nature of current compression benchmarking.

Cited By

View all
  • (2019)Safer Program Behavior Sharing Through Trace WringingProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3297858.3304074(1059-1072)Online publication date: 4-Apr-2019
  • (2018)Efficient document analytics on compressed dataProceedings of the VLDB Endowment10.14778/3236187.323620311:11(1522-1535)Online publication date: 1-Jul-2018
  • (2018)GrammarViz 3.0ACM Transactions on Knowledge Discovery from Data10.1145/305112612:1(1-28)Online publication date: 13-Feb-2018
  • Show More Cited By
  1. Linear-Time, Incremental Hierarchy Inference for Compression

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Guide Proceedings
    DCC '97: Proceedings of the Conference on Data Compression
    March 1997
    ISBN:0818677619

    Publisher

    IEEE Computer Society

    United States

    Publication History

    Published: 25 March 1997

    Author Tags

    1. Calgary corpus
    2. SEQUITUR algorithm
    3. compression benchmarking
    4. compression performance
    5. computational performance
    6. data compression
    7. dictionary schemes
    8. grammar
    9. input sequence length
    10. learning
    11. linear-time incremental hierarchy inference
    12. multimegabyte sequences
    13. nonincremental algorithm
    14. online compression

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 23 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Safer Program Behavior Sharing Through Trace WringingProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3297858.3304074(1059-1072)Online publication date: 4-Apr-2019
    • (2018)Efficient document analytics on compressed dataProceedings of the VLDB Endowment10.14778/3236187.323620311:11(1522-1535)Online publication date: 1-Jul-2018
    • (2018)GrammarViz 3.0ACM Transactions on Knowledge Discovery from Data10.1145/305112612:1(1-28)Online publication date: 13-Feb-2018
    • (2016)Hierarchical Program PathsACM Transactions on Software Engineering and Methodology10.1145/296309425:3(1-44)Online publication date: 22-Aug-2016
    • (2014)SunCat: helping developers understand and predict performance problems in smartphone applicationsProceedings of the 2014 International Symposium on Software Testing and Analysis10.1145/2610384.2610410(282-292)Online publication date: 21-Jul-2014
    • (2013)Elastic and scalable tracing and accurate replay of non-deterministic eventsProceedings of the 27th international ACM conference on International conference on supercomputing10.1145/2464996.2465001(59-68)Online publication date: 10-Jun-2013
    • (2010)Scalable Communication Trace CompressionProceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing10.1109/CCGRID.2010.111(408-417)Online publication date: 17-May-2010
    • (2009)Profiling Java programs for parallelismProceedings of the 2009 ICSE Workshop on Multicore Software Engineering10.1109/IWMSE.2009.5071383(49-55)Online publication date: 18-May-2009
    • (2009)A holistic approach to managing software change impactJournal of Systems and Software10.1016/j.jss.2009.06.05282:12(2051-2067)Online publication date: 1-Dec-2009
    • (2008)Dynamic slicing on Java bytecode tracesACM Transactions on Programming Languages and Systems10.1145/1330017.133002130:2(1-49)Online publication date: 14-Mar-2008
    • Show More Cited By

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media