CPT+: Decreasing the Time/Space Complexity of the Compact Prediction Tree

Ted Gueniche¹⁰,
Philippe Fournier-Viger¹⁰,
Rajeev Raman¹¹ &
…
Vincent S. Tseng¹²

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9078)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Predicting the next items in a sequence of symbols has applications in a wide range of domains. Several sequence prediction models have been proposed, such as DG, All-K-Order Markov and PPM. Recently, a model named Compact Prediction Tree (CPT) was proposed. It relies on a tree structure and a more complex prediction algorithm to offer considerably more accurate predictions than many state-of-the-art prediction models. However, an important limitation of CPT is its high time and space complexity. In this article, we address this issue by proposing three novel strategies to reduce CPT's size and prediction time, and to increase its accuracy. Experimental results on seven real-life datasets show that the resulting model (CPT+) is up to 98 times more compact and 4.5 times faster than CPT, and has the best overall accuracy when compared to six state-of-the-art models from the literature: All-K-Order Markov, CPT, DG, LZ78, PPM and TDAG.

Author information

Authors and Affiliations

Department of Computer Science, University of Moncton, Moncton, Canada
Ted Gueniche & Philippe Fournier-Viger
Department of Computer Science, University of Leicester, Leicester, UK
Rajeev Raman
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City, Taiwan
Vincent S. Tseng

Corresponding author

Correspondence to Philippe Fournier-Viger.

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tru Cao
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Nanjing University, Nanjing, China
Zhi-Hua Zhou
Japan Advanced Institute of Science and Technology, Nomi City, Japan
Tu-Bao Ho
The University of Hong Kong, Hong Kong, Hong Kong SAR
David Cheung
Osaka University, Osaka, Japan
Hiroshi Motoda

About this paper

Cite this paper

Gueniche, T., Fournier-Viger, P., Raman, R., Tseng, V.S. (2015). CPT+: Decreasing the Time/Space Complexity of the Compact Prediction Tree. In: Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_49

DOI: https://doi.org/10.1007/978-3-319-18032-8_49
Published: 09 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)
