Nothing Special   »   [go: up one dir, main page]

skip to main content
10.5555/2525373.2525383dlproceedingsArticle/Chapter ViewAbstractPublication PagesausdmConference Proceedingsconference-collections
research-article
Free access

Coding of non-stationary sources as a foundation for detecting change points and outliers in binary time-series

Published: 05 December 2012 Publication History

Abstract

An interesting scheme for estimating and adapting distributions in real-time for non-stationary data has recently been the focus of study for several different tasks relating to time series and data mining, namely change point detection, outlier detection and online compression/sequence prediction. An appealing feature is that unlike more sophisticated procedures, it is as fast as the related stationary procedures which are simply modified through discounting or windowing. The discount scheme makes older observations lose their influence on new predictions. The authors of this article recently used a discount scheme for introducing an adaptive version of the Context Tree Weighting compression algorithm. The mentioned change point and outlier detection methods rely on the changing compression ratio of an online compression algorithm. Here we are beginning to provide theoretical foundations for the use of these adaptive estimation procedures that have already shown practical promise.

References

[1]
Barry, D. & Hartigan, J. A. (1993), 'A bayesian analysis for change point problems', Journal of the American Statistical Association 88(421), 309--319.
[2]
Burrows, M. & Wheeler, D. (1994), 'A block-sorting lossless data compression algorithm', Digital SRC Research Report.
[3]
Cleary, J. G. & Witten, I. H. (1984), 'Data compression using adaptive coding and partial string matching', IEEE Transactions on Communications 32, 396--402.
[4]
Cormack, G. V. & Horspool, R. N. S. (1987), 'Data compression using dynamic Markov modelling', The Computer Journal 30(6), 541--550.
[5]
E. S., P. (1955), 'A test for change in a parameter occurring at an unknown point', Biometrika 42.
[6]
Fawcett, T. & Provost, F. (1999), Activity monitoring: noticing interesting changes in behavior, in 'Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining', KDD '99, ACM, New York, NY, USA, pp. 53--62.
[7]
Fu, T. (2011), 'A review on time series data mining', Eng. Appl. Artif. Intell. 24(1), 164--181.
[8]
Guralnik, V. & Srivastava, J. (1999), Event detection from time series data, in 'KDD', pp. 33--42.
[9]
Kawabata, T. & Rashid, M. M. (2003), 'A zero redundancy estimator for the context tree weighting method with a finite window', IEEE International Symposium on Information Theory p. 114.
[10]
Kawahara, Y. & Sugiyama, M. (2012), 'Sequential change-point detection based on direct density-ratio estimation', Stat. Anal. Data Min. 5(2), 114--127.
[11]
Krichevsky, R. E. (1968), 'The connection between the redundancy and reliability of information about the sources', Problems of Information Transmission 4(3), 48--57.
[12]
Krichevsky, R. E. (1970), Lectures on information theory, Technical report, Novosibirsk State University.
[13]
Krichevsky, R. E. (1998), 'Laplace's law of succession and universal encoding', IEEE Transactions on Information Theory 44, 296--303.
[14]
Krichevsky, R. E. & Trofimov, V. K. (1981), 'The performance of universal encoding', IEEE Transactions on Information Theory 27(2), 199--207.
[15]
Lorden, G. (1971), 'Procedures for reacting to a change in distribution', Annals of Mathematical Statistics 42(6), 1897--1908.
[16]
O'Neill, A., Hutter, M., Shao, W. & Sunehag, P. (2012), Adaptive context tree weighting, in '2012 Data Compression Conference, Snowbird, UT, USA, April 10-12, 2012', IEEE Computer Society, pp. 317--326.
[17]
Rissanen, J. J. (1976), 'Generalized Kraft inequality and arithmetic coding', IBM Journal of Research and Devlopment 20(3), 198--203.
[18]
Rissanen, J. J. & Langdon, G. G. (1979), 'Arithmetic coding', IBM Journal of Research and Devlopment 23, 149--162.
[19]
R. P, A. & D. J., M. (2007), 'Bayesian online change-point detection'.
[20]
Takeuchi, J. & Yamanishi, K. (2006), 'A unifying framework for detecting outliers and change points from time series', IEEE Trans. Knowl. Data Eng. 18(4), 482--492.
[21]
Trofimov, V. K. (1974), 'The redundancy of markov source encoding', Problems of Information Transmission 10(4), 16--24.
[22]
Turner, R., Saatci, Y. & Rasmussen, C. E. (2009), Adaptive sequential Bayesian change point detection, in 'Advances in Neural Information Processing Systems (NIPS): Temporal Segmentation Workshop'.
[23]
Willems, F. M. J. (1996), 'Coding for a binary independent piecewise-identically-distributed source', IEEE Transactions on Information Theory 42(6), 2210--2217.
[24]
Willems, F. M. J., Shtarkov, Y. M. & Tjalkens, T. (1995), 'The context tree weighting method: Basic properties', IEEE Transactions on Information Theory 41, 653--664.
[25]
Yamanishi, K. & Takeuchi, J. (2002), A unifying framework for detecting outliers and change points from non-stationary time series data, in 'Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada', ACM, pp. 676--681.
[26]
Zacks, S. (1983), Survey of classical and Bayesian approaches to the change-point problem: Fixed sample and sequential procedures in testing and estimation, Academic Press.
[27]
Zhang, K., Hutter, M. & Jin, W. (2009), A new local distance-based outlier detection approach for scattered real-world data, in 'Proc. 13th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'09)', Vol. 5467 of LNAI, Springer, Bangkok, pp. 813--822.
[28]
Ziv, J. & Lempel, A. (1977), 'A universal algorithm for sequential data compression', IEEE Transactions on Information Theory 23, 337--342.
[29]
Ziv, J. & Lempel, A. (1978), 'Compression of individual sequences via variable-rate coding', IEEE Transactions on Information Theory 24, 530--536.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image DL Hosted proceedings
AusDM '12: Proceedings of the Tenth Australasian Data Mining Conference - Volume 134
December 2012
233 pages
ISBN:9781921770142

Publisher

Australian Computer Society, Inc.

Australia

Publication History

Published: 05 December 2012

Author Tags

  1. change point
  2. compression
  3. detection
  4. non-stationary sources
  5. outlier
  6. time-series

Qualifiers

  • Research-article

Acceptance Rates

AusDM '12 Paper Acceptance Rate 25 of 55 submissions, 45%;
Overall Acceptance Rate 98 of 232 submissions, 42%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 125
    Total Downloads
  • Downloads (Last 12 months)37
  • Downloads (Last 6 weeks)3
Reflects downloads up to 09 Mar 2025

Other Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media