Controlled permutations for testing adaptive learning models

Published: 01 June 2014

Abstract

We study the evaluation of supervised learning models that adapt to a data distribution that changes over time (concept drift). The standard testing procedure, which simulates the online arrival of data (test-then-train), may not be sufficient to generalize about performance. A single such test only shows how well a model adapts to one fixed configuration of changes, whereas the ultimate goal is to assess how it adapts to changes that happen unexpectedly. We propose a methodology for obtaining datasets for multiple tests by permuting the order of the original data. A random permutation is not suitable, because it makes the data distribution uniform over time and thereby destroys the adaptive learning task. Therefore, we propose three controlled permutation techniques that produce new datasets by introducing restricted variations in the order of examples. The control mechanisms come with theoretical guarantees of preserving distributions, which ensure that the new sets represent close variations of the original learning task. Complementary tests on such sets make it possible to analyze how sensitive performance is to variations in how changes happen, and in this way enrich the assessment of adaptive supervised learning models.
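The contrast the abstract draws, a random permutation that erases the drift versus restricted permutations that only perturb the order locally, can be illustrated with a small sketch. Everything in it is an assumption for illustration and not the article's method: `local_shuffle` is a hypothetical block-wise shuffle (each example stays inside a block of 10 positions) standing in for a controlled permutation, `SlidingMajority` is a toy adaptive classifier, and the stream is synthetic data with one abrupt drift. Each permuted copy is evaluated with test-then-train, giving several error estimates instead of a single one.

```python
import random
from collections import Counter, deque


def local_shuffle(n, block, rng):
    """Hypothetical restricted permutation: shuffle indices only within
    consecutive blocks of length `block`, so no example moves more than
    block - 1 positions. Illustrative only, not the article's technique."""
    order = list(range(n))
    for start in range(0, n, block):
        chunk = order[start:start + block]
        rng.shuffle(chunk)
        order[start:start + block] = chunk
    return order


def max_displacement(order):
    """Largest distance any example moved from its original position."""
    return max(abs(new_pos - old) for new_pos, old in enumerate(order))


class SlidingMajority:
    """Toy adaptive classifier (an assumption for illustration): predicts
    the most frequent label among the last `size` training examples."""

    def __init__(self, size):
        self.recent = deque(maxlen=size)

    def predict(self, x):
        if not self.recent:
            return None  # no training data yet, so the first test counts as an error
        return Counter(self.recent).most_common(1)[0][0]

    def learn(self, x, y):
        self.recent.append(y)


def evaluate_test_then_train(stream, model):
    """Test-then-train: predict each example first, then learn from it."""
    errors = 0
    for x, y in stream:
        if model.predict(x) != y:
            errors += 1
        model.learn(x, y)
    return errors / len(stream)


# Hypothetical synthetic stream with one abrupt drift at t = 500:
# P(y = 1) is 0.1 before the drift and 0.9 after it.
data_rng = random.Random(0)
stream = [(t, int(data_rng.random() < (0.1 if t < 500 else 0.9))) for t in range(1000)]

original_err = evaluate_test_then_train(stream, SlidingMajority(25))

# Several restricted variations of the same task: one test per permuted copy.
perm_rng = random.Random(1)
local_errs = []
for _ in range(5):
    order = local_shuffle(len(stream), 10, perm_rng)
    permuted = [stream[i] for i in order]
    local_errs.append(evaluate_test_then_train(permuted, SlidingMajority(25)))

# A fully random permutation spreads both concepts uniformly over time.
shuffled = stream[:]
random.Random(2).shuffle(shuffled)
shuffled_err = evaluate_test_then_train(shuffled, SlidingMajority(25))

print(f"original order: {original_err:.3f}")
print("block-shuffled copies:", [round(e, 3) for e in local_errs])
print(f"fully shuffled: {shuffled_err:.3f}")
```

On a stream like this, the block-shuffled copies should give errors close to the original ordering, while the fully shuffled copy pushes the toy model toward chance level because both concepts are interleaved throughout. In this sketch the block size is the control knob: larger blocks allow more variation between copies, but blocks that straddle the drift point start to blur the change itself.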

Information

Published In

Knowledge and Information Systems, Volume 39, Issue 3
June 2014
243 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Concept drift
  2. Data streams
  3. Evaluation
  4. Permutations

Qualifiers

  • Article

Cited By

  • (2020) Challenges in benchmarking stream learning algorithms with real-world data. Data Mining and Knowledge Discovery 34(6):1805-1858. DOI: 10.1007/s10618-020-00698-5. Online publication date: 7 Jul 2020
  • (2016) Characterizing concept drift. Data Mining and Knowledge Discovery 30(4):964-994. DOI: 10.1007/s10618-015-0448-4. Online publication date: 1 Jul 2016
  • (2015) Adaptive ensembles for evolving data streams. Proceedings of the 4th International Conference on New Frontiers in Mining Complex Patterns, pp 3-16. DOI: 10.5555/3122094.3122096. Online publication date: 7 Sep 2015
  • (2015) Recovery analysis for adaptive learning from non-stationary data streams. Neurocomputing 150(Part A):250-264. DOI: 10.1016/j.neucom.2014.09.076. Online publication date: 20 Feb 2015
  • (2014) Open challenges for data stream mining research. ACM SIGKDD Explorations Newsletter 16(1):1-10. DOI: 10.1145/2674026.2674028. Online publication date: 25 Sep 2014
