Controlled permutations for testing adaptive learning models

Author: Indrė Žliobaitė

Knowledge and Information Systems, Volume 39, Issue 3

Pages 565 - 578

https://doi.org/10.1007/s10115-013-0629-7

Published: 01 June 2014

Abstract

We study the evaluation of supervised learning models that adapt to a data distribution that changes over time (concept drift). The standard testing procedure that simulates the online arrival of data (test-then-train) may not be sufficient to generalize about performance: a single test shows only how well a model adapts to one fixed configuration of changes, while the ultimate goal is to assess adaptation to changes that happen unexpectedly. We propose a methodology for obtaining datasets for multiple tests by permuting the order of the original data. A random permutation is not suitable, as it makes the data distribution uniform over time and destroys the adaptive learning task. Therefore, we propose three controlled permutation techniques that make it possible to acquire new datasets by introducing restricted variations in the order of examples. The control mechanisms, which come with theoretical guarantees of preserving distributions, ensure that the new sets represent close variations of the original learning task. Complementary tests on such sets make it possible to analyze how sensitive performance is to variations in how changes happen, and in this way enrich the assessment of adaptive supervised learning models.

Information

Published In

Knowledge and Information Systems, Volume 39, Issue 3

June 2014

243 pages

ISSN: 0219-1377

Copyright © 2014 Springer-Verlag London.

Publisher

Springer-Verlag

Berlin, Heidelberg
