Abstract
We study the evaluation of supervised learning models that adapt to a changing data distribution over time (concept drift). The standard testing procedure, which simulates the online arrival of data (test-then-train), may not be sufficient to generalize about performance: a single test shows only how well a model adapts to one fixed configuration of changes, whereas the ultimate goal is to assess adaptation to changes that happen unexpectedly. We propose a methodology for obtaining datasets for multiple tests by permuting the order of the original data. A random permutation is not suitable, as it makes the data distribution uniform over time and destroys the adaptive learning task. We therefore propose three controlled permutation techniques that produce new datasets by introducing restricted variations in the order of examples. Control mechanisms with theoretical guarantees of preserving distributions ensure that the new sets represent close variations of the original learning task. Complementary tests on these sets allow analyzing the sensitivity of performance to variations in how changes happen and thereby enrich the assessment of adaptive supervised learning models.
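For illustration, here is a minimal Python sketch of the two ingredients the abstract mentions: a test-then-train (prequential) evaluation loop and a controlled permutation that preserves local order. The `block_permutation` function is a generic block-shuffling illustration, not one of the three techniques proposed in the paper, and the `model` interface (`predict`/`learn`) is likewise an assumption made for the sketch.

```python
import random

def test_then_train(model, stream):
    # Prequential (test-then-train) evaluation: each example is first
    # used to test the current model, then to update (train) it.
    correct = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        model.learn(x, y)
    return correct / len(stream)

def block_permutation(data, block_size, seed=0):
    # Split the sequence into consecutive blocks and shuffle the blocks,
    # keeping within-block order intact. Unlike a fully random permutation,
    # this preserves local distributions over time, so the adaptive
    # learning task is not destroyed.
    rng = random.Random(seed)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    rng.shuffle(blocks)
    return [ex for block in blocks for ex in block]
```

Running `test_then_train` on several such permuted copies of a dataset yields a distribution of accuracies rather than a single number, which is the kind of sensitivity analysis the abstract describes.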
Notes
Available at https://sites.google.com/site/zliobaite/permutations
Acknowledgments
The research leading to these results has received funding from the European Commission within the Marie Curie Industry and Academia Partnerships and Pathways (IAPP) programme under grant agreement no. 251617.
Cite this article
Žliobaitė, I. Controlled permutations for testing adaptive learning models. Knowl Inf Syst 39, 565–578 (2014). https://doi.org/10.1007/s10115-013-0629-7