Revisiting the effect of history on learning performance: the problem of the demanding lord

George Giannakopoulos¹ &
Themis Palpanas²

233 Accesses
Explore all metrics

Abstract

In a variety of settings ranging from recommendation systems to information filtering, approaches which take into account feedback have been introduced to improve services and user experience. However, as also indicated in the machine learning literature, there exist several settings where the requirements and target concept of a learning system changes over time, which consists a case of “concept drift”. In several systems, a sliding window over the training instances has been used to follow drifting concepts. However, no general analytic study has been performed on the relation between the size of the sliding window and the average performance of a learning system, since previous works have focused on instantaneous performance and specific underlying learners and data characteristics. This work proposes an analytic model that describes the effect of memory window size on the prediction performance of a learning system that is based on iterative feedback. The analysis considers target concepts changing over time, either periodically or randomly, using a formulation termed “the problem of the demanding lord”. Using a signal-to-noise approach to sketch learning ability of underlying machine learning algorithms, we estimate the average performance of a learning system regardless of its underlying algorithm and, as a corollary, propose a stepping stone toward finding the memory window that maximizes the average performance for a given drift setting and a specific modeling system. We experimentally support the proposed methodology with very promising results on three synthetic and four real datasets, using a variety of learning algorithms including Support Vector Machines, Naive Bayes, Nearest Neighbor and Decision Trees on classification and regression tasks. The results validate the analysis and indicate very good estimation performance in different settings.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

On Dynamic Feature Weighting for Feature Drifting Data Streams

Characterizing concept drift

Article 15 April 2016

A Simple Unlearning Framework for Online Learning Under Concept Drifts

Notes

A preliminary version of this work can be found in Giannakopoulos and Palpanas [19].
We also applied nonlinear regression alternatives for the search, but the methods failed to converge in some cases and, thus, we were forced to try genetic algorithms.
The results of the SEA dataset are similar to those of the STAGGER dataset.

References

Ahmed A, Low, Y, Aly M, Josifovski V, Smola A (2011) Scalable distributed inference of dynamic user interests for behavioral targeting. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 114–122
Angluin D (1988) Queries and concept learning. Mach Learn 2(4):319–342
Google Scholar
Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavaldà, R, Morales-Bueno R (2006) Early drift detection method. In: ECML PKDD 2006 workshop on knowledge discovery from data streams
Bouquet P, Stoermer H, Bazzanella B (2008). An entity name system (ENS) for the semantic web. In: European semantic web conference, pp 258–272
Cleveland WS (1981) LOWESS: a program for smoothing scatterplots by robust locally weighted regression. Am Stat 35:54
Article Google Scholar
Crabtree BI, Soltysiak S (1998) Identifying and tracking changing interests. Int J Digit Libr 2(1):38–53
Article Google Scholar
Dashnyam B, Liu Y-C, Hsu P-Y, Tsai Y-T (2011) Real time prediction of closing price and duration of b2b reverse auctions. Knowl Inf Syst. http://www.springerlink.com/content/p302748t360w4047/
Dietterich T (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1923
Article Google Scholar
Domingos P, Hulten G, (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 71–80
Fan R, Chang K, Hsieh C, Wang X, Lin C (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874
MATH Google Scholar
Fdez-Riverola F, Iglesias E, Díaz F, Méndez J, Corchado J (2007) Applying lazy learning algorithms to tackle concept drift in spam filtering. Expert Syst Appl 33(1):36–48
Article Google Scholar
Fidalgo-Merino R, Nunez M (2011) Self-adaptive induction of regression trees. IEEE Trans Pattern Anal Mach Intell 33(8):1659–1672
Article Google Scholar
Freund Y, Schapire R (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296
Article MATH Google Scholar
Fukunaga K, Hayes R (1989a) Effects of sample size in classifier design. IEEE Trans Pattern Anal Mach Intell 11(8):873–885
Article Google Scholar
Fukunaga K, Hayes R (1989b) Estimation of classifier performance. IEEE Trans Pattern Anal Mach Intell 11(10):1087–1101
Article Google Scholar
Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Lecture notes in computer science, pp 286–295
Giannakopoulos G, Palpanas T (2009) Adaptivity in entity subscription services. In: Proceedings of ADAPTIVE2009, Athens, Greece
Giannakopoulos G, Palpanas T (2010a) Content and type as orthogonal modeling features: a study on user interest awareness in entity subscription services. Int J Adv Netw Serv 3(2). http://www.iit.demokritos.gr/ggianna/Publications/adaptiveSubscription.pdf
Giannakopoulos G, Palpanas T (2010b) The effect of history on modeling systems’ performance. In: The problem of the demanding lord, ICDM 2010, IEEE. http://www.iit.demokritos.gr/ggianna/Publications/ICDM2010.pdf
Goldberg D (1989) Genetic algorithms in search and optimization. Addison-Wesley, Boston
MATH Google Scholar
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I (2009) The weka data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18
Article Google Scholar
Harries M (1999) Splice-2 comparative evaluation: electricity pricing. Technical report, University of South Wales
Harter H, Khamis H, Lamb R (1984) Modified Kolmogorov-Smirnov tests of goodness of fit. Commun Stat Simul Comput 13(3):293–323
Article Google Scholar
Helmbold DP, Long PM, (1994) Tracking drifting concepts by minimizing disagreements. Mach Learn 14(1):27–45. http://dx.doi.org/10.1007/BF00993161
Google Scholar
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 97–106
Ifo Institute (2010) Ifo business survey—the ifo business climate for Germany since January 1991. http://www.cesifo-group.de/portal/page/portal/ifoHome/a-winfo/d6zeitreihen/15reihen/_reihenkt
Ikonomovska E, Gama J, Deroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Discov 23(1):128–168
Article MathSciNet MATH Google Scholar
Ikonomovska E, Gama J, Sebastião R, Gjorgjevik D (2009) Regression trees from data streams with drift detection. Discov Sci 5808:121–135
Google Scholar
Joachims T (2000) Estimating the generalization performance of an SVM efficiently. In: ICML ’00 Proceedings of the seventeenth international conference on machine learning, pp 431–438
Kiefer J (1953) Sequential minimax search for a maximum. Proc Am Math Soc 4(3):502–506
Article MathSciNet MATH Google Scholar
Klinkenberg R, Renz I, (1998). Adaptive information filtering: learning in the presence of concept drifts. In: Learning for text categorization, pp 33–40
Kohavi R (1995) The power of decision tables. Mach Learn ECML-95 912:174–189
Article Google Scholar
Koychev I, Lothian R, (2005) Tracking drifting concepts by time window optimisation. In: Proceedings of AI-2005, the twenty-fifth SGAI international conference on innovative techniques and applications of artificial intelligence. Citeseer, pp 46–59
Koychev I, Schwab I (2000) Adaptation to drifting user’s interests. In: Proceedings of ECML2000 workshop: machine learning in new unformation age Citeseer, pp 39–45
Kuh A, Petsche T, Rivest R (1990) Learning time-varying concepts. In: Proceedings of the 1990 conference on advances in neural information processing systems 3, Morgan Kaufmann, Burlington, p 189
Kuncheva L, Žliobait${\dot{\text{ e}}}$ I (2009) On the window size for classification in changing environments. Intell Data Anal 13(6):861–872
Google Scholar
Lam W, Mukhopadhyay S, Mostafa J, Palakal M, (1996) Detection of shifts in user interests for personalized information filtering. In: Proceedings of the 19th annual international ACM SIGIR conference on research and development in, information retrieval, p 325
Lazarescu M, Venkatesh S, Bui H (2004) Using multiple windows to track concept drift. Intell Data Anal 8(1):29–59
Google Scholar
Liu W, Wang T (2011) Online active multi-field learning for efficient email spam filtering. Knowl Inf Syst. http://www.springerlink.com/content/173321g2qg2u0q61/
Maloof M, Michalski R (1999) Selecting examples for partial memory learning. Mach Learn 41(1):27–52
Article Google Scholar
Maloof M, Michalski R (2004) Incremental learning with partial instance memory. Artif Intell 154(1–2): 95–126
Article MathSciNet MATH Google Scholar
Massey F Jr (1951) The Kolmogorov-Smirnov test for goodness of fit. J Am Stat Assoc 46(253):68–78
Article MATH Google Scholar
Mitchell T, Caruana R, Freitag D, McDermott J, Zabowski D (1994) Experience with a learning personal assistant. Commun ACM 37(7):80–91
Article Google Scholar
Musavi M, Chan K, Hummels D, Kalantri K (1994) On the generalization ability of neural network classifiers. IEEE Trans Pattern Anal Mach Intell 16(6):659–663
Article Google Scholar
Núñez M, Fidalgo R, Morales R (2007) Learning in environments with unknown dynamics: towards more robust concept learners. J Mach Learn Res 8:2595–2628
MathSciNet MATH Google Scholar
Palpanas T, Chaudhry JA, Andritsos P, Velegrakis Y (2008) Entity data management in OKKAM DEXA workshops, pp 729–733
Patist J (2007) Optimal window change detection. In: Data mining workshops, 2007. ICDM workshops 2007, IEEE, pp 557–562
Ramon J, Driessens K, Croonenborghs T (2007) Transfer learning in reinforcement learning problems through partial policy recycling. Mach Learn ECML 2007 4701:699–707
Article Google Scholar
Reinke R, Michalski R (1988) Incremental learning of concept descriptions: a method and experimental results. Mach Intell 11:263–288
Google Scholar
Schlimmer J, Granger R (1986) Incremental learning from noisy data. Mach Learn 1(3):317–354
Google Scholar
Scholz M, Klinkenberg R (2007) Boosting classifiers for drifting concepts. Intell Data Anal 11(1):3–28
Google Scholar
Stanley K (2001) Learning concept drift with a committee of decision trees, Technical report. Computer Science Department, University of Texas, Austin
Street WN, Kim YS (2001) A streaming ensemble algorithm (sea) for large-scale classification. In: 7th ACM SIGKDD international conference on knowledge discovery and data mining, pp 377–382
Tsymbal A (2004) The problem of concept drift: definitions and related work, technical report
Vapnik V (1999) The nature of statistical learning theory. Springer, New York
Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: 9th ACM SIGKDD international conference on knowledge discovery and data mining. KDD 03, ACM, pp 226–235. ACM ID: 956778
Wang Y, Witten I (1996) Induction of model trees for predicting continuous classes. In: Poster articles of the 9th European conference on machine learning
Webb GI, Pazzani MJ, Billsus D (2001) Machine learning for user modeling. User Model User-Adapt Interact 11(1):19–29
Article MATH Google Scholar
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101
Google Scholar
Widyantoro D, Ioerger T, Yen J (1999) An adaptive algorithm for learning changes in user interests. In: Proceedings of the eighth international conference on information and knowledge management. ACM New York, pp 405–412
Zhang P, Gao BJ, Zhu X, Guo L (2011) Enabling fast lazy learning for data streams. In: Proceedings of ICDM 2011
Zliobaite I (2010) Change with delayed labeling: when is it detectable? In: Data mining workshops (ICDMW), 2010 IEEE international conference on IEEE, pp 843–850

Download references

Acknowledgments

The authors would also like to thank Prof. Alberto Valli of the University of Trento for useful advice and insight on the estimation of the sigmoid function. This work was partially supported by the FP7 EU Large-scale Integrating Project OKKAM “Enabling a Web of Entities” (contract no. ICT-215032). See http://www.okkam.org.

Author information

Authors and Affiliations

Institute of Informatics and Telecommunications, NCSR Demokritos, 15310, Aghia Paraskevi, Greece
George Giannakopoulos
DISI, University of Trento, Trento, Italy
Themis Palpanas

Authors

George Giannakopoulos
View author publications
You can also search for this author in PubMed Google Scholar
Themis Palpanas
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to George Giannakopoulos.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Giannakopoulos, G., Palpanas, T. Revisiting the effect of history on learning performance: the problem of the demanding lord. Knowl Inf Syst 36, 653–691 (2013). https://doi.org/10.1007/s10115-012-0568-8

Download citation

Received: 13 August 2011
Revised: 30 July 2012
Accepted: 22 September 2012
Published: 13 October 2012
Issue Date: September 2013
DOI: https://doi.org/10.1007/s10115-012-0568-8

Revisiting the effect of history on learning performance: the problem of the demanding lord

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

On Dynamic Feature Weighting for Feature Drifting Data Streams

Characterizing concept drift

A Simple Unlearning Framework for Online Learning Under Concept Drifts

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Revisiting the effect of history on learning performance: the problem of the demanding lord

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

On Dynamic Feature Weighting for Feature Drifting Data Streams

Characterizing concept drift

A Simple Unlearning Framework for Online Learning Under Concept Drifts

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation