Abstract
In a variety of settings ranging from recommendation systems to information filtering, approaches which take into account feedback have been introduced to improve services and user experience. However, as also indicated in the machine learning literature, there exist several settings where the requirements and target concept of a learning system changes over time, which consists a case of “concept drift”. In several systems, a sliding window over the training instances has been used to follow drifting concepts. However, no general analytic study has been performed on the relation between the size of the sliding window and the average performance of a learning system, since previous works have focused on instantaneous performance and specific underlying learners and data characteristics. This work proposes an analytic model that describes the effect of memory window size on the prediction performance of a learning system that is based on iterative feedback. The analysis considers target concepts changing over time, either periodically or randomly, using a formulation termed “the problem of the demanding lord”. Using a signal-to-noise approach to sketch learning ability of underlying machine learning algorithms, we estimate the average performance of a learning system regardless of its underlying algorithm and, as a corollary, propose a stepping stone toward finding the memory window that maximizes the average performance for a given drift setting and a specific modeling system. We experimentally support the proposed methodology with very promising results on three synthetic and four real datasets, using a variety of learning algorithms including Support Vector Machines, Naive Bayes, Nearest Neighbor and Decision Trees on classification and regression tasks. The results validate the analysis and indicate very good estimation performance in different settings.
Similar content being viewed by others
Notes
A preliminary version of this work can be found in Giannakopoulos and Palpanas [19].
We also applied nonlinear regression alternatives for the search, but the methods failed to converge in some cases and, thus, we were forced to try genetic algorithms.
The results of the SEA dataset are similar to those of the STAGGER dataset.
References
Ahmed A, Low, Y, Aly M, Josifovski V, Smola A (2011) Scalable distributed inference of dynamic user interests for behavioral targeting. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 114–122
Angluin D (1988) Queries and concept learning. Mach Learn 2(4):319–342
Baena-García M, del Campo-Ávila J, Fidalgo R, Bifet A, Gavaldà, R, Morales-Bueno R (2006) Early drift detection method. In: ECML PKDD 2006 workshop on knowledge discovery from data streams
Bouquet P, Stoermer H, Bazzanella B (2008). An entity name system (ENS) for the semantic web. In: European semantic web conference, pp 258–272
Cleveland WS (1981) LOWESS: a program for smoothing scatterplots by robust locally weighted regression. Am Stat 35:54
Crabtree BI, Soltysiak S (1998) Identifying and tracking changing interests. Int J Digit Libr 2(1):38–53
Dashnyam B, Liu Y-C, Hsu P-Y, Tsai Y-T (2011) Real time prediction of closing price and duration of b2b reverse auctions. Knowl Inf Syst. http://www.springerlink.com/content/p302748t360w4047/
Dietterich T (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895–1923
Domingos P, Hulten G, (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 71–80
Fan R, Chang K, Hsieh C, Wang X, Lin C (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874
Fdez-Riverola F, Iglesias E, Díaz F, Méndez J, Corchado J (2007) Applying lazy learning algorithms to tackle concept drift in spam filtering. Expert Syst Appl 33(1):36–48
Fidalgo-Merino R, Nunez M (2011) Self-adaptive induction of regression trees. IEEE Trans Pattern Anal Mach Intell 33(8):1659–1672
Freund Y, Schapire R (1999) Large margin classification using the perceptron algorithm. Mach Learn 37(3):277–296
Fukunaga K, Hayes R (1989a) Effects of sample size in classifier design. IEEE Trans Pattern Anal Mach Intell 11(8):873–885
Fukunaga K, Hayes R (1989b) Estimation of classifier performance. IEEE Trans Pattern Anal Mach Intell 11(10):1087–1101
Gama J, Medas P, Castillo G, Rodrigues P (2004) Learning with drift detection. In: Lecture notes in computer science, pp 286–295
Giannakopoulos G, Palpanas T (2009) Adaptivity in entity subscription services. In: Proceedings of ADAPTIVE2009, Athens, Greece
Giannakopoulos G, Palpanas T (2010a) Content and type as orthogonal modeling features: a study on user interest awareness in entity subscription services. Int J Adv Netw Serv 3(2). http://www.iit.demokritos.gr/ggianna/Publications/adaptiveSubscription.pdf
Giannakopoulos G, Palpanas T (2010b) The effect of history on modeling systems’ performance. In: The problem of the demanding lord, ICDM 2010, IEEE. http://www.iit.demokritos.gr/ggianna/Publications/ICDM2010.pdf
Goldberg D (1989) Genetic algorithms in search and optimization. Addison-Wesley, Boston
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I (2009) The weka data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18
Harries M (1999) Splice-2 comparative evaluation: electricity pricing. Technical report, University of South Wales
Harter H, Khamis H, Lamb R (1984) Modified Kolmogorov-Smirnov tests of goodness of fit. Commun Stat Simul Comput 13(3):293–323
Helmbold DP, Long PM, (1994) Tracking drifting concepts by minimizing disagreements. Mach Learn 14(1):27–45. http://dx.doi.org/10.1007/BF00993161
Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 97–106
Ifo Institute (2010) Ifo business survey—the ifo business climate for Germany since January 1991. http://www.cesifo-group.de/portal/page/portal/ifoHome/a-winfo/d6zeitreihen/15reihen/_reihenkt
Ikonomovska E, Gama J, Deroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Discov 23(1):128–168
Ikonomovska E, Gama J, Sebastião R, Gjorgjevik D (2009) Regression trees from data streams with drift detection. Discov Sci 5808:121–135
Joachims T (2000) Estimating the generalization performance of an SVM efficiently. In: ICML ’00 Proceedings of the seventeenth international conference on machine learning, pp 431–438
Kiefer J (1953) Sequential minimax search for a maximum. Proc Am Math Soc 4(3):502–506
Klinkenberg R, Renz I, (1998). Adaptive information filtering: learning in the presence of concept drifts. In: Learning for text categorization, pp 33–40
Kohavi R (1995) The power of decision tables. Mach Learn ECML-95 912:174–189
Koychev I, Lothian R, (2005) Tracking drifting concepts by time window optimisation. In: Proceedings of AI-2005, the twenty-fifth SGAI international conference on innovative techniques and applications of artificial intelligence. Citeseer, pp 46–59
Koychev I, Schwab I (2000) Adaptation to drifting user’s interests. In: Proceedings of ECML2000 workshop: machine learning in new unformation age Citeseer, pp 39–45
Kuh A, Petsche T, Rivest R (1990) Learning time-varying concepts. In: Proceedings of the 1990 conference on advances in neural information processing systems 3, Morgan Kaufmann, Burlington, p 189
Kuncheva L, Žliobait\({\dot{\text{ e}}}\) I (2009) On the window size for classification in changing environments. Intell Data Anal 13(6):861–872
Lam W, Mukhopadhyay S, Mostafa J, Palakal M, (1996) Detection of shifts in user interests for personalized information filtering. In: Proceedings of the 19th annual international ACM SIGIR conference on research and development in, information retrieval, p 325
Lazarescu M, Venkatesh S, Bui H (2004) Using multiple windows to track concept drift. Intell Data Anal 8(1):29–59
Liu W, Wang T (2011) Online active multi-field learning for efficient email spam filtering. Knowl Inf Syst. http://www.springerlink.com/content/173321g2qg2u0q61/
Maloof M, Michalski R (1999) Selecting examples for partial memory learning. Mach Learn 41(1):27–52
Maloof M, Michalski R (2004) Incremental learning with partial instance memory. Artif Intell 154(1–2): 95–126
Massey F Jr (1951) The Kolmogorov-Smirnov test for goodness of fit. J Am Stat Assoc 46(253):68–78
Mitchell T, Caruana R, Freitag D, McDermott J, Zabowski D (1994) Experience with a learning personal assistant. Commun ACM 37(7):80–91
Musavi M, Chan K, Hummels D, Kalantri K (1994) On the generalization ability of neural network classifiers. IEEE Trans Pattern Anal Mach Intell 16(6):659–663
Núñez M, Fidalgo R, Morales R (2007) Learning in environments with unknown dynamics: towards more robust concept learners. J Mach Learn Res 8:2595–2628
Palpanas T, Chaudhry JA, Andritsos P, Velegrakis Y (2008) Entity data management in OKKAM DEXA workshops, pp 729–733
Patist J (2007) Optimal window change detection. In: Data mining workshops, 2007. ICDM workshops 2007, IEEE, pp 557–562
Ramon J, Driessens K, Croonenborghs T (2007) Transfer learning in reinforcement learning problems through partial policy recycling. Mach Learn ECML 2007 4701:699–707
Reinke R, Michalski R (1988) Incremental learning of concept descriptions: a method and experimental results. Mach Intell 11:263–288
Schlimmer J, Granger R (1986) Incremental learning from noisy data. Mach Learn 1(3):317–354
Scholz M, Klinkenberg R (2007) Boosting classifiers for drifting concepts. Intell Data Anal 11(1):3–28
Stanley K (2001) Learning concept drift with a committee of decision trees, Technical report. Computer Science Department, University of Texas, Austin
Street WN, Kim YS (2001) A streaming ensemble algorithm (sea) for large-scale classification. In: 7th ACM SIGKDD international conference on knowledge discovery and data mining, pp 377–382
Tsymbal A (2004) The problem of concept drift: definitions and related work, technical report
Vapnik V (1999) The nature of statistical learning theory. Springer, New York
Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: 9th ACM SIGKDD international conference on knowledge discovery and data mining. KDD 03, ACM, pp 226–235. ACM ID: 956778
Wang Y, Witten I (1996) Induction of model trees for predicting continuous classes. In: Poster articles of the 9th European conference on machine learning
Webb GI, Pazzani MJ, Billsus D (2001) Machine learning for user modeling. User Model User-Adapt Interact 11(1):19–29
Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101
Widyantoro D, Ioerger T, Yen J (1999) An adaptive algorithm for learning changes in user interests. In: Proceedings of the eighth international conference on information and knowledge management. ACM New York, pp 405–412
Zhang P, Gao BJ, Zhu X, Guo L (2011) Enabling fast lazy learning for data streams. In: Proceedings of ICDM 2011
Zliobaite I (2010) Change with delayed labeling: when is it detectable? In: Data mining workshops (ICDMW), 2010 IEEE international conference on IEEE, pp 843–850
Acknowledgments
The authors would also like to thank Prof. Alberto Valli of the University of Trento for useful advice and insight on the estimation of the sigmoid function. This work was partially supported by the FP7 EU Large-scale Integrating Project OKKAM “Enabling a Web of Entities” (contract no. ICT-215032). See http://www.okkam.org.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Giannakopoulos, G., Palpanas, T. Revisiting the effect of history on learning performance: the problem of the demanding lord. Knowl Inf Syst 36, 653–691 (2013). https://doi.org/10.1007/s10115-012-0568-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-012-0568-8