An Overview of Concept Drift Applications

Big Data Analysis: New Algorithms for a New Society

Part of the book series: Studies in Big Data (SBD, volume 16)

Abstract

In most challenging data analysis applications, data evolve over time and must be analyzed in near real time. Patterns and relations in such data often change as well, so models built to analyze them quickly become obsolete. In machine learning and data mining this phenomenon is referred to as concept drift. The objective is to deploy models that diagnose themselves and adapt to changing data. This chapter provides an application-oriented view of concept drift research, with a focus on supervised learning tasks. First, we overview and categorize application tasks for which the problem of concept drift is particularly relevant. Then we construct a reference framework for positioning application tasks within a spectrum of problems related to concept drift. Finally, we discuss some promising research directions from the application perspective and present recommendations for application-driven concept drift research and development.

We dedicate this chapter to Dr. Alexey Tsymbal, who passed away suddenly and unexpectedly in November 2014 at the age of 39. Alexey contributed to the progress of data mining and medical informatics on several topics, including notable work on handling concept drift.

Acknowledgments

This work was partially supported by the European Commission through the project MAESTRA (Grant number ICT-2013-612944).

Author information

Corresponding author

Correspondence to Indrė Žliobaitė.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Žliobaitė, I., Pechenizkiy, M., Gama, J. (2016). An Overview of Concept Drift Applications. In: Japkowicz, N., Stefanowski, J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_4

  • DOI: https://doi.org/10.1007/978-3-319-26989-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26987-0

  • Online ISBN: 978-3-319-26989-4

  • eBook Packages: Engineering, Engineering (R0)
