An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams

  • Regular Paper
Knowledge and Information Systems

Abstract

Recent advances in storage and processing have made it possible to gather information automatically, which in turn produces fast, continuous flows of data. Data produced and stored in this way are called data streams. Data streams are large and highly dynamic, and they have unique properties that make them suitable for modeling many real-world data mining applications. The main challenge in learning from streaming data is the occurrence of concept drift. In addition, because labeling instances is costly, it is often assumed that only a small fraction of instances are labeled. In this paper, we propose an ensemble algorithm that classifies instances of non-stationary data streams in a semi-supervised environment. The method is also intended to recognize recurring concept drifts in data streams. The algorithm maintains a pool of classifiers, each of which represents a single concept. First, the algorithm classifies a batch of instances. Then some of these instances are labeled, and the partially labeled batch is used to update the classifiers in the pool. This process repeats for consecutive batches of the stream. The main advantage of the algorithm is that it uses unlabeled instances as well as labeled ones in the learning task. Experimental results show that the proposed algorithm is effective compared with state-of-the-art methods in several respects.
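
The classify-then-update loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how classifiers are built, selected, or updated, so everything below is an assumption made for illustration. In particular, the names `ClusterClassifier`, `fit_cluster_classifier`, `ClassifierPool`, the `UNLABELED` marker, the k-means style clustering with farthest-point initialisation, and the `reuse_threshold` rule for recognising a recurring concept are all hypothetical.

```python
import numpy as np

UNLABELED = -1  # assumed marker for instances that have no label


def _nearest(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


class ClusterClassifier:
    """Hypothetical cluster-based classifier: labeled centroids, nearest one wins."""

    def __init__(self, centroids, labels):
        self.centroids = np.asarray(centroids, dtype=float)
        self.labels = np.asarray(labels)

    def predict(self, X):
        return self.labels[_nearest(np.asarray(X, dtype=float), self.centroids)]


def fit_cluster_classifier(X, y, k=4, iters=10):
    """Cluster ALL instances (labeled or not), then name each cluster by the
    majority label of its labeled members; clusters with no labels are dropped."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    if not np.any(y != UNLABELED):
        raise ValueError("need at least one labeled instance")
    # Deterministic farthest-point initialisation, then plain Lloyd iterations.
    centroids = [X[0]]
    while len(centroids) < k:
        gaps = np.linalg.norm(X[:, None, :] - np.array(centroids)[None], axis=2)
        centroids.append(X[gaps.min(axis=1).argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        assign = _nearest(X, centroids)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    assign = _nearest(X, centroids)
    kept, names = [], []
    for j in range(k):
        votes = y[(assign == j) & (y != UNLABELED)]
        if votes.size:
            values, counts = np.unique(votes, return_counts=True)
            kept.append(centroids[j])
            names.append(values[counts.argmax()])
    return ClusterClassifier(kept, names)


class ClassifierPool:
    """Pool with one classifier per concept (selection rule is an assumption)."""

    def __init__(self, reuse_threshold=0.8):
        self.classifiers = []
        self.active = None  # index of the classifier used for prediction
        self.reuse_threshold = reuse_threshold

    def predict(self, X):
        # Step 1: classify the incoming batch with the currently active classifier.
        if self.active is None:
            return np.full(len(X), UNLABELED)
        return self.classifiers[self.active].predict(X)

    def update(self, X, y):
        # Step 2: the batch arrives partially labeled; score every pooled
        # classifier on the labeled part only.
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        labeled = y != UNLABELED
        scores = [np.mean(c.predict(X[labeled]) == y[labeled])
                  for c in self.classifiers]
        if scores and max(scores) >= self.reuse_threshold:
            # A stored concept explains the batch: treat it as recurring.
            self.active = int(np.argmax(scores))
        else:
            # No stored concept fits: learn a new one from the whole batch.
            self.classifiers.append(fit_cluster_classifier(X, y))
            self.active = len(self.classifiers) - 1
```

In use, each batch would first go through `pool.predict(X)` and then, once some labels become available, through `pool.update(X, y)` with `UNLABELED` in the positions that stayed unlabeled. The sketch only reuses or adds classifiers; refining an existing classifier with the new batch is left out for brevity.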

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions which improved the paper.

Author information

Corresponding author

Correspondence to Hamid Beigy.

About this article

Cite this article

Hosseini, M.J., Gholipour, A. & Beigy, H. An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams. Knowl Inf Syst 46, 567–597 (2016). https://doi.org/10.1007/s10115-015-0837-4


  • DOI: https://doi.org/10.1007/s10115-015-0837-4
