Online GBDT with Chunk Dynamic Weighted Majority Learners for Noisy and Drifting Data Streams

Published in: Neural Processing Letters

Abstract

In the field of data mining, data stream mining has become a major research focus, and noise and concept drift are its two principal challenges. In general, sensitive classifiers tend to overfit noisy samples, while robust classifiers tend to ignore concept drift in data streams. This contradiction places high demands on online algorithms, which must respond to both challenges at once. In this paper, a Chunk Dynamic Weighted Majority module, which processes samples chunk by chunk, is combined with the online Gradient Boosting Decision Tree (GBDT) framework to cope with noisy, drifting data streams. The method can discard weak classifiers that no longer fit the current concept distribution and create new weak classifiers to adapt to drift. In addition, a robust Doom2 loss function is developed to address noise sensitivity in stable data streams. Experiments demonstrate that, compared with state-of-the-art online algorithms, the proposed algorithm obtains better results on both noisy stable data streams and drifting data streams.
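To make the two mechanisms in the abstract concrete, the sketch below (Python; an illustration written for this summary, not the authors' implementation) combines a bounded, DOOM II-style loss, 1 − tanh(λ · margin), whose saturation caps the influence of mislabeled samples, with a chunk-level variant of Kolter and Maloof's Dynamic Weighted Majority update: learners that err on a chunk are demoted by a factor β, learners whose weight falls below a threshold θ are pruned, and a new gradient-boosted tree is added whenever the ensemble itself errs. The class name ChunkDWMBoost and all hyper-parameters (beta, theta, lam, max_depth) are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def doom2_loss(margin, lam=1.0):
    """Bounded sigmoidal loss in the spirit of DOOM II: 1 - tanh(lam * m).
    Because the loss saturates, a mislabeled sample contributes at most a
    constant penalty instead of an unbounded one."""
    return 1.0 - np.tanh(lam * margin)


def doom2_grad(margin, lam=1.0):
    """Derivative of the loss w.r.t. the margin; bounded, so noisy samples
    cannot dominate the gradient-boosting step."""
    return -lam * (1.0 - np.tanh(lam * margin) ** 2)


class ChunkDWMBoost:
    """Chunk-by-chunk dynamic-weighted-majority ensemble of small regression
    trees, boosted on the robust loss above (a sketch, not the paper's
    exact algorithm)."""

    def __init__(self, beta=0.5, theta=0.01, lam=1.0, max_depth=3):
        self.beta = beta          # weight decay for learners that err
        self.theta = theta        # pruning threshold on learner weights
        self.lam = lam            # steepness of the robust loss
        self.max_depth = max_depth
        self.learners, self.weights = [], []

    def _score(self, X):
        if not self.learners:
            return np.zeros(len(X))
        preds = np.array([h.predict(X) for h in self.learners])
        return np.average(preds, axis=0, weights=self.weights)

    def predict(self, X):
        return np.where(self._score(X) >= 0.0, 1, -1)

    def partial_fit(self, X, y):
        """Process one chunk; labels y must be in {-1, +1}."""
        # 1. Demote learners that misclassify most of the chunk, then prune
        #    learners whose weight fell below theta (the DWM update).
        for i, h in enumerate(self.learners):
            if np.mean(np.where(h.predict(X) >= 0, 1, -1) != y) > 0.5:
                self.weights[i] *= self.beta
        keep = [i for i, w in enumerate(self.weights) if w >= self.theta]
        self.learners = [self.learners[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]
        # 2. If the ensemble errs on the chunk, add a new learner fit to the
        #    negative gradient of the robust loss (a GBDT-style step).
        margin = y * self._score(X)
        if not self.learners or np.mean(self.predict(X) != y) > 0.0:
            residual = -y * doom2_grad(margin, self.lam)
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            self.learners.append(tree.fit(X, residual))
            self.weights.append(1.0)
        # 3. Renormalize weights so they form a convex combination.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

On a stable but noisy stream, the bounded gradient keeps mislabeled samples from dominating the residuals that each new tree is fit to; under drift, the demote-and-prune step quickly strips weight from learners trained on the old concept while new learners are grown for the current one.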



Author information

Corresponding author

Correspondence to Limin Pan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Luo, S., Zhao, W. & Pan, L. Online GBDT with Chunk Dynamic Weighted Majority Learners for Noisy and Drifting Data Streams. Neural Process Lett 53, 3783–3799 (2021). https://doi.org/10.1007/s11063-021-10565-z

