Abstract
Data stream mining has become a research focus within data mining, and noise and concept drift are its two main challenges. In general, sensitive classifiers tend to over-fit noisy samples, while robust classifiers tend to overlook concept drift in data streams. This contradiction places high demands on online algorithms to handle both challenges. In this paper, a Chunk Dynamic Weighted Majority module, which processes samples chunk by chunk, is combined with an online Gradient Boosting Decision Tree framework to cope with drifting, noisy data streams. The method discards weak classifiers that no longer fit the current concept distribution and creates new weak classifiers to adapt to drifting data streams. In addition, a robust Doom2 loss function is developed to address noise sensitivity in stable data streams. Experimental results demonstrate that, compared with state-of-the-art online algorithms, the proposed algorithm achieves better results on both noisy stable data streams and drifting data streams.
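To make the chunk-by-chunk ensemble update concrete, the following is a minimal, hypothetical sketch of a Dynamic Weighted Majority learner that operates on chunks, in the spirit of the classic DWM rule (Kolter and Maloof) rather than the paper's exact CDWM-GBDT method: experts that err on a chunk are down-weighted, low-weight experts are pruned, and a fresh expert is added when the ensemble itself errs. The `Expert` stub, and the `beta` and `theta` values, are illustrative assumptions standing in for the paper's GBDT weak learners and tuned hyperparameters.

```python
class Expert:
    """Trivial majority-class learner standing in for a GBDT weak learner."""
    def __init__(self):
        self.counts = {}

    def fit_chunk(self, X, y):
        # Incrementally accumulate label counts from the chunk.
        for label in y:
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        if not self.counts:
            return 0
        return max(self.counts, key=self.counts.get)


class ChunkDWM:
    """Chunk-based Dynamic Weighted Majority ensemble (illustrative sketch)."""
    def __init__(self, beta=0.5, theta=0.01):
        self.beta = beta      # weight decay applied to experts wrong on a chunk
        self.theta = theta    # pruning threshold for expert weights
        self.experts = [Expert()]
        self.weights = [1.0]

    def _vote(self, x):
        # Weighted majority vote over all experts.
        scores = {}
        for e, w in zip(self.experts, self.weights):
            p = e.predict(x)
            scores[p] = scores.get(p, 0.0) + w
        return max(scores, key=scores.get)

    def process_chunk(self, X, y):
        # 1) Penalize experts that misclassify most of the chunk.
        for i, e in enumerate(self.experts):
            errors = sum(e.predict(x) != t for x, t in zip(X, y))
            if errors > len(y) / 2:
                self.weights[i] *= self.beta
        # 2) Normalize weights and prune experts below the threshold.
        m = max(self.weights)
        self.weights = [w / m for w in self.weights]
        keep = [i for i, w in enumerate(self.weights) if w >= self.theta]
        self.experts = [self.experts[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]
        # 3) If the ensemble errs on the chunk, add a fresh expert
        #    so the ensemble can adapt to a new concept.
        ens_errors = sum(self._vote(x) != t for x, t in zip(X, y))
        if ens_errors > len(y) / 2:
            self.experts.append(Expert())
            self.weights.append(1.0)
        # 4) Train every surviving expert on the chunk.
        for e in self.experts:
            e.fit_chunk(X, y)
```

Under this update rule, experts trained on an outdated concept lose weight geometrically and are eventually pruned, while newly created experts (trained only on recent chunks) dominate the vote, which is how the ensemble tracks drift.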
Luo, S., Zhao, W. & Pan, L. Online GBDT with Chunk Dynamic Weighted Majority Learners for Noisy and Drifting Data Streams. Neural Process Lett 53, 3783–3799 (2021). https://doi.org/10.1007/s11063-021-10565-z