Abstract
Data stream mining has become a research focus within data mining, and noise and concept drift are its two main challenges. In general, sensitive classifiers tend to over-fit noisy samples, while robust classifiers tend to overlook concept drift in data streams. This contradiction places high demands on online algorithms to handle both challenges. In this paper, a Chunk Dynamic Weighted Majority module, which processes samples chunk by chunk, is combined with an online Gradient Boosting Decision Tree framework to cope with drifting, noisy data streams. The method discards weak classifiers that no longer fit the current concept distribution and creates new weak classifiers to adapt to drifting data streams. In addition, a robust Doom2 loss function is developed to address noise sensitivity in stable data streams. Experimental results demonstrate that, compared with state-of-the-art online algorithms, the proposed algorithm achieves better results on both noisy stable data streams and drifting data streams.
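To make the chunk-by-chunk ensemble update concrete, the following is a minimal, hypothetical sketch of a Dynamic Weighted Majority learner that operates on chunks, in the spirit of the classic DWM rule (Kolter and Maloof) rather than the paper's exact CDWM-GBDT method: experts that err on a chunk are down-weighted, low-weight experts are pruned, and a fresh expert is added when the ensemble itself errs. The `Expert` stub, and the `beta` and `theta` values, are illustrative assumptions standing in for the paper's GBDT weak learners and tuned hyperparameters.

```python
class Expert:
    """Trivial majority-class learner standing in for a GBDT weak learner."""
    def __init__(self):
        self.counts = {}

    def fit_chunk(self, X, y):
        # Incrementally accumulate label counts from the chunk.
        for label in y:
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        if not self.counts:
            return 0
        return max(self.counts, key=self.counts.get)


class ChunkDWM:
    """Chunk-based Dynamic Weighted Majority ensemble (illustrative sketch)."""
    def __init__(self, beta=0.5, theta=0.01):
        self.beta = beta      # weight decay applied to experts wrong on a chunk
        self.theta = theta    # pruning threshold for expert weights
        self.experts = [Expert()]
        self.weights = [1.0]

    def _vote(self, x):
        # Weighted majority vote over all experts.
        scores = {}
        for e, w in zip(self.experts, self.weights):
            p = e.predict(x)
            scores[p] = scores.get(p, 0.0) + w
        return max(scores, key=scores.get)

    def process_chunk(self, X, y):
        # 1) Penalize experts that misclassify most of the chunk.
        for i, e in enumerate(self.experts):
            errors = sum(e.predict(x) != t for x, t in zip(X, y))
            if errors > len(y) / 2:
                self.weights[i] *= self.beta
        # 2) Normalize weights and prune experts below the threshold.
        m = max(self.weights)
        self.weights = [w / m for w in self.weights]
        keep = [i for i, w in enumerate(self.weights) if w >= self.theta]
        self.experts = [self.experts[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]
        # 3) If the ensemble errs on the chunk, add a fresh expert
        #    so the ensemble can adapt to a new concept.
        ens_errors = sum(self._vote(x) != t for x, t in zip(X, y))
        if ens_errors > len(y) / 2:
            self.experts.append(Expert())
            self.weights.append(1.0)
        # 4) Train every surviving expert on the chunk.
        for e in self.experts:
            e.fit_chunk(X, y)
```

Under this update rule, experts trained on an outdated concept lose weight geometrically and are eventually pruned, while newly created experts (trained only on recent chunks) dominate the vote, which is how the ensemble tracks drift.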
Luo, S., Zhao, W. & Pan, L. Online GBDT with Chunk Dynamic Weighted Majority Learners for Noisy and Drifting Data Streams. Neural Process Lett 53, 3783–3799 (2021). https://doi.org/10.1007/s11063-021-10565-z