Abstract
Few online classification algorithms based on traditional inductive ensembling handle concept-drifting data streams while also performing well on noisy data. Motivated by this gap, this paper proposes EDTC, an incremental algorithm based on random Ensemble Decision Trees for Concept-drifting data streams. Three variants of random feature selection are developed to implement split-tests. To better track concept drifts in noisy data streams, an improved two-threshold-based drift detection mechanism is introduced. Extensive studies demonstrate that the algorithm performs very well compared with several known online algorithms based on single models and on ensemble models. We therefore conclude that EDTC provides multiple solutions for learning from noisy concept-drifting data streams.
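The abstract does not spell out the two-threshold mechanism, so the following is only a minimal sketch of the general idea, not the EDTC detector itself. It assumes a common error-rate-based scheme with a lower warning level and a higher drift level. The class name, the factors 2.0 and 3.0, and the `min_samples` warm-up are all hypothetical choices made for illustration.

```python
import math


class TwoThresholdDriftDetector:
    """Illustrative two-threshold detector over a stream of 0/1 prediction errors.

    Hypothetical sketch: the names and default values are assumptions,
    not taken from the paper.
    """

    def __init__(self, warn_factor=2.0, drift_factor=3.0, min_samples=30):
        self.warn_factor = warn_factor    # lower threshold: possible drift or noise
        self.drift_factor = drift_factor  # upper threshold: drift is signalled
        self.min_samples = min_samples    # warm-up before any decision is made
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.best_p = float("inf")  # error rate at the best point seen so far
        self.best_s = float("inf")  # its standard deviation

    def update(self, error):
        """Consume one outcome (1 = misclassified, 0 = correct).

        Returns 'stable', 'warning', or 'drift'.
        """
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the lowest p + s observed since the last reset.
        if p + s < self.best_p + self.best_s:
            self.best_p, self.best_s = p, s
        if p + s > self.best_p + self.drift_factor * self.best_s:
            self.reset()  # start estimating the new concept from scratch
            return "drift"
        if p + s > self.best_p + self.warn_factor * self.best_s:
            return "warning"
        return "stable"
```

In a sketch like this, the gap between the two thresholds is what gives some tolerance to noise: a short burst of errors may raise a warning without triggering the drift response, whereas a sustained rise in the error rate crosses the second threshold.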
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, P., Wu, X., Liang, Q., Hu, X., Zhang, Y. (2011). Random Ensemble Decision Trees for Learning Concept-Drifting Data Streams. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_26
DOI: https://doi.org/10.1007/978-3-642-20841-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20840-9
Online ISBN: 978-3-642-20841-6
eBook Packages: Computer Science, Computer Science (R0)