research-article
DOI:10.1145/1557019.1557041

New ensemble methods for evolving data streams

Published: 28 June 2009

Abstract

Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is, when concepts drift or change completely, is becoming one of the core issues. When tackling non-stationary concepts, ensembles of classifiers have several advantages over single-classifier methods: they are easy to scale and parallelize, they can adapt to change quickly by pruning under-performing parts of the ensemble, and they therefore usually also generate more accurate concept descriptions. This paper proposes a new experimental data stream framework for studying concept drift, and two new variants of Bagging: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. Using the new experimental framework, an evaluation study on synthetic and real-world datasets comprising up to ten million examples shows that the new ensemble methods perform very well compared to several known methods.
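
To make the idea of "pruning under-performing parts of the ensemble" concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes online bagging in the style of Oza and Russell, where each ensemble member sees every example with a Poisson(1) weight, and it swaps in a crude stand-in for change detection: a fixed window of recent errors whose newer half is compared with its older half. The paper's ADWIN Bagging uses the ADWIN detector instead, and its base learners are Hoeffding trees rather than the toy majority-class learner used here. All class names, parameters, and thresholds (MajorityClass, PruningOnlineBagging, window, threshold) are hypothetical.

```python
import math
import random
from collections import Counter, deque


def poisson1(rng):
    """Draw from Poisson(lambda=1) using Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class MajorityClass:
    """Toy base learner (assumption): predicts the label seen most often."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y, weight=1):
        self.counts[y] += weight  # x is ignored by this toy learner

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class PruningOnlineBagging:
    """Online bagging that resets its worst member when errors jump."""

    def __init__(self, n_members=5, window=40, threshold=0.3, seed=0):
        self.rng = random.Random(seed)
        self.window, self.threshold = window, threshold
        self.members = [MajorityClass() for _ in range(n_members)]
        self.errors = [deque(maxlen=window) for _ in range(n_members)]
        self.replacements = 0

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.members)
        votes.pop(None, None)  # untrained members do not vote
        return votes.most_common(1)[0][0] if votes else None

    def _drifted(self, errs):
        # Crude stand-in for ADWIN: flag a change when the newer half of a
        # full window has a clearly higher error rate than the older half.
        if len(errs) < self.window:
            return False
        half = self.window // 2
        recent = list(errs)
        old, new = recent[:half], recent[half:]
        return sum(new) / len(new) - sum(old) / len(old) > self.threshold

    def learn(self, x, y):
        drift = False
        for member, errs in zip(self.members, self.errors):
            errs.append(int(member.predict(x) != y))  # test, then train
            drift = drift or self._drifted(errs)
            k = poisson1(self.rng)  # online-bagging weight for this member
            if k > 0:
                member.learn(x, y, weight=k)
        if drift:
            # Prune the member with the most mistakes in its error window.
            worst = max(range(len(self.members)),
                        key=lambda i: sum(self.errors[i]))
            self.members[worst] = MajorityClass()
            self.errors[worst].clear()
            self.replacements += 1


if __name__ == "__main__":
    ensemble = PruningOnlineBagging(seed=1)
    for t in range(400):  # the label concept flips abruptly at t = 200
        ensemble.learn(None, 0 if t < 200 else 1)
    print(ensemble.predict(None), ensemble.replacements)
```

In this toy stream the label concept flips halfway through. Members trained on the old concept start making mistakes, the window check fires, and stale members are replaced one per step by fresh learners that pick up the new concept. A real detector such as ADWIN adapts its window length to the data instead of relying on a fixed window and a hand-picked threshold.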

    Published In

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. concept drift
    2. data streams
    3. decision trees
    4. ensemble methods

    Qualifiers

    • Research-article

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
