DOI: 10.1145/3477314.3507074

Optimizing ADWIN for steady streams

Published: 06 May 2022

Abstract

With ever-growing data generation rates and stringent constraints on the latency of analyzing such data, stream analytics is gaining prominence. Learning from data streams, also known as online machine learning, is no exception. However, online machine learning poses challenges across the learning process, from algorithm design to evaluation methodology. One of these challenges is the ability of a learning system to adapt to changes in the data distribution, known as concept drift, in order to maintain prediction accuracy. Several drift detection approaches have been proposed over time. A prominent one is adaptive windowing (ADWIN), which can detect changes in the feature data distribution without explicit feedback on the correctness of the predictions. Several ADWIN variants have been proposed to improve its runtime performance with respect to throughput and latency. However, the drift detection accuracy of these variants was never compared with that of the original algorithm, and there is no study of the memory consumption of either the variants or the original. Additionally, evaluations were conducted on synthetic datasets with a considerable number of drifts, covering neither all drift types nor steady streams, i.e., streams with no or negligible drift.
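For context, the core of Bifet and Gavaldà's ADWIN is a statistical test over a window of recent values: after every insertion, each split of the window into an older and a newer sub-window is checked, and if the two sub-window means differ by more than a cut threshold derived from a Hoeffding-style bound, a drift is reported and the older part of the window is dropped. The Python sketch below illustrates that idea in simplified form; it stores raw values instead of the exponential-histogram buckets the real algorithm uses, and it is not the implementation evaluated in this paper.

```python
import math
from collections import deque


class SimpleAdwin:
    """Simplified ADWIN-style change detector (illustrative sketch only)."""

    def __init__(self, delta: float = 0.002):
        self.delta = delta      # confidence parameter of the bound
        self.window = deque()   # raw values, newest on the right

    def update(self, value: float) -> bool:
        """Insert a value; return True if a distribution change was detected."""
        self.window.append(value)
        drift = False
        shrinking = True
        while shrinking and len(self.window) >= 2:
            shrinking = False
            values = list(self.window)
            n = len(values)
            total = sum(values)
            prefix = 0.0
            for cut in range(1, n):
                prefix += values[cut - 1]
                n0, n1 = cut, n - cut
                mean0 = prefix / n0                    # mean of the older sub-window
                mean1 = (total - prefix) / n1          # mean of the newer sub-window
                m = 1.0 / (1.0 / n0 + 1.0 / n1)        # size term from the ADWIN bound
                delta_prime = self.delta / n           # correction for testing n cuts
                eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
                if abs(mean0 - mean1) > eps_cut:
                    for _ in range(cut):               # forget the stale older prefix
                        self.window.popleft()
                    drift = True
                    shrinking = True                   # re-check the shrunken window
                    break
        return drift


# Toy usage: the stream mean jumps from 0.2 to 0.8 halfway through,
# so a drift should be flagged shortly after the jump.
if __name__ == "__main__":
    detector = SimpleAdwin()
    flags = [detector.update(0.2) for _ in range(300)]
    flags += [detector.update(0.8) for _ in range(300)]
    print("drift detected:", any(flags))
```

Note that on a steady stream the window in this sketch only ever grows; that unbounded growth is precisely the memory behaviour that motivates the bucket-dropping approach introduced in the contribution below.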
The contribution of this paper is twofold. First, we compare the original ADWIN and its variants Serial, HalfCut, and Optimistic in terms of drift detection accuracy, detection speed, and memory consumption, the latter measured as the internal window size. We compare them on synthetic datasets covering different types of concept drift, namely incremental, gradual, abrupt, and steady, as well as on two real-life datasets whose drifts are unknown. Second, we present ADWIN++, which uses an adaptive bucket dropping technique to control the window size. We evaluate our technique on the same datasets as above and on new datasets with fewer drifts. Experiments show that our approach saves about 80% of memory consumption, detects concept drift faster, and maintains drift detection accuracy.
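The abstract does not spell out the adaptive bucket dropping policy itself, so the following sketch is only one plausible reading of the idea: an ADWIN-like bucket list whose length is capped by a budget that tightens while the stream stays steady and relaxes again once a drift is detected. The class and method names, the halving rule, and the default limits are hypothetical illustrations, not the ADWIN++ implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Bucket:
    total: float   # sum of the values summarized by this bucket
    count: int     # number of values it summarizes


class BoundedBucketWindow:
    """Hypothetical sketch of an adaptively bounded ADWIN-style bucket list."""

    def __init__(self, min_buckets: int = 8, max_buckets: int = 64):
        self.buckets = deque()             # oldest bucket on the left
        self.min_buckets = min_buckets
        self.max_buckets = max_buckets
        self.budget = max_buckets          # current cap on the bucket count

    def insert(self, value: float) -> None:
        """Add a new value; drop the oldest buckets if the budget is exceeded."""
        self.buckets.append(Bucket(total=value, count=1))
        while len(self.buckets) > self.budget:
            self.buckets.popleft()

    def on_steady_period(self) -> None:
        """Tighten the cap while no drift has been observed for a while."""
        self.budget = max(self.min_buckets, self.budget // 2)

    def on_drift_detected(self) -> None:
        """Relax the cap so the window can grow to track the new concept."""
        self.budget = self.max_buckets

    def window_size(self) -> int:
        """Number of stream items currently represented (the memory metric above)."""
        return sum(b.count for b in self.buckets)

    def window_mean(self) -> float:
        """Mean of the items currently represented by the bucket list."""
        items = self.window_size()
        return sum(b.total for b in self.buckets) / items if items else 0.0
```

Dropping the oldest buckets trades a bounded memory footprint for some forgetting of old data, which is a cheap trade on steady streams where old and recent values follow the same distribution; this is consistent with the reported result that memory consumption drops by about 80% while drift detection accuracy is maintained.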

Cited By

  • (2024) Multi-Step Ahead Water Level Forecasting Using Deep Neural Networks. Water 16(21), 3153. https://doi.org/10.3390/w16213153. Online publication date: 4-Nov-2024.
  • (2024) A comprehensive analysis of concept drift locality in data streams. Knowledge-Based Systems 289(C). https://doi.org/10.1016/j.knosys.2024.111535. Online publication date: 25-Jun-2024.
  • (2022) Benchmarking Concept Drift Detectors for Online Machine Learning. Model and Data Engineering, 43-57. https://doi.org/10.1007/978-3-031-21595-7_4. Online publication date: 21-Nov-2022.

Published In

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
April 2022
2099 pages
ISBN: 9781450387132
DOI: 10.1145/3477314

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ADWIN
  2. concept drifts
  3. online machine learning
  4. steady streams

Qualifiers

  • Research-article

Conference

SAC '22

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
