GPU-Accelerated Extreme Learning Machines for Imbalanced Data Streams with Concept Drift

Published: 01 June 2016

Abstract

Mining data streams is one of the most vital fields in the current era of big data. Continuously arriving data may pose various problems connected to their volume, variety or velocity. In this paper we focus on two important difficulties embedded in the nature of data streams: their non-stationary nature and skewed class distributions. Such a scenario requires a classifier that can rapidly adapt to concept drift and remain robust to the class imbalance problem. We propose to use an online version of the Extreme Learning Machine, enhanced with an efficient drift detector and a method to alleviate the bias towards the majority class. We investigate three approaches based on undersampling, oversampling and cost-sensitive adaptation. Additionally, to allow rapid updating of the proposed classifier, we show how to implement online Extreme Learning Machines on a GPU. The proposed approach allows for highly efficient mining of high-speed, drifting and imbalanced data streams, with significant acceleration offered by GPU processing.
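
The full text is not included here, but the core building block the abstract names, the online sequential Extreme Learning Machine, follows a standard recursive least-squares update. The sketch below is a minimal illustration of that generic OS-ELM update in NumPy, not the authors' implementation; the class name OSELM, the tanh activation, and the suggestion to swap in the largely API-compatible CuPy package for GPU execution are assumptions made only for illustration.

    # Minimal OS-ELM sketch (illustrative, not the paper's code).
    # Hidden-layer weights W and biases b are random and fixed; only the output
    # weights beta are learned, first from an initial batch and then recursively
    # from each arriving chunk. Replacing numpy with the largely API-compatible
    # cupy package would run the dense matrix algebra on a GPU.
    import numpy as np

    class OSELM:
        def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.standard_normal((n_inputs, n_hidden))  # fixed random input weights
            self.b = rng.standard_normal(n_hidden)               # fixed random biases
            self.beta = np.zeros((n_hidden, n_outputs))          # output weights (learned)
            self.P = None                                         # running (H^T H)^-1 estimate

        def _hidden(self, X):
            return np.tanh(X @ self.W + self.b)                  # hidden-layer activations H

        def fit_initial(self, X0, T0):
            # Initial batch should contain at least n_hidden rows so H^T H is invertible.
            H = self._hidden(X0)
            self.P = np.linalg.inv(H.T @ H)
            self.beta = self.P @ H.T @ T0

        def partial_fit(self, X, T):
            # Recursive least-squares update for one incoming data chunk.
            H = self._hidden(X)
            K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
            self.P = self.P - self.P @ H.T @ K @ H @ self.P
            self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

        def predict(self, X):
            return self._hidden(X) @ self.beta

Because each partial_fit call reduces to a few dense matrix products and one small matrix inversion, the update maps naturally onto GPU BLAS routines, which is the kind of speed-up the abstract attributes to GPU processing. The imbalance handling (undersampling, oversampling or cost-sensitive weighting of the chunks) and the drift detector described in the abstract would wrap around these updates and are not shown in this sketch.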


Cited By

  • (2020) ELM-NET, a closer to practice approach for classifying the big data using multiple independent ELMs. Cluster Computing 23(2), 735-757. DOI: 10.1007/s10586-019-02957-7. Online publication date: 1 June 2020.


    Published In

    Procedia Computer Science, Volume 80, Issue C
    June 2016
    2452 pages
    ISSN:1877-0509
    EISSN:1877-0509

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Author Tags

    1. Big data
    2. Concept drift
    3. Data streams
    4. Extreme learning machines
    5. GPU
    6. Imbalanced data

    Qualifiers

    • Research-article
