Machine learning for streaming data: state of the art, challenges, and opportunities

Published: 26 November 2019

Abstract

Incremental learning, online learning, and data stream learning are terms commonly associated with learning algorithms that update their models given a continuous influx of data, without performing multiple passes over the data. Several works have been devoted to this area, either directly or indirectly through the Velocity and Volume characteristics of big data processing. Given current industry needs, many challenges must be addressed before existing methods can be efficiently applied to real-world problems. In this work, we focus on elucidating the connections among the current state of the art in related fields and on clarifying open challenges in both academia and industry. We pay special attention to topics that were not thoroughly investigated in past position and survey papers. This work aims to provoke discussion and highlight current research opportunities, pointing out the relationships among different subareas and suggesting courses of action where possible.
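
The single-pass, incremental updating described in the abstract can be made concrete with a prequential (test-then-train) loop, in which each arriving example is first used to evaluate the current model and only then to update it, so the data is processed exactly once. The sketch below is a minimal illustration only, not code from the paper: the synthetic_stream generator and the OnlineGaussianNB learner are hypothetical stand-ins written for this example.

```python
# Minimal prequential (test-then-train) loop: each example is evaluated with the
# current model before being used to update it, so the stream is seen exactly once.
# Both the stream generator and the learner are illustrative stand-ins.
import random
import math

def synthetic_stream(n, drift_at):
    """Binary-labelled 2-D stream whose decision boundary shifts at `drift_at`,
    mimicking concept drift (hypothetical generator, for illustration only)."""
    for t in range(n):
        x = [random.uniform(0.0, 1.0), random.uniform(0.0, 1.0)]
        threshold = 1.0 if t < drift_at else 1.4   # the concept changes here
        yield x, int(x[0] + x[1] > threshold)

class OnlineGaussianNB:
    """Gaussian naive Bayes updated one example at a time with Welford-style
    running means/variances; no pass over past data is ever required."""
    def __init__(self, n_features, classes=(0, 1)):
        self.counts = {c: 0 for c in classes}
        self.mean = {c: [0.0] * n_features for c in classes}
        self.m2 = {c: [1e-3] * n_features for c in classes}  # variance accumulator

    def learn_one(self, x, y):
        self.counts[y] += 1
        n = self.counts[y]
        for i, v in enumerate(x):
            delta = v - self.mean[y][i]
            self.mean[y][i] += delta / n
            self.m2[y][i] += delta * (v - self.mean[y][i])

    def predict_one(self, x):
        total = sum(self.counts.values())
        best, best_lp = None, -math.inf
        for c, n in self.counts.items():
            if n == 0:
                continue
            lp = math.log(n / total)  # class prior
            for i, v in enumerate(x):
                var = self.m2[c][i] / n + 1e-9
                lp += -0.5 * math.log(2 * math.pi * var) - (v - self.mean[c][i]) ** 2 / (2 * var)
            if lp > best_lp:
                best, best_lp = c, lp
        return best if best is not None else 0

model, correct, seen = OnlineGaussianNB(n_features=2), 0, 0
for x, y in synthetic_stream(n=20000, drift_at=10000):
    if seen > 0:                          # test ...
        correct += int(model.predict_one(x) == y)
    model.learn_one(x, y)                 # ... then train, single pass only
    seen += 1
print(f"prequential accuracy: {correct / (seen - 1):.3f}")
```

Because evaluation happens before each update, the running score is the standard prequential accuracy, and no past examples ever need to be stored.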

    Published In

ACM SIGKDD Explorations Newsletter, Volume 21, Issue 2
December 2019, 100 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/3373464

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media