On-line learning from streaming data with delayed attributes: a comparison of classifiers and strategies

Mónica Millán-Giraldo¹,
J. Salvador Sánchez¹ &
V. Javier Traver¹

Abstract

In many real applications, the data are not all available at once, or processing them in a single batch is not affordable; instead, instances arrive sequentially in a stream. Streaming data pose new challenges to the machine learning community, since difficult decisions have to be made. This paper addresses the problem of classifying incoming instances for which one attribute arrives only after a given delay. This formulation raises many open issues, such as how to classify the incomplete instance, whether to wait for the delayed attribute before performing any classification, and when and how to update a reference set. Three strategies are proposed that address these issues in different ways. Orthogonally to these strategies, three classifiers with different characteristics are used. Keeping the on-line learning strategies independent of the classifiers facilitates system design and contrasts with the common alternative of carefully crafting an ad hoc classifier. To assess how well learning proceeds under these strategies and classifiers, they are compared using learning curves and final classification errors on fifteen data sets. Results indicate that learning in this stringent context of streaming data and delayed attributes can take place successfully even with simple on-line strategies. Furthermore, active strategies generally behave better than more conservative passive ones. Regarding the classifiers, simple instance-based classifiers such as the well-known nearest neighbor rule may outperform more elaborate classifiers such as support vector machines, especially if some measure of classification confidence is considered in the process.
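
As a rough illustration of the setting described in the abstract, the following Python sketch classifies each incoming instance with a nearest neighbor rule using only the attributes available on arrival, and adds the completed instance to the reference set once the delayed attribute arrives. It is not one of the three strategies proposed in the paper, whose details are not given on this page. The function names (nn_predict, run_stream), the choice of the last attribute as the delayed one, the fixed delay measured in stream steps, and the use of the predicted label when updating the reference set are all assumptions made for illustration.

```python
import math
from collections import deque


def nn_predict(reference, x, dims):
    """1-NN over the attribute indices in `dims`; returns (label, distance)."""
    best_label, best_dist = None, math.inf
    for feats, label in reference:
        dist = math.sqrt(sum((feats[i] - x[i]) ** 2 for i in dims))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


def run_stream(reference, stream, delay):
    """Classify each instance on arrival without its last attribute, then add
    the completed instance to `reference` once the delayed attribute has
    arrived, `delay` steps later.

    Assumption: the instance is stored with its *predicted* label, since the
    abstract does not say whether true labels become available.
    """
    n_attrs = len(reference[0][0])
    early_dims = range(n_attrs - 1)  # assumption: the last attribute is delayed
    pending = deque()                # (arrival_step, full_features, predicted_label)
    predictions = []
    for step, full in enumerate(stream):
        # Move instances whose delayed attribute has arrived into the reference set.
        while pending and step - pending[0][0] >= delay:
            _, feats, label = pending.popleft()
            reference.append((feats, label))
        # Early classification: the delayed attribute of `full` is not used here.
        label, _ = nn_predict(reference, full, early_dims)
        predictions.append(label)
        pending.append((step, full, label))
    return predictions


if __name__ == "__main__":
    # Toy data, invented for this example only.
    reference = [((0.0, 0.0, 0.0), "a"), ((1.0, 1.0, 1.0), "b")]
    stream = [(0.1, 0.2, 0.0), (0.9, 0.8, 1.0), (0.2, 0.1, 0.1)]
    print(run_stream(reference, stream, delay=1))  # prints ['a', 'b', 'a']
```

The distance returned by nn_predict is unused in this sketch; it could serve as a crude confidence measure for deciding whether an instance should enter the reference set, which is again an assumption and not the confidence measure used in the paper.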

Acknowledgments

This work has been supported in part by the Spanish Ministry of Education and Science under grants CSD2007-00018 Consolider Ingenio 2010 and TIN2009-14205, and by Fundació Caixa Castelló—Bancaixa under grant P1-1B2009-04.

Author information

Authors and Affiliations

Dept. Lenguajes y Sistemas Informáticos, Institute of New Imaging Technologies, Universitat Jaume I, Av. Vicent Sos Baynat s/n, 12071, Castellón, Spain
Mónica Millán-Giraldo, J. Salvador Sánchez & V. Javier Traver

Corresponding author

Correspondence to J. Salvador Sánchez.

About this article

Cite this article

Millán-Giraldo, M., Sánchez, J.S. & Traver, V.J. On-line learning from streaming data with delayed attributes: a comparison of classifiers and strategies. Neural Comput & Applic 20, 935–944 (2011). https://doi.org/10.1007/s00521-010-0402-8

Received: 05 May 2010
Accepted: 03 June 2010
Published: 17 June 2010
Issue Date: October 2011
DOI: https://doi.org/10.1007/s00521-010-0402-8
