A Comparison of Several Approaches to Missing Attribute Values in Data Mining

Jerzy W. Grzymala-Busse and Ming Hu

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2005)

Included in the following conference series:

International Conference on Rough Sets and Current Trends in Computing

Abstract

This paper presents and compares nine approaches to handling missing attribute values. Ten input data files were used to evaluate the performance of the nine methods. For testing, both naive classification and the new classification techniques of LERS (Learning from Examples based on Rough Sets) were used. The quality criterion was the average error rate obtained by ten-fold cross-validation. Using the Wilcoxon matched-pairs signed-rank test, we conclude that the C4.5 approach and the method of ignoring examples with missing attribute values are the best of the nine approaches, that the most common attribute-value method is the worst, and that some methods do not differ significantly from one another. Two further methods performed excellently in our limited experiments: assigning to the missing attribute value all possible values of the attribute, and assigning to the missing attribute value all possible values of the attribute restricted to the same concept. However, we do not have enough evidence to support the claim that these two approaches are superior.
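
Four of the methods named in the abstract can be illustrated with a minimal Python sketch. This is only one reading of the method names as they appear above, not the authors' implementation: the "?" marker for a missing value, the function names, and the toy data set are all assumptions made for illustration, and the last field of each example is taken to be the concept (decision).

```python
from collections import Counter
from itertools import product

MISSING = "?"  # assumed marker for a missing attribute value

# Hypothetical toy data: (temperature, cough, concept)
ROWS = [
    ("high", "yes", "flu"),
    ("?", "yes", "flu"),
    ("normal", "no", "healthy"),
    ("high", "?", "healthy"),
    ("high", "no", "healthy"),
    ("normal", "no", "healthy"),
]

def ignore_missing(rows):
    """Drop every example that has at least one missing attribute value."""
    return [r for r in rows if MISSING not in r[:-1]]

def most_common_value(rows):
    """Replace each missing value with the most frequent known value
    of the same attribute, computed over the whole data set."""
    n_attrs = len(rows[0]) - 1
    modes = []
    for j in range(n_attrs):
        counts = Counter(r[j] for r in rows if r[j] != MISSING)
        modes.append(counts.most_common(1)[0][0])
    return [
        tuple(modes[j] if r[j] == MISSING else r[j] for j in range(n_attrs))
        + (r[-1],)
        for r in rows
    ]

def all_possible_values(rows, same_concept=False):
    """Expand each example with missing values into one example per
    candidate value. With same_concept=True the candidates are taken
    only from examples that share the example's concept.
    An example is dropped if no known value exists for a missing attribute."""
    n_attrs = len(rows[0]) - 1
    expanded = []
    for r in rows:
        pool = [s for s in rows if s[-1] == r[-1]] if same_concept else rows
        choices = []
        for j in range(n_attrs):
            if r[j] == MISSING:
                choices.append(sorted({s[j] for s in pool if s[j] != MISSING}))
            else:
                choices.append([r[j]])
        expanded.extend(combo + (r[-1],) for combo in product(*choices))
    return expanded

if __name__ == "__main__":
    print(len(ignore_missing(ROWS)))
    print(most_common_value(ROWS))
    print(len(all_possible_values(ROWS)))
    print(len(all_possible_values(ROWS, same_concept=True)))
```

In this sketch, ignoring examples shrinks the data set, the most common value method keeps its size, and the two "all possible values" methods enlarge it. Restricting candidates to the same concept produces fewer added examples than the unrestricted variant.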

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA
Jerzy W. Grzymala-Busse
JP Morgan, New York, NY 10260, USA
Ming Hu

Editor information

Editors and Affiliations

Department of Computer Science, University of Regina, Regina, Saskatchewan, S4S 0A2, Canada
Wojciech Ziarko and Yiyu Yao

Cite this paper

Grzymala-Busse, J.W., Hu, M. (2001). A Comparison of Several Approaches to Missing Attribute Values in Data Mining. In: Ziarko, W., Yao, Y. (eds) Rough Sets and Current Trends in Computing. RSCTC 2000. Lecture Notes in Computer Science (LNAI), vol 2005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45554-X_46

DOI: https://doi.org/10.1007/3-540-45554-X_46
Published: 18 December 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43074-2
Online ISBN: 978-3-540-45554-7
eBook Packages: Springer Book Archive
