Abstract
Patterns of missing values are important both for assessing the quality of a classification data set and for validating classification results. This paper discusses three critical patterns of missing values in classification data: missing at random, uneven symmetric missing, and uneven asymmetric missing. It proposes a cluster analysis method based on self-organizing maps (SOM) to visualize these patterns in classification data.
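The abstract does not spell out the procedure, so the following is only a minimal sketch under our own assumptions, not the authors' method. Each record is reduced to a 0/1 missingness-indicator vector (1 where a value is missing). A small rectangular SOM is trained on those vectors with the standard Kohonen online update: find the best-matching unit, then pull it and its grid neighbors toward the input. Finally, the class labels that land on each map node are tabulated. The function names (`missing_indicator`, `train_som`, `best_matching_unit`, `class_counts_per_node`), the grid size, the learning-rate and neighborhood schedules, and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def missing_indicator(rows):
    """Turn records (None = missing value) into 0/1 missingness vectors."""
    return [[1.0 if v is None else 0.0 for v in row] for row in rows]


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))


def train_som(data, width=3, height=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM with Kohonen's online update rule.

    Assumed schedules: learning rate and Gaussian neighborhood width
    both decay linearly over training.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialized uniformly in [0, 1).
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(width) for j in range(height)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 0.01  # keep sigma strictly positive
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighborhood measured on the grid, not in input space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def class_counts_per_node(weights, data, labels):
    """Count how many records of each class map to each SOM node."""
    table = defaultdict(Counter)
    for x, y in zip(data, labels):
        table[best_matching_unit(weights, x)][y] += 1
    return table


# Hypothetical toy data: None marks a missing value.
records = [
    [None, 1.2, 3.4], [None, 0.8, 2.9], [None, 1.1, 3.1],  # class A lacks attribute 0
    [5.0, 2.2, None], [4.7, 2.0, None], [5.1, 1.9, None],  # class B lacks attribute 2
    [4.9, 1.5, 3.0], [None, 1.4, 3.2],
]
labels = ["A", "A", "A", "B", "B", "B", "A", "B"]

indicators = missing_indicator(records)
som = train_som(indicators)
for node, counts in sorted(class_counts_per_node(som, indicators, labels).items()):
    print(node, dict(counts))
```

One possible reading of such a map is as follows. A node that collects one missingness pattern almost entirely from a single class hints that missingness depends on the class. A pattern spread across classes roughly in proportion to their sizes is more consistent with values missing at random. This interpretation is our assumption, and the paper's own definitions of the symmetric and asymmetric cases are not reproduced here.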
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H., Wang, S. (2007). Visualization of the Critical Patterns of Missing Values in Classification Data. In: Qiu, G., Leung, C., Xue, X., Laurini, R. (eds) Advances in Visual Information Systems. VISUAL 2007. Lecture Notes in Computer Science, vol 4781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76414-4_27
DOI: https://doi.org/10.1007/978-3-540-76414-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76413-7
Online ISBN: 978-3-540-76414-4
eBook Packages: Computer Science, Computer Science (R0)