Imputing missing value through ensemble concept based on statistical measures

Moslem Mohammadi Jenghara¹,²,
Hossein Ebrahimpour-Komleh²,
Vahideh Rezaie³,⁴,
Samad Nejatian⁴,⁵,
Hamid Parvin⁴,⁶ &
Sharifah Kamilah Syed Yusof⁷

Abstract

Many datasets contain missing values in their attributes. Data mining techniques generally cannot be applied when values are missing, so handling missing values is an important preprocessing step in a data mining task. One of the most important categories of missing value management techniques is missing value imputation. This paper presents a new imputation technique based on statistical measures. The technique employs an ensemble of estimators, built separately from the positively correlated and the negatively correlated observed attributes. Each estimator proposes a value for a missing entry based on the mean and variance of that feature, both of which are estimated from the feature's non-missing values. The final consensus value for a missing entry is a weighted aggregation of the values proposed by the different estimators. The primary weight is the attribute correlation; the secondary weight depends on a kernel function of quantities such as kurtosis, skewness, the number of samples involved, and combinations of them. For evaluation, missing values are deliberately introduced at random at different levels. The experiments indicate that the proposed technique achieves good accuracy in comparison with classical methods.
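
The general idea of a correlation-weighted ensemble can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `ensemble_impute`, the regression-style estimate (feature mean plus correlation times standard deviation times z-score), and the use of the absolute correlation as the only weight are assumptions made for illustration. The kurtosis, skewness, and sample-count terms mentioned in the abstract are omitted, and positively and negatively correlated attributes are handled through the sign of the correlation rather than as two separate groups.

```python
import numpy as np


def ensemble_impute(X):
    """Fill NaNs in a 2-D array with a correlation-weighted ensemble.

    Illustrative assumption, not the paper's exact estimator: every
    observed attribute k in a row votes for the missing attribute j with
    mean_j + r_jk * std_j * z_k, and votes are averaged with weight |r_jk|.
    """
    X = np.asarray(X, dtype=float)
    n_cols = X.shape[1]

    # Mean and spread of each feature, estimated from non-missing values only.
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)

    # Pairwise Pearson correlation on rows where both attributes are observed.
    corr = np.zeros((n_cols, n_cols))
    for a in range(n_cols):
        for b in range(n_cols):
            if a == b:
                continue
            both = ~np.isnan(X[:, a]) & ~np.isnan(X[:, b])
            if both.sum() > 2 and X[both, a].std() > 0 and X[both, b].std() > 0:
                corr[a, b] = np.corrcoef(X[both, a], X[both, b])[0, 1]

    filled = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        estimates, weights = [], []
        for k in range(n_cols):
            # Skip the target itself, unobserved voters, and useless voters.
            if k == j or np.isnan(X[i, k]) or std[k] == 0 or corr[j, k] == 0:
                continue
            z = (X[i, k] - mean[k]) / std[k]
            # A negative correlation flips the direction of the vote.
            estimates.append(mean[j] + corr[j, k] * std[j] * z)
            weights.append(abs(corr[j, k]))
        # Fall back to the plain feature mean when no attribute can vote.
        if weights:
            filled[i, j] = np.average(estimates, weights=weights)
        else:
            filled[i, j] = mean[j]
    return filled
```

To mimic the evaluation protocol described above, one could set a random fraction of the entries of a complete array to `np.nan`, call `ensemble_impute`, and compare the filled entries against the held-out true values at several missingness levels.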

Acknowledgements

We thank the anonymous reviewers for their very useful comments and suggestions.

Author information

Authors and Affiliations

Department of Information Technology, Payame Noor University, Miandoab, Iran
Moslem Mohammadi Jenghara
Department of Computer and Electrical Engineering, University of Kashan, Kashan, Iran
Moslem Mohammadi Jenghara & Hossein Ebrahimpour-Komleh
Department of Mathematics, Islamic Azad University, Yasooj Branch, Yasooj, Iran
Vahideh Rezaie
Young Researchers and Elite Club, Islamic Azad University, Yasooj Branch, Yasooj, Iran
Vahideh Rezaie, Samad Nejatian & Hamid Parvin
Department of Electrical Engineering, Islamic Azad University, Yasooj Branch, Yasooj, Iran
Samad Nejatian
Department of Computer Engineering, Islamic Azad University, Yasooj Branch, Yasooj, Iran
Hamid Parvin
UTM-MIMOS Centre of Excellence, Faculty of Electrical Engineering, Universiti Teknologi Malaysia (UTM), Johor Bahru, Malaysia
Sharifah Kamilah Syed Yusof

Corresponding author

Correspondence to Samad Nejatian.

About this article

Cite this article

Jenghara, M.M., Ebrahimpour-Komleh, H., Rezaie, V. et al. Imputing missing value through ensemble concept based on statistical measures. Knowl Inf Syst 56, 123–139 (2018). https://doi.org/10.1007/s10115-017-1118-1

Received: 10 January 2017
Revised: 10 August 2017
Accepted: 10 October 2017
Published: 24 October 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s10115-017-1118-1
