IMPROVING CLASSIFICATION PERFORMANCE OF NEURO-FUZZY CLASSIFIER BY IMPUTING MISSING DATA
DOI: https://doi.org/10.47839/ijc.18.4.1619
Keywords: Classifier, Imputation, Neuro-fuzzy Classifier, Training tuples, Missing data.
Abstract
In medical data classification, improving classification performance is an important issue when a data set is small and contains many missing attribute values. A foremost objective of machine learning research is to improve the classification performance of classifiers, and this requires a sufficient number of training instances. In the proposed algorithm, each missing attribute value is substituted with values from that attribute's available domain, generating additional training tuples alongside the original ones. Together, the additional and original training tuples provide sufficient samples for learning. The neuro-fuzzy classifier is trained on this augmented data set, and its classification performance on test data is estimated using k-fold cross-validation. The proposed method achieves improvements of about 2.8% and 3.61% in classification accuracy for this classifier.
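The imputation-by-expansion step and the k-fold evaluation described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a tuple is a Python tuple with `None` marking a missing value; the "available domain" of an attribute is assumed to be the set of values observed in the training data; every missing value is replaced by each domain value, so a tuple with several missing values yields the Cartesian product of their domains; and every generated tuple inherits its original label. Whether incomplete tuples are also retained in their original form is not stated here, so this sketch replaces them by their completions. The function names are hypothetical, and `train_majority` is only a placeholder standing in for the neuro-fuzzy classifier.

```python
import random
from collections import Counter
from itertools import product

# Assumption: a tuple is a Python tuple of attribute values; None marks a missing value.


def attribute_domains(rows):
    """Observed (non-missing) values of each attribute, sorted for determinism."""
    domains = [set() for _ in rows[0]]
    for row in rows:
        for j, value in enumerate(row):
            if value is not None:
                domains[j].add(value)
    return [sorted(d) for d in domains]


def expand_tuple(row, domains):
    """Fill each missing value with every value from its attribute's domain.

    A complete tuple maps to itself; a tuple with m missing values maps to the
    Cartesian product of the m corresponding domains. If an attribute has no
    observed values, the missing value is left as None.
    """
    choices = [domains[j] if v is None and domains[j] else [v]
               for j, v in enumerate(row)]
    return [tuple(combo) for combo in product(*choices)]


def expand_training_set(rows, labels):
    """Build the augmented training set; each generated tuple keeps its row's label."""
    domains = attribute_domains(rows)
    new_rows, new_labels = [], []
    for row, label in zip(rows, labels):
        for completed in expand_tuple(row, domains):
            new_rows.append(completed)
            new_labels.append(label)
    return new_rows, new_labels


def k_fold_accuracy(rows, labels, k, train, seed=0):
    """Mean test accuracy over k folds (requires len(rows) >= k >= 2).

    `train(rows, labels)` must return a predict(row) function. Expansion is
    applied to the training folds only, so test tuples are never duplicated.
    """
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [t for j, f in enumerate(folds) if j != i for t in f]
        tr_rows, tr_labels = expand_training_set(
            [rows[t] for t in train_idx], [labels[t] for t in train_idx])
        predict = train(tr_rows, tr_labels)
        correct = sum(predict(rows[t]) == labels[t] for t in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def train_majority(rows, labels):
    """Hypothetical placeholder for the neuro-fuzzy classifier: predicts the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda row: majority
```

For example, with `rows = [(1, 'a'), (None, 'b'), (2, None)]` and labels `['x', 'y', 'x']`, the attribute domains are `[1, 2]` and `['a', 'b']`, and `expand_training_set` returns five tuples: `(1, 'a')`, `(1, 'b')`, `(2, 'b')`, `(2, 'a')`, `(2, 'b')`. Expanding only the training folds keeps the test folds identical to the original data, so generated tuples cannot leak into the accuracy estimate. Note that the number of generated tuples grows multiplicatively with the number of missing values per tuple.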
License
International Journal of Computing is an open access journal. Authors who publish with this journal agree to the following terms:
• Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
• Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
• Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.