Abstract
In medical data classification, data reduction and improved classification performance are currently important issues. Existing medical data classification methods first pre-process the data and then perform feature selection; without feature selection, the process is more time-consuming and less accurate. Here we propose two algorithms for enhancing classification performance on medical data. In the first proposed method, the Bag of Words technique is used for better feature subset selection. Subsequently, a hybrid Fuzzy-Neural Network approach is used, which can handle imprecision in the data during classification. This combination of feature selection technique and Fuzzy-Neural Network classifier gives enhanced classification accuracy. In the second proposed algorithm, we integrate a data cleaning technique as a pre-processing step to improve data quality, along with Bag of Words and the Fuzzy-Neural Network. This method performs classification on clean, filtered data with an appropriately reduced feature set, which results in more accurate classification than the existing methods. Thus, the proposed approaches address three issues: removing noise in data, selecting an optimal feature subset, and handling imprecision in data. A comparative study on various medical datasets shows that, in terms of accuracy, the two proposed algorithms perform better than existing techniques, with enhancements of around 3% and 17%, respectively. In addition, the performance of the Bag of Words feature selection method used in the proposed system is compared with two other feature selection methods, LSFS and SFFS.
Cite this article
Tarle, B., Chintakindi, S. & Jena, S. Integrating multiple methods to enhance medical data classification. Evolving Systems 11, 133–142 (2020). https://doi.org/10.1007/s12530-019-09272-x