Abstract
Diabetes mellitus is a well-known chronic disease that diminishes the insulin-producing capability of the human body. This results in a high blood sugar level, which might lead to various complications such as eye damage, nerve damage, cardiovascular damage, kidney damage and stroke. Although diabetes has attracted considerable research attention, the overall performance of machine learning techniques on this classification task remains relatively low, mainly due to class imbalance and missing values in the data. In this paper, we propose a novel Prediction Model using Synthetic Minority Oversampling Technique, Genetic Algorithm and Decision Tree (PMSGD) for classification of diabetes mellitus on the Pima Indians Diabetes Database (PIDD). The framework of the proposed PMSGD prediction model is composed of four layers. The first layer is the pre-processing layer, which is responsible for handling missing values, detecting outliers and oversampling the minority class. In the second layer, the most significant features are selected using correlation and a genetic algorithm. In the third layer, the proposed model is trained, and in the fourth layer its effectiveness is evaluated in terms of classification accuracy (CA), classification error (CE), precision, recall (sensitivity), F-measure (FM) and area under the ROC curve (AUROC). The proposed PMSGD algorithm outperforms its counterparts: the best outcome achieved in terms of CA, CE, precision, sensitivity, FM and AUROC is 82.1256%, 17.8744%, 0.8070, 0.8598, 0.8326 and 0.8511, respectively. The simulation results show the effectiveness of the proposed PMSGD model, which thereby reduces the error rate and helps in the decision-making process.
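The reported metrics can be checked against one another using the standard definitions: CE is the complement of CA, and the F-measure is the harmonic mean of precision and recall. The short Python sketch below does this for the values quoted in the abstract. It assumes the reported precision is the fraction 0.8070 and that FM denotes the balanced F1 score; the function names are illustrative and are not taken from the paper.

```python
def classification_error(accuracy_pct: float) -> float:
    """Classification error as the complement of accuracy, both in percent."""
    return 100.0 - accuracy_pct


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract. Precision is read as a fraction (assumption).
reported = {
    "CA": 82.1256,        # percent
    "CE": 17.8744,        # percent
    "precision": 0.8070,
    "recall": 0.8598,
    "FM": 0.8326,
}

ce = classification_error(reported["CA"])
fm = f_measure(reported["precision"], reported["recall"])

print(f"CE derived from CA: {ce:.4f}% (reported {reported['CE']}%)")
print(f"FM derived from precision and recall: {fm:.4f} (reported {reported['FM']})")
```

Under these assumptions both derived values agree with the reported ones to four decimal places, which supports reading precision as a fraction rather than a percentage.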
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
About this article
Cite this article
Azad, C., Bhushan, B., Sharma, R. et al. Prediction model using SMOTE, genetic algorithm and decision tree (PMSGD) for classification of diabetes mellitus. Multimedia Systems 28, 1289–1307 (2022). https://doi.org/10.1007/s00530-021-00817-2
DOI: https://doi.org/10.1007/s00530-021-00817-2