Abstract
Prediabetes is a type of hyperglycemia in which patients have blood glucose levels above normal but below the threshold for type 2 diabetes mellitus (T2DM). Prediabetic patients are considered to be at high risk for developing T2DM, but not all will eventually do so. Because it is difficult to identify which patients are at increased risk, we developed a model based on several clinical and laboratory features to predict the development of T2DM within a 2-year period. We used a supervised machine learning algorithm to identify at-risk patients among 1647 obese, hypertensive patients. The study period began in 2005 and ended in 2018. We restricted the data to those recorded up to 2 years before the development of T2DM. Then, using a time series analysis of each patient's features, we calculated one linear regression line and one slope per feature. The features were then entered into a K-nearest neighbors (KNN) classification model. Feature importance was assessed using the random forest algorithm. The KNN model correctly classified patients in 96% of cases, with a sensitivity of 99%, specificity of 78%, positive predictive value of 96%, and negative predictive value of 94%. The random forest algorithm selected the homeostatic model assessment–estimated insulin resistance, insulin levels, and body mass index as the most important factors; a KNN model restricted to these factors had an accuracy of 99%, with a sensitivity of 99% and specificity of 97%. We built a prognostic model that accurately identified obese, hypertensive patients at risk for developing T2DM within a 2-year period. Clinicians may use machine learning approaches to better assess the risk of T2DM and to better manage hypertensive patients. Machine learning algorithms may help health care providers make more informed decisions.
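To make the pipeline described in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It reduces each feature's time series to a least-squares slope, classifies a new patient by majority vote among the nearest neighbors, and computes the reported diagnostic metrics from a confusion matrix. All data, the names `slope`, `knn_predict`, `diagnostic_metrics`, and `slope_features`, the choice of two features, and `k=3` are illustrative assumptions; the random forest step for feature importance is not shown.

```python
from collections import Counter
from math import dist


def slope(times, values):
    """Least-squares slope of one feature's readings against visit time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV and NPV for binary labels (1 = T2DM)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy data, invented for illustration only.
VISIT_YEARS = [0, 1, 2]
TOY_PATIENTS = [
    # (HOMA-IR readings, BMI readings, label: 1 = developed T2DM)
    ([2.0, 2.1, 2.2], [31.0, 31.0, 31.0], 0),
    ([2.5, 2.4, 2.5], [32.0, 32.2, 32.4], 0),
    ([2.2, 2.2, 2.4], [29.0, 29.0, 29.2], 0),
    ([3.0, 3.8, 4.6], [33.0, 34.0, 35.0], 1),
    ([2.8, 3.5, 4.2], [34.0, 34.5, 35.0], 1),
    ([3.2, 4.2, 5.2], [30.0, 31.0, 32.0], 1),
]


def slope_features(homa_ir, bmi):
    """One slope per feature, as a point in feature space."""
    return (slope(VISIT_YEARS, homa_ir), slope(VISIT_YEARS, bmi))


train_x = [slope_features(h, b) for h, b, _ in TOY_PATIENTS]
train_y = [label for _, _, label in TOY_PATIENTS]

# Classify a hypothetical new patient from their slopes.
new_patient = slope_features([2.9, 3.6, 4.3], [33.0, 33.5, 34.5])
prediction = knn_predict(train_x, train_y, new_patient, k=3)

# Metrics for a made-up set of true and predicted labels.
metrics = diagnostic_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 0])
print(prediction, metrics)
```

Because KNN relies on distances, features measured on different scales would normally be standardized before fitting; the sketch skips this step since both toy features are slopes of similar magnitude.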
Funding
This work was partially funded by Research Project Nos. TEC2016-75361-R and TEC2016-75161-C2-1-R from the Spanish Government and by Research Project No. DTS17/00158 from the Instituto de Salud Carlos III (Spain).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
About this article
Cite this article
Garcia-Carretero, R., Vigil-Medina, L., Mora-Jimenez, I. et al. Use of a K-nearest neighbors model to predict the development of type 2 diabetes within 2 years in an obese, hypertensive population. Med Biol Eng Comput 58, 991–1002 (2020). https://doi.org/10.1007/s11517-020-02132-w
DOI: https://doi.org/10.1007/s11517-020-02132-w