Abstract
Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide, so identifying related factors and detecting the disease accurately are important. Timely and accurate prediction, however, remains challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples as cancer or normal. The candidate factors include the subject's location, age, gender, and body mass index (BMI), together with the tumor type, tumor grade, and DNA of the cancer. Using logistic regression, we select the most significant factors (p < 0.05) as the main features; with these features, we design a grid-search SVM model using different kernel types (linear, radial basis function (RBF), sigmoid, and polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), BMI (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.
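To make the two evaluation quantities in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows how an AUC for a single factor can be computed as a rank statistic, and how samples might be divided into k folds, assuming the k in the abstract refers to k-fold cross-validation. The function names `auc_score` and `k_fold_indices` and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def auc_score(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample; ties count as half (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds and return a list of
    (train_indices, test_indices) pairs, one pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


# Toy data, made up for illustration only (not taken from the study).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]  # e.g. a single factor's score
toy_labels = [1, 1, 1, 0, 0, 0]               # 1 = cancer, 0 = normal
toy_auc = auc_score(toy_scores, toy_labels)
toy_folds = k_fold_indices(10, 5)
```

In practice the samples would usually be shuffled, and often stratified by class, before being split into folds; this sketch keeps contiguous folds for readability.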
Funding
This research is supported by the National Natural Science Foundation of China (61876102, 61472232, 61572300, 61402270, 61602286), Taishan Scholar Program of Shandong Province in China (TSHW201502038), and Natural Science Foundation of Shandong Province in China (ZR2016FB13).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhao, D., Liu, H., Zheng, Y. et al. A reliable method for colorectal cancer prediction based on feature selection and support vector machine. Med Biol Eng Comput 57, 901–912 (2019). https://doi.org/10.1007/s11517-018-1930-0
DOI: https://doi.org/10.1007/s11517-018-1930-0