Abstract
High-dimensional cancer datasets allow researchers to diagnose cancer in a timely manner and to support effective treatment. Biomedical applications typically operate on thousands of features, and extracting precise statistics from such high-dimensional data is challenging. This paper presents Incremental Wrapper based Random Forest Gene Subset Selection for Tumor Discernment, which combines incremental wrapper-based feature subset selection with the random forest classification algorithm; the same algorithm also serves as the performance validator. Incremental wrapper-based feature subset selection is a technique for picking the best possible subset of genes from high-dimensional data at low computational cost. Random forest is expected to increase overall performance because it works well on high-dimensional cancer datasets. Its efficacy as a performance validator improves significantly when it works on a selective, discriminative subset of prognostic genes rather than on the raw data. We evaluate the proposed methodology on six publicly available high-dimensional cancer datasets and find that it outperforms standard random forests.
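The incremental wrapper idea named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the ranking measure, the acceptance rule, or the validation protocol, so the mean-gap ranking, the "keep a gene only if the score improves" rule, and the toy data below are all hypothetical. To keep the sketch dependency-free, a leave-one-out nearest-centroid classifier stands in for the random forest that the paper uses as the wrapper's evaluator.

```python
from statistics import mean


def rank_by_mean_gap(X, y):
    """Filter step (assumed): rank genes by the absolute gap between class means."""
    def gap(f):
        pos = mean(row[f] for row, label in zip(X, y) if label == 1)
        neg = mean(row[f] for row, label in zip(X, y) if label == 0)
        return abs(pos - neg)
    return sorted(range(len(X[0])), key=gap, reverse=True)


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Stand-in for the random forest evaluator described in the abstract.
    """
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[f] for r in rows) for f in features]
        point = [X[i][f] for f in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def incremental_wrapper_select(ranked, evaluate, min_gain=0.0):
    """Walk the ranked genes once; keep a gene only if it raises the score."""
    selected, best = [], float("-inf")
    for f in ranked:
        score = evaluate(selected + [f])
        if score > best + min_gain:
            selected.append(f)
            best = score
    return selected, best


# Hypothetical toy expression matrix: 6 samples x 3 genes, labels 1 = tumor, 0 = normal.
X = [
    [5.0, 1.0, 0.2],
    [5.2, 0.9, 0.8],
    [4.9, 1.1, 0.5],
    [1.0, 1.0, 0.6],
    [1.1, 0.8, 0.3],
    [0.9, 1.2, 0.7],
]
y = [1, 1, 1, 0, 0, 0]

ranked = rank_by_mean_gap(X, y)
selected, best = incremental_wrapper_select(ranked, lambda fs: loo_accuracy(X, y, fs))
print(selected, best)
```

Because each gene is evaluated once in rank order, the wrapper needs only as many classifier evaluations as there are genes, which is where the low computational cost comes from. Swapping `loo_accuracy` for a cross-validated random forest score would bring the sketch closer to the setup the abstract describes.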
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Fatima, A., Qamar, U., Rehman, S., Nazir, A.K. (2018). Incremental Wrapper Based Random Forest Gene Subset Selection for Tumor Discernment. In: Elloumi, M., et al. Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. Springer, Cham. https://doi.org/10.1007/978-3-319-99133-7_13
DOI: https://doi.org/10.1007/978-3-319-99133-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99132-0
Online ISBN: 978-3-319-99133-7
eBook Packages: Computer Science, Computer Science (R0)