Abstract
Phenotype prediction is one of the central issues in genetics and medical sciences research. With the advent of high-throughput screening technologies, microarray-based cancer classification has become a standard procedure for identifying cancer-related gene signatures. Because transcriptome-wide gene expression profiles are high-dimensional, discovering a biologically functional signature across different cell lines is a challenging task. In this article, we present an innovative framework that uses information theory to find a small subset of discriminative genes for classifying a specific disease phenotype. The framework is data-driven and considers feature relevance, redundancy, and interdependence in the context of feature pairs. Its effectiveness has been validated on a brain cancer benchmark whose gene expression profiling matrix is derived from the Affymetrix Human Genome U95Av2 GeneChip®. Three multivariate filters based on information theory have also been used for comparison. To show the strengths of the framework, our experiments use three performance measures, two sets of enrichment analyses, and a stability index. The results show that the framework is robust and able to discover a gene signature that achieves a high level of classification performance and is more statistically significantly enriched.
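The abstract does not spell out the framework itself, so the following is only a minimal sketch of the general idea behind information-theoretic multivariate filters: score each gene by its relevance to the class label minus its redundancy with the genes already chosen. It follows the familiar mRMR-style greedy scheme and is not the authors' method. The function names, the toy gene names, and the assumption that expression values are already discretised are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_mrmr(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    features: dict mapping a feature name to its discretised value per sample.
    labels:   class label per sample.
    This is an mRMR-style illustration, not the framework from the paper.
    """
    relevance = {
        name: mutual_information(vals, labels) for name, vals in features.items()
    }
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: 8 samples, 4 classes, expression discretised to 0/1.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 1],  # separates classes {0,1} from {2,3}
    "gene_b": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of gene_a (redundant)
    "gene_c": [0, 0, 1, 1, 0, 0, 1, 1],  # separates classes {0,2} from {1,3}
}
signature = greedy_mrmr(features, labels, k=2)
print(signature)
```

In this toy case all three genes are equally relevant on their own, but after `gene_a` is chosen its duplicate `gene_b` is fully redundant, so the complementary `gene_c` is picked second. Real microarray data would first need a discretisation step, which this sketch leaves out.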
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lai, HM., Albrecht, A., Steinhöfel, K. (2015). Robust Signature Discovery for Affymetrix GeneChip® Cancer Classification. In: Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Agents and Artificial Intelligence. ICAART 2014. Lecture Notes in Computer Science, vol 8946. Springer, Cham. https://doi.org/10.1007/978-3-319-25210-0_20
DOI: https://doi.org/10.1007/978-3-319-25210-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25209-4
Online ISBN: 978-3-319-25210-0
eBook Packages: Computer Science, Computer Science (R0)