Abstract
Transcription profiling enables researchers to understand the activity of the genes in various experimental conditions; in human genomics, abnormal gene expression is typically correlated with clinical conditions. An important application is the detection of genes which are most involved in the development of tumors, by contrasting normal and tumor cells of the same patient. Several statistical and machine learning techniques have been applied to cancer detection; more recently, deep learning methods have been attempted, but they have typically failed in meeting the same performance as classical algorithms. In this paper, we design a set of deep learning methods that can achieve similar performance as the best machine learning methods thanks to the use of external information or of data augmentation; we demonstrate this result by comparing the performance of new methods against several baselines.
A. Canakoglu and L. Nanni—Co-primary authors.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
References
Agrawal, S., Agrawal, J.: Neural network techniques for cancer prediction: a survey. Procedia Comput. Sci. 60, 769–774 (2015). https://doi.org/10.1016/j.procs.2015.08.234
Ashburner, M., et al.: Gene ontology: tool for the unification of biology. Nature Genet. 25, 25 (2000)
Blagus, R., Lusa, L.: Smote for high-dimensional class-imbalanced data. BMC Bioinform. 14(1), 106 (2013). https://doi.org/10.1186/1471-2105-14-106
Canakoglu, A., et al.: Integrative warehousing of biomolecular information to support complex multi-topic queries for biomedical knowledge discovery. In: BIBE, pp. 1–4. IEEE (2013)
Danaee, P., Ghaeini, R., Hendrix, D.A.: A deep learning approach for cancer detection and relevant gene identification. In: Pacific Symposium on Biocomputing 2017, pp. 219–229. World Scientific (2017)
Furey, T.S., et al.: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10), 906–914 (2000)
Golcuk, G., Tuncel, M.A., Canakoglu, A.: Exploiting ladder networks for gene expression classification. In: Rojas, I., Ortuño, F. (eds.) IWBBIO 2018. LNCS, vol. 10813, pp. 270–278. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78723-7_23
Guyon, I., et al.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1), 389–422 (2002). https://doi.org/10.1023/A:1012487302797
He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009). https://doi.org/10.1109/TKDE.2008.239
Hijazi, H., Chan, C.: A classification framework applied to cancer gene expression profiles. J. Healthc. Eng. 4(2), 255–284 (2013). https://doi.org/10.1260/2040-2295.4.2.255
LeCun, Y., et al.: Deep learning. Nature 521, 436 (2015)
Liu, J., Wang, X., Cheng, Y., Zhang, L.: Tumor gene expression data classification via sample expansion-based deep learning. Oncotarget 8(65), 109646–109660 (2017). https://doi.org/10.18632/oncotarget.22762
Lonsdale, J., et al.: The genotype-tissue expression (GTEx) project. Nat. Genet. 45(6), 580 (2013)
Rahimi, A., Gönen, M.: Discriminating early-and late-stage cancers using multiple kernel learning on gene sets. Bioinformatics 34(13), i412–i421 (2018)
Rasmus, A., et al.: Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, pp. 3546–3554 (2015)
Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
Shen, R., Olshen, A.B., Ladanyi, M.: Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25(22), 2906–2912 (2009). https://doi.org/10.1093/bioinformatics/btp543
Statnikov, A., et al.: A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinform. 9(1), 319 (2008)
Vincent, P., et al.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)
Weinstein, J.N., et al.: The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45(10), 1113–1120 (2013)
Yousefi, S., et al.: Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci. Rep. 7(1), 11707 (2017)
Zhang, L., et al.: Gene expression profiles in normal and cancer cells. Science 276(5316), 1268–1272 (1997). https://doi.org/10.1126/science.276.5316.1268
Zuyderduyn, S.D., et al.: A machine learning approach to finding gene expression signatures of the early developmental stages of squamous cell lung carcinoma. Cancer Res. 66(8 Supplement), 431–432 (2006). http://cancerres.aacrjournals.org/content/66/8_Supplement/431.4
Acknowledgment
This work is supported by the ERC Advanced Grant 693174 (Data-Driven Genomic Computing) and by the Amazon Machine Learning Research Award on Data-driven Machine and Deep Learning for Genomics.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Canakoglu, A., Nanni, L., Sokolovsky, A., Ceri, S. (2020). Designing and Evaluating Deep Learning Models for Cancer Detection on Gene Expression Data. In: Raposo, M., Ribeiro, P., Sério, S., Staiano, A., Ciaramella, A. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2018. Lecture Notes in Computer Science(), vol 11925. Springer, Cham. https://doi.org/10.1007/978-3-030-34585-3_22
Download citation
DOI: https://doi.org/10.1007/978-3-030-34585-3_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34584-6
Online ISBN: 978-3-030-34585-3
eBook Packages: Computer ScienceComputer Science (R0)