Abstract
Deep learning algorithms have achieved strong results in image classification, speech recognition, and video processing. Visualizing metagenomic data remains a challenge because the data are complex and high-dimensional. In this paper, we introduce several approaches based on dimensionality reduction algorithms and data density to visualize features that reflect species abundance. The methods used in this study are unsupervised: they perform dimensionality reduction and map the data into a 2-dimensional space. Deep learning techniques are then applied to the resulting visualizations to improve prediction performance for colorectal cancer. Experiments on five metagenome-based colorectal cancer datasets from Chinese, Austrian, American, German, and French cohorts show that the proposed visualizations reveal biomedical signatures and improve prediction performance compared to classical machine learning.
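The idea of mapping abundance features into a 2-dimensional space and turning each sample into an image can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the chapter: the abundance matrix is synthetic, PCA (computed with a NumPy SVD) stands in for the manifold learning methods, and the function names `embed_features_2d` and `sample_to_image`, the 24-pixel image size, and the rule for resolving pixel collisions are all hypothetical.

```python
import numpy as np


def embed_features_2d(abundance):
    """Place each feature (species) at a 2-D coordinate using PCA.

    PCA is only a stand-in here; any reducer that returns one
    2-D point per feature could be substituted.
    """
    points = abundance.T                      # one row per feature
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal axes; keep the first two.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def sample_to_image(coords, sample, size=24):
    """Rasterize one sample: each feature's pixel holds its abundance."""
    span = np.ptp(coords, axis=0)
    span[span == 0] = 1.0                     # avoid division by zero
    scaled = (coords - coords.min(axis=0)) / span
    pixels = np.minimum((scaled * size).astype(int), size - 1)
    image = np.zeros((size, size))
    # Assumption: features sharing a pixel keep the largest abundance.
    np.maximum.at(image, (pixels[:, 1], pixels[:, 0]), sample)
    return image


rng = np.random.default_rng(0)
# Hypothetical data: 40 samples x 100 species, each row sums to 1.
abundance = rng.dirichlet(np.ones(100), size=40)

coords = embed_features_2d(abundance)
images = np.stack([sample_to_image(coords, row) for row in abundance])
print(coords.shape, images.shape)
```

Every sample shares the same feature layout, so the resulting stack of images has a fixed shape and could be passed to a convolutional classifier. Only the pixel intensities differ between samples.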
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, T.H., Nguyen, TN. (2019). Disease Prediction Using Metagenomic Data Visualizations Based on Manifold Learning and Convolutional Neural Network. In: Dang, T., Küng, J., Takizawa, M., Bui, S. (eds) Future Data and Security Engineering. FDSE 2019. Lecture Notes in Computer Science, vol 11814. Springer, Cham. https://doi.org/10.1007/978-3-030-35653-8_9
DOI: https://doi.org/10.1007/978-3-030-35653-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35652-1
Online ISBN: 978-3-030-35653-8
eBook Packages: Computer Science, Computer Science (R0)