Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3457682.3457704acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicmlcConference Proceedingsconference-collections
research-article
Open access

Applying Neural Network to Reconstruction of Phylogenetic Tree

Published: 21 June 2021 Publication History

Abstract

Reconstruction of phylogenetic tree from biological sequences is a fundamental step in molecular biology, but it is computationally exhausting. Our goal is to use neural network to learn the heuristic strategy of phylogenetic tree reconstruction algorithm. We propose an attention model to learn heuristic strategies for constructing circular ordering related to phylogenetic trees. We use alignment-free K-mer frequency vector representation to represent biological sequences and use unlabeled sequence data sets to train attention model through reinforcement learning. Comparing with traditional methods, our approach is alignment-free and can be easily extended to large-scale data with computational efficiency. With the rapid growth of public biological sequence data, our method provides a potential way to reconstruct phylogenetic tree.

References

[1]
Foulds, L. R., & Graham, R. L. (1982). The steiner problem in phylogeny is NP-complete. Advances in Applied Mathematics, 3(1). https://doi.org/10.1016/S0196-8858(82)80004-3
[2]
Yang, Z. (2014). Molecular evolution: a statistical approach. Oxford University Press.
[3]
Semple, C., & Steel, M. (2004). Circular permutations and evolutionary trees. Advances in Applied Mathematics, 32(4). https://doi.org/10.1016/S0196-8858(03)00098-8
[4]
Yang, Z., & Rannala, B. (2012). Molecular phylogenetics: Principles and practice. In Nature Reviews Genetics (Vol. 13, Issue 5). https://doi.org/10.1038/nrg3186
[5]
Price, M. N., Dehal, P. S., & Arkin, A. P. (2010). FastTree 2 - Approximately maximum-likelihood trees for large alignments. PLoS ONE, 5(3). https://doi.org/10.1371/journal.pone.0009490
[6]
Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9). https://doi.org/10.1093/bioinformatics/btu033
[7]
Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M. A., & Huelsenbeck, J. P. (2012). Mrbayes 3.2: Efficient bayesian phylogenetic inference and model choice across a large model space. Systematic Biology, 61(3). https://doi.org/10.1093/sysbio/sys029
[8]
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C. H., Xie, D., Suchard, M. A., Rambaut, A., & Drummond, A. J. (2014). BEAST 2: A Software Platform for Bayesian Evolutionary Analysis. PLoS Computational Biology, 10(4). https://doi.org/10.1371/journal.pcbi.1003537
[9]
Makarenkov, V., & Leclerc, B. (1997). Circular orders of tree metrics, and their uses for the reconstruction and fitting of phylogenetic trees. https://doi.org/10.1090/dimacs/037/11
[10]
Saitou, N., & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4). https://doi.org/10.1093/oxfordjournals.molbev.a040454
[11]
Sneath, P. H. A., & Sokal, R. R. (1973). Unweighted pair group method with arithmetic mean. Numerical Taxonomy, 230-234.
[12]
Desper, R., & Gascuel, O. (2005). The minimum evolution distance-based approach to phylogenetic inference. Mathematics of Evolution and Phylogeny, 1.
[13]
Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5). https://doi.org/10.1093/nar/gkh340
[14]
Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2019). Neural combinatorial optimization with reinforcement learning. 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings.
[15]
Kool, W., Van Hoof, H., & Welling, M. (2019). Attention, learn to solve routing problems! 7th International Conference on Learning Representations, ICLR 2019.
[16]
Dai, H., Khalil, E. B., Zhang, Y., Dilkina, B., & Song, L. (2017). Learning combinatorial optimization algorithms over graphs. Advances in Neural Information Processing Systems, 2017-December.
[17]
Mittal, A., Dhawan, A., Manchanda, S., Medya, S., Ranu, S., & Singh, A. (2019). Learning heuristics over large graphs via deep reinforcement learning. arXiv preprint arXiv:1903.03332.
[18]
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
[19]
Korostensky, C., & Gonnet, G. H. (2000). Using traveling salesman problem algorithms for evolutionary tree construction. Bioinformatics, 16(7). https://doi.org/10.1093/bioinformatics/16.7.619
[20]
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. https://doi.org/10.3115/v1/d14-1179
[21]
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 2017-December.
[22]
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December. https://doi.org/10.1109/CVPR.2016.90
[23]
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. 32nd International Conference on Machine Learning, ICML 2015, 1.
[24]
Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings.
[25]
Yushmanov, S. V. (1984). Construction of a tree with p leaves from 2p-3 elements of its distance matrix. Matematicheskie Zametki, 35, 877-887.
[26]
DeSantis, T. Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E. L., Keller, K., Huber, T., Dalevi, D., Hu, P., & Andersen, G. L. (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology, 72(7). https://doi.org/10.1128/AEM.03006-05
[27]
Robinson, D. F., & Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical biosciences, 53(1-2), 131-147.
[28]
Sukumaran, J., & Holder, M. T. (2010). DendroPy: A Python library for phylogenetic computing. Bioinformatics, 26(12). https://doi.org/10.1093/bioinformatics/btq228

Cited By

View all
  • (2024)The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement LearningMolecular Biology and Evolution10.1093/molbev/msae10541:6Online publication date: 3-Jun-2024
  • (2023)Constructing phylogenetic networks via cherry picking and machine learningAlgorithms for Molecular Biology10.1186/s13015-023-00233-318:1Online publication date: 16-Sep-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing
February 2021
601 pages
ISBN:9781450389310
DOI:10.1145/3457682
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2021

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Attention Model
  2. Phylogenetic Tree
  3. Reinforcement Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2021

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)571
  • Downloads (Last 6 weeks)62
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement LearningMolecular Biology and Evolution10.1093/molbev/msae10541:6Online publication date: 3-Jun-2024
  • (2023)Constructing phylogenetic networks via cherry picking and machine learningAlgorithms for Molecular Biology10.1186/s13015-023-00233-318:1Online publication date: 16-Sep-2023

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media