research-article

Open access

Applying Neural Network to Reconstruction of Phylogenetic Tree

Authors:

Yunpeng CaiAuthors Info & Claims

ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing

Pages 146 - 152

https://doi.org/10.1145/3457682.3457704

Published: 21 June 2021 Publication History

All formats PDF

Abstract

Reconstruction of phylogenetic tree from biological sequences is a fundamental step in molecular biology, but it is computationally exhausting. Our goal is to use neural network to learn the heuristic strategy of phylogenetic tree reconstruction algorithm. We propose an attention model to learn heuristic strategies for constructing circular ordering related to phylogenetic trees. We use alignment-free K-mer frequency vector representation to represent biological sequences and use unlabeled sequence data sets to train attention model through reinforcement learning. Comparing with traditional methods, our approach is alignment-free and can be easily extended to large-scale data with computational efficiency. With the rapid growth of public biological sequence data, our method provides a potential way to reconstruct phylogenetic tree.

References

[1]

Foulds, L. R., & Graham, R. L. (1982). The steiner problem in phylogeny is NP-complete. Advances in Applied Mathematics, 3(1). https://doi.org/10.1016/S0196-8858(82)80004-3

Digital Library

[2]

Yang, Z. (2014). Molecular evolution: a statistical approach. Oxford University Press.

[3]

Semple, C., & Steel, M. (2004). Circular permutations and evolutionary trees. Advances in Applied Mathematics, 32(4). https://doi.org/10.1016/S0196-8858(03)00098-8

Digital Library

[4]

Yang, Z., & Rannala, B. (2012). Molecular phylogenetics: Principles and practice. In Nature Reviews Genetics (Vol. 13, Issue 5). https://doi.org/10.1038/nrg3186

[5]

Price, M. N., Dehal, P. S., & Arkin, A. P. (2010). FastTree 2 - Approximately maximum-likelihood trees for large alignments. PLoS ONE, 5(3). https://doi.org/10.1371/journal.pone.0009490

[6]

Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9). https://doi.org/10.1093/bioinformatics/btu033

[7]

Ronquist, F., Teslenko, M., Van Der Mark, P., Ayres, D. L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M. A., & Huelsenbeck, J. P. (2012). Mrbayes 3.2: Efficient bayesian phylogenetic inference and model choice across a large model space. Systematic Biology, 61(3). https://doi.org/10.1093/sysbio/sys029

[8]

Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C. H., Xie, D., Suchard, M. A., Rambaut, A., & Drummond, A. J. (2014). BEAST 2: A Software Platform for Bayesian Evolutionary Analysis. PLoS Computational Biology, 10(4). https://doi.org/10.1371/journal.pcbi.1003537

[9]

Makarenkov, V., & Leclerc, B. (1997). Circular orders of tree metrics, and their uses for the reconstruction and fitting of phylogenetic trees. https://doi.org/10.1090/dimacs/037/11

[10]

Saitou, N., & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4). https://doi.org/10.1093/oxfordjournals.molbev.a040454

[11]

Sneath, P. H. A., & Sokal, R. R. (1973). Unweighted pair group method with arithmetic mean. Numerical Taxonomy, 230-234.

[12]

Desper, R., & Gascuel, O. (2005). The minimum evolution distance-based approach to phylogenetic inference. Mathematics of Evolution and Phylogeny, 1.

[13]

Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5). https://doi.org/10.1093/nar/gkh340

[14]

Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2019). Neural combinatorial optimization with reinforcement learning. 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings.

[15]

Kool, W., Van Hoof, H., & Welling, M. (2019). Attention, learn to solve routing problems! 7th International Conference on Learning Representations, ICLR 2019.

[16]

Dai, H., Khalil, E. B., Zhang, Y., Dilkina, B., & Song, L. (2017). Learning combinatorial optimization algorithms over graphs. Advances in Neural Information Processing Systems, 2017-December.

Digital Library

[17]

Mittal, A., Dhawan, A., Manchanda, S., Medya, S., Ranu, S., & Singh, A. (2019). Learning heuristics over large graphs via deep reinforcement learning. arXiv preprint arXiv:1903.03332.

[18]

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

[19]

Korostensky, C., & Gonnet, G. H. (2000). Using traveling salesman problem algorithms for evolutionary tree construction. Bioinformatics, 16(7). https://doi.org/10.1093/bioinformatics/16.7.619

[20]

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. https://doi.org/10.3115/v1/d14-1179

[21]

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 2017-December.

Digital Library

[22]

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December. https://doi.org/10.1109/CVPR.2016.90

[23]

Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. 32nd International Conference on Machine Learning, ICML 2015, 1.

[24]

Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings.

[25]

Yushmanov, S. V. (1984). Construction of a tree with p leaves from 2p-3 elements of its distance matrix. Matematicheskie Zametki, 35, 877-887.

[26]

DeSantis, T. Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E. L., Keller, K., Huber, T., Dalevi, D., Hu, P., & Andersen, G. L. (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology, 72(7). https://doi.org/10.1128/AEM.03006-05

[27]

Robinson, D. F., & Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical biosciences, 53(1-2), 131-147.

[28]

Sukumaran, J., & Holder, M. T. (2010). DendroPy: A Python library for phylogenetic computing. Bioinformatics, 26(12). https://doi.org/10.1093/bioinformatics/btq228

Cited By

Azouri DGranit OAlburquerque MMansour YPupko TMayrose I(2024)The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement LearningMolecular Biology and Evolution10.1093/molbev/msae10541:6Online publication date: 3-Jun-2024
https://doi.org/10.1093/molbev/msae105
Bernardini Gvan Iersel LJulien EStougie L(2023)Constructing phylogenetic networks via cherry picking and machine learningAlgorithms for Molecular Biology10.1186/s13015-023-00233-318:1Online publication date: 16-Sep-2023
https://doi.org/10.1186/s13015-023-00233-3

Recommendations

Clustering based distributed phylogenetic tree construction

Phylogenetic tree construction has received much attention recently due to the availability of vast biological data. In this study, we provide a three step method to build phylogenetic trees. Firstly, a density-based clustering algorithm is used to ...
Reconstruction of large phylogenetic trees: A parallel approach

Reconstruction of phylogenetic trees for very large datasets is a known example of a computationally hard problem. In this paper, we present a parallel computing model for the widely used Multiple Instruction Multiple Data (MIMD) architecture. Following ...
Optimal K-mer Analysis for Alignment Free Phylogenetic Tree
ICBRA '23: Proceedings of the 2023 10th International Conference on Bioinformatics Research and Applications

In bioinformatics, accurate phylogenetic tree construction from a genome sequence is one of the biggest concerns nowadays. Getting the optimal k-mer for this construction is the main concern of this work. In the early day’s alignment-based approach for ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

ICMLC '21: Proceedings of the 2021 13th International Conference on Machine Learning and Computing

February 2021

601 pages

ISBN:9781450389310

DOI:10.1145/3457682

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Conference

ICMLC 2021

ICMLC 2021: 2021 13th International Conference on Machine Learning and Computing

February 26 - March 1, 2021

Shenzhen, China

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
1,530
Total Downloads

Downloads (Last 12 months)571
Downloads (Last 6 weeks)62

Reflects downloads up to 14 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Azouri DGranit OAlburquerque MMansour YPupko TMayrose I(2024)The Tree Reconstruction Game: Phylogenetic Reconstruction Using Reinforcement LearningMolecular Biology and Evolution10.1093/molbev/msae10541:6Online publication date: 3-Jun-2024
https://doi.org/10.1093/molbev/msae105
Bernardini Gvan Iersel LJulien EStougie L(2023)Constructing phylogenetic networks via cherry picking and machine learningAlgorithms for Molecular Biology10.1186/s13015-023-00233-318:1Online publication date: 16-Sep-2023
https://doi.org/10.1186/s13015-023-00233-3

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents