Abstract
Placing a new sequence onto an existing phylogenetic tree is increasingly used in downstream applications ranging from microbiome analyses to epidemic tracking. Most such applications deal with noisy data, incomplete references, and model misspecification, all of which make the correct placement uncertain. While recent placement methods can place sequences on ultra-large backbone trees with tens to hundreds of thousands of species, they have mostly ignored the issue of uncertainty. Here, we build on the recently developed distance-based phylogenetic placement methodology and show how the distribution of placements can be estimated for each input sequence. We compare parametric and non-parametric sampling methods and show that non-parametric bootstrapping is far more accurate in estimating uncertainty. Finally, we design a linear algebraic implementation of bootstrapping that makes it faster, and we incorporate the computation of support values as a new feature in the APPLES software.
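To illustrate how non-parametric bootstrapping of query-to-reference distances can be written as a matrix product, the following is a minimal sketch and not the APPLES implementation. It assumes gap-free aligned DNA sequences of equal length and the Jukes-Cantor (JC69) distance, and it uses the nearest reference as a simplified stand-in for the actual least-squares placement step. The function names `bootstrap_jc69_distances` and `nearest_reference_support` are hypothetical.

```python
import numpy as np


def bootstrap_jc69_distances(query, refs, n_replicates=100, seed=0):
    """Return an (n_replicates, n_refs) array of JC69 distances.

    Each row holds the distances from the query to every reference,
    computed on one bootstrap resample of the alignment columns.
    Assumes gap-free sequences of equal length (an illustrative simplification).
    """
    rng = np.random.default_rng(seed)
    q = np.frombuffer(query.encode("ascii"), dtype=np.uint8)
    r = np.array([np.frombuffer(s.encode("ascii"), dtype=np.uint8) for s in refs])
    n_sites = q.size

    # mismatch[i, s] is 1 when reference i differs from the query at site s.
    mismatch = (r != q).astype(np.float64)

    # counts[b, s] is how often site s is drawn in replicate b
    # (sampling n_sites columns with replacement).
    uniform = np.full(n_sites, 1.0 / n_sites)
    counts = rng.multinomial(n_sites, uniform, size=n_replicates)

    # A single matrix product yields the mismatch proportions
    # for all replicates and all references at once.
    p = counts @ mismatch.T / n_sites

    # JC69 correction; saturated pairs (p >= 0.75) are reported as infinity.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p < 0.75, -0.75 * np.log1p(-4.0 * p / 3.0), np.inf)


def nearest_reference_support(distances):
    """Fraction of replicates in which each reference is the closest one.

    This is only a proxy for placement support: a real placement method
    optimizes the position over the branches of the backbone tree.
    """
    winners = distances.argmin(axis=1)
    n_replicates, n_refs = distances.shape
    return np.bincount(winners, minlength=n_refs) / n_replicates


if __name__ == "__main__":
    refs = ["ACGTACGTAC", "ACGTTCGTAA", "TGCATGCATG"]
    d = bootstrap_jc69_distances("ACGTACGTAC", refs, n_replicates=50, seed=1)
    print(nearest_reference_support(d))
```

Under these assumptions, the resampling loop disappears: the per-site mismatch indicators are computed once, and every replicate only contributes a row of site counts to the product.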
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hasan, N.B., Biswas, A., Balaban, M., Mirarab, S., Bayzid, M.S. (2022). Fast and Accurate Branch Support Calculation for Distance-Based Phylogenetic Placements. In: Jin, L., Durand, D. (eds) Comparative Genomics. RECOMB-CG 2022. Lecture Notes in Computer Science(), vol 13234. Springer, Cham. https://doi.org/10.1007/978-3-031-06220-9_3
DOI: https://doi.org/10.1007/978-3-031-06220-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06219-3
Online ISBN: 978-3-031-06220-9
eBook Packages: Computer Science (R0)