DOI: 10.1145/3307339.3342165
poster
Open access

Scalable Statistical Introgression Mapping Using Approximate Coalescent-Based Inference

Published: 04 September 2019

Abstract

Recent advances in biomolecular sequencing have revealed the important role that interspecific gene flow has played in genome evolution throughout the Tree of Life. Current and future genomic studies will bring large amounts of genomic sequence to bear upon this topic, and scalable computational methodologies are needed to detect and analyze genomic signatures of interspecific introgression in large-scale datasets.
To address this methodological gap, we introduce a new computational framework known as PHiMM (or "fast PhyloNet + Hidden Markov Model"). PHiMM pairs inference and learning under a joint model of genetic drift, substitutions, recombination, and gene flow with a coalescent-based approximation technique. We compare the performance of PHiMM against the state of the art using synthetic and empirical genomic sequence data. We find that PHiMM improves computational runtime and main memory usage by multiple orders of magnitude while returning comparable inference accuracy.
An open-source software implementation of the PHiMM framework and open data are publicly available at https://gitlab.msu.edu/liulab/phimm-dataset.
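
As a rough illustration of the hidden Markov model component mentioned in the abstract, the sketch below runs forward-backward posterior decoding on a toy two-state HMM that labels each site as "species tree" or "introgressed". It is not the PHiMM implementation: the function name `posterior_decode`, the two-state layout, the binary site observations, and every probability value are assumptions chosen for illustration. In a coalescent-based method the emission and transition terms would instead come from the substitution and coalescent models.

```python
from typing import List, Sequence


def posterior_decode(
    obs: Sequence[int],
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Per-site posterior state probabilities via scaled forward-backward.

    obs   : observed symbol index at each site (hypothetical encoding)
    init  : initial state distribution
    trans : trans[r][s] = P(next state s | current state r)
    emit  : emit[s][o]  = P(symbol o | state s)
    """
    if not obs:
        raise ValueError("obs must contain at least one site")
    k, n = len(init), len(obs)

    # Forward pass; each column is rescaled to sum to 1 to avoid underflow.
    fwd: List[List[float]] = []
    scales: List[float] = []
    cur = [init[s] * emit[s][obs[0]] for s in range(k)]
    for t in range(n):
        if t > 0:
            prev = fwd[-1]
            cur = [
                emit[s][obs[t]] * sum(prev[r] * trans[r][s] for r in range(k))
                for s in range(k)
            ]
        c = sum(cur)
        fwd.append([p / c for p in cur])
        scales.append(c)

    # Backward pass, reusing the forward scaling factors.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in range(k):
            bwd[t][s] = sum(
                trans[s][r] * emit[r][obs[t + 1]] * bwd[t + 1][r]
                for r in range(k)
            ) / scales[t + 1]

    # Combine and renormalise to obtain P(state at site t | all observations).
    post: List[List[float]] = []
    for t in range(n):
        row = [fwd[t][s] * bwd[t][s] for s in range(k)]
        z = sum(row)
        post.append([x / z for x in row])
    return post


if __name__ == "__main__":
    # Toy parameters (assumed values). State 0 = species tree, 1 = introgressed.
    init = [0.9, 0.1]
    trans = [[0.95, 0.05], [0.05, 0.95]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    # Symbol 1 marks a site pattern that favours the introgressed history.
    obs = [0] * 5 + [1] * 6 + [0] * 5
    for site, row in enumerate(posterior_decode(obs, init, trans, emit)):
        print(site, round(row[1], 3))
```

Sites whose posterior probability for the introgressed state exceeds a chosen threshold would be reported as candidate introgressed tracts. The threshold, like the parameters above, is left to the user in this sketch.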

Cited By

  • (2024) Selection Shapes the Genomic Landscape of Introgressed Ancestry in a Pair of Sympatric Sea Urchin Species. Genome Biology and Evolution 16:6. DOI: 10.1093/gbe/evae124. Online publication date: 14-Jun-2024
  • (2021) Build a better bootstrap and the RAWR shall beat a random path to your door: phylogenetic support estimation revisited. Bioinformatics 37:Supplement_1 (i111-i119). DOI: 10.1093/bioinformatics/btab263. Online publication date: 12-Jul-2021

Published In

BCB '19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
September 2019
716 pages
ISBN: 9781450366663
DOI: 10.1145/3307339
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. butterfly
  2. coalescent
  3. gene flow
  4. hidden Markov model
  5. introgression
  6. mouse
  7. non-parametric resampling
  8. phylogenetic network

Qualifiers

  • Poster

Conference

BCB '19
Acceptance Rates

BCB '19 Paper Acceptance Rate 42 of 157 submissions, 27%;
Overall Acceptance Rate 254 of 885 submissions, 29%
