Abstract
Using available phylogeographical data of 3585 SARS-CoV-2 genomes, we attempt to provide a global picture of the virus's dynamics in terms of directly interpretable parameters. To this end, we fit a hidden-state multistate speciation and extinction model to a pre-estimated phylogenetic tree with information on the place of sampling of each strain. We find that even with such coarse-grained data the dominating transition rates exhibit weak similarities with the most popular, continent-level aggregated, airline passenger flight routes.
Availability of Data and Materials
The R scripts, RevBayes scripts, and MCMC chains, along with the phylogenetic tree used, the geographical classification, and the within- and between-region air passenger volume fractions, are available at https://github.com/KHDS-mod/COVID-19-HiSSE and https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-185867. An already constructed phylogenetic tree and strain (i.e. leaf) data were downloaded from NextStrain (https://nextstrain.org/ncov/global) on 26\(^{\textrm{th}}\) April 2020. This data set contains 3585 genomes sampled between December 2019 and April 2020. A full acknowledgments table of the research groups and authors worldwide who generated the sequence data, from which NextStrain's phylogenetic tree is constructed, is provided in the nextstrain_ncov_global_authors.tsv file in the COVID-19-HiSSE repository. The geographic distribution of COVID-19 case fatalities worldwide (presented in Tab. 1) was downloaded from the European Centre for Disease Prevention and Control (ECDC, https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide) on 11\(^{\textrm{th}}\) May 2020. We took the subset of the case fatalities for 26\(^{\textrm{th}}\) April 2020, corresponding to NextStrain's sequences. The region of North America includes the following countries: Canada, Mexico, Panama, USA. The region of South America includes the following countries: Brazil, Chile, Colombia, Ecuador, Peru, Uruguay. The 5 deaths from Georgia were subtracted from Europe and added to Asia, because Georgia is classified as Asia in the NextStrain data. In addition, there are 7 deaths not classified in any of the regions by ECDC. These are labelled as "Cases on an international conveyance Japan" and seem to correspond to deaths on cruise ships. We excluded these completely. The air passenger data were obtained through the commercial provider SABRE [18]. Data are consolidated for the years 2019 and 2020.
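The regional bookkeeping described above can be illustrated with a minimal Python sketch. It is not the authors' code (their scripts are in R and RevBayes in the repository above); the function name `aggregate_deaths`, the mapping `REGION_OF`, and the set `EXCLUDED` are hypothetical names introduced here. Only the country lists for North and South America, the assignment of Georgia to Asia, and the excluded cruise-ship label are taken from the text; the other regions are omitted.

```python
from collections import defaultdict

# Country-to-region mapping. Only the assignments stated in the text are
# listed; a full analysis would need every country in the data set.
REGION_OF = {
    "Canada": "North America", "Mexico": "North America",
    "Panama": "North America", "USA": "North America",
    "Brazil": "South America", "Chile": "South America",
    "Colombia": "South America", "Ecuador": "South America",
    "Peru": "South America", "Uruguay": "South America",
    "Georgia": "Asia",  # follows NextStrain's classification, not Europe
}

# ECDC entries that belong to no region and are dropped entirely.
EXCLUDED = {"Cases on an international conveyance Japan"}


def aggregate_deaths(deaths_by_country, region_of=REGION_OF, excluded=EXCLUDED):
    """Sum per-country death counts into per-region totals."""
    totals = defaultdict(int)
    for country, deaths in deaths_by_country.items():
        if country in excluded:
            continue  # cruise-ship deaths are not assigned to any region
        if country not in region_of:
            raise KeyError(f"no region assigned to {country!r}")
        totals[region_of[country]] += deaths
    return dict(totals)


if __name__ == "__main__":
    # The counts for Canada and Brazil are placeholders, not ECDC figures;
    # the 5 (Georgia) and 7 (conveyance) values are the ones quoted in the text.
    example = {
        "Georgia": 5,
        "Cases on an international conveyance Japan": 7,
        "Canada": 10,
        "Brazil": 20,
    }
    print(aggregate_deaths(example))
```

Assigning Georgia to Asia in the mapping has the same effect as subtracting its deaths from Europe and adding them to Asia after the fact, and raising an error on unmapped countries avoids silently dropping data.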
References
Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Parzen, E., Tanabe, K., Kitagawa, G. (eds.) Selected Papers of Hirotugu Akaike. Springer Series in Statistics, pp. 199–213. Springer, New York (1998). https://doi.org/10.1007/978-1-4612-1694-0_15
Beaulieu, J.M., O’Meara, B.C.: Detecting hidden diversification shifts in models of trait-dependent speciation and extinction. Syst. Biol. 65(4), 583–601 (2016). https://doi.org/10.1093/sysbio/syw022
Cole, D.J.: Parameter redundancy and identifiability in hidden Markov models. METRON 77, 105–118 (2019). https://doi.org/10.1007/s40300-019-00156-3
FitzJohn, R.G.: Diversitree: comparative phylogenetic analyses of diversification in R. Methods Ecol. Evol. 3, 1084–1092 (2012). https://doi.org/10.1111/j.2041-210X.2012.00234.x
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn, pp. 296–297. CRC Press, Boca Raton (2004)
Geoghegan, J.L., et al.: Genomic epidemiology reveals transmission patterns and dynamics of SARS-CoV-2 in Aotearoa New Zealand. Nature Commun. 11(1), 6351 (2020). https://doi.org/10.1038/s41467-020-20235-8, https://www.nature.com/articles/s41467-020-20235-8
Hadfield, J., et al.: Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34(23), 4121–4123 (2018). https://doi.org/10.1093/bioinformatics/bty407
Höhna, S., et al.: RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Syst. Biol. 65(4), 726–736 (2016). https://doi.org/10.1093/sysbio/syw021
Kermack, W.O., McKendrick, A.G., Walker, G.T.: A contribution to the mathematical theory of epidemics. Proc. Roy. Soc. Lond. Ser. A Containing Papers Math. Phys. Character 115(772), 700–721 (1927). https://doi.org/10.1098/rspa.1927.0118
Lemieux, J.E., et al.: Phylogenetic analysis of SARS-CoV-2 in Boston highlights the impact of superspreading events. Science 371(6529) (2021). https://doi.org/10.1126/science.abe3261, https://science.sciencemag.org/content/371/6529/eabe3261
Newton, M.A., Raftery, A.E.: Approximate Bayesian inference with the weighted likelihood bootstrap. J. Roy. Stat. Soc. Ser. B (Methodol.) 56(1), 3–26 (1994)
Pan, B., et al.: Identification of epidemiological traits by analysis of SARS-CoV-2 sequences. Viruses 13(5), 764 (2021). https://doi.org/10.3390/v13050764
Paradis, E., Schliep, K.: ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019)
Popa, A., et al.: Genomic epidemiology of superspreading events in Austria reveals mutational dynamics and transmission properties of SARS-CoV-2. Sci. Transl. Med. 12(573) (2020). https://doi.org/10.1126/scitranslmed.abe2555, https://stm.sciencemag.org/content/12/573/eabe2555
Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26(7), 1641–1650 (2009). https://doi.org/10.1093/molbev/msp077
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2020). https://www.R-project.org/
Revell, L.J.: phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012)
SABRE: Sabre market intelligence platform (2020). https://www.sabreairlinesolutions.com/images/uploads/AirVision-Market-Intelligence_GDD_Profile_Sabre.pdf
Sagulenko, P., Puller, V., Neher, R.A.: TreeTime: maximum-likelihood phylodynamic analysis. Virus Evol. 4(1), vex042 (2018). https://doi.org/10.1093/ve/vex042
Schwarz, G.E.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978). https://doi.org/10.1214/aos/1176344136
Sjaarda, C.P., et al.: Phylogenomics reveals viral sources, transmission, and potential superinfection in early-stage COVID-19 patients in Ontario, Canada. Sci. Rep. 11(1) (2021). https://doi.org/10.1038/s41598-021-83355-1, https://www.nature.com/articles/s41598-021-83355-1
Takahashi, S., Greenhouse, B., Rodríguez-Barraquer, I.: Are seroprevalence estimates for severe acute respiratory syndrome coronavirus 2 biased? J. Infect. Dis. 222(11), 1772–1775 (2020). https://doi.org/10.1093/infdis/jiaa523
Yanev, N.M., Stoimenova, V.K., Atanasov, D.V.: Branching stochastic processes as models of Covid-19 epidemic development. arXiv e-prints (2020)
Acknowledgements
We thank Fredrik Ronquist for very valuable comments. K.B.’s research is supported by Vetenskapsrådets Grant 2017–04951 and partially by an ELLIIT Call C grant. H.K.’s research is partially supported by Vetenskapsrådets Grant 2017–04951.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kiang, H.C., Bartoszek, K., Sakowski, S., Iacus, S.M., Vespe, M. (2022). Summarizing Global SARS-CoV-2 Geographical Spread by Phylogenetic Multitype Branching Models. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_14
DOI: https://doi.org/10.1007/978-3-031-20837-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20836-2
Online ISBN: 978-3-031-20837-9
eBook Packages: Computer Science, Computer Science (R0)