Unveiling bat-borne viruses: a comprehensive classification and analysis of virome evolution
Abstract
Background
Bats (Order Chiroptera) are an important reservoir of emerging zoonotic microbes, including viruses of public health concern such as henipaviruses, lyssaviruses, and SARS-related coronaviruses. Despite the continued discovery of new viruses in bat populations, a significant proportion of these viral agents remain uncharacterized, highlighting the imperative for additional research aimed at elucidating their evolutionary relationship and taxonomic classification.
Results
To further explore the viral reservoir hosted by bats, the present study employed next-generation sequencing (NGS) technology to analyze 13,105 swab samples obtained from various locations in China. Analysis of 378 sample pools revealed the presence of 846 vertebrate-associated viruses. Subsequent examination, adhering to the International Committee on Taxonomy of Viruses (ICTV) criteria for virus classification, identified 120 putative novel viral species, comprising a total of 294 viral strains. Phylogenetic analysis of conserved genomic regions indicated that the novel viruses represented a diverse array of viral lineages and branches, some of which displayed close genetic relationships to known human and livestock pathogens, such as poxviruses and pestiviruses.
Conclusions
This study investigates the breadth of DNA and RNA viruses harbored by bats, delineating several novel evolutionary lineages and offering significant contributions to virus taxonomy. Furthermore, the identification of hitherto unknown viruses with relevance to human and livestock health underscores the importance of this study in encouraging infectious disease monitoring and management efforts in both public health and veterinary contexts.
Supplementary Information
The online version contains supplementary material available at 10.1186/s40168-024-01955-1.
Background
Most emerging viruses of public health importance originate in animals, in particular wildlife species [1, 2]. These viruses are often predisposed to cross-species transmission due to their viral characteristics, host ecological traits, and environmental changes that provide pathways for spillover to people [3, 4]. Identifying the animal species most likely to harbor zoonotic viruses, and the viral groups most adept at crossing species boundaries to infect humans, has been a persistent focus of scientific inquiry. Bats (Order Chiroptera) harbor a seemingly disproportionately high diversity of zoonotic viruses, including high-consequence pathogens such as henipaviruses, lyssaviruses, and SARS-related coronaviruses (SARSr-CoVs) [5–11]. Bats occur across Africa, the Americas, Asia, Europe, and Oceania, comprise more than 1450 recognized species to date [12], and have come to symbolize the rich diversity of zoonotic viruses harbored by wild vertebrates. Bat viral diversity is far greater than currently known, with substantial potential for future zoonotic emergence [13].
The emergence of bat-related coronaviruses has highlighted the phenomenon of inter-genus switching. The closest known relatives of both SARS-CoV and SARS-CoV-2 have mainly been identified in horseshoe bats (Rhinolophus spp. of family Rhinolophidae) [14–19]. Analysis of viral diversity in bats provides evidence that SARSr-CoVs are shared among species and genera, resulting in high levels of recombination among viral lineages [20–22]. This is likely enhanced by bat species that cohabit similar ecological niches, including species from the genera Rhinolophus, Hipposideros, and Aselliscus [22, 23]. The emergence of bat CoVs in captive-reared wildlife, game animals, and people is also likely enhanced by increasing contact among wildlife species, captive animals, and human populations, driven by land use change, agricultural expansion, and other environmental changes [4, 24, 25]. In Southeast Asia, close contact between bats, other animals, and people is estimated to lead to a median of over 66,000 individuals becoming infected by SARSr-CoVs each year [26]. This process of ecological changes leading to enhanced inter-species transmission may explain some of the chains of emergence of bat-origin viruses via intermediate hosts, such as Middle East Respiratory Syndrome (MERS) via camels, Nipah virus via pigs, and Hendra virus via horses [27–29].
Despite these developments, the factors driving the cross-species transmission of viruses among animals, as well as their potential to spark outbreaks in humans, remain poorly understood. NGS technology has led to the identification of numerous novel viruses in bats [14, 30–38]. However, the utility of these studies is undermined by difficulty in interpreting a novel virus’ potential for cross-species transmission without further characterization, and by disparities in sampling strategies that may under- or over-estimate the importance of a region, host species, or viral group. A comprehensive exploration of diverse habitats and ecological states among bats may hold greater value than solely focusing on specific locations. To address these challenges, we conducted a large-scale virome study involving 54 species of bats in mainland China. Following earlier analyses of the families Coronaviridae and Paramyxoviridae [21, 22, 39], this study examined other eukaryotic viruses within the sequencing data, particularly those related to vertebrates. The taxonomic categorization of vertebrate viruses frequently relies on distinctions in host specificity, pathogenic attributes, hemagglutination characteristics, whole-genome nucleotide differences, or differences in the amino acid sequences of particular coding regions. Through consultation of ICTV reports and related research, we formulated a proposed classification scheme for the bat-borne viruses described in this investigation. The results provide a deeper understanding of the bat virome, of the evolutionary relationships among bat-associated viruses, and of their potential for cross-species transmission, including to people.
Methods
Sample collection
Between 2016 and 2021, oral and anal swabs were taken from 13,105 bats of 54 species captured in their natural habitats, including karstic caves, woods, forests, and abandoned buildings. We used hand nets, mist nets, and harp traps to capture bats, and recorded location information with place names and GPS coordinates. Oral and anal swab samples from the same bat were stored in the same sample storage tube pre-filled with viral transport medium (Yocon, China). Samples were temporarily stored at −20 °C during transportation to the laboratory, then stored at −80 °C. Bats were identified by experienced field biologists using morphological features, and these identifications were confirmed by DNA barcoding (mitochondrial cytochrome b gene). Based on the species richness and sample sizes recorded for each province, autonomous region, and municipality, the Shannon index was calculated, and the results were visualized using the R package ggplot2 in RStudio. The principal component analysis (PCA) was generated and visualized with TBtools-II [40].
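As an illustration of the Shannon index mentioned above, the following is a minimal sketch in Python (the study itself reports using R). The region names and per-species counts are hypothetical placeholders, not data from this study, and the natural logarithm is assumed because the text does not state the base.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over categories with non-zero counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical per-species sample counts for two regions (illustrative only).
region_counts = {
    "region_A": {"Rhinolophus sinicus": 40, "Myotis ricketti": 40, "Hipposideros armiger": 40},
    "region_B": {"Rhinolophus sinicus": 118, "Myotis ricketti": 1, "Hipposideros armiger": 1},
}

shannon_by_region = {
    region: shannon_index(species_counts.values())
    for region, species_counts in region_counts.items()
}
```

In this toy input, the evenly sampled region scores higher than the region dominated by a single species, which is why the index reflects both richness and evenness of sampling.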
To improve efficiency and reduce the burden of library preparation and sequencing, samples of the same bat species collected from the same sampling location during the same time period were pooled. The number of samples in each pool usually did not exceed 50, with an average of 35 samples per pool. The pooling scheme for this study is also described in previously published studies on coronaviruses and paramyxoviruses [21, 22, 39].
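The pooling rule described above can be sketched as follows. This is an illustrative reconstruction, not the authors' procedure: the record fields (`id`, `species`, `site`, `period`) and the hard cap of 50 are assumptions, since the text says only that pools usually did not exceed 50 samples.

```python
from collections import defaultdict

MAX_POOL_SIZE = 50  # assumed hard cap; the text says pools "usually" stayed at or below 50

def build_pools(samples, max_size=MAX_POOL_SIZE):
    """Group samples by (species, site, period), then split large groups into chunks."""
    groups = defaultdict(list)
    for sample in samples:
        key = (sample["species"], sample["site"], sample["period"])
        groups[key].append(sample["id"])

    pools = []
    for key, ids in sorted(groups.items()):
        for start in range(0, len(ids), max_size):
            pools.append({"key": key, "sample_ids": ids[start:start + max_size]})
    return pools

# Hypothetical sample records (illustrative only).
samples = [
    {"id": f"A{i}", "species": "Rhinolophus sinicus", "site": "cave_1", "period": "2018-06"}
    for i in range(120)
] + [
    {"id": f"B{i}", "species": "Myotis ricketti", "site": "cave_1", "period": "2018-06"}
    for i in range(10)
]

pools = build_pools(samples)
pool_sizes = [len(pool["sample_ids"]) for pool in pools]
```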
Nucleic acid extraction, library preparation, and sequencing
To extract total DNA and RNA from swab samples, we adhered to a previously outlined protocol [30]. Pooled swab samples underwent homogenization and subsequent filtration through a 0.45 µm polyvinylidene difluoride filter (Millipore, Germany). The filtrate was then subjected to centrifugation at 150,000×g for 3 h. The resulting precipitate underwent treatment with a cocktail of DNase and RNase enzymes to eliminate free nucleic acids. Simultaneously, the QIAamp MinElute Virus Spin Kit (Qiagen, Germany) was employed to isolate the viral nucleic acid.
For the synthesis of first-strand cDNA, the SuperScript™ IV First-Strand Synthesis System (Invitrogen, USA) was utilized along with the reverse primer K-8N, a random primer with a 5′-anchor. The conversion of single-stranded cDNA into double-stranded cDNA was accomplished using Klenow fragment (NEB, USA) and primer K-8N. The sequence-independent single-primer amplification (SIA) method involved the use of primer K, and magnetic beads (Beckman Coulter, USA) were used to purify amplification products ranging from 300 to 2000 bp. Sequencing libraries for swab samples were constructed using Nextera DNA Flex (Illumina, USA). Paired-end (2×150 bp) sequencing of the libraries was performed on the Illumina HiSeq X Ten platform.
Viral contigs assembly and annotation
The raw paired-end sequence reads underwent initial quality control procedures, including the removal of adaptor sequences, primer K sequences, and low-quality reads. The resulting clean reads were then de novo assembled into contigs using MEGAHIT [41]. The assembled contigs were compared using DIAMOND [42] against the nonredundant protein database (nr, downloaded in June 2023) obtained from NCBI (https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz). The E-value cut-off was set at 1E-5 to maintain high sensitivity at a low false-positive rate. The daa file obtained from DIAMOND was imported into MEGAN [43] to visualize the taxonomy tree of species, and contigs were extracted separately for viruses, each viral family, and each genus.
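To make the E-value cut-off concrete, the sketch below filters alignment hits and keeps the best hit per contig. It assumes the hits have been exported in the 12-column BLAST-style tabular format, whereas the study used the daa format with MEGAN. The contig and protein identifiers are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # cut-off stated in the text

# Column order of the standard 12-column BLAST-style tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle, cutoff=EVALUE_CUTOFF):
    """Return {contig: (subject, bitscore, evalue)} for the top-scoring hit passing the cut-off."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > cutoff:
            continue  # discard weak hits
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore, evalue)
    return best

# Hypothetical hits (illustrative only).
example = (
    "contig_1\tprot_A\t85.0\t200\t30\t0\t1\t600\t1\t200\t2.1e-30\t310.5\n"
    "contig_1\tprot_B\t60.0\t150\t60\t0\t1\t450\t1\t150\t4.0e-12\t120.0\n"
    "contig_2\tprot_C\t35.0\t80\t52\t0\t1\t240\t1\t80\t1.0e-03\t40.2\n"
)
hits = best_hits(io.StringIO(example))
```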
After extracting the sequences, particular attention was given to the conserved genomic regions of each viral group. Notable targets for this classification included the RNA-dependent RNA polymerase (RdRp) gene for RNA viruses, the reverse transcriptase (RT) gene for retroviruses, and the DNA polymerase (DPOL) gene for DNA viruses. Although certain important viruses may not be readily uncovered through broad-spectrum scanning, targeted local alignment can increase the depth and precision of viral information retrieval. The ORFfinder tool (https://www.ncbi.nlm.nih.gov/orffinder/) was employed to identify open reading frames (ORFs) in viral sequences. By integrating outputs from both ORFfinder and local BLAST alignments performed with DIAMOND, the virus sequences obtained in this investigation were annotated in accordance with reference genomes. However, for genomic RNAs translated through ribosomal frameshifting mechanisms (as in certain genera of Astroviridae, Retroviridae, and various other viral families), manual processing was conducted to ensure accurate annotation. Determination of viral hosts was carried out based on their primary hosts using the ViralZone web resource (www.expasy.org/viralzone/). This study is primarily focused on vertebrate viruses; however, the hosts of certain viruses, such as those of the families Circoviridae and Picobirnaviridae, remain uncertain. Consequently, viruses with unidentified hosts were only briefly investigated alongside insect and plant viruses. Given that sequences affiliated with coronaviruses and paramyxoviruses have been previously curated and published, this study refrains from offering exhaustive descriptions and analyses of redundant data.
Demonstrating the viral diversity harbored by bat populations in China
Viral diversity was assessed using the conserved genomic features described above, and the clustering analysis of viral contigs was performed using CD-HIT version 4.7 [44]. The vertebrate virus dataset obtained in this study was combined with publicly accessible bat virus data, with the aim of providing a comprehensive overview of bat-borne viruses prevalent in China. After combining the data, we partitioned the compiled dataset by province, municipality, and autonomous region, as well as by the genus of the bat hosts. Following this segmentation, a clustering analysis was conducted. The resulting dataset was analyzed using TBtools-II software to create a heatmap for visualization purposes, as well as to conduct principal component analysis and generate an upset plot [40]. PCA was employed to examine the distribution of virus families across provinces.
To delve into the diversity of viruses carried by different genera of bats, this study constructed a relational network graph between bat genera and viral families using Gephi software [45]. During the network construction, we employed the Force Atlas layout algorithm to optimize the graphical presentation, enhancing the visual harmony of the network by adjusting the repulsion strength parameter.
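One simple way to quantify how central a bat genus or viral family is in such a network is normalized degree centrality, sketched below. This is an assumption for illustration: the text does not state which centrality measure was used in Gephi, and the edges listed here are hypothetical.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality: distinct neighbors divided by (number of nodes - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n <= 1:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}

# Hypothetical bat genus / viral family associations (illustrative only).
edges = [
    ("Rhinolophus", "Picornaviridae"),
    ("Rhinolophus", "Astroviridae"),
    ("Rhinolophus", "Parvovirinae"),
    ("Hipposideros", "Picornaviridae"),
    ("Hipposideros", "Parvovirinae"),
    ("Myotis", "Picornaviridae"),
]

centrality = degree_centrality(edges)
ranked = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
```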
Phylogenetic analysis
Prior to constructing the evolutionary trees, classification criteria for the various vertebrate virus families were taken from their respective ICTV reports (https://ictv.global/report/genome, https://ictv.global/report_9th), and gene segments suitable for virus classification were employed for phylogenetic analysis. To determine whether the retroviruses identified in this study are endogenous, having integrated into the host genome, or exogenous, transmitted between hosts through infection, we performed a blastn search of the retroviral sequences against 85 publicly accessible bat genome datasets. In cases where individualized search programs were employed for certain viruses of interest, specific sequence segments were also utilized for analysis. Alignment of viral nucleotide and amino acid sequences was performed using the L-INS-i algorithm implemented in MAFFT [46], and gaps and ambiguously aligned regions were subsequently removed using trimAl [47]. Phylogenetic trees were then estimated using the maximum likelihood (ML) method implemented in IQ-TREE version 2.2.2.7, with SH-aLRT and ultrafast bootstrap (UFBoot) support values calculated from 1000 replicate trees [48, 49]. The best-fit model of substitution was identified using ModelFinder [50]. Pairwise distances (p-distances) of aligned coding sequences (CDS) or deduced polyprotein sequences were calculated with the tools implemented in MEGA X [51]. For the definition of new viruses, this study prioritizes entries from the ICTV reports. In cases where there are no explicit numerical requirements or updated recommendations for the relevant classifications, a rigorous review process was employed to define potential new viral species or genera.
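For reference, the p-distance is the proportion of aligned sites at which two sequences differ. The sketch below is a minimal Python version that ignores sites where either sequence has a gap (pairwise deletion). The gap handling is an assumption, since the text does not state the MEGA X settings used, and the sequences are toy examples.

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # pairwise deletion of gapped or missing sites
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

# Toy aligned sequences: 9 comparable sites, 1 difference.
distance = p_distance("ATGCATGCAT", "ATGAATGC-T")
```

The same function applies to aligned amino acid sequences, because it compares characters position by position.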
Results
Overview of Chiroptera virome
From 2016 to 2021, we collected 13,105 bat swab samples from 54 bat species across 8 families (Fig. 1a) in tropical and subtropical regions, with a comparatively smaller number from temperate areas within China (Fig. S1). These included both the Yinpterochiroptera and Yangochiroptera suborders, with many samples from the families Vespertilionidae, Rhinolophidae, and Hipposideridae. The genera Rhinolophus, Myotis, and Hipposideros constituted the highest proportion of samples (Fig. 1b). Of the 14 sampled provinces, autonomous regions, and municipalities, Guangdong, Guangxi, and Yunnan exhibited the highest Shannon index, suggestive of both high species diversity and abundance (Fig. 1c). The principal component analysis (PCA) (Fig. 1d) was consistent with the Shannon index results, further demonstrating that sample collection patterns in the remaining provinces were relatively uniform.
For viral metagenomic analyses, a total of 4.13 terabytes (TB) of clean data from 378 sequencing libraries were subjected to de novo assembly, resulting in 28 gigabytes (GB) of contigs. Detailed information regarding each library can be found in Table S1. A total of 533 megabytes (MB) of virus-like contigs were extracted from the daa file, comprising 1,342,258 contigs with an average length of 404.48 base pairs. A cumulative sum of 1.37 terabytes (TB) of viral sequences was successfully aligned to virus-like contigs. Each vertebrate-associated virus within each pool is explicitly labeled in Table S1; these viruses belong to the families Adenoviridae, Astroviridae, Caliciviridae, Coronaviridae, Hepeviridae, Orthoherpesviridae, Flaviviridae, Papillomaviridae, Paramyxoviridae, Picornaviridae, Polyomaviridae, Retroviridae, and Sedoreoviridae, and to the subfamilies Parvovirinae and Chordopoxvirinae. Within this spectrum of viruses, the family Coronaviridae was identified at the highest prevalence (approximately 87.04%), followed by the subfamily Parvovirinae at 86.24% and the family Picornaviridae at 80.42%.
Using suitable gene markers for viral classification, an extensive analysis was conducted on the bat-derived virus dataset. As a result, the viruses were categorized into two groups based on their potential hosts: vertebrate viruses (Table S2), and non-vertebrate eukaryotic viruses together with viruses of uncertain host affiliation (Table S3). The most noteworthy subset among the newly identified viruses encompassed 846 vertebrate viruses. This group is of particular interest due to its potential association with disease. In the vertebrate virus dataset, the predominant proportion of virus sequences was attributed to bats. Notably, a subset of sequences originating from samples of Rickett’s big-footed myotis (Myotis ricketti) exhibited similarities to aquatic-associated viruses upon comparative analysis. This observation can be attributed to the feeding behavior of M. ricketti, which has been documented preying on small fish [52]. Furthermore, various viruses whose hosts are difficult to ascertain, alongside invertebrate viruses, plant viruses, and other eukaryotic viruses, probably reflect the predation habits and habitat of bats. The assembled and annotated viral sequence data can be accessed in the GenBank database (Accession numbers OR951005–OR951402, OR952772–OR952931, OR998540–OR998571, OR998733–OR998965, PP744635–PP746020) and in Additional file 3.
Diversity of vertebrate viruses discovered in bats
In the analysis of 378 sequencing libraries, a total of 846 vertebrate virus strains were extracted. By cross-referencing the information with the ICTV report (Table S4), referring to pertinent literature, and computing the p-distance of genes suitable for taxonomic classification, we pinpointed 120 viruses within the vertebrate virus dataset that exhibit characteristics indicative of potential new viral species. Among these newly discovered vertebrate viruses, 61.67% (74/120) were DNA viruses, while 38.33% (46/120) were RNA viruses (Table S2). This subset of novel viruses included 294 strains in total, spanning 16 viral families. In terms of quantity, most of the newly discovered viruses originated from the family Vespertilionidae, followed by the families Hipposideridae and Rhinolophidae. On a per-sample basis, the highest proportion of novel viruses was identified in the host family Emballonuridae, with 0.152 novel viruses per sample, followed by Pteropodidae with 0.035 novel viruses per sample, and Vespertilionidae with 0.022 novel viruses per sample (Fig. 1a and Table S2).
Within DNA viruses, adenoviruses exhibited high diversity and a widespread distribution across bat populations (Fig. 2a, Table S18, and S19). Except for two viral families within the order Reovirales, our study’s outcomes closely align with data from other studies within the merged database, indicating a high level of consistency. PCA was performed separately for the sampling regions and the identified virus species in this study. The variables considered were the diversity values of viruses within their respective virus families, in relation to their geographical location and host species. Distinct clustering was observed among the sampling regions, with Guangdong, Guangxi, Zhejiang, and Yunnan exhibiting clear separation. Similarly, among virus families, Picornaviridae, Papillomaviridae, and subfamily Parvovirinae were notably segregated from other families, as illustrated in Fig. 2b and c. Subsequently, statistical analyses were conducted on the host information for each cluster of viral strains identified in the study. The results revealed the number of clusters shared between different bat families and their respective viral families or subfamilies (Fig. 2d, Table S20). Notably, the genera Rhinolophus and Hipposideros, which were previously classified within the same family, harbored the highest number of viruses, comprising 10 viral clusters. These clusters not only included the widely studied RNA viruses but also small, non-enveloped DNA viruses, making these species critical for viral sharing. The second most significant host pair was Rhinolophus and Myotis (mouse-eared bats), which shared 5 clusters. This result is consistent with the habitat overlap observed among these bat species.
We examined the relationship between the pairwise p-distances of clustered genes and the geographic distances between the sampling sites of the corresponding strains. Most data points followed a specific pattern, showing that combinations with low ratios tended to occur when p-distances were low. This pattern indicates the presence of highly similar viral strains in hosts sampled at distant geographical locations (Fig. S2a). Combinations with a ratio lower than 1 represented the clusters with widespread geographic distribution, with the majority consisting of DNA viruses (18 out of 26 clusters). These clusters included several from the subfamily Parvovirinae and the family Papillomaviridae. RNA viruses (8 out of 26 clusters) were predominantly represented by members of the family Picornaviridae, with only one cluster belonging to Hepeviridae. Combinations with a ratio lower than 1 were marked and displayed on a topographic map, revealing that these clusters are distributed over distances exceeding 1600 km, primarily in plains regions, with relatively fewer occurrences in highland areas (Fig. S2a).
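The text does not specify how distances between sampling sites were computed. One common choice, shown here purely as an assumption, is the great-circle (haversine) distance between GPS coordinates. The coordinates below are approximate city locations used as hypothetical sites, not sampling sites from this study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical site coordinates (approximate locations of Guangzhou and Kunming).
site_a = (23.13, 113.26)
site_b = (25.04, 102.71)
separation_km = haversine_km(*site_a, *site_b)
```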
To gain a better understanding of virus spread patterns among these wild mammals, we created a host-virus association network. Network analysis shows that the genera Hipposideros and Rhinolophus occupy central positions, suggesting significant roles in virus transmission. The bat genera Taphozous, Myotis, Tylonycteris, Scotophilus, Rousettus, and Miniopterus exhibit progressively lower centrality in the network. Picornaviridae, Parvovirinae, and Astroviridae have the highest centrality, followed by the families Caliciviridae, Adenoviridae, Papillomaviridae, and Polyomaviridae (Fig. 3a). Statistical data on the viral loads in bats of varying genera, extrapolated from amino acid levels in conserved viral regions, were collated to generate a comprehensive overview of the spectrum of viruses harbored by bats across diverse regions in China. These data were visually represented to elucidate the viral diversity within different bat genera (Fig. 3a). Notably, bats inhabiting the southeastern and southwestern regions of China demonstrated the highest viral species diversity, in contrast to the relatively low diversity observed in the northern regions and Qinghai province, which may be attributed to factors such as sample size variations or ecological distribution disparities. Intermediate levels of viral diversity are evident in bats from the central and western regions. Analysis of virus carriage by distinct bat genera revealed that members of the genus Rhinolophus exhibit the most extensive viral diversity across most regions in China, followed by the genera Myotis and Pipistrellus. According to the data provided, the genera in our sample have the representation in terms of sample size.
Evolutionary relationships of vertebrate viruses discovered in bats
Among the viral families identified in this study, Picornaviridae and Caliciviridae exhibited broad distribution across multiple bat genera, while viruses in the family Papillomaviridae were limited to fewer host species (Fig. 4). Interestingly, while previous studies have identified bat-associated viruses across all four genera of the family Flaviviridae, this study only detected members of the genus Pestivirus in our samples.
For other viral families, such as Adenoviridae, Astroviridae, and Hepeviridae, bat-borne viruses displayed host-specific clustering, with the bat-associated Hepeviridae viruses belonging to the genus Chirohepevirus. Novel strains were also identified in the family Polyomaviridae, where viruses were found in multiple genera, including a new branch with recently discovered strains. Within the family Orthoherpesviridae, bat-borne viruses are exclusively found in the subfamilies Betaherpesvirinae and Gammaherpesvirinae, with no bat viruses identified in the subfamily Alphaherpesvirinae. In the case of the family Retroviridae, we detected both endogenous and exogenous viruses. For example, the two Betaretroviruses carried by the lesser dawn bat (Eonycteris spelaea) were likely endogenous, as confirmed by their alignment with bat genomes (Table S5), whereas other retroviruses are likely exogenous.
Notably, we identified novel bat poxviruses in Rhinolophus spp. and other small bats. Although there has been one prior report of a bat poxvirus in China, specifically in eastern bent-winged bat (Miniopterus fuliginosus), where a small segment of the A2L gene was identified (GenBank accession no. KJ651959.1), our study assembled and amplified seven poxvirus contigs from a Chinese rufous horseshoe bat (Rhinolophus sinicus), with a total length of 152 kb. The evolutionary analysis of this virus revealed significant divergence from known poxviruses, with only 60% sequence identity to the closest relative, molluscum contagiosum virus (Molluscipoxvirus molluscum) of the genus Molluscipoxvirus, which can infect humans, suggesting it may represent a new species. The evolutionary relationship of the newly discovered poxvirus was investigated through specific analysis of the gB gene (Fig. 5a). Subsequently, the amino acid sequences of 25 conserved genes were concatenated to construct an evolutionary tree for Rhinolophus bat poxvirus and reference sequences (Fig. 5b). Within the subfamily Chordopoxvirinae, the phylogenetic relatedness of Rhinolophus bat poxvirus is notably similar to molluscum contagiosum virus. Conversely, the bat poxvirus detected in the Americas and Australia displays limited genetic similarity with the strain identified in Asia. Analysis of the major antigenic protein gB revealed distinct antigenicity between these two bat-borne poxviruses. The bat poxvirus identified in this study could potentially signify a new species within the family Poxviridae based on its evolutionary relationship and genetic divergence.
In addition to the rare detection of poxviruses, a nucleotide segment from a great leaf-nosed bat (Hipposideros armiger)-borne rubella-related virus (Fig. S3d) was also extracted. This sequence may represent a new host for the family Matonaviridae after humans, domestic donkey (Equus asinus), cyclops leaf-nosed bat (Doryrhina cyclops), and yellow-necked field mouse (Apodemus flavicollis) [53]. Importantly, both H. armiger and D. cyclops belong to the family Hipposideridae.
Aquatic-associated viruses were also detected (Fig. S3a, b, c). Two fish astrovirus strains, two Salovirus strains (belonging to the family Caliciviridae), and two Secondpapillomavirinae strains were identified in M. ricketti, which has an unusual piscivorous diet [52]. Fish papillomavirus has been rarely detected in China, and the L1 protein sequences of these two viruses shared less than 40% similarity with all known viruses in this group.
Identification of new virus members
The vertebrate viruses examined in this study include members of recognized clades within their respective viral families as well as novel clades not previously documented in the literature. A thorough review was therefore conducted to determine whether each identified virus should be classified as a novel taxon. Our study reported novel viruses in the genera Mastadenovirus, Mamastrovirus, Sapovirus, Pestivirus, Roseolovirus, Amdoparvovirus, Bocaparvovirus, Dependoparvovirus, Protoparvovirus, Hepatovirus, Rosavirus, Teschovirus, Alphapolyomavirus, Betapolyomavirus, and Tupavirus as well as certain unclassified virus members such as bat poxvirus, bat papillomavirus, and bat polyomavirus (Table S2).
In the family Adenoviridae, bat-borne viruses belong to the genus Mastadenovirus, and adenoviruses carried by different bat families often form distinct host-related branches on the evolutionary tree. Given the limited availability of pathogenic and coagulation data, identifying a novel Mastadenovirus species in bats requires a focus on the p-distance of the DNA polymerase gene (DPOL), for which a threshold of 0.1 to 0.15 is typically applied (Table S4, S6). Consequently, the upper limit of this range was adopted, and values surpassing 0.15 were classified as indicative of novel members within the genus Mastadenovirus.
The taxonomy of Astroviridae resembles that of Adenoviridae in that mammalian viruses form the largest genus of the family, here Mamastrovirus. ICTV has outlined classification guidelines for Mamastrovirus, predominantly centered on host specificity. Therefore, we adopted a p-distance greater than 0.2 for defining new members with a high similarity to RNA viruses, which is a relatively high threshold for the conserved RdRp region (Table S4, S7). Bat caliciviruses belong to the genera Sapovirus and Norovirus, for which no numerical classification criteria are available. Hence, novel bat caliciviruses were defined by a p-distance greater than 0.2 for the RdRp region (Table S4, S8).
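The thresholds adopted above for adenoviruses, astroviruses, and caliciviruses can be summarized in a small lookup, sketched below. The function name, dictionary layout, and example distances are hypothetical; only the marker genes and cut-off values come from the text.

```python
# Marker gene and p-distance cut-off per viral group, as stated in the text.
NOVELTY_THRESHOLDS = {
    "Mastadenovirus": ("DPOL", 0.15),
    "Mamastrovirus": ("RdRp", 0.20),
    "Caliciviridae": ("RdRp", 0.20),
}

def is_candidate_novel(group, min_p_distance_to_known):
    """True if the smallest p-distance to any known member exceeds the group's cut-off."""
    _gene, threshold = NOVELTY_THRESHOLDS[group]
    return min_p_distance_to_known > threshold

# Hypothetical distances to the nearest known relative (illustrative only).
examples = {
    ("Mastadenovirus", 0.16): is_candidate_novel("Mastadenovirus", 0.16),
    ("Mastadenovirus", 0.12): is_candidate_novel("Mastadenovirus", 0.12),
    ("Mamastrovirus", 0.25): is_candidate_novel("Mamastrovirus", 0.25),
}
```

A value passing such a cut-off marks only a candidate; the study describes additional review against ICTV reports and the literature before a virus is proposed as novel.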
In the family Flaviviridae, the viruses identified in this study belong to the genus Pestivirus. Pestiviruses harbored by bat species are found across various clades within the genus, as illustrated in Fig. 5c. The pestivirus found in the Japanese house bat (Pipistrellus abramus) (Pipistrellus bat pestivirus, PiPeV) not only closely resembles Pestivirus L (Linda virus, LindaV), which was implicated in an outbreak of unknown source on an Austrian pig farm [54], but also shares close genetic similarity with a pestivirus strain (GenBank accession no. PP663643) identified in swine serum from Sichuan province, China (Fig. 5c). Additionally, viruses from the greater horseshoe bat (Rhinolophus ferrumequinum) are closely related to the atypical porcine pestivirus (Pestivirus scrofae, APPeV). Similarly, Pestivirus S, found in the lesser Asiatic yellow house bat (Scotophilus kuhlii), closely resembles Zikole virus (GenBank accession no. OU592965), which was discovered in cattle blood in Uganda [55]. The classification of pestiviruses requires consideration of several factors, including host specificity, evolutionary relationships, and antigenic properties. In virome research, priority is given to evolutionary relationships and host associations. Notably, studies have shown that the conserved NS5B amino acid sequence (the RdRp region) exhibits a p-distance of less than 0.13 among members of the same species and greater than 0.11 between members of different species; cases falling within the range of 0.11 to 0.13 require a thorough examination of multiple factors, notably the host specificity of the virus [56]. The p-distance among PiPeV strains is less than 0.028, whereas their p-distance to the nearest known relative, LindaV, exceeds 0.155. The PiPeV branch also contains a strain from swine, which indicates that this branch is not host-specific.
Notably, Pestivirus S and Zikole virus display a minimum p-distance of 0.12 in their NS5B amino acid sequences. Furthermore, the p-distance at the amino acid level of the polyprotein is consistently below 0.165. Their taxonomic placement within the pestivirus classification system therefore requires more in-depth analysis, especially considering the substantial geographic separation between the continents where they occur. The novel pestiviruses identified from R. ferrumequinum and the Pomona leaf-nosed bat (Hipposideros pomona) fulfill the criteria for defining a novel member, whether judged by host or by genetic divergence (Table S4, S9).
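The overlapping NS5B ranges cited for pestiviruses (below 0.13 within a species, above 0.11 between species) imply a three-way decision, which can be sketched as follows. The function name and the treatment of the exact boundary values as ambiguous are assumptions of this example; in practice the ambiguous zone is resolved with host and other evidence rather than by code.

```python
# A hedged sketch of the three-zone rule; names and boundary handling are assumptions.

INTERSPECIFIC_MIN = 0.11  # between-species NS5B p-distances exceed this value
CONSPECIFIC_MAX = 0.13    # within-species NS5B p-distances stay below this value


def pestivirus_call(ns5b_p_distance: float) -> str:
    """Classify an NS5B amino acid p-distance into one of three zones."""
    if not 0.0 <= ns5b_p_distance <= 1.0:
        raise ValueError("a p-distance must lie between 0 and 1")
    if ns5b_p_distance < INTERSPECIFIC_MIN:
        return "same species"
    if ns5b_p_distance > CONSPECIFIC_MAX:
        return "distinct species"
    # 0.11 to 0.13 inclusive: genetic distance alone cannot decide.
    return "ambiguous"
```

Applied to the bounds reported in the text, 0.028 (among PiPeV strains) gives "same species", 0.155 (PiPeV versus LindaV) gives "distinct species", and 0.12 (Pestivirus S versus Zikole virus) gives "ambiguous", which matches the need for further analysis noted above.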
When categorizing Orthoherpesviridae, we found that existing guidelines lack precise criteria (Table S4). Consequently, a statistical examination was conducted on reference sequences from the three subfamilies within this viral family. The specimens examined in this study belong to the subfamilies Betaherpesvirinae and Gammaherpesvirinae, which display median p-distances of 0.383 and 0.723, respectively (Table S10). Therefore, we jointly evaluated host novelty and the p-distance of DPOL. Notably, Novel species 1 and 4 are clearly distinct from established sequences, while Novel species 2 and 3 add new hosts to existing branches (Table S11).
The taxonomic classification of members within the family Papillomaviridae necessitates elucidating their phylogenetic relationships through comprehensive analysis of whole-genome sequences. In the present investigation, a concatenated analysis was performed utilizing reference sequences along with the E1, E2, L2, and L1 genes of the study strain in conjunction with complete genome sequences (Fig. 6a and Table S12). The Papillomaviridae members identified in this study predominantly fall within various virus genera associated with bat-borne viruses, specifically Dyopsipapillomavirus, Treisdeltapapillomavirus, and Dyotaupapillomavirus, in addition to several unassigned clades. Through statistical analysis, it was determined that the median genetic distances between and within genera are 0.529 and 0.396, respectively (Fig. 6b). Consequently, our criterion for delineating a novel viral species is based on a threshold distance exceeding 0.396 (Table S13).
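The within-genus and between-genus medians used above can be derived from any table of pairwise distances once each sequence is labeled with its genus. The sketch below shows one way to do this; the function name, the data layout, and the toy values are hypothetical and are not taken from Table S12 or Fig. 6b.

```python
from itertools import combinations
from statistics import median


def within_between_medians(distance, genus_of):
    """Return (median within-genus distance, median between-genus distance).

    distance: dict mapping frozenset({name_a, name_b}) to a pairwise distance.
    genus_of: dict mapping each sequence name to its genus label.
    """
    within, between = [], []
    for a, b in combinations(sorted(genus_of), 2):
        d = distance[frozenset((a, b))]
        if genus_of[a] == genus_of[b]:
            within.append(d)
        else:
            between.append(d)
    if not within or not between:
        raise ValueError("need at least one within-genus and one between-genus pair")
    return median(within), median(between)


# Hypothetical toy input: four sequences in two genera.
TOY_GENUS = {"v1": "G1", "v2": "G1", "v3": "G2", "v4": "G2"}
TOY_DISTANCE = {
    frozenset(("v1", "v2")): 0.30,
    frozenset(("v3", "v4")): 0.40,
    frozenset(("v1", "v3")): 0.50,
    frozenset(("v1", "v4")): 0.55,
    frozenset(("v2", "v3")): 0.52,
    frozenset(("v2", "v4")): 0.60,
}
```

With the toy input, `within_between_medians(TOY_DISTANCE, TOY_GENUS)` gives a within-genus median of about 0.35 and a between-genus median of about 0.535. In the study, the corresponding reported values were 0.396 and 0.529, and the within-genus median served as the species cutoff.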
The categorization of members within the family Parvoviridae is predominantly determined by the genetic divergence observed in their non-structural protein 1 (NS1) sequences. The NS1 sequences of Parvovirinae were aligned using online BLAST, and those showing less than 80% similarity to known sequences were annotated as novel members (Table S14).
Members of the family Picornaviridae are categorized into distinct genera based on well-defined classification criteria, despite demonstrating a considerable degree of divergence, with shared traits surpassing 30%. Consequently, this threshold serves as a guideline for the recognition of novel Picornaviridae members (Table S15). Among these emerging viruses, Teschovirus, harbored by E. spelaea and Leschenault’s Rousette (Rousettus leschenaultii) of the family Pteropodidae, and Rosavirus carried by the black-bearded tomb bat (Taphozous melanopogon) are closely affiliated with known pathogenic agents.
The taxonomic classification of members within the Polyomaviridae family is primarily based on the nucleotide sequence divergence observed in the coding regions of the large T-antigen (LTAg) gene. Irrespective of the host species, a difference of greater than 15% in the LTAg sequence is the established criterion for delineating distinct viral species. Applying this standard, several newly identified polyomavirus strains have been annotated accordingly (Table S16). Among these, novel species 2 represents a host-specific lineage harbored by P. abramus, while novel species 4 occupies a phylogenetic position distinct from the two existing polyomavirus genera, warranting its classification within a novel taxonomic group. Members of novel species 4 share a characteristic with fish polyomaviruses, namely the absence of introns in the large T-antigen gene. Viruses from this branch have also been detected in bats in other studies (GenBank accession no. NC033737 and MZ218055). However, these bats do not appear to prey on fish, so these viruses may genuinely be bat-associated rather than fish viruses.
This investigation identified potential exogenous retroviruses predominantly concentrated within the Spumaretrovirinae subfamily. These occupy evolutionary branches distinct from other established members of Spumaretrovirinae. At present, the genus demarcation criteria for Spumaretrovirinae rely solely on host distinctions. Hence, we compared the reverse transcriptase (RT) distances among known viruses, which showed that the inter-genus p-distance ranged from 0.304 to 0.405 (median = 0.384). Because the viruses identified here are associated with a novel host and almost all of their p-distances exceed the inter-genus median, we tentatively propose that these novel spumaretroviruses represent a distinct virus species.
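The single-number cutoffs stated in this section can be gathered into one lookup table, sketched below. Only taxa with an unambiguous numeric threshold in the text are included; pestiviruses, herpesviruses, and spumaretroviruses are omitted because their assignment also weighs host and other evidence. The class and function names are hypothetical, and expressing the parvovirus criterion (less than 80% similarity) as a distance above 0.20 assumes that distance equals one minus similarity.

```python
from typing import NamedTuple


class Criterion(NamedTuple):
    region: str          # genomic region or protein compared
    min_distance: float  # a candidate qualifies only if its distance exceeds this


# Thresholds as stated in the text; the layout itself is an illustrative assumption.
CRITERIA = {
    "Mastadenovirus": Criterion("DPOL", 0.15),
    "Mamastrovirus": Criterion("RdRp", 0.20),
    "Caliciviridae": Criterion("RdRp", 0.20),
    "Papillomaviridae": Criterion("E1, E2, L2, L1 (concatenated)", 0.396),
    "Parvoviridae": Criterion("NS1", 0.20),  # assumed: distance = 1 - similarity
    "Polyomaviridae": Criterion("LTAg", 0.15),
}


def exceeds_threshold(taxon: str, distance: float) -> bool:
    """Return True when the distance is strictly above the taxon's cutoff.

    Raises KeyError for taxa without a single numeric cutoff in the table.
    """
    return distance > CRITERIA[taxon].min_distance
```

Keeping the cutoffs in one table makes it easy to see that a distance of 0.18, for instance, would qualify under the Polyomaviridae criterion but not under the Mamastrovirus one.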
Discussion
Numerous experimental techniques have been established for virome research, with the SIA method being particularly effective in identifying diverse RNA and DNA viruses [57]. Since 2012, Chinese researchers have employed the SIA method to conduct virome investigations on bats [30, 31], and the viruses identified in these studies have played a crucial role in elucidating the evolutionary origins of human coronaviruses [11, 14, 20, 34, 58]. Bats harbor a plethora of viruses, some of which have zoonotic potential and can cross species barriers to infect both human and animal populations. This underscores the significance of preemptively gathering data on the viral diversity within bat populations. Such proactive efforts are pivotal for early detection and surveillance of emerging infectious diseases. Hence, our research team has conducted extensive surveillance of viral pathogens harbored by bat populations, examining specimens collected from bats between 2016 and 2021.
This study reports a total of 13,105 samples from 54 different bat species across 14 provinces. Additionally, the PCA findings showed notable distinctions between samples obtained from the three designated hotspot regions of Yunnan province, Guangxi province, and Guangdong province and samples from other provinces. This suggests that these hotspot regions hold particular importance for future research. Our study reported novel viruses in 16 viral genera and several unclassified virus members such as bat poxvirus, bat papillomaviruses, and bat polyomaviruses. These novel viral clades suggest that substantial further viral diversity remains to be identified, and they support the conclusion of other studies that current knowledge significantly underestimates the total viral diversity in mammalian hosts [13, 58].
PCA of virus diversity showed that the three provinces (Yunnan, Guangxi, and Guangdong) where host species exhibit unique diversity patterns also exhibit distinct virus diversity, as does Zhejiang province. At the viral family level, the most enriched viral taxa were the family Papillomaviridae and the subfamily Parvovirinae. We believe that the previous underrepresentation of DNA viruses among bat-borne viruses may have been due to sequencing library types and subjective selection factors.
Apart from evolutionary relationships, the statistical analysis of distribution ranges at the same scale (using clusters formed with an 80% similarity threshold in this study) revealed that, within the same cluster, smaller genetic distances may correspond to broader distribution ranges. This phenomenon may be attributed to viruses within such a branch having more stable genomes and enhanced transmission capabilities. Viruses in these clusters should be specifically monitored in subsequent research. Additionally, the migratory abilities of hosts need to be considered in the context of virus transmission; virus transmission in high-altitude or cold regions might therefore be somewhat restricted. This is also visually reflected in Fig. S2.
The emergence of a number of bat-origin (or likely bat-origin) viruses appears to have involved an intermediate or amplifying host [27, 29, 59, 60]. Although certain viruses, such as specific lyssaviruses, may be capable of direct transmission without an intermediary host, the most successful transmission of bat-borne viruses to other vertebrates often requires an intermediate or amplifying host. Domestic animals, including pigs, horses, camels, and dogs, commonly serve as intermediate hosts, playing a critical role in zoonotic transmissions. The identification of PiPeV in domestic pig populations within China, exhibiting a notable genetic affinity with viruses previously detected in Austrian commercial pig herds [54], underscores the rising incidence of cross-species pathogen transmission events originating from bats. Enhanced pathogen surveillance at the wildlife-human interface is crucial to prevent infectious disease outbreaks. Spillover events often go unnoticed until they cause significant outbreaks. From a bat virus’s perspective, humans are just another host species, and cross-species transmission is largely driven by ecological and behavioral factors rather than biology. The changing environment and the growth of livestock farming, game farming, and the wildlife trade highlight the need for prioritized surveillance and viral discovery within the One Health framework to protect public and livestock health [61].
Conclusions
This study used NGS to systematically investigate the viruses carried by bats from 14 provinces in China. The research uncovered several previously unknown viruses closely related to human and livestock pathogens and identified multiple novel viral evolutionary branches. These viruses were classified according to the ICTV criteria for virus classification, significantly enhancing our understanding of the diversity of viruses carried by bats. Continued monitoring of viruses carried by important wildlife such as bats will reveal a broader diversity of viruses, which is crucial for the prevention and control of zoonotic diseases.
Acknowledgements
We express our gratitude to Dr. Marco Salemi and Dr. Carla Mavian from the University of Florida for their valuable input on the research methodology.
Authors’ contributions
Z.W. designed and supervised the research. Z.W., W.Z., J.Z., and S.Z. organized collection of samples. Y.W., P.X., Y.H., L.Z., and R.L. performed laboratory work. Y.W., P.X., Y.H., and J.L. carried out the analyses and prepared figures and tables. Y.W., Z.W., and P.D. drafted the manuscript. All co-authors assisted with the interpretation of the results and editing of the manuscript.
Funding
This study is supported by the CAMS Innovation Fund for Medical Sciences (CIFMS) (Grant No. 2021-I2M-1–038 and 2022-I2M-CoV19-002), National Natural Science Foundation of China (Grant No. 32370176), Science & Technology Fundamental Resources Investigation Program (Grant No. 2022FY100901), Foundation for Innovative Research Groups of the National Natural Science Foundation of China (Grant No. 82221004), Postdoctoral Fellowship Program of CPSF (Grant No. GZC20230311), Major Project of Guangzhou National Laboratory (Grant No. GZNL2023A01001), and National Institute of Allergy and Infectious Diseases of the U.S.A. (Grant No. R01AI110964).
Declarations
All animals were captured and sampled following guidelines of the Regulations for the Administration of Laboratory Animals (Decree No. 2 of the State Science and Technology Commission of the People’s Republic of China, 1988), and as approved by the Ethics Committee of Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College (Approval number: IPB EC20100415).
Not applicable.
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Yuyang Wang, Panpan Xu, and Yelin Han contributed equally to this work.
References
Articles from Microbiome are provided here courtesy of BMC