16S rRNA Gene Sequencing for Bacterial Identification in the Diagnostic Laboratory: Pluses, Perils, and Pitfalls
The 16S rRNA gene sequence has been by far the most common housekeeping genetic marker used to study bacterial phylogeny and taxonomy, for a number of reasons. These include (i) its presence in almost all bacteria, often as a multigene family or operons; (ii) the function of the 16S rRNA gene has not changed over time, suggesting that random sequence changes are a more accurate measure of time (evolution); and (iii) the 16S rRNA gene (1,500 bp) is large enough for informatics purposes (12). In 1980, 1,791 valid names at the rank of species were recognized in the Approved Lists. Today, this number has ballooned to 8,168 species, a roughly 4.6-fold (356%) increase (http://www.bacterio.cict.fr/number.html#total). The explosion in the number of recognized taxa is directly attributable to the relative ease of 16S rRNA gene sequencing compared with the more cumbersome manipulations required for DNA-DNA hybridization. DNA-DNA hybridization is unequivocally the “gold standard” for proposing new species and for definitively assigning a strain with ambiguous properties to the correct taxonomic unit. Based upon DNA-DNA reassociation kinetics, the genetic definition of a species is quantifiable, i.e., (i) ca. ≥70% DNA-DNA relatedness and (ii) a ΔTm of 5°C or less for the stability of heteroduplex molecules. DNA hybridization assays are not without their shortcomings, however: they are time-consuming, labor-intensive, and expensive to perform. Today, fewer and fewer laboratories worldwide perform such assays, and many studies describing new species are based solely upon small subunit (SSU) sequences or other polyphasic data.
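The two quantitative DNA-DNA hybridization criteria act jointly: a pair of strains is considered conspecific only when both are met. The sketch below encodes that rule under stated assumptions; the function and constant names are hypothetical, and because the relatedness cut-off is qualified with “ca.”, real studies treat both values as approximate rather than hard boundaries.

```python
# Illustrative encoding of the DDH species criteria described in the text.
# Names are hypothetical; the cut-offs are approximate in practice ("ca.").
RELATEDNESS_MIN_PCT = 70.0  # ca. >=70% DNA-DNA relatedness
DELTA_TM_MAX_C = 5.0        # heteroduplex delta-Tm of 5 degrees C or less


def conspecific_by_ddh(relatedness_pct: float, delta_tm_c: float) -> bool:
    """True only if both the relatedness and the delta-Tm criteria are met."""
    return relatedness_pct >= RELATEDNESS_MIN_PCT and delta_tm_c <= DELTA_TM_MAX_C


if __name__ == "__main__":
    # Input values are made up for illustration.
    print(conspecific_by_ddh(50.0, 3.0))  # False: relatedness too low
    print(conspecific_by_ddh(78.0, 2.5))  # True: both criteria met
```

Failing either criterion is enough to reject conspecificity, which is why a low relatedness value settles the question regardless of ΔTm.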
In the early 1990s, the availability of DNA sequencers improved dramatically in terms of cost, methodologies, and technology, such that many centers can now afford such instrumentation. In 1994, Stackebrandt and Goebel (15) summarized the emergence of SSU sequence technology and its potential usefulness in defining a species. Although it has been demonstrated that a strain whose 16S rRNA gene sequence shows <97% similarity to its nearest neighbor represents a new species, the meaning of similarity scores of >97% is less clear (13). Such values can represent a new species or, alternatively, indicate clustering within a previously defined taxon. DNA-DNA hybridization studies have traditionally been required to provide definitive answers to such questions. Although 16S rRNA gene sequence data can be used for many purposes, there is, unlike DNA hybridization (>70% reassociation), no defined “threshold value” (e.g., 98.5% similarity) above which there is universal agreement on what constitutes definitive and conclusive identification to the rank of species.
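The asymmetry described above can be written as a simple decision rule: a nearest-neighbor similarity below 97% supports a new species, whereas a higher value is uninformative on its own. This is a hypothetical sketch; placing exactly 97% in the indeterminate branch is an assumption, since the text speaks only of <97% and >97%.

```python
NEW_SPECIES_CUTOFF_PCT = 97.0  # below this, the strain is taken to be a new species


def interpret_nearest_neighbor(similarity_pct: float) -> str:
    """Interpret 16S rRNA similarity to the nearest neighbor (hypothetical helper).

    A value of exactly 97% is treated as indeterminate (an assumption).
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct < NEW_SPECIES_CUTOFF_PCT:
        return "new species"
    # Could be a new species or a member of an existing taxon: 16S data alone
    # cannot decide, and DNA-DNA hybridization has traditionally been required.
    return "indeterminate: DNA-DNA hybridization needed"


print(interpret_nearest_neighbor(95.8))  # new species
print(interpret_nearest_neighbor(98.9))  # indeterminate: DNA-DNA hybridization needed
```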
Unidentified bacteria or isolates with ambiguous profiles.
One of the most attractive potential uses of 16S rRNA gene sequence informatics is to provide genus and species identification for isolates that do not fit any recognized biochemical profile, for strains generating only a “low likelihood” or “acceptable” identification with commercial systems, or for taxa that are rarely associated with human infectious diseases. The cumulative results from the limited number of studies to date suggest that 16S rRNA gene sequencing provides genus identification in most cases (>90%) but species identification less often (65 to 83%), with 1 to 14% of isolates remaining unidentified after testing (5, 11, 17). Difficulties in obtaining a genus and species identification include the recognition of novel taxa, too few sequences deposited in nucleotide databases, species sharing similar and/or identical 16S rRNA sequences, and nomenclature problems arising from multiple genomovars assigned to single species or complexes.
Routine isolates.
Surveys have examined the feasibility of identifying routine clinical isolates or specific groups of medically important bacteria using SSU gene sequence data. In each of these studies, SSU sequence data were compared with identification results obtained with conventional or commercial test formats (Table 1). Two general observations can be made from these investigations: (i) a higher percentage of species identifications was obtained using SSU sequence results than with either conventional or commercial methods, and (ii) with the exception of the study by Fontana et al. (6), 16S rRNA gene sequencing yielded species identification rates of 62 to 91%. In the study by Fontana et al. (6), the closest match in the MicroSeq 500 database was accepted as the identification regardless of the distance score. For bacteria that are difficult to grow or identify, identification rates with 16S rRNA sequencing (62 to 83%) were lower than the values traditionally acceptable in the clinical laboratory (i.e., ≥90%) (12). Problems again revolved around the completeness and accuracy of databases and around groups that are not easily distinguished by 16S rRNA gene sequencing (2, 8).
TABLE 1.
No. of strains | Group studied^a | 16S size(s) (bp) | 16S database^b | 16S criteria (%)^c | Commercial system(s) | Species identification, Conv (%)^c | Species identification, Comm (%)^c | Species identification, 16S (%)^c | Reference |
---|---|---|---|---|---|---|---|---|---|
72 | GNB | 1,189, 527, 418 | MicroSeq | CM | Conv, MIDI, Biolog | 90 | 67.7-84.6 | 89.2 | 16 |
328 | Mycobacteria | 500 | MicroSeq | ≥99 | Conv | 42 | | 62.5 | 8 |
83 | GNB, GPB | 527 | MicroSeq | CM | Vitek 2, Phoenix | | 77.1 | 100 | 6 |
231 | Bacteroides | 899, 711 | GenBank | ≥99 | Conv | 74.5 | | 83.1 | 14 |
47 | CNS | 1,500 | GenBank | >97 | API StaphID, Phoenix | | 63.8-85.1 | 87.2 | 9 |
20 | GPA | 1,500 | GenBank, MicroSeq | ≥98 | Vitek ANA, RapID ANA II, API 20A | | 20-45 | 65 | 10 |
107 | GNNFB | 796 | GenBank, EMBL, DDBJ | ≥99 | API 20NE, Vitek 2 | | 53.2-54.2 | 91.6 | 2 |
It is clear from the information in Table 1 that 16S rRNA gene sequencing has an expanding role in the identification of bacteria in clinical and public health settings. However, the data also clearly show that it is neither foolproof nor applicable in every situation.
More than 1,700 species were included on the 1980 Approved Lists, but inclusion on the list does not imply that all of these taxa are valid. Many of the names predate modern DNA-DNA hybridization studies and, certainly, phylogenetic investigations. Thus, the type strains of many species may not accurately reflect the entire genomic composition of the nomenspecies, a situation that bears directly on SSU-based microbial identification. Some bacterial species exist as “phenospecies” or “complexes”; that is, more than one genomovar (DNA group) exists within the species, and these genomovars cannot be separated phenotypically. Examples include Enterobacter cloacae (at least 7 genomovars originally), Pseudomonas stutzeri (18 genomovars originally), and the genus Acinetobacter (22 genomovars originally).
Although 16S rRNA gene sequencing is highly useful for bacterial classification, it has low phylogenetic power at the species level and poor discriminatory power for some genera (2, 11), and DNA relatedness studies are necessary to resolve these taxonomic problems definitively. The genus Bacillus is a good example. The type strains of B. globisporus and B. psychrophilus share >99.5% 16S rRNA gene sequence similarity, yet at the DNA level they exhibit only 23 to 50% relatedness in reciprocal hybridization reactions (7). In our laboratory, we have found that the type strains of Edwardsiella species exhibit 99.35 to 99.81% similarity to each other, yet these three species are clearly distinguishable biochemically and by DNA homology (28 to 50% relatedness). Such examples show that SSU sequence similarity, even at a very high level, does not always imply identity or guarantee an accurate microbial identification. Many investigators have encountered resolution problems at the genus and/or species level with 16S rRNA gene sequencing data (Table 2). These groups include, but are not limited to, the family Enterobacteriaceae (in particular, Enterobacter and Pantoea), rapidly growing mycobacteria, the Acinetobacter baumannii-A. calcoaceticus complex, Achromobacter, Stenotrophomonas, and Actinomyces. Some of these problems are related to bacterial nomenclature and taxonomy, while others are related to the issues discussed below.
TABLE 2.
Genus | Species |
---|---|
Aeromonas | A. veronii |
Bacillus | B. anthracis, B. cereus, B. globisporus, B. psychrophilus |
Bordetella | B. bronchiseptica, B. parapertussis, B. pertussis |
Burkholderia | B. cocovenenans, B. gladioli, B. pseudomallei, B. thailandensis |
Campylobacter | Non-jejuni-coli group |
Edwardsiella | E. tarda, E. hoshinae, E. ictaluri |
Enterobacter | E. cloacae |
Neisseria | N. cinerea, N. meningitidis |
Pseudomonas | P. fluorescens, P. jessenii |
Streptococcus | S. mitis, S. oralis, S. pneumoniae |
A further problem regarding the resolution of 16S rRNA gene sequencing concerns sequence identity or very high similarity scores. Reports have documented very high 16S rRNA gene sequence similarity, or even identity, within the Streptococcus mitis group and among several nonfermenters (Table 2). In such instances, 16S rRNA gene sequence data cannot provide a definitive answer, since they cannot distinguish between recently diverged species (13, 16). In other instances, the closest and next-closest matches to the unknown strain differ by <0.5% (>99.5% similarity). In these circumstances, such small differences cannot justify choosing the closest match as a definitive identification, although in some studies this is exactly what was done (6).
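The difference between accepting the closest match outright and requiring a clear margin over the next-closest species can be made concrete. The sketch below uses hypothetical names; the 0.5% margin comes from the text and Table 3. It reports a definitive call only when the top hit beats the best hit from any other species by at least that margin. Treating several database entries for the same species as one candidate is an assumption added for the example, and the similarity scores in the usage line are invented.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (reference species, % similarity to the unknown strain)


def call_species(hits: List[Hit], min_margin: float = 0.5) -> Tuple[str, List[str]]:
    """Return ("definitive", [species]) or ("ambiguous", [candidate species])."""
    if not hits:
        raise ValueError("no database hits")
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    best_species, best_score = ranked[0]
    # Best-scoring hit that belongs to a *different* species.
    runner_up = next((h for h in ranked[1:] if h[0] != best_species), None)
    if runner_up is None or best_score - runner_up[1] >= min_margin:
        return "definitive", [best_species]
    # Every species within min_margin of the top score remains a candidate;
    # dict.fromkeys drops repeats while keeping the ranking order.
    close = [species for species, score in ranked if best_score - score < min_margin]
    return "ambiguous", list(dict.fromkeys(close))


print(call_species([("species_A", 99.8), ("species_B", 99.6), ("species_C", 99.1)]))
# ('ambiguous', ['species_A', 'species_B'])
```

A closest-match-wins rule would simply have reported species_A for the same input.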
The usefulness of 16S rRNA gene sequencing as a tool in microbial identification depends on two key elements: deposition of complete, unambiguous nucleotide sequences into public or private databases and application of the correct “label” to each sequence. Years ago, the overall quality of nucleotide sequences deposited in public databases was questionable, since many depositions were of poor quality (9, 13). Much of the misinformation originally present in such databases was thought to have been corrected; however, a recent multicenter study from the United Kingdom (1) conservatively estimates that at least 5% of the 1,399 sequences examined contained substantial errors, ranging from chimeras (64%) to sequencing errors or anomalies (35%). A 1995 study by Clayton et al. (4) also revealed that at least 26% of 16S rRNA gene sequence pairs (two sequences deposited for the same species) in GenBank had >1% random sequencing errors and that, of these, almost half had >2% random sequencing errors.
Unfortunately, no universal definition for species identification via 16S rRNA gene sequencing exists, and authors vary widely in the criteria they accept for establishing a “species” match (Table 1). In none of these studies does the similarity required for a species “match” exceed 99% (<1% divergence). Based on the data above, even this threshold may not be sufficient in all instances to guarantee an accurate identification. In the case of Aeromonas veronii, the genome can contain up to six copies of the 16S rRNA gene that differ by up to 1.5% among themselves. This intragenomic heterogeneity of the 16S rRNA gene among aeromonads would preclude the use of this technology alone for species identification. The collective data described above strongly suggest that any microbial identification based on a 16S rRNA distance score of >1% is unsatisfactory for a diagnostic or public health reference laboratory.
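Intragenomic heterogeneity can be quantified as the largest pairwise divergence among a genome's 16S rRNA gene copies. The sketch below assumes the copies are already aligned to equal length without gaps and scores divergence as simple mismatches per 100 positions; the helper names and the toy sequences are hypothetical. A spread of 1.5%, as reported for A. veronii, already exceeds the 1% distance score that the text treats as the limit for a satisfactory identification.

```python
from itertools import combinations
from typing import Sequence

MAX_ACCEPTABLE_DISTANCE_PCT = 1.0  # distance scores >1% deemed unsatisfactory in the text


def percent_divergence(a: str, b: str) -> float:
    """Mismatches per 100 positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


def max_intragenomic_divergence(copies: Sequence[str]) -> float:
    """Largest pairwise divergence among the 16S rRNA gene copies of one genome."""
    if len(copies) < 2:
        raise ValueError("need at least two gene copies")
    return max(percent_divergence(a, b) for a, b in combinations(copies, 2))


# Toy 200-bp "copies" (hypothetical): the third differs from the others at 3 positions.
copies = ["A" * 200, "A" * 200, "CCC" + "A" * 197]
spread = max_intragenomic_divergence(copies)
print(spread, spread > MAX_ACCEPTABLE_DISTANCE_PCT)  # 1.5 True
```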
A number of other issues related to SSU gene sequencing merit brief mention. These include the number of position ambiguities, sequence gaps, and the use of gapped and/or nongapped programs for sequence evaluation and analysis. Other concerns involve isolate purity, DNA extraction methods, and possible chimeric molecule formation (9, 16, 17). All of these problems affect final identifications to some extent.
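How gaps and position ambiguities are handled changes the similarity score itself. The hypothetical helpers below compute percent identity for two pre-aligned sequences, either counting a gap opposite a base as a mismatch or skipping gapped columns, and report the percentage of IUPAC ambiguity codes in a sequence (Table 3 calls for <1%). Using "-" as the only gap character and excluding ambiguous columns from the comparison are assumptions for the example; alignment and identification tools differ in how they treat both.

```python
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")  # IUPAC codes for incompletely specified bases


def ambiguity_pct(seq: str) -> float:
    """Percentage of non-gap positions carrying an IUPAC ambiguity code."""
    bases = [c for c in seq.upper() if c != "-"]
    if not bases:
        raise ValueError("empty sequence")
    return 100.0 * sum(c in IUPAC_AMBIGUOUS for c in bases) / len(bases)


def aligned_similarity(a: str, b: str, count_gaps: bool = True) -> float:
    """Percent identity of two sequences taken from the same alignment.

    count_gaps=True scores a gap opposite a base as a mismatch; False skips
    such columns. Columns with an ambiguity code are skipped in both modes.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column is empty in both sequences
        if x in IUPAC_AMBIGUOUS or y in IUPAC_AMBIGUOUS:
            continue
        if "-" in (x, y) and not count_gaps:
            continue
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


a, b = "ACGTACGTAC", "ACGTA--TAC"
print(aligned_similarity(a, b, count_gaps=True))   # 80.0
print(aligned_similarity(a, b, count_gaps=False))  # 100.0
print(ambiguity_pct("ACGTNACGTR"))                 # 20.0
```

The same pair of sequences scores 80% or 100% depending only on gap treatment, which is why the choice of program matters when results are compared across studies.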
The use of 16S rRNA gene sequencing in the clinical laboratory is becoming commonplace for identifying biochemically unidentified bacteria or for providing reference identifications for unusual strains. Although some researchers would never question favoring a molecular identification over a conventional one, 16S rRNA gene sequencing is not infallible, and examples of such misidentifications have been published (3). Although SSU sequencing clearly plays an important role in the identification of unknown isolates or those with ambiguous biochemical profiles, its role in other situations is less clear. An intriguing question is how accurate our routine identification of very common species using conventional methodologies or commercial systems actually is. Although these identifications are generally regarded as highly accurate, we now have a more convenient and precise mechanism for checking them on a molecular basis. Such studies need to be performed and published.
The use of 16S rRNA gene sequencing for definitive microbial identification and for publication requires a harmonized set of guidelines for interpreting sequence data, so that results from one study can be accurately compared with those of another. In 2000, Drancourt et al. (5) made several recommendations concerning criteria for 16S rRNA gene sequencing as a reference method for bacterial identification. We support Drancourt's guidelines for including full 16S rRNA gene sequences whenever possible, particularly for groups such as Campylobacter species that absolutely require them for accurate species identification. Table 3 expands on these recommendations for use in the diagnostic setting. Clearly, the appropriate use of such technology requires the adoption of standards similar to those previously defined for DNA-DNA hybridization. Because the adoption of 16S rRNA gene sequencing as a tool in species identification is still relatively new in most clinical laboratories, such standards will most likely continue to evolve. Furthermore, microarray-based technologies with 16S or other housekeeping gene targets may in the future provide a more sensitive and definitive platform for molecular species identification.
TABLE 3.
Category | Guidelines |
---|---|
Strain to be sequenced | Phenetic profile of the strain does not place it in a general grouping known to present difficulties for identification by 16S rRNA gene analysis (Table 2) |
Strain to be sequenced | For strains such as those in Table 2 that require molecular identification, another housekeeping gene is required (e.g., rpoB) |
16S rRNA gene sequencing | Minimum: 500 to 525 bp sequenced; ideal: 1,300 to 1,500 bp sequenced |
16S rRNA gene sequencing | <1% position ambiguities |
Criteria for species identification | Minimum: >99% sequence similarity; ideal: >99.5% sequence similarity |
Criteria for species identification | Sequence match is to the type strain or a reference strain of a species that has undergone DNA-relatedness studies |
Criteria for species identification | For matches with a distance score of <0.5% to the next-closest species, other properties, including phenotype, should be considered in the final species identification |
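The Table 3 guidelines lend themselves to a checklist. The sketch below is a hypothetical encoding, not a published tool: the field names are invented, 500 bp is used as the lower bound of the 500-to-525-bp minimum, and the output is a list of notes for a reviewer rather than an automatic identification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequencingResult:
    """Hypothetical summary of one 16S rRNA identification attempt."""
    in_table2_group: bool            # phenotype places the strain in a Table 2 group
    length_bp: int                   # bases of the 16S rRNA gene sequenced
    ambiguity_pct: float             # % of positions with ambiguity codes
    best_similarity_pct: float       # % similarity to the closest match
    margin_pct: float                # closest match minus next-closest species (%)
    match_is_type_or_reference: bool


def review_against_table3(r: SequencingResult) -> List[str]:
    """List the Table 3 guidelines a result fails or only minimally meets."""
    notes = []
    if r.in_table2_group:
        notes.append("16S alone insufficient: use another housekeeping gene (e.g., rpoB)")
    if r.length_bp < 500:
        notes.append("FAIL: <500 bp sequenced")
    elif r.length_bp < 1300:
        notes.append("minimum length met; 1,300 to 1,500 bp is ideal")
    if r.ambiguity_pct >= 1.0:
        notes.append("FAIL: position ambiguities not <1%")
    if r.best_similarity_pct <= 99.0:
        notes.append("FAIL: similarity not >99%")
    elif r.best_similarity_pct <= 99.5:
        notes.append("minimum similarity met; >99.5% is ideal")
    if not r.match_is_type_or_reference:
        notes.append("FAIL: match is not to a type or reference strain")
    if r.margin_pct < 0.5:
        notes.append("consider phenotype: <0.5% to next-closest species")
    return notes


print(review_against_table3(SequencingResult(False, 1450, 0.2, 99.7, 1.1, True)))  # []
```

An empty list means the result meets every guideline at the ideal level; any note flags a point to resolve before the identification is reported.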
Published ahead of print on 11 July 2007.
Articles from Journal of Clinical Microbiology are provided here courtesy of American Society for Microbiology (ASM)
Publisher's version: https://doi.org/10.1128/jcm.01228-07