Sequencing-based methods and resources to study antimicrobial resistance
Abstract
Antimicrobial resistance exacts high morbidity, mortality and economic costs yearly by rendering bacteria immune to antibiotics. Identifying and understanding antimicrobial resistance are imperative for clinical practice to treat resistant infections and for public health efforts to limit the spread of resistance. Technologies such as next-generation sequencing are expanding our abilities to detect and study antimicrobial resistance. This Review provides a detailed overview of antimicrobial resistance identification and characterization methods, from traditional antimicrobial susceptibility testing to recent deep-learning methods. We focus on sequencing-based resistance discovery and discuss tools and databases used in antimicrobial resistance studies.
Antimicrobials are small molecules that can inhibit or kill bacteria. These small molecules are commonly used as therapeutics for bacterial infections, but some bacteria can grow and survive despite antimicrobial pressures, a property known as antimicrobial resistance. In clinical settings, resistant bacterial infections decrease available treatment options and increase morbidity and mortality compared with those caused by susceptible bacteria1–5. Resistance is observed against nearly all antimicrobials (FIG. 1a,b), including so-called last-resort antimicrobials used in life-threatening, multidrug-resistant infections6–10. Bacteria resistant to first-line antimicrobials infect 2 million people in the USA yearly, and these infections exact a US$20 billion health-care cost11–13. This problem is not isolated to the USA. In the European Union, antimicrobial resistance has accounted for >30,000 deaths and nearly 900,000 disability-adjusted life-years14. In fact, multiple national and global public health organizations categorize antimicrobial resistance as an imminent danger and uniformly agree that tracking its emergence and prevalence is critical to minimize the threat to human health14–17.
Antimicrobial susceptibility testing (AST) is the traditional method for assaying antimicrobial resistance in bacteria (Box 1). These culture-based tests determine how well bacteria can grow in the presence of antimicrobials. AST is widely used in hospital clinical microbiology laboratories because it provides actionable phenotypic resistance data to guide patient treatment decisions. Although culture-based resistance determination can provide critical information for patient management and resistance gene epidemiology, it has drawbacks in implementation and information content18. Conducting AST requires microbiology facilities and trained clinical microbiology personnel for accuracy. Additionally, AST is viable only for cultivable bacteria, precluding studies on the emergence and spread of antimicrobial resistance in diverse and complex microbial communities with large fractions of currently uncultured bacteria19.
Bacterial antimicrobial resistance is usually genetically encoded (FIG. 1c). Genetically encoded antimicrobial resistance can occur through a number of mechanisms, including overexpression or duplication of existing genes, point mutations or the acquisition of entirely new genes via horizontal gene transfer (HGT). Improvements in next-generation sequencing technologies and computational methods are facilitating rapid antimicrobial resistance gene identification and characterization in genomes and metagenomes. These developing technologies and methods complement traditional culture-based methods for clinical and surveillance applications and provide opportunities for quick and sensitive resistance determinations in cultivable and uncultivable bacteria. Large-scale and comparative studies of human, animal and environmental samples have provided unprecedented insights into the global distribution of antimicrobial resistance genes and the spread of multidrug-resistant bacteria20–24, resistance exchange networks25 and how different habitats and phylogeny affect the evolutionary dynamics of antimicrobial resistance worldwide26. Understanding and surveying genetic determinants of resistance using sequencing data pose unique challenges that are being addressed by improved computational algorithms that organize genomic data and predict antimicrobial resistance and by improving in vitro sequencing modalities.
In this Review, we discuss the strengths and weaknesses of current and emerging methods for studying resistance, including computational strategies and resources for resistance gene identification in genomic and metagenomic samples. We also describe recent advancements to mitigate weaknesses in resistance detection methods, and we highlight areas requiring greater focus.
Sequencing-based resistance discovery
Advancements in sequencing technologies have increased bacterial sequence data availability, and continually decreasing costs have made sequencing a viable antimicrobial resistance surveillance tool. Several methods and tools have been published in recent years for detecting genetic determinants of antimicrobial resistance from whole-genome sequencing (WGS) and whole-metagenome sequencing (WMS) data (TABLE 1). Organizing sequencing data is an important pre-processing step before antimicrobial resistance gene analysis. Short reads, generated by technologies such as Illumina, can either be processed using assembly-based methods, whereby sequencing reads are first assembled into contiguous fragments (contigs) and then annotated by comparing with custom or public reference databases, or directly analysed using read-based methods, whereby resistance determinants are predicted by mapping reads directly to a reference database (FIG. 2).
Table 1 |
Name | Description | Accessibility | Year | Link | Status |
---|---|---|---|---|---|
Assembly-based tools | |||||
Resfinder72 | Tool for detecting acquired AR genes from sequenced or partially sequenced bacterial isolates | Web and/or standalone | 2012 | https://cge.cbs.dtu.dk/services/ResFinder/ | Active |
ARG-ANNOT66 | Tool for pairwise comparison of query sequence with ARG-ANNOT database | Standalone | 2014 | Not available | Archived |
RGI67 | • Pairwise comparison of query sequence with the CARD • Uses curated AR detection models to predict intrinsic resistance genes, dedicated resistance genes and acquired resistance from mutations in drug targets | Web and/or standalone | 2015 | https://card.mcmaster.ca/analyze/rgi | Active |
ARGs-OAP (v2)74 | • Online analysis pipeline for AR genes • Detection from metagenomic data using an integrated structured database of AR sequences | Web and/or standalone | 2016 | https://galaxyproject.org/use/args-oap/ | Active |
ARIBA52 | Tool for rapid AR genotyping directly from sequencing reads using curated public databases | Standalone | 2017 | https://github.com/sanger-pathogens/ariba | Active |
PointFinder73 | Web tool for WGS-based detection of AR associated with chromosomal point mutations in bacterial pathogens | Web and/or standalone | 2018 | https://cge.cbs.dtu.dk/services/ResFinder/ | Active |
NCBI-AMRFinder | Tool for identification of acquired resistance genes using NCBI’s curated AR database and curated collection of HMMs | Standalone | 2018 | https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/ | Active |
Read-based tools | |||||
SRST2 (REF.50) | Tool for direct mapping of reads to curated AR databases | Standalone | 2014 | https://github.com/katholt/srst2 | Active |
SEAR54 | Cloud-compatible pipeline and web interface for rapidly detecting AR genes directly from sequencing reads | Web and/or standalone (archived) | 2015 | https://github.com/will-rowe/SEAR | Archived |
ShortBRED61 | Tool to profile protein families in the metagenomic data using short peptide marker sequences | Standalone | 2015 | http://huttenhower.sph.harvard.edu/shortbred | Active |
PATRIC176 | A unique resource for studying AR | Web | 2016 | www.patricbrc.org | Active |
SSTAR177 | Tool to identify known, putative new alleles and truncated versions of existing AR genes from WGS data | Standalone | 2016 | https://github.com/tomdeman-bio/Sequence-Search-Tool-for-Antimicrobial-Resistance-SSTAR- | Active |
KmerResistance51 | Program that compares the co-occurrence of k-mers from raw reads with k-mers from multiple databases | Web | 2016 | https://cge.cbs.dtu.dk/services/KmerResistance/ | Active |
GROOT56 | Software that enables resistome profiling by mapping metagenomic reads to graph representation of reference gene sets | Standalone | 2018 | https://github.com/will-rowe/groot | Active |
DeepArgs110 | A deep-learning approach for predicting AR genes from metagenomic data | Web | 2018 | https://bench.cs.vt.edu/deeparg | Active |
AR, antimicrobial resistance; ARG-ANNOT, Antibiotic Resistance Gene Annotation; CARD, Comprehensive Antibiotic Resistance Database; GROOT, Graphing Resistance Out of Metagenomes; HMM, hidden Markov model; NCBI, National Center for Biotechnology Information; PATRIC, Pathosystems Resource Integration Center; RGI, Resistance Gene Identifier; SEAR, Search Engine for Antimicrobial Resistance; ShortBRED, Short, Better Representative Extract Dataset; WGS, whole-genome sequencing.
Assembly-based methods.
De novo assembly of bacterial genomes from short-read WGS data is generally performed by De Bruijn graph (DBG)-based assemblers such as SPAdes27, Velvet28, ABySS29 and SOAPdenovo30. In this approach, sequencing reads are divided into shorter overlapping sub-sequences (called k-mers) of length k (where k is less than the read length), which are used to form a network graph. The assemblers then reconstruct the genome sequence by finding an optimum path (an Eulerian path) through the graph that visits each edge once (see REF.31 for more information on DBG-based assembly). Although the DBG approach is computationally efficient in handling high-volume sequencing data, it is greatly affected by errors introduced during sequencing32. Errors in sequencing data introduce false k-mers in the graph, resulting in fragmented assemblies. Several assemblers (for example, SPAdes and Velvet) heuristically eliminate these errors before finding an Eulerian path in the graph31,33. Assembling WMS data is more complicated than single-isolate assembly (FIG. 2a), as the algorithms need to account for unknown abundances of different organisms with unknown phylogenetic relationships32. In single-genome assembly, uniform sequencing coverage across the genome is used by assemblers to correct sequencing errors and to identify repetitive sequences and plasmids; several assemblers exploit the higher coverage of plasmids owing to copy number to distinguish between chromosome and plasmid sequences in isolate genomes34–37. By contrast, uneven coverage of different organisms in WMS data makes detecting repeats difficult. Long stretches of identical sequences in unrelated species further complicate assembly by making it difficult to assign reads to a particular species. Thus, algorithms developed for single-genome assembly cannot be directly applied to assemble metagenomes.
Several metagenome-specific assemblers have been developed to overcome these challenges, either by partitioning or optimizing the graph for uneven sequencing depths32. Some notable metagenomic assemblers are IDBA-UD38, MEGAHIT39, MetaSPAdes40 and MetaVelvet41 (extensions of SPAdes and Velvet for metagenomes). The CAMI project42, now starting its second iteration43, seeks to benchmark these assemblers on highly complex and close to real data sets for users. However, currently, there is no single assembler that stands out as the best one that would accurately reconstruct known genomes and capture the majority of the taxonomic diversity in real data sets. Both biological factors (such as sample source and microbial community structure) and technical factors (such as library preparation method, sequencing depth and sequencing platform choice) affect the ability of an assembler to generate accurate and larger contigs. Thus, it is recommended to apply multiple assemblers on a subset of samples to determine the best fit for a given data set.
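The DBG construction and path-finding steps described above can be illustrated with a minimal sketch. It assumes error-free reads, a single strand and a graph that admits a unique Eulerian path; the function names (`build_de_bruijn`, `eulerian_path`, `assemble`) are hypothetical and do not correspond to any of the assemblers cited, which add error correction, coverage handling and repeat resolution on top of this idea.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return a path that uses every edge once (Hierholzer's algorithm)."""
    out_deg = {node: len(targets) for node, targets in graph.items()}
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    unbalanced = [n for n in sorted(nodes)
                  if out_deg.get(n, 0) - in_deg.get(n, 0) == 1]
    start = unbalanced[0] if unbalanced else min(nodes)
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the read k-mers."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]  # toy overlapping reads
contig = assemble(reads, k=4)
```

In this toy case every (k-1)-mer occurs once, so the path is unambiguous. A single sequencing error would add false k-mers and extra branches, which is the fragmentation problem noted in the text.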
Following assembly, genomic or metagenomic contigs are annotated for resistance determinants by predicting protein-coding regions on contigs and then comparing them against antimicrobial resistance reference databases using similarity-based search tools (for example, BLAST44, USEARCH45 or DIAMOND46). Although pairwise alignment between the query and antimicrobial resistance reference sequences is the most commonly applied approach for characterizing the resistome from contigs, an inherent bias of databases towards human-associated organisms is reflected in prediction outputs, so choosing the appropriate databases to compare assembled contigs with reference sequences is imperative47.
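As an illustration of the similarity-based annotation step, the sketch below filters pairwise alignment hits by percentage identity and reference coverage and keeps the best-scoring hit per predicted protein. It assumes the 12-column BLAST tabular layout (query, subject, percentage identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E value, bit score). The function name `filter_hits`, the example identifiers, the reference lengths and the 90% identity and 80% coverage cut-offs are all hypothetical choices for illustration, not recommendations from the tools cited.

```python
def filter_hits(tabular_lines, ref_lengths, min_identity=90.0, min_coverage=0.8):
    """Keep the best-scoring hit per query that passes identity and coverage cut-offs."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        # Fraction of the reference sequence spanned by the alignment.
        coverage = (abs(send - sstart) + 1) / ref_lengths[subject]
        if identity < min_identity or coverage < min_coverage:
            continue
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, identity, round(coverage, 3), bitscore)
    return best


# Hypothetical hits for two predicted proteins against a reference set.
hits = [
    "contig1_orf3\tblaTEM-1\t99.3\t286\t2\t0\t1\t286\t1\t286\t1e-150\t560",
    "contig1_orf3\tblaSHV-1\t68.0\t280\t89\t1\t1\t280\t1\t280\t1e-90\t300",
    "contig2_orf1\ttetA\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-40\t180",
]
ref_lengths = {"blaTEM-1": 286, "blaSHV-1": 286, "tetA": 400}
annotated = filter_hits(hits, ref_lengths)
```

Strict cut-offs of this kind favour precision for well-characterized genes, but they are also where database bias shows up: divergent homologues from understudied organisms fall below the thresholds and go unreported.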
Given sufficient coverage, assembly-based methods can construct whole genomes or large contigs with protein-coding genes, regulatory sequence information and the complete surrounding genomic context. This information can be used to study co-associated genes and biological pathways that are involved in resistance determination. Assembly and annotation of WMS data can identify antimicrobial resistance genes that are more divergent from and lack homology to known sequences in the reference databases. However, the process of de novo assembly and annotation is computationally expensive, time consuming and requires higher genome coverage than reference-based assembly or read mapping-based methods, which can be difficult to achieve for all samples, specifically when dealing with metagenomic samples with high microbial diversity and uneven taxonomic composition.
Read-based methods.
Antimicrobial resistance genes in a sample can be detected without genome assembly either by aligning reads to the reference databases using pairwise alignment tools such as Bowtie2 (REF.48) or BWA49, or by splitting reads into k-mers and mapping them to the reference databases.
SRST2 (REF.50) is one widely used tool that aligns reads to a custom reference database using Bowtie2 to predict antimicrobial resistance genes in the sample. Alternatively, KmerResistance51 splits reads into k-mers, maps them and counts the co-occurrence of k-mers between reads and a reference database to predict resistance genes and associated species. Both methods can identify antimicrobial resistance genes even in the presence of contaminants (for example, background noise in the raw reads owing to the presence of laboratory or host contamination) and in samples for which insufficient reads are available for de novo assembly, but they cannot predict antimicrobial resistance conferred by single-nucleotide polymorphisms (SNPs). By contrast, ARIBA52 uses a hybrid approach where reference sequences in the database are first clustered using CD-HIT53 before the reads that map to each cluster are assembled independently. Resulting contigs are then compared with the closest reference to identify allelic variants. Additionally, ARIBA provides information on whether genes are complete or fragmented and reports sequence variants along with their potential effects (for example, missense, nonsense or frameshift mutations and small insertions and deletions (indels)). Clustering reference sequences and using a representative sequence from the cluster to map reads considerably reduce ambiguous alignments54, but using a single linear representative locus masks subtle yet important variation between subtypes and subfamilies of genes within clusters54,55. To account for this information loss, Graphing Resistance Out of Metagenomes (GROOT)56, a newly established tool for resistome profiling of metagenomes, builds a variation graph for reference gene sets and aligns sequence reads to these graphs. Variation graphs are bidirectional acyclic sequence graphs that represent overall sequence variation within a given population.
The alignment of reads against variation graphs effectively removes reference bias and facilitates accurate annotation of antimicrobial resistance genes. Before aligning sequences against a variation graph, traversals within the graphs are indexed by either Burrows–Wheeler transform or hash-map (minHash), indexing algorithms that considerably improve the mapping rate of large-scale sequencing reads to the graphs48,57. The read-based approach is generally fast and less computationally demanding because it bypasses de novo assembly, protein-coding gene prediction and pairwise alignment to public databases. For this reason, read-based methods have gained traction in recent years, especially in clinical diagnostics where conducting real-time sequencing-based resistance prediction is crucial.
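The k-mer strategy used by read-based tools can be sketched in a few lines. The example below scores each reference gene by the fraction of its k-mers that are observed anywhere in the reads. This is a deliberately simplified stand-in, not the scoring scheme of KmerResistance or any other cited tool; the function names, the toy sequences and the choice of k = 5 are assumptions made for illustration (real tools use longer k-mers and statistical thresholds).

```python
def kmers(seq, k):
    """Return the set of k-mers contained in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_references(reads, references, k=5):
    """Fraction of each reference gene's k-mers that are observed in the reads."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    scores = {}
    for name, seq in references.items():
        ref_kmers = kmers(seq, k)
        shared = len(ref_kmers & read_kmers)
        scores[name] = shared / len(ref_kmers) if ref_kmers else 0.0
    return scores


# Toy reference genes and reads (hypothetical sequences).
references = {"geneA": "ATGCGTACGTTAGC", "geneB": "TTTTTGGGGGCCCCC"}
reads = ["ATGCGTACG", "GTACGTTAGC"]
scores = score_references(reads, references)
```

Because no assembly or alignment is performed, the cost is dominated by set look-ups, which is why this family of methods scales to large read sets. The same design explains its limits: a gene absent from the reference set contributes no k-mers and cannot be detected.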
Choosing the right approach.
Presently, there is no consensus on which sequence analysis approach is better, and the choice of analysis mainly depends on the type of sequencing (WGS versus WMS), availability of computational resources and the study objective. Both approaches have trade-offs, as assembly causes information loss compared with direct read analysis58 but enables identification of protein-coding genes and investigation of upstream and downstream regulatory elements, whereas direct read analysis lacks the positional information required to analyse upstream and downstream factors of identified resistance genes. New sequencing technologies, such as long-read sequencing and chromosome conformation capture-derived assays, are helping to alleviate this information loss by improving assembly fidelity (BOX 2).
The read-based approaches scale well with ever-increasing query sequences and antimicrobial resistance reference data. More importantly, they enable identification of antimicrobial resistance genes from low-abundance organisms present in complex communities, which may be missed by assembly-based methods owing to incomplete or poor assemblies. However, mapping reads directly to large data sets can inflate false-positive predictions, as reads derived from protein-coding sequences may spuriously align to other genes as a result of local sequence homology59. Thus, it is important that the reference databases are comprehensive and contain all variants of the reference genes. Database choice is especially critical when identifying antimicrobial resistance genes from large and complex communities such as soil and ocean, as novel or distant homologues of antimicrobial resistance genes present in understudied, less characterized environmental communities may be missed.
Well-studied sample types, such as the human gut, are now extensively characterized, even for low-abundance microorganisms and, thus, read-based approaches can be more confidently applied60. However, analysis of diverse samples is confounded by the lack of reference sequences, so the antimicrobial resistance genes in these environments are likely underestimated. To address this problem, a marker-based method, Short, Better Representative Extract Dataset (ShortBRED)61, was developed that enables fast and accurate profiling of the resistome in metagenomic data sets. ShortBRED first identifies marker sequences (short peptide sequences) representative of antimicrobial resistance protein families from the reference database and then maps reads to these markers to quantify the relative abundance of the associated antimicrobial resistance protein families. Several studies have applied this method to quantify the abundance of resistance genes in large and complex metagenomic data sets, including human25,62, animal63 and environmental data sets64. Downstream analysis of resistomes from metagenomic samples can be performed similarly to taxonomic and functional profiling. A comprehensive discussion on processing and analysing metagenomic samples has been previously published60.
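The marker-based quantification described above amounts to counting reads that hit each family's markers and normalizing by marker length and sequencing depth. The sketch below uses a reads-per-kilobase-per-million (RPKM) style normalization as an assumption; the exact normalization implemented in ShortBRED may differ, and the function name `family_rpkm`, the marker identifiers and the counts are hypothetical.

```python
def family_rpkm(hit_counts, marker_lengths_nt, marker_to_family, total_reads):
    """Sum read hits and marker lengths per family, then normalise as RPKM."""
    hits, length = {}, {}
    for marker, family in marker_to_family.items():
        hits[family] = hits.get(family, 0) + hit_counts.get(marker, 0)
        length[family] = length.get(family, 0) + marker_lengths_nt[marker]
    return {
        # reads / (marker length in kb * sequencing depth in millions of reads)
        family: hits[family] / ((length[family] / 1e3) * (total_reads / 1e6))
        for family in hits
    }


# Hypothetical markers: two for a TEM family, one for a TetA family.
marker_to_family = {"m1": "TEM", "m2": "TEM", "m3": "TetA"}
marker_lengths_nt = {"m1": 60, "m2": 40, "m3": 100}
hit_counts = {"m1": 30, "m2": 20}  # m3 received no hits
abundance = family_rpkm(hit_counts, marker_lengths_nt, marker_to_family,
                        total_reads=2_000_000)
```

Normalizing by both marker length and depth is what makes abundances comparable between families with different numbers of markers and between samples sequenced to different depths.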
Antimicrobial resistance databases
Both assembly-based and read-based approaches for the computational prediction of antimicrobial resistance in pathogens and environmental bacteria depend largely on curated antimicrobial resistance gene databases that link known genetic determinants of resistance to the antimicrobials they confer phenotypic resistance against (TABLE 2). These databases usually represent information accumulated from multiple studies that include AST of bacteria harbouring specific antimicrobial resistance genes.
Table 2 |
Database | Description | Link | Status |
---|---|---|---|
General databases | |||
CARD67 | • Ontology-based database that provides comprehensive information of AR genes and their resistance mechanisms • Currently contains >2,200 protein homologues and includes a curated set of resistance-conferring chromosomal mutations in protein-coding genes | https://card.mcmaster.ca/ | Active; launched in 2013; updated monthly |
Resfinder72 | Collation of AR genes involved in HGT events | https://cge.cbs.dtu.dk//services/ResFinder/ | Active; started in 2012; updated regularly; last update in February 2019 |
ResfinderFG84 | Collection of resistance gene variants identified in multiple functional metagenomics studies | https://cge.cbs.dtu.dk/services/ResFinderFG/ | Active; last update in November 2016 |
Resfams26 | A profile HMM-based curated database confirmed for AR function | http://www.dantaslab.org/resfams/ | Active; last update in January 2015 |
ARDB65 | • First centralized resource of AR gene information • Manually curated; contains >4,500 AR sequences | https://ardb.cbcb.umd.edu/ | Archived; last updated in 2009 |
MEGARes178 | • Collation of multiple databases (CARD, ARG-ANNOT and ResFinder) to avoid redundancy between entries • For high-throughput screening and statistical analysis | https://megares.meglab.org/ | Active; last update in December 2016 |
NDARO | • Collated and curated data from multiple databases (CARD, Lahey, Pasteur Institute β-Lactamases and ResFinder) • Contains 4,500 AR sequences | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA313047 | Active; started in 2016 |
ARG-ANNOT66 | • Repository of >1,800 AR sequences collated from scientific literature and online resources • Also includes point mutation data for select AR-associated chromosomal genes | Not available | Archived; last update in May 2018 |
Mustard85 | Resource containing 6,095 AR determinants from 20 families, including curated sets of AR genes identified in functional metagenomics studies | http://mgps.eu/Mustard/ | Active; last update in November 2018 |
FARME database83 | Curated set of microbial sequences functionally screened to confer resistance in various functional metagenomics studies of different habitats | http://staff.washington.edu/jwallace/farme/ | Active; last update in 2017 |
SARG (v2)74 | • Hierarchical structured database derived from ARDB, CARD and NCBI-NR database • Contains >12,000 AR genes; also includes profile HMMs for 189 AR genes subtypes | http://smile.hku.hk/SARGs | Active |
Lahey list of β-lactamases70 | First initiative to compile known β-lactamases and assign nomenclature to novel ones | http://www.lahey.org/Studies/ | Archived; last update in 2015 |
BLDB179 | Manually curated database for AR enzymes classified by class, family and subfamily | http://bldb.eu/ | Active; last update in November 2018 |
LacED68,69 | Curated database of TEM and SHV β-lactamases, including a curated set of known TEM and SHV variants | http://www.laced.uni-stuttgart.de/ | TEMLacED active: last update in 2017; SHVED archived: last update in April 2010 |
CBMAR71 | Database that identifies and characterizes novel β-lactamases on the basis of Ambler classification | http://proteininformatics.org/mkumar/lactamasedb/ | Last update in September 2014 |
Species-specific databases | |||
MUBII-TB-DB76 | Database of mutations associated with AR in Mycobacterium tuberculosis | https://umr5558-bibiserv.univ-lyon1.fr/mubii/mubii-select.cgi | Last update in December 2013 |
u-CARE180 | User-friendly, comprehensive AR repository for Escherichia coli | http://www.e-bioinformatics.net/ucare | Last update in 2016 |
AR, antimicrobial resistance; ARDB, Antibiotic Resistance Genes Database; ARG-ANNOT, Antibiotic Resistance Gene Annotation; BLDB, β-Lactamase Database; CARD, Comprehensive Antibiotic Resistance Database; CBMAR, Comprehensive β-Lactamase Molecular Annotation Resource; FARME, Functional Antibiotic Resistance Metagenomic Element; HGT, horizontal gene transfer; HMM, hidden Markov model; LacED, Lactamase Engineering Database; NDARO, National Center for Biotechnology Information (NCBI) Bacterial AR Reference Gene Database/National Database of Antibiotic Resistant Organisms.
Generalized versus specialized databases.
Public databases vary considerably in the scope of the resistance mechanisms19 that they cover and in the type of information they provide for annotations. Generalized antimicrobial resistance databases, such as the now archived Antibiotic Resistance Genes Database (ARDB)65 or the active Antibiotic Resistance Gene Annotation (ARG-ANNOT)66 and Comprehensive Antibiotic Resistance Database (CARD)67, cover broad spectrums of antimicrobial resistance genes and mechanism information, whereas specialized antimicrobial resistance databases provide comprehensive information for specific gene families or species (TABLE 2). For example, targeted databases such as Lactamase Engineering Database (LacED)68,69, the Lahey database of β-lactamases70, National Center for Biotechnology Information (NCBI) β-Lactamase Alleles Initiative, and the Comprehensive β-Lactamase Molecular Annotation Resource (CBMAR)71 focus on β-lactamases, a family of antimicrobial resistance enzymes that facilitate hydrolysation of the key β-lactam rings in β-lactam antimicrobials, thus protecting the bacteria from the antimicrobial activity. Resfinder72 is a web-based and standalone tool for detecting acquired antimicrobial resistance genes from sequenced or partially sequenced bacterial isolates (TABLE 1). Unlike other databases that require contigs as an input, Resfinder72 also accepts short reads as an input for comparison against known acquired resistance genes in bacterial genomes. In 2017, Resfinder72 updated its web-based service to enable identification of chromosomal mutations using PointFinder73. However, the identification of antimicrobial resistance-conferring chromosomal mutations is available for only a limited set of pathogenic microorganisms (Campylobacter, Escherichia coli, Mycobacterium tuberculosis, Neisseria gonorrhoeae, Plasmodium falciparum and Salmonella). 
Similar to Resfinder72, CARD67 offers its own tool, known as Resistance Gene Identifier (RGI), which uses curated antimicrobial resistance detection models to predict intrinsic antimicrobial resistance genes, dedicated resistance genes and acquired resistance from mutations in drug targets. RGI uses two antimicrobial resistance detection models: Protein Homologue Model for detecting functional homologues of antimicrobial resistance proteins and Protein Variant Model for the detection of mutations conferring antimicrobial resistance in otherwise sensitive targets. ARGs-OAP (v2)74 uses a custom database constructed from ARDB65 and CARD67, called SARG, with a hybrid UBLAST and BLASTX algorithm, reflecting the critical need for a comprehensive database combined with lower identity matching for antimicrobial resistance gene annotation of metagenomic sequence data.
Species-specific databases exist for pathogenic or model bacteria such as M. tuberculosis (for example, Tuberculosis Drug Resistance Database75 or MUBII-TB-DB76) and E. coli (TABLE 2). These species-specific databases are invaluable for understanding resistance in these specific organisms but also highlight the importance of considering antimicrobial resistance genes in their phylogenetic context, especially as some bacteria can have intrinsic resistance to some antimicrobials (reviewed previously77). Species-centric databases enable rapid and effective curation of new antimicrobial resistance genes and chromosomal mutations and can offer quick preliminary screening for characterization. Such screening has proved highly effective for pathogens such as M. tuberculosis in which HGT events are rare and drug resistance originates mainly from chromosomal mutations78. The CRyPTIC Consortium and 100,000 Genomes Project demonstrated this effectiveness in M. tuberculosis with resistance predictions with over 90% sensitivity and specificity for all four first-line anti-tuberculosis drugs79.
While these tools are all steps in the right direction, a continuously updating and comprehensive database with extensive gene metadata and the ability to find both point mutation matches and remote homologues is needed.
Hidden Markov model-based databases.
One major limitation of these databases is that the antimicrobial resistance genes they contain are heavily biased towards human pathogens and easily cultivable model organisms, making it difficult to identify remote homologues or novel resistance sequences present in fastidious or uncultured bacteria80. This bias complicates antimicrobial resistance gene identification across less commonly studied bacteria, a difficulty that is magnified by the diverse and complicated mechanisms that cause resistance81. One potential solution to overcome this bias is to use hidden Markov model (HMM) databases. Derived from the multiple sequence alignment of known sequences, an HMM can find sequences with similar function but low sequence identity82. Resfams26 is an HMM database of antimicrobial resistance proteins derived from multiple sequence alignments of manually curated sets of representative antimicrobial resistance protein sequences obtained from the generalized CARD and the specialized LacED69 and Lahey database70 (TABLE 2). The authors of the Resfams26 database showed that it can identify a substantially greater number of novel antimicrobial resistance genes and remote homologues of known antimicrobial resistance genes than other databases such as ARDB and CARD that rely on BLAST-based methods for gene identification. A direct comparison of manually curated antimicrobial resistance gene sets showed that Resfams26 identified 64% more antimicrobial resistance genes in both soil and human gut microbiota than the BLAST-based search of CARD and ARDB. This increased sensitivity demonstrates the versatility of the HMM in annotating sequences from non-clinical samples with sparser representation in publicly available resistance gene databases. However, HMM-based approaches may have poor specificity (yield higher number of false-positive hits) and may not be able to distinguish between protein families with closely related functions. 
This could occur owing to the higher probability of selecting sequences from other subfamilies on the basis of domains common to the family. To mitigate the lack of specificity, Resfams26 (like the Pfam database) uses curated thresholds (for example, a gathering threshold) for each profile HMM. These profile-specific gathering threshold values set an inclusion or exclusion bit score cut-off by comparing it with test data sets containing negative sequences. Currently, Resfams26 contains 166 profile HMMs that represent major antimicrobial resistance gene families. HMM-based antimicrobial resistance databases could be valuable in identifying large and diverse arrays of resistance determinants in understudied environmental samples compared with BLAST-based databases. However, current HMM-based databases do not identify resistance arising from chromosomal mutations. To further facilitate the detection of antimicrobial resistance genes in large complex environments, the Functional Antibiotic Resistance Metagenomic Element (FARME)83 database comprises a curated set of microbial sequences excluded from current databases but functionally screened to confer resistance in various functional metagenomics studies of different habitats. Apart from predicted protein-coding antimicrobial resistance sequences, the FARME database also includes regulatory elements, mobile genetic elements and predicted proteins flanking antimicrobial resistance genes.
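The role of profile-specific gathering thresholds can be shown with a short sketch: a hit is accepted only if its bit score meets the cut-off curated for that particular profile, and each protein is then assigned its best accepted profile. The profile names, scores and thresholds below are hypothetical and are not taken from Resfams or Pfam; the function name `apply_gathering_thresholds` is likewise an assumption.

```python
def apply_gathering_thresholds(hits, gathering_thresholds):
    """Keep, per protein, the best profile hit whose bit score meets that profile's cut-off."""
    accepted = {}
    for protein, profile, bitscore in hits:
        # Each profile HMM carries its own curated inclusion cut-off.
        if bitscore < gathering_thresholds[profile]:
            continue
        if protein not in accepted or bitscore > accepted[protein][1]:
            accepted[protein] = (profile, bitscore)
    return accepted


# Hypothetical profile search results: (protein, profile, bit score).
hits = [
    ("p1", "ClassA_bla", 310.5),
    ("p1", "ClassC_bla", 45.0),
    ("p2", "TetA_efflux", 80.0),
    ("p3", "ClassC_bla", 260.0),
]
gathering_thresholds = {"ClassA_bla": 200.0, "ClassC_bla": 250.0, "TetA_efflux": 120.0}
calls = apply_gathering_thresholds(hits, gathering_thresholds)
```

Here protein p1 matches two related profiles, but only the hit above its profile's threshold is retained, and p2 is rejected outright. A single global cut-off could not achieve both outcomes, because a score that is convincing for one family can be a shared-domain false positive for another.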
A similar database, the functional resistance database (ResfinderFG)84, was built by aggregating data from four functional metagenomics studies selected against 23 antimicrobials. When comparing this database with the Resfinder72 database, the authors noted that they found different results by total antimicrobial use; this observation may represent a difference in how resistance is conferred when putative resistance determinants are cloned into E. coli as compared with when they are expressed in their native bacterial host.
The Mustard85 antimicrobial resistance determinants database uses an innovative approach of incorporating 3D protein structure to help predict resistance genes. When this approach was applied to predicted proteins from metagenomic samples, it predicted >6,000 resistance genes compared with 67 genes identified by BLASTP and 50 by Resfinder72, suggesting higher sensitivity.
Remaining challenges.
Considerable developments in biocuration of antimicrobial resistance sequences have enabled the identification and characterization of antimicrobial resistance genes from genomes and metagenomes, but several limitations still preclude cost-effective and rapid antimicrobial resistance surveillance. One major bottleneck is the lack of effective curation strategies. With few exceptions, antimicrobial resistance databases lack efficient and sustainable curation pipelines, so they tend to receive active maintenance for a few years before becoming outdated.
Many antimicrobial resistance genes can be assigned names on the basis of nucleotide sequences and protein sequences, leading to conflicting naming schemes. Conflicting gene names and synonyms create redundancy across databases and confuse users (for example, dihydrofolate reductase is referred to as dhfr in some databases and dfrA in others)86. This problem is exacerbated by assigning gene names by sequence identity. A plethora of different sequence identity-based systems exists for assigning nomenclature to a new resistance gene. These systems offer different cut-offs and are not in consensus with the reference87.
Antimicrobial resistance genomic data are an ever-expanding data source. HGT events and selection pressures that proliferate new antimicrobial resistance mutations require active biocuration strategies whereby entries can be curated as they are recognized. The propagation of new colistin resistance mechanisms such as mcr-1, which was first described in 2016 from Chinese bacterial isolates9 and then subsequently identified worldwide in newly collected and previously stored isolates88–96, demonstrates the need for frequent database updating and curation. When properly implemented, frequent updates facilitate rapid collection of epidemiological data for recently discovered resistance determinants97. Indeed, antimicrobial resistance annotation should be a continuous effort, as all downstream analyses depend on the accuracy of reference databases. Establishing best practices for biocuration, systematically assigning annotations to newly discovered genes and preventing misinterpretations will pay dividends for public health and basic science.
Another important limitation of current antimicrobial resistance databases is their focus on the identification and characterization of protein-coding resistance genes; they ignore other potential antimicrobial resistance mechanisms such as genomic changes or de novo mutations in ribosomal RNA (rRNA) genes and regulatory elements and drug target mutations. Recent efforts by CARD67 and Resfinder72 have tried to address this issue.
Functional metagenomics
In addition to sequence-based metagenomics, functional metagenomics is a powerful, culture-independent, sequence-unbiased approach for characterizing resistomes98,99. In this method, a metagenomic library is generated by cloning the total community DNA extracted from a sample into an expression vector. This library is transformed into a susceptible indicator host strain and is assayed for antimicrobial resistance by plating on selective media that are lethal to the wild-type host. The selected inserts from the surviving recombinant, antimicrobial-resistant host cells are then sequenced, and resulting sequences are subsequently assembled and annotated (FIG. 3). Parallel Annotation and Reassembly of Functional Metagenomic Selections (PARFuMS)100 is a custom computational pipeline that assembles reads from functional metagenomic selections into contigs using the Velvet28 and Phrap101 assemblers and annotates the assemblies for antimicrobial resistance genes using MetaGeneMark102 and Resfams26. This approach enables high-throughput analysis of large genomic content (up to 50 Gbp of unique metagenomic DNA interrogated per library), and antimicrobial resistance phenotypes can be associated directly with causative genes, obviating the need to culture individual antimicrobial resistance gene carriers.
Functional metagenomics has enabled the discovery of several new antimicrobial resistance mechanisms and their related genes103. One such example is the recently discovered tetracycline resistance mechanism by tetracycline destructases104, whereby soil functional metagenomics led to the discovery of nine genes that confer tetracycline resistance through enzymatic inactivation. Further analysis and biochemical characterization revealed that these enzymes catalyse tetracycline oxidation in an FAD-dependent manner, thereby inactivating tetracycline104.
While the preceding study shows the strength and usefulness of functional metagenomics, this approach has certain limitations. For example, a gene has to be functional outside its native microbial host to be identified by functional metagenomic selections. Often, the same gene does not confer the same phenotype in a recombinant expression host such as E. coli as it does in its original host (for example, some Gram-positive organisms). This problem was highlighted by studies showing effects of different hosts on the same metagenomic libraries97,105. Thus, there is a need to include a phylogenetically diverse group of hosts that can be used for functional metagenomic selections. In addition, genes removed from their genomic context (for example, from syntenic regulatory elements) may have different phenotypes in the recombinant expression host from those in the original host106. Thus, it is important that novel antimicrobial resistance genes identified by functional metagenomics screens be characterized microbiologically and biochemically. Extension of the current functional metagenomics approach and development of new techniques to discover novel resistance genes are deserving research directions.
Machine learning for resistance prediction
Numerous studies have explored machine learning algorithms for studying antimicrobial resistance, highlighting their role in predicting resistance phenotype directly from genotype. Machine learning approaches can be implemented as supervised or unsupervised learning. In supervised learning, a training data set labelled with the outcome of interest is used to build a prediction model, which can then be applied to query sequences to predict their outcome. Several studies have used gene presence or absence as features and AST outcomes as labels to create the training set for models. In one study, a logistic regression approach was used to develop a model based on 14 gene parameters and 3 molecular typing markers that can differentiate between vancomycin-susceptible and vancomycin-intermediate Staphylococcus aureus using publicly available genomic data and patient isolates107. The model performance was tested by a leave-one-out validation method, and it showed 84% classification accuracy. Although this accuracy level does not meet clinical standards, the approach provides an important proof of concept that motivates the development of more sophisticated models for identifying antimicrobial resistance. Another study evaluated a rules-based and a machine learning-based approach (that is, logistic regression) for predicting antimicrobial resistance profiles and showed that the machine learning-based approach had higher accuracy with novel variants in known antimicrobial resistance genes than the rules-based approach24.
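Leave-one-out validation, as used to test the model above, can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions rather than the published method: the gene presence or absence profiles and labels are invented toy data, and a simple nearest-neighbour rule stands in for logistic regression so that the example needs no external libraries.

```python
def hamming(a, b):
    """Number of positions at which two presence/absence profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the closest training profile (ties: first seen).

    Stand-in classifier; the study described above used logistic regression.
    """
    best = min(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    return train_y[best]


def leave_one_out_accuracy(features, labels, predict):
    """Hold out each isolate in turn, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy data: rows are isolates, columns are hypothetical marker genes
# (1 = present, 0 = absent); labels: 1 = non-susceptible, 0 = susceptible.
features = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
labels = [1, 1, 0, 0, 0]

accuracy = leave_one_out_accuracy(features, labels, nearest_neighbour_predict)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Each isolate is scored by a model that never saw it during training, which makes the procedure suitable for the small isolate collections typical of such studies.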
Recent studies and tools use k-mers derived from whole genomes of antimicrobial-resistant and antimicrobial-susceptible species along with their AST outcomes to develop prediction models. Mykrobe predictor108, a fast k-mer screening tool, is used to identify antimicrobial resistance genes and SNPs in S. aureus and M. tuberculosis. It utilizes the curated genetic information of resistant and susceptible alleles of the same species to build reference de Bruijn graphs (DBGs) of these two categories and to map k-mers derived from sequencing reads to these graphs. Mykrobe predictor showed 99.1% and 82.6% sensitivity and 99.6% and 98.5% specificity for S. aureus and M. tuberculosis, respectively, on an independent validation set and provided important insights on potential antimicrobial resistance elements.
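The idea of matching read k-mers against resistant and susceptible reference alleles can be illustrated with a deliberately simplified Python sketch. It is not the Mykrobe predictor algorithm: plain k-mer sets stand in for the reference graphs, reverse complements and sequencing errors are ignored, and the allele and read sequences are invented for illustration.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def call_allele(reads, resistant_allele, susceptible_allele, k):
    """Count read k-mers diagnostic of each allele and call the better-supported one."""
    resistant = kmers(resistant_allele, k)
    susceptible = kmers(susceptible_allele, k)
    # Only k-mers unique to one allele carry information about which is present.
    only_resistant = resistant - susceptible
    only_susceptible = susceptible - resistant

    support_r = 0
    support_s = 0
    for read in reads:
        read_kmers = kmers(read, k)
        support_r += len(read_kmers & only_resistant)
        support_s += len(read_kmers & only_susceptible)

    if support_r > support_s:
        call = "resistant"
    elif support_s > support_r:
        call = "susceptible"
    else:
        call = "undetermined"
    return call, support_r, support_s


# Hypothetical alleles differing by one SNP (A to T at the fifth base).
susceptible_allele = "ACGTACGGA"
resistant_allele = "ACGTTCGGA"
reads = ["CGTTCG", "TTCGGA"]

result = call_allele(reads, resistant_allele, susceptible_allele, k=4)
print(result)
```

A single SNP changes up to k of the k-mers that overlap it, so even short reads spanning the variant position give several diagnostic k-mers.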
By contrast, Rapid Annotation using Subsystem Technology (RAST)109 is a k-mer-based tool that uses a machine learning classifier (AdaBoost) based on the Pathosystems Resource Integration Center (PATRIC) database to identify target-specific antimicrobial resistance genes in a specific collection of pathogens. RAST is trained on k-mer data derived from the contigs of each genome. These k-mer counts are converted to a binary matrix of 1s and 0s that records whether a particular k-mer is present in a given genome. The binary matrix, along with the AST outcomes, is then used to train a classifier model and to identify putative k-mers associated with resistance. The RAST classifier could identify carbapenem resistance in Acinetobacter baumannii, methicillin resistance in S. aureus and β-lactam and co-trimoxazole resistance in Streptococcus pneumoniae with accuracies of 88–99%109.
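The conversion of contig-derived k-mers into a binary presence or absence matrix can be sketched as follows. This is a minimal illustration, not the published pipeline: the genome names and sequences are invented, the k-mer length is chosen only to keep the example small, and training a boosted classifier on the resulting rows and AST labels is left out.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def genome_kmers(contigs, k):
    """Union of k-mers over all contigs of one genome."""
    found = set()
    for contig in contigs:
        found |= kmers(contig, k)
    return found


def presence_matrix(genomes, k):
    """Build a genomes x k-mers matrix of 1s and 0s.

    `genomes` maps a genome name to its list of contig sequences.
    Returns (genome_names, kmer_columns, matrix), with rows and columns
    in sorted order so the layout is reproducible.
    """
    names = sorted(genomes)
    per_genome = {name: genome_kmers(genomes[name], k) for name in names}
    columns = sorted(set().union(*per_genome.values()))
    matrix = [
        [1 if kmer in per_genome[name] else 0 for kmer in columns]
        for name in names
    ]
    return names, columns, matrix


# Hypothetical isolates; contigs shorter than k contribute no k-mers.
genomes = {
    "isolate_A": ["ACGT"],
    "isolate_B": ["CGTT", "AC"],
}
names, columns, matrix = presence_matrix(genomes, k=3)
for name, row in zip(names, matrix):
    print(name, row)
```

Each row of the matrix, paired with the AST outcome of the corresponding isolate, forms one training example; columns whose values track the resistant label are candidate resistance-associated k-mers.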
One major shortcoming of any machine learning classifier is its dependency on the training data or existing knowledge base. Building an effective and robust machine learning-based classifier of antimicrobial-resistant organisms for clinical diagnostics will require a large data set of curated antimicrobial resistance genes, with accurate genotypic data linked to curated AST data (Box 1). In addition to differentiating between antimicrobial-resistant and antimicrobial-susceptible organisms, machine learning approaches are currently being applied to predict antimicrobial resistance genes in metagenomic data. DeepARG110 is a newly established tool that applies deep learning111 to identify antimicrobial resistance genes. On the basis of curated data sets of CARD and ARDB combined with UniProt protein data, DeepARG built a dissimilarity matrix between antimicrobial resistance proteins and non-antimicrobial resistance proteins and used it to train two deep-learning models: DeepARG-LS for assembled genes and DeepARG-SS for short reads. These models can be used to predict antimicrobial resistance genes in new test data.
Although the application of machine learning to antimicrobial resistance prediction and classification is promising, these techniques have a long way to go before they can be used for rapid diagnostic purposes and replace traditional culture techniques and AST, which can take days or weeks to yield results.
Conclusions and future perspectives
Antimicrobial resistance is a major public health threat. Monitoring and understanding the prevalence, mechanisms and spread of antimicrobial resistance are priorities for both individual patient care and global infection control strategies. Despite stellar advancements, hurdles for antimicrobial resistance detection and understanding persist. Costs are decreasing for sequencing and for automated antimicrobial resistance detection instruments, but start-up and operation costs still outstrip many health-care budgets. Further cost reductions for these technologies will be important for widespread adoption.
The accurate identification of resistance determinants and the correlation of antimicrobial resistance gene profiles to antimicrobial treatment outcomes will facilitate personalized approaches to developing treatment regimens. The success of this approach depends heavily on the comprehensiveness and quality of public antimicrobial resistance gene databases, which have major roles in the development of biological assays and computational tools that expand our ability to detect resistance genes in single isolates and in microbial communities. While progress has been made in building comprehensive antimicrobial resistance gene databases, lack of standardization across databases and long update intervals hold back their potential. Moreover, complex resistance mechanisms (FIG. 1b) are difficult to capture in antimicrobial resistance databases. For example, resistance can arise from epistatic relationships between multiple genes such as in the case of carbapenem resistance, which can arise from the combination of extended-spectrum β-lactamases and efflux pumps or porin impermeability112. Resistance can even occur via overexpression of normal genes such as those encoding efflux pumps, and detection of these resistance mechanisms requires transcriptional measurements113,114 (FIG. 1b,c). These complex resistance mechanisms, coupled with the fact that known antimicrobial resistance genes may not always be expressed, contribute to the difficulty in accurately predicting phenotypic antimicrobial resistance from genotypic antimicrobial resistance data. Machine learning algorithms have made headway in using isolate genomic sequence data and antimicrobial resistance gene databases to predict phenotypic resistance, but these techniques tend to be specialized to specific bacteria or are not accurate and consistent enough for general clinical deployment. 
To realize the goal of making phenotypic predictions from genotypic data, we need more comprehensive databases that link specific antimicrobial resistance genes to specific AST results. Importantly, these databases should include a broad diversity of bacteria with full sequence and antimicrobial resistance gene prediction metadata and report AST results with exact zone sizes or minimum inhibitory concentrations (Box 1) rather than categorical guideline interpretations. Parallel improvements in AST and sequence-based antimicrobial resistance gene prediction will augment efforts to mitigate the clinical impact of antimicrobial resistance.
Although techniques for novel antimicrobial resistance gene discovery exist, such as functional metagenomics, these techniques still have major caveats in the types of antimicrobial resistance genes that they can detect. Innovative methods to determine other antimicrobial resistance gene mechanisms are sorely needed. Moreover, robust models to predict which resistance genes will spread both on the local level within a health-care setting and on the global scale between countries are needed. These models will likely need to incorporate not only the antimicrobial resistance gene sequence and mechanism but also the genomic context, host bacterial species and geographic location.
Rapid and accurate identification of resistance genes in isolate and metagenomic samples would augment the ability of clinicians to make treatment plans for bacterial infections, facilitating a future where sequence-based personalized medicine is routine. It would also ease antimicrobial resistance surveillance efforts and enable low-resource areas to benefit more fully from rapidly decreasing sequencing costs.
Acknowledgements
The authors thank K. Sukhum and M. Pandey for reading through a draft of this paper. This work was supported in part by awards to G.D. through the National Institute of Allergy and Infectious Diseases (NIAID), the Eunice Kennedy Shriver National Institute of Child Health & Human Development and the National Center for Complementary and Integrative Health of the US National Institutes of Health (NIH) under award numbers R01AI123394, R01HD092414 and R01AT009741, respectively. A.W.D. received support from the Institutional Program Unifying Population and Laboratory-Based Sciences Burroughs Wellcome Fund Grant to Washington University. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
Footnotes
Competing interests
The authors declare no competing interests.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Reviewer information
Nature Reviews Genetics thanks J. Parkhill, E. Ruppé and other anonymous reviewer(s) for their contribution to the peer review of this work.
References
This paper demonstrates the impact of antimicrobial resistance on the health-care system and identifies major priorities for future mitigation efforts.
This article shows that soil bacteria are a reservoir for resistance determinants.
This paper describes the creation of a profile HMM-based resistance database and presents an application of this database showing that environmental-based and human-based samples have different resistance profiles.
This short paper explains how DBGs are used in genome assembly.
Peng et al. (2012), Li et al. (2015), Nurk et al. (2017) and Namiki et al. (2012) are method papers of metagenomic assemblers developed to assemble complex metagenomics data sets with uneven sequencing depths.
Together with Sczyrba et al., this paper describes the CAMI project designed to evaluate the differences between different metagenomics tools for metagenomic assembly, taxonomic classification and assembled contig binning.
This detailed review discusses the best strategies used in shotgun metagenomics studies.
ARDB was one of the first general antimicrobial resistance gene databases, and this paper spawned several other efforts to compile resistance gene information across drug classes and bacterial species.
This paper describes recent updates to the CARD and tools that are associated with the database.
This article describes Resfinder, a widely used tool for the identification of acquired antimicrobial resistance genes in whole-genome data.
This paper shows the effectiveness of a sequencing approach to phenotypic antimicrobial resistance predictions in M. tuberculosis.
This paper compiles putative resistance determinants from functional antimicrobial selections in public databases to identify resistance determinants that are not well represented in databases built primarily from clinical bacterial isolates.
This article presents a new technique for identifying antimicrobial resistance determinants by including 3D information.
This is one of the initial studies to demonstrate the application of functional metagenomics selections for discovering novel antibiotic resistance genes.
This paper applies a functional metagenomics approach and assembly pipeline to show evidence of resistance gene exchange between human pathogens and soil bacteria.
This article covers the importance of genomic context in understanding how genotypic resistance determinants result in varied phenotypic antimicrobial susceptibility profiles.
This is a review of traditional microbiology techniques and of several automation innovations, including disc diffusion, microbroth dilution and a Vitek system.
This paper uses expression profiles to help predict phenotypic resistance from genotypic data, showing the power of combining multiple omics techniques.
Full text links
Read article at publisher's site: https://doi.org/10.1038/s41576-019-0108-4
Read article for free, from open access legal sources, via Unpaywall: https://europepmc.org/articles/pmc6525649?pdf=render
Funding
Funders who supported this work.
NCCIH NIH HHS (1)
Grant ID: R01 AT009741
NIAID NIH HHS (2)
Grant ID: R01 AI123394
Grant ID: U01 AI123394
NICHD NIH HHS (1)
Grant ID: R01 HD092414