Wu2014 PDF
Changes may still occur before final publication online and in print
PY53CH20-Ding ARI 22 May 2015 9:2
INTRODUCTION
Viruses are ubiquitous pathogens found in all types of cellular organisms. Typically, viral particles
or virions range from 30 to 450 nm in size and contain a small genome packaged by capsid proteins
with or without a lipid envelope. Whereas cellular organisms store genetic information exclusively
in DNA, both DNA and RNA can serve as the genomes of viruses. In addition to viruses, the cir-
cular noncoding RNA viroids also cause diseases in plants. Because of their small size and inability
to propagate outside living host cells, viruses and viroids remain difficult to detect and identify
compared to cellular pathogens. Enzyme-linked immunosorbent assay (ELISA), polymerase chain
reaction (PCR), and nucleic acid hybridization techniques (including microarray) developed in the
last several decades collectively provide rapid and inexpensive diagnoses for the known viruses and
viroids and are widely used in agriculture and medicine (20, 64, 70, 83). However, because these
assays depend on the reagents (antibodies, primers, or probes) developed from the characterized
Access provided by New York University - Bobst Library on 06/07/15. For personal use only.
viruses and viroids, they are ineffective when the disease is caused by a new pathogen or a mixture
of pathogens that share little or no sequence similarity with those described previously.
Annu. Rev. Phytopathol. 2015.53. Downloaded from www.annualreviews.org
Major advances in DNA sequencing technology over the last decade have led to the develop-
ment of new approaches for the identification and detection of viruses and viroids. These new
approaches, frequently referred to as metagenomics (64), sequence the total nucleic acid con-
tent in disease samples by next-generation sequencing (NGS) technologies for the subsequent
identification of pathogens by bioinformatics tools. Unlike existing methods, the metagenomics
approach does not require prior knowledge of the pathogens and can potentially identify both the
known and new viruses and viroids in a disease sample. For example, use of NGS technologies
has played a key role in the identification of Israeli acute paralysis virus associated with colony
collapse disorder in honeybees (16) and of a polyomavirus in human Merkel cell carcinoma (30).
Rapid development in NGS technologies has dramatically reduced the cost and time for pathogen
identification by metagenomics approaches, leading to a recent explosion in metagenomics studies
on viruses and viroids in plants (46, 61, 84). After a short introduction of the main NGS plat-
forms currently available, this article reviews recent applications of the homology-dependent and
homology-independent metagenomics approaches to the identification and detection of viruses
and viroids in plants.
a
SBS, sequencing by synthesis; SBH, sequencing by hybridization; SMRT, single molecule, real time.
by a factor of 100–1,000 in daily throughput, they were termed next-generation sequencing (NGS)
instead of HTS (47). These NGS approaches (Table 1) have different underlying biochemistries
and differ in template preparation, sequencing and imaging, and data analysis (63). For sequencing
by the 454 platform (e.g., GS FLX+; see Table 1), the DNA template is first amplified by emulsion
PCR in which a single DNA template is amplified to thousands of copies in each droplet of an oil
solution. The amplified DNA templates are incubated with a DNA polymerase, single-strand DNA
binding proteins, ATP sulfurylase, and luciferase; the light emitted during nucleotide incorporation
is captured for all wells in parallel using a high-resolution charge-coupled device (CCD)
camera. After capture of the light intensity, the remaining unincorporated nucleotides are washed
away before the next nucleotide is added. The 454 technology is referred to as pyrosequencing
because the sequencer generates visible light from the inorganic pyrophosphate released when
deoxynucleotides (dNTPs) are incorporated during DNA synthesis, through the action of sulfurylase
and luciferase. The average number
of reads produced per run by the current GS FLX+ sequencer is more than 1 million reads of
approximately 700 bases with a run time of approximately 23 h (47, 60).
The strategy of template preparation and sequencing of Illumina instruments is quite different
from that of 454 pyrosequencing. Bridge amplification of each DNA template produces a cluster
of thousands of original sequences in very close proximity to each other on the flow cell. A typical
slide generates hundreds of millions of spatially separated clusters per run. The Illumina platform
uses four dNTPs with different reversible dye terminators in the sequencing reaction so that the
incorporation reaction is stopped after each base and the incorporated base is easily read out
with fluorescent dyes. The fluorescently labeled terminator is imaged and then cleaved from the
nascent end of DNA for the next cycle. The Illumina platform supports both single-end and
mate-paired-end sequencing. The Illumina HiSeq2500 offers a combination of 2 × 125 base
pair (bp) read lengths and up to 1 terabase (Tb) of data per run within 6 days, whereas the MiSeq
generates up to 15 Gb of data in only 55 h (Table 1) (59, 63).
In the SOLiD platform of ABI the sequence extension reaction is carried out by ligases, rather
than polymerases as in the 454 and Illumina platforms. The single-stranded copies of the DNA
library molecules are first hybridized with a sequencing primer before the addition of a mixture of
8-mer probes carrying four distinct fluorescent labels to compete for ligation to the sequencing
primer. After the fluorescence determined by the two 3′-most nucleotides of the probe is read out,
three bases including the dye are cleaved from the 5′ end of the probe to leave a free 5′ phosphate
for further ligation. After multiple cycles of ligations (typically up to 10 cycles), the synthesized
strands are melted, and the ligation product is washed away before a new sequencing primer
(shifted by one nucleotide) is annealed. The same process is repeated for the remaining three
primers, facilitating the readout of the dinucleotide encoding for each start position in the DNA
sequence. Using specific fluorescent label encoding, the dye readouts (i.e., colors) are converted
to a DNA sequence. For parallelization, the sequencing process uses beads covered with multiple
copies of the sequence to be determined. These beads are created in a fashion similar to that
described above for the 454 platform. The SOLiD 5500xl system currently generates 95 Gb data
per run during 6 days with a typical read length combination of 2 × 60 bp (47, 92).
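The conversion of dye readouts into bases can be illustrated with a minimal sketch of two-base (color-space) decoding. It assumes the commonly described four-color scheme in which each color reports the relationship between two adjacent bases, and it assumes the first base (from the sequencing primer) is known. The function names are hypothetical and this is not vendor software.

```python
from typing import List

# Assumed 2-bit base codes; under this assumption the color of a dinucleotide
# is the XOR of the two base codes (e.g. AA, CC, GG, TT all give color 0).
BASES = "ACGT"


def encode_colors(seq: str) -> List[int]:
    """Translate a base sequence into colors, one per adjacent base pair."""
    codes = [BASES.index(base) for base in seq]
    return [left ^ right for left, right in zip(codes, codes[1:])]


def decode_colors(first_base: str, colors: List[int]) -> str:
    """Rebuild the base sequence from a known first base and the color calls."""
    code = BASES.index(first_base)
    decoded = [first_base]
    for color in colors:
        code ^= color  # each color fixes the next base given the previous one
        decoded.append(BASES[code])
    return "".join(decoded)


colors = encode_colors("ATGGC")
print(colors, decode_colors("A", colors))
```

Because every base is interrogated by two overlapping colors, a single miscalled color changes all downstream bases in a naive decode, which is why color-space data are usually analyzed with dedicated tools.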
New sequencing platforms with improved performance continue to be developed and released
(Table 1). Approaches that sequence single large DNA molecules without the need to halt between
read steps are sometimes referred to as third-generation sequencing (88). However, Ion Torrent
and Proton are between the second and third generation, since they do not completely fulfill
the features assigned to either category (9). Advancement of sequencing technology provides
unprecedented opportunity for pathogen discovery in plants by unbiased viral metagenomics.
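To put the platform figures quoted above on a common scale, the following sketch converts each quoted yield and run time into gigabases per day. The numbers are taken from the text (a lower bound for the 454 GS FLX+ and "up to" values for the Illumina instruments), so the results are rough orders of magnitude only. The dictionary and function names are illustrative.

```python
# (yield in gigabases, run time in hours), as quoted in the text above.
PLATFORMS = {
    "454 GS FLX+": (1_000_000 * 700 / 1e9, 23),  # ~1 million reads x ~700 bases
    "Illumina HiSeq2500": (1000.0, 6 * 24),      # up to 1 Tb within 6 days
    "Illumina MiSeq": (15.0, 55),                # up to 15 Gb in 55 h
    "SOLiD 5500xl": (95.0, 6 * 24),              # 95 Gb per 6-day run
}


def gigabases_per_day(yield_gb: float, run_hours: float) -> float:
    """Average daily output, assuming back-to-back runs with no downtime."""
    return yield_gb / run_hours * 24


for name, (yield_gb, run_hours) in PLATFORMS.items():
    print(f"{name}: {gigabases_per_day(yield_gb, run_hours):.2f} Gb/day")
```

The comparison ignores read length, error profile, and cost per run, all of which matter when choosing a platform for virus discovery.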
comptoniana
Illumina Velvet/Geneious 1 H. comptoniana 103
454 CLC 1 Lettuce 95
Table 2 (Continued)
Columns: Enrichment; NGS platform; Assembly tools; Viruses (Known, New RNA, New DNA); Viroids (Known, New); Host species; Reference
Enrichment: Virus-like particles
454 Newbler 1 Sweet potato 81
SOLiD Velvet 1 Eggplant 28
454 Newbler 1 P. domestica 93
454 Newbler/CLC 1 Potato 79
SOLiD Velvet 1 Pepper 27
454 CLC 1 1 Sugarcane 12
[Figure 1 flowchart labels recovered from the extraction: Samples; Pretreatment (CF11 cellulose chromatography); Library construction; Next-generation sequencing; Raw data; Preprocessing; Assembling; Contigs]
Figure 1
The key steps in the discovery of viruses and viroids by next-generation sequencing. The experimental part,
including the strategies of viral pathogen enrichment, is shown in blue, whereas the informatics process is
shown in green. Abbreviations: BLASTn, nucleotide-nucleotide BLAST; BLASTp, protein-protein
BLAST; dsRNAs, double-stranded RNAs; dT, thymine deoxyribonucleotide; PFOR, progressive filtering of
overlapping small RNAs; VLP, virus-like particle.
dsRNA preparations shorter than 1 kb (82). However, six of the seven new viruses discovered by
dsRNA sequencing contain RNA genomes, possibly because plant DNA viruses do not produce
sufficiently long dsRNA in their life cycle. We note that the new DNA geminivirus was identified
from sequencing a total dsRNA preparation that was not treated with RNase and DNase before
library construction (5).
Virus-like particles. Viral genomic RNA or DNA packaged in viral particles is protected from
DNase and RNase treatments. Therefore, enrichment of VLPs (Figure 1) by homogenization,
filtration, and ultracentrifugation has been widely used for virus discovery in ocean, environmental,
and fecal samples (20, 52, 64, 96) as well as in several studies of plant samples (12, 27, 28, 36, 79, 81,
93). VLP preparations contain contaminating mitochondria and bacteria, so VLPs are often treated
with chloroform to disrupt mitochondrial and bacterial membranes before nuclease digestion and
the extraction of VLP-associated nucleic acids for deep sequencing (51, 100). Unfortunately,
enveloped viruses are also sensitive to chloroform treatment. Moreover, because successful virion
purification of many viruses requires development of specific protocols, it is unlikely that all viruses
can be captured by a single protocol for VLP enrichment. Nevertheless, deep sequencing of the
total nucleic acids from VLPs extracted from plants led to the discovery of four DNA viruses and
two RNA viruses (Table 2).
Because the total nucleic acid content from both VLP and dsRNA preparations is often very
low, an extra step to amplify the extracted nucleic acids by PCR or reverse transcription PCR
(RT-PCR) in a sequence-independent manner is necessary before the construction of libraries
for deep sequencing (Figure 1) (16, 64). An improved rolling-circle amplification technique to
amplify circular dsDNA viral genomes is available (89, 94).
Small RNAs. It is known that plants and invertebrates produce virus-derived small interfering
RNAs (siRNAs) in response to infection by both RNA and DNA viruses (Figure 1) (22–24).
Recent studies have also detected abundant production of viral siRNAs in mammalian cells after
infection by two unrelated positive-strand RNA viruses (55, 58). Moreover, dsRNA replicative
intermediates of viroids and satellite RNAs are also processed into siRNAs in plants (22–24). The
first deep-sequencing study of viral siRNAs has revealed that viral siRNAs in fact overlap each other
extensively despite being only 21 to 24 nt long (7). Ubiquitous production in diverse eukaryotic
hosts and the overlapping property of viral siRNAs have been utilized independently by two groups
to develop a novel strategy for virus discovery (Figure 2) by enriching and sequencing small RNAs
(49, 101). In this approach, known as virus discovery by deep sequencing and assembly of total
small RNAs (vdSAR), small RNAs are enriched from diseased cells/tissues for deep sequencing by
NGS platforms and assembled into larger sequence contigs, which are then used for virus
discovery in the same way as sequences obtained from dsRNA and VLP preparations.
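The idea that overlapping 21- to 24-nt siRNAs can be joined into longer contigs can be illustrated with a toy greedy overlap assembler. This is a sketch for intuition only: it ignores reverse-complement reads, sequencing errors, and coverage, and it is not how Velvet or other published assemblers work internally. The function names, the test sequence, and the `min_overlap` default are assumptions made for the example.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 10) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest suffix-prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up 31-nt "genome" tiled by three overlapping 21-nt reads.
genome = "ACGTTGCAAGGCTTACCGATGCATTAGGCCA"
reads = [genome[start:start + 21] for start in (0, 5, 10)]
print(greedy_assemble(reads))
```

In real datasets the resulting contigs would then be passed to the homology search described in the bioinformatics section.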
The original protocols to construct a library of small RNAs for sequencing took up to two weeks
(7, 8). However, current protocols no longer require gel purification of small RNAs either before
or after ligation with 5′ and 3′ adaptors and may be completed within one day (55). Because the
sequence-independent amplification required for dsRNA and VLP preparations is not necessary,
sample preparation for small RNA sequencing is less technically demanding and time-consuming
than are dsRNA and VLP purification protocols (Figure 1).
There are several reasons why more investigators have chosen to enrich and sequence small RNAs
for the identification of plant viruses and viroids over other enrichment strategies (Table 2). First,
replication of RNA and DNA viruses and replication of subviral agents such as viroids and satellite
RNAs in plants all induce extremely abundant accumulation of the pathogen-specific siRNAs,
which represent up to 30% of total small RNAs sequenced from diseased plants (22–24). Because
both the amount of sequencing and data complexity are greatly reduced, sequencing of multiple
samples marked by barcodes in a single lane still provides sufficient depth for pathogen discovery.
Thus, vdSAR is cost-effective. Second, unlike dsRNA and VLP preparations, all replicating viruses
and viroids in a diseased plant can be detected in principle from the deep sequencing of a single
library of small RNAs. This feature is particularly attractive for many disease diagnostic and
quarantine applications. Third, viral and subviral siRNAs are the products of an active host immune
response to infection and exhibit specific patterns of size distribution in distinct host species due
to their biogenesis by specific Dicer proteins (22–24). Therefore, the size distribution pattern of
small RNAs may reveal if the identified viral and subviral pathogen actively replicates in plants or
in the associated fungal or insect species (119). Finally, data mining of the same libraries of small
RNAs may lead to the discovery of novel pathogens that exhibit no sequence similarity detectable
by currently available informatics tools (Figures 1 and 2; see also below).
[Figure 2 diagram labels recovered from the extraction: Cleaning; Assembling; PFOR/PFOR2; Phased siRNA reads; panels i and ii]
Figure 2
The classification principles of endogenous and exogenous small RNAs (sRNAs) sequenced from plants in the discovery of (i) viruses
by deep sequencing and assembly of total small RNAs (vdSAR) and (ii) viroids by progressive filtering of overlapping small RNAs
(PFOR). Abbreviations: nat-siRNAs, natural antisense siRNAs; nt, nucleotides; tasiRNAs, trans-acting siRNAs; miRNAs, microRNAs;
SLS, splitting longer reads into shorter fragments.
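The size-distribution argument made in this section can be sketched as a simple length histogram of the small RNAs that map to a candidate pathogen. The 18 to 30 nt window and the function name are assumptions chosen for illustration; the text itself only states that viral siRNAs are 21 to 24 nt long and show host-specific size patterns.

```python
from collections import Counter
from typing import Dict, Iterable


def size_distribution(reads: Iterable[str], min_len: int = 18, max_len: int = 30) -> Dict[int, float]:
    """Fraction of reads at each length within the chosen small-RNA window."""
    counts = Counter(len(read) for read in reads if min_len <= len(read) <= max_len)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: counts[length] / total for length in sorted(counts)}


# Made-up reads: three 21-nt and one 22-nt sequence.
example_reads = ["A" * 21, "C" * 21, "G" * 21, "T" * 22]
print(size_distribution(example_reads))
```

A profile dominated by one or two Dicer product sizes would be consistent with active siRNA biogenesis, whereas a flat profile across many lengths looks more like degradation products. Any such interpretation would need to be checked against the known size pattern of the host species.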
Bioinformatics Analysis
Prior to sequence assembly and pathogen identification, the raw data generated by NGS platforms
must be preprocessed to remove adaptors and low-quality sequences (Figure 1). The quality
control is dependent on the sequencing technology used. Standard parameters and thresholds are
usually provided by the manufacturer. It is now a standardized process on “older” technologies
like 454/Roche pyrosequencing or Illumina sequencing by synthesis. For multiplex sequencing of
mixed libraries in a single lane, an extra step of demultiplexing using barcodes built in the PCR
primers is necessary before sequence assembly. When the genome sequence of the host plant is
available, in silico subtraction of host-specific sequences (Figure 2) before assembly will speed up
the downstream bioinformatics analysis (54, 64). It should be pointed out that adaptor removal
and computational filtering of host sequences can lead to other artifacts (43, 84).
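The preprocessing steps described above (adaptor removal and barcode demultiplexing) can be sketched as follows. This is a simplified illustration that assumes exact matches, a 3′ adaptor, and barcodes located at the 5′ start of each read. Real pipelines tolerate mismatches and use quality scores. The adaptor string, barcodes, and function names below are placeholders, not values from any kit.

```python
from typing import Dict, List


def trim_adaptor(read: str, adaptor: str, min_match: int = 8) -> str:
    """Cut the read at a full adaptor match, or at a partial adaptor prefix at the 3' end."""
    index = read.find(adaptor)
    if index != -1:
        return read[:index]
    for size in range(min(len(adaptor), len(read)) - 1, min_match - 1, -1):
        if read.endswith(adaptor[:size]):
            return read[:-size]
    return read  # no adaptor detected


def demultiplex(reads: List[str], barcodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign reads to samples by an exact 5' barcode match and strip the barcode."""
    bins: Dict[str, List[str]] = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return bins


ADAPTOR = "TGGAATTCTCGG"  # placeholder adaptor sequence
insert = "ACGTACGTACGTACGTACGTA"
print(trim_adaptor(insert + ADAPTOR, ADAPTOR))
print(demultiplex(["ACGTGGGG", "TTAGCCCC"], {"ACGT": "sample1", "TTAG": "sample2"}))
```

Host subtraction would follow the same pattern, with reads that map to the host genome set aside before assembly.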
The assembly of the preprocessed reads can be executed by several mainstream algorithms that
are publicly available, including Velvet, Oases, and Vcake (42, 90, 116). The parameters used in
sequence assembly are similar to those for genome assembly and are defined by the algorithm
employed. Subsequently, the assembled contig sequences are queried by homology search tools
against previously documented sequences stored either in a local database or in public databases
such as GenBank (64). A common approach is to compare the assembled sequences with the
nonredundant nucleotide database of GenBank using a BLAST package. BLASTn and BLASTx
are the two frequently used programs for nucleotide and amino acid comparison, respectively.
New and known viruses are readily identified when contigs show high similarity (>90% similarity
and 85% coverage) with a known virus (78, 101). When a contig shows distant homology with a
known virus, especially only at protein level, the contig often represents a new virus that can be
taxonomically assigned only at the level of virus family (78, 101).
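The similarity thresholds quoted above (>90% similarity and 85% coverage) can be expressed as a small decision rule applied to each contig's best hit. This sketch assumes that coverage means the aligned fraction of the contig, which the text does not specify, and the labels and function name are hypothetical. A hit below the thresholds is only a candidate for a new virus and would need follow-up analysis.

```python
from typing import Optional


def classify_contig(percent_identity: Optional[float], alignment_length: int,
                    contig_length: int, min_identity: float = 90.0,
                    min_coverage: float = 85.0) -> str:
    """Label a contig from its best hit using identity and coverage thresholds."""
    if percent_identity is None:
        return "no_hit"  # no detectable homology in the database searched
    coverage = 100.0 * alignment_length / contig_length
    if percent_identity > min_identity and coverage >= min_coverage:
        return "high_similarity"  # closely matches a documented virus
    return "distant_homology"  # possible new virus, assignable only at a broad level


print(classify_contig(95.0, 900, 1000))
print(classify_contig(62.0, 950, 1000))
print(classify_contig(None, 0, 1000))
```

The identity and alignment length would typically come from the tabular output of a BLAST search, with the contig length supplied from the assembly.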
A bottleneck in the viral metagenomics approach is the de novo annotation of thousands
of the assembled contigs. The computing time required for the annotation is likely to increase
as sequence databases continue their exponential growth. The use of dedicated databases or of
subsections of GenBank could overcome this problem; however, it will not be possible to identify
bacterial, archaeal, and nonhost eukaryotic sequences, leading to an overestimation of the fraction
of unknowns (64). To handle the huge volume of sequencing data from NGS,
computer programs that are orders of magnitude faster than BLAST have been developed, such
as USEARCH and HHblits (29, 77).
Although software packages used for viral metagenomics are freely accessible, data analysis
often requires a trained bioinformaticist, a requirement that hinders the routine application of
NGS in the identification of viruses and viroids. Recently, an easy-to-use graphical interface,
SearchSmallRNA, became available to reconstruct viral genomes with high reliability using small
RNA data from NGS. SearchSmallRNA removes the need for command-line use and basic bioinformatics
knowledge (18). Commercial packages such as CLC Genomics Workbench and Geneious
or open platforms such as Galaxy also provide user-friendly interfaces and simplify the use and
parameterization of these tools. These efforts will facilitate identification of viruses and viroids by
NGS technologies.
[Figure 3 diagram: tree of host plant species grouped into Eudicots and Monocots, with colored boxes marking the enrichment strategies (sRNA, poly(A) RNA, VLP, total RNA/ribo-RNA, dsRNA). Host labels legible in the extraction include Anthocercis littorea, squash, eggplant, papaya, tomato, rose, potato, pagoda, sweet potato, citrus, soybean, sugarcane, canna, garlic, and iris, among others.]
Figure 3
The taxonomy of host plant species selected for the identification of viruses and viroids by next-generation
sequencing. Also shown are the strategies ( frequencies are marked by colored boxes) used to enrich virus and/or
viroid sequences in each host species. The clustering of host species on a tree may not be strictly consistent
with the phylogenetic relationship of these species. Abbreviations: dsRNA, double-stranded RNA; poly-A
RNA, polyadenylated transcript; ribo-RNA, ribosomal RNA-depleted RNA; sRNA, small RNA; VLPs,
virus-like particles.
sequence homology with a known pathogen in a database. Therefore, the metagenomics approach
as described above cannot discover novel pathogens that show little or no sequence homology with
known pathogens. In a hypothetical scenario, for example, BLAST searches would not be able to
identify NGS reads of geminiviruses as virus-specific in 1977 (33, 38) because geminiviral genomes
show no detectable sequence similarity with other viruses (37).
Table 3 (Continued)
RNA viruses | Family (genus) | Infective? | Reference
Two badnaviruses, one mastrevirus | Caulimoviridae (Badnavirus); Geminiviridae (Mastrevirus) | NA | 49
Grapevine vein clearing virus (GVCV) | Caulimoviridae (Badnavirus) | Yes | 117
Citrus chlorotic dwarf-associated virus (CCDaV) | Geminiviridae | Yes | 57
Grapevine geminivirus (GVGV) | Geminiviridae | NA | 91
Pagoda yellow mosaic associated virus (PYMAV) | Caulimoviridae (Badnavirus) | NA | 98
Viroids
Persimmon viroid 2 (PVd2) | Apscaviroid | Yes | 41
Grapevine hammerhead viroid-like RNA (GHVd RNA) | Avsunviroidae | NA | 102
A novel homology-independent metagenomics approach has been recently developed for the
discovery of viroids (102). This approach (Figure 2) involves analyzing the sequences of the
total small RNAs of the infected plants obtained by NGS with a novel computational algorithm,
progressive filtering of overlapping small RNAs (PFOR). Viroid infection triggers production of
viroid-derived overlapping siRNAs that cover the entire genome at high densities (11, 21, 69).
However, these viroid siRNAs cannot be assembled into complete viroid genomes by conventional
genome assembly algorithms such as Velvet because of the high heterogeneity of viroid populations
(102). PFOR retains viroid siRNAs for genome assembly by progressively eliminating small RNAs
that do not overlap and those that overlap but cannot be assembled into a direct repeat RNA, which
is synthesized during the rolling-circle replication of the viroid RNA. Viroids from both of the
known viroid families are readily identified and their full-length sequences assembled by PFOR
from small RNAs sequenced from infected plants. The recent algorithm update significantly
enhances the performance by adopting parallel programming into the filtering step, which takes
up more than 90% of the running time (119).
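The filtering idea behind PFOR can be illustrated with a deliberately simplified toy. It assumes that reads tiling a circular (or direct-repeat) RNA always have an overlapping neighbour on both sides, whereas reads tiling a linear fragment are eroded from the ends inward once reads without a neighbour are removed repeatedly. This is not the published PFOR implementation: the fixed 20-nt overlap, the exact-match rule, and all names and sequences below are assumptions made for the example.

```python
from typing import Iterable, Set


def progressive_filter(reads: Iterable[str], overlap: int = 20) -> Set[str]:
    """Iteratively drop reads that lack an overlapping neighbour on either side."""
    kept = set(reads)
    while True:
        prefixes = {read[:overlap] for read in kept}
        suffixes = {read[-overlap:] for read in kept}
        survivors = {
            read for read in kept
            if read[-overlap:] in prefixes and read[:overlap] in suffixes
        }
        if survivors == kept:
            return kept  # stable set: every read is flanked on both sides
        kept = survivors


# Made-up sequences: a 31-nt circle and an unrelated 28-nt linear fragment.
circle = "ACGTTGCAAGGCTTACCGATGCATTAGGCCA"
wrapped = circle + circle[:20]
circular_reads = [wrapped[i:i + 21] for i in range(len(circle))]
linear = "GGATCCTTAAGCGCGTATATCCGGAATT"
linear_reads = [linear[i:i + 21] for i in range(len(linear) - 20)]

kept = progressive_filter(circular_reads + linear_reads)
print(len(kept), kept == set(circular_reads))
```

In this toy the reads from the linear fragment are removed one layer at a time while the reads from the circle survive, which mirrors the progressive elimination described above without reproducing the actual algorithm or its parallelized filtering step.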
A new viroid, Grapevine latent viroid (GLVd), was identified by the homology-independent
metagenomics approach (119). GLVd was proposed as a new species of Apscaviroid because it
contains the typical structural elements found in this group and independently infects grapevine
seedlings (119). PFOR/PFOR2 also enabled the discovery of two putative viroids designated
grapevine hammerhead viroid-like RNA (GHVd RNA) and apple hammerhead viroid-like RNA
(AHVd RNA), neither of which exhibits sequence similarity with any of the known molecules
detectable by the available algorithms. Assembly of AHVd and GHVd RNA repeats by PFOR
from siRNAs, with their 21-nt/22-nt ratio characteristic of the viral siRNAs targeting plant
RNA viruses known to replicate in the cytoplasm, indicates that both circular RNAs replicate
via the rolling-circle mechanism. Both AHVd and GHVd RNAs also encode a biologically
active hammerhead ribozyme in each polarity that is structurally conserved in viroids from the
Avsunviroidae. Moreover, AHVd RNA was not specifically associated with any of the viruses found
in apple plants by deep sequencing. Although independent infectivity has yet to be demonstrated,
the available lines of evidence suggest that AHVd and GHVd RNAs may represent novel viroids.
A simple computational program known as SLS (for splitting longer reads into shorter frag-
ments) was recently developed as part of PFOR2 to discover biologically active circular RNAs via
the deep sequencing of long RNAs instead of small RNAs (119). The program (Figure 2) cuts
the sequenced long RNAs into 21-nt virtual small RNAs of 20-nt overlap with their 5′ and 3′
neighboring small RNAs before PFOR2 analysis to retain only those that could be assembled into
direct repeat RNAs. SLS-PFOR2 was validated by successful assembly of the full-length PSTVd
genome from the total RNAs deep sequenced from the infected potato plants after rRNA depletion
or RNase R treatment to enrich circular RNAs (119). The development of SLS-PFOR2 allows
the discovery of viroids that infect a host species but do not trigger Dicer-dependent biogenesis
of viroid siRNAs, which may lead to the identification of novel viroids.
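The splitting step described for SLS amounts to a sliding window: 21-nt fragments that overlap their neighbours by 20 nt are produced by advancing one nucleotide at a time. A minimal sketch, with a hypothetical function name and a made-up read:

```python
from typing import List


def split_long_read(read: str, size: int = 21, step: int = 1) -> List[str]:
    """Cut a long read into fixed-size fragments; step 1 gives (size - 1)-nt overlaps."""
    return [read[i:i + size] for i in range(0, len(read) - size + 1, step)]


fragments = split_long_read("ACGTACGTACGTACGTACGTACG")  # 23-nt example read
print(fragments)
```

Reads shorter than the window yield no fragments. The resulting virtual small RNAs would then be passed to the filtering and assembly steps in place of sequenced siRNAs.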
CONCLUDING REMARKS
Determination of the total nucleic acid content in a biological sample by NGS technologies pro-
vides a powerful tool for plant pathologists in the diagnosis and identification of viruses and viroids.
NGS technologies will also likely transform the inspection and quarantine services required to
provide a fast, accurate, and full indexing of viruses and viroids in plant samples. In contrast to the
current techniques such as ELISA, PCR, and microarray, the metagenomics approaches do not
require prior knowledge of the pathogens. Successful detection of diverse viruses and viroids by the
metagenomics approaches has been described in plants since 2009. Available data show that among
the strategies developed to enrich NGS reads specific to viruses and/or viroids, targeting the total
small RNAs for deep sequencing has been the most effective to identify RNA and DNA viruses as
well as viroids (25). Analyses of plant samples by NGS and homology-dependent computational
algorithms have identified two new viroids and 49 new viruses from 20 known or assigned virus
families in the last six years. Notably, development of the algorithm PFOR allows the discovery
of viroids independent of not only the tedious purification of the naked circular RNAs but also
any sequence similarity to a known viroid. Despite widespread distribution in potato, horticultural
species, and fruit trees, viroids have so far not been found in many major monocot or small fruit
crops, or in animals. The recent algorithm update SLS-PFOR2 further expands the range of host
species for viroid discovery to include those that may not produce overlapping siRNAs targeting
the replicating circular RNAs.
Many challenges remain in the application of NGS technologies to pathogen discovery. First,
there is a critical need for the development of new computational algorithms capable of discover-
ing novel viruses from NGS datasets in a homology-independent manner, as illustrated by PFOR
in viroid discovery. These algorithms may also help identify the origins of the large volume of un-
known sequences present in any deep-sequencing project that are not homologous to any entry in
GenBank. Second, development of user-friendly software interfaces such as SearchSmallRNA that
are accessible to the public and require little informatics training will facilitate more widespread
applications of NGS technologies to the diagnosis and identification of viruses and viroids. Third,
Koch’s postulates have not been fulfilled for many viruses and viroids discovered by the metage-
nomics approaches. However, because production of virus- and viroid-specific siRNAs reveals
induction of the host immune response to an active infection, identification of viruses and viroids
by small RNA sequencing also provides evidence for their replication in the host.
SUMMARY POINTS
1. The power of next-generation sequencing (NGS) technologies to allow rapid determina-
tion of the total nucleic acid content in a biological sample has transformed the diagnosis
and identification of viruses and viroids.
2. All viruses and viroids identical or similar to those described previously can be identified
in a plant sample by a single NGS run and homology-dependent algorithms.
3. A total of 49 new viruses and one new viroid have been identified by NGS and homology-
dependent algorithms since 2009. Twelve of these viruses may become the founding
members of new virus genera or families.
4. Common strategies to enrich viruses and/or viroids for deep sequencing include the
purification of dsRNAs, virus-like particles, or small RNAs. However, identification of
RNA and DNA viruses as well as viroids in a single NGS run is possible only by sequencing
total small RNAs.
5. The recent development of homology-independent algorithm PFOR/PFOR2 makes
it possible to discover novel viroids by deep sequencing of small RNAs and total RNAs
either depleted of rRNA or enriched for circular RNAs. It is likely that PFOR will facilitate
discovery of previously uncharacterized viroids in diverse plant and animal species in both
agriculture and the environment.
DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that
might be perceived as affecting the objectivity of this review.
ACKNOWLEDGMENTS
The authors’ research projects described in this review were supported by grants to Q.W. from the
National Basic Research Program of China (No. 2014CB138405) and the Chinese National Natu-
ral Science Foundation (No. 31272011) and to S.W.D. from the US–Israel Binational Agricultural
Research and Development Fund (BARD-IS-4513-2), the US–Israel Binational Science Foundation
(BSF-2011302), and the US Department of Agriculture Research Service (6659-22000-025).
LITERATURE CITED
1. Adams IP, Glover RH, Monger WA, Mumford R, Jackeviciene E, et al. 2009. Next-generation sequenc-
ing and metagenomic analysis: a universal diagnostic tool in plant virology. Mol. Plant Pathol. 10:537–45
2. Adams IP, Miano DW, Kinyua ZM, Wangai A, Kimani E, et al. 2013. Use of next-generation sequencing
for the identification and characterization of Maize chlorotic mottle virus and Sugarcane mosaic virus causing
maize lethal necrosis in Kenya. Plant Pathol. 62:741–49
3. Al Rwahnih M, Daubert S, Golino D, Rowhani A. 2009. Deep sequencing analysis of RNAs from a
grapevine showing Syrah decline symptoms reveals a multiple virus infection that includes a novel virus.
Virology 387:395–401
4. Al Rwahnih M, Daubert S, Urbez-Torres JR, Cordero F, Rowhani A. 2011. Deep sequencing evidence
from single grapevine plants reveals a virome dominated by mycoviruses. Arch. Virol. 156:397–403
5. Al Rwahnih M, Dave A, Anderson MM, Rowhani A, Uyemoto JK, Sudarshana MR. 2013. Association
of a DNA virus with grapevines affected by red blotch disease in California. Phytopathology 103:1069–76
6. Al Rwahnih M, Sudarshana MR, Uyemoto JK, Rowhani A. 2012. Complete genome sequence of a novel
vitivirus isolated from grapevine. J. Virol. 86:9545
7. Aliyari R, Wu Q, Li HW, Wang XH, Li F, et al. 2008. Mechanism of induction and suppression of
antiviral immunity directed by virus-derived small RNAs in Drosophila. Cell Host Microbe 4:387–97
8. Ambros V, Lee RC. 2004. Identification of microRNAs and other tiny noncoding RNAs by cDNA
cloning. Methods Mol. Biol. 265:131–58
9. Barzon L, Lavezzo E, Militello V, Toppo S, Palu G. 2011. Applications of next-generation sequencing
technologies to diagnostic virology. Int. J. Mol. Sci. 12:7861–84
10. Bi YQ, Tugume AK, Valkonen JPT. 2012. Small-RNA deep sequencing reveals Arctium tomentosum as
a natural host of Alstroemeria virus X and a new putative Emaravirus. PLOS ONE 7:e42758
11. Bolduc F, Hoareau C, St-Pierre P, Perreault JP. 2010. In-depth sequencing of the siRNAs associated
with peach latent mosaic viroid infection. BMC Mol. Biol. 11:16
12. Candresse T, Filloux D, Muhire B, Julian C, Galzi S, et al. 2014. Appearances can be deceptive: revealing
a hidden viral infection with deep sequencing in a plant quarantine context. PLOS ONE 9:e102945
13. Candresse T, Marais A, Faure C, Gentit P. 2013. Association of Little cherry virus 1 (LChV1) with the
Shirofugen stunt disease and characterization of the genome of a divergent LChV1 isolate. Phytopathology
103:293–98
14. Carvajal-Yepes M, Olaya C, Lozano I, Cuervo M, Castano M, Cuellar WJ. 2014. Unraveling complex
viral infections in cassava (Manihot esculenta Crantz) from Colombia. Virus Res. 186:76–86
15. Coetzee B, Freeborough MJ, Maree HJ, Celton JM, Rees DJ, Burger JT. 2010. Deep sequencing analysis
33. Goodman RM. 1977. Single-stranded DNA genome in a whitefly-transmitted plant virus. Virology
83:171–79
34. Gu YH, Tao X, Lai XJ, Wang HY, Zhang YZ. 2014. Exploring the polyadenylated RNA virome of sweet
potato through high-throughput sequencing. PLOS ONE 9:e98884
35. Hagen C, Frizzi A, Kao J, Jia L, Huang M, et al. 2011. Using small RNA sequences to diagnose, sequence,
and investigate the infectivity characteristics of vegetable-infecting viruses. Arch. Virol. 156:1209–16
36. Hany U, Adams IP, Glover R, Bhat AI, Boonham N. 2014. The complete genome sequence of Piper
yellow mottle virus (PYMoV). Arch. Virol. 159:385–88
37. Harrison BD. 1985. Advances in geminivirus research. Annu. Rev. Phytopathol. 23:55–82
38. Harrison BD, Barker H, Bock KR, Guthrie EJ, Meredith G, Atkinson M. 1977. Plant viruses with circular
single-stranded DNA. Nature 270:760–62
39. He Y, Yang Z, Hong N, Wang G, Ning G, Xu W. 2015. Deep sequencing reveals a novel closterovirus
associated with wild rose leaf rosette disease. Mol. Plant Pathol. 16:449–58
40. Ito T, Suzaki K, Nakano M. 2013. Genetic characterization of novel putative rhabdovirus and dsRNA
virus from Japanese persimmon. J. Gen. Virol. 94:1917–21
41. Ito T, Suzaki K, Nakano M, Sato A. 2013. Characterization of a new apscaviroid from American per-
58. Maillard PV, Ciaudo C, Marchais A, Li Y, Jay F, et al. 2013. Antiviral RNA interference in mammalian
cells. Science 342:235–38
59. Mardis ER. 2008. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. 9:387–
402
60. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. 2005. Genome sequencing in microfab-
ricated high-density picolitre reactors. Nature 437:376–80
61. Massart S, Olmos A, Jijakli H, Candresse T. 2014. Current impact and future directions of high through-
put sequencing in plant virus diagnostics. Virus Res. 188:90–96
62. Mbanzibwa DR, Tugume AK, Chiunga E, Mark D, Tairo FD. 2014. Small RNA deep sequencing-based
detection and further evidence of DNA viruses infecting sweetpotato plants in Tanzania. Ann. Appl. Biol.
165:329–39
63. Metzker ML. 2010. Sequencing technologies—the next generation. Nat. Rev. Genet. 11:31–46
64. Mokili JL, Rohwer F, Dutilh BE. 2012. Metagenomics and future perspectives in virus discovery. Curr.
tomic analysis of small RNAs present in soybean deep sequencing libraries. Genet. Mol. Biol. 35(Suppl.
1):292–303
66. Monger WA, Adams IP, Glover RH, Barrett B. 2010. The complete genome sequence of Canna yellow
streak virus. Arch. Virol. 155:1515–18
67. Monger WA, Alicai T, Ndunguru J, Kinyua ZM, Potts M, et al. 2010. The complete genome sequence
of the Tanzanian strain of Cassava brown streak virus and comparison with the Ugandan strain sequence.
Arch. Virol. 155:429–33
68. Nakamura S, Yang CS, Sakon N, Ueda M, Tougan T, et al. 2009. Direct metagenomic detection of viral
pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach. PLOS
ONE 4:e4219
69. Navarro B, Pantaleo V, Gisel A, Moxon S, Dalmay T, et al. 2009. Deep sequencing of viroid-derived small
RNAs from grapevine provides new insights on the role of RNA silencing in plant-viroid interaction.
PLOS ONE 4:e7686
70. Nicolaisen M. 2011. An oligonucleotide-based microarray for detection of plant RNA viruses. J. Virol.
Methods 173:137–43
71. Palacios G, Druce J, Du L, Tran T, Birch C, et al. 2008. A new arenavirus in a cluster of fatal transplant-
associated diseases. N. Engl. J. Med. 358:991–98
72. Pallett DW, Ho T, Cooper I, Wang H. 2010. Detection of Cereal yellow dwarf virus using small interfering
RNAs and enhanced infection rate with Cocksfoot streak virus in wild cocksfoot grass (Dactylis glomerata).
J. Virol. Methods 168:223–27
73. Pantaleo V, Saldarelli P, Miozzi L, Giampetruzzi A, Gisel A, et al. 2010. Deep sequencing analysis of
viral short RNAs from an infected Pinot Noir grapevine. Virology 408:49–56
74. Petrosino JF, Highlander S, Luna RA, Gibbs RA, Versalovic J. 2009. Metagenomic pyrosequencing and
microbial identification. Clin. Chem. 55:856–66
75. Poojari S, Alabi OJ, Fofanov VY, Naidu RA. 2013. A leafhopper-transmissible DNA virus with novel evo-
lutionary lineage in the family Geminiviridae implicated in grapevine redleaf disease by next-generation
sequencing. PLOS ONE 8:e64194
76. Quito-Avila DF, Jelkmann W, Tzanetakis IE, Keller K, Martin RR. 2011. Complete sequence and
genetic characterization of Raspberry latent virus, a novel member of the family Reoviridae. Virus Res.
155:397–405
77. Remmert M, Biegert A, Hauser A, Soding J. 2012. HHblits: lightning-fast iterative protein sequence
searching by HMM-HMM alignment. Nat. Methods 9:173–75
78. Reyes A, Haynes M, Hanson N, Angly FE, Heath AC, et al. 2010. Viruses in the faecal microbiota of
monozygotic twins and their mothers. Nature 466:334–38
79. Richards RS, Adams IP, Kreuze JF, De Souza J, Cuellar W, et al. 2014. The complete genome sequences
of two isolates of potato black ringspot virus and their relationship to other isolates and nepoviruses.
Arch. Virol. 159:811–15
80. Rodamilans B, Leon DS, Muhlberger L, Candresse T, Neumuller M, et al. 2014. Transcriptomic analysis
of Prunus domestica undergoing hypersensitive response to Plum pox virus infection. PLOS ONE 9:e100477
81. Rodriguez Pardina PE, Bejerman N, Luque AV, Di Feo L. 2012. Complete nucleotide sequence of an
Argentinean isolate of sweet potato virus G. Virus Genes 45:593–95
82. Romanovskaya A, Sarin LP, Bamford DH, Poranen MM. 2013. High-throughput purification of double-
stranded RNA molecules using convective interaction media monolithic anion exchange columns.
J. Chromatogr. A 1278:54–60
83. Roossinck MJ. 2011. The big unknown: plant virus biodiversity. Curr. Opin. Virol. 1:63–67
84. Roossinck MJ. 2012. Plant virus metagenomics: biodiversity and ecology. Annu. Rev. Genet. 46:359–69
85. Roossinck MJ, Saha P, Wiley GB, Quan J, White JD, et al. 2010. Ecogenomics: using massively parallel
pyrosequencing to understand virus ecology. Mol. Ecol. 19(Suppl. 1):81–88
86. Roy A, Choudhary N, Guillermo LM, Shao J, Govindarajulu A, et al. 2013. A novel virus of the genus
Cilevirus causing symptoms similar to citrus leprosis. Phytopathology 103:488–500
87. Saqib M, Wylie SJ, Jones MGK. 2014. Serendipitous identification of a new Iflavirus-like virus infecting
tomato and its subsequent characterization. Plant Pathol. 64:519–24
88. Schadt EE, Turner S, Kasarskis A. 2010. A window into third-generation sequencing. Hum. Mol. Genet.
19:R227–40
89. Schowalter RM, Pastrana DV, Pumphrey KA, Moyer AL, Buck CB. 2010. Merkel cell polyomavirus
and two previously unknown polyomaviruses are chronically shed from human skin. Cell Host Microbe
7:509–15
90. Schulz MH, Zerbino DR, Vingron M, Birney E. 2012. Oases: robust de novo RNA-seq assembly across
the dynamic range of expression levels. Bioinformatics 28:1086–92
91. Seguin J, Rajeswaran R, Malpica-Lopez N, Martin RR, Kasschau K, et al. 2014. De novo reconstruction
of consensus master genomes of plant RNA and DNA viruses from siRNAs. PLOS ONE 9:e88513
92. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, et al. 2005. Accurate multiplex polony
sequencing of an evolved bacterial genome. Science 309:1728–32
93. Sheveleva A, Kudryavtseva A, Speranskaya A, Belenikin M, Melnikova N, Chirkov S. 2013. Complete
genome sequence of a novel Plum pox virus strain W isolate determined by 454 pyrosequencing. Virus
Genes 47:385–88
94. van der Meijden E, Janssens RW, Lauber C, Bouwes Bavinck JN, Gorbalenya AE, Feltkamp MC. 2010.
Discovery of a new human polyomavirus associated with Trichodysplasia spinulosa in an immunocompro-
mized patient. PLOS Pathog. 6:e1001024
95. Verbeek M, Dullemans AM, van Raaij HMG, Verhoeven JTJ, van der Vlugt RAA. 2014. Lettuce necrotic
leaf curl virus, a new plant virus infecting lettuce and a proposed member of the genus Torradovirus. Arch.
Virol. 159:801–5
96. Victoria JG, Kapoor A, Li L, Blinkova O, Slikas B, et al. 2009. Metagenomic analyses of viruses in stool
samples from children with acute flaccid paralysis. J. Virol. 83:4642–51
97. Vives MC, Velazquez K, Pina JA, Moreno P, Guerri J, Navarro L. 2013. Identification of a new En-
amovirus associated with citrus vein enation disease by deep sequencing of small RNAs. Phytopathology
103:1077–86
98. Wang YL, Cheng XF, Wu XX, Wang AM, Wu XY. 2014. Characterization of complete genome and
small RNA profile of pagoda yellow mosaic associated virus, a novel badnavirus in China. Virus Res.
188:103–8
99. Widana Gamage S, Persley DM, Higgins CM, Dietzgen RG. 2015. First complete genome sequence of
a capsicum chlorosis tospovirus isolate from Australia with an unusually large S RNA intergenic region.
Arch. Virol. 160:869–72
100. Willner D, Furlan M, Schmieder R, Grasis JA, Pride DT, et al. 2011. Metagenomic detection of phage-
encoded platelet-binding factors in the human oral cavity. Proc. Natl. Acad. Sci. USA 108(Suppl. 1):4547–
53
101. Wu Q, Luo Y, Lu R, Lau N, Lai EC, et al. 2010. Virus discovery by deep sequencing and assembly of
virus-derived small silencing RNAs. Proc. Natl. Acad. Sci. USA 107:1606–11
157:1471–80
107. Wylie SJ, Li H, Dixon KW, Richards H, Jones MGK. 2013. Exotic and indigenous viruses infect wild
populations and captive collections of temperate terrestrial orchids (Diuris species) in Australia. Virus
Res. 171:22–32
108. Wylie SJ, Li H, Liu J, Jones MGK. 2014. First report of Narcissus mosaic virus from Australia and from
Iris. Australas. Plant Dis. Notes 9:1–2
109. Wylie SJ, Li H, Jones MGK. 2012. First report of an isolate of Japanese iris necrotic ring virus from
Australia. Australas. Plant Dis. Notes 7:107–10
110. Wylie SJ, Li H, Jones MGK. 2013. Donkey orchid symptomless virus: a viral ‘platypus’ from Australian
terrestrial orchids. PLOS ONE 8:e79587
111. Wylie SJ, Li H, Jones MGK. 2014. Yellow tailflower mild mottle virus: a new tobamovirus described
from Anthocercis littorea (Solanaceae) in Western Australia. Arch. Virol. 159:791–95
112. Wylie SJ, Li H, Saqib M, Jones MGK. 2014. The global trade in fresh produce and the vagility of plant
viruses: a case study in garlic. PLOS ONE 9:e105044
113. Wylie SJ, Tan AJ, Li H, Dixon KW, Jones MGK. 2012. Caladenia virus A, an unusual new member of
the family Potyviridae from terrestrial orchids in Western Australia. Arch. Virol. 157:2447–52
114. Xie G, Yu J, Duan Z. 2013. New strategy for virus discovery: viruses identified in human feces in the
last decade. Sci. China. Life Sci. 56:688–96
115. Zablocki O, Pietersen G. 2014. Characterization of a novel citrus tristeza virus genotype within three
cross-protecting source GFMS12 sub-isolates in South Africa by means of Illumina sequencing. Arch.
Virol. 159:2133–39
116. Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs.
Genome Res. 18:821–29
117. Zhang Y, Singh K, Kaur R, Qiu W. 2011. Association of a novel DNA virus with the grapevine vein-
clearing and vine decline syndrome. Phytopathology 101:1081–90
118. Zhang YL, Yu NT, Huang QX, Yin GH, Guo AP, et al. 2014. Complete genome of Hainan papaya
ringspot virus using small RNA deep sequencing. Virus Genes 48:502–8
119. Zhang ZX, Qi SS, Tang N, Zhang XX, Chen SS, et al. 2014. Discovery of replicating circular RNAs by
RNA-Seq and computational algorithms. PLOS Pathog. 10:e1004553