RNA recognition by double-stranded RNA binding domains: a matter of shape and sequence
Abstract
The double-stranded RNA binding domain (dsRBD) is a small protein domain of 65–70 amino acids adopting an αβββα fold, whose central property is to bind to double-stranded RNA (dsRNA). This domain is present in proteins implicated in many aspects of cellular life, including antiviral response, RNA editing, RNA processing, RNA transport and, last but not least, RNA silencing. Even though proteins containing dsRBDs can bind to very specific dsRNA targets in vivo, the binding of dsRBDs to dsRNA is commonly believed to be shape-dependent rather than sequence-specific. Interestingly, recent structural information on dsRNA recognition by dsRBDs opens the possibility that this domain performs a direct readout of RNA sequence in the minor groove, allowing a global reconsideration of the principles describing dsRNA recognition by dsRBDs. We review in this article the current structural and molecular knowledge on dsRBDs, emphasizing the intricate relationship between the amino acid sequence, the structure of the domain and its RNA recognition capacity. We especially focus on the molecular determinants of dsRNA recognition and describe how sequence discrimination can be achieved by this type of domain.
Introduction
Double-stranded RNA (dsRNA) is present in many biological processes. For example, many viruses carry their genetic information in the form of dsRNA, which may also be generated by the replication of single-stranded RNA viruses. Moreover, non-pathogenic cellular pathways also constitute a massive source of dsRNA within the cell. Indeed, dsRNA elements can be formed by base pairing of complementary sequences within primary RNA transcripts. These dsRNA modules appear, for example, in untranslated regions of mRNAs [1, 2], in mRNA precursors prior to intron removal where intronic sequences are found to base pair with complementary flanking exonic sequences [3], in stable RNAs and their precursors such as ribosomal RNAs and transfer RNAs [4], and, last but not least, in small RNA precursors such as short-interfering RNA (siRNA) and micro-RNA (miRNA) precursors [5–7]. Regarding the broad and different origins of dsRNA in the cell, dsRNA recognition plays critical roles in many cellular processes as diverse as response to viral infection, gene silencing through RNA interference pathways, regulation of translation, RNA processing, mRNA editing, export and localization (Table 1) [8–10].
Table 1
Protein name | Principal functions |
---|---|
RNase III family | dsRNA specific endoribonucleases |
Bacterial RNase III | Processing of rRNA and tRNA |
Yeast Rnt1p | Processing of rRNA |
Dicer | Processing of siRNA and miRNA |
Drosha | Processing of rRNA and miRNA |
ADAR family | A-to-I editing of viral and cellular RNAs |
Staufen | Localization of mRNAs |
PKR | Serine/threonine kinase; anti-viral defense; cellular stress response; control of protein synthesis |
TRBP/PACT | Modulator of PKR/RNA interference |
TRBP | Inhibition of PKR; translation activation; component in RNAi pathway |
PACT | Activation of PKR |
HYL1/HEN1 | Maturation of miRNA in plants |
In all these different cellular processes, dsRNA recognition is achieved by a superfamily of proteins having at least one double-stranded RNA binding domain (dsRBD; also referred to as dsRBM for double-stranded RNA binding motif). The dsRBD is one of the most abundant RNA binding domains after the RNA recognition motif (RRM), which is a well-characterized single-stranded RNA binding domain [11], and zinc fingers, which are best known for their ability to bind to DNA but which may also interact with dsRNA [12–14]. Note that the dsRBD is not the only protein domain known to bind to dsRNA modules. Indeed, protein domains such as zinc fingers [12–14], SAM domains [15, 16], Z-DNA/Z-RNA binding domains [17–19], or even RRM domains [20–23], are also known to recognize RNA modules containing dsRNA fragments, such as regular dsRNA fragments, left-handed Z-RNA helices or short RNA hairpins. This review concentrates on dsRBDs, and readers interested in how other types of domains interact with particular dsRNA modules may refer to the existing literature on those domains (see references above).
Regarding the geometry of an A-form RNA helix and the nature of the chemical groups accessible to proteins in dsRNA [24–26], binding of dsRBDs to their dsRNA targets is commonly believed to be shape-dependent rather than sequence-specific [27–29]. However, in vivo, proteins containing dsRBDs bind to specific dsRNA targets [2, 30] and, despite a widespread significance in diverse cellular functions, our current understanding of how dsRBDs recognize these specific dsRNA targets remains imperfect [10, 31]. Nevertheless, the recent increase of structural information on dsRNA recognition by dsRBDs allows a global reconsideration of the precepts describing dsRNA recognition by this class of domain.
We review here the current structural and molecular knowledge on the double-stranded RNA binding domain. As more comprehensive reviews on the function of dsRBD-containing proteins are available elsewhere [8–10, 32], we focus in this article on the structural features of the dsRBD, the molecular determinants of dsRNA recognition and especially on how sequence discrimination can be achieved by this type of domain.
Structural characteristics of a dsRNA binding domain
The dsRBD is a domain of ~65–70 amino acids found in eukaryotic, prokaryotic and even viral proteins. This domain, which specifically interacts with dsRNA (as compared with dsDNA or single-stranded nucleic acid molecules), was first recognized as a conserved functional domain in 1992 from sequence similarity between three different proteins: Staufen, a protein responsible for mRNA localization in Drosophila; human TAR-RNA binding protein (TRBP), a multi-functional protein first characterized as an activator of HIV-1 gene expression; and Xenopus laevis RNA-binding protein A (Xlrbpa), a homolog of human TRBP [33]. In the same study, database searches enabled the identification of several other proteins containing one copy of the domain. At the same time, different domains were characterized as dsRNA binding domains in proteins such as PKR, a dsRNA-dependent protein kinase [34–36] and human TRBP [1]. A sequence alignment of dsRBDs of diverse origins is presented in Fig. 1. The sequence consensus is drawn below the alignment. Even if amino acids are conserved along the entire domain, the last third at the C-terminal end is the most conserved part of the dsRBD. The first two-thirds from the N-terminus are more divergent, and some domains differing notably from the consensus in this part have been referred to as type B dsRBDs and have been shown to have rather low binding activity for dsRNA [33, 37].
The first three-dimensional structures of dsRBDs, namely the E. coli RNase III dsRBD and the third dsRBD of Drosophila Staufen, were solved by solution NMR, uncovering a mixed α/β fold with a conserved αβββα topology in which the two α-helices are packed against a three-stranded anti-parallel β-sheet (Fig. 2a–e) [38, 39]. At the date of this review, the structures of about 30 dsRBDs have been determined by both NMR and X-ray crystallography (Table 2) [38–62]. These structures confirmed the conserved topology and the characteristic features of the dsRBD fold and uncovered the existence of some modest variations and some striking extensions to the canonical fold (see below).
Table 2
Protein name | PDB accession code (a) | References (b) |
---|---|---|
RNase III family | ||
E. coli RNase III | ND | [38] |
S. cerevisiae Rnt1p | 1T4O, 1T4N, 1T4L, 2LBS | [40–42] |
A. aeolicus RNase III | 1RC7, 1YZ9, 1YYK, 1YYO, 1YYW, 2EZ6, 2NUG, 2NUF, 2NUE | [43–46] |
T. maritima RNase III | 1O0W | JCSG |
M. musculus Dicer | 3C4B, 3C4T | [47] |
S. pombe Dcr1 | 2L6M | [48] |
K. polysporus Dcr1 | 3RV0 | [49] |
H. sapiens Drosha | 2KHX | [50] |
ADAR family | ||
R. norvegicus ADAR2 | 2B7T, 2B7V, 2L3C, 2L2K, 2L3J | [51, 52] |
D. melanogaster ADAR | 2LJH | [53] |
Staufen | ||
D. melanogaster Staufen | 1STU, 1EKZ | [39, 54] |
M. musculus Stau2 | 1UHZ | RIKEN |
TRBP/PACT | ||
X. laevis RBPA | 1DI2 | [55] |
H. sapiens TRBP | 3ADL, 3LLH, 2CPN | [56, 57] |
H. sapiens PACT | 2DIX | RIKEN |
HYL1/HEN1 | ||
A. thaliana HYL1 | 3ADI, 3ADJ, 3ADG, 2L2N, 2L2M | [56, 58] |
A. thaliana HEN1 | 3HTX | [59] |
DGCR8 | ||
H. sapiens DGCR8 | 2YT4, 1X47 | [60] and RIKEN |
PKR | ||
H. sapiens PKR | 1QU6 | [61] |
M. musculus PKR | 1X48, 1X49 | RIKEN |
ILF3/SPNR | ||
H. sapiens ILF3 | 3P1X, 2L33 | NESGC |
H. sapiens SPNR | 2DMY | RIKEN |
RHA | ||
M. musculus RHA | 2RS6, 2RS7, 1UIL, 1WHQ | [62] and RIKEN |
(a) PDB accession codes are given, except for the E. coli RNase III structure, for which no coordinates have been deposited (ND, not deposited)
(b) Primary references related to each structure are given. For structures solved by structural genomics centres and not associated with a publication, the name of the structural genomics centre is given
The structure of the second dsRBD of Xlrbpa constitutes the archetype of a canonical dsRBD domain [55]. This canonical dsRBD structure is represented in Fig. 2a, b. The conserved residues matching the sequence consensus (>40 %) of the dsRBD sequence alignment of Fig. 1 are labelled and shown as sticks on the structure. These residues appear to be conserved on the one hand for maintaining a stable hydrophobic core with the two α-helices packed on the β-sheet surface (Fig. 2c) and on the other hand for optimal dsRNA binding (Fig. 2d):
(1) Conserved hydrophobic residues come from almost all secondary structure elements, namely α1, β1, β2 and α2 (Figs. 1, 2a–c). Aliphatic side chains are mostly found in the two helices whereas aromatic rings are predominantly found in the β-strands. To be precise, aliphatic side chains include L6 and L9 in helix α1 (residue numbering refers to the alignment of Fig. 1), V39 and V41 in β2 and A58, A62, A63, A66, L67 and L70 in helix α2. Note that the small side chains of A58 and A62 in helix α2 together with the conserved GxG motif at the end of the β3 strand allow a tight packing in this region with helix α2 coming in remarkably close proximity with this part of the β-sheet, with, for instance, Cα carbons of G50 and A62 being less than 4 Å apart. In addition to the conserved aliphatic residues, two aromatic side chains are almost absolutely conserved in dsRBDs, namely Y21 in the β1 strand and F35 in the β2 strand (Figs. 1, 2a–c). These aromatic residues have been shown to be indirectly involved in RNA binding by maintaining key positively charged residues in an optimal orientation for dsRNA binding [37, 39, 54, 55] (see below). Indeed, mutations of these aromatic residues were reported to completely abolish dsRNA binding [37, 39, 54]. However, given the position of those aromatic rings, at the periphery of the dsRBD hydrophobic core, they most probably also contribute to the stability of the overall domain.
(2) Systematic mutational analysis conducted in different dsRBDs uncovered three different regions important for dsRNA interaction [36, 39, 54, 63–65]. These three regions are shown in Fig. 1. Region 1 is located in helix α1, region 2 in the loop joining the β1 and β2 strands, and region 3 at the N-terminal tip of helix α2. In these three regions, conserved residues in the sequence consensus can therefore be explained by their involvement in dsRNA binding. To be precise, the side chain of E8 in helix α1, the GPxH motif in the β1–β2 loop and the positively charged residues in the KKxAK motif at the beginning of helix α2 are highly conserved in dsRBDs as a consequence of their participation in dsRNA binding. Remarkably, the orientation of the side chains of K55 and K59 in the free protein is pre-organized for RNA binding [53, 55]. The orientation of both side chains is stabilized by an extensive set of van der Waals interactions (Fig. 2d). Namely, the first lysine interacts with three residues from the β-sheet whose side chains point down inside the hydrophobic core of the dsRBD: a residue in strand β1 (position 23), which is often a leucine or a valine, the conserved phenylalanine in strand β2 (position 35), and another less conserved residue in strand β2 (position 37). The third lysine interacts with a hydrophobic residue from helix α1 (position 3), the aromatic ring of the conserved tyrosine in strand β1 (position 21) and with the hydrophobic residue from strand β2 already mentioned (position 37). The third residue of the KKxAK motif is not conserved. It is exposed at the surface of the dsRBD and does not make any significant interaction with other parts of the domain. The conserved alanine of the motif is packed against the β-sheet surface. A likely explanation for the conservation of this alanine is that a bulky side chain at this position would cause steric clashes that would destabilize the β-sheet.
In the third dsRBD of Staufen, this alanine is replaced by a serine without causing any major rearrangements [41, 54]. Note that molecular details of dsRNA recognition by these three conserved regions of the dsRBD will be precisely described in later paragraphs of this review.
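As a rough illustration of how the consensus motifs named above (GPxH, GxG and KKxAK) could be located in a protein sequence, the following minimal Python sketch scans a sequence with regular expressions. The toy sequence and the function name `find_motifs` are hypothetical and introduced only for this example; real dsRBDs often deviate from the consensus, so a motif scan of this kind is not a substitute for a proper sequence alignment.

```python
import re

# Consensus motifs named in the text; "x" (any residue) becomes "." in the regex.
MOTIFS = {
    "GPxH": r"GP.H",    # beta1-beta2 loop (region 2)
    "GxG": r"G.G",      # end of strand beta3
    "KKxAK": r"KK.AK",  # N-terminal tip of helix alpha2 (region 3)
}

def find_motifs(sequence):
    """Return {motif name: [0-based start indices]} for every motif occurrence."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
    return hits

# Hypothetical toy sequence for illustration only; it is NOT a real dsRBD.
toy = "MLPVQLLNELAQKGPAHSRYEVLSGEGPKKVAKQAAAEAALK"

for motif, starts in find_motifs(toy).items():
    print(motif, starts)
```

Because short patterns such as GxG occur by chance in many proteins, hits from a scan like this would need to be checked against the expected order of the motifs along the domain.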
Some variations and extensions to the canonical dsRBD fold
Although dsRBD structures are overall well conserved, some variations are found, especially in two regions of the domain. The first region of variability consists of the loop between helix α1 and the β1 strand. This appears clearly in the alignment of Fig. 1, where gaps are frequently needed to maintain a proper alignment in this region. In addition, helix α1 can sometimes be shorter, as in yeast RNase III, mammalian ADAR2 and Drosophila ADAR dsRBDs (Figs. 1, 3b, c). Yet no particular properties in terms of RNA binding or other function have been assigned to such dsRBDs containing a shorter helix α1 [40, 51]. The second region of variability consists of the loop connecting the β1 to the β2 strand (loop 2). This loop, which is typically 6 amino acids in length, can accommodate long insertions, as in the case of S. pombe Dcr1 and A. thaliana HEN1 (Fig. 3e, f) [48, 59]. In the case of S. pombe Dcr1, this longer loop 2 has been shown to be structurally heterogeneous in the free form of the protein, and a deletion mutant with a shortened loop 2 failed to bind to dsRNA [48]. In the crystal structure of A. thaliana HEN1, conserved hydrophobic residues in this long loop 2 were found to interact with other conserved residues of the methylase catalytic domain [59]. This could suggest that long β1–β2 loops have a dual role in both dsRNA binding and protein binding and could therefore serve to orient catalytic or auxiliary domains with respect to the dsRNA substrate. This is an attractive possibility that would require further investigation.
In addition to these modest variations around the canonical dsRBD fold, some remarkable extensions have also been observed. So far, all of them are found as C-terminal extensions, thus occurring after helix α2 (Fig. 3c–e). The first C-terminal extension was described in the yeast RNase III (Rnt1p) and consists of a long α-helix (helix α3) that folds back after helix α2 in the direction of helix α1 and the loop joining helix α1 with the β1 strand (Fig. 3c) [40, 41]. The additional helix α3 has been shown to be important for the overall stability of the domain and to play a critical role in RNA binding. Even if helix α3 is not in direct contact with RNA substrates, it has been proposed to contribute indirectly to RNA binding by helping to position helix α1, which is the primary determinant of RNA recognition by Rnt1p [40, 41]. In addition, it has been noticed that helix α3 would clash with the C-terminal end of helix α1 if the latter were not shorter than in canonical dsRBDs (Fig. 3a, c) [40]. Although shorter, a similar helix α3 was found in the structure of K. polysporus Dcr1 (Fig. 3d) [49]. Thus far, no particular role has been associated with helix α3 in K. polysporus Dcr1. Indubitably, the most striking C-terminal extension found in dsRBDs resides in the C-terminal dsRBD of the fission yeast dicer (Schizosaccharomyces pombe Dcr1) [48]. Indeed, the C-terminal extension is composed of a short α-helical turn (helix α3) followed by a zinc-coordination site (Fig. 3e). The short helix α3 is clearly reminiscent of S. cerevisiae Rnt1p and K. polysporus Dcr1, but the CHCC zinc-binding motif has no equivalent in any other dsRBD structure. What is especially remarkable is that not all four ligands composing this zinc-binding site (three cysteines and one histidine, CHCC) are part of the C-terminal extension. Only the last two ligands (chCC) are found within the extension, whereas the first two ligands (CHcc) are part of the dsRBD fold itself.
These first two ligands are located in loop 1 between α1 and β1 and in loop 3 between β2 and β3, respectively. This C-terminal extension has been shown to play a critical role in the sub-cellular localization of S. pombe Dcr1 [48, 66]. Zinc coordination is indeed required for the proper folding of the domain and contributes to the formation of a protein–protein interaction surface that mediates nuclear localization of Dcr1. The C-terminal extension would allow the attachment of Dcr1 to a nuclear protein, resulting in nuclear accumulation of Dcr1 [48, 67].
Intriguingly, other dsRBDs have been reported to mediate nucleocytoplasmic trafficking, promoting either nuclear import or nuclear export [68–72]. However, the exact mechanisms controlling the nucleocytoplasmic distribution of these dsRBD-containing proteins are poorly understood. Structural information will probably be needed to unravel the molecular basis of these mechanisms, and particularly to determine whether extensions to the canonical dsRBD fold are also present in these dsRBDs and are likewise involved in regulating the sub-cellular localization of these proteins.
Interestingly, several other dsRBDs have been reported to mediate interaction with protein domains. These interactions commonly involve other dsRBDs, but different types of domains have also been reported to mediate these dsRBD–protein interactions [59, 73–81]. To date, structural information describing how dsRBDs interact with other protein partners is very limited. However, the crystal structure of the two tandem dsRBDs of DGCR8, which extensively interact with each other, might give an insight into dsRBD–dsRBD associations [60]. To date, the structure of DGCR8 dsRBDs is unique, as, in the available solution structures of tandem dsRBDs, namely PKR and ADAR2, the linker between the two dsRBDs is flexible, leading to independent dsRBDs [51, 52, 61]. Additional structural studies of proteins containing multiple dsRBDs would be critical to a better understanding of how dsRBDs might associate with each other and how they collaborate for dsRNA binding, in terms of affinity, cooperativity and specificity.
dsRNA recognition by dsRNA binding domains
Structural characteristics of the A-form RNA helix
The most salient property of dsRBDs is their ability to discriminate among the structural and chemical variety of nucleic acid polymers to preferentially bind to dsRNA. The dsRNA helix structure has been extensively studied and the interested reader is referred to other reviews for further study [24, 82]. Nevertheless, we will point out a few aspects of dsRNA structure relevant to the specific recognition of dsRNA by dsRBDs. The conformation adopted in solution by dsRNA is the so-called A-form RNA double helix. The morphology of this helix is characterized by a wide and shallow minor groove where the edge of the bases is readily accessible and a deep and narrow major groove where access to the bases is hindered (Fig. 4a). The other distinctive characteristic of dsRNA is the presence of 2′-OH functional groups on the ribose sugars, lining up in the minor groove, which are absent in the DNA double helix. dsRBDs discriminate between dsDNA and dsRNA by probing chemical features (presence vs. absence of ribose 2′-OH groups) and structural features (the width of the minor and major grooves). Because the two grooves differ in accessibility, contacts between RNA bases and dsRBDs occur in the minor groove. Figure 4c, d shows the hydrogen bond donor/acceptor groups exposed in the minor groove that can be probed by dsRBDs to distinguish an A–U base pair from a G–C base pair. As can be seen, the only pattern difference that can be exploited to discriminate A–U from G–C base pairs is the presence of one hydrogen bond donor in G–C base pairs (Fig. 4c, d). In the next part of this review, we describe in detail the interactions observed in the high resolution structures of dsRBD–RNA complexes solved in the past 15 years (Table 3) that gave us insights into the molecular basis of dsRNA recognition. We will then discuss the sequence-specific contacts observed in three recent high resolution structures of dsRBD–RNA complexes, which gave new insights into the mechanisms of specific RNA target recognition by dsRBDs.
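The minor-groove argument above can be made concrete with a small Python sketch. The table of exposed groups is filled in from general nucleic acid chemistry rather than from Fig. 4 itself (purine N3 and pyrimidine O2 as acceptors, the guanine N2 amino group as a donor), so it should be read as an assumption-laden summary; the names `MINOR_GROOVE` and `pattern` are hypothetical.

```python
# Hydrogen-bonding groups each base exposes in the minor groove
# (assumed from standard base chemistry, not transcribed from Fig. 4).
MINOR_GROOVE = {
    "A": [("N3", "acceptor")],
    "G": [("N3", "acceptor"), ("N2-H", "donor")],
    "C": [("O2", "acceptor")],
    "U": [("O2", "acceptor")],
}

def pattern(pair):
    """Count donors and acceptors exposed in the minor groove by a base pair such as 'AU'."""
    kinds = [kind for base in pair for _, kind in MINOR_GROOVE[base]]
    return {"donor": kinds.count("donor"), "acceptor": kinds.count("acceptor")}

for pair in ("AU", "GC"):
    print(pair, pattern(pair))
```

Under these assumptions both pair types expose two acceptors, and the G–C pair differs only by the single donor contributed by guanine, which is the one feature a protein side chain could read in the minor groove.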
Table 3
Protein name | PDB code | Method | RNA substrates | References |
---|---|---|---|---|
RNase III family | ||||
S. cerevisiae Rnt1p | 1T4L, 2LBS | NMR | Single hairpin (a) | [41, 42] |
A. aeolicus RNase III | 1RC7 | X-ray | Coaxially stacked duplexes (b) | [43] |
A. aeolicus RNase III | 1YZ9 | X-ray | Coaxially stacked duplexes (c) | [44] |
A. aeolicus RNase III | 1YYW | X-ray | Coaxially stacked duplexes (d) | [44] |
A. aeolicus RNase III | 1YYO, 1YYK | X-ray | Coaxially stacked duplexes (e) | [44] |
A. aeolicus RNase III | 2EZ6 | X-ray | Coaxially stacked hairpins (f) | [45] |
A. aeolicus RNase III | 2NUF | X-ray | Coaxially stacked hairpins (f) | [46] |
A. aeolicus RNase III | 2NUE | X-ray | Single hairpin (f) | [46] |
A. aeolicus RNase III | 2NUG | X-ray | Coaxially stacked duplexes (f) | [46] |
ADAR family | ||||
R. norvegicus ADAR2 | 2L3C | NMR | Single hairpin (g) | [52] |
R. norvegicus ADAR2 | 2L2K | NMR | Single hairpin (g) | [52] |
R. norvegicus ADAR2 | 2L3J | NMR | Single hairpin (g) | [52] |
Staufen | ||||
D. melanogaster Staufen | 1EKZ | NMR | Single hairpin (h) | [54] |
TRBP/PACT | ||||
X. laevis RBPA | 1DI2 | X-ray | Coaxially stacked duplexes (b) | [55] |
H. sapiens TRBP | 3ADL | X-ray | Coaxially stacked duplexes (i) | [56] |
HYL1/HEN1 | ||||
A. thaliana HYL1 | 3ADI | X-ray | Coaxially stacked duplexes (j) | [56] |
A. thaliana HEN1 | 3HTX | X-ray | Coaxially stacked duplexes (k) | [59] |
(a) Small nucleolar RNA snR47 capped by an AGAA (1T4L) or AAGU (2LBS) tetraloop
(b) Non-natural substrate [GGCGCGCGCC]2
(c) Non-natural substrate [CGAACUUCGCG]2
(d) Non-natural substrate [AAAUAUAUAUUU]2
(e) Non-natural substrate [CGCGAAUUCGCG]2
(f) Derived from R1.1, a canonical substrate of E. coli RNase III [109]
(g) RNA hairpins derived from the R/G editing site of mammalian GluR2
(h) Non-natural substrate (the tetraloop is shown in brackets) GGACAGCUGUCCC[UUCG]GGGACAGCUGUCC
(i) Non-natural substrate [CGCGCGCGCG]2
(j) Non-natural substrate [CUCGAUAACC]/[GGUUAUCGAG]
(k) Derived from miR173/miR173*, a natural substrate of HEN1
High resolution dsRBD–RNA structures
At the date of writing, high resolution structures of nine different dsRBDs in complex with various types of RNA partners have been determined (Table 3) [41–46, 52, 54–56, 59]. The proteins containing these dsRBDs are involved in a broad range of biological processes including RNA editing (ADAR2), RNA silencing (TRBP [81, 83] and HYPONASTIC LEAVES1 [84]), ribosomal RNA processing (Rnt1p and Aquifex aeolicus RNase III) and mRNA localization (Drosophila Staufen). The RNA molecules used in these structure determinations are of diverse types (RNA duplexes or RNA hairpins), origin (synthetic or natural target), size, and nucleotide sequence. For instance, Aa RNase III has been crystallized with five different types of RNA including coaxially stacked poly GC duplexes of synthetic origin, as well as RNA hairpins deriving from canonical substrates (see Table 3 and references for more details). This dataset of structures also provides interesting examples of specific RNA secondary structure recognition in four structures of dsRBDs in complex with RNA stem–loops (Staufen dsRBD3, Rnt1p dsRBD, ADAR2 dsRBD1). Interestingly, RNA sequence-specific contacts are observed in several structures. This raises the possibility that direct readout of the nucleotide sequence in the RNA minor groove modulates RNA recognition by dsRBDs. The majority of the structures of the dataset were determined by X-ray crystallography (13 structures) and 5 structures were solved by NMR spectroscopy (Table 3). In the next part of this paper, we present what we learnt from this set of structural data concerning two aspects of RNA recognition by dsRBDs: shape recognition (A-form RNA helix, hairpin apical loops) and direct readout of the nucleotide sequence.
Description of the dsRBD–RNA binding interface
Global description of dsRBD–RNA interface
High resolution structures of dsRBDs in complex with dsRNA gave a precise description of the RNA binding surface and revealed how different parts of the domain combine to form a surface ‘shaped’ to recognize specifically the A-form helix conformation adopted by dsRNA. dsRBDs achieve specific dsRNA recognition by making contacts to bases and ribose moieties located at two successive minor grooves and by contacting the phosphate backbone delimiting the intervening major groove (Fig. 5a). As mentioned previously, three distinct regions of the protein participate in the recognition of dsRNA: (1) the N-terminal tip of helix α2, (2) helix α1, located at the N-terminus of the domain, and (3) the loop connecting the first and the second β strands (loop 2) (Figs. 1, 5a) [55]. The molecular basis of this recognition is described in detail below. All the positions mentioned in the text refer to the sequence alignment shown in Fig. 1.
Recognition of the dsRNA major groove by the N-terminal tip of helix α2 (region 3)
The N-terminal tip of helix α2 is part of the canonical dsRNA binding surface and was originally designated ‘region 3’ [55] (Fig. 5a, b). The amino acid sequence corresponding to this region consists of the well-conserved KKxAK motif, which is indeed part of the dsRBD consensus sequence originally identified [33] (Figs. 1, 5a). In the canonical RNA binding mode observed in the high resolution structures of ADAR2 dsRBD1 and 2, TRBP dsRBD2, Xlrbpa, and Aa RNase III, this surface contacts the phosphodiester backbone of both RNA strands across the major groove of the helix (Fig. 5a, b). The side chains of the first and the third lysines of the motif point toward one of the RNA strands while the second lysine points toward the other strand, forming an arch spanning the width of the major groove (see K55, K56 and K59 in Figs. 5a, 6c). At the atomic level, the amino group of each lysine contacts a non-bridging oxygen atom of the RNA backbone. In addition, the amide proton of the first lysine (K55) also forms a direct hydrogen bond with a non-bridging oxygen atom of the RNA backbone (Figs. 5a, 6a, c). This remarkable set of interactions results from a particular orientation of the side chains of these three lysines, which is stabilized by: (1) the rigidity of the peptide backbone, which is embedded into an α-helical secondary structure (namely, the N-terminal tip of helix α2), and (2) an important set of van der Waals interactions between the side chains of the first and the third lysines (K55 and K59) and other side chains from the hydrophobic core of the domain (Fig. 6c, see also the section concerning the structural characteristics of the dsRBD fold). It is noteworthy that this motif is not absolutely conserved, the most frequent variations being the substitution of one or more lysines by an arginine or a glutamine residue (e.g. Xlrbpa dsRBD2, HsPKR dsRBD2) (Fig. 1). These kinds of variations are not expected to significantly affect the way region 3 interacts with dsRNA, as can be seen in the structure of Xlrbpa dsRBD2 [55]. In several cases, the third lysine is more drastically replaced by a negatively charged residue or a glycine (Fig. 1). In all these cases, this substitution is compensated by the presence of a lysine in helix α1 (Figs. 5a, b, 6a, c) as discussed in detail in the next paragraph.
Recognition of the dsRNA minor groove by helix α1 (region 1)
Helix α1 is part of the canonical dsRNA binding surface and was originally designated ‘region 1’ [55] (Fig. 5a, b). It is an amphipathic α-helix with a hydrophobic surface involved in the constitution of the hydrophobic core of the dsRBD, and a solvent-exposed hydrophilic surface making extensive contacts with bases and riboses in the minor groove of the dsRNA. Analysis of the structures of dsRBD–dsRNA complexes (Table 3) shows that the interaction surface of helix α1 with RNA is constituted by four to five amino acids always at positions 3, 4, 7, 8 and 11 of the sequence alignment shown in Fig. 1. The position of these residues at the surface of helix α1 is schematically shown in Fig. 7. It is noteworthy that, among these residues, only one is part of the dsRBD consensus sequence (position 8 in Fig. 1). As has already been noted, the relative lack of conservation of these exposed side chains could be one of the factors modulating the RNA binding properties of different dsRBDs [54]. The interactions involving these residues are described in the following paragraphs.
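The clustering of positions 3, 4, 7, 8 and 11 on one face of helix α1 follows from α-helical periodicity, and a short helical-wheel calculation makes this explicit. The sketch below assumes an ideal α-helix (3.6 residues per turn, i.e. 100° per residue) and assumes that the alignment positions correspond to consecutive residues of the helix; both are simplifications, and the function name `wheel_angle` is hypothetical.

```python
DEGREES_PER_RESIDUE = 100  # ideal alpha-helix: 360 degrees / 3.6 residues per turn

def wheel_angle(position, reference=3):
    """Angular position on a helical wheel, in degrees, relative to a reference residue."""
    return ((position - reference) * DEGREES_PER_RESIDUE) % 360

# RNA-contacting positions of helix alpha1 listed in the text.
rna_face = {p: wheel_angle(p) for p in (3, 4, 7, 8, 11)}
# Conserved aliphatic residues of helix alpha1 (L6 and L9) named earlier in the text.
core_face = {p: wheel_angle(p) for p in (6, 9)}

print("RNA-contacting positions:", rna_face)
print("Hydrophobic core positions:", core_face)
```

Under these assumptions the five RNA-contacting positions fall within an arc of 140° (at 0°, 100°, 40°, 140° and 80°), whereas L6 and L9 lie at 300° and 240°, outside that arc, in line with the amphipathic character of the helix.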
Residues found at position 3 (Figs. 1, 7) are almost invariably bulky and hydrophobic amino acids such as valine or isoleucine. As can be seen in the structures of Xlrbpa dsRBD2, ADAR2 dsRBD1 and 2, this side chain has three important roles. First, it makes van der Waals contacts to a ribose moiety of the RNA, contributing to helix α1 positioning in the minor groove (see Val 3 in Fig. 6c). Second, it interacts with the aliphatic portion of the third lysine of the KKxAK motif, forcing it to orient toward the RNA backbone (see Lys 59 in Fig. 6c). Third, it interacts with the well-conserved alanine located in helix α2 (position 63 in Fig. 1), participating in the formation of the hydrophobic core of the dsRBD fold. A striking exception is found in Aa RNase III dsRBD, where position 3 is occupied by a lysine. The high resolution structure of this dsRBD in complex with dsRNA reveals that the aliphatic moiety of the side chain of this lysine makes hydrophobic contacts with the core of the domain, whereas the amino group points towards the RNA backbone to contact the non-bridging oxygen atom that normally interacts with the last lysine of the canonical KKxAK motif (see Lys 3 in Figs. 5b, 6a). Interestingly, RNase III dsRBD exhibits a non-canonical KKxAE motif where the last lysine of the canonical KKxAK motif is replaced by a glutamate. In this case, the recognition of the RNA major groove is achieved by a bipartite motif where the first two lysines (Lys 55 and Lys 56) come from the N-terminal part of helix α2 as is usually the case, and the third lysine (Lys 3) comes from helix α1 (Figs. 1, 5, 6a). This situation occurs several times in the sequence alignment of Fig. 1: the loss of the last lysine of the KKxAK motif (which is replaced by a glutamate or a glycine) is rescued by the presence of a lysine in the N-terminal part of helix α1 (at position 3 in Fig. 1).
This functional complementation of the lysine at position 3 in helix α1 and an incomplete KKxAx motif is supported by structural data from the different Aa RNase III-RNA complexes. Additional structural and biochemical data would be needed to know whether or not this observation can be generalized.
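The complementation described above can be summarized as a simple rule, sketched here in Python. This is only an illustration of the logic stated in the text, not a validated predictor: as noted, more data would be needed to generalize the observation, and the sketch tracks lysines only (arginine or glutamine substitutions are ignored). The function name `backbone_lysines` and the example motif strings are hypothetical.

```python
def backbone_lysines(position3, region3_motif):
    """List the lysines expected to contact the backbone across the major groove.

    position3     -- residue at alignment position 3 of helix alpha1 (one-letter code)
    region3_motif -- five residues at alignment positions 55-59 (KKxAK in the consensus)
    """
    if len(region3_motif) != 5:
        raise ValueError("region 3 motif must span positions 55-59 (five residues)")
    contacts = []
    # First, second and last residues of the motif map to positions 55, 56 and 59.
    for position, residue in ((55, region3_motif[0]),
                              (56, region3_motif[1]),
                              (59, region3_motif[4])):
        if residue == "K":
            contacts.append(f"K{position}")
    # Bipartite case: a lysine at position 3 stands in for a missing K59.
    if region3_motif[4] != "K" and position3 == "K":
        contacts.append("K3")
    return contacts

print(backbone_lysines("V", "KKVAK"))  # canonical case
print(backbone_lysines("K", "KKLAE"))  # KKxAE-type motif rescued by Lys 3
```

In the canonical case the three lysines all come from helix α2, while in the KKxAE-type case the third contact is supplied by position 3 of helix α1.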
Amino acids found at position 4 in helix α1 (Figs. 1, ,7a)7a) are Thr, Ser, Cys, Gly, Gln, Asn, Arg or Met. Except for Gly or Met, all these side chains can potentially be involved in hydrogen bond interactions. The structures of Aa RNase III in complex with dsRNA show that the hydroxyl group of the threonine is involved in a hydrogen bonds network with the O4′ atom and the 2′-OH group of two adjacent ribose sugars. In most structures, helix α1 comes in close contact with the RNA ribose sugars, with a distance Cα–2′OH of about 3.5 Å. In both dsRBDs of ADAR2, this position is occupied by a methionine residue whose side chain points towards the edge of the bases in the minor groove to make a sequence-specific interaction [52] (Figs. 6b, b,7c).7c). The impact of this interaction on RNA recognition will be discussed in detail later in the text.
Position 7 (Figs. 1, 7a) is occupied in most cases by either a glutamine or an asparagine residue (50 % in the alignment of Fig. 1). In several structures of Aa RNase III in complex with dsRNA, the side chain of the glutamine makes a bidentate hydrogen bond to the 2′-OH group of the ribose and to the base of the same nucleotide. The role of hydrogen bond acceptor can be fulfilled either by the O2 atom of a pyrimidine or the N3 atom of a purine (Fig. 4c, d, and see Gln 7 in Fig. 6a). Therefore, as noticed by Gan and co-workers [45], this interaction is not sequence-specific since all four types of bases can support it. The same interaction pattern is observed in the structures of TRBP dsRBD2 and Xlrbpa dsRBD2. However, this interaction is not present in all known structures of dsRBD–RNA complexes; in both dsRBDs of ADAR2, the position is occupied by an asparagine which does not make such contacts, possibly because its side chain is shorter than that of glutamine. In Staufen dsRBD3 and Rnt1p, position 7 is occupied by a histidine and a tyrosine, respectively. In the NMR structure of Staufen dsRBD3, the histidine side chain is too far from the RNA to interact, owing to the pronounced curvature of the RNA helix [54]. In Rnt1p, the tyrosine residue is stacked on a ribose sugar at the 3′ side of the apical loop of the RNA substrate [41, 42].
The amino acid at position 8 (Figs. 1, 6b, 7) is a remarkably well-conserved glutamate which is indeed part of the consensus sequence identified for dsRBDs [33]. It is therefore expected to be important for RNA binding, and actually, mutation of this residue has been shown to abolish RNA binding [54]. In the set of dsRBDs having the shorter version of helix α1 (e.g. ADAR2 dsRBDs and Rnt1p), this is the last residue of the helix which interacts with dsRNA. In ADAR2 dsRBD2, Xlrbpa dsRBD2, TRBP dsRBD2 and RNase III dsRBD, the carboxylic group of this glutamate makes a hydrogen bond with the 2′-OH of a ribose sugar of the RNA. In ADAR2 dsRBD1 and in Staufen dsRBD3, this glutamate interacts with a nucleotide located in the apical loop of the RNA stem–loop binding partner. Rnt1p is one of the few examples where this glutamate is not conserved. It is replaced by a serine which also interacts with a 2′-OH group in the apical loop.
Position 11 is present only in dsRBDs that have the longer version of helix α1 (Figs. 1, 6a; compare also Fig. 7c, d). It is well placed to contact the RNA minor groove. Hydrophobic residues (Val or Ile) or glutamine residues are frequently found at this position. In the second dsRBDs of TRBP and Xlrbpa, a valine is present while an isoleucine is found in the third dsRBD of Staufen. In these three structures, the hydrophobic side chain makes a van der Waals interaction with an RNA ribose. Aa RNase III provides an example where this position on helix α1 is occupied by a glutamine residue. Remarkably, the side chain of the glutamine points toward the minor groove and interacts with the edge of a nucleotide base. This interaction has been reported to be sequence-specific and will be discussed in detail later in the review.
Interactions from region 2 (loop 2) to the dsRNA minor groove
The last region of the dsRBDs composing the canonical RNA binding surface is the loop connecting the two strands β1 and β2 (also designated ‘region 2’ [55]). In the canonical mode of binding, this loop inserts into the RNA minor groove one helix turn away from the minor groove interacting with helix α1 (Figs. 5a, b, 6c, d). This loop has a well-defined length (6 residues, positions 28–33 in Fig. 1) and an amino acid composition matching the pattern GPxHxx (Fig. 1). A notable exception is found in the second dsRBD of A. thaliana HEN1 (Fig. 3f) where the loop differs both in size and amino acid composition. Interestingly, the crystal structure of HEN1 in complex with RNA [58] shows that the loop makes extensive contacts with other parts of the protein and interacts only marginally with the RNA. The RNA binding mode of loop 2, originally described in the structure of Xlrbpa dsRBD2 [55], relies on a small set of interactions which are conserved in the other high resolution structures (TRBP dsRBD2, ADAR2 dsRBDs and Aa RNase III). Namely, the carbonyl group of the peptide backbone of the conserved His 31 (Fig. 1) makes a direct hydrogen bond to a 2′-OH group on one strand, whereas the imidazole ring stacks on one ribose and makes a hydrogen bond to the 2′-OH of the previous ribose on the other strand (Fig. 6d). In many dsRBD–RNA structures, the peptide backbone carbonyl group of residue 30 makes a hydrogen bond with the amino group of a guanine. This contact is sequence-specific and will be discussed in a later paragraph. The side chain of residue 30 contacts the edge of a base in the minor groove and might also play a role in RNA recognition, although experimental evidence is still lacking here. For instance, in HYL1 dsRBD1, this position is occupied by a serine which makes a hydrogen bond with the N3 atom of a guanine, while in ADAR2 dsRBD1, this position is occupied by a valine, whose bulky side chain faces a guanine in the syn conformation (Fig. 6c).
The conservation of the loop 2 binding mode results from the particular conformation adopted by the peptide backbone. Although there is a certain degree of variability in loop 2 sequences (Fig. 1), a set of interactions organized around the highly conserved His 31 is almost systematically observed in the different high resolution structures. The side chain of His 31 makes van der Waals contacts with the side chains of the conserved proline (position 29) and of the next residue (position 32), which often has a significant aliphatic moiety (Fig. 1). The side chain of residue 33, which is often a lysine or an arginine, folds back on the side of the loop facing the RNA minor groove. Altogether, this set of interactions forms a small hydrophobic cluster which must have a stabilising effect on the loop 2 conformation. The central role of the conserved His 31 in RNA binding has also been demonstrated by mutation studies [37].
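The loop 2 consensus described above (GPxHxx, alignment positions 28–33) can be expressed as a simple pattern search. The following is a minimal sketch, not a validated domain-annotation tool: the function name, the relaxed pattern and the two toy sequences are assumptions made up for illustration and are not taken from Fig. 1. The relaxed pattern keeps only Gly 28 and His 31 fixed, reflecting the remark made later in the review that Pro 29 is absent in ADAR2 dsRBD2.

```python
import re

# Strict consensus from the text: G-P-x-H-x-x (alignment positions 28-33).
STRICT_LOOP2 = re.compile(r"GP.H..")
# Relaxed variant (an assumption for illustration): only Gly 28 and His 31 fixed.
RELAXED_LOOP2 = re.compile(r"G..H..")


def find_loop2_candidates(sequence, pattern=STRICT_LOOP2):
    """Return (start_index, segment) for every match, overlapping ones included."""
    # A lookahead with a capture group reports overlapping matches as well.
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]


# Hypothetical sequences, invented for this example only.
toy_canonical = "AAEVTGPAHSKRFT"
toy_no_proline = "AAEVTGSAHSKRFT"

print(find_loop2_candidates(toy_canonical))
print(find_loop2_candidates(toy_no_proline))
print(find_loop2_candidates(toy_no_proline, RELAXED_LOOP2))
```

A sequence match of this kind only flags candidate loops; as the HEN1 example shows, whether the loop actually contacts the RNA depends on the structural context.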
RNA shape recognition by dsRBDs
A-form helix recognition
Several studies have shown that dsRBDs specifically bind dsRNA as opposed to other types of nucleic acids (namely ssRNA, RNA:DNA hybrid, dsDNA). Dissociation constants in the micromolar range have been reported for dsRNA binding, whereas dissociation constants higher by an order of magnitude or more have been reported for other types of nucleic acid [1, 48, 57, 83, 85]. The interactions described previously confer on dsRBDs the singular property of binding specifically to the A-form RNA helix. Two important characteristics of the dsRNA helix are recognized by this set of interactions: the width of the major groove and the presence of hydroxyl functional groups on the ribose sugars. In the A-form dsRNA helix, the major groove has a width of about 10 Å (defined as the interstrand phosphate–phosphate distance between base pair i and i + 6) and the minor groove a width of about 15 Å. For comparison, the major and minor grooves in a B-form DNA helix have a width of about 17 and 11 Å, respectively [24]. The width of the RNA major groove is probed by dsRBDs with the N-terminal tip of helix α2 (region 3; KKxAK motif), which interacts with the non-bridging oxygen atoms of the phosphodiester backbone that point outwards from the dsRNA axis. The residues of the protein involved in these interactions are located in a well-structured region of the protein and hence are not prone to undergo important structural changes. The particular spatial arrangement of the amino and amide functional groups of the lysines present in dsRNA binding region 3 results from the tight packing of the lysine side chains with elements of the hydrophobic core of the domain coming from helix α1 and from the β-sheet, emphasising the intricate dependence between the fold of the domain and the formation of an operational binding surface. This rigidity results in a strict recognition of the major groove width, leading to dsRNA-specific recognition.
The second distinctive chemical characteristic of RNA which is recognized by dsRBDs is the presence of hydroxyl functional groups at the 2′ position of the ribose sugars. In the dsRNA, these functional groups are located in the minor groove of the helix (Fig. 4). They are recognized by the helix α1 and loop 2 regions of the dsRBDs through direct and water-mediated hydrogen bonds as described previously.
Apical-loop recognition
RNA hairpins are structural motifs frequently found in cellular RNA including mRNA, tRNA and rRNA [86]. Internal loops are formed in non-complementary regions of dsRNA whereas apical loops are the RNA structures capping RNA hairpins [87]. The biological relevance of RNA apical loop recognition by a dsRBD has been demonstrated for Rnt1p. It has been shown that AGNN tetraloop recognition by Rnt1p dsRBD is essential for proper RNA substrate recognition [88–90]. To date, it is not clearly understood whether apical loop specific recognition is a property conserved among all dsRBDs or whether it only concerns a subset of the dsRBD family. In the set of structures shown in Table 3, contacts between dsRBDs and RNA apical loops are observed for Staufen dsRBD3, Rnt1p dsRBD and ADAR2 dsRBD1 [41, 42, 52, 54] (Fig. 8a) whereas no contacts are observed in the structures of RNase III dsRBD and ADAR2 dsRBD2, although both dsRBDs are bound to RNA hairpins (Fig. 8b).
The structures of three different dsRBDs in complex with RNA hairpins (Staufen, Rnt1p, and ADAR2 dsRBD1) revealed a conserved mode of binding in which helix α1 contacts the minor groove of the RNA apical loop, with the N and C-termini of the helix oriented toward the 5′ side and the 3′ side of the RNA loop, respectively (Fig. 8a). In all these structures, the helix contacts the sugars and bases of the loop and of the base pairs located immediately below it. Most of the side chains of helix α1 contacting the RNA apical loop occupy the same positions as those involved in double-strand minor groove recognition (positions 3, 4, 7, 8 in Fig. 7). For instance, Asn 87 and Glu 88 (positions 7 and 8) in rat ADAR2 dsRBD1 and Glu 585 (position 8) in Staufen dsRBD3 interact with the RNA apical loop. In Staufen dsRBD3, Gln 582 and Lys 589 (positions 5 and 12) make contacts with the RNA loop, although they do not occupy one of the standard positions shown in Fig. 7, and mutation of Gln 582 (position 5) abolishes RNA binding [54]. Rnt1p dsRBD uses side chains Lys 371, Arg 372, Tyr 375, Ser 376 (corresponding to the standard positions 3, 4, 7, 8) to contact the RNA apical loop and the closing base pairs. One of the most exciting features found in Rnt1p is the presence of an additional α helix at the C terminus of the dsRBD (Fig. 3c) which was shown to be essential for apical loop recognition [40, 41]. However, the two solution structures of Rnt1p dsRBD in complex with an RNA stem loop show that this terminal helix does not interact directly with the RNA [41, 42]. Its influence on RNA binding and RNA loop recognition was proposed to be an indirect effect of the particular orientation it imposes on helix α1 and loop 1 [40, 42]. Surprisingly, the two conserved A and G bases of the AGNN-type RNA apical loop point into the major groove, and thus are not contacted by the dsRBD. To account for this, it was proposed that Rnt1p dsRBD recognizes the particular fold of the AGNN tetraloop [91, 92] rather than the nucleotide sequence [41].
The impact of internal loops on dsRNA recognition by dsRBDs has been less studied than apical loop recognition. Nevertheless, it is expected that disturbance of the regular A-form RNA double helix will impact dsRBD binding. It was reported accordingly that the presence of internal loops has a negative impact on RNA binding affinity [85].
The sequence-specificity paradox
The structures of dsRBDs in complex with RNA (Table 3) revealed the canonical mode of dsRNA recognition. For a long time, this canonical mode of interaction was described as essentially lacking any sequence specificity, even though a few sequence-specific contacts have been noticed in some structures [43, 55]. In fact, as the majority of dsRBD–RNA interactions involve contacts with non-bridging oxygens of the phosphodiester backbone or with 2′-hydroxyl groups of the ribose sugar rings, the dsRBD has been described as, and often restricted to, a non-sequence-specific dsRNA binding domain [27–29]. In addition, very few sequence-specific contacts have been observed in the first structures of dsRBDs in complex with dsRNA, which was actually not completely unexpected considering that these structures have been solved with non-natural targets [54, 55].
On the other hand, multiple examples exist of dsRBD-containing proteins showing a high degree of specificity in their interaction with RNAs. For example, the Drosophila Staufen protein, involved in mRNA transport, contains five dsRBDs and binds specifically to the 3′-UTR of certain mRNAs like bicoid, oskar and prospero mRNAs [2, 74, 93–97]. Furthermore, pre-mRNA editing enzymes from the ADAR family, which contain up to three dsRBDs essential for RNA substrate recognition [17, 30], can be highly specific, and RNA editing catalyzed by this family of enzymes is critical for normal life in mice and Drosophila [98–102]. In addition, bacterial and yeast RNase III contain a single dsRBD and cleave their substrates in a highly site-specific manner, which is required for optimal RNA function [103, 104].
The necessity to reconcile these two points led to the conclusion that dsRBDs must recognize their specific substrates through their shape, meaning that specific recognition would occur at particular distortion points of the A-form RNA helix introduced by specific RNA elements, such as apical or internal loops, base-pair mismatches or bulges [27–29]. This notion was also promoted by the many structural examples in which these types of structural imperfections are largely exploited by proteins or peptides to achieve a sequence-specific recognition of the RNA [105, 106]. Indeed, such imperfections in the dsRNA helix widen the major groove, thereby opening access to the higher sequence information content of the major groove, as compared to the minor groove [25, 26]. Interestingly, in the case of dsRBDs, all the structures solved in complex with RNA have clearly established that contacts with RNA bases can only occur in the minor groove. There is, to date, no structure of dsRBDs bound to an irregular RNA helix containing, for instance, bulges or internal loops that could reveal how these elements would indeed target certain dsRBDs to specific sites. The only structural elements that are clearly involved in targeting dsRBDs to specific RNAs are apical loop structures found as essential capping elements for certain dsRNA substrates. As already mentioned, the shape of the A/uGNN or the AAGU tetraloops is indeed a primary element for the specific binding and cleavage of RNA substrates by the yeast RNase III [41, 42, 89, 90, 107, 108].
However, some key examples resist this simple description of dsRBDs as shape-dependent dsRNA recognition domains. For example, the molecular features controlling the site-specific cleavage of bacterial RNase III consist of a subtle combination of determinants and antideterminants favouring or impairing the dsRBD binding to its RNA substrates [109–112]. Importantly, even if the minor groove has smaller information content than the major groove, a sequence-specific recognition of dsRNA in the minor groove is still possible [25]. Indeed, recent structures of ADAR2 dsRBDs bound to a natural and specific RNA substrate have revealed that the binding is achieved via a direct readout of the RNA sequence in the minor groove [52]. The recent increase of structural information on dsRNA recognition by dsRBDs allows an extension of the currently admitted RNA shape recognition model, to include the effect of RNA sequence on binding affinity.
Molecular basis for RNA sequence recognition
Recent high resolution structures of ADAR2 dsRBDs and Aa RNase III in complex with dsRNA have shown the presence of a few RNA sequence-specific contacts between the peptide backbone functional groups or side chains and the edge of RNA bases in the minor groove. These contacts involve two regions of the dsRBDs, helix α1 and loop 2, which contact the RNA at two successive minor grooves (Fig. 5a, b). In the following sections, we discuss the recent structural and biochemical findings supporting a direct readout of the RNA minor groove by certain dsRBDs.
Sequence specific interactions from helix α1
These sequence-specific interactions involve the side chains of residues located at positions 4 and 11 in the sequence alignment shown in Fig. 1 (see also Fig. 7). In both dsRBDs of ADAR2, position 4 (Figs. 1, 6b, 7c) is occupied by a methionine residue. The high resolution solution structures of both dsRBDs in complex with an RNA stem–loop derived from a natural substrate of ADAR2 show that the side chain of the methionine extends into the RNA minor groove where the methyl group makes a hydrophobic contact with the edge of an adenine (Met 4 in Fig. 6b). If this adenine were to be replaced by a guanine, a steric clash is predicted to occur between the amino group of the guanine and the methyl group of the methionine (compare Fig. 4c, d), leading to a decrease in affinity. To test this hypothesis, Stefl and co-workers [52] studied the impact of the presence of the exocyclic amino group in the minor groove on binding affinity by replacing the adenine contacted by the methionine with a guanine. The results indeed show a four- to fivefold decrease in affinity when the adenine is substituted by a guanine. No steric clash is predicted to occur if the adenine is replaced by a pyrimidine base; however, no structural or biochemical data are available to assess the impact of such a mutation. These structural and biochemical data suggest that the guanine acts as a binding antideterminant, which restricts the number of binding sites accessible to ADAR2. In dsRBD1 of Drosophila ADAR, the equivalent methionine is replaced by an alanine, the side chain of which is too short to influence any kind of binding register [53]. This could explain why Drosophila ADAR is less selective than mammalian ADARs and leads to extensive site-specific editing events [113, 114]. The observation of such contacts for the two dsRBDs of ADAR2 is remarkable because it provides a basis to explain the mechanism of substrate selectivity by ADAR2.
It has already been recognized that, compared to a G–C base pair, an A–U base pair lacks the hydrogen-bond donor/acceptor groups in the minor groove needed to be unambiguously recognized by proteins (Fig. 4b). Thus, the use of an aliphatic side chain could be a way for a protein to scan the RNA minor groove for the presence of the small hydrophobic surface formed by the adenine H2 proton. Interestingly, it can be seen in several entries of the sequence alignment (Fig. 1) that a glutamine or an asparagine is found at the same position (e.g., PACT dsRBD2). One would expect that the side chain of these residues would point toward the minor groove and could contact the edge of an RNA base. As we shall see for the dsRBD of Aa RNase III, this type of configuration would favour the presence of a guanine. However, structural and biochemical data would be needed to assess that possibility.
The second sequence-specific contact involving helix α1 has been observed in several crystal structures of Aa RNase III in complex with RNA [43, 45, 46]. It involves the amino acid at position 11 in the sequence alignment shown in Fig. 1 (see also Fig. 7b, d). This position corresponds to the C-terminal extremity of helix α1 and is not present in dsRBDs having a shorter version of helix α1 (e.g. ADAR2 dsRBDs, Rnt1p; compare also Fig. 7c, d). The most prevalent amino acid found at this position is glutamine (42 % in the set of sequences of Fig. 1), followed by isoleucine or valine (Figs. 1, 7b). Quite interestingly, in several structures of Aa RNase III in complex with RNA, the side chain of the glutamine points directly into the minor groove of the RNA to contact the edge of a guanine. The amide side chain of this residue makes a pair of hydrogen bonds to the N3 atom (hydrogen bond acceptor) and to the exocyclic amino group (hydrogen bond donor) of the guanine edge exposed in the minor groove (see Gln 11 in Figs. 6a, 7d). As noted previously, the amino group of guanine bases is the only hydrogen bond donor present in the RNA minor groove (compare Fig. 4c, d); therefore this contact is sequence-specific and likely has an effect on binding site selection by dsRBDs. Several reports pointed out that bacterial RNase III preferentially cleaves dsRNA at specific sites and that the sequence of the RNA region in contact with helix α1 is a major determinant [109, 110, 112]. In this context, this interaction between the glutamine and the guanine may be important for understanding how RNA sequence affects target site selection.
Overall, two kinds of sequence-specific contacts involving amino acid side chains located in helix α1 have been observed in several high resolution structures of dsRBDs in complex with RNA: namely, a methionine at position 4 contacting an adenine (ADAR2 dsRBD1, dsRBD2 [52]) (Figs. 6b, 7c), and a glutamine at position 11 contacting a guanine (Aa RNase III [45, 46]) (Figs. 6a, 7d). In PACT dsRBD1, these positions in helix α1 are occupied by a glutamine and a methionine, respectively (Fig. 7a, b). It would be interesting to know whether these two side chains could also make sequence-specific contacts to the RNA, thereby increasing nucleotide sequence specificity for binding.
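The chemical logic behind these two helix α1 contacts can be summarised in a small lookup table of the groups that each base exposes in the minor groove, as described in the text (N3 of purines and O2 of pyrimidines as acceptors, the guanine amino group as the only donor, and the adenine H2 as a small hydrophobic surface). The sketch below is only an illustrative encoding of those statements; the function names and the wording of the returned labels are assumptions, and it makes no quantitative prediction.

```python
# Minor-groove functional groups per base, as described in the review.
MINOR_GROOVE_GROUPS = {
    "A": {"acceptors": ["N3"], "donors": [], "nonpolar": ["H2"]},
    "G": {"acceptors": ["N3"], "donors": ["N2 amino"], "nonpolar": []},
    "C": {"acceptors": ["O2"], "donors": [], "nonpolar": []},
    "U": {"acceptors": ["O2"], "donors": [], "nonpolar": []},
}


def gln11_bidentate_possible(base):
    """Gln at position 11 pairs with an N3 acceptor AND an amino donor."""
    groups = MINOR_GROOVE_GROUPS[base]
    return "N3" in groups["acceptors"] and bool(groups["donors"])


def met4_readout(base):
    """Qualitative outcome for the Met at position 4 (ADAR2 dsRBDs)."""
    groups = MINOR_GROOVE_GROUPS[base]
    if groups["donors"]:
        # The exocyclic amino group of guanine is predicted to clash.
        return "steric clash predicted"
    if groups["nonpolar"]:
        # Adenine H2 offers the hydrophobic contact seen in the structures.
        return "hydrophobic contact"
    # Pyrimidines: no clash predicted, but no data reported in the text.
    return "no clash predicted, untested"


for base in "AGCU":
    print(base, met4_readout(base), gln11_bidentate_possible(base))
```

Encoded this way, guanine is the only base that satisfies the glutamine contact and the only one flagged as an antideterminant for the methionine contact, which is the asymmetry the two structures illustrate.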
Sequence-specific interactions in loop 2
In 12 X-ray structures out of 13 and in 2 NMR structures out of 5 (Table 3), an RNA sequence-specific interaction involving region 2 of the dsRBD canonical binding surface has been observed. This contact involves the carbonyl group of the peptide backbone of the third residue in loop 2 (position 30 in Fig. 1), which makes a hydrogen bond with the exocyclic amino group of a guanine base in the RNA minor groove (Fig. 6c, d) [43–46, 52, 55, 56]. Since guanine is the only base exhibiting an amino group in the minor groove, the formation of this hydrogen bond is sequence-specific. Both G–C and C–G base pairs can fulfill this requirement for a guanine because the exocyclic amino group occupies virtually the same position in the minor groove in both cases [25]. Actually, this interaction has been observed in several different base pairing contexts, including Watson–Crick G–C (Xlrbpa, ADAR2 dsRBD2) and C–G base pairs (HYL1 dsRBD1, TRBP dsRBD2, Aa RNase III), G–U wobble pair (Aa RNase III), and G–G mismatch (ADAR2 dsRBD1). It is noteworthy that the structure of HYL1 dsRBD1 shows an additional hydrogen bond between the hydroxyl group of the serine (third residue of loop 2, position 30 in Fig. 1) and the N3 atom of a guanine. This bidentate hydrogen bond involving the carbonyl group and the side chain of Ser 30 of HYL1 dsRBD1 (Fig. 1) could in principle discriminate a G–C from a C–G base pair, but this remains to be tested.
To assess the importance of this interaction for the binding affinity of ADAR2 dsRBDs, Stefl and co-workers performed a series of binding assays with several mutants of the GluR2 R/G RNA substrate of ADAR2, in which specific G–C base pairs were substituted by A–U. The results show that the loss of this interaction results in a significant decrease of binding affinity (1.5- to 6.6-fold for dsRBD1 and dsRBD2, respectively) [52]. The biological relevance of this interaction has also been tested in vivo, showing that mutations affecting loop 2 of either dsRBD lead to a decrease of ADAR2 editing activity by 80–90 % [52].
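To put these fold changes on an energy scale, a change in affinity can be converted into a difference in binding free energy with the standard relation ΔΔG = RT ln(fold change). The short sketch below applies it to the two ends of the reported range. It assumes that the reported fold decrease corresponds to a ratio of dissociation constants and that the temperature is 25 °C; neither assumption is stated in the text, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Free energy penalty (kcal/mol) for a `fold`-fold loss of affinity.

    Assumes fold = Kd(mutant) / Kd(wild type); 298.15 K is an assumed
    temperature, not one reported in the review.
    """
    return R_KCAL * temperature_k * math.log(fold)


for fold in (1.5, 6.6):
    print(f"{fold}-fold -> {ddg_from_fold_change(fold):.2f} kcal/mol")
```

Under these assumptions the loss of the loop 2 contact costs roughly 0.2 to 1.1 kcal/mol, a modest penalty for a single contact that is consistent with the idea of sequence preference rather than strict sequence specificity.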
The formation of this CO–H2N hydrogen bond depends strongly on the conformation of loop 2, which must orient the peptide carbonyl group in the proper direction inside the minor groove. Indeed, the conformation of loop 2 is quite well conserved in all the structures where this interaction is observed. The presence of the consensus Gly 28 and His 31 residues seems to be absolutely required, whereas the consensus Pro 29 (Fig. 1), which is not present in ADAR2 dsRBD2, seems not to be essential.
Sequence preference: register of binding
Two RNA sequence-specific contacts involving residues of helix α1 and loop 2 have been observed in high resolution structures of three different dsRBD–RNA complexes [43, 52]. Both dsRBDs of human ADAR2 and the dsRBD of Aa RNase III use a carbonyl group from the peptide backbone in loop 2 to make a hydrogen bond to the amino group of a guanine in the minor groove (Fig. 6c, d). Each of these three dsRBDs also uses a side chain located in helix α1 to make a sequence-specific contact with a base in the RNA minor groove: a methionine located in the N-terminal half of helix α1 interacts with an adenine in ADAR2 dsRBD1 and dsRBD2, whereas a glutamine located two helical turns later, in the C-terminal half of helix α1, interacts with a guanine in the dsRBD of RNase III (Figs. 6a, b, 7c, d). As a consequence of differences in the position and the orientation of these side chains in helix α1, the number of RNA base pairs between the two sequence-specific contacts, also referred to as the ‘register length’, is different for these three dsRBD–RNA complexes (Fig. 9). In the structure of Aa RNase III in complex with RNA, the register length between the guanine contacted by loop 2 and the guanine contacted by helix α1 is ten base pairs. In the structures of ADAR2 dsRBD1, the register between the guanine contacted by loop 2 and the adenine contacted by helix α1 is nine base pairs, whereas it is eight base pairs for dsRBD2 (Fig. 9). The register difference observed for ADAR2 dsRBD1 and dsRBD2 results from a slightly different orientation of helix α1 with respect to helix α2, emphasising the importance of the residues of helix α1 involved in the hydrophobic core of the protein [51, 52]. How do these sequence-specific contacts observed in high resolution structures actually affect dsRNA binding by dsRBDs?
Stefl and co-workers [52] reported that mutation of any of the RNA bases contacted specifically by the dsRBDs of ADAR2 leads to a two- to fivefold decrease in affinity for the individual dsRBDs. These results indicate that the individual dsRBDs of ADAR2 are actually able to discriminate different RNA stem–loops based on their nucleotide sequence. When the two dsRBDs are associated in tandem, as is the case in the full length ADAR2, one can speculate that even more stringent RNA sequence recognition could arise, since four sequence-specific interactions would then be present. Deletion of ADAR2 dsRBDs or mutation of the residues involved in sequence-specific contacts with RNA leads to a marked decrease in site-specific RNA editing in vivo [52]. It was also reported that ADAR1 and ADAR2 edit RNA with a different specificity [115, 116]. Interestingly, the α1 helices of the dsRBDs present in these proteins show differences predicted to affect RNA binding specificity. Namely, the methionine at position 4 in ADAR2 dsRBDs is replaced by a serine or a threonine in ADAR1, whereas ADAR1 dsRBD1 has a longer helix with a glutamine at position 11 (Fig. 7a–d).
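The register concept lends itself to a simple scan of a duplex for positions compatible with both contacts. The sketch below is a deliberately coarse illustration and rests on several assumptions that the text does not settle: the register is treated as the offset between base-pair indices, the contacted base may lie on either strand, only one orientation of the domain along the helix is scanned, and antideterminants are ignored. The duplex is invented for the example, and the function and variable names are hypothetical.

```python
# Register length (base pairs) and base contacted by helix alpha-1,
# as reported in the text for the three complexes.
REGISTERS = {
    "Aa RNase III dsRBD": (10, "G"),
    "ADAR2 dsRBD1": (9, "A"),
    "ADAR2 dsRBD2": (8, "A"),
}


def candidate_sites(base_pairs, register, helix_base):
    """Return indices i where pair i holds a G (loop 2 contact) and
    pair i + register holds `helix_base` (helix alpha-1 contact).

    base_pairs: list of two-letter strings, one per base pair,
    e.g. "GC" = G on strand 1 paired with C on strand 2.
    """
    sites = []
    for i in range(len(base_pairs) - register):
        loop2_ok = "G" in base_pairs[i]
        helix_ok = helix_base in base_pairs[i + register]
        if loop2_ok and helix_ok:
            sites.append(i)
    return sites


# Hypothetical 12-bp duplex, made up for this example.
toy_duplex = ["GC", "AU", "CG", "UA", "GC", "AU",
              "CG", "GC", "AU", "CG", "GC", "UA"]

for name, (register, base) in REGISTERS.items():
    print(name, candidate_sites(toy_duplex, register, base))
```

Even on this toy duplex the three domains select different positions, which illustrates how a one-base-pair difference in register could shift the preferred binding site along the same helix.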
Concluding remarks
Over the past 15 years, since the determination of the first structure of a dsRBD by NMR, a wealth of structural data has accumulated, providing useful insights on the molecular basis of dsRNA-specific recognition. Originally thought to recognize exclusively the A-form nature of the RNA double helix, dsRBDs were later found to be sensitive to additional RNA determinants. We reviewed in this paper the recent structural data showing how dsRBDs can recognize, beyond the A-form RNA helix, other RNA features, such as apical loops and nucleotide sequences. This subtle modulation of dsRBD binding is likely to play a crucial role in targeting dsRBD-containing proteins to their specific RNA substrates in vivo. A better understanding of these mechanisms should provide a basis to help predict the optimal RNA binding sites for a given dsRBD. Although our understanding of RNA recognition by dsRBDs has made great progress in the past few years, some particular aspects, like the influence of RNA mismatches and internal loops on dsRBD binding, need further study to be better understood. A particularly interesting point which has not been addressed so far is the possible interplay between dsRBDs in proteins harbouring multiple copies of this domain. This could result in an increase of binding specificity or it might enable dsRBDs to recognize complex RNA tertiary structures. Another fascinating aspect of dsRBDs that has recently emerged is the role played by some of them in the sub-cellular localisation of proteins. This control occurs through interactions with other protein partners and sometimes with dsRNA. In conclusion, we believe dsRBDs are fascinating protein domains that concentrate many possibilities into a very compact fold, still leaving many aspects of their functions to be discovered.
Acknowledgments
We sincerely apologize to the colleagues whose important work is not cited because of space limitation, or unfortunately because of our negligence. This work was supported by the Swiss National Science Foundation Nr. 31003AB-133134 and 310030E-131031, the SNF-NCCR structural biology and a KTI Grant 11329.1 PFLS-LS. G.M. was supported by a grant from the “Fondation pour la Recherche Médicale”. P.B. was supported by the Postdoctoral ETH Fellowship Program.
Abbreviations
ADAR | Adenosine deaminase acting on RNA |
DGCR8 | DiGeorge syndrome critical region 8 |
dsRBD | Double-stranded RNA binding domain |
dsRNA | Double-stranded RNA |
HYL1 | HYPONASTIC LEAVES1 |
ILF3 | Interleukin enhancer binding factor 3 |
PACT | PKR activator |
PKR | Protein kinase RNA-activated |
RHA | RNA helicase A |
SPNR | Spindle perinuclear protein |
TRBP | HIV transactivation response RNA binding protein |
Footnotes
G. Masliah and P. Barraud contributed equally to this work.
References
Articles from Cellular and Molecular Life Sciences: CMLS are provided here courtesy of Springer
Full text links
Read article at publisher's site: https://doi.org/10.1007/s00018-012-1119-x
Read article for free, from open access legal sources, via Unpaywall: https://www.research-collection.ethz.ch/bitstream/20.500.11850/57459/2/18_2012_Article_1119.pdf
HAL Open Archive
http://hal.archives-ouvertes.fr/hal-00725804