INTRODUCTION

Genome editing with the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) technology attracts great interest. The CRISPR/Cas9 system provides a means to precisely edit the genome by introducing specific substitutions via homology-directed recombination (HDR) against a low background of non-homologous end joining (NHEJ), and various aspects of its specificity are now the focus of particular attention. CRISPR/Cas9 technology is widely used to construct cell lines that model various disorders, and its potential for therapeutic intervention in the human genome is being actively explored. To achieve this, however, the off-target effects of the CRISPR/Cas9 system have to be reduced, ideally to a level that does not exceed the normal DNA mutation rate in human cells (~10⁻¹⁰ mutations per base pair per cell division) [1‒3].

The Cas9 protein (hereafter, Streptococcus pyogenes Cas9 (SpCas9), unless specified otherwise) consists of several domains and possesses RNA-dependent DNA endonuclease activity, which resides in two domains. A domain with highly conserved His–Asn–His residues (the HNH domain) cleaves the DNA strand that is complementary to the guide RNA (the target strand), while a RuvC-like domain hydrolyzes the DNA strand whose sequence matches that of the guide RNA (the nontarget strand) [4, 5]. Cleavage requires a protospacer, which is a double-stranded DNA (dsDNA) region in which one strand is complementary to the guide RNA (gRNA), and a protospacer-adjacent motif (PAM), 5′-NGG-3′ (Fig. 1a).

Fig. 1.

Schematic organization of the enzyme–substrate complexes formed by RNA-guided nucleases (a) Cas9 and (b) Cas12a.

In nature, a short CRISPR RNA (crRNA) guides Cas9 to the protospacer, while a trans-activating crRNA (tracrRNA) is necessary for its catalytic activity. In genome editing applications, the two RNAs are often combined into a single guide RNA (sgRNA; hereafter, the abbreviation gRNA is used for any guide RNA regardless of its format). An sgRNA includes a variable region (20 nt in the case of SpCas9) that recognizes the target sequence and a minimal functional tracrRNA fragment. The Cas9 ribonucleoprotein (RNP) cleaves both DNA strands between the third and fourth nucleotides upstream (5′) of the PAM [4, 5]. The resulting double-strand break is then repaired by HDR or NHEJ [6, 7]. Provided that a suitable donor of genetic information is available, HDR precisely replaces the original sequence with the donor sequence. NHEJ, in contrast, usually leaves short deletions or insertions at the break site (Fig. 2).
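
The cut-site rule above can be made concrete with a minimal Python sketch. It assumes a plain DNA string and scans only the strand it is given; the opposite strand could be handled by scanning its reverse complement. The function name `find_spcas9_sites` and the default 20-nt spacer length are illustrative choices, not part of any published tool.

```python
import re


def find_spcas9_sites(seq: str, spacer_len: int = 20):
    """Return (protospacer, pam, cut_index) for each NGG PAM on the given strand.

    cut_index is the 0-based position immediately after the break, i.e. the
    blunt cut lies between seq[cut_index - 1] and seq[cut_index].
    """
    seq = seq.upper()
    sites = []
    # A zero-width lookahead also reports overlapping PAMs (e.g. both in 'AGGG').
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # no room for a full protospacer upstream of this PAM
        protospacer = seq[pam_start - spacer_len:pam_start]
        # Cas9 cuts between the 3rd and 4th nucleotides 5' of the PAM.
        cut_index = pam_start - 3
        sites.append((protospacer, match.group(1), cut_index))
    return sites


example = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "TTTT"
print(find_spcas9_sites(example))  # [('GATTACAGATTACAGATTAC', 'TGG', 22)]
```

With a 20-nt protospacer, the break leaves 17 nt of the protospacer on the PAM-distal side and 3 nt on the PAM-proximal side.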

Fig. 2.

Cell pathways of double-strand break repair that result in an exact replacement (homologous recombination) or an insertion or deletion (non-homologous end joining). A donor of genetic material for homologous recombination is shown in gray.

MOLECULAR MECHANISMS OF Cas9 ERRONEOUS ACTIONS

Despite its apparent simplicity and the complementarity-based targeting of Cas9 nuclease by gRNA, the CRISPR/Cas9 system proved to have insufficient specificity and introduced many off-target changes in the genome in the first attempts at genome editing in human cells [8‒11]. Studies aimed at identifying the main determinants of the complementarity-based specificity of the Cas9–RNA complex showed that the 6–8 bp flanking the PAM in a target DNA must perfectly match the gRNA sequence and that this requirement becomes less stringent when the enzyme is in excess [12, 13]. The consequences of mismatches between several sgRNAs and target DNAs were comprehensively analyzed in HEK293 and K562 cell lines. The analysis confirmed that Cas9 nuclease is less sensitive to mismatches located farther from the PAM than to those closer to it [9, 10]. Sensitivity to single-nucleotide substitutions is maximal within the 8–14 bp closest to the PAM. This region determines recognition specificity and is known as the seed sequence. Single-molecule microscopy studies further showed that, during target DNA recognition, a heteroduplex rapidly forms on the seed sequence and that, if this heteroduplex is stable, RNA–DNA complementary pairing quickly propagates throughout the recognition site [14]. Mismatches of different types were found to affect specificity to different extents. A detailed analysis of 11 additional genomic loci revealed many exceptions to the rule of perfect complementarity in the seed sequence. Mismatch tolerance depended on the particular base pair, and rC:dC mismatches decreased Cas9 nuclease activity most strongly [10].

The effects of multiple mismatches between the gRNA and target DNA were studied with respect to their number and relative positions [9]. The number of mismatches proved to be the key factor in the loss of Cas9 activity. It also mattered whether the mismatches were adjacent to each other and whether they were close to the PAM. Two mismatches, and even more so three, substantially reduce Cas9 activity regardless of their arrangement, especially when they are located in the PAM-proximal region. At a distance from the PAM, adjacent mismatches have the greatest effect. Nevertheless, in rare cases Cas9 can productively recognize sequences with up to seven mismatches [15].
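
To make this position dependence concrete, the following hypothetical helper lists mismatch positions counted from the PAM (position 1 is PAM-proximal). It also counts how many mismatches fall in a seed region and flags adjacent mismatches. The 10-nt seed length is an assumption within the 8–14 bp range quoted above. The gRNA is written in DNA letters (T for U).

```python
def mismatch_profile(spacer: str, protospacer: str, seed_len: int = 10):
    """Compare a gRNA spacer with a protospacer, both written 5'->3' with the PAM on the 3' side.

    Returns (positions, seed_mismatches, has_adjacent); positions are counted
    from the PAM, so 1 is the PAM-proximal nucleotide.
    """
    spacer, protospacer = spacer.upper(), protospacer.upper()
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have the same length")
    n = len(spacer)
    # The nucleotide at 0-based index i (from the 5' end) lies n - i nt from the PAM.
    positions = sorted(n - i for i, (a, b) in enumerate(zip(spacer, protospacer)) if a != b)
    seed_mismatches = sum(1 for p in positions if p <= seed_len)
    has_adjacent = any(q - p == 1 for p, q in zip(positions, positions[1:]))
    return positions, seed_mismatches, has_adjacent


spacer = "GATTACAGATTACAGATTAC"
target = "CCTTACAGATTACAGATTAG"  # two PAM-distal mismatches and one PAM-proximal mismatch
print(mismatch_profile(spacer, target))  # ([1, 19, 20], 1, True)
```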

Recently solved structures of Cas9 complexes with DNA and RNA capture various steps of substrate recognition. They make it possible to establish the structural features of the dynamic recognition process for perfectly and imperfectly matched gRNA and target DNA [16, 17] (Fig. 3). Upon binding to dsDNA, the Cas9–sgRNA complex bends the DNA by approximately 50°, and the three bases immediately adjacent to the PAM are consequently flipped out of the duplex. Heteroduplex formation in this region is sufficient for further dsDNA unwinding. When a 3-bp mismatch occurs in the central part of the target region, the heteroduplex still forms in full, but the DNA is not bent. The HNH domain therefore cannot take up the position required for catalysis. In contrast, DNA with a 3-bp mismatch in the PAM-distal region can bend and thus induce the catalytically competent Cas9 conformation. This is because nucleotides at certain positions of the heteroduplex do not contact the protein in the intermediate conformations that precede the formation of a catalytically competent enzyme–substrate complex.

Fig. 3.

Schematic multistep recognition of a substrate by the Cas9 RNP. Black lines show DNA; a gray line shows RNA. (1) The PAM is recognized, and a primary complex forms. (2) The enzyme conformation changes, and DNA is bent in the PAM vicinity to facilitate the duplex unwinding. (3) A heteroduplex forms on the seed sequence, but a mismatch in the seed sequence prevents this step. (4) The heteroduplex with target DNA forms in full. (5) The heteroduplex bends, and additional bonds with the protein form to bring the active centers of the HNH and RuvC domains (triangles) in contact with the target phosphodiester bonds; a mismatch in a region distant from the PAM prevents this step.

Apart from mismatches, another potential source of off-target recognition is complementary pairing that involves small loops (bulges). A comprehensive analysis of how insertions and deletions in sgRNA affect Cas9 nuclease activity showed that the system tolerates 1-nt DNA bulges and 1- to 4-nt RNA bulges. The degree of tolerance depends on the position of the bulging nucleotides relative to the PAM [18].

Analysis of PAM substitutions showed that up to 20% of Cas9 nuclease activity was preserved when NGG was replaced with NAG or NGA [10, 19], and approximately 10% was preserved with an NGT PAM [20]. The ability of Cas9 to cleave targets adjacent to noncanonical PAMs was confirmed more recently by whole-genome sequencing of human cells transfected with sgRNA libraries [21]. DNA methylation is reported to have no effect on Cas9 activity. Recognition of noncanonical PAMs raises the frequency of potential target sites to as high as 1 site per 4 bp in the human genome [10].
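
The PAM data above can be encoded as relative weights for scanning both strands of a sequence. The weights below reuse the approximate percentages quoted in this paragraph and are illustrative assumptions, not measured constants for any particular site. The function names are hypothetical.

```python
# Illustrative relative activities taken from the approximate figures quoted
# above (NAG or NGA up to ~20%, NGT ~10%); real values vary from site to site.
PAM_WEIGHTS = {"GG": 1.0, "AG": 0.2, "GA": 0.2, "GT": 0.1}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pam_weight(pam: str) -> float:
    """Assumed relative activity of a 3-nt PAM; the first (N) position is ignored."""
    if len(pam) != 3:
        raise ValueError("expected a 3-nt PAM")
    return PAM_WEIGHTS.get(pam.upper()[1:], 0.0)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan_pams(seq: str):
    """List (strand, start, pam, weight) for every PAM with a non-zero weight.

    start is a 0-based index in the strand being read, 5'->3'.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for i in range(len(s) - 2):
            weight = pam_weight(s[i:i + 3])
            if weight > 0:
                hits.append((strand, i, s[i:i + 3], weight))
    return hits


print(scan_pams("ACCA"))  # [('-', 0, 'TGG', 1.0), ('-', 1, 'GGT', 0.1)]
```

The example shows why noncanonical PAMs enlarge the search space. A short sequence with no NGG on the given strand can still present both a canonical and a weaker noncanonical PAM on the other strand.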

The observed level of off-target alterations is acceptable for studies of loss-of-function mutations in eukaryotic cell lines [22‒25]. In such studies, cells are usually transduced with viral vector-based constructs that express Cas9 and sgRNA at low levels, often from a single construct per cell. A large-scale analysis of Cas9-mediated gene knockouts in mammalian cells found 97% cleavage at the target site. At 13 potential off-target sites differing from the target by no more than 3 bp, cleavage was below 2.5% [22]. The only site with a high level of off-target cleavage was fully complementary to the sgRNA in the 8-bp PAM-adjacent sequence. Such sites occur at a frequency of ~2 per human genome and are almost always located in noncoding DNA [22].

IN SILICO SELECTION OF TARGET SITES TO IMPROVE THE EDITING PRECISION

Computer algorithms for gRNA design were among the first steps toward more precise genome editing. Experimental data on how mismatches affect CRISPR/Cas9 activity make it possible to predict the probability of off-target alterations with some accuracy. In general, the target site should have minimal sequence similarity to other genomic regions. Any similar sequences should either lack an adjacent PAM or differ from the target in the PAM-proximal part. First-generation algorithms increased editing specificity by 50% [10]. More recent algorithms also take into account the energy of DNA/RNA heteroduplex formation [26]. A number of programs are now available for predicting the most advantageous target sequences [27, 28].
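
The selection principle described above can be sketched as a brute-force scan: few similar sites, and those either lacking a PAM or mismatched near it. This is a hypothetical toy, not any of the published programs cited. It searches only the forward strand of a short sequence for NGG-adjacent windows, allows up to three mismatches and uses an assumed 10-nt seed.

```python
def off_target_hits(spacer: str, genome: str, max_mismatches: int = 3, seed_len: int = 10):
    """Find NGG-adjacent windows with at most max_mismatches (forward strand only).

    Returns (start, mismatches, seed_mismatches) tuples.
    """
    spacer, genome = spacer.upper(), genome.upper()
    n = len(spacer)
    hits = []
    for start in range(len(genome) - n - 2):
        if genome[start + n + 1:start + n + 3] != "GG":
            continue  # no NGG PAM right after this window
        window = genome[start:start + n]
        mismatches = [i for i in range(n) if spacer[i] != window[i]]
        if len(mismatches) <= max_mismatches:
            seed_mm = sum(1 for i in mismatches if n - i <= seed_len)
            hits.append((start, len(mismatches), seed_mm))
    return hits


def rank_candidates(candidates, genome, max_mismatches: int = 3):
    """Order spacers from lowest to highest predicted off-target risk.

    Perfect matches are treated as the intended site. The score is the number
    of imperfect hits, then the number of those whose seed is fully matched.
    """
    scored = []
    for spacer in candidates:
        imperfect = [h for h in off_target_hits(spacer, genome, max_mismatches) if h[1] > 0]
        seed_intact = sum(1 for h in imperfect if h[2] == 0)
        scored.append((len(imperfect), seed_intact, spacer))
    return [spacer for _, _, spacer in sorted(scored)]


A = "GATTACAGATTACAGATTAC"
B = "CTCAGTCAGTCAGTCAGTCA"
# Toy "genome": A's site, a 1-mismatch (PAM-distal) copy of A, and B's site.
genome = A + "TGG" + "T" * 10 + "C" + A[1:] + "AGG" + "T" * 10 + B + "CGG"
print(rank_candidates([A, B], genome))  # B is ranked first
```

Spacer A is penalized because its similar site has an intact seed and an adjacent PAM, exactly the kind of site the selection rule tries to avoid. A real tool would also scan the reverse strand, weigh noncanonical PAMs and flag duplicate perfect matches.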

New-generation approaches to target selection became possible once efficient and rapid methods for genome-wide detection of off-target mutations had been developed. Early studies used deep whole-genome sequencing, which made it possible, in particular, to distinguish individual genomic variants from off-target mutations. Digenome-seq is a method specifically adapted to detecting off-target editing sites. It is based on whole-genome sequencing of the original genomic DNA and of DNA cleaved in vitro by Cas9 with the gRNA of interest [29]. Many less expensive methods combine capture, fixation, amplification and sequencing of altered genomic regions: BLESS [30], HTGTS [31], GUIDE-seq [15], SITE-Seq [32], CIRCLE-seq [33], CHANGE-seq [34], etc. The large datasets obtained in such experiments made it possible to apply machine learning to predicting off-target modifications and optimizing target site selection [34‒36]. Experimental data have been accumulated for various species, such as human, mouse, Danio rerio, Drosophila melanogaster, Arabidopsis thaliana and many others. Species-specific patterns of off-target changes can therefore now be taken into account to further improve the precision of genome editing [37‒41].

Second-generation algorithms also used data on immunoprecipitation of catalytically inactive Cas9 (D10A H840A; dCas9) followed by massively parallel sequencing of the bound genomic DNA fragments [11, 42‒44]. The binding specificity of dCas9 does not fully reflect the cleavage specificity of the active enzyme. Even so, these algorithms are successfully used to design gRNAs for transcription regulation, epigenetic modification, base editing and prime editing, where catalytically inactive or nickase Cas9 serves as a targeting module [45‒49]. Whole-genome sequencing data on target and off-target mutations are used directly in new machine learning algorithms for base editing [50].

HIGH-FIDELITY Cas9 VARIANTS

Because of the above problems with the specificity of Cas9-induced double-strand breaks, attempts to modify the enzyme to improve system precision were made as early as the first studies. The first strategy relied on Cas9 nickases. It resembled the common use of fusion proteins that combine zinc finger recognition domains or transcription activator-like (TAL) effectors with the dimer-forming endonuclease domain of the FokI restriction enzyme [51]. The substitutions D10A in the RuvC domain or H840A in the HNH domain convert Cas9 into a nickase, which introduces only single-strand breaks in DNA. When used with properly selected gRNAs, Cas9 nickases produce two closely spaced single-strand breaks on opposite strands, which together form a double-strand break. This event is far more specific because two sequences must be recognized simultaneously. Paired Cas9 D10A–gRNA complexes increased specificity by approximately two orders of magnitude in the HEK-293T cell line [52], and off-target errors were below the detection limit in some studies [13, 53, 54]. As a practical application, the approach was used to obtain cattle with a point substitution in the NRAMP1 gene that confers increased resistance to tuberculosis [55]. Attempts were also made to use a chimeric dCas9–FokI construct [56‒58]. Its specificity in HEK293 cells was 140 times higher than that of Cas9 and 1.3–8.8 times higher than that of a pair of Cas9 nickases.
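
A back-of-the-envelope estimate shows why paired nicking raises specificity. It rests on two simplifying assumptions not tested in the studies cited: the two nicking events at a given off-target locus occur independently, with frequencies $f_1$ and $f_2$ for the two gRNAs, and each nicking frequency equals the off-target cutting frequency $f$ of a single Cas9 nuclease.

```latex
f_{\mathrm{DSB}}^{\mathrm{paired}} \approx f_1 \, f_2 ,
\qquad
\frac{f_{\mathrm{DSB}}^{\mathrm{single}}}{f_{\mathrm{DSB}}^{\mathrm{paired}}}
\approx \frac{f}{f^{2}} = \frac{1}{f}
\quad \text{for } f_1 = f_2 = f .
```

For a hypothetical f = 0.01, the off-target double-strand break frequency would fall from 10⁻² to about 10⁻⁴, a 100-fold reduction. In reality, off-target sites of the two gRNAs rarely lie close enough together to form a break at all, so this estimate is only a rough intuition.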

After the structures of Cas9 and its complexes with DNA and RNA were solved, including various conformers that arise during target recognition [16, 17, 59‒68], rational design came to be used to improve enzyme specificity (Table 1). Alanine scanning mutagenesis of the DNA-binding channel of Cas9 yielded 11 variants with improved specificity. This was possibly because DNA–protein contacts contributed less to the stability of the Cas9–gRNA–DNA complex, which increased the relative contribution of complementary DNA–RNA interactions [69]. Screening of a combinatorial library of these substitutions identified variants that remained active and showed higher specificity: eSpCas9(1.0) (K810A K1003A R1060A) and eSpCas9(1.1) (K848A K1003A R1060A). These variants introduced no off-target changes at 22 of the 24 most probable predicted off-target sites and were sensitive to a single mismatch outside the seed sequence [69].

Table 1. Engineering Streptococcus pyogenes Cas9 variants with enhanced specificity

A similar approach, destabilizing the hydrogen bonds between the protein and DNA, was used to increase Cas9 specificity. In the complex, four hydrogen bonds form between target DNA phosphates and Asn497, Arg661, Gln695 and Gln926. Screening of a library covering all possible combinations of Ala substitutions at these four residues revealed the highly specific variants R661A Q695A Q926A and N497A R661A Q695A Q926A (SpCas9-HF1) [70]. The rate of off-target mutations induced by SpCas9-HF1 was statistically indistinguishable from the background mutation rate at 34 of 36 predicted off-target sites.
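
Covering "all possible combinations" of Ala substitutions at four residues implies a library of 2⁴ − 1 = 15 mutants, excluding the wild type. The minimal Python sketch below enumerates them. The residue list is taken from the text; the function name is chosen for illustration.

```python
from itertools import combinations

# Residues that hydrogen-bond to target-DNA phosphates (listed above).
RESIDUES = ("N497", "R661", "Q695", "Q926")


def alanine_combinations(residues):
    """Every non-empty combination of Ala substitutions at the given residues."""
    variants = []
    for k in range(1, len(residues) + 1):
        for combo in combinations(residues, k):
            variants.append(tuple(residue + "A" for residue in combo))
    return variants


library = alanine_combinations(RESIDUES)
print(len(library))  # 15 (= 2**4 - 1)
print(library[-1])   # ('N497A', 'R661A', 'Q695A', 'Q926A'), the SpCas9-HF1 set
```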

Substitutions of residues at the DNA–protein interface were initially assumed to decrease the overall affinity of Cas9 for DNA. However, more detailed studies using dynamic Förster resonance energy transfer did not confirm this hypothesis [71]. Instead, higher specificity was associated with a conformational proofreading mechanism. This mechanism operates through consecutive changes in the spatial orientation of recognition domain 3 (REC3), REC2 and the HNH domain during enzyme–substrate binding (Fig. 3). The REC3 domain acts as an allosteric effector that senses the RNA/DNA heteroduplex and ensures activation of the HNH nuclease domain. The REC2 domain prevents the catalytic residues from accessing the target phosphodiester bonds in the presence of mismatches [16, 71]. Based on these experiments, the variant HypaCas9 (N692A M694A Q695A H698A) was constructed with mutations in the REC3 domain. It proved even more specific than eSpCas9(1.1) and SpCas9-HF1 while having similar efficiency [71]. In the variant SuperFi-Cas9 (Y1010D Y1013D Y1016D V1018D R1019D Q1027D K1031D), Asp replaces all residues involved in stabilizing the mismatch-containing complex. It showed a 500-fold increase in specificity in vitro [16], but its performance in cells has not yet been studied.

Apart from rational design, directed evolution was used in several attempts to obtain improved Cas9 variants (Table 1). Several in vivo systems were constructed to select simultaneously for targeted inactivation of a toxic gene and for the absence of off-target inactivation of a genomic locus. One of the first successful systems, based on yeast cells, allowed both on-target and off-target Cas9 activities to be assessed simultaneously [72]. Screening of a library of Cas9 variants with random mutations in the REC3 domain identified variants that edit more precisely without losing efficiency. The best of them, evoCas9 (M495V Y515N K526E R661Q), surpasses both the wild-type enzyme and rationally designed Cas9 variants in fidelity, with on average a fourfold improvement over eSpCas9 and SpCas9-HF1. Its on-target activity is similar to that of the wild-type protein [72]. The variant Sniper-Cas9 (F539S M763I K890N) was obtained by selection in Escherichia coli cells and showed high specificity without loss of on-target activity in human cells [73]. Compared with eSpCas9(1.1), SpCas9-HF1, evoCas9 and HypaCas9, Sniper-Cas9 showed the highest specificity at 10 of 12 sites in HEK-293T and HeLa cells. A similar selection method in E. coli yielded the HiFi Cas9 (R691A) variant. When delivered as a preformed RNP, it was the most active of the improved variants in human primary hematopoietic cells [74]. The increased fidelity of SpCas9-HF1, HypaCas9 and HiFi Cas9 in cells may be due to a dramatically reduced ability to introduce double-strand breaks, so that these endonucleases partly act as nickases [75].

In contrast to the protospacer, the PAM is not recognized through complementarity. Its recognition relies exclusively on interactions between the two G bases of the NGG sequence and amino acid residues of the protein [59]. Attempts to modify PAM recognition by Cas9 have aimed mostly at extending the range of recognized PAM sequences rather than at improving the accuracy of PAM recognition. Unexpectedly, some evolved variants carrying multiple amino acid substitutions (xCas9) not only recognized the PAMs NG, NNG, GAA, GAT and CAA, but also showed a 10- to 100-fold increase in specificity in HEK-293T and U2OS cells. This was the case for xCas9-3.6 (E108G S217A A262T S409I E480K E543D M694I E1219V) and xCas9-3.7 (A262T R324L S409I E480K E543D M694I E1219V) [20].
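
Variant descriptions such as those above use the usual substitution notation: wild-type residue, position, then the new residue. A short, hypothetical helper can parse these strings and show which substitutions xCas9-3.6 and xCas9-3.7 share.

```python
def parse_substitutions(spec: str) -> dict:
    """Parse e.g. 'E108G S217A' into {108: ('E', 'G'), 217: ('S', 'A')}."""
    parsed = {}
    for token in spec.split():
        wild_type, position, mutant = token[0], int(token[1:-1]), token[-1]
        parsed[position] = (wild_type, mutant)
    return parsed


def compare_variants(spec_a: str, spec_b: str):
    """Return substitutions shared by both variants and unique to each, sorted by position."""
    a, b = parse_substitutions(spec_a), parse_substitutions(spec_b)

    def fmt(positions, table):
        return [f"{table[p][0]}{p}{table[p][1]}" for p in sorted(positions)]

    # A position counts as shared only if both variants carry the same change.
    shared = {p for p in a.keys() & b.keys() if a[p] == b[p]}
    return fmt(shared, a), fmt(a.keys() - shared, a), fmt(b.keys() - shared, b)


XCAS9_3_6 = "E108G S217A A262T S409I E480K E543D M694I E1219V"
XCAS9_3_7 = "A262T R324L S409I E480K E543D M694I E1219V"
shared, only_3_6, only_3_7 = compare_variants(XCAS9_3_6, XCAS9_3_7)
print(shared)    # ['A262T', 'S409I', 'E480K', 'E543D', 'M694I', 'E1219V']
print(only_3_6)  # ['E108G', 'S217A']
print(only_3_7)  # ['R324L']
```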

FIDELITY OF Cas NUCLEASES FROM OTHER BACTERIA

Apart from SpCas9, several RNA-guided CRISPR-associated endonucleases from other microorganisms, belonging to various types of CRISPR systems, have been studied in sufficient detail. Some of these enzymes combine high specificity with an extended PAM range. Among type II CRISPR systems, the enzymes from Staphylococcus aureus (SaCas9), Francisella novicida (FnCas9) and Neisseria meningitidis (NmCas9) attract particular interest for genome editing applications. SaCas9 is appealing because of its smaller size (25% shorter than SpCas9) and its higher catalytic turnover number compared with SpCas9 [76, 77]. The specificity of SaCas9 in human cells depends on gRNA length. It is somewhat lower than that of SpCas9 at the gRNA lengths optimal for activity (21–23 nt) and increases substantially with shorter gRNAs (20 nt) because of lower mismatch tolerance [76, 78‒80]. Higher-fidelity SaCas9 variants similar to SpCas9-HF1 were obtained by rational design [81]. FnCas9 has ~70% of the activity of SpCas9 but several times higher specificity [82, 83]. However, FnCas9 is highly sensitive to chromatin structure and is inactive at many loci in the human genome for reasons that are not well understood [82]. NmCas9 likewise shows somewhat lower activity and considerably higher fidelity than SpCas9 [84, 85].

Cas12a (Cpf1) proteins belong to type V CRISPR systems and also attract great interest (Fig. 1b). In contrast to Cas9, Cas12a contains only the RuvC-like domain and requires only a rather short crRNA (~42 nt) for endonuclease activity, while the protospacer is 23‒24 nt long [86]. The PAM is located 5′ of the protospacer and has the sequence 5′-TTN-3′ or 5′-TTTN-3′. Double-stranded DNA is hydrolyzed at the phosphodiester bonds between nucleotides 23 and 24 of the target strand and between nucleotides 18 and 19 of the nontarget strand (counted from the PAM), producing sticky ends. Many Cas12a enzymes are inactive when expressed in mammalian cells. The orthologs from Acidaminococcus sp. (AsCas12a) and Lachnospiraceae (LbCas12a), however, display nuclease activity [86]. The efficiencies of AsCas12a and LbCas12a were comparable with that of Cas9 in U2OS and HEK293 cells. No off-target mutations were detected with 17 of 20 crRNAs for AsCas12a or with 12 of 20 crRNAs for LbCas12a [87, 88]. The PAM specificity of AsCas12a is higher than that of LbCas12a. A systematic analysis of gRNA mismatches showed that the system partly tolerates single mismatches in the target DNA sequence, whereas two mismatches almost fully abolish enzymatic activity [87].
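
The staggered Cas12a cut can be contrasted with the blunt Cas9 cut using simple coordinate arithmetic. This sketch rests on three assumptions. It uses nontarget-strand coordinates, places the PAM immediately 5′ of the protospacer, and maps the target-strand break (between nucleotides 23 and 24) onto the opposite nontarget-strand position. The function names are illustrative.

```python
def cas12a_cut(protospacer_start: int, nontarget_cut_after: int = 18, target_cut_after: int = 23):
    """Staggered Cas12a break positions in nontarget-strand coordinates.

    protospacer_start: 0-based index of the first protospacer nucleotide,
    immediately 3' of the PAM. Each returned break position is the index of
    the first nucleotide after the cut on that strand.
    """
    nontarget_break = protospacer_start + nontarget_cut_after  # after protospacer nt 18
    target_break = protospacer_start + target_cut_after        # opposite protospacer nt 23
    overhang = target_break - nontarget_break                  # length of the 5' overhang
    return nontarget_break, target_break, overhang


def cas9_cut(pam_start: int):
    """Blunt Cas9 break: both strands are cut 3 nt 5' of the PAM, so there is no overhang."""
    cut = pam_start - 3
    return cut, cut, 0


# Hypothetical site: a 4-nt TTTN PAM at indices 0-3, protospacer starting at index 4.
print(cas12a_cut(protospacer_start=4))  # (22, 27, 5)
# Hypothetical Cas9 site with its NGG PAM starting at index 23.
print(cas9_cut(pam_start=23))           # (20, 20, 0)
```

The default offsets give a 5-nt 5′ overhang, the "sticky ends" mentioned above, whereas Cas9 leaves blunt ends.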

EFFECT OF gRNA STRUCTURE ON EDITING PRECISION

The design of the RNA component of the CRISPR/Cas9 system also makes an appreciable contribution to the precision of genome editing. For example, editing specificity is substantially higher when two Gs are added to the 5′ end of the sgRNA (GG-X20). In K562 human myeloid leukemia cells, off-target activity was detected at all seven test sites with a standard sgRNA but at only one site with the GG-X20 sgRNA [13]. A similar effect is observed when structured regions, such as G-quadruplexes, are added to the 3′ end of the sgRNA [89]. Truncating the targeting region of the sgRNA by 1–3 nt at its 5′ (PAM-distal) end enhances specificity by increasing mismatch sensitivity, but it slightly decreases Cas9 activity [12, 90]. The targeting region must be at least 17 nt long for the Cas9–sgRNA complex to function efficiently in human cells. At this length, system activity (measured as the percentage of edited cells) and the HDR : NHEJ ratio did not differ from the values observed with full-length (20-nt) sgRNA [90]. Overall, truncated sgRNAs can increase system specificity by more than three orders of magnitude in human [15, 90, 91] and yeast [92] cells. Interestingly, truncated (17–18 nt) and full-length (20 nt) sgRNAs showed similar target specificity profiles in an in vitro system with several oligonucleotide libraries of sgRNAs and target DNAs [92]. Because the specificity gain seen in cells was not reproduced on DNA in vitro, chromatin structure is likely to play a role in editing precision as well. When a combination of tracrRNA and synthetic crRNA was used in place of sgRNA, a low level of off-target changes was observed in K562 and HeLa cells [13].
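
Truncation removes PAM-distal nucleotides, so every truncated guide keeps the seed intact. The following sketch builds the 17- to 19-nt truncated variants and the GG-X20 extension of a 20-nt spacer. The function and key names are hypothetical, and sequences are written in DNA letters.

```python
def spacer_variants(spacer20: str, min_len: int = 17) -> dict:
    """Illustrative design variants of a 20-nt spacer (DNA letters, 5'->3')."""
    if len(spacer20) != 20:
        raise ValueError("expected a 20-nt spacer")
    variants = {}
    for length in range(min_len, 20):
        # Trim from the 5' (PAM-distal) end; the 3' end next to the PAM is kept.
        variants[f"tru{length}"] = spacer20[20 - length:]
    # Two extra Gs at the 5' end of the spacer.
    variants["GG-X20"] = "GG" + spacer20
    return variants


SPACER = "GATTACAGATTACAGATTAC"
for name, seq in spacer_variants(SPACER).items():
    print(name, seq, len(seq))
```

The printed sequences all end with the same PAM-proximal nucleotides. This is why truncation tightens mismatch sensitivity without touching the seed.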

Chemical modification of RNA (Fig. 4) can also affect the activity and specificity of the CRISPR/Cas9 system. Early studies in the field were aimed at increasing the gRNA resistance to intracellular nucleases. For example, sgRNA with the three 5'-terminal and three 3'-terminal nucleotides modified with 2'-O-methyl-3'-thiophosphate or 2'-O-methyl-3'-thiophosphonoacetate ensured far more efficient editing in K562 cells as compared with unmodified sgRNA, although off-target activity of the CRISPR system was somewhat higher [93]. The specificity of the system increased severalfold when Cas9 was introduced as a recombinant enzyme in complex with sgRNA, rather than being synthesized from an expression plasmid in the cell. A similar strategy was used with the crRNA/tracrRNA system, in which crRNA modification with 2'-fluoro, 2'-O-methyl, and 2'–4'-bridged nucleotides at certain critical positions led to a several-fold increase in both activity and specificity of editing in HEK-293T cells [94]. Bridged nucleotides introduced in the gRNA structure decrease off-target activity by accelerating the dynamic transitions between open and closed conformations of mismatch-containing heteroduplexes [95].

Fig. 4.

Examples of modifications introduced in gRNA structure: (a) 2'-O-methyl ribonucleotide, (b) 2'-O-methyl ribonucleotide with a 3'-thiophosphate bond, (c) 2'-O-methyl ribonucleotide with a 3'-thiophosphonoacetate bond, (d) 2'-fluoro nucleotide, (e) 2'–4'-bridged nucleotide, and (f) 2'-deoxyribonucleotide.

Hybrid guide nucleic acids, which combine ribonucleotides and deoxyribonucleotides, have also received substantial attention for their ability to increase target recognition specificity. This is because the interaction energy of dNMP:dNMP pairs is lower than that of rNMP:dNMP pairs [96‒98]. Modifications of several types introduced simultaneously at different positions usually increase editing efficiency owing to synergistic effects [98]. A system for screening chemically modified active crRNAs and tracrRNAs was combined with rational selection of modification sites that preserves the protein-contacting 2′-OH groups in the Cas9–RNA–DNA complex. Together, these made it possible to construct highly modified enhanced sgRNAs (e-sgRNAs), in which more than half of the ribonucleotides are replaced with their 2′-fluoro, 2′-O-methyl or thiophosphate derivatives [99, 100]. The e-sgRNAs were successfully used to edit the Pcsk9 gene in mice [99].

EFFECT OF DELIVERY SYSTEMS ON EDITING PRECISION

To knock out a gene in the eukaryotic genome by the CRISPR/Cas method, two system components, Cas9 and sgRNA (or Cas9, crRNA, and tracrRNA), must be delivered into the cell. A recombination donor is additionally required for the precise replacement via HDR. The components can be delivered as coding DNA, RNA (Cas9 mRNA and sgRNA), or a recombinant protein with RNA synthesized chemically or enzymatically.

Coding DNA constructs were the focus of early studies on genome editing in human cells because the homogeneity of synthetic RNA was insufficient. Conventional transfection methods are still adequate for many research tasks. For potential therapeutic applications and for screening gRNA libraries, special vectors based on lentiviruses, adenoviruses and adeno-associated viruses have been developed [101, 102]. Regardless of the transfection method, Cas9- and RNA-coding plasmids delivered into the cell are often completely or partly integrated into the host genome at target and off-target sites [103, 104]. In addition, the system components continue to be produced from a plasmid template for several days, making off-target changes in the genome more likely [103]. Intracellular delivery of vectors encoding CRISPR/Cas9 components is therefore currently considered unacceptable for therapeutic applications. Organisms obtained in this way are legally regarded as genetically modified in most countries.

Improvements to RNA synthesis methods made it possible to directly deliver the Cas9 mRNA and necessary sgRNA into the cell [103, 105]. The editing precision achieved with this delivery method is at least comparable to that achieved with delivery of coding constructs [106, 107]. The editing event rate is rather low upon combined delivery of the Cas9 mRNA and unmodified sgRNA, but chemical modification of sgRNA ensures a severalfold increase in both efficiency and specificity [93].

Delivery of an RNP assembled from recombinant Cas9 and the required sgRNA has become a common method because it provides both high efficiency and high specificity. For example, RNP delivery into human cells by electroporation or lipofection increases editing specificity by an order of magnitude compared with transfection with expression plasmids [103, 105, 108]. The higher specificity is possibly due to the fact that the RNP persists in the cell for only a few hours, whereas expression from a plasmid vector may last several days. Less common approaches include delivering RNPs as conjugates with structurally diverse cell-penetrating peptides, which are internalized via a variety of endocytosis-dependent and endocytosis-independent mechanisms [109]. In one study, such peptides were conjugated to Cas9 through a thioester bond and the resulting RNPs were used to treat cells. Editing efficiency ranged from 3 to 16% in different cell lines, and specificity was 2.2–4.1 times higher than with a plasmid control [110]. Another promising delivery method uses gold nanoparticles conjugated with DNA and coated with the cationic polymer poly-{N-[N-(2-aminoethyl)-2-aminoethyl]aspartamide}, which facilitates internalization [111]. The HDR efficiency was 3–6% in a panel of primary and transformed human and mouse cells. In vivo studies used intramuscular administration in a mouse model of Duchenne muscular dystrophy. About 5.4% editing of the target gene was achieved in muscle cells, while off-target editing at 21 potential sites occurred at a rate of 0.005–0.2% [111].
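
A toy model can show why a short-lived RNP might favor specificity. The sketch assumes first-order cleavage at constant nuclease activity over a fixed exposure time, with hypothetical rate constants for a perfectly matched site and a mismatched site. It is not fitted to any data from the studies cited.

```python
import math


def fraction_cleaved(rate_per_hour: float, exposure_hours: float) -> float:
    """First-order model: fraction of copies of a site cut after a given exposure."""
    return 1.0 - math.exp(-rate_per_hour * exposure_hours)


def on_off_ratio(k_on: float, k_off: float, exposure_hours: float) -> float:
    """Ratio of on-target to off-target cleavage after the same exposure time."""
    return fraction_cleaved(k_on, exposure_hours) / fraction_cleaved(k_off, exposure_hours)


# Hypothetical rate constants (per hour), chosen only to show the trend.
K_ON, K_OFF = 0.5, 0.001

for label, hours in (("short exposure, RNP-like", 6), ("long exposure, plasmid-like", 72)):
    print(
        f"{label}: on-target {fraction_cleaved(K_ON, hours):.3f}, "
        f"off-target {fraction_cleaved(K_OFF, hours):.4f}, "
        f"ratio {on_off_ratio(K_ON, K_OFF, hours):.0f}"
    )
```

In this model, on-target cleavage saturates within hours, whereas off-target cleavage keeps accumulating roughly in proportion to exposure time. The on/off ratio therefore falls as exposure lengthens. Real intracellular kinetics (repair, re-cutting, nuclease turnover) are considerably more complex.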

CONDITIONS FOR INTRACELLULAR EXPRESSION OF CRISPR/Cas9 SYSTEM COMPONENTS

A longer lifetime and a higher concentration of CRISPR/Cas9 system components in the cell promote off-target editing. The use of inducible promoters to express Cas9 (inducible Cas9, iCas9) was among the earliest ideas for achieving an optimal balance between the efficiency and specificity of the system [112‒114]. To rigorously assess system specificity in a panel of human cells (293T, HeLa and SK-BR-3), editing with iCas9 was performed using sgRNAs targeting the KDM5C, EMX1 and VEGFA genes. Both perfectly complementary and mismatch-containing sgRNAs were used. Compared with unregulated expression, induced Cas9 expression had three effects [114]:
- it delayed the editing of imperfectly complementary targets by several tens of hours;
- it left the editing kinetics of perfectly matched targets unchanged;
- it decreased the rate of off-target alterations by at least an order of magnitude.

An approach useful for laboratory research relies on stable integration of the Cas9 gene into a cell chromosome under the control of an inducible promoter. Only the sgRNA and, when necessary, a recombination donor then need to be delivered into the cells to modify them [112, 115]. The method was used to modify human pluripotent stem cells, and the modification rate at off-target sites was below the detection limit [112].

Intracellular Cas9 activity can also be controlled at the posttranslational level, for example, by limiting the lifetime of the active enzyme. In the split Cas9 system, the N- and C-terminal parts of the enzyme are synthesized separately as fusions with the FKBP protein and the FRB domain of mTOR, which dimerize in the presence of rapamycin [116]. However, spontaneous association of the Cas9 fragments made it impossible to completely abolish the activity of the complex. The off-target modification level was nevertheless reduced to 4–20% of that observed with Cas9 [116]. A similar system was constructed by fusing the N- and C-terminal fragments of Cas9 with Magnet domains, which originate from the Neurospora VVD photoreceptor and are capable of photodimerization. This reduced the background editing rate to an undetectable level [117]. As another means of posttranslational control, a 4-hydroxytamoxifen-dependent intein was inserted into the Cas9 sequence [118]. An intein is a domain capable of catalyzing its own excision from the host protein under certain conditions. In the presence of the inducer, editing efficiency at the EMX1, VEGFA and CLTA loci in cultured HEK293 cells was comparable with that of Cas9, while specificity was 25 times higher [118].

To limit Cas9 activity, the enzyme can be expressed together with two sgRNAs: one targets the locus of interest and the other targets the Cas9 gene itself. This substantially shortened the period of intense Cas9 expression and increased editing specificity by a factor of 4.0–7.5 [119, 120]. Finally, when modifications are introduced via HDR, a phenotypically neutral substitution can be included in the donor of genetic material. The substitution is designed to alter the seed sequence that hybridizes with the sgRNA or to eliminate the PAM, so that the allele is no longer recognized after successful recombination. This approach increased the precision of editing the APP and PSEN1 loci in pluripotent stem cells and the HEK293 cell line by a factor of 2–10 [121, 122].
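
The donor design rule can be checked in a few lines of code before a donor is ordered. This is a simplified, hypothetical criterion. A site counts as re-cleavable only if an NGG PAM is present and an assumed 10-nt seed matches the spacer exactly. Only the forward strand is scanned.

```python
def still_targetable(spacer: str, donor: str, seed_len: int = 10) -> bool:
    """Return True if the donor still has a window with an NGG PAM and an exact seed match.

    Forward strand only; a simplified criterion for illustration.
    """
    spacer, donor = spacer.upper(), donor.upper()
    n = len(spacer)
    for start in range(len(donor) - n - 2):
        if donor[start + n + 1:start + n + 3] != "GG":
            continue  # PAM absent (or destroyed) after this window
        window = donor[start:start + n]
        if window[n - seed_len:] == spacer[n - seed_len:]:
            return True  # intact PAM and intact seed: likely to be cut again
    return False


SPACER = "GATTACAGATTACAGATTAC"
wild_type = "AAA" + SPACER + "TGG" + "AAA"
pam_blocked = "AAA" + SPACER + "TGC" + "AAA"              # hypothetical G->C change in the PAM
seed_changed = "AAA" + SPACER[:-1] + "G" + "TGG" + "AAA"  # hypothetical PAM-proximal seed change
print(
    still_targetable(SPACER, wild_type),
    still_targetable(SPACER, pam_blocked),
    still_targetable(SPACER, seed_changed),
)  # True False False
```

Whether a given substitution is phenotypically neutral (for example, a synonymous codon change) has to be checked separately. This sketch only tests whether the guide would still recognize the edited allele.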

Because double-strand break repair mechanisms depend on the cell cycle phase, synchronizing the cell cycle with editing has been considered. RNP is the preferred delivery format in this case because it starts acting as soon as it enters the cell. After transfection with RNP, HEK-293 cells, human primary fibroblasts and human embryonic stem cells were synchronized at the G2/M boundary with nocodazole. This increased the frequency of HDR events approximately threefold, while the off-target event rate did not exceed the background as determined by whole-genome sequencing [123]. When heterozygous mutations were corrected in human zygotes, introducing the RNP and donor DNA in S phase favored the use of the exogenous DNA, rather than the chromosomal copy, as the recombination template [124].

In summary, several approaches are now being actively pursued to increase the precision of complementarity-targeted genome editing, which in most cases relies on the CRISPR/Cas9 system. The open question is whether any one of these methods, or some combination of them, will provide the precision required for therapeutic genome editing. The answer depends not only on improvements to the editing systems themselves but also on the accuracy with which off-target events are detected. Modern high-throughput sequencing methods detect minor changes in the genome better than large rearrangements, and the likelihood of large rearrangements as off-target events is still hotly debated [125‒127]. The situation may change once nanopore sequencing, which yields far longer reads, is used on a larger scale. In any case, acceptable in vivo safety can only be achieved when the frequency of off-target alterations (with any gRNA) is comparable with the background rate of somatic mutagenesis. Replication accuracy in cultured noncancerous human cells is commonly thought to be ~10⁻¹⁰ mutations per base pair per cell division [1]. This value agrees well with recent experimental estimates obtained by single-cell sequencing of cell clones from various tissues [128‒130]. Such accuracy cannot yet be achieved even with super-fidelity Cas9 variants. A somewhat lower accuracy of the editing system might be acceptable in ex vivo therapeutic manipulations (e.g., editing followed by autotransplantation). In this case, the lower accuracy is compensated for by whole-genome sequencing to identify the clones that carry only the target mutations. The lower the accuracy of the system, the more clones have to be sequenced. Overall, improving accuracy will remain one of the main directions of research in genome editing in the near future.
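
Both quantitative points in this paragraph can be illustrated with simple arithmetic. The sketch assumes a diploid human genome of roughly 6.2 × 10⁹ bp, an approximation that is not a figure from this review. It also treats each sequenced clone as independently "clean" (carrying only the intended edit) with probability `p_clean`, a hypothetical parameter.

```python
import math

MUTATION_RATE = 1e-10      # mutations per bp per cell division (value quoted above)
DIPLOID_GENOME_BP = 6.2e9  # approximate diploid human genome size (assumption)


def expected_mutations_per_division(rate: float = MUTATION_RATE,
                                    genome_bp: float = DIPLOID_GENOME_BP) -> float:
    """Expected number of new point mutations per genome per cell division."""
    return rate * genome_bp


def clones_to_screen(p_clean: float, confidence: float = 0.95) -> int:
    """Clones to sequence so that, with the given probability, at least one is clean.

    Solves 1 - (1 - p_clean) ** n >= confidence for the smallest integer n.
    """
    if not 0 < p_clean < 1:
        raise ValueError("p_clean must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_clean))


print(expected_mutations_per_division())              # about 0.62
print(clones_to_screen(0.5), clones_to_screen(0.1))   # 5 29
```

The second function formalizes the statement that lower editing accuracy means more clones to sequence. As the chance that a clone is clean drops from 50% to 10%, the number of clones needed for 95% confidence rises from 5 to 29.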