IMPROVED PRIME EDITING METHODS AND COMPOSITIONS RELATED APPLICATIONS [0001] This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application, U.S.S.N.63/333,103, filed April 20, 2022, the contents of which are incorporated herein by reference. GOVERNMENT SUPPORT [0002] This invention was made with government support under Grant Nos. U01AI142756, RM1HG009490, R01EB031172, and R35GM118062, awarded by the National Institutes of Health. The government has certain rights in the invention. INCORPORATION BY REFERENCE [0003] This application refers to and incorporates by reference the entire contents of each of the following patent applications directed to prime editing previously filed by one or more of the present inventors: U.S. Provisional Application U.S.S.N.62/820,813, filed March 19, 2019; U.S. Provisional Application U.S.S.N.62/858,958, filed June 7, 2019; U.S. Provisional Application U.S.S.N.62/889,996, filed August 21, 2019; U.S. Provisional Application U.S.S.N.62/922,654, filed August 21, 2019; U.S. Provisional Application U.S.S.N. 62/913,553, filed October 10, 2019; U.S. Provisional Application U.S.S.N.62/973,558, filed October 10, 2019; U.S. Provisional Application U.S.S.N.62/931,195, filed November 5, 2019; U.S. Provisional Application U.S.S.N.62/944,231, filed December 5, 2019; U.S. Provisional Application U.S.S.N.62/974,537, filed December 5, 2019; U.S. Provisional Application U.S.S.N.62/991,069, filed March 17, 2020; U.S. Provisional Application U.S.S.N.63/100,548, filed March 17, 2020; International PCT Application No. PCT/US2020/023721, filed March 19, 2020; International PCT Application No. PCT/US2020/023553, filed March 19, 2020; International PCT Application No. PCT/US2020/023583, filed March 19, 2020; International PCT Application No. PCT/US2020/023730, filed March 19, 2020; International PCT Application No. PCT/US2020/023713, filed March 19, 2020; International PCT Application No. 
PCT/US2020/023712, filed March 19, 2020; International PCT Application No. PCT/US2020/023727, filed March 19, 2020; International PCT Application No. PCT/US2020/023724, filed March 19, 2020; International PCT Application No.
PCT/US2020/023725, filed March 19, 2020; International PCT Application No. PCT/US2020/023728, filed March 19, 2020; International PCT Application No. PCT/US2020/023732, filed March 19, 2020; and International PCT Application No. PCT/US2020/023723, filed March 19, 2020. [0004] This application also refers to and incorporates by reference the entire contents of each of the following patent applications directed to prime editing previously filed by one or more of the present inventors: International PCT Application No. PCT/US2022/012054, filed January 11, 2022, U.S. Provisional Application U.S.S.N.63/255,897, filed October 14, 2021, U.S. Provisional Application U.S.S.N.63/231,230, filed August 9, 2021, U.S. Provisional Application U.S.S.N.63/194,913, filed May 28, 2021, U.S. Provisional Application U.S.S.N. 63/194,865, filed May 28, 2021, U.S. Provisional Application U.S.S.N.63/176,202, filed April 16, 2021, U.S. Provisional Application U.S.S.N.63/176,180, filed April 16, 2021, and U.S. Provisional Application U.S.S.N.63/136,194, filed January 11, 2021. [0005] This application additionally refers to and incorporates by reference the entire contents of each of the following patent applications directed to prime editing previously filed by one or more of the present inventors: International PCT Application No. PCT/US2021/052097, filed September 24, 2021, U.S. Provisional Application U.S.S.N. 63/231,231, filed August 9, 2021, U.S. Provisional Application U.S.S.N.63/091,272, filed October 13, 2020, U.S. Provisional Application U.S.S.N.63/083,067, filed September 24, 2020, and U.S. Provisional Application U.S.S.N.63/182,633, filed April 30, 2021. [0006] This application additionally refers to and incorporates by reference the entire contents of each of the following patent applications directed to prime editing previously filed by one or more of the present inventors: International PCT Application No. PCT/US2021/031439, filed May 7, 2021, U.S. 
Provisional Application No.63/022,397, filed May 8, 2020, and U.S. Provisional Application No.63/116,785, filed November 20, 2020. BACKGROUND OF THE INVENTION [0007] The recent development of prime editing enables the insertion, deletion, and/or replacement of genomic DNA sequences without requiring error-prone double-strand DNA breaks. See Anzalone et al., “Search-and-replace genome editing without double-strand breaks or donor DNA,” Nature, 2019, Vol.576, pp.149-157, the contents of which are incorporated herein by reference. Prime editing uses a nucleic acid programmable DNA binding protein (e.g., an engineered Cas9 nickase) and a reverse transcriptase (e.g., a PE2 fusion protein) in combination with an engineered prime editing guide RNA (pegRNA) that
not only directs Cas9 to a target genomic site, but also encodes the information for installing the desired edit. Without wishing to be bound by any particular theory, prime editing proceeds through a presumed multi-step editing process: 1) the Cas domain binds and nicks the target genomic DNA site, wherein the nicking site is specified by the pegRNA’s spacer sequence, and the specific PAM sequence recognized by the Cas nickase; 2) the reverse transcriptase domain uses the nicked genomic DNA as a primer to initiate the synthesis of an edited DNA strand using an engineered extension on the pegRNA as a template for reverse transcription–this generates a single-stranded 3′ flap containing the edited DNA sequence; 3) cellular DNA repair resolves the 3′ flap intermediate by the displacement of a 5′ flap species that occurs via invasion by the edited 3′ flap, excision of the 5′ flap containing the original DNA sequence, and ligation of the new 3′ flap to incorporate the edited DNA strand, forming a heteroduplex of one edited and one unedited strand; and 4) cellular DNA repair replaces the unedited strand within the heteroduplex using the edited strand as a template for repair, completing the editing process. [0008] Since 2019, prime editing has been applied to introduce genetic changes in a wide variety of cells and/or organisms. Given its rapid adoption, prime editing represents a powerful tool for genomic editing. Modifications to prime editing systems which result in increasing the specificity and/or efficiency of the prime editing process would significantly help advance the art. SUMMARY OF THE INVENTION [0009] The present application discloses various improvements in prime editing (PE) relating to the optimization of various aspects and parameters of PE, including optimizing the conducting of PE and twin prime editing (“twinPE”) experiments, as well as optimizing the design of pegRNAs and second-strand nicking guide RNAs. 
Guidelines and methods for how to select the proper PE system (e.g., PE1, PE2, PE3, PE3b, PE4, PE4b, PE5b, or twinPE) for a given application are also provided, as well as methods for conducting prime editing in mammalian cells. [0010] Prime editing (PE) is a precision gene editing technology that enables the programmable installation of nucleotide substitutions, insertions, and/or deletions in target DNA, for example, target genomic DNA in cells and animals, without requiring double-stranded DNA breaks (DSBs). The mechanism of prime editing allows it to precisely install edits without creating DSBs and minimizes indels and other undesired outcomes. The capabilities of prime editing have also expanded since its original publication (see Anzalone
et al., “Search-and-replace genome editing without double-strand breaks or donor DNA,” Nature, 2019, Vol.576, pp.149-157). Enhanced prime editing systems, PE4 and PE5, manipulate DNA repair pathways to increase prime editing efficiency and reduce indels. Other advances that improve prime editing efficiency include engineered pegRNAs (epegRNAs), which include a structured RNA motif to stabilize and protect pegRNA 3′ ends, and the PEmax architecture, which improves editor expression and nuclear localization. New applications such as twin prime editing (twinPE) can precisely insert or delete hundreds of base pairs of DNA and can be used in tandem with recombinases to achieve gene-sized (>5 kb) insertions and inversions. Achieving optimal prime editing requires careful experimental design, and the large number of parameters that influence prime editing outcomes can be daunting. This application describes a series of optimized practices for conducting prime editing and twinPE experiments and describes the design and optimization of pegRNAs and second-strand nicking guide RNAs. In addition, the application provides additional disclosure on methods for performing prime editing in mammalian cells. [0011] It should be appreciated that the foregoing concepts, and additional concepts discussed below, may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further, other advantages and novel features of the present disclosure will become apparent from the following detailed description of various non-limiting embodiments when considered in conjunction with the accompanying figures. BRIEF DESCRIPTION OF THE DRAWINGS [0012] The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein. 
[0013] FIG.1 is an exemplary schematic showing the mechanism of prime editing. The steps shown are the putative mechanism for prime editing using various editing systems and an unmodified pegRNA. Cas9 nickase is recruited to a target DNA site by a pegRNA and nicks the target site to create a 3′ end of DNA. The primer binding site (PBS) of the pegRNA can then anneal to the genomic DNA flap. This duplex is recognized by a reverse transcriptase, which reverse transcribes nucleotides extending from the target site 3′ end, synthesizing a single stranded sequence encoded by the reverse transcription template (RTT) of the pegRNA. The newly synthesized single stranded DNA produced by reverse transcription contains the desired prime edit, and optionally downstream homology to the rest of the target
DNA site. The newly synthesized single stranded DNA (the 3′ flap shown in FIG.1) equilibrates with the corresponding endogenous target DNA sequence (the 5′ flap shown in FIG.1), which does not contain the desired edit. Degradation of the 5′ flap, ligation of the edited 3′ flap into the genome, and repair of the complementary DNA strand (i.e., the strand having complementarity to the pegRNA spacer, shown as lower strand in FIG.1) by DNA repair or replication results in stable installation of the edit. Prior to repair of the complementary strand, cellular mismatch repair (MMR) can revert the edit back to the unedited sequence. In the PE3 and PE5 systems, a second nick is installed in the complementary strand of DNA, for example, a second nick ≥~50 bp away (e.g., a second nick on the lower strand in FIG.1 corresponding to a position ~50 nucleotides downstream of the first nick on the upper strand) from the pegRNA-induced nick. This additional nick can bias MMR in favor of editing. In the PE4 and PE5 systems, an engineered dominant-negative MLH1 mutant (MLH1dn) can inhibit cellular mismatch repair and thus favors desired prime editing outcomes. [0014] FIG.2: Architecture of an exemplary engineered prime editing guide RNA (epegRNA). From 5′ to 3′, the exemplary epegRNAs consist of a spacer, scaffold, RTT (reverse transcription template), PBS (primer binding site), and 3′ structural motif, such as tevopreQ1. The prime editor protein is shown in the background, with Cas9 and the reverse transcriptase (RT). The target genomic DNA is shown, with the nicked and edited strand shown in dark grey and the complementary strand in light grey. [0015] FIG.3: Exemplary experimental design of epegRNAs. Protospacers should first be identified based on available PAM sequences. Of these protospacer candidates, the ones closest to the desired edit should be tried first. 
Second, for a minimal initial screen, PBS lengths of 10, 13, and 15 nt and RTT lengths that extend at least 7 nt beyond the desired edit are designed. Note: the epegRNA modification is not shown here for simplicity, but it may be included in all pegRNA designs by default. Third, nicking sgRNAs are designed to target the opposite strand, typically downstream of the initial nick. Finally, PAM-disrupting or silent mutations are identified and added to the RTT of the epegRNAs. [0016] FIG.4: Exemplary experimental design for twinPE. First, high-efficiency protospacers as predicted by CRISPick should be identified. Protospacer pairs should then be selected (minimum inter-nick distance of 30 nt). Second, PBS lengths of 10, 13, and 15 nt should be tried for each protospacer. For RTT design, the desired insertion should be encoded on one epegRNA, and its reverse complement should be encoded on the other. Third, for
twinPE, epegRNA screening is not a matrix of PBS lengths x RTT lengths, but is instead a matrix of top and bottom strand epegRNAs, each of which will have three possible PBS lengths. Note: the epegRNA modification is not shown here for sake of simplicity, but may be included in all pegRNA designs by default. An example is shown of a twinPE product, in which the sequence between the two nicks is replaced with the sequence encoded in the RTTs of the epegRNAs. [0017] FIG.5: Exemplary design of a PE3b/PE5b nicking sgRNA. To use the PE3b or PE5b systems, a PAM needs to be present on the non-edited strand close to the edit. A nicking sgRNA can then be designed such that it can only bind and nick the non-edited strand after reverse transcription and flap equilibration have occurred. Such a PE3b/PE5b nicking sgRNA has a spacer that is complementary to the edited DNA sequence, but contains mismatches with the unedited sequence. [0018] FIG.6: Exemplary experimental workflow for PE optimization. To optimize prime editing at a new locus, an initial set of epegRNAs is first designed and cloned. These epegRNAs are then screened via transfection in workhorse cell lines, such as HEK293T cells or N2A cells. PE2 or PE4 can be used for this initial screen to avoid screening nicking sgRNAs in tandem. Based on sequencing results from this initial screen, additional optimization can be performed. Screening additional PBS and RTT lengths is recommended if low editing efficiency is observed. Once optimal PBS and RTT lengths are found, additional improvements, such as nicking sgRNAs and MMR-evading mutations, can be tested using the optimized epegRNA. [0019] FIGs.7A-7H: Example results of prime editing efficiency screens. FIGs.7A-7B: Heat map showing a PE4 system screen of PBS lengths and RTT lengths for CXCR4 P191A installation (FIG.7A) and IL2RB H134D + Y135F installation (FIG.7B). Note that the optimal PBS and RTT lengths are different between the two sites shown in FIGs.7A and 7B. 
Values shown in the heat map cells reflect the mean of n=3 independent replicates. FIGs. 7C-7D: Prime editing efficiency with PE2, PE3, PE4, or PE5 editing systems at the CXCR4 locus (FIG.7C) and the IL2RB locus (FIG.7D). All values from n=3 independent replicates are shown. FIGs.7E-7F: Editing of the CXCR4 locus (FIG.7E) and the IL2RB locus (FIG. 7F) in HeLa cells. All values from n=3 independent replicates are shown. FIG.7G: Example allele table generated by CRISPResso2 based on editing outcome of installation of +1T>A and +5G>C edits at the IL2RB locus. FIG.7H: Example of delivery optimization in patient-derived iPSC cells. Relative to lipid transfection, mRNA electroporation generated a large
improvement in editing efficiency. All values from n=3 independent replicates are shown. Data shown in FIGs.7A-7H were uniquely collected for this protocol, but experimental techniques are identical to previously reported work (refs. 15, 30, 31). [0020] FIG.8A is a schematic depicting an example of temporal second strand nicking exemplified by PE3b (PE3b = PE2 prime editor fusion protein + PEgRNA containing an edit + second strand nicking guide RNA) as described in International PCT Application No. PCT/US2020/023721, which is incorporated herein by reference. Temporal second strand nicking is a variant of second strand nicking designed to facilitate the formation of the desired edited product. The “temporal” term refers to the fact that the second-strand nick to the unedited strand occurs only after the desired edit is installed in the edited strand. This prevents concurrent nicks on both strands, which could lead to double-stranded DNA breaks. [0021] FIG.8B is a schematic depicting an example of editing mechanism by PE3b. The RTT of the pegRNA encodes one or more MMR-evading silent mutations in addition to the A>C nucleotide substitution. (*)n indicates the one or more MMR-evading silent mutations, wherein “*” designates a single silent mutation and “n” designates an integer of at least 1. [0022] FIG.8C: Bar plot of PE conditions tested for introduction of MMR-evading silent edits in combination with PE3b nicking guide RNAs that have spacers corresponding to the silent edits introduced by pegRNA. Three prime editing approaches and a control are shown: (None), which is a PE2 approach with no secondary nicking sgRNA; (3), in which the nick 3 nicking sgRNA was used in a non-PE3b approach; (13), in which the nick 13 nicking sgRNA was targeted to a protospacer with installed MMR-evading silent edits in a PE3b approach; and (No Edit), in which cells were not edited. 
[0023] FIG.9: Schematic showing nucleotide sequence of the ATP1A3 before and after editing according to the experiment described in FIG.8C and the relative positions of the pegRNA protospacer, a PE3 (non-PE3b) nicking guide RNA protospacer, and the “nick 13” nicking guide RNA protospacer. 
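By way of non-limiting illustration only, the selection criterion for a PE3b/PE5b nicking sgRNA described for FIG.5 and FIG.8A (a spacer that matches the edited sequence but contains mismatches with the unedited sequence) may be sketched in Python as follows. The function names, the `min_mismatches` parameter, and the two 20-nt sequences are hypothetical and are not taken from any figure or sequence listing of this disclosure. For simplicity, the sketch assumes the spacer is compared against the protospacer written in the same sense as the spacer, and it does not check for the presence of a PAM.

```python
def count_mismatches(a: str, b: str) -> int:
    """Return the number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def is_pe3b_candidate(spacer: str, edited: str, unedited: str,
                      min_mismatches: int = 1) -> bool:
    """Return True if the spacer matches the edited protospacer exactly and
    differs from the unedited protospacer at >= min_mismatches positions.

    The mismatch threshold is an illustrative assumption, not a value
    stated in this disclosure.
    """
    return (count_mismatches(spacer, edited) == 0
            and count_mismatches(spacer, unedited) >= min_mismatches)


# Hypothetical 20-nt protospacers (illustrative only).
UNEDITED = "ACGTTGCAAGGCTTACGATC"
EDITED = "ACGTTGCAAGCCTTACGTTC"  # differs from UNEDITED at two positions

if __name__ == "__main__":
    # A spacer designed against the edited sequence qualifies.
    print(is_pe3b_candidate(EDITED, EDITED, UNEDITED))
    # A spacer designed against the unedited sequence does not.
    print(is_pe3b_candidate(UNEDITED, EDITED, UNEDITED))
```

In this sketch, a spacer designed against the unedited sequence is rejected because it could direct nicking of the non-edited strand before the edit is installed, which is the concurrent-nick outcome that temporal second strand nicking is intended to avoid.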
[0024] FIG.10: Schematic showing the sequence of the edited product of an experiment to correct the Alternating Hemiplegia of Childhood (AHC) associated D801N pathogenic c- 2401A mutation in the ATP1A3 locus using the PE3b approach with three nicking sgRNAs (nicks 13, 14, and 15). [0025] FIG.11: Schematic showing the unedited target sequence edited in the experiment described in FIG.10.
[0026] FIG.12: Schematic showing the target sequence of the experiment described in FIG. 10 after the initial pegRNA edit has occurred on the bottom strand of DNA. A heteroduplex of mismatched DNA exists with an edited bottom strand and an unedited top strand. [0027] FIG.13: Diagram of the nick 13 + PE RNP recognizing the DNA heteroduplex shown in FIG.12. Arrows indicate edits on the bottom strand. [0028] FIG.14: Bar plot showing Atp1a3 D801N G>A homozygous mutation correction with silent mutation installation using a PEmax prime editor, an epegRNA, and nicking guide RNAs nick 13, nick 14, or nick 15. Percent D801N A>G correction or indels are shown for using no nicking sgRNA (none), nick 13, 14, and 15 (13, 14, 15, respectively), and negative control (no edit). [0029] FIG.15: Schematic for exemplary workflow for optimizing prime editing parameters for a particular edit of interest. DEFINITIONS [0030] Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed.1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise. Cas9 [0031] The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). 
A “Cas9 domain,” as used herein, is a protein fragment comprising an active or fully or partly inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9. A “Cas9 protein” is a full length Cas9 protein. A Cas9 nuclease is also referred to sometimes as a casn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II
CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the spacer. The strand in the target DNA not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821(2012), the contents of which are incorporated herein by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti et al., J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A. 98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816- 821(2012), the entire contents of each of which are incorporated herein by reference). 
Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain. [0032] A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 domain (or a fragment
thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science. 337:816-821(2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell.28;152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science.337:816-821(2012); Qi et al., Cell.28;152(5):1173-83 (2013)). In some embodiments, proteins comprising fragments of a Cas9 protein are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9, or fragments thereof, are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 43). 
In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 43). In some embodiments, the Cas9 variant comprises a fragment of SEQ ID NO: 43 Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 43). In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 43).
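By way of non-limiting illustration only, the three quantities used above to characterize a Cas9 variant (percent identity, number of amino acid changes, and fragment length as a percentage of the reference length) may be computed as in the following Python sketch. The sketch assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it uses one common convention for percent identity (identical columns divided by aligned columns); other conventions exist, and this disclosure does not prescribe one. The function names and the 10-residue toy sequences are hypothetical and are not SEQ ID NO: 43.

```python
def percent_identity(aligned_ref: str, aligned_var: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumes pre-aligned, equal-length inputs with '-' marking gaps.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    columns = [(r, v) for r, v in zip(aligned_ref, aligned_var)
               if not (r == "-" and v == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for r, v in columns if r == v)
    return 100.0 * matches / len(columns)


def count_changes(aligned_ref: str, aligned_var: str) -> int:
    """Number of aligned columns that differ (substitutions, insertions, deletions)."""
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for r, v in zip(aligned_ref, aligned_var) if r != v)


def length_percent(fragment: str, reference: str) -> float:
    """Ungapped fragment length as a percentage of the ungapped reference length."""
    ref_len = len(reference.replace("-", ""))
    if ref_len == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * len(fragment.replace("-", "")) / ref_len


# Toy 10-residue sequences (illustrative only): one substitution at position 10.
REFERENCE = "MDKKYSIGLD"
VARIANT = "MDKKYSIGLA"

if __name__ == "__main__":
    print(percent_identity(REFERENCE, VARIANT))
    print(count_changes(REFERENCE, VARIANT))
    print(length_percent(VARIANT[:5], REFERENCE))
```

For full-length proteins, the alignment itself would be produced by a sequence alignment tool before applying such a calculation; that step is outside the scope of this sketch.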
CRISPR [0033] CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets from prior infections by viruses that have invaded the prokaryote. The snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively compose, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc) and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the RNA. Specifically, the DNA strand in the target that is not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species, the guide RNA. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. 
CRISPR biology, as well as Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti et al., J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A.98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816- 821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes
and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. [0034] In general, a “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer”), or other sequences and transcripts from a CRISPR locus. The tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA. DNA synthesis template (or Reverse Transcriptase Template (RTT)) [0035] As used herein, the terms “DNA synthesis template” and “reverse transcriptase template (RTT)” refer to the region or portion of the extension arm of a PEgRNA that is utilized as a template by a polymerase of a prime editor to encode a 3ʹ single-strand DNA flap that contains the desired edit and which then, through the mechanism of prime editing, replaces the corresponding endogenous strand of DNA at the target site. The extension arm, including the DNA synthesis template, may be comprised of DNA or RNA. In the case of RNA, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of DNA, the polymerase of the prime editor can be a DNA-dependent DNA polymerase. 
In various embodiments, the DNA synthesis template may comprise the “edit template” and the “homology arm”, and all or a portion of an optional 5′ end modifier region and/or an optional 3′ end modifier region. Said another way, in the case of a 3ʹ extension arm, the DNA synthesis template can include the portion of the extension arm that spans from the 5ʹ end of the primer binding site (PBS) to 3ʹ end of the gRNA core that may operate as a template for the synthesis of a single-strand of DNA by a polymerase (e.g., a reverse transcriptase). In the case of a 5ʹ extension arm, the DNA synthesis template can include the portion of the extension arm that spans from the 5ʹ end of the PEgRNA molecule to the 5ʹ end of the PBS. Certain embodiments described here refer to a “reverse transcriptase template,” an “RT template,” or an “RTT,” which is also inclusive of the edit
template and the homology arm, but wherein the RT edit template reflects the use of a prime editor having a polymerase that is a reverse transcriptase, and wherein the DNA synthesis template reflects more broadly the use of a prime editor having any polymerase. In certain embodiments, an RT template may be used to refer to a template polynucleotide for reverse transcription, e.g., in a prime editing system, complex or method using a prime editor having a polymerase that is a reverse transcriptase. In some embodiments, a DNA synthesis template may be used to refer to a template polynucleotide for DNA polymerization, e.g., RNA-dependent DNA polymerization or DNA-dependent polymerization, e.g., in a prime editing system, complex or method using a prime editor having a polymerase that is an RNA-dependent DNA polymerase or a DNA-dependent DNA polymerase. The term “edit template” refers to a portion of the extension arm that encodes the desired edit in the single strand 3ʹ DNA flap that is synthesized by the polymerase, e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). [0036] As used herein, the term “DNA synthesis template” refers to the region or portion of the extension arm of a pegRNA that is utilized as a template strand by a polymerase of a prime editor to encode a 3ʹ single-strand DNA flap that contains the desired edit and which then, through the mechanism of prime editing, replaces the corresponding endogenous strand of DNA at the target site. In various embodiments, the DNA synthesis template is shown in FIG.3A (in the context of a pegRNA comprising a 5ʹ extension arm), FIG.3B (in the context of a pegRNA comprising a 3ʹ extension arm), FIG.3C (in the context of an internal extension arm), FIG.3D (in the context of a 3ʹ extension arm), and FIG.3E (in the context of a 5ʹ extension arm). The extension arm, including the DNA synthesis template, may be comprised of DNA or RNA. 
In the case of RNA, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of DNA, the polymerase of the prime editor can be a DNA-dependent DNA polymerase. In various embodiments (e.g., as depicted in FIGs.3D-3E), the DNA synthesis template comprises an “edit template” and a “homology arm.” In various embodiments (e.g., as depicted in FIGs.3D-3E), the DNA synthesis template (4) may comprise the “edit template” and a “homology arm”, and all or a portion of the optional 5′ end modifier region, e2. That is, depending on the nature of the e2 region (e.g., whether it includes a hairpin, toeloop, or stem/loop secondary structure), the polymerase may encode none, some, or all of the e2 region, as well. Said another way, in the case of a 3ʹ extension arm, the DNA synthesis template (3) can include the portion of the extension arm that spans from the 5ʹ end of the
primer binding site (PBS) to 3ʹ end of the gRNA core that may operate as a template for the synthesis of a single-strand of DNA by a polymerase (e.g., a reverse transcriptase). In the case of a 5ʹ extension arm, the DNA synthesis template can include the portion of the extension arm that spans from the 5ʹ end of the pegRNA molecule to the 3ʹ end of the edit template. In some embodiments, the DNA synthesis template excludes the primer binding site (PBS) of pegRNAs either having a 3ʹ extension arm or a 5ʹ extension arm. Certain embodiments described here (e.g., FIG.71A) refer to an “RT template,” which is inclusive of the edit template and the homology arm, i.e., the sequence of the pegRNA extension arm which is actually used as a template during DNA synthesis. The term “RT template” is equivalent to the term “DNA synthesis template.” In certain embodiments, an RT template may be used to refer to a template polynucleotide for reverse transcription, e.g., in a prime editing system, complex or method using a prime editor having a polymerase that is a reverse transcriptase. In some embodiments, a DNA synthesis template may be used to refer to a template polynucleotide for DNA polymerization, e.g., RNA-dependent DNA polymerization or DNA-dependent polymerization, e.g., in a prime editing system, complex or method using a prime editor having a polymerase that is an RNA-dependent DNA polymerase or a DNA-dependent DNA polymerase. [0037] In some embodiments, the DNA synthesis template is a single-stranded portion of the PEgRNA that is 5’ of the PBS and comprises a region of complementarity to the PAM strand (i.e., the non-target strand or the edit strand), and comprises one or more nucleotide edits compared to the endogenous sequence of the double stranded target DNA. 
In some embodiments, the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is downstream of a nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is immediately downstream (i.e., directly downstream) of a nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, one or more of the non-complementary nucleotides at the intended nucleotide edit positions are immediately downstream of a nick site. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the double-stranded target DNA sequence. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the non-target strand of the double-stranded target DNA sequence. For each PEgRNA
described herein, a nick site is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates, and is characteristic of the particular PAM required for recognition and function of the napDNAbp. For example, for a PEgRNA that comprises a gRNA core that associates with a SpCas9, the nick site is in the phosphodiester bond between bases three (“-3” position relative to the position 1 of the PAM sequence) and four (“-4” position relative to position 1 of the PAM sequence). In some embodiments, the DNA synthesis template and the primer binding site are immediately adjacent to each other. The terms “nucleotide edit”, “nucleotide change”, “desired nucleotide change”, and “desired nucleotide edit” are used interchangeably to refer to a specific nucleotide edit, e.g., a specific deletion of one or more nucleotides, a specific insertion of one or more nucleotides, a specific substitution(s) of one or more nucleotides, or a combination thereof, at a specific position in a DNA synthesis template of a PEgRNA to be incorporated in a target DNA sequence. In some embodiments, the DNA synthesis template comprises more than one nucleotide edit relative to the double-stranded target DNA sequence. In such embodiments, each nucleotide edit is a specific nucleotide edit at a specific position in the DNA synthesis template, each nucleotide edit is at a different specific position relative to any of the other nucleotide edits in the DNA synthesis template, and each nucleotide edit is independently selected from a specific deletion of one or more nucleotides, a specific insertion of one or more nucleotides, a specific substitution(s) of one or more nucleotides, or a combination thereof. 
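By way of illustration only, the relationship among the SpCas9 nick site, the primer binding site, and the DNA synthesis template can be sketched in Python as follows. The target sequence, the 13-nt primer binding site length, the 10-nt template length, and the function names are hypothetical choices made for this sketch and are not taken from the disclosure; the sketch handles a single substitution and assumes an RNA extension arm at the 3ʹ end of the PEgRNA.

```python
# Illustrative sketch only: sequence, lengths, and names below are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def to_rna(dna: str) -> str:
    """Write a DNA sequence as RNA (uracil in place of thymine)."""
    return dna.replace("T", "U")


def spcas9_nick_index(pam_start: int) -> int:
    """0-based index of the first base downstream of the SpCas9 nick.

    The nick lies between the -4 and -3 positions relative to position 1
    of the PAM, i.e. three bases 5' of the PAM on the PAM strand.
    """
    return pam_start - 3


def design_extension_arm(pam_strand, pam_start, pbs_len, template_len,
                         edit_offset, new_base):
    """Build a 3' extension arm (DNA synthesis template followed by PBS)."""
    nick = spcas9_nick_index(pam_start)
    if nick - pbs_len < 0 or nick + template_len > len(pam_strand):
        raise ValueError("PBS or template runs off the end of the sequence")
    if not 0 <= edit_offset < template_len:
        raise ValueError("edit must fall inside the templated region")

    # The primer is the PAM-strand sequence ending at the nick (free 3' end).
    primer = pam_strand[nick - pbs_len:nick]

    # The desired flap is the PAM-strand sequence downstream of the nick,
    # carrying one substitution at the chosen offset.
    flap = list(pam_strand[nick:nick + template_len])
    flap[edit_offset] = new_base
    edited_flap = "".join(flap)

    # Both arm components are complementary to the PAM strand.
    pbs = to_rna(reverse_complement(primer))
    template = to_rna(reverse_complement(edited_flap))
    return {
        "nick": nick,
        "edited_flap": edited_flap,
        "template": template,
        "pbs": pbs,
        "extension_arm": template + pbs,  # 5'->3': template, then PBS
    }


# Hypothetical PAM strand: 20-nt protospacer, TGG PAM, downstream flank.
PAM_STRAND = "GGCCCAGACTGAGCACGTGATGGCAGAGGAAAGG"
design = design_extension_arm(PAM_STRAND, pam_start=20, pbs_len=13,
                              template_len=10, edit_offset=0, new_base="A")
print(design["extension_arm"])
```

In this sketch the template is complementary to the non-target strand downstream of the nick except at the edited position, so copying it reproduces the edited flap sequence.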
A nucleotide edit may refer to the edit on the DNA synthesis template as compared to the sequence on the target strand of the double stranded target DNA, or may refer to the edit encoded by the DNA synthesis template on the newly synthesized single stranded DNA that replaces the endogenous target DNA sequence on the non-target strand, and in either case may be referred to as a nucleotide edit compared to the target DNA sequence. Edit strand and non-edit strand [0038] The terms “edit strand” and “non-edit strand” are terms that may be used when describing the mechanism of action of a prime editing system on a double-stranded DNA substrate. The “edit strand” refers to the strand of DNA which is nicked by the prime editor complex to form a 3ʹ end, which is then extended as a newly synthesized single stranded DNA (also referred to herein as the newly synthesized 3’ DNA flap), which comprises a desired edit and ultimately displaces and replaces the single strand region of DNA just downstream of the nick, thereby installing the 3ʹ DNA flap containing the desired edit downstream of the nick on the “edit strand.” In some embodiments, the newly synthesized 3’ DNA flap
comprising the nucleotide edit is paired in a heteroduplex with the non-edit strand that does not comprise the nucleotide edit, thereby creating a mismatch. In some embodiments, the mismatch is recognized by DNA repair machinery, and/or replication machinery, e.g., an endogenous DNA repair machinery. In some embodiments, through DNA repair, the intended nucleotide edit is incorporated into both strands of the target double-stranded DNA substrate. The application may also refer to the “edit strand” as the “protospacer strand” or the “PAM strand” since these elements are present in that strand. The “edit strand” may also be called the “non-target strand” since the edit strand is not the strand that becomes annealed to the spacer of the PEgRNA molecule, but rather is the complement of the strand that is annealed by the spacer of the PEgRNA. The “non-edit” strand is not directly edited by the PE system. Rather, the desired edit created by the PE system in the 3ʹ DNA flap is incorporated into the “non-edited strand” through DNA replication and/or repair. In some embodiments, the “non-edit strand” is the strand that anneals to the spacer of the PEgRNA and thus is also called the “target strand.” Extension arm [0039] The term “extension arm” refers to a nucleotide sequence component of a PEgRNA which comprises a primer binding site (PBS) and a DNA synthesis template for a polymerase (e.g., an RT template for reverse transcriptase). In some embodiments, the extension arm is located at the 3ʹ end of the guide RNA. In other embodiments, the extension arm is located at the 5ʹ end of the guide RNA. In some embodiments, the extension arm comprises a DNA synthesis template and a primer binding site. In some embodiments, the extension arm comprises the following components in a 5ʹ to 3ʹ direction: the DNA synthesis template, and the primer binding site. In some embodiments, the extension arm also includes a homology arm. 
In various embodiments, the extension arm comprises the following components in a 5ʹ to 3ʹ direction: the homology arm, the edit template, and the primer binding site. Since polymerization activity of the reverse transcriptase is in the 5ʹ to 3ʹ direction, the preferred arrangement of the homology arm, edit template, and primer binding site is in the 5ʹ to 3ʹ direction such that the reverse transcriptase, once primed by an annealed primer sequence, polymerizes a single strand of DNA using the edit template as a complementary template strand. [0040] The extension arm may be described as comprising generally two regions: a primer binding site (PBS) and a DNA synthesis template, for instance. The primer binding site binds to a primer sequence, for example, a single stranded primer sequence containing a free 3’ end
at the nick site that is formed from the endogenous DNA strand of the target site when it becomes nicked by the prime editor complex, thereby exposing a 3ʹ end on the endogenous nicked strand. As explained herein, the binding of the primer sequence to the primer binding site on the extension arm of the PEgRNA creates a duplex region with an exposed 3ʹ end (i.e., the 3ʹ of the primer sequence), which then provides a substrate for a polymerase to begin polymerizing a single strand of DNA from the exposed 3ʹ end along the length of the DNA synthesis template. The sequence of the single strand DNA product is the complement of the DNA synthesis template. Polymerization continues towards the 5ʹ of the DNA synthesis template (or extension arm) until polymerization terminates. Thus, the DNA synthesis template represents the portion of the extension arm that is encoded into a single strand DNA product (i.e., the 3ʹ single strand DNA flap containing the desired nucleotide edit) by the polymerase of the prime editor complex and that ultimately replaces the corresponding endogenous DNA strand of the target site that sits immediately downstream of the PE-induced nick site. Without being bound by theory, polymerization of the DNA synthesis template continues towards the 5ʹ end of the extension arm until a termination event. Polymerization may terminate in a variety of ways, including, but not limited to (a) reaching a 5ʹ terminus of the PEgRNA (e.g., in the case of the 5ʹ extension arm wherein the DNA polymerase simply runs out of template), (b) reaching an impassable RNA secondary structure (e.g., hairpin or stem/loop), or (c) reaching a replication termination signal, e.g., a specific nucleotide sequence that blocks or inhibits the polymerase, or a nucleic acid topological signal, such as, supercoiled DNA or RNA. Fusion protein [0041] The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. 
One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. Another example includes fusion of a Cas9 or equivalent thereof to a reverse transcriptase. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins
comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference. Guide RNA (“gRNA”) [0042] As used herein, the term “guide RNA” is a particular type of guide nucleic acid which is most commonly associated with a Cas protein of a CRISPR-Cas9 system and which associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that includes complementarity to the spacer sequence of the guide RNA. However, this term also embraces the equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and which otherwise program the Cas9 equivalent to localize to a specific target nucleotide sequence. The Cas9 equivalents may include other napDNAbp from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), and C2c3 (a type V CRISPR-Cas system). Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein. In addition, methods for designing appropriate guide RNA sequences are provided herein. As used herein, the “guide RNA” may also be referred to as a “traditional guide RNA” to contrast it with the modified forms of guide RNA termed “prime editing guide RNAs” (or “PEgRNAs”) and “engineered PEgRNAs” (or “epegRNAs”). 
[0043] Guide RNAs or PEgRNAs/epegRNAs may comprise various structural elements that include, but are not limited to: [0044] Spacer sequence – the sequence in the guide RNA or pegRNA/epegRNA (having about 20 nts in length) that has the same sequence as the protospacer in the target DNA, except that the guide RNA or PEgRNA/epegRNA comprises Uracil and the target protospacer contains Thymine. [0045] gRNA core (or gRNA scaffold or backbone sequence) – the sequence within the gRNA that is responsible for binding with a nucleic acid programmable DNA binding protein, e.g., a Cas9. It does not include the spacer sequence that is used to guide Cas9 to target DNA.
[0046] In some embodiments, a pegRNA or epegRNA may also comprise an extension arm – a single strand extension at the 3ʹ end or the 5ʹ end of the PEgRNA which comprises a primer binding site and a DNA synthesis template sequence that encodes via a polymerase (e.g., a reverse transcriptase) a single stranded DNA flap containing the desired nucleotide change, which then integrates into the endogenous DNA by replacing the corresponding endogenous strand, thereby installing the desired nucleotide change. [0047] Transcription terminator – the guide RNA or PEgRNA may comprise a transcriptional termination sequence at the 3ʹ end of the molecule. Linker [0048] The term “linker,” as used herein, refers to a molecule linking two other molecules or moieties. The linker can be an amino acid sequence in the case of a peptide linker joining two domains of a fusion protein. For example, a Cas9 can be fused to a reverse transcriptase by an amino acid linker sequence. The linker can also be a nucleotide sequence in the case of joining two nucleotide sequences together (e.g., in a gRNA). For example, in the instant case, the traditional guide RNA is linked via a spacer or linker nucleotide sequence to the RNA extension of a prime editing guide RNA which may comprise an RT template sequence and an RT primer binding site. In other embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-200 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. MLH1 [0049] The term “MLH1 gene” refers to a gene encoding MLH1 (or MutL Homolog 1), a DNA mismatch repair enzyme. 
The protein encoded by this gene can heterodimerize with mismatch repair endonuclease PMS2 to form MutL alpha (MutLα), part of the DNA mismatch repair system. MLH1 mediates protein-protein interactions during mismatch recognition, strand discrimination, and strand removal. In mismatch repair, the heterodimer MSH2:MSH6 (MutSα) forms and binds the mismatch. MLH1 then forms a heterodimer with PMS2 (MutLα) and binds the MSH2:MSH6 heterodimer. The MutLα heterodimer then incises the nicked strand 5′ and 3′ of the mismatch, followed by excision of the mismatch from MutLα-generated nicks by EXO1. Finally, POLδ resynthesizes the excised strand, followed by LIG1 ligation.
[0050] An exemplary amino acid sequence of MLH1 is human isoform 1, P40692-1: >sp|P40692|MLH1_HUMAN DNA mismatch repair protein Mlh1 OS=Homo sapiens OX=9606 GN=MLH1 PE=1 SV=1: [0051] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEG GLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHV TITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPS EEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSREL IEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKN THPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQ TLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKP LSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSE KRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGH EVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLR LSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEE GNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISE ESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYK VFERC (SEQ ID NO: 1), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 1. 
[0052] Another exemplary amino acid sequence of MLH1 is human isoform 2, P40692-2 (wherein amino acids 1-241 of isoform 1 are missing): >sp|P40692-2|MLH1_HUMAN Isoform 2 of DNA mismatch repair protein Mlh1 OS=Homo sapiens OX=9606 GN=MLH1: [0053] MNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLS LEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLA GPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAI VTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSN PRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREML HNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPL FDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLP LLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQ QSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC (SEQ ID NO: 2), or an amino acid sequence having at least 70%, at least 75%, at least 80%,
at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 2. [0054] Another exemplary amino acid sequence of MLH1 is human isoform 3, P40692-3 (where amino acids 1-101 (MSFVAGVIRR…ASISTYGFRG (SEQ ID NO: 3)) are replaced with MAF): >sp|P40692-3|MLH1_HUMAN Isoform 3 of DNA mismatch repair protein Mlh1 OS=Homo sapiens OX=9606 GN=MLH1: [0055] MAFEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQI TVEDLFYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTL PNASTVDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRL VESTSLRKAIETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILER VQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQ MVRTDSREQKLDAFLQPLSKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEV AAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTP RRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTT KLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYI VEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEK ECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHIL PPKHFTEDGNILQLANLPDLYKVFERC (SEQ ID NO: 4), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 4. 
[0056] In some embodiments, the present disclosure contemplates an inhibitor of MLH1 and/or MMR pathway components that interact with MLH1, including any wildtype or naturally occurring variant of MLH1, including any amino acid sequence having at least 70%, or 75%, or 80%, or 85%, or 90%, or 95%, or 99% or more sequence identity with any of SEQ ID NOs: 1-11, or nucleic acid molecules encoding any MLH1 or variant of MLH1 (e.g., a dominant negative mutant of MLH1 as described herein), for inhibiting, blocking, or otherwise inactivating the wild type MLH1 function in the MMR pathway, and consequently, inhibiting, blocking, or otherwise inactivating the MMR pathway, e.g., during genome editing with a prime editor. [0057] In some embodiments, inactivation of the MMR pathway involves an inhibitor that disrupts, blocks, interferes with, or otherwise inactivates the wild type function of the MLH1 protein. In some embodiments, inactivation of the MMR pathway involves a mutant of the
MLH1 protein. In some embodiments, the MLH1 mutant protein interferes with, and thereby inactivates, the function of a wild type MLH1 protein in the MMR pathway. In some embodiments, the MLH1 mutant is a dominant negative mutant. In some embodiments, the MLH mutant protein is capable of binding to an MLH1-interacting protein, for example, MutS. [0058] Without being bound by theory, MLH1 dominant negative mutants function by saturating binding of MutS, thereby blocking MutS-wild type MLH1 binding and interfering with the function of the wild type MLH1 protein in the MMR pathway. [0059] In various embodiments, the dominant negative MLH1 can include, for example, MLH1 E34A, which is based on SEQ ID NO: 5 and has the following amino acid sequence (underline and bolded to show the E34A mutation): [0060] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKAMIENCLDAKSTSIQVIVKE GGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAH VTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNP SEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRE LIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPK NTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFT QTLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSK PLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMS EKRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQG HEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGV LRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEID EEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYI SEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDL YKVFERC (SEQ ID NO: 5), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 5. 
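For illustration, a point substitution written in the form E34A (wild-type residue, 1-based position, replacement residue) can be applied to an amino acid sequence as in the following Python sketch. The helper name is hypothetical, and only the first 40 residues of the wild-type sequence (SEQ ID NO: 1) are used to keep the example short.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a substitution such as 'E34A' using 1-based residue numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {protein[position - 1]}"
        )
    return protein[:position - 1] + replacement + protein[position:]


# First 40 residues of wild-type MLH1 isoform 1 (SEQ ID NO: 1).
WT_NTERM = "MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCL"
e34a_nterm = apply_substitution(WT_NTERM, "E34A")
print(e34a_nterm)
```

Checking the wild-type residue before substituting guards against numbering errors, for example when a fragment rather than the full-length sequence is supplied.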
[0061] In various other embodiments, the dominant negative MLH1 can include, for example, MLH1 Δ756, which is based on SEQ ID NO: 6 and has the following amino acid sequence (underline and bolded to show the Δ756 mutation at the C terminus of the sequence): [0062] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEG GLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHV
TITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPS EEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSREL IEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKN THPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQ TLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKP LSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSE KRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGH EVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLR LSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEE GNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISE ESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYK VFER[-](SEQ ID NO: 6), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 6 (wherein the [- ] indicates deleted amino acid residue(s) relative to the parent or wildtype sequence). 
[0063] In still other embodiments, the dominant negative MLH1 can include, for example, MLH1 Δ754-Δ756, which is based on SEQ ID NO: 7 and has the following amino acid sequence (underline and bolded to show the Δ754-Δ756 mutation at the C terminus of the sequence): [0064] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEG GLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHV TITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPS EEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSREL IEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKN THPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQ TLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKP LSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSE KRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGH EVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLR LSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEE GNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISE ESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYK VF[- - -] (SEQ ID NO: 7), or an amino acid sequence having at least 70%, at least 75%, at
least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 7 (wherein the [- - -] indicates deleted amino acid residue(s) relative to the parent or wildtype sequence). [0065] In yet other embodiments, the dominant negative MLH1 can include, for example, MLH1 E34A Δ754-Δ756, which is based on SEQ ID NO: 8 and has the following amino acid sequence (underline and bolded to show the E34A and Δ754-Δ756 mutations): [0066] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKAMIENCLDAKSTSIQVIVKE GGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAH VTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNP SEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRE LIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPK NTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFT QTLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSK PLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMS EKRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQG HEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGV LRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEID EEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYI SEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDL YKVF[- - -] (SEQ ID NO: 8), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 8. 
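As a non-limiting illustration, the C-terminal deletions Δ756 and Δ754-Δ756 can be expressed as the removal of a numbered residue range, as in the Python sketch below. The helper name is hypothetical; the example operates on the last 14 residues of SEQ ID NO: 1 and assumes they are numbered 743-756, consistent with the 756-residue numbering used above.

```python
def delete_residues(fragment: str, fragment_start: int, first: int, last: int) -> str:
    """Delete residues first..last (1-based, inclusive, full-length numbering).

    `fragment_start` is the full-length residue number of fragment[0].
    """
    fragment_end = fragment_start + len(fragment) - 1
    if first > last:
        raise ValueError("first must not exceed last")
    if first < fragment_start or last > fragment_end:
        raise ValueError("deletion lies outside the supplied fragment")
    begin = first - fragment_start
    end = last - fragment_start + 1
    return fragment[:begin] + fragment[end:]


# Last 14 residues of wild-type MLH1 isoform 1, assumed to be residues 743-756.
WT_CTERM = "LANLPDLYKVFERC"
CTERM_START = 743

delta_756 = delete_residues(WT_CTERM, CTERM_START, 756, 756)
delta_754_756 = delete_residues(WT_CTERM, CTERM_START, 754, 756)
print(delta_756, delta_754_756)
```

The two results end in VFER and VF, respectively, matching the C-terminal ends shown for SEQ ID NO: 6 and SEQ ID NO: 7.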
[0067] In certain embodiments, the dominant negative MLH1 can include, for example, MLH1 1-335, which is based on SEQ ID NO: 9 and has the following amino acid sequence (contains amino acids 1-335 of SEQ NO: 1): [0068] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEG GLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHV TITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPS EEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSREL IEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKN THPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL (SEQ ID NO: 9), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at
least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 9. [0069] In other embodiments, the dominant negative MLH1 can include, for example, MLH1 1-335 E34A, which is based on SEQ ID NO: 10 and has the following amino acid sequence (contains amino acids 1-335 of SEQ NO: 1 and a E34A mutation relative to SEQ ID NO: 1): [0070] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKAMIENCLDAKSTSIQVIVKE GGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAH VTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNP SEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRE LIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPK NTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL (SEQ ID NO: 10), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 10. [0071] In still other embodiments, the dominant negative MLH1 can include, for example, MLH1 1-335 NLS
SV40 (also referred to as MLH1dn NTD), which is based on SEQ ID NO: 1 and has the following amino acid sequence (contains amino acids 1-335 of SEQ ID NO: 1 and an NLS sequence of SV40): [0072] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEG GLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHV TITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPS EEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSREL IEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKN THPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLPKKKRKV (SEQ ID NO: 11, with the underlined and bolded portion referring to the NLS of SV40), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 11. [0073] In still other embodiments, the dominant negative MLH1 can include, for example, MLH1 1-335 NLS
alternate (which is based on SEQ ID NO: 1 and having the following amino acid sequence (contains amino acids 1-335 of SEQ NO: 1 and an alternate NLS sequence)): [0074] MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEG GLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHV
TITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPS EEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSREL IEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKN THPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLL-[alternate NLS sequence] (SEQ ID NO: 9)-[alternate NLS sequence], or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 9. The alternate NLS sequence can be any suitable NLS sequence, including but not limited to:
[0075] In still other embodiments, the dominant negative MLH1 can include, for example, MLH1 501-756, which corresponds to a C-terminal fragment of SEQ ID NO: 1 that corresponds to amino acids 501-756 of SEQ ID NO: 1: [0076] INLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNT TKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYI VEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEK ECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHIL PPKHFTEDGNILQLANLPDLYKVFERC (SEQ ID NO: 12), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 12. [0077] In still other embodiments, the dominant negative MLH1 can include, for example, MLH1 501-753, which corresponds to a C-terminal fragment of SEQ ID NO: 1 that corresponds to amino acids 501-753 of SEQ ID NO: 1:
INLTSVLSLQEEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSE ELFYQILIYDFANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFL KKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFE SLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKH FTEDGNILQLANLPDLYKVF[- - -] (SEQ ID NO: 13), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 13. [0078] In still other embodiments, the dominant negative MLH1 can include, for example, MLH1 461-756, which is a C-terminal fragment of SEQ ID NO: 1 that corresponds to amino acids 461-756 of SEQ ID NO: 1: KRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGH EVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLR LSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEE GNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISE ESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYK VFERC (SEQ ID NO: 14), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 14. 
[0079] In various embodiments, the dominant negative MLH1 can include, for example, MLH1 461-753, which is a C-terminal fragment of SEQ ID NO: 1 that corresponds to amino acids 461-753 of SEQ ID NO: 1: [0080] KRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEI NEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFA NFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYF SLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSI RKQYISEESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLAN LPDLYKVF[- - -] (SEQ ID NO: 15), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 15. [0081] In various other embodiments, the dominant negative MLH1 can include, for example, MLH1461-753, which is a C-terminal fragment of SEQ ID NO: 1 that corresponds to amino acids 461-753 of SEQ ID NO: 1, and which further comprises an N-terminal NLS,
e.g., NLS
SV40: [NLS]- KRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGH EVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLR LSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEE GNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISE ESTLSGQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYK VF[- - -] (SEQ ID NO: 15), or an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to and including 100% sequence identity with SEQ ID NO: 15. The NLS sequence can be any suitable NLS sequence, including but not limited to:
napDNAbp [0082] As used herein, the term “nucleic acid programmable DNA binding protein” or “napDNAbp,” of which Cas9 is an example, refers to a protein that uses RNA:DNA hybridization to target and bind to specific sequences in a DNA molecule. Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA). In other words, the guide nucleic acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to a complementary sequence. [0083] Without being bound by theory, the binding mechanism of a napDNAbp-guide RNA complex, in general, includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-stranded DNA target, thereby separating the strands in the region
bound by the napDNAbp. The guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single-stranded region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA, leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut.
In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand. Exemplary napDNAbp with different nuclease activities include “Cas9 nickase” (“nCas9”) and a deactivated Cas9 having no nuclease activities (“dead Cas9” or “dCas9”). Exemplary sequences for these and other napDNAbp are provided herein. Nickase [0084] As used herein, a “nickase” refers to a napDNAbp (e.g., a Cas protein) which is capable of cleaving only one of the two complementary strands of a double-stranded target DNA sequence, thereby generating a nick in that strand. In some embodiments, the nickase cleaves a non-target strand of a double-stranded target DNA sequence. In some embodiments, the nickase comprises an amino acid sequence with one or more mutations in a catalytic domain of a canonical napDNAbp (e.g., a Cas protein), wherein the one or more mutations reduce or abolish nuclease activity of the catalytic domain. In some embodiments, the nickase is a Cas9 that comprises one or more mutations in a RuvC-like domain relative to a wild type Cas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises one or more mutations in an HNH-like domain relative to a wild type Cas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 relative to a canonical SpCas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises an H840A, N854A, and/or N863A mutation relative to a canonical SpCas9 sequence, or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the term “Cas9 nickase” refers to a Cas9 with one of the two nuclease domains inactivated. This enzyme is capable of cleaving only one strand of a target DNA. 
In some embodiments, the nickase is a Cas protein that is not a Cas9 nickase.
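By way of illustration only, the relationship between the Cas9 mutations named above and the resulting nuclease activity may be sketched as follows. The sketch assumes the standard SpCas9 domain assignment, which this section does not itself state: the RuvC domain cuts the non-target strand and the HNH domain cuts the target strand. It also assumes that each listed mutation fully inactivates its domain, which is a simplification. The function name `classify_cas9` and the mutation sets are hypothetical names introduced for this example.

```python
# Assumption (standard SpCas9 biology, not stated in this section):
# RuvC cuts the non-target strand; HNH cuts the target strand.
RUVC_INACTIVATING = {"D10A"}
HNH_INACTIVATING = {"H840A", "N854A", "N863A"}


def classify_cas9(mutations):
    """Return (kind, strands_cut) for a Cas9 carrying the given mutations.

    Simplification: any one listed mutation is treated as fully
    inactivating the corresponding nuclease domain.
    """
    muts = set(mutations)
    ruvc_active = not (muts & RUVC_INACTIVATING)
    hnh_active = not (muts & HNH_INACTIVATING)

    strands_cut = []
    if ruvc_active:
        strands_cut.append("non-target")
    if hnh_active:
        strands_cut.append("target")

    if len(strands_cut) == 2:
        kind = "nuclease"   # both strands cut
    elif len(strands_cut) == 1:
        kind = "nickase"    # exactly one strand cut
    else:
        kind = "dead"       # no nuclease activity (dCas9)
    return kind, tuple(strands_cut)
```

Under these assumptions, `classify_cas9(["H840A"])` describes a nickase that cuts only the non-target strand, consistent with the Cas9(H840A) nickase used in the prime editors described herein.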
[0085] In some embodiments, the napDNAbp of the prime editing complex comprises an endonuclease having nucleic acid programmable DNA binding ability. In some embodiments, the napDNAbp comprises an active endonuclease capable of cleaving both strands of a double stranded target DNA. In some embodiments, the napDNAbp is a nuclease active endonuclease, e.g., a nuclease active Cas protein, that can cleave both strands of a double stranded target DNA by generating a nick on each strand. For example, a nuclease active Cas protein can generate a cleavage (a nick) on each strand of a double stranded target DNA. In some embodiments, the two nicks on both strands are staggered nicks, for example, generated by a napDNAbp comprising a Cas12a or Cas12b1. In some embodiments, the two nicks on both strands are at the same genomic position, for example, generated by a napDNAbp comprising a nuclease active Cas9. In some embodiments, the napDNAbp comprises an endonuclease that is a nickase. For example, in some embodiments, the napDNAbp comprises an endonuclease comprising one or more mutations that reduce nuclease activity of the endonuclease, rendering it a nickase. In some embodiments, the napDNAbp comprises an inactive endonuclease, for example, in some embodiments, the napDNAbp comprises an endonuclease comprising one or more mutations that abolish the nuclease activity. In various embodiments, the napDNAbp is a Cas9 protein or variant thereof. The napDNAbp can also be a nuclease active Cas9, a nuclease inactive Cas9 (dCas9), or a Cas9 nickase (nCas9). In a preferred embodiment, the napDNAbp is Cas9 nickase (nCas9) that nicks only a single strand. In other embodiments, the napDNAbp can be selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute and optionally has a nickase activity such that only one strand is cut. 
In some embodiments, the napDNAbp is selected from Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute and optionally has a nickase activity such that one DNA strand is cut preferentially to the other DNA strand. Nuclear localization sequence (NLS) [0086] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547
on May 31, 2001, the contents of which are incorporated herein by reference for its disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 31). Nucleic acid [0087] The term “nucleic acid,” as used herein, refers to a polymer of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7 deazaadenosine, 7 deazaguanosine, 8 oxoadenosine, 8 oxoguanosine, O(6) methylguanine, 4-acetylcytidine, 5- (carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methyl adenosine, 1-methyl guanosine, N6-methyl adenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, 2′-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5ʹ N phosphoramidite linkages). PEgRNA [0088] As used herein, the terms “prime editing guide RNA” or “PEgRNA” or “pegRNA” or “extended guide RNA” refer to a specialized form of a guide RNA that has been modified to include one or more additional sequences for implementing the prime editing methods and compositions described herein. As described herein, the prime editing guide RNAs comprise one or more “extended regions”, also referred to herein as “extension arms”, of nucleic acid sequence. The extended regions may comprise, but are not limited to, single-stranded RNA or DNA. Further, the extended regions may occur at the 3′ end of a traditional guide RNA. 
In other arrangements, the extended regions may occur at the 5′ end of a traditional guide RNA. In still other arrangements, the extended region may occur at an intramolecular region of the traditional guide RNA, for example, in the gRNA core region which associates and/or binds to the napDNAbp. The extended region comprises a “DNA synthesis template” which encodes (by the polymerase of the prime editor) a single-stranded DNA which, in turn, has been designed to be (a) homologous with the endogenous target DNA to be edited, and (b) which comprises at least one desired nucleotide change (e.g., a transition, a transversion, a deletion, or an insertion) to be introduced or integrated into the endogenous target DNA. The extended region may also comprise other functional sequence elements, such as, but not
limited to, a “primer binding site” and a “linker” sequence, or other structural elements, such as, but not limited to, aptamers, stem loops, hairpins, toeloops (e.g., a 3′ toeloop), or an RNA-protein recruitment domain (e.g., MS2 hairpin). As used herein, the “primer binding site” comprises a sequence that hybridizes to a single-stranded DNA sequence having a 3′ end generated from the nicked DNA of the R-loop. [0089] In certain embodiments, the PEgRNAs have a 3ʹ extension arm, a spacer, and a gRNA core. The 3ʹ extension arm further comprises in the 5ʹ to 3ʹ direction a reverse transcriptase template, a primer binding site, and a linker. The reverse transcriptase template may also be referred to more broadly as the “DNA synthesis template” where the polymerase of a prime editor described herein is not an RT, but another type of polymerase. [0090] In certain other embodiments, the PEgRNAs have a 5ʹ extension arm, a spacer, and a gRNA core. The 5ʹ extension arm further comprises in the 5ʹ to 3ʹ direction a reverse transcriptase template, a primer binding site, and a linker. The reverse transcriptase template may also be referred to more broadly as the “DNA synthesis template” where the polymerase of a prime editor described herein is not an RT, but another type of polymerase. [0091] In still other embodiments, the PEgRNAs have in the 5ʹ to 3ʹ direction a spacer (1), a gRNA core (2), and an extension arm (3). The extension arm (3) is at the 3ʹ end of the PEgRNA. The extension arm (3) further comprises in the 5ʹ to 3ʹ direction a homology arm, an edit template, and a primer binding site. The extension arm (3) may also comprise an optional modifier region at the 3ʹ and 5ʹ ends, which may be the same sequences or different sequences. In addition, the 3ʹ end of the PEgRNA may comprise a transcriptional terminator sequence. These sequence elements of the PEgRNAs are further described and defined herein. 
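By way of illustration only, the 5ʹ to 3ʹ arrangement of a PEgRNA having a 3ʹ extension arm (spacer, gRNA core, then an extension arm made up of a homology arm, an edit template, and a primer binding site) may be sketched as a simple data structure. The class name `PegRNA`, its field names, and the short sequences in the usage note are hypothetical placeholders and are not sequences disclosed herein. Optional modifier regions, linkers, and terminator sequences are omitted from the sketch.

```python
from dataclasses import dataclass

RNA_ALPHABET = set("ACGU")


@dataclass(frozen=True)
class PegRNA:
    """Illustrative 3'-extended PEgRNA; all components written 5'->3' as RNA."""
    spacer: str
    grna_core: str
    homology_arm: str
    edit_template: str
    primer_binding_site: str

    def __post_init__(self):
        # Reject empty components and non-RNA characters.
        for name, seq in vars(self).items():
            if not seq or set(seq) - RNA_ALPHABET:
                raise ValueError(f"{name} must be a non-empty RNA sequence")

    @property
    def extension_arm(self):
        # 5'->3': homology arm, edit template, primer binding site.
        return self.homology_arm + self.edit_template + self.primer_binding_site

    def sequence(self):
        # 5'->3': spacer, gRNA core, extension arm.
        return self.spacer + self.grna_core + self.extension_arm
```

For example, `PegRNA("GGCC", "GUUU", "AAA", "C", "UUGG").sequence()` concatenates the placeholder components in the stated order. A PEgRNA having a 5ʹ extension arm would place the extension arm before the spacer instead.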
[0092] In still other embodiments, the PEgRNAs have in the 5ʹ to 3ʹ direction an extension arm (3), a spacer (1), and a gRNA core (2). The extension arm (3) is at the 5ʹ end of the PEgRNA. The extension arm (3) further comprises in the 3ʹ to 5ʹ direction a primer binding site, an edit template, and a homology arm. The extension arm (3) may also comprise an optional modifier region at the 3ʹ and 5ʹ ends, which may be the same sequences or different sequences. The PEgRNAs may also comprise a transcriptional terminator sequence at the 3ʹ end. These sequence elements of the PEgRNAs are further described and defined herein. PE1 [0093] As used herein, “PE1” refers to a prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(H840A) and a wild type MMLV RT having
the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)] -NLS and 2) a desired PEgRNA, wherein the fusion protein (referred to as the PE1 protein) has the amino acid sequence of SEQ ID NO: 38, which is shown as follows. MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGN TDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAK VDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTD KADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPIN ASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFD LAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEI TKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGG ASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHA ILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPW NFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYV TEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVE DRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYA HLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNF MQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKVVDELV KVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEHPV ENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNK VLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGL SELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLV SDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYD VRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIV WDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDW DPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDF LEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNF LYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKV LSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDAT LIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTL NIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQY PMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVED IHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLT WTRLPQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALL QTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREF 
LGTAGFCRLWIPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTK PFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAG KLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLL PLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTET EVIWAKALPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSE GKEIKNKDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTS TLLIENSSPSGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 38) KEY: NUCLEAR LOCALIZATION SEQUENCE (NLS) TOP:(SEQ ID NO: 21), BOTTOM: (SEQ ID NO: 30) CAS9(H840A) (SEQ ID NO: 41) 33-AMINO ACID LINKER (SEQ ID NO: 64) M-MLV reverse transcriptase (SEQ ID NO: 79).
PE2 [0094] As used herein, “PE2” refers to prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(H840A) and a variant MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]- [MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)] -NLS and 2) a desired PEgRNA, wherein the fusion protein (referred to as the PE2 protein) has the amino acid sequence of SEQ ID NO: 39, which is shown as follows: MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNT DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLI YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAIL SARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDT YDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEH HQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDG TEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIEKIL TFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKN LPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNR KVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDI LEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRD KQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAG SPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEE GIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVP QSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDN LTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVIT LKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDY KVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEI VWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDP KKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAK GYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRD KPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETR IDLSQLGGDSGGSSGGSSGSETPGTSESATPESSGGSSGGSSTLNIEDEYRLHETSKEPD VSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQ 
RLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGL PPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTL FNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKA QICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFA EMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYA KGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAP HAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCL DILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGT SAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEIL ALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGG SKRTADGSEFEPKKKRKV (SEQ ID NO: 39) KEY:
NUCLEAR LOCALIZATION SEQUENCE (NLS) TOP: (SEQ ID NO: 21), BOTTOM: (SEQ ID NO: 30) CAS9(H840A) (SEQ ID NO: 41) 33-AMINO ACID LINKER (SEQ ID NO: 64) M-MLV reverse transcriptase (SEQ ID NO: 80). PE3 [0095] As used herein, “PE3” refers to a prime editing composition comprising a PE2 and further comprising a second-strand nicking guide RNA that complexes with the PE2 and introduces a nick in the non-edit DNA strand in order to induce preferential replacement of the edit strand. PE3b [0096] As used herein, “PE3b” refers to a prime editing composition comprising PE2 and further comprising a second-strand nicking guide RNA that complexes with the PE2 and introduces a nick in the non-edit DNA strand, wherein the second-strand nicking guide RNA is designed for temporal control such that the second-strand nick is not introduced until after the installation of the desired edit. This is achieved by designing the second-strand nicking guide RNA with a spacer sequence that is complementary to, and hybridizes with, only the edited strand after installation of the desired nucleotide edit(s), and not the endogenous target DNA sequence. Using this strategy, mismatches between the nicking guide RNA spacer and the unedited target DNA should disfavor nicking by the sgRNA until after the editing event on the PAM strand takes place. PE4 [0097] As used herein, “PE4” refers to a prime editing composition comprising a PE2 and further comprising an MLH1 dominant negative protein variant (i.e., wild-type MLH1 with amino acids 754-756 truncated, which may be referred to herein as “MLH1 Δ754-756” or “MLH1dn”). The MLH1 dominant negative protein variant may be expressed in trans in some embodiments. In some embodiments, a PE4 system comprises a fusion protein comprising a PE2 protein and an MLH1 dominant negative protein joined via an optional linker.
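By way of illustration only, residue-range notation such as “MLH1 501-756” or “MLH1 Δ754-756” may be read as 1-indexed, inclusive positions relative to the full-length sequence. The sketch below shows how such ranges map onto string slices. The helper names `fragment` and `truncate_c_terminal` are hypothetical. The usage note relies on a synthetic placeholder string of 756 residues, a length inferred from the numbering used herein, and not on the actual sequence of SEQ ID NO: 1.

```python
def fragment(seq, start, end):
    """Return residues start..end of seq (1-indexed, inclusive)."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("residue range out of bounds")
    return seq[start - 1:end]


def truncate_c_terminal(seq, start, end):
    """Delete residues start..end, where end must be the final residue.

    Models a C-terminal truncation written as 'delta start-end'.
    """
    if end != len(seq):
        raise ValueError("range does not reach the C-terminus")
    if not 1 <= start <= end:
        raise ValueError("residue range out of bounds")
    return seq[:start - 1]
```

With a 756-residue placeholder `p`, `fragment(p, 501, 756)` has 256 residues, and `truncate_c_terminal(p, 754, 756)` equals `fragment(p, 1, 753)`, i.e., the Δ754-756 variant retains residues 1-753.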
PE5 and PE5b [0098] As used herein, “PE5” refers to a prime editing composition comprising a PE3 and further comprising an MLH1 dominant negative protein variant (i.e., wild-type MLH1 with amino acids 754-756 truncated, which may be referred to as “MLH1 Δ754-756” or “MLH1dn”). The MLH1 dominant negative variant may be expressed in trans in some embodiments. In some embodiments, a PE5 system comprises a fusion protein comprising a
PE2 protein and an MLH1 dominant negative protein joined via an optional linker. “PE5b” refers to a prime editing composition comprising a PE3 and an MLH1 dominant negative protein, wherein the second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing the second strand nicking guide RNA with a spacer sequence that comprise complementarity to, and only hybridizes with, only the edited strand after installation of the desired nucleotide edit(s), but not the endogenous target DNA sequence. PEmax [0099] As used herein, “PEmax” refers to a prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(R221K N39K H840A) and a variant MMLV RT having the following structure: [bipartite NLS]-[Cas9(R221K)(N394K)(H840A)]- [linker]-[MMLV_RT(D200N)(T330P)(L603W)]-[bipartite NLS]-[NLS] and 2) a desired PEgRNA, wherein the fusion protein (referred to as the PEmax protein) has the amino acid sequence of SEQ ID NO: 40, which is shown as follows: MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLG NTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA KVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDST DKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPI NASGVDAKAILSARLSKSRKLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNF DLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNT EITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYID GGASQEEFYKFIKPILEKMDGTEELLVKLKREDLLRKQRTFDNGSIPHQIHLGE LHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETI TPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKV KYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEIS GVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERL KTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFA NRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQTVKV VDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQIL KEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDAIVPQSFLKD 
DSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTK AERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVI TLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVY GDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIET NGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKL IARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSS FEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELA LPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVIL ADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYT STKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSKRTADGSEFESPKKKR KVSGGSSGGSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQA PLIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGT NDYRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPT
SQPLFAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQ YVDDLLLAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEG QRWLTEARKETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTL FNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWR RPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALV KQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILA EAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGT SAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKN KDEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL IENSSPSGGSKRTADGSEFESPKKKRKVGSGPAAKRVKLD (SEQ ID NO: 40) KEY: BIPARTITE SV40 NUCLEAR LOCALIZATION SEQUENCE (NLS) TOP: (SEQ ID NO: 21), CAS9(R221K N394K H840A) (SEQ ID NO: 42) SGGSx2-BIPARTITE SV40NLS-SGGSx2 LINKER (SEQ ID NO: 65) M-MLV reverse transcriptase(D200N T306K W313F T330P L603W) (SEQ ID NO: 80) Other linker sequence (SEQ ID NO: 66) BIPARTITE SV40NLS (SEQ ID NO: 32) Other linker sequence c-Myc NLS (SEQ ID NO: 25) PE3max and PE3bmax [0100] As used herein, “PE3max” refers to a prime editing composition comprising a PEmax protein, a desired pegRNA, and a second-strand nicking guide RNA. In some embodiments, PE3max can be considered as PE3 except wherein the PE2 component is substituted with PEmax. “PE3bmax” refers to a prime editing composition comprising a PEmax protein, a desired pegRNA, and a second-strand nicking guide RNA, wherein the second-strand nicking guide RNA is designed for temporal control such that the second-strand nick is not introduced until after the installation of the desired edit. This is achieved by designing the second-strand nicking guide RNA with a spacer sequence that is complementary to, and hybridizes with, only the edited strand after installation of the desired nucleotide edit(s), and not the endogenous target DNA sequence. PE4max [0101] As used herein, “PE4max” refers to PE4 but wherein the PE2 component is substituted with PEmax.
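By way of illustration only, the composition names defined in this section (PE2 through PE5b, and the corresponding “max” variants) may be summarized as a small lookup table. The class name `Composition`, its field names, and the labels "standard" and "edit-specific" are hypothetical conveniences for this sketch. Every composition also includes a desired PEgRNA, which the table does not repeat. PE1, which uses a wild type MMLV RT, is omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Composition:
    editor_protein: str           # "PE2" or "PEmax" fusion protein
    nicking_guide: Optional[str]  # None, "standard", or "edit-specific" (the "b" variants)
    mlh1dn: bool                  # whether an MLH1 dominant negative variant is included


COMPOSITIONS = {
    "PE2":  Composition("PE2", None, False),
    "PE3":  Composition("PE2", "standard", False),
    "PE3b": Composition("PE2", "edit-specific", False),
    "PE4":  Composition("PE2", None, True),
    "PE5":  Composition("PE2", "standard", True),
    "PE5b": Composition("PE2", "edit-specific", True),
}

# Each "max" variant substitutes the PEmax protein for the PE2 protein.
for name, comp in list(COMPOSITIONS.items()):
    max_name = "PEmax" if name == "PE2" else name + "max"
    COMPOSITIONS[max_name] = replace(comp, editor_protein="PEmax")
```

For example, `COMPOSITIONS["PE5bmax"]` describes a PEmax protein combined with an edit-specific second-strand nicking guide RNA and an MLH1 dominant negative variant.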
PE5max and PE5bmax [0102] As used herein, “PE5max” refers to PE5 but wherein the PE2 component of PE3 is substituted with PEmax. “PE5bmax” refers to PE5b wherein the PE2 component of PE3 is substituted with PEmax. Polymerase [0103] As used herein, the term “polymerase” refers to an enzyme that synthesizes a nucleotide strand and that may be used in connection with the prime editor delivery systems described herein. The polymerase can be a “template-dependent” polymerase (i.e., a polymerase that synthesizes a nucleotide strand based on the order of nucleotide bases of a template strand). The polymerase can also be a “template-independent” polymerase (i.e., a polymerase that synthesizes a nucleotide strand without the requirement of a template strand). A polymerase may also be further categorized as a “DNA polymerase” or an “RNA polymerase.” In various embodiments, the prime editor system comprises a DNA polymerase. In various embodiments, the DNA polymerase can be a “DNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of DNA). In such cases, the DNA template molecule can be a PEgRNA, wherein the extension arm comprises a strand of DNA. In such cases, the PEgRNA may be referred to as a chimeric or hybrid PEgRNA which comprises an RNA portion (i.e., the guide RNA components, including the spacer and the gRNA core) and a DNA portion (i.e., the extension arm). In various other embodiments, the DNA polymerase can be an “RNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of RNA). In such cases, the PEgRNA is RNA, i.e., including an RNA extension. The term “polymerase” may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., the polymerase activity). 
Generally, the enzyme will initiate synthesis at the 3′-end of a primer annealed to a polynucleotide template sequence (e.g., such as a primer sequence annealed to the primer binding site of a PEgRNA) and will proceed toward the 5′ end of the template strand. A “DNA polymerase” catalyzes the polymerization of deoxynucleotides. As used herein in reference to a DNA polymerase, the term DNA polymerase includes a “functional fragment thereof.” A “functional fragment thereof” refers to any portion of a wild-type or mutant DNA polymerase that encompasses less than the entire amino acid sequence of the polymerase and which retains the ability, under at least one set of conditions, to catalyze the polymerization of a polynucleotide. Such a functional fragment may exist as a separate entity, or it may be a constituent of a larger polypeptide, such as a fusion protein.
Prime editing [0104] As used herein, the term “prime editing” refers to an approach for gene editing using napDNAbps, a polymerase (e.g., a reverse transcriptase), and specialized guide RNAs that include a primer binding site and a DNA synthesis template for encoding desired new genetic information (or deleting genetic information) that is then incorporated into a target DNA sequence. Prime editing is described in Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019), which is incorporated herein by reference in its entirety. [0105] Prime editing represents a platform for genome editing that is a versatile and precise method to directly write new genetic information into a specified DNA site using a nucleic acid programmable DNA binding protein (“napDNAbp”) working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp), wherein the prime editing system is programmed with a prime editing (PE) guide RNA (“PEgRNA”) that both specifies the target site and templates the synthesis of the desired edit in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide RNA (e.g., at the 5ʹ or 3ʹ end, or at an internal portion of a guide RNA). The replacement strand containing the desired edit (e.g., a single nucleobase substitution) shares the same sequence as the endogenous strand (or is homologous to it) immediately downstream of the nick site of the target site to be edited (with the exception that it includes the desired edit). Through DNA repair and/or replication machinery, the endogenous strand downstream of the nick site is replaced by the newly synthesized replacement strand containing the desired edit. 
In some cases, prime editing may be thought of as a “search-and-replace” genome editing technology since the prime editors, as described herein, not only search and locate the desired target site to be edited, but at the same time, encode a replacement strand containing a desired edit that is installed in place of the corresponding target site endogenous DNA strand. The prime editors of the present disclosure relate, in part, to the discovery that the mechanism of target-primed reverse transcription (TPRT) or “prime editing” can be leveraged or adapted for conducting precision CRISPR/Cas-based genome editing with high efficiency and genetic flexibility. TPRT is naturally used by mobile DNA elements, such as mammalian non-LTR retrotransposons and bacterial Group II introns. Cas protein-reverse transcriptase fusions or related systems are used to target a specific DNA sequence with a guide RNA, generate a single strand nick at the target site, and use the nicked DNA as a primer for reverse transcription of an engineered
reverse transcriptase template that is integrated with the guide RNA. However, while the concept begins with prime editors that use reverse transcriptase as the DNA polymerase component, the prime editors described herein are not limited to reverse transcriptases but may include the use of virtually any DNA polymerase. Indeed, while the application throughout may refer to prime editors with “reverse transcriptases,” it is set forth here that reverse transcriptases are only one type of DNA polymerase that may work with prime editing. Thus, wherever the specification mentions a “reverse transcriptase,” the person having ordinary skill in the art should appreciate that any suitable DNA polymerase may be used in place of the reverse transcriptase. Thus, in one aspect, the prime editors may comprise Cas9 (or an equivalent napDNAbp), which is programmed to target a DNA sequence by associating it with a specialized guide RNA (i.e., PEgRNA) containing a spacer sequence that anneals to a complementary sequence (the complementary sequence to an endogenous protospacer sequence) in the target DNA. The PEgRNA also contains new genetic information in the form of an extension that encodes a replacement strand of DNA containing a desired nucleotide change which is used to replace a corresponding endogenous DNA strand at the target site. To transfer information from the PEgRNA to the target DNA, the mechanism of prime editing involves nicking the target site in one strand of the DNA to expose a 3′-hydroxyl group. The exposed 3′-hydroxyl group can then be used to prime the DNA polymerization of the edit-encoding extension on PEgRNA directly into the target site. In various embodiments, the extension—which provides the template for polymerization of the replacement strand containing the edit—can be formed from RNA or DNA. In the case of an RNA extension, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (such as a reverse transcriptase). 
In the case of a DNA extension, the polymerase of the prime editor may be a DNA-dependent DNA polymerase. The newly synthesized strand (i.e., the replacement DNA strand containing the desired nucleotide edit) that is formed by the prime editor would be homologous to the genomic target sequence (i.e., have the same sequence as), except for the inclusion of one or more desired nucleotide changes (e.g., a single nucleotide substitution, a deletion, or an insertion, or a combination thereof). The newly synthesized (or replacement) strand of DNA may also be referred to as a single strand DNA flap, which would compete for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand. Resolution of the hybridized intermediate (also referred to as a heteroduplex, comprising the single strand DNA flap synthesized by the reverse transcriptase hybridized to the endogenous
DNA strand with the exception of mismatches at positions where desired nucleotide edits are installed in the edit strand) can include removal of the resulting displaced flap of endogenous DNA (e.g., with a 5ʹ end DNA flap endonuclease, FEN1), ligation of the synthesized single strand DNA flap to the target DNA, and assimilation of the desired nucleotide changes as a result of cellular DNA repair and/or replication processes. Because templated DNA synthesis offers single nucleotide precision for the modification of any nucleotide, including insertions and deletions, the scope of this approach is very broad and could foreseeably be used for myriad applications in basic science and therapeutics. In certain embodiments, the system can be combined with the use of an error-prone reverse transcriptase enzyme (e.g., provided as a fusion protein with the Cas9 domain, or provided in trans to the Cas9 domain). The error-prone reverse transcriptase enzyme can introduce alterations during synthesis of the single strand DNA flap. Thus, in certain embodiments, error-prone reverse transcriptase can be utilized to introduce nucleotide changes to the target DNA. Depending on the error-prone reverse transcriptase that is used with the system, the changes can be random or non-random. [0106] In various embodiments, prime editing operates by contacting a target DNA molecule (for which a change in the nucleotide sequence is desired to be introduced) with a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editing guide RNA (PEgRNA). In various embodiments, the prime editing guide RNA (PEgRNA) comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA and encodes the desired nucleotide change (e.g., single nucleotide substitution, insertion, or deletion). In step (a), the napDNAbp/extended gRNA complex contacts the DNA molecule, and the extended gRNA guides the napDNAbp to bind to a target locus. 
In step (b), a nick in one of the strands of DNA of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus. In certain embodiments, the nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence, i.e., the “non-target strand.” The nick, however, could be introduced in either of the strands. That is, the nick could be introduced into the R-loop “target strand” (i.e., the strand hybridized to the protospacer of the extended gRNA) or the “non-target strand” (i.e., the strand forming the single-stranded portion of the R-loop and which is complementary to the target strand). In step (c), the 3′ end of the DNA strand (formed by the nick) interacts with the extended portion of the guide RNA in order to prime reverse transcription (i.e., “target-primed RT”). In certain embodiments, the 3′ end DNA strand hybridizes to a specific RT
priming sequence on the extended portion of the guide RNA, i.e., the “reverse transcriptase priming sequence” or “primer binding site” on the PEgRNA. In step (d), a reverse transcriptase (or other suitable DNA polymerase) is introduced that synthesizes a single strand of DNA from the 3′ end of the primed site towards the 5′ end of the prime editing guide RNA. The DNA polymerase (e.g., reverse transcriptase) can be fused to the napDNAbp or alternatively can be provided in trans to the napDNAbp. This forms a single-strand DNA flap comprising the desired nucleotide change (e.g., the single base change, insertion, or deletion, or a combination thereof) and that is otherwise homologous to the endogenous DNA at or adjacent to the nick site. In step (e), the napDNAbp and guide RNA are released. Steps (f) and (g) relate to the resolution of the single strand DNA flap such that the desired nucleotide change becomes incorporated into the target locus. This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the endogenous DNA sequence. Without being bound by theory, the cell’s endogenous DNA repair and replication processes resolve the mismatched DNA to incorporate the nucleotide change(s) to form the desired altered product. The process can also be driven towards product formation with “second strand nicking.” This process may introduce at least one or more of the following genetic changes: transversions, transitions, deletions, and insertions. 
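Purely for illustration, steps (b) through (d) and the subsequent flap resolution can be sketched in a toy Python model. The sketch makes several simplifying assumptions that are not taken from this disclosure: the function names, the example sequence, and the PBS length are hypothetical; all sequences are written with DNA letters; the nick is placed 3 nucleotides 5ʹ of an NGG PAM on the non-target strand, as is typical for SpCas9; and only substitution edits are modeled.

```python
# Toy model of a 3' extension arm (DNA synthesis template + primer binding site).
# All names and sequences are hypothetical and chosen only for illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def design_extension(non_target: str, nick: int, edited_downstream: str,
                     pbs_len: int) -> str:
    """Build a 3' extension, written 5'->3' as template followed by PBS.

    non_target        -- PAM-containing (edit) strand, 5'->3'
    nick              -- the nick lies between non_target[nick-1] and non_target[nick]
    edited_downstream -- desired sequence of the new 3' flap, 5'->3'
    pbs_len           -- number of primer nucleotides the PBS should anneal to
    """
    primer = non_target[nick - pbs_len:nick]       # free 3' end created by the nick
    pbs = reverse_complement(primer)               # anneals to that primer
    template = reverse_complement(edited_downstream)
    return template + pbs


def synthesize_flap(non_target: str, nick: int, extension: str,
                    pbs_len: int) -> str:
    """Steps (c)-(d): the primer anneals to the PBS and the template is copied."""
    pbs, template = extension[-pbs_len:], extension[:-pbs_len]
    primer = non_target[nick - pbs_len:nick]
    if reverse_complement(pbs) != primer:
        raise ValueError("primer does not anneal to the primer binding site")
    # Synthesis runs from the primer's 3' end toward the 5' end of the template.
    return reverse_complement(template)


def install_edit(non_target: str, nick: int, flap: str) -> str:
    """Flap resolution for substitution-only edits: the new flap replaces an
    equal-length stretch of endogenous sequence downstream of the nick."""
    return non_target[:nick] + flap + non_target[nick + len(flap):]


# Example: 2 nt of context, a 20-nt protospacer, a TGG PAM, 5 nt of context.
NON_TARGET = "TTGACCTGAATCGGATACCTGATGGCATCA"
PAM_START = 22
NICK = PAM_START - 3                               # assumed SpCas9-like nick position

extension = design_extension(NON_TARGET, NICK, "TCATGGCA", pbs_len=8)
flap = synthesize_flap(NON_TARGET, NICK, extension, pbs_len=8)
edited = install_edit(NON_TARGET, NICK, flap)
```

In this toy example the flap differs from the endogenous sequence at a single position downstream of the nick, which corresponds to the single-base-change case described above; insertions and deletions would require a different resolution step than the equal-length replacement used here.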
[0107] The term “prime editor (PE) system” or “prime editor (PE)” or “PE system” or “PE editing system” refers to the compositions involved in the method of genome editing using target-primed reverse transcription (TPRT) described herein, including, but not limited to, the napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases), prime editing guide RNAs, and complexes comprising fusion proteins and prime editing guide RNAs, as well as accessory elements, such as second strand nicking components (e.g., second strand nicking sgRNAs) and 5′ endogenous DNA flap removal endonucleases (e.g., FEN1) for helping to drive the prime editing process towards the edited product formation. [0108] Although in the embodiments described thus far the PEgRNA constitutes a single molecule comprising a guide RNA (which itself comprises a spacer sequence and a gRNA core or scaffold) and a 5ʹ or 3ʹ extension arm comprising the primer binding site and a DNA synthesis template, the PEgRNA may also take the form of two individual molecules. For example, in some embodiments, a PEgRNA may comprise a guide RNA and a trans prime editor RNA template (tPERT), which essentially houses the extension arm (including,
in particular, the primer binding site and the DNA synthesis domain) and an RNA-protein recruitment domain (e.g., MS2 aptamer or hairpin) in the same molecule which becomes co-localized or recruited to a modified prime editor complex that comprises a tPERT recruiting protein (e.g., MS2cp protein, which binds to the MS2 aptamer). Prime editor [0109] The term “prime editor” refers to the polypeptide or polypeptide components involved in prime editing as described herein. In some embodiments, a prime editor comprises a fusion construct comprising a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase. In some embodiments, a prime editor is capable of carrying out prime editing on a target nucleotide sequence in the presence of a PEgRNA (or “extended guide RNA”). In some embodiments, a prime editor comprises a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase provided in trans, i.e., the napDNAbp and the reverse transcriptase are not fused. The in trans napDNAbp and the reverse transcriptase may be tethered via a non-peptide linkage, e.g., a MS2 RNA-protein binding RNA sequence and a MS2 coat protein fused to either the napDNAbp or the reverse transcriptase, or may be unlinked to each other and simply recruited by the pegRNA. In some embodiments, a prime editor composition, system, or complex provided herein comprises a fusion protein or the fusion protein complexed with a PEgRNA, and/or further complexed with a second-strand nicking sgRNA. In some embodiments, the prime editor system may also refer to the complex comprising a fusion protein (reverse transcriptase fused to a napDNAbp), a PEgRNA, and a regular guide RNA capable of directing the second-site nicking step of the non-edited strand as described herein. Primer binding site [0110] The term “primer binding site” or “PBS” refers to the portion of a PEgRNA as a component of the extension arm (e.g. 
at the 3ʹ end of the extension arm), and is a single-stranded portion of the PEgRNA as a component of the extension arm that comprises a region of complementarity to a sequence on the non-target strand of a double stranded target DNA. In some embodiments, the primer binding site is complementary to a region upstream of a nick site in a non-target strand. In some embodiments, the primer binding site is complementary to a region immediately upstream of a nick site in the non-target strand. In some embodiments, the primer binding site is capable of binding to the primer sequence that is formed after nicking of the edit strand (the non-target strand) of the target DNA sequence by the prime editor. When the prime editor (e.g., by a Cas9 nickase component of a prime editor) nicks the edit strand of the target DNA sequence, a free 3’ end is formed in the edit
strand, which serves as a primer sequence that anneals to the primer binding site on the PEgRNA to prime reverse transcription. In some embodiments, the PBS is complementary to or substantially complementary to, and can anneal to a free 3’ end on the non-target strand of the double stranded target DNA at the nick site. In some embodiments, the PBS annealed to the free 3’ end on the non-target strand can initiate target-primed DNA synthesis. Protein, peptide, and polypeptide [0111] The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. 
Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the contents of which are incorporated herein by reference. Protospacer [0112] As used herein, the term “protospacer” refers to the sequence (e.g. ~20 bp) in DNA adjacent to the PAM (protospacer adjacent motif) sequence. The protospacer shares the same sequence as the spacer sequence of the guide RNA (except that a protospacer contains Thymine and the spacer sequence contains Uracil). The guide RNA anneals to the complement of the protospacer sequence on the target DNA (specifically, one strand thereof, i.e., the “target strand” versus the “non-target strand” of the target DNA sequence). In some embodiments, in order for a Cas nickase component of a prime editor to function, it also
requires a specific protospacer adjacent motif (PAM) that varies depending on the Cas protein component itself, e.g., the type of Cas protein and the bacterial species from which it is derived. The most commonly used Cas9 nuclease, derived from S. pyogenes, recognizes a PAM sequence of NGG that is directly downstream of the protospacer sequence in the genomic DNA, on the non-target strand. Protospacer adjacent motif (PAM) [0113] As used herein, the term “protospacer adjacent motif” or “PAM” refers to a DNA sequence (e.g. approximately 2-6 nucleotide sequence) that is an important targeting component of a Cas, e.g., a Cas9, nuclease. For example, in some embodiments for a Cas9 nuclease, the PAM sequence is on either strand and is downstream in the 5ʹ to 3ʹ direction of the Cas9 cut site. The canonical PAM sequence (i.e., the PAM sequence that is associated with the Cas9 nuclease of Streptococcus pyogenes or SpCas9) is 5ʹ-NGG-3ʹ, wherein “N” is any nucleobase followed by two guanine (“G”) nucleobases. In some embodiments, SpCas9 can also recognize additional non-canonical PAMs (e.g., NAG and NGA). [0114] Different PAM sequences can be associated with different Cas9 nucleases or equivalent proteins from different organisms. In addition, any given Cas9 nuclease, e.g., SpCas9, may be modified to alter the PAM specificity of the nuclease such that the nuclease recognizes an alternative PAM sequence. [0115] For example, with reference to the canonical SpCas9 amino acid sequence SEQ ID NO: 43, the PAM sequence can be modified by introducing one or more mutations, including (a) D1135V, R1335Q, and T1337R “the VQR variant”, which alters the PAM specificity to NGAN or NGNG, (b) D1135E, R1335Q, and T1337R “the EQR variant”, which alters the PAM specificity to NGAG, and (c) D1135V, G1218R, R1335E, and T1337R “the VRER variant”, which alters the PAM specificity to NGCG. 
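As a minimal sketch of how the PAM specificities listed above might be applied when scanning a strand for candidate sites, the following Python expands each degenerate PAM into a regular expression and reports where it occurs. The function and dictionary names are hypothetical, the sequence is assumed to be given 5ʹ to 3ʹ on the PAM-containing strand, the degenerate letters follow standard IUPAC usage, and the 20-nucleotide protospacer length is the approximate value given in the definition of “protospacer” above.

```python
import re

# Degenerate-base expansions needed for the PAMs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# PAM specificities as stated in the text for SpCas9 and its variants.
PAMS = {
    "SpCas9": ["NGG"],
    "VQR": ["NGAN", "NGNG"],
    "EQR": ["NGAG"],
    "VRER": ["NGCG"],
}


def pam_regex(pam: str) -> "re.Pattern[str]":
    """Compile a PAM into a lookahead regex so overlapping sites are found."""
    body = "".join(IUPAC[base] for base in pam)
    return re.compile("(?=(" + body + "))")


def find_pam_sites(strand: str, variant: str) -> list:
    """Return sorted (start_index, matched_pam) pairs on the given strand."""
    hits = set()
    for pam in PAMS[variant]:
        for match in pam_regex(pam).finditer(strand):
            hits.add((match.start(), match.group(1)))
    return sorted(hits)


def protospacer_for(strand: str, pam_start: int, length: int = 20):
    """Return the sequence immediately 5' of the PAM, or None if too short."""
    if pam_start < length:
        return None
    return strand[pam_start - length:pam_start]
```

A site list produced this way says only that a PAM is present; whether a given site is usable for prime editing also depends on the position of the nick relative to the desired edit.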
In addition, the D1135E variant of canonical SpCas9 still recognizes NGG, but it is more selective compared to the wild type SpCas9 protein. [0116] It will also be appreciated that Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) can have varying PAM specificities. For example, Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN. In addition, Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT. In another example, Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW. In still another example, Cas9 from Treponema denticola (TdCas) recognizes NAAAAC. These are examples and are not meant to be limiting. It will be further appreciated that non-SpCas9s bind a variety of PAM
sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cut site. Furthermore, non-SpCas9s may have other characteristics that make them more useful than SpCas9. For example, Cas9 from Staphylococcus aureus (SaCas9) is about 1 kilobase smaller than SpCas9, so it can be packaged into adeno-associated virus (AAV). Further reference is made to Shah et al., “Protospacer recognition motifs: mixed identities and functional diversity,” RNA Biology, 10(5): 891-899 (which is incorporated herein by reference). Reverse transcriptase [0117] The term “reverse transcriptase” describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA, which can then be cloned into a vector for further manipulation. Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5ʹ-3ʹ RNA-directed DNA polymerase activity, 5ʹ-3ʹ DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5ʹ and 3ʹ ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase because known viral reverse transcriptases lack the 3ʹ-5ʹ exonuclease activity necessary for proofreading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity has been presented by Berger et al., Biochemistry 22:2365-2372 (1983). 
Another reverse transcriptase that is used extensively in molecular biology is reverse transcriptase originating from Moloney murine leukemia virus (M-MLV or “MMLV”). See, e.g., Gerard, G. R., DNA 5:271-279 (1986) and Kotewicz, M. L., et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking in RNase H activity has also been described. See, e.g., U.S. Pat. No. 5,244,797. The invention contemplates the use of any such reverse transcriptases, or variants or mutants thereof. [0118] In addition, the invention contemplates the use of reverse transcriptases that are error-prone, i.e., that may be referred to as error-prone reverse transcriptases or reverse transcriptases that do not support high fidelity incorporation of nucleotides during polymerization. During synthesis of the single-strand DNA flap based on the RT template
integrated with the guide RNA, the error-prone reverse transcriptase can introduce one or more nucleotides that are mismatched with the RT template sequence, thereby introducing changes to the nucleotide sequence through erroneous polymerization of the single-strand DNA flap. These errors introduced during synthesis of the single strand DNA flap then become integrated into the double strand molecule through hybridization to the corresponding endogenous target strand, removal of the endogenous displaced strand, ligation, and then through one or more rounds of endogenous DNA repair and/or replication processes. The disclosure provides in some embodiments prime editor fusion proteins comprising MMLV RT. Reverse transcription [0119] As used herein, the term “reverse transcription” indicates the capability of an enzyme to synthesize a DNA strand (that is, complementary DNA or cDNA) using RNA as a template. In some embodiments, the reverse transcription can be “error-prone reverse transcription,” which refers to the properties of certain reverse transcriptase enzymes that are error-prone in their DNA polymerization activity. Second-strand nicking [0120] In some embodiments, prime editing involves the resolution of heteroduplex DNA (i.e., containing one edited and one non-edited strand) formed as a result of installation of one or more desired nucleotide changes in the edit strand but not (yet) in the non-edit strand of the target DNA sequence. Resolution of the heteroduplex DNA (the edited strand paired with the endogenous non-edited strand) and installation of nucleotide changes corresponding to the desired nucleotide edits in the non-edit strand permanently integrates the desired edits in the target DNA sequence. The approach of “second-strand nicking” can be used herein to help drive the resolution of heteroduplex DNA in favor of permanent integration of the edited strand into the DNA molecule. 
As used herein, the concept of “second-strand nicking” refers to the introduction of a second nick on the unedited strand. In some embodiments, a second nick is introduced at a location on the non-edit strand corresponding to a position downstream of the first nick (i.e., the initial nick site that provides the free 3′ end for use in priming of the reverse transcriptase on the extended portion of the guide RNA) on the edit strand. Thus, the first nick (introduced by the prime editor in combination with the PEgRNA) and the second nick (introduced by the prime editor and a second-strand nicking guide RNA) are on opposite strands. Said another way, the first nick is on the non-target strand (i.e., the strand that forms the single strand portion of the R-loop),
and the second nick is on the target strand. Said still another way, the first nick (introduced by the prime editor in combination with the PEgRNA) is on the edit strand, and the second nick (introduced by the prime editor and second strand nicking guide RNA) is on the non-edit strand. The second nick can be introduced in the non-edit strand at a position that is opposite at least 1, 2, 3, 4, or 5 nucleotides downstream or upstream of the first nick of the edit strand, or that is opposite at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 or more nucleotides downstream or upstream of the first nick of the edit strand. The second nick can also be introduced in the non-edit strand at a position that is opposite at least 1, 2, 3, 4, or 5 nucleotides downstream or upstream of the edit site of the edit strand, or that is opposite at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, or 150 or more nucleotides downstream or upstream of the edit site of the edit strand. The second nick, in certain embodiments, can be introduced in the non-edit strand at a position that is opposite about 1-150 nucleotides downstream or upstream of the first nick of the edit strand, or that is opposite about 1-140, or about 1-130, or about 1-120, or about 1-110, or about 1-100, or about 1-90, or about 1-80, or about 1-70, or about 1-60, or about 1-50, or about 1-40, or about 1-30, or about 1-20, or about 1-10 nucleotides downstream or upstream of the first nick of the edit strand. Without being bound by theory, the second nick induces the cell’s endogenous DNA repair and replication processes towards replacement of the non-edit strand, thereby permanently installing the edited sequence on both strands of the target DNA and resolving the heteroduplex that is formed as a result of PE. 
[0121] In certain embodiments, the second strand nicking guide RNA (also referred to herein as the nicking guide RNA, ngRNA, secondary nicking RNA, or second strand nicking sgRNA) may include a spacer sequence that preferentially and/or selectively only anneals to the edit strand after the desired nucleotide edit(s) are installed but not to the original strand of DNA that becomes replaced by the edited strand (i.e., the 5´ single-strand DNA flap that is displaced and ultimately removed during heteroduplex resolution). This can operate by designing the second strand nicking guide RNA to comprise a spacer sequence that anneals only to the edited region of the edited strand (and thus, wherein the spacer of the second strand nicking guide RNA comprises a nucleotide sequence that is the complement of the edited sequence or region thereof and includes the complement of the edit) and thus, can discriminate between the edited strand and the original strand of the displaced 5´ single-strand
DNA flap that is immediately downstream of the cut site of the edited strand. This can be referred to as “temporal second-strand nicking” because the second strand nicking occurs only after prime editing has generated the new 3´ DNA flap containing the desired edit. This avoids the introduction of a double strand cut during prime editing which would otherwise result from the simultaneous or approximately simultaneous cutting of opposite strands by the PE complex comprising the PEgRNA and the PE complex comprising the second-strand cutting guide RNA. Spacer sequence [0122] As used herein, the term “spacer sequence” in connection with a guide RNA or a PEgRNA refers to the portion of the guide RNA or PEgRNA of about 20 nucleotides that contains a nucleotide sequence that shares the same sequence as the protospacer sequence in the target DNA sequence. The spacer sequence anneals to the complement of the protospacer sequence to form a ssRNA/ssDNA hybrid structure at the target site and a corresponding R loop ssDNA structure of the endogenous DNA strand. Silent mutation [0123] As used herein, the term “silent mutation” refers to a mutation in a nucleic acid molecule that does not have an effect on the phenotype of the nucleic acid molecule, or the protein it produces if it encodes a protein. Silent mutations can be introduced into coding regions of a nucleic acid (i.e., segments of a gene that encode for a protein), or they can be introduced in non-coding regions of a nucleic acid. A silent mutation in a nucleic acid sequence, e.g., in a target DNA sequence or in a DNA synthesis template sequence to be installed in the target sequence, may be a nucleotide alteration that does not result in a change in the expression or function of the amino acid sequence encoded by the nucleic acid sequence, or in other functional features of the target nucleic acid sequence. When silent mutations are present in a coding region, they may be synonymous mutations. 
Synonymous mutations refer to substitutions of one base for another in a gene such that the corresponding amino acid residue of the protein produced by the gene is not modified. This is due to the redundancy of the genetic code, allowing for multiple different codons to encode for the same amino acid in a particular organism. When a silent mutation is in a noncoding region or a junction of a coding region and a non-coding region (e.g., an intron/exon junction), it may be in a region that does not impact any biological properties of the nucleic acid molecule (e.g., splicing, gene regulation, RNA lifetime, etc.). In particular embodiments, a silent mutation may also be a “benign” mutation, for example, where a nucleotide substitution results in one or more
alterations in the amino acid sequence encoded, but does not result in detrimental impact on the expression or function of the polypeptide. Silent mutations may be useful, for example, for increasing the length of contiguous changes in a desired nucleotide edit or the number of nucleotide edits made to a target nucleotide sequence using prime editing to evade correction of the edit by the MMR pathway as described herein. In certain embodiments, the number of silent mutations installed may be one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten, or more. In certain other embodiments involving at least two silent mutations, the silent mutations may be installed within one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten, or 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides from the intended edit site. Subject [0124] The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile, a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and at any stage of development. Target site [0125] The term “target site” refers to a sequence within a nucleic acid molecule that is edited by a prime editor (PE) disclosed herein. The target site further refers to the sequence within a nucleic acid molecule to which a complex of the prime editor (PE) and gRNA binds. 
Temporal second-strand nicking [0126] As used herein, the term “temporal second-strand nicking” refers to a variant of second strand nicking whereby the installation of the second-strand nick in the unedited strand occurs only after the desired edit is installed in the edited strand by the PE complexed with the PEgRNA. Without being bound by theory, the second-strand nick in the unedited strand induces the cell’s endogenous DNA repair and replication processes towards replacement of the unedited strand, thereby permanently installing the edited sequence on both strands and resolving the heteroduplex that is formed as a result of PE. In some embodiments, a prime editor system comprises a second strand nicking guide RNA designed
with the temporal second strand nicking strategy, which can avoid concurrent nicks on both strands that could lead to double-stranded DNA breaks. The second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing a gRNA with a spacer sequence that matches only the edited strand, but not the original allele. Using this strategy, mismatches between the spacer of the second-strand nicking guide RNA and the unedited allele should disfavor second-strand nicking until after the editing event on the PAM strand takes place. In certain embodiments, the second strand nicking guide RNA may include a spacer sequence that preferentially and/or selectively only anneals to the edited strand (i.e., after PE synthesizes the edit), but not to the original strand of DNA that becomes replaced by the edited strand (i.e., the 5´ single-strand DNA flap that is displaced and ultimately removed during heteroduplex resolution). This can operate by designing the second strand nicking guide RNA to comprise a spacer sequence that anneals only to the edited region of the edited strand (and thus, wherein the spacer of the second strand nicking guide RNA comprises a nucleotide sequence that is the complement of the edited sequence or region thereof and includes the complement of the edit) and thus, can discriminate between the edited strand and the original strand of the displaced 5´ single-strand DNA flap that is immediately downstream of the cut site of the edited strand. This avoids the introduction of a double strand cut during prime editing which would otherwise result from the simultaneous or approximately simultaneous cutting of opposite strands by the PE complex comprising the PEgRNA and the PE complex comprising the second-strand cutting guide RNA. 
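The discrimination described above can be illustrated with a short Python sketch that counts the mismatches a candidate second-strand nicking spacer would have against the edited versus the original edit strand. This is a simplified model under stated assumptions: sequences are written with DNA letters, the two strand regions have equal length (substitution edits only), the function names and example sequences are hypothetical, and the mismatch threshold in `permits_nick` is an arbitrary illustrative parameter rather than a value taught herein.

```python
# Toy model of spacer discrimination for temporal second-strand nicking.
# Names, sequences, and the mismatch threshold are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def design_nicking_spacer(edited_region: str) -> str:
    """Spacer that is the complement of the edited region of the edit strand."""
    return reverse_complement(edited_region)


def mismatches_when_annealed(spacer: str, strand_region: str) -> int:
    """Count mismatched positions when the spacer anneals to strand_region."""
    if len(spacer) != len(strand_region):
        raise ValueError("spacer and strand region must have equal length")
    perfect_partner = reverse_complement(spacer)
    return sum(a != b for a, b in zip(perfect_partner, strand_region))


def permits_nick(spacer: str, strand_region: str, max_mismatches: int = 0) -> bool:
    """Hypothetical rule: allow nicking only at or below the mismatch threshold."""
    return mismatches_when_annealed(spacer, strand_region) <= max_mismatches


original_region = "GATACCTGATGGCATCA"   # edit strand before editing
edited_region = "GATACCTCATGGCATCA"     # same region after a G-to-C edit
spacer = design_nicking_spacer(edited_region)
```

Under this model the spacer pairs perfectly with the edited region and carries one mismatch against the original region, which is the property the temporal strategy relies on; real discrimination depends on the position and number of mismatches and is not captured by a simple count.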
[0127] In some embodiments, a prime editor system (e.g., a PE3b system or a PE5b system) comprises components that improve temporal second-strand nicking by including PE-based installation of one or more silent mutations around an edit site (e.g., introducing one or more silent mutations located upstream and/or downstream of a non-silent, desired nucleotide edit or adjacent to the non-silent nucleotide edit). In some embodiments, a prime editor system comprises a pegRNA, the DNA synthesis template of which comprises one or more non-silent nucleotide edits and further comprises one or more silent mutations compared to the endogenous sequence of the target strand (and accordingly encodes a single stranded DNA comprising the one or more non-silent nucleotide edits and the silent mutations compared to the endogenous sequence of the edit strand). In some embodiments, the one or more silent mutations are adjacent to or immediately adjacent to a non-silent nucleotide edit in the DNA synthesis template. For example, in some embodiments, the one or more silent mutations are
within 5 nucleotides upstream of the non-silent nucleotide edit. In some embodiments, the one or more silent mutations are within 5 nucleotides downstream of the non-silent nucleotide edit. In some embodiments, the one or more silent mutations are immediately adjacent to the non-silent nucleotide edit, such that the DNA synthesis template contains at least 3 contiguous nucleotides that are not complementary to the corresponding endogenous sequence downstream of the nick site on the edit strand of the target DNA sequence. Without wishing to be bound by a particular theory, such silent mutations may improve prime editing efficiency by evading the cellular mismatch repair pathway and thereby avoiding reversion of the PE-installed edit on the edit strand back to the pre-edited sequence. In some embodiments, a prime editor system comprising a pegRNA with the one or more silent mutations in addition to the non-silent mutation in the DNA synthesis template can result in improved editing efficiency of the target DNA, as compared to a control prime editor system comprising a pegRNA that only contains the non-silent mutation and not the one or more silent mutations in the DNA synthesis template. In some embodiments, combining PE3b designs with the silent mutations can further improve prime editing efficiency and/or reduce indel frequency resulting from editing. This can operate by designing a second strand nicking guide RNA that comprises a spacer sequence that anneals only to the edited strand, which includes not only a desired edit, but also the one or more silent mutations that are installed at proximal continuous or non-continuous positions near the desired edit. The single-strand nicking guide RNA, which comprises a spacer sequence that is complementary to the PE-edited strand, can discriminate between the edited strand and the original strand which corresponds to the displaced 5´ single-strand DNA flap that is immediately downstream of the first nick site of the edited strand. 
This improved strategy of temporal second-strand nicking avoids the introduction of a double strand cut during prime editing which would otherwise result from the simultaneous or approximately simultaneous cutting of opposite strands by the PE complex comprising the PEgRNA and the PE complex comprising the second-strand cutting guide RNA. [0128] The silent mutations may be installed in coding regions of the target nucleic acid molecule or in non-coding regions of the target nucleic acid molecule. When the silent mutations are installed in a coding region, they introduce into the nucleic acid molecule one or more alternate codons encoding the same amino acid as the unedited nucleic acid molecule. Alternatively, when the silent mutations are installed in a non-coding region, the silent mutations may be present in a region of the nucleic acid molecule that does not
influence splicing, gene regulation, RNA lifetime, or other biological properties of the target site on the nucleic acid molecule. Treatment [0129] As used herein, the terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence. Variant [0130] As used herein, the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature, e.g., a variant Cas9 is a Cas9 comprising one or more changes in amino acid residues as compared to a wild type Cas9 amino acid sequence. The term “variant” encompasses homologous proteins having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence. 
The term also encompasses mutants, truncations, or domains of a reference sequence that display the same or substantially the same functional activity or activities as the reference sequence. Vector [0131] The term “vector,” as used herein, refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter a host cell, mutate, and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.
Wild type [0132] As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene, or characteristic as it occurs in nature as distinguished from mutant or variant forms. DETAILED DESCRIPTION [0133] CRISPR-Cas systems allow for manipulation of genes in living systems with unprecedented speed, convenience, and programmability
1,2. CRISPR-derived editing agents for basic research have revolutionized the understanding of biological systems, and have also been used ex vivo and in vivo to treat patients with sickle cell disease, β-thalassemia, and transthyretin amyloidosis
3,4. The reliance of early gene editing techniques on double-stranded DNA breaks (DSBs), however, limits the types of edits that can be made with programmable nucleases such as CRISPR-Cas9 primarily to those that disrupt or delete genes. In addition, DSBs can also result in a variety of undesirable outcomes, such as unwanted mixtures of insertions and deletions (indels) at the target site, translocations
5–8, large deletions
9,10, aneuploidy
11,12, chromothripsis
9,13, and p53 activation that can enrich oncogenic cells
14. While homology-directed repair (HDR) using DSBs and donor DNA templates has been successfully used to correct, rather than disrupt, mutations in cell types including stem cells and T cells
15–17, HDR-mediated correction has proven inefficient in most therapeutically relevant cell types due to the cell-cycle dependence of cellular machinery required for HDR. [0134] The difficulty of correcting genes using nucleases limits the ability to study and potentially treat genetic diseases, most of which require targeted gene correction, rather than gene disruption, for treatment. These considerations have stimulated the development of precision programmable gene correction technologies that do not require cutting the DNA double helix. One such example of a DSB-free gene editing method that can mediate gene correction, in addition to gene disruption, is base editing. Cytosine base editors (CBEs) and adenine base editors (ABEs) can precisely install C•G-to-T•A mutations and A•T-to-G•C mutations, respectively, without requiring DSBs
2,18–21. Base editors have been used both ex vivo and in vivo to rescue animal models of sickle cell disease
22, Hutchinson-Gilford Progeria
23, and several other genetic diseases
24, but are limited to the installation of transition point mutations and, in some cases, C•G-to-G•C transversions
25–29. [0135] To further expand the scope of precise gene correction without requiring DSBs, prime editing was developed
15. Prime editors (PEs) enable precise, highly versatile substitution, insertion, deletion, or combination edits through a DSB-independent mechanism
15. A prime
editor protein comprises a nucleic acid programmable DNA binding protein (napDNAbp), e.g., a Cas9 nickase, and a DNA polymerase, e.g., a reverse transcriptase (RT). The original prime editor, PE1, is composed of a Cas9(H840A) nickase fused to the M-MLV reverse transcriptase (RT) and uses a modified sgRNA called a prime editing guide RNA (pegRNA). A pegRNA comprises an additional extension arm, e.g., a 3′ extension arm compared to a “traditional” CRISPR guide RNA. In some embodiments, the extension arm comprises a DNA synthesis template, which encodes one or more desired nucleotide edits, and a primer- binding site (PBS), which comprises a region of complementarity to a sequence in the edit strand of the target DNA sequence, e.g., a sequence in the edit strand that is upstream of a nick site generated by the prime editor. When contacted with a double stranded target DNA sequence, the spacer of the pegRNA targets the prime editor protein to a specific target site, e.g., a specific target locus in genomic DNA in a cell. In some embodiments, the prime editor, e.g., through a Cas9 nickase domain, then binds and nicks the target DNA, exposing a free 3′ end. The PBS of the pegRNA then anneals to this 3′ end, and the DNA polymerase, e.g., the RT domain of the prime editor, uses the resulting DNA/RNA duplex as a substrate. The target DNA 3′ end serves as a primer, and the DNA polymerase, e.g., the RT, extends the free 3′ end, synthesizing a single stranded DNA sequence encoded by the DNA synthesis template (in cases where the polymerase is a RT, the DNA synthesis template is a reverse transcription template, or RTT) of the pegRNA. The resulting newly synthesized DNA 3′ flap contains the desired nucleotide edit (a substitution, insertion, deletion, or a combination thereof), optionally followed by downstream homology. 
In some embodiments, flap equilibration between the newly synthesized single stranded DNA (i.e., the DNA 3′ flap) and the corresponding endogenous sequence in the edit strand results in hybridization of the edited 3′ flap onto the unedited complementary target strand. Subsequent DNA repair, including the cell’s innate propensity to cleave 5′ DNA flaps, incorporates the edit into both target DNA strands (FIG.1). The PE2 prime editor uses an engineered RT that contains five mutations that together strongly increase the efficiency of prime editing. [0136] Prime editing intermediates may be susceptible to cellular mismatch repair (MMR), which can reduce prime editing efficiency by reverting the edited DNA strand back to the endogenous sequence
15,30. In some embodiments, a prime editing system comprises a prime editor protein, a pegRNA, and further comprises a second strand nicking guide RNA (ngRNA) that comprises a ngRNA spacer and a scaffold, wherein the ngRNA spacer comprises a region of complementarity to the edit strand of the double stranded target DNA
sequence. In some embodiments, the prime editing system is a PE3 system that comprises a PE2 protein, a pegRNA, and a second strand nicking guide RNA (ngRNA) that comprises an ngRNA spacer comprising a region of complementarity to the edit strand. In some embodiments, the ngRNA is capable of directing the prime editor, e.g., through the nicking activity of a Cas9 nickase component of the prime editor, to generate a second nick on the non-edit strand. Without wishing to be bound by theory, second strand nicking can mitigate the possibility of reverting the edited DNA strand back to the endogenous target DNA sequence. Because no 3′ extension is included on the ngRNA, a prime editor that engages this sgRNA only nicks the non-edited strand. Due to the nick-directed nature of eukaryotic MMR
18, the additional nick biases outcomes towards replacement of the nicked non-edited strand using the edited strand as a template
15. In some embodiments, PE3 achieves higher editing efficiency than PE2. Subsequent versions of prime editors, PE4 and PE5, transiently inhibit MMR to bias outcomes in favor of editing while also minimizing indels
30 (described in the prime editing developments section below). [0137] Compared to DSB-mediated genome editing techniques, prime editing offers a much higher editing:indel ratio and is less dependent on cellular repair pathways. Efficient prime editing has been demonstrated in many cell types, including primary cortical neurons, T cells, iPSCs, and patient-derived fibroblasts
15,30,31. Additionally, because the desired edit is encoded in the pegRNA, delivery of an exogenous DNA template is not required, which simplifies basic research experiments and greatly facilitates in vivo delivery. Finally, off-target edits are minimized in prime editing. Cas9-dependent off target editing is much less frequent with prime editors than with Cas9 nuclease
15,32–34, likely because prime editing requires three distinct DNA hybridization events with the spacer, PBS, and 3′ homology encoded by the pegRNA in order for productive editing to take place, and each event provides an opportunity to reject an off-target sequence. Additionally, three recent studies did not detect any Cas9- independent off target edits from prime editing, as measured by clonal whole-genome sequencing of edited human stem cell-derived intestinal and liver organoids, embryonic stem cells, and rice plants
33,35,36. Overall, prime editing offers versatile, efficient, and precise genome editing across many cell types with minimal off-target edits. The protocol described herein details how to use prime editing in mammalian cells and how to choose a prime editing system that is well-matched for a given application.
Methods for optimizing PE efficiency for a specific edit [0138] In some aspects, the present disclosure provides methods for optimizing prime editing efficiency for a particular target edit of interest (see, for example, FIG.15). Generally, there are four main categories that can be optimized when making an edit using prime editing: (1) pegRNA design, including, for example, the length of various components of the pegRNA such as the RTT and the PBS, as well as the addition of various motifs to the pegRNA (e.g., as used in epegRNAs); (2) selection of the prime editing system (e.g., selecting either PE2, PE3, PE4, or PE5, which may each have benefits under particular circumstances as discussed herein); (3) selection of the prime editor architecture (e.g., using the PEmax architecture in either PE2max, PE3max, PE4max, or PE5max); and (4) installation of silent mutations (e.g., for inhibiting MMR to avoid reversion of the installed edit, to introduce a protospacer for a second strand nicking sgRNA in a PE3b approach as discussed herein, and/or to introduce a noncanonical PAM that can be recognized by a second strand nicking sgRNA as discussed herein). Each of these decisions is dependent on the edit, the target cell type, the delivery method, and various other aspects of the particular prime editing experiment. Thus, in some aspects, the present disclosure provides guidelines for making these decisions. In some embodiments, the present disclosure provides methods for testing and selecting pegRNAs and ngRNAs for editing a target DNA sequence. [0139] In some aspects, the methods provided herein comprise designing optimized pegRNAs for a particular target of interest as described above. In some embodiments, it is beneficial to use epegRNAs over unmodified pegRNAs due to their increased efficiency. In some embodiments, an epegRNA comprises five components: a spacer, a scaffold, an RTT, a PBS, and a tevopreQ1 motif (FIG.2). 
In some embodiments, the scaffold and tevopreQ1 portions are constant, but the spacer, PBS, and RTT may be optimized for each editing target. In some embodiments, the epegRNA modification, e.g., a tevopreQ1 modification, is included in all pegRNA designs screened. [0140] Thus, in some embodiments, the methods described herein comprise a step of testing the efficiency of installation of a target edit of interest by a prime editor using two or more PEgRNAs, wherein each PEgRNA comprises a spacer sequence, a scaffold, a primer binding site (PBS) and a reverse transcriptase template (RTT). In some embodiments, PEgRNAs with different spacers can be tested to identify optimal PEgRNAs. In some embodiments, each of the two or more PEgRNAs comprises the same spacer sequence. In some embodiments, the RTT of each of the two or more PEgRNAs comprises the same nucleotide edit(s) to be
installed into the target DNA sequence. In some embodiments, each of the two or more pegRNAs comprises a different RTT length and/or PBS length. [0141] In some embodiments, the method of designing and selecting a pegRNA for a target DNA sequence involves scanning the target locus for candidate protospacer sequences that are immediately 5′ of an appropriate PAM sequence (e.g., NGG for SpCas9). For example, in some embodiments, the prime editor comprises a Cas9 nickase, and the corresponding pegRNA should be designed to install nucleotide edit(s) 3′ of the nick induced by the Cas9 domain of the prime editor. For editing with a prime editor having a Cas9 nickase, as a frame of reference, the first base 3′ of the epegRNA-induced nick—the first editable base—can be considered the +1 position. In some embodiments, targeting protospacers more proximal to the desired nucleotide edit position yields higher editing efficiencies compared to nucleotide edits that are distal from the protospacer. In some embodiments, the desired nucleotide edit is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides downstream of the nick site (i.e., at position +1, +2, +3, +4, +5, +6, +7, +8, +9, +10, +11, +12, +13, +14, +15, +16, +17, +18, +19, +20, +21, +22, +23, +24, or +25). In some embodiments, the desired nucleotide edit is at most 5, at most 10, at most 15, or at most 20 nucleotides downstream of the nick site. In some embodiments, the desired nucleotide edit is less than 5 nucleotides downstream of the nick site. In some embodiments, the desired nucleotide edit is less than 10 nucleotides downstream of the nick site. In some embodiments, ideal candidate protospacer sequences are chosen as close to the desired editing site as possible while keeping the target site in the editable region of a prime editor, e.g., a Cas9 prime editor (i.e., 3′ of the nick, see FIG.3). 
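As a non-limiting illustration of the protospacer scan and the +1 numbering convention described above, the following Python sketch lists candidate 20-nt protospacers followed by an NGG PAM and reports where a desired edit falls relative to the nick. It relies on the commonly reported property that SpCas9 cuts 3 nt 5′ of the PAM, i.e., between protospacer positions 17 and 18. The function name `find_protospacers`, its parameters, and the 0-based `edit_index` convention are hypothetical. For brevity, only the strand as written is scanned; a practical design tool would also scan the reverse complement.

```python
import re


def find_protospacers(seq: str, edit_index: int, max_distance: int = 25):
    """Return (protospacer, edit_position) pairs for NGG sites on the given strand.

    edit_index is the 0-based index of the desired edit in seq.
    edit_position is relative to the nick, with the first base 3' of the
    nick counted as +1. Sites whose edit position falls outside
    +1..+max_distance are discarded.
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    for match in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", seq):
        start = match.start()
        plus_one_index = start + 17  # first base 3' of the SpCas9 nick
        edit_position = edit_index - plus_one_index + 1
        if 1 <= edit_position <= max_distance:
            hits.append((match.group(1), edit_position))
    # Protospacers that place the edit closest to the nick are listed first.
    return sorted(hits, key=lambda hit: hit[1])
```

Sorting by edit position reflects the preference stated above for protospacers proximal to the desired edit; the default cutoff of 25 nt mirrors the enumerated positions +1 to +25 and can be adjusted.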
[0142] In some embodiments, a target site and protospacer sequence is chosen such that the 5′ most nucleotide of a pegRNA (e.g., for a pegRNA having the configuration 5′-spacer sequence-scaffold-RTT-PBS, the 5′ most nucleotide of the spacer sequence) is a guanine (G). In some embodiments, e.g., where the 5′ most nucleotide of a chosen spacer is not G, a 5′ G is added at the 5′ end of the pegRNA spacer to ensure efficient initiation of transcription from a U6 RNA polymerase III promoter. [0143] After identifying candidate protospacers, PBS and RTT lengths may be optimized. In some embodiments, these lengths are optimized empirically for a specific edit to maximize editing efficiency. Various PBS and RTT lengths may be screened. In some embodiments, an optimal PBS is 8 to 15 nt in length. In some embodiments, an optimal RTT is 10 to 74 nt in length. In some embodiments, the methods described herein comprise examining a matrix of
PBS and RTT length combinations for each protospacer. For example, in some embodiments, PBS lengths of 10, 13, and 15 nt may be screened for a particular edit site. [0144] Unlike the PBS, the RTT design is dictated by the edit to be installed
15. For small changes such as SNPs, the shortest RTT length tested should generally encode at least approximately 7 nt of homology downstream of the edit to promote hybridization to the complementary genomic strand. In some embodiments comprising installation of larger edits (e.g., the insertion of epitope tags), a longer stretch of downstream homology (e.g., ~20 nt minimum) may be used. In some embodiments, two longer RTT lengths (e.g., ~4-10 nt longer than the minimum) may be tested as well. This creates a 3 PBS x 3 RTT matrix, representing 9 epegRNAs total for a first-pass assessment. This process is summarized in FIGs.3 and 15. In some embodiments, the methods described herein are performed in an immortalized cell line. In some embodiments, the methods are performed in an immortalized human cell line. In some embodiments, the methods are performed in a cell line capable of achieving stable growth and/or efficient transfection, e.g., transfection with a trans-gene bearing plasmid. Such cell lines may be referred to as “workhorse cell lines.” In some embodiments, a workhorse cell line exhibits plasmid transfection efficiency of at least 70%, 80%, 85%, 90%, 95%, or 99% when tested with the calcium phosphate transfection method as described in Kingston et al., Curr. Protoc. Mol. Biol. Chapter 9: Unit 9.1 (2003). In certain embodiments, the workhorse cell line comprises HEK293 cells. In some embodiments, the workhorse cell line comprises HEK293T cells (e.g., for human targets). In certain embodiments, the workhorse cell line comprises N2A cells (e.g., for murine targets). In some embodiments, the pegRNAs comprising various RTT and PBS lengths are screened on the exact target sequence for editing. In certain embodiments, a cell line that harbors the target mutation is created to screen the pegRNAs. 
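The 3 PBS x 3 RTT first-pass matrix described above can be sketched as follows, for illustration only. The sketch assumes that the already-edited sequence of the edit strand is supplied 5′ to 3′ together with the 0-based index of the first base 3′ of the nick, and that the 3′ extension is ordered 5′-RTT-PBS-3′. All function names, the default RTT increments of 4 and 8 nt, and the use of DNA letters (T rather than U) are assumptions made for the example.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def rtt_lengths_for_snp(edit_position: int, homology: int = 7, extras=(0, 4, 8)):
    """RTT lengths covering +1..+edit_position plus downstream homology.

    The minimum length keeps about 7 nt of homology 3' of the edit; the
    two longer lengths are hypothetical choices within the ~4-10 nt range.
    """
    return tuple(edit_position + homology + extra for extra in extras)


def extension_matrix(edited_strand: str, nick: int,
                     pbs_lengths=(10, 13, 15), rtt_lengths=(10, 14, 18)):
    """Map (pbs_len, rtt_len) to a 3' extension written 5'-RTT-PBS-3'.

    nick is the 0-based index of the first base 3' of the nick (+1).
    """
    designs = {}
    for pbs_len, rtt_len in product(pbs_lengths, rtt_lengths):
        if nick - pbs_len < 0 or nick + rtt_len > len(edited_strand):
            raise ValueError("sequence too short for the requested lengths")
        # The PBS pairs with the bases immediately 5' of the nick.
        pbs = revcomp(edited_strand[nick - pbs_len:nick])
        # The RTT templates the edited bases immediately 3' of the nick.
        rtt = revcomp(edited_strand[nick:nick + rtt_len])
        designs[(pbs_len, rtt_len)] = rtt + pbs
    return designs
```

For a substitution at position +3, `extension_matrix(edited_strand, nick, rtt_lengths=rtt_lengths_for_snp(3))` yields nine candidate extensions, one for each PBS and RTT length combination.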
[0145] In some embodiments, the methods described herein comprise optimizing the pegRNAs such that they do not comprise four or more consecutive uridines in the pegRNA sequence (e.g., to avoid premature truncation when expressed from a U6 promoter). In some embodiments, the sequences of the spacer, PBS, and RTT avoid poly(U) tracts. In some embodiments, the methods described herein comprise ensuring that the RTT sequence does not begin with a cytosine. In some embodiments, the methods described herein optionally further comprise optimizing the RTT and PBS lengths one or more additional times by testing varying RTT and/or PBS lengths.
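The sequence checks described in this paragraph can be expressed as a short Python sketch, provided for illustration only. The function name `pegrna_warnings` and the warning strings are hypothetical. Inputs may be written with T or U. The RTT and PBS are checked as one joined extension so that a uridine run spanning their junction is also detected.

```python
import re


def pegrna_warnings(spacer: str, rtt: str, pbs: str) -> list[str]:
    """Flag poly(U) tracts and an RTT that begins with cytosine."""
    warnings = []
    extension = rtt + pbs  # written 5'-RTT-PBS-3'
    for name, seq in (("spacer", spacer), ("extension (RTT+PBS)", extension)):
        rna = seq.upper().replace("T", "U")
        # Four or more consecutive uridines may terminate U6-driven transcription.
        if re.search(r"U{4,}", rna):
            warnings.append(f"{name}: contains four or more consecutive U")
    if rtt.upper().startswith("C"):
        warnings.append("RTT: begins with C")
    return warnings
```

An empty list indicates that none of the checks was triggered; it does not indicate that a design will be efficient, which is determined empirically as described herein.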
[0146] In some aspects, the methods described herein further comprise screening one or more additional parameters of the pegRNA and/or prime editor. For example, five prime editing systems have been reported to date. PE2, PE3, PE4, and PE5 (including the PE3b and PE5b systems and the PEmax systems described herein) can each be favored for various applications as described further herein. Thus, in some embodiments, the methods described herein further comprise testing pegRNAs with PE2, PE3, PE3b, PE4, PE5, and/or PE5b (and/or each of these prime editors comprising the PEmax architecture) to determine which prime editor provides the optimal editing efficiency for the desired modification. [0147] In some embodiments, the methods described herein further comprise designing a secondary nicking guide (e.g., when using the PE3 and PE5 systems). Several nicking guide spacer sequences may be tested to maximize editing efficiency while minimizing the incorporation of indels. In some embodiments, the optimal secondary nick is approximately 50-90 nt upstream or downstream of the pegRNA (e.g., an epegRNA)-induced first nick. In some embodiments, if a PAM is positioned near the desired edit, a PE3b/PE5b nicking sgRNA, which only nicks after prime editing occurs, may be used. In some embodiments, the spacer sequence of the nicking sgRNA is designed such that it overlaps with the edited base(s) on the other strand (e.g., as shown in FIG.5). In some embodiments, the nicking sgRNA comprises a 5′ G at the start of the spacer for transcription initiation. In some embodiments, the nicking sgRNA is re-optimized after transitioning between different cell lines, such as from a workhorse cell line to a target cell line. [0148] In some embodiments, the methods comprise further testing the selected pegRNAs in PE4 and/or PE5 systems. In PE4 and PE5, an extra plasmid or other construct providing MLH1dn may be added to the transfection mixture. 
The addition of MLH1dn can drastically improve editing efficiency for the same edit in a more MMR-competent cell type. Therefore, even if using PE4 or PE5 in initial screening in a workhorse cell line, e.g., HEK293T cells, shows modest benefits, these PE systems may be tested again later on in the target cell type. Using MMR-evading silent edits to design PE3b second strand nicking guide RNA spacers [0149] As described herein, the inclusion of continuous or semi-continuous silent edits near a prime edit (e.g., contiguous silent edits immediately adjacent to a desired non-silent edit, for example, resulting in three or more contiguous nucleotides in the DNA synthesis template or the newly synthesized 3′ DNA flap that are different from the endogenous target DNA sequence, or one or more or two or more nucleotide edits within 5 nucleotides upstream or
downstream of a desired non-silent edit) can increase edit installation efficiency and reduce indels. In some aspects, the present disclosure provides methods of prime editing in which the silent mutations used to evade MMR also enable the design of second strand nicking guide RNA (ngRNA) spacers for a PE3b approach. As described herein, in a PE3b approach, the nick on the target strand only occurs after the desired nucleotide edit(s) has been installed on the edited strand, but before the edit has been incorporated into the non-edit strand. In some embodiments, prime editing with the PE3b approach results in a reduction in indels compared to a PE3 approach. [0150] In some embodiments, one or more or several MMR-evading silent edits can be installed around a desired edit, e.g., a non-silent nucleotide edit. PE3b ngRNA spacers can be designed to use these silent edits to more effectively discriminate between edited and unedited DNA strands. Without wishing to be bound by theory, PE3b ngRNAs that comprise a spacer having complementarity to the edited edit strand of the target DNA comprising the one or more installed silent edits in addition to the non-silent nucleotide edit are less likely to unintentionally nick the target strand DNA before a prime edit has been installed (compared to a PE3b ngRNA comprising a spacer sequence that relies only on the non-silent nucleotide edit, e.g., a single nucleotide edit, to selectively nick after an edit has been installed). Because these newly described PE3b nicks rely on the installation of several silent edits to make nicking possible, the temporal order of nicking is tightly controlled (compared to a PE3b strategy that only relies on a single edit). As a result, nicking only occurs after the initial pegRNA edit has occurred, greatly reducing indels. The effect of this enhanced selectivity may be a reduced rate of observed indels. 
Accordingly, in some embodiments, provided herein are methods of prime editing comprising contacting a target DNA with a prime editing composition comprising a prime editor protein, a pegRNA, and an ngRNA, wherein the pegRNA comprises a DNA synthesis template comprising at least a non-silent nucleotide edit and one or more silent nucleotide edits compared to the endogenous sequence of the target DNA, and wherein the ngRNA comprises a spacer sequence that comprises a region of complementarity to a sequence in the edit strand of the target DNA that comprises at least one of the one or more silent edits. In some embodiments, the ngRNA comprises a spacer sequence that comprises a region of complementarity to a sequence in the edit strand of the target DNA that comprises at least the non-silent edit and at least one of the one or more silent edits. In some embodiments, the ngRNA comprises a spacer sequence that comprises a region of complementarity to a sequence in the edit strand of the target DNA that comprises
at least two of the one or more silent edits. In some embodiments, contacting the target DNA with a prime editing system comprising 1) the pegRNA comprising a DNA synthesis template comprising a non-silent edit and one or more silent edits compared to the endogenous sequence of the target DNA and 2) the ngRNA comprising a spacer sequence comprising a region of complementarity to a sequence in the edit strand of the target DNA that comprises the one or more silent edits results in reduced indel frequency compared to contacting the target DNA with a control prime editing system comprising a pegRNA having a DNA synthesis template only having the non-silent edit and an ngRNA that comprises a spacer sequence comprising a region of complementarity to a sequence in the edit strand comprising the non-silent edit (but no silent edit introduced). In some embodiments, contacting the target DNA with a prime editing system comprising 1) a pegRNA comprising a DNA synthesis template comprising a non-silent edit and one or more silent edits and 2) the ngRNA comprising a spacer sequence comprising a region of complementarity to a sequence in the edit strand of the target DNA that comprises the one or more silent edits results in an editing efficiency higher than contacting the target DNA with a control prime editing system comprising a pegRNA having a DNA synthesis template only having the non-silent edit and an ngRNA that comprises a spacer sequence comprising a region of complementarity to a sequence in the edit strand comprising the non-silent edit (but no silent edit introduced). In some embodiments, the prime editing system comprises a pegRNA and a PE3b (or PE5b) ngRNA, wherein the pegRNA comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more silent edits. In some embodiments, the silent edits are contiguous. In some embodiments, at least 2, 3, 4, 5, or more silent edits are contiguous. In some embodiments, each silent edit is a silent nucleotide substitution. 
In some embodiments, contacting the target DNA with a prime editing system comprising 1) a pegRNA comprising a DNA synthesis template comprising a non-silent edit and one or more silent edits and 2) the ngRNA comprising a spacer sequence comprising a region of complementarity to a sequence in the edit strand of the target DNA that comprises the one or more silent edits results in an indel frequency of at most 10%, at most 7.5%, at most 5%, at most 2.5%, at most 2%, at most 1.5%, at most 1%, or at most 0.5%. In some embodiments, contacting the target DNA with a prime editing system comprising 1) a pegRNA comprising a DNA synthesis template comprising a non-silent edit and one or more silent edits and 2) the ngRNA comprising a spacer sequence comprising a region of complementarity to a sequence in the edit strand of the target DNA that comprises the one or more silent edits results in an editing efficiency of at least 10%, at least 15%, at
least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, or at least 90%. In some embodiments, editing efficiency and/or indel frequency of contacting a target DNA is determined by contacting the target DNA in a population of cells, e.g., a population of cells containing the target DNA in the genome, and calculating the percentage of editing and indels based on high-throughput sequencing of the population of cells, e.g., by MiSeq. [0151] The approach for incorporating silent MMR-evading edits that allow for nicking sgRNA spacer designs described herein may be generalizable to any site where continuous or semi-continuous MMR-evading silent edits are used. In some embodiments, possible PE3b protospacers that overlap the installed silent and non-silent edits (e.g., corrective edits that correct a disease associated mutation in a gene) may be designed, incorporated into pegRNAs, and tested. In some embodiments, the number of silent edits in the PE3b protospacer is maximized to increase the selectivity of the PE3b nicking sgRNA for nicking only after all silent edits have been installed. Such an approach may lead to reduced rates of indel observation through the identification of PE3b sgRNAs that are highly selective for editing only after pegRNA edits have been installed. Using noncanonical PAMs to allow for PE3 and PE3b ngRNA designs [0152] In some embodiments, prime editing systems rely on the PAM specificity of the napDNAbp component of the prime editor, e.g., a Cas9 nickase derived from WT SpCas9, to identify protospacers for pegRNAs and nicking sgRNAs. While SpCas9’s canonical PAM preference is NGG, it is also capable of recognizing non-canonical PAMs (e.g., NAG and NGA).
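As a rough illustration of how candidate nicking protospacers adjacent to canonical (NGG) or non-canonical (NAG, NGA) SpCas9 PAMs might be enumerated on one strand of a target sequence, consider the following minimal Python sketch. The function name, the 20-nucleotide protospacer length, and the example sequence are assumptions made for illustration only and are not taken from this disclosure.

```python
import re

# Illustrative PAM patterns; "N" (any nucleotide) is written as a character class.
PAM_PATTERNS = {"NGG": "[ACGT]GG", "NAG": "[ACGT]AG", "NGA": "[ACGT]GA"}


def find_protospacers(seq, pam="NGG", length=20):
    """Return (start, protospacer, pam_sequence) tuples for one strand.

    A hit is reported when a PAM match is preceded by at least `length`
    nucleotides, so the protospacer lies directly 5' of the PAM.
    `length=20` is an assumed, commonly used value, not a requirement.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping PAM matches are all reported.
    pattern = re.compile("(?=(%s))" % PAM_PATTERNS[pam])
    hits = []
    for match in pattern.finditer(seq):
        pam_start = match.start()
        if pam_start >= length:
            start = pam_start - length
            hits.append((start, seq[start:pam_start], match.group(1)))
    return hits


# Hypothetical example sequence containing one NAG site and one NGA site.
example = "ACGT" * 5 + "CAG" + "TGA"
for pam in ("NGG", "NAG", "NGA"):
    print(pam, find_protospacers(example, pam))
```

The sketch scans only the strand it is given; to look for protospacers on the opposite strand, the reverse complement of the sequence would be scanned in the same way. It says nothing about nicking activity at a given site, which would still need to be tested experimentally.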
[0153] As described further herein, it was discovered by the inventors that second strand nicking sgRNAs can utilize non-canonical PAMs of SpCas9, such as NAG, to introduce a secondary nick. Prior to the present disclosure, the use of a non-canonical PAM for a secondary nicking sgRNA had not been considered for PE3, PE3b, PE5, and PE5b approaches. Introducing non-canonical PAMs into consideration for PE3, PE3b, PE5, and PE5b secondary nicks broadens the potential options of viable nicking sgRNAs that can be used. Previously, the availability of second strand nicking sgRNAs in PE3, PE3b, PE5, and PE5b approaches was limited because many prime editing systems were designed based on SpCas9’s recognition preference for the canonical NGG PAM. The observation that these
prime editors can also take advantage of WT SpCas9’s non-canonical PAMs such as NAG and NGA, as described herein, broadens the possibility of available nicking sgRNAs. [0154] Thus, in some aspects, the present disclosure provides methods for prime editing that comprise introducing MMR-evading silent mutations that double as a non-canonical PAM that can be recognized by a second strand nicking sgRNA. The present disclosure contemplates the use of a nicking protospacer with a non-canonical PAM for PE3, PE3b, PE5, and PE5b approaches. The consideration of nicking protospacers with a non-canonical PAM for PE3, PE3b, PE5, and PE5b approaches will increase the number of available nicking sgRNAs to screen and improve the chances of finding an optimal nick for a given application. Accordingly, provided herein are methods of prime editing comprising contacting a double-stranded target DNA sequence with a prime editing system, wherein the prime editing system comprises: (i) a prime editor comprising a napDNAbp and a reverse transcriptase, (ii) a prime editing guide RNA (PEgRNA) comprising: (a) a spacer sequence that comprises a region of complementarity to the non-edit strand of the double-stranded target DNA sequence, (b) an extension arm that comprises a DNA synthesis template and a primer binding site, wherein the primer binding site comprises a region of complementarity to a region upstream of a first nick site in the edit strand of the double-stranded target DNA sequence, and wherein the DNA synthesis template encodes a single strand DNA that comprises one or more nucleotide changes compared to a region downstream of the first nick site in the edit strand of the double-stranded target DNA sequence, and (c) a gRNA core that interacts with the napDNAbp; and (iii) a nicking guide RNA (ngRNA) comprising: (a) a spacer sequence that comprises a region of complementarity to the single strand DNA encoded by the DNA synthesis template that comprises the one or more nucleotide
changes, and (b) a gRNA core that interacts with the napDNAbp, wherein the first spacer sequence is identical to a first protospacer sequence directly adjacent to a first PAM in the non-edit strand, wherein the second spacer sequence is identical to a second protospacer sequence directly adjacent to a second PAM in the edit strand, wherein the first PAM and the second PAM are different from each other based on comparison between the non-degenerate sequences (i.e., the non-“N” nucleotides) of the first and second PAM sequences, wherein the contacting installs the one or more nucleotide changes in the
double-stranded target DNA sequence, thereby modifying the double-stranded target DNA sequence. napDNAbp [0155] In various embodiments, the prime editor proteins utilized in the methods described herein comprise a nucleic acid programmable DNA binding protein (napDNAbp). [0156] In various embodiments, prime editor fusion proteins may include a napDNAbp domain having a wild type Cas9 sequence, including, for example the canonical Streptococcus pyogenes Cas9 sequence of SEQ ID NO: 43, shown as follows.
[0157] In other embodiments, the prime editor fusion proteins may include a napDNAbp domain having a modified Cas9 sequence, including, for example the nickase variant of Streptococcus pyogenes Cas9 of SEQ ID NO: 44 having an H840A substitution relative to the wild type SpCas9 (of SEQ ID NO: 43), shown as follows:
[0158] The prime editor fusion proteins described herein may include any of the modified Cas9 sequences described above, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto. In some embodiments, the prime editor fusion proteins used in the methods described herein include any of the
following other wild type SpCas9 sequences, which may be modified with one or more of the mutations described herein at corresponding amino acid positions:
[0159] The prime editor fusion proteins used in the methods described herein may include any of the above SpCas9 sequences, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto. In other embodiments, the Cas9 protein can be a wild type Cas9 ortholog from another bacterial species different from the canonical Cas9 from S. pyogenes. For example, modified versions of the following Cas9 orthologs can be used in connection with the prime editor fusion proteins described in this specification by making mutations at positions corresponding to H840A or any other amino acids of interest in wild type SpCas9. In addition, any variant Cas9 orthologs having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the below orthologs may also be used with the prime editors.
[0160] The napDNAbp used in the prime editor fusion proteins described herein may include any suitable homologs and/or orthologs or naturally occurring enzymes, such as, Cas9. Cas9 homologs and/or orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. The Cas moiety may be configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain; that is, the Cas9 is a nickase. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the Cas9 orthologs in the above tables.
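The sequence identity thresholds recited above (e.g., at least 85% or at least 95%) can be illustrated with a small calculation. The following Python sketch computes percent identity for two sequences that have already been aligned to equal length, with gaps written as "-". It is only one of several conventions for computing identity (here, gapped columns are excluded from the denominator); the function name and the short example sequences are hypothetical and are not taken from this disclosure.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs are assumed to be pre-aligned strings of equal length,
    with '-' marking gaps. Other conventions (e.g., counting gapped
    columns in the denominator) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 19 of 20 compared positions match.
reference = "MDKKYSIGLDIGTNSVGWAV"
candidate = "MDKKYSIGLAIGTNSVGWAV"
identity = percent_identity(reference, candidate)
print(identity, identity >= 90.0)
```

In practice the alignment itself would be produced by an established alignment tool, and the identity convention used by that tool should be checked before comparing a value against a stated threshold.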
Reverse transcriptase domain [0161] In various embodiments, the prime editors used in the methods described herein comprise a reverse transcriptase domain. In some embodiments, the reverse transcriptase domain is a wild type MMLV reverse transcriptase. In some embodiments, the reverse transcriptase domain is a variant of wild type MMLV reverse transcriptase having the amino acid sequence of SEQ ID NO: 80. [0162] For example, PE2 and PEmax comprise a variant reverse transcriptase domain of SEQ ID NO: 80, which is based on the wild type MMLV reverse transcriptase domain of SEQ ID NO: 79 (and, in particular, a Genscript codon optimized MMLV reverse transcriptase having the nucleotide sequence of SEQ ID NO: 79) and which comprises amino acid substitutions D200N, T306K, W313F, T330P, and L603W relative to the wild type MMLV RT of SEQ ID NO: 79. The amino acid sequence of the variant RT of PE2 and PEmax is SEQ ID NO: 80. [0163] Prime editors may also comprise other variant RTs. In various embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising one or more of the following mutations: P51L, S67K, E69K, L139P, T197A, D200N, H204R, F209N, E302K, E302R, T306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E562Q, D583N, H594Q, L603W, E607K, or D653N in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence. [0164] Some exemplary reverse transcriptases that can be fused to napDNAbp proteins or provided as individual proteins according to various embodiments of this disclosure are provided below. Exemplary reverse transcriptases include variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following wild-type enzymes or partial enzymes:
[0165] In various other embodiments, the prime editors described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising one or more of the following mutations: P51X, S67X, E69X, L139X, T197X, D200X, H204X, F209X, E302X, T306X, F309X, W313X, T330X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E607X, or D653X in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. [0166] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a P51X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is L. [0167] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an S67X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is K.
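The substitution notation used in these paragraphs (e.g., D200N, or P51X where X can be any amino acid) follows the pattern wild type residue, 1-based position, replacement residue. As a minimal sketch of how such notation could be parsed and applied to a sequence, consider the following Python example. The function name and the five-residue toy sequence are hypothetical; the sequence is not the M-MLV RT of SEQ ID NO: 79, and positions in a real case would be numbered relative to that reference sequence.

```python
import re

# Pattern: one-letter wild type residue, 1-based position, replacement residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(wild_type, mutations):
    """Apply substitutions such as 'D200N' to a protein sequence string.

    Raises ValueError if the notation is malformed, the position is out
    of range, the stated wild type residue does not match the sequence,
    or the replacement is the placeholder 'X' (any amino acid), which
    must be resolved to a concrete residue before it can be applied.
    """
    residues = list(wild_type)
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError("malformed mutation: %r" % mutation)
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(residues):
            raise ValueError("position out of range: %r" % mutation)
        if residues[pos - 1] != ref:
            raise ValueError("wild type residue mismatch: %r" % mutation)
        if alt == "X":
            raise ValueError("'X' is a placeholder; choose a residue: %r" % mutation)
        residues[pos - 1] = alt
    return "".join(residues)


# Hypothetical toy sequence, for illustration only.
toy = "MTDLK"
print(apply_substitutions(toy, ["D3N", "L4P"]))
```

Checking the stated wild type residue against the sequence is a simple way to catch numbering errors, for example when a mutation defined relative to one reference is applied at a corresponding position in a different RT sequence.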
[0168] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E69X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is K. [0169] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an L139X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is P. [0170] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a T197X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is A. [0171] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a D200X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is N. [0172] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an H204X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. 
In certain embodiments, X is R. [0173] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an F209X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is N. [0174] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E302X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding
amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is K. [0175] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E302X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is R. [0176] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a T306X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is K. [0177] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an F309X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is N. [0178] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a W313X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is F. 
[0179] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a T330X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is P. [0180] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an L345X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is G.
[0181] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an L435X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is G. [0182] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an N454X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is K. [0183] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a D524X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is G. [0184] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E562X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is Q. [0185] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a D583X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. 
In certain embodiments, X is N. [0186] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an H594X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is Q. [0187] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an L603X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding
amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is W. [0188] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising an E607X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is K. [0189] In various other embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising a D653X mutation in the wild type M-MLV RT of SEQ ID NO: 79, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid. In certain embodiments, X is N. [0190] Some exemplary reverse transcriptases that can be fused to napDNAbp proteins or provided as individual proteins according to various embodiments of this disclosure are provided below. Exemplary reverse transcriptases include variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the wild-type enzymes or partial enzymes described in SEQ ID NOs: 81-96. [0191] The prime editor (PE) system described here contemplates any publicly-available reverse transcriptase described or disclosed in any of the following U.S. patents (each of which is incorporated by reference in its entirety): U.S. Patent Nos: 10,202,658; 10,189,831; 10,150,955; 9,932,567; 9,783,791; 9,580,698; 9,534,201; and 9,458,484, and any variant thereof that can be made using known methods for installing mutations, or known methods for evolving proteins. The following references describe reverse transcriptases known in the art. Each of their disclosures is incorporated herein by reference in its entirety.
[0192] Herzig, E., Voronin, N., Kucherenko, N. & Hizi, A. A Novel Leu92 Mutant of HIV-1 Reverse Transcriptase with a Selective Deficiency in Strand Transfer Causes a Loss of Viral Replication. J. Virol. 89, 8119–8129 (2015). [0193] Mohr, G. et al. A Reverse Transcriptase-Cas1 Fusion Protein Contains a Cas6 Domain Required for Both CRISPR RNA Biogenesis and RNA Spacer Acquisition. Mol. Cell 72, 700-714.e8 (2018). [0194] Zhao, C., Liu, F. & Pyle, A. M. An ultraprocessive, accurate reverse transcriptase encoded by a metazoan group II intron. RNA 24, 183–195 (2018).
[0195] Zimmerly, S. & Wu, L. An Unexplored Diversity of Reverse Transcriptases in Bacteria. Microbiol Spectr 3, MDNA3-0058–2014 (2015). [0196] Ostertag, E. M. & Kazazian Jr, H. H. Biology of Mammalian L1 Retrotransposons. Annual Review of Genetics 35, 501–538 (2001). [0197] Perach, M. & Hizi, A. Catalytic Features of the Recombinant Reverse Transcriptase of Bovine Leukemia Virus Expressed in Bacteria. Virology 259, 176–189 (1999). [0198] Lim, D. et al. Crystal structure of the moloney murine leukemia virus RNase H domain. J. Virol. 80, 8379–8389 (2006). [0199] Zhao, C. & Pyle, A. M. Crystal structures of a group II intron maturase reveal a missing link in spliceosome evolution. Nature Structural & Molecular Biology 23, 558–565 (2016). [0200] Griffiths, D. J. Endogenous retroviruses in the human genome sequence. Genome Biol. 2, REVIEWS1017 (2001). [0201] Baranauskas, A. et al. Generation and characterization of new highly thermostable and processive M-MuLV reverse transcriptase variants. Protein Eng Des Sel 25, 657–668 (2012). [0202] Zimmerly, S., Guo, H., Perlman, P. S. & Lambowitz, A. M. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82, 545–554 (1995). [0203] Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916 (1996). [0204] Berkhout, B., Jebbink, M. & Zsíros, J. Identification of an Active Reverse Transcriptase Enzyme Encoded by a Human Endogenous HERV-K Retrovirus. Journal of Virology 73, 2365–2375 (1999). [0205] Kotewicz, M. L., Sampson, C. M., D’Alessio, J. M. & Gerard, G. F. Isolation of cloned Moloney murine leukemia virus reverse transcriptase lacking ribonuclease H activity. Nucleic Acids Res 16, 265–277 (1988). [0206] Arezi, B. & Hogrefe, H. Novel mutations in Moloney Murine Leukemia Virus reverse transcriptase increase thermostability through tighter binding to template-primer.
Nucleic Acids Res 37, 473–481 (2009). [0207] Blain, S. W. & Goff, S. P. Nuclease activities of Moloney murine leukemia virus reverse transcriptase. Mutants with altered substrate specificities. J. Biol. Chem. 268, 23585–23592 (1993).
[0208] Xiong, Y. & Eickbush, T. H. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9, 3353–3362 (1990). [0209] Herschhorn, A. & Hizi, A. Retroviral reverse transcriptases. Cell. Mol. Life Sci. 67, 2717–2747 (2010). [0210] Taube, R., Loya, S., Avidan, O., Perach, M. & Hizi, A. Reverse transcriptase of mouse mammary tumour virus: expression in bacteria, purification and biochemical characterization. Biochem. J. 329 (Pt 3), 579–587 (1998). [0211] Liu, M. et al. Reverse Transcriptase-Mediated Tropism Switching in Bordetella Bacteriophage. Science 295, 2091–2094 (2002). [0212] Luan, D. D., Korman, M. H., Jakubczak, J. L. & Eickbush, T. H. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72, 595–605 (1993). [0213] Nottingham, R. M. et al. RNA-seq of human reference RNA samples using a thermostable group II intron reverse transcriptase. RNA 22, 597–613 (2016). [0214] Telesnitsky, A. & Goff, S. P. RNase H domain mutations affect the interaction between Moloney murine leukemia virus reverse transcriptase and its primer-template. Proc. Natl. Acad. Sci. U.S.A. 90, 1276–1280 (1993). [0215] Halvas, E. K., Svarovskaia, E. S. & Pathak, V. K. Role of Murine Leukemia Virus Reverse Transcriptase Deoxyribonucleoside Triphosphate-Binding Site in Retroviral Replication and In Vivo Fidelity. Journal of Virology 74, 10349–10358 (2000). [0216] Nowak, E. et al. Structural analysis of monomeric retroviral reverse transcriptase in complex with an RNA/DNA hybrid. Nucleic Acids Res 41, 3874–3887 (2013). [0217] Stamos, J. L., Lentzsch, A. M. & Lambowitz, A. M. Structure of a Thermostable Group II Intron Reverse Transcriptase with Template-Primer and Its Functional and Evolutionary Implications. Molecular Cell 68, 926-939.e4 (2017). [0218] Das, D. & Georgiadis, M. M. The Crystal Structure of the Monomeric Reverse Transcriptase from Moloney Murine Leukemia Virus.
Structure 12, 819–829 (2004). [0219] Avidan, O., Meer, M. E., Oz, I. & Hizi, A. The processivity and fidelity of DNA synthesis exhibited by the reverse transcriptase of bovine leukemia virus. European Journal of Biochemistry 269, 859–867 (2002). [0220] Gerard, G. F. et al. The role of template-primer in protection of reverse transcriptase from thermal inactivation. Nucleic Acids Res 30, 3118–3129 (2002).
[0221] Monot, C. et al. The Specificity and Flexibility of L1 Reverse Transcription Priming at Imperfect T-Tracts. PLOS Genetics 9, e1003499 (2013). [0222] Mohr, S. et al. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. RNA 19, 958–970 (2013). Nuclear localization sequences (NLS) [0223] In various embodiments, the prime editor fusion proteins described herein may comprise one or more nuclear localization sequences (NLS), which help promote translocation of a protein into the cell nucleus. Such sequences are well-known in the art and can include the following examples:
[0224] The NLS examples above are non-limiting. The prime editor fusion proteins used in the presently described methods may comprise any known NLS sequence, including any of those described in Cokol et al., “Finding nuclear localization signals,” EMBO Rep., 2000, 1(5): 411-415 and Freitas et al., “Mechanisms and Signals for the Nuclear Import of Proteins,” Current Genomics, 2009, 10(8): 550-7, each of which is incorporated herein by reference. [0225] In various embodiments, the fusion proteins and constructs encoding the fusion proteins described herein further comprise one or more, preferably at least two, nuclear localization sequences. In certain embodiments, the fusion proteins comprise at least two NLSs. In embodiments with at least two NLSs, the NLSs can be the same NLSs or they can
be different NLSs. In some embodiments, one or more of the NLSs are bipartite NLSs (“bpNLS”). In certain embodiments, the disclosed fusion proteins comprise two bipartite NLSs. In some embodiments, the disclosed fusion proteins comprise more than two bipartite NLSs. [0226] The location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a fusion protein (e.g., inserted between the encoded napDNAbp component (e.g., Cas9) and a polymerase domain (e.g., a reverse transcriptase)). [0227] The NLSs may be any known NLS sequence in the art. The NLSs may also be any future-discovered NLSs for nuclear localization. The NLSs also may be any naturally-occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations). [0228] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., International PCT application PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 31), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 22), KRTADGSEFESPKKKRKV (SEQ ID NO: 32), or KRTADGSEFEPKKKRKV (SEQ ID NO: 33). In other embodiments, an NLS comprises the amino acid sequences NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 34), PAAKRVKLD (SEQ ID NO: 25), RQRRNELKRSF (SEQ ID NO: 35), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 36). [0229] In one aspect of the disclosure, a prime editor or other fusion protein may be modified with one or more nuclear localization sequences (NLS), preferably at least two NLSs. In certain embodiments, the fusion proteins are modified with two or more NLSs.
The disclosure contemplates the use of any nuclear localization sequence known in the art at the time of the disclosure, or any nuclear localization sequence that is identified or otherwise made available in the state of the art after the time of the instant filing. A representative nuclear localization sequence is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed. A nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally
comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization sequences often comprise proline residues. A variety of nuclear localization sequences have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89:7442-46; Moede et al., (1999) FEBS Lett. 461:229-34, each of which is incorporated herein by reference. Translocation is currently thought to involve nuclear pore proteins. [0230] Most NLSs can be classified into three general groups: (i) a monopartite NLS exemplified by the SV40 large T antigen NLS (PKKKRKV (SEQ ID NO: 31)); (ii) a bipartite motif consisting of two basic domains separated by a variable number of spacer amino acids and exemplified by the Xenopus nucleoplasmin NLS (KRXXXXXXXXXXKKKL (SEQ ID NO: 37)); and (iii) noncanonical sequences such as M9 of the hnRNP A1 protein, the influenza virus nucleoprotein NLS, and the yeast Gal4 protein NLS (Dingwall and Laskey 1991). [0231] Nuclear localization sequences appear at various points in the amino acid sequences of proteins. NLSs have been identified at the N-terminus, the C-terminus, and in the central region of proteins. Thus, the disclosure provides fusion proteins that may be modified with one or more NLSs at the C-terminus and/or the N-terminus, as well as at internal regions of the fusion protein. The residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example, ionically or sterically, with the nuclear localization signal itself.
Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice, such a sequence can be functionally limited in length and composition. [0232] The present disclosure contemplates any suitable means by which to modify a fusion protein to include one or more NLSs. In one aspect, a prime editor may be engineered to be expressed as a fusion protein that is translationally fused at its N-terminus or its C-terminus (or both) to one or more NLSs, i.e., to form a prime editor-NLS fusion construct. In other embodiments, a fusion protein-encoding nucleotide sequence may be genetically modified to incorporate a reading frame that encodes one or more NLSs in an internal region of the encoded prime editor. In addition, the NLSs may include various amino acid linkers or spacer regions encoded between the prime editor and the N-terminally, C-terminally, or internally-attached NLS amino acid sequence. Thus, the present disclosure also provides for nucleotide constructs, vectors, and host cells for expressing fusion proteins that comprise a prime editor and one or more NLSs, among other components. [0233] The prime editor fusion proteins described herein may also comprise nuclear localization sequences that are linked to a prime editor through one or more linkers, e.g., a polymeric, amino acid, nucleic acid, polysaccharide, or chemical linker element. The linkers within the contemplated scope of the disclosure are not intended to have any limitations and can be any suitable type of molecule (e.g., polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker domain) and can be joined to the prime editor by any suitable strategy that effectuates forming a bond (e.g., covalent linkage, hydrogen bonding) between the prime editor and the one or more NLSs. Linkers [0234] The prime editor fusion proteins used in the methods described herein may include one or more linkers. As defined above, the term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease and a polymerase (e.g., a reverse transcriptase). In some embodiments, a linker joins a Cas9 nickase and a reverse transcriptase. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. 
In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. [0235] The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In certain embodiments, the linker is a polypeptide, or amino acid-based. In other embodiments, the linker is not peptide-like. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or
unbranched aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, ethanoic acid, alanine, beta-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). In certain embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol moiety (PEG). In other embodiments, the linker comprises amino acids. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a phenyl ring. The linker may include functionalized moieties to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates. [0236] In some other embodiments, the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 67), (G)n (SEQ ID NO: 68), (EAAAK)n (SEQ ID NO: 69), (GGS)n (SEQ ID NO: 70), (SGGS)n (SEQ ID NO: 71), (XP)n (SEQ ID NO: 72), or any combination thereof, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid. In some embodiments, the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 70), wherein n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 73). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 74). In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 75). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 66). In other embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 76, 60AA). In some embodiments, the linker comprises the amino acid sequence GGS, GGSGGS (SEQ ID NO: 77), GGSGGSGGS (SEQ ID NO: 78), SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 64), SGSETPGTSESATPES (SEQ ID NO: 73), or
SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 76). [0237] In certain embodiments, linkers may be used to link any of the peptides or peptide domains or moieties of the invention (e.g., a napDNAbp linked or fused to a reverse transcriptase domain, and/or a napDNAbp linked to one or more NESs). Any of the domains of the fusion proteins described herein may also be connected to one another through any of the presently described linkers. Additional prime editor domains A. Flap endonucleases (e.g., FEN1) [0238] In various embodiments, the prime editor proteins described herein may comprise one or more flap endonucleases (e.g., FEN1), which refers to an enzyme that catalyzes the removal of 5ʹ single strand DNA flaps (provided in trans or fused to the PE fusion proteins). These are naturally occurring enzymes that process the removal of 5ʹ flaps formed during cellular processes, including DNA replication. The prime editors described herein may utilize endogenously supplied flap endonucleases or those provided in trans to remove the 5ʹ flap of endogenous DNA formed at the target site during prime editing. Flap endonucleases are known in the art and are described in Patel et al., “Flap endonucleases pass 5ʹ-flaps through a flexible arch using a disorder-thread-order mechanism to confer specificity for free 5ʹ-ends,” Nucleic Acids Research, 2012, 40(10): 4507-4519 and Tsutakawa et al., “Human flap endonuclease structures, DNA double-base flipping, and a unified understanding of the FEN1 superfamily,” Cell, 2011, 145(2): 198-211 (each of which is incorporated herein by reference). An exemplary flap endonuclease is FEN1, which can be represented by the following amino acid sequence:
[0239] The flap endonucleases may also include any FEN1 variant, mutant, or other flap endonuclease ortholog, homolog, or variant. Non-limiting FEN1 variant examples are as follows:
[0240] In various embodiments, the prime editor fusion proteins utilized in the methods and compositions contemplated herein may include any flap endonuclease variant of the above-disclosed sequences having an amino acid sequence that is at least about 70% identical, at
least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to any of the above sequences. Other endonucleases that may be utilized by the instant compositions and methods to facilitate removal of the 5′ end single strand DNA flap include, but are not limited to, (1) TREX2 and (2) EXO1 endonuclease (e.g., Keijzers et al., Biosci Rep. 2015, 35(3): e00206). TREX2 [0241] Three prime (3´) repair exonuclease 2 (TREX2) – human Accession No. NM_080701 MSEAPRAETFVFLDLEATGLPSVEPEIAELSLFAVHRSSLENPEHDESGALVLPRVLD KLTLCMCPERPFTAKASEITGLSSEGLARCRKAGFDGAVVRTLQAFLSRQAGPICLVA HNGFDYDFPLLCAELRRLGARLPRDTVCLDTLPALRGLDRAHSHGTRARGRQGYSL GSLFHRYFRAEPSAAHSAEGDVHTLLLIFLHRAAELLAWADEQARGWAHIEPMYLPP DDPSLEA (SEQ ID NO: 103). [0242] Three prime (3´) repair exonuclease 2 (TREX2) – mouse Accession No. NM_011907 MSEPPRAETFVFLDLEATGLPNMDPEIAEISLFAVHRSSLENPERDDSGSLVLPRVLDK LTLCMCPERPFTAKASEITGLSSESLMHCGKAGFNGAVVRTLQGFLSRQEGPICLVAH NGFDYDFPLLCTELQRLGAHLPQDTVCLDTLPALRGLDRAHSHGTRAQGRKSYSLA SLFHRYFQAEPSAAHSAEGDVHTLLLIFLHRAPELLAWADEQARSWAHIEPMYVPPD GPSLEA (SEQ ID NO: 104). [0243] Three prime (3´) repair exonuclease 2 (TREX2) – rat Accession No. NM_001107580 MSEPLRAETFVFLDLEATGLPNMDPEIAEISLFAVHRSSLENPERDDSGSLVLPRVLD KLTLCMCPERPFTAKASEITGLSSEGLMNCRKAAFNDAVVRTLQGFLSRQEGPICLV AHNGFDYDFPLLCTELQRLGAHLPRDTVCLDTLPALRGLDRVHSHGTRAQGRKSYS LASLFHRYFQAEPSAAHSAEGDVNTLLLIFLHRAPELLAWADEQARSWAHIEPMYVP PDGPSLEA (SEQ ID NO: 105). EXO1 [0244] Human exonuclease 1 (EXO1) has been implicated in many different DNA metabolic processes, including DNA mismatch repair (MMR), microhomology-mediated end-joining, homologous recombination (HR), and replication. Human EXO1 belongs to a family of eukaryotic nucleases, Rad2/XPG, which also includes FEN1 and GEN1. The Rad2/XPG
family is conserved in the nuclease domain through species from phage to human. The EXO1 gene product exhibits both 5′ exonuclease and 5′ flap activity. Additionally, EXO1 contains an intrinsic 5′ RNase H activity. Human EXO1 has a high affinity for processing double stranded DNA (dsDNA), nicks, gaps, and pseudo Y structures and can resolve Holliday junctions using its inherent flap activity. Human EXO1 is implicated in MMR and contains conserved binding domains interacting directly with MLH1 and MSH2. EXO1 nucleolytic activity is positively stimulated by PCNA, MutSα (MSH2/MSH6 complex), 14-3-3, MRN, and 9-1-1 complex. [0245] Exonuclease 1 (EXO1) Accession No. NM_003686 (Homo sapiens exonuclease 1 (EXO1), transcript variant 3) – isoform A MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDRYV GFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVS EARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITE DSDLLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDY LSSLRGIGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLY QLVFDPIKRKLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYN PDTAMPAHSRSHSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVG VERVISTKGLNLPRKSSIVKRPRSAELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSE VFVPDLVNGPTNKKSVSTPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDC VSNKVSIQPLDETAVTDKENNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDH IPDKATVFTDEESYSFESSKFTRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQ FRRKSDSPTSLPENNMSDVSQLKSEESSDDESHPLREEACSSQSQESGEFSLQSSNASK LSQCSSKDSDSEESDCNIKLLDSQSDQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADS LSTTKIKPLGPARASGLSKKPASIQKRKHHNAENKPGLQIKLNELWKNFGFKKF (SEQ ID NO: 106). [0246] Exonuclease 1 (EXO1) Accession No. 
NM_006027 (Homo sapiens exonuclease 1 (EXO1), transcript variant 3) – isoform B MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDRYV GFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVS EARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITE DSDLLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDY LSSLRGIGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLY QLVFDPIKRKLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYN
PDTAMPAHSRSHSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVG VERVISTKGLNLPRKSSIVKRPRSAELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSE VFVPDLVNGPTNKKSVSTPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDC VSNKVSIQPLDETAVTDKENNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDH IPDKATVFTDEESYSFESSKFTRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQ FRRKSDSPTSLPENNMSDVSQLKSEESSDDESHPLREEACSSQSQESGEFSLQSSNASK LSQCSSKDSDSEESDCNIKLLDSQSDQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADS LSTTKIKPLGPARASGLSKKPASIQKRKHHNAENKPGLQIKLNELWKNFGFKKDSEK LPPCKKPLSPVRDNIQLTPEAEEDIFNKPECGRVQRAIFQ (SEQ ID NO: 107). [0247] Exonuclease 1 (EXO1) Accession No. NM_001319224 (Homo sapiens exonuclease 1 (EXO1), transcript variant 4) – isoform C MGIQGLLQFIKEASEPIHVRKYKGQVVAVDTYCWLHKGAIACAEKLAKGEPTDRYV GFCMKFVNMLLSHGIKPILVFDGCTLPSKKEVERSRRERRQANLLKGKQLLREGKVS EARECFTRSINITHAMAHKVIKAARSQGVDCLVAPYEADAQLAYLNKAGIVQAIITE DSDLLAFGCKKVILKMDQFGNGLEIDQARLGMCRQLGDVFTEEKFRYMCILSGCDY LSSLRGIGLAKACKVLRLANNPDIVKVIKKIGHYLKMNITVPEDYINGFIRANNTFLY QLVFDPIKRKLIPLNAYEDDVDPETLSYAGQYVDDSIALQIALGNKDINTFEQIDDYN PDTAMPAHSRSHSWDDKTCQKSANVSSIWHRNYSPRPESGTVSDAPQLKENPSTVG VERVISTKGLNLPRKSSIVKRPRSELSEDDLLSQYSLSFTKKTKKNSSEGNKSLSFSEV FVPDLVNGPTNKKSVSTPPRTRNKFATFLQRKNEESGAVVVPGTRSRFFCSSDSTDCV SNKVSIQPLDETAVTDKENNLHESEYGDQEGKRLVDTDVARNSSDDIPNNHIPGDHIP DKATVFTDEESYSFESSKFTRTISPPTLGTLRSCFSWSGGLGDFSRTPSPSPSTALQQFR RKSDSPTSLPENNMSDVSQLKSEESSDDESHPLREEACSSQSQESGEFSLQSSNASKLS QCSSKDSDSEESDCNIKLLDSQSDQTSKLRLSHFSKKDTPLRNKVPGLYKSSSADSLS TTKIKPLGPARASGLSKKPASIQKRKHHNAENKPGLQIKLNELWKNFGFKKDSEKLP PCKKPLSPVRDNIQLTPEAEEDIFNKPECGRVQRAIFQ (SEQ ID NO: 108). B. Inteins and split-inteins [0248] It will be understood that in some embodiments (e.g., delivery of a prime editor in vivo), it may be advantageous to split a polypeptide (e.g., a reverse transcriptase or a napDNAbp) or a fusion protein (e.g., a prime editor) into an N-terminal half and a C-terminal half, deliver them separately, and then allow their colocalization to reform the complete protein (or fusion protein as the case may be) within the cell. 
Separate halves of a protein or a
fusion protein may each comprise a split-intein tag to facilitate the reformation of the complete protein or fusion protein by the mechanism of protein trans-splicing. [0249] Protein trans-splicing, catalyzed by split inteins, provides an entirely enzymatic method for protein ligation. A split intein is essentially a contiguous intein (e.g., a mini-intein) split into two pieces named N-intein and C-intein, respectively. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction in essentially the same way as a contiguous intein does. Split inteins have been found in nature and have also been engineered in laboratories. As used herein, the term “split intein” refers to any intein in which one or more peptide bond breaks exist between the N-terminal and C-terminal amino acid sequences such that the N-terminal and C-terminal sequences become separate molecules that can non-covalently reassociate, or reconstitute, into an intein that is functional for trans-splicing reactions. Any catalytically active intein, or fragment thereof, may be used to derive a split intein for use in the methods of the invention. For example, in one aspect, the split intein may be derived from a eukaryotic intein. In another aspect, the split intein may be derived from a bacterial intein. In another aspect, the split intein may be derived from an archaeal intein. Preferably, the split intein so-derived will possess only the amino acid sequences essential for catalyzing trans-splicing reactions. [0250] As used herein, the “N-terminal split intein (In)” refers to any intein sequence that comprises an N-terminal amino acid sequence that is functional for trans-splicing reactions. An In thus also comprises a sequence that is spliced out when trans-splicing occurs. An In can comprise a sequence that is a modification of the N-terminal portion of a naturally occurring intein sequence. 
For example, an In can comprise additional amino acid residues and/or mutated residues, as long as the inclusion of such additional and/or mutated residues does not render the In non-functional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the In. [0251] As used herein, the “C-terminal split intein (Ic)” refers to any intein sequence that comprises a C-terminal amino acid sequence that is functional for trans-splicing reactions. In one aspect, the Ic comprises 4 to 7 contiguous amino acid residues, at least 4 amino acids of which are from the last β-strand of the intein from which it was derived. An Ic thus also comprises a sequence that is spliced out when trans-splicing occurs. An Ic can comprise a sequence that is a modification of the C-terminal portion of a naturally occurring intein sequence. For example, an Ic can comprise additional amino acid residues and/or mutated residues, as long as the inclusion of such additional and/or mutated residues does not render the Ic non-functional in trans-splicing. Preferably, the inclusion of the additional and/or mutated residues improves or enhances the trans-splicing activity of the Ic. [0252] In some embodiments of the invention, a peptide linked to an Ic or an In can comprise an additional chemical moiety including, among others, fluorescence groups, biotin, polyethylene glycol (PEG), amino acid analogs, unnatural amino acids, phosphate groups, glycosyl groups, radioisotope labels, and pharmaceutical molecules. In other embodiments, a peptide linked to an Ic can comprise one or more chemically reactive groups including, among others, ketones, aldehydes, Cys residues, and Lys residues. The N-intein and C-intein of a split intein can associate non-covalently to form an active intein and catalyze the splicing reaction when an "intein-splicing polypeptide (ISP)" is present. As used herein, "intein-splicing polypeptide (ISP)" refers to the portion of the amino acid sequence of a split intein that remains when the Ic, In, or both, are removed from the split intein. In certain embodiments, the In comprises the ISP. In another embodiment, the Ic comprises the ISP. In yet another embodiment, the ISP is a separate peptide that is not covalently linked to the In or to the Ic. [0253] Split inteins may be created from contiguous inteins by engineering one or more split sites in the unstructured loop or intervening amino acid sequence between the -12 conserved beta-strands found in the structure of mini-inteins. Some flexibility in the position of the split site within regions between the beta-strands may exist, provided that creation of the split will not disrupt the structure of the intein, the structured beta-strands in particular, to a sufficient degree that protein splicing activity is lost. 
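By way of illustration only, the split-intein concept described above can be modeled at the level of sequence strings. The following Python sketch splits a contiguous intein at a chosen split site and then joins the two exteins as trans-splicing would. The function names (`split_intein`, `trans_splice`), the split position, and all sequences are hypothetical placeholders rather than sequences or methods of this disclosure, and the sketch captures only the bookkeeping of which residues are retained or excised, not the splicing chemistry.

```python
# Toy string-level model of split-intein trans-splicing (illustrative only).
# All sequences are hypothetical placeholders, not real intein or extein sequences.

def split_intein(intein: str, split_site: int) -> tuple[str, str]:
    """Split a contiguous intein into an N-intein and a C-intein at split_site."""
    if not 0 < split_site < len(intein):
        raise ValueError("split site must fall inside the intein")
    return intein[:split_site], intein[split_site:]


def trans_splice(n_precursor: str, c_precursor: str,
                 n_intein: str, c_intein: str) -> str:
    """Excise both intein halves and join the flanking exteins."""
    if not n_precursor.endswith(n_intein):
        raise ValueError("N-precursor must end with the N-intein")
    if not c_precursor.startswith(c_intein):
        raise ValueError("C-precursor must start with the C-intein")
    n_extein = n_precursor[: len(n_precursor) - len(n_intein)]
    c_extein = c_precursor[len(c_intein):]
    return n_extein + c_extein


# Placeholder sequences chosen only so the halves are easy to tell apart.
N_EXTEIN = "MKT"
C_EXTEIN = "LEV"
INTEIN = "AAAAAAGGGGGG"

n_intein, c_intein = split_intein(INTEIN, 6)
n_precursor = N_EXTEIN + n_intein   # N-extein followed by the N-intein
c_precursor = c_intein + C_EXTEIN   # C-intein followed by the C-extein
spliced = trans_splice(n_precursor, c_precursor, n_intein, c_intein)
print(spliced)
```

In this simplified model the two precursors stand in for separately delivered halves of a protein, and the spliced product contains only the extein residues.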
[0254] In protein trans-splicing, one precursor protein consists of an N-extein part followed by the N-intein, another precursor protein consists of the C-intein followed by a C-extein part, and a trans-splicing reaction (catalyzed by the N- and C-inteins together) excises the two intein sequences and links the two extein sequences with a peptide bond. Protein trans-splicing, being an enzymatic reaction, can work with very low (e.g., micromolar) concentrations of proteins and can be carried out under physiological conditions. [0255] Exemplary sequences are as follows:
[0256] Although inteins are most frequently found as a contiguous domain, some exist in a naturally split form. In this case, the two fragments are expressed as separate polypeptides and must associate before splicing takes place, so-called protein trans-splicing. [0257] An exemplary split intein is the Ssp DnaE intein, which comprises two subunits, namely, DnaE-N and DnaE-C. The two different subunits are encoded by separate genes, namely dnaE-n and dnaE-c, which encode the DnaE-N and DnaE-C subunits, respectively. DnaE is a naturally occurring split intein in Synechocystis sp. PCC6803 and is capable of directing trans-splicing of two separate proteins, each comprising a fusion with either DnaE-N or DnaE-C. [0258] Additional naturally occurring or engineered split-intein sequences are known in the art or can be made from whole-intein sequences described herein or those available in the art. Examples of split-intein sequences can be found in Stevens et al., “A promiscuous split intein with expanded protein engineering applications,” PNAS, 2017, Vol. 114: 8538-8543; Iwai et al., “Highly efficient protein trans-splicing by a naturally split DnaE intein from Nostoc punctiforme,” FEBS Lett, 580: 1853-1858, each of which is incorporated herein by reference. Additional split intein sequences can be found, for example, in WO 2013/045632, WO
2014/055782, WO 2016/069774, and EP2877490, the contents of each of which are incorporated herein by reference. [0259] In addition, protein splicing in trans has been described in vivo and in vitro (Shingledecker, et al., Gene 207:187 (1998); Southworth, et al., EMBO J. 17:918 (1998); Mills, et al., Proc. Natl. Acad. Sci. USA, 95:3543-3548 (1998); Lew, et al., J. Biol. Chem., 273:15887-15890 (1998); Wu, et al., Biochim. Biophys. Acta 35732:1 (1998b); Yamazaki, et al., J. Am. Chem. Soc. 120:5591 (1998); Evans, et al., J. Biol. Chem. 275:9091 (2000); Otomo, et al., Biochemistry 38:16040-16044 (1999); Otomo, et al., J. Biomol. NMR 14:105-114 (1999); Scott, et al., Proc. Natl. Acad. Sci. USA 96:13638-13643 (1999)) and provides the opportunity to express a protein as two inactive fragments that subsequently undergo ligation to form a functional product. RNA-protein interaction domain [0260] In various embodiments, two separate protein domains (e.g., a Cas9 domain and a polymerase domain) may be colocalized to one another to form a functional complex (akin to the function of a fusion protein comprising the two separate protein domains) by using an “RNA-protein recruitment system,” such as the “MS2 tagging technique.” Such systems generally tag one protein domain with an “RNA-protein interaction domain” (a.k.a. “RNA-protein recruitment domain”) and the other with an “RNA-binding protein” that specifically recognizes and binds to the RNA-protein interaction domain, e.g., a specific hairpin structure. These types of systems can be leveraged to colocalize the domains of a prime editor, as well as to recruit additional functionalities to a prime editor, such as a UGI domain. 
In one example, the MS2 tagging technique is based on the natural interaction of the MS2 bacteriophage coat protein (“MCP” or “MS2cp”) with a stem-loop or hairpin structure present in the genome of the phage, i.e., the “MS2 hairpin.” The MS2 hairpin is recognized and bound by the MCP. Thus, in one exemplary scenario, a reverse transcriptase-MS2 fusion can recruit a Cas9-MCP fusion. [0261] Other modular RNA-protein interaction domains are described in the art, for example, in Johansson et al., “RNA recognition by the MS2 phage coat protein,” Sem Virol., 1997, Vol. 8(3): 176-185; Delebecque et al., “Organization of intracellular reactions with rationally designed RNA assemblies,” Science, 2011, Vol. 333: 470-474; Mali et al., “Cas9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering,” Nat. Biotechnol., 2013, Vol. 31: 833-838; and Zalatan et al., “Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds,”
Cell, 2015, Vol. 160: 339-350, each of which is incorporated herein by reference in its entirety. Other systems include the PP7 hairpin, which specifically recruits the PCP protein, and the “com” hairpin, which specifically recruits the Com protein. See Zalatan et al. [0262] The nucleotide sequence of the MS2 hairpin (or equivalently referred to as the “MS2 aptamer”) is: GCCAACATGAGGATCACCCATGTCTGCAGGGCC (SEQ ID NO: 117). [0263] The amino acid sequence of the MCP or MS2cp is: GSASNFTQFVLVDNGGTGDVTVAPSNFANGVAEWISSNSRSQAYKVTCSVRQSSAQ NRKYTIKVEVPKVATQTVGGEELPVAGWRSYLNMELTIPIFATNSDCELIVKAMQGL LKDGNPIPSAIAANSGIY (SEQ ID NO: 118). C. Additional PE elements [0264] In certain embodiments, the prime editors utilized in the methods and compositions described herein may comprise an inhibitor of base repair. The term “inhibitor of base repair” or “IBR” refers to a protein that is capable of inhibiting the activity of a nucleic acid repair enzyme, for example, a base excision repair enzyme. In some embodiments, the IBR is an inhibitor of OGG base excision repair. In some embodiments, the IBR is an inhibitor of base excision repair (“iBER”). Exemplary inhibitors of base excision repair include inhibitors of APE1, Endo III, Endo IV, Endo V, Endo VIII, Fpg, hOGG1, hNEIL1, T7 EndoI, T4PDG, UDG, hSMUG1, and hAAG. In some embodiments, the IBR is an inhibitor of Endo V or hAAG. In some embodiments, the IBR is an iBER that may be a catalytically inactive glycosylase or catalytically inactive dioxygenase or a small molecule or peptide inhibitor of an oxidase, or variants thereof. In some embodiments, the IBR is an iBER that may be a TDG inhibitor, an MBD4 inhibitor, or an inhibitor of an AlkBH enzyme. In some embodiments, the IBR is an iBER that comprises a catalytically inactive TDG or catalytically inactive MBD4. An exemplary catalytically inactive TDG is an N140A mutant of SEQ ID NO: 122 (human TDG). [0265] Some exemplary glycosylases are provided below. 
The catalytically inactivated variants of any of these glycosylase domains are iBERs that may be fused to the napDNAbp or polymerase domain of the prime editors utilized in the methods and compositions provided in this disclosure. [0266] OGG (human) MPARALLPRRMGHRTLASTPALWASIPCPRSELRLDLVLPSGQSFRWREQSPAHWSG VLADQVWTLTQTEEQLHCTVYRGDKSQASRPTPDELEAVRKYFQLDVTLAQLYHH WGSVDSHFQEVAQKFQGVRLLRQDPIECLFSFICSSNNNIARITGMVERLCQAFGPRL
IQLDDVTYHGFPSLQALAGPEVEAHLRKLGLGYRARYVSASARAILEEQGGLAWLQ QLRESSYEEAHKALCILPGVGTKVADCICLMALDKPQAVPVDVHMWHIAQRDYSW HPTTSQAKGPSPQTNKELGNFFRSLWGPYAGWAQAVLFSADLRQSRHAQEPPAKRR KGSKGPEG (SEQ ID NO: 119) [0267] MPG (human) MVTPALQMKKPKQFCRRMGQKKQRPARAGQPHSSSDAAQAPAEQPHSSSDAAQAP CPRERCLGPPTTPGPYRSIYFSSPKGHLTRLGLEFFDQPAVPLARAFLGQVLVRRLPN GTELRGRIVETEAYLGPEDEAAHSRGGRQTPRNRGMFMKPGTLYVYIIYGMYFCMNI SSQGDGACVLLRALEPLEGLETMRQLRSTLRKGTASRVLKDRELCSGPSKLCQALAI NKSFDQRDLAQDEAVWLERGPLEPSEPAVVAAARVGVGHAGEWARKPLRFYVRGS PWVSVVDRVAEQDTQA (SEQ ID NO: 120) [0268] MBD4 (human) MGTTGLESLSLGDRGAAPTVTSSERLVPDPPNDLRKEDVAMELERVGEDEEQMMIK RSSECNPLLQEPIASAQFGATAGTECRKSVPCGWERVVKQRLFGKTAGRFDVYFISP QGLKFRSKSSLANYLHKNGETSLKPEDFDFTVLSKRGIKSRYKDCSMAALTSHLQNQ SNNSNWNLRTRSKCKKDVFMPPSSSSELQESRGLSNFTSTHLLLKEDEGVDDVNFRK VRKPKGKVTILKGIPIKKTKKGCRKSCSGFVQSDSKRESVCNKADAESEPVAQKSQL DRTVCISDAGACGETLSVTSEENSLVKKKERSLSSGSNFCSEQKTSGIINKFCSAKDSE HNEKYEDTFLESEEIGTKVEVVERKEHLHTDILKRGSEMDNNCSPTRKDFTGEKIFQE DTIPRTQIERRKTSLYFSSKYNKEALSPPRRKAFKKWTPPRSPFNLVQETLFHDPWKL LIATIFLNRTSGKMAIPVLWKFLEKYPSAEVARTADWRDVSELLKPLGLYDLRAKTI VKFSDEYLTKQWKYPIELHGIGKYGNDSYRIFCVNEWKQVHPEDHKLNKYHDWLW ENHEKLSLS (SEQ ID NO: 121) [0269] TDG (human) MEAENAGSYSLQQAQAFYTFPFQQLMAEAPNMAVVNEQQMPEEVPAPAPAQEPVQ EAPKGRKRKPRTTEPKQPVEPKKPVESKKSGKSAKSKEKQEKITDTFKVKRKVDRFN GVSEAELLTKTLPDILTFNLDIVIIGINPGLMAAYKGHHYPGPGNHFWKCLFMSGLSE VQLNHMDDHTLPGKYGIGFTNMVERTTPGSKDLSSKEFREGGRILVQKLQKYQPRIA VFNGKCIYEIFSKEVFGVKVKNLEFGLQPHKIPDTETLCYVMPSSSARCAQFPRAQDK VHYYIKLKDLRDQLKGIERNMDVQEVQYTFDLQLAQEDAKKMAVKEEKYDPGYEA AYGGAYGENPCSSEPCGFSSNGLIESVELRGESAFSGIPNGQWMTQSFTDQIPSFSNH CGTQEQEEESHA (SEQ ID NO: 122)
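As a non-limiting illustration of the N140A substitution referred to above for a catalytically inactive TDG, the following Python sketch applies a point mutation written in the conventional "wild-type residue, 1-based position, replacement residue" notation. The helper name `apply_point_mutation` is hypothetical, and for brevity only an N-terminal portion of SEQ ID NO: 122, copied from the sequence printed above, is used; the sketch checks that the stated wild-type residue is actually present at the stated position before substituting it.

```python
import re

# N-terminal portion of human TDG (SEQ ID NO: 122) as printed above;
# truncated here for brevity, which is an assumption of this sketch.
TDG_N_TERMINAL = (
    "MEAENAGSYSLQQAQAFYTFPFQQLMAEAPNMAVVNEQQMPEEVPAPAPAQEPVQ"
    "EAPKGRKRKPRTTEPKQPVEPKKPVESKKSGKSAKSKEKQEKITDTFKVKRKVDRFN"
    "GVSEAELLTKTLPDILTFNLDIVIIGINPGLMAAYKGHHYPGPGNHFWKCLFMSGLSE"
)


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Apply a mutation such as 'N140A' (1-based position) to a protein sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = (
        match.group(1), int(match.group(2)), match.group(3)
    )
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    index = position - 1  # convert the 1-based position to a 0-based index
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


tdg_n140a = apply_point_mutation(TDG_N_TERMINAL, "N140A")
print(TDG_N_TERMINAL[139], "->", tdg_n140a[139])
```

The wild-type check guards against off-by-one errors when a mutation is transferred between numbering schemes or between truncated and full-length sequences.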
[0270] In some embodiments, the fusion proteins described herein may comprise one or more heterologous protein domains (e.g., about or more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more domains in addition to the prime editor components). A fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains. Other exemplary features that may be present are localization sequences, such as cytoplasmic localization sequences, export sequences, such as nuclear export sequences, or other localization sequences, as well as sequence tags that are useful for solubilization, purification, or detection of the fusion proteins. [0271] Examples of protein domains that may be fused to a prime editor or component thereof (e.g., the napDNAbp domain, the polymerase domain, or the NLS domain) include, without limitation, epitope tags and reporter gene sequences. Non-limiting examples of epitope tags include histidine (His) tags, V5 tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G tags, and thioredoxin (Trx) tags. Examples of reporter genes include, but are not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP). A prime editor may be fused to a gene sequence encoding a protein or a fragment of a protein that binds DNA molecules or binds other cellular molecules, including, but not limited to, maltose binding protein (MBP), S-tag, Lex A DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions, and herpes simplex virus (HSV) VP16 protein fusions. 
Additional domains that may form part of a prime editor are described in US Patent Publication No. 2011/0059502, published March 10, 2011, and incorporated herein by reference in its entirety. [0272] In an aspect of the disclosure, a reporter gene that includes, but is not limited to, glutathione-S-transferase (GST), horseradish peroxidase (HRP), chloramphenicol acetyltransferase (CAT), beta-galactosidase, beta-glucuronidase, luciferase, green fluorescent protein (GFP), HcRed, DsRed, cyan fluorescent protein (CFP), yellow fluorescent protein (YFP), and autofluorescent proteins including blue fluorescent protein (BFP), may be introduced into a cell to encode a gene product that serves as a marker by which to measure the alteration or modification of expression of the gene product. In certain embodiments of the disclosure, the gene product is luciferase. In a further embodiment of the disclosure, the expression of the gene product is decreased.
[0273] Suitable protein tags provided herein include, but are not limited to, biotin carboxylase carrier protein (BCCP) tags, myc-tags, calmodulin-tags, FLAG-tags, hemagglutinin (HA)-tags, polyhistidine tags, also referred to as histidine tags or His-tags, maltose binding protein (MBP)-tags, nus-tags, glutathione-S-transferase (GST)-tags, green fluorescent protein (GFP)-tags, thioredoxin-tags, S-tags, Softags (e.g., Softag 1, Softag 3), strep-tags, biotin ligase tags, FlAsH tags, V5 tags, and SBP-tags. Additional suitable sequences will be apparent to those of skill in the art. In some embodiments, the fusion protein comprises one or more His tags. [0274] In some embodiments of the present disclosure, the activity of the prime editing system may be temporally regulated by adjusting the residence time, the amount, and/or the activity of the expressed components of the PE system. For example, as described herein, the PE may be fused with a protein domain that is capable of modifying the intracellular half-life of the PE. In certain embodiments involving two or more vectors (e.g., a vector system in which the components described herein are encoded on two or more separate vectors), the activity of the PE system may be temporally regulated by controlling the timing in which the vectors are delivered. For example, in some embodiments a vector encoding the nuclease system may deliver the PE prior to the vector encoding the template. In other embodiments, the vector encoding the PEgRNA may deliver the guide prior to the vector encoding the PE system. In some embodiments, the vectors encoding the PE system and PEgRNA are delivered simultaneously. In certain embodiments, the simultaneously delivered vectors temporally deliver, e.g., the PE, PEgRNA, and/or second strand guide RNA components. 
In further embodiments, the RNA (such as, e.g., the nuclease transcript) transcribed from the coding sequence on the vectors may further comprise at least one element that is capable of modifying the intracellular half-life of the RNA and/or modulating translational control. In some embodiments, the half-life of the RNA may be increased. In some embodiments, the half-life of the RNA may be decreased. In some embodiments, the element may be capable of increasing the stability of the RNA. In some embodiments, the element may be capable of decreasing the stability of the RNA. In some embodiments, the element may be within the 3' UTR of the RNA. In some embodiments, the element may include a polyadenylation signal (PA). In some embodiments, the element may include a cap, e.g., an upstream mRNA or PEgRNA end. In some embodiments, the RNA may comprise no PA such that it is subject to quicker degradation in the cell after transcription. In some embodiments, the element may include at least one AU-rich element (ARE). The AREs may be bound by ARE binding
proteins (ARE-BPs) in a manner that is dependent upon tissue type, cell type, timing, cellular localization, and environment. In some embodiments, the destabilizing element may promote RNA decay, affect RNA stability, or activate translation. In some embodiments, the ARE may comprise 50 to 150 nucleotides in length. In some embodiments, the ARE may comprise at least one copy of the sequence AUUUA. In some embodiments, at least one ARE may be added to the 3' UTR of the RNA. In some embodiments, the element may be a Woodchuck Hepatitis Virus (WHP) Posttranscriptional Regulatory Element (WPRE), which creates a tertiary structure to enhance expression from the transcript. [0275] In further embodiments, the element is a modified and/or truncated WPRE sequence that is capable of enhancing expression from the transcript, as described, for example in Zufferey et al., J Virol, 73(4): 2886-92 (1999) and Flajolet et al., J Virol, 72(7): 6175-80 (1998). In some embodiments, the WPRE or equivalent may be added to the 3' UTR of the RNA. In some embodiments, the element may be selected from other RNA sequence motifs that are enriched in either fast- or slow-decaying transcripts. [0276] In some embodiments, the vector encoding the PE or the PEgRNA may be self-destroyed via cleavage of a target sequence present on the vector by the PE system. The cleavage may prevent continued transcription of a PE or a PEgRNA from the vector. Although transcription may occur on the linearized vector for some amount of time, the expressed transcripts or proteins subject to intracellular degradation will have less time to produce off-target effects without continued supply from expression of the encoding vectors. Inhibiting the DNA Mismatch Repair (MMR) Pathway [0277] In some embodiments, the present disclosure contemplates delivery of an inhibitor of the mismatch repair (MMR) pathway alongside a prime editor to enhance the efficiency of prime editing.
Thus, the present disclosure contemplates any suitable means to inhibit MMR. In one embodiment, the disclosure embraces administering an effective amount of an inhibitor of the MMR pathway. In various embodiments, the MMR pathway may be inhibited by inhibiting, blocking, or inactivating any one or more MMR proteins or variants at the genetic level (e.g., in the gene encoding the one or more MMR proteins, such as introducing a mutation that inactivates the MMR protein or variant thereof), transcriptional level (e.g., by transcript knockdown), translational level (e.g., by blocking translation of one or more MMR proteins from their cognate transcripts), or at the protein level (e.g., by application of an inhibitor (e.g., a small molecule, antibody, or dominant negative protein partner) or by targeted protein degradation (e.g., PROTAC-based degradation)). The present disclosure also contemplates
methods of prime editing which are designed to install modifications to a nucleic acid molecule that evade correction by the MMR pathway, for example, by designing DNA synthesis templates comprising (and for initial installation in the edit strand) contiguous silent edits (e.g., three or more contiguous silent edits, or two or more contiguous silent edits immediately adjacent to a non-silent edit), or silent edits in close proximity to a non-silent edit (e.g., one or more or two or more silent edits within 5 nucleotides upstream or downstream of a corrective non-silent edit) without the need to provide an MMR inhibitor. Delivering an MMR inhibitor alongside the prime editor, or installing modifications to a nucleic acid molecule that avoid correction by the MMR pathway, may result in increased editing efficiency and reduced indel formation. As used herein, “during” prime editing can embrace any suitable sequence of events, such that the prime editing step can be applied before, at the same time as, or after the step of blocking, inhibiting, or inactivating the MMR pathway (e.g., by targeting the inhibition of MLH1). For example, in some embodiments, an inhibitor of the MMR pathway may be delivered at the same time as the prime editor. In some embodiments, an inhibitor of the MMR pathway may be delivered before delivery of the prime editor, or after delivery of the prime editor. [0278] In some embodiments, a prime editing system component, e.g., a pegRNA, is designed to install modifications in the target nucleic acid which evade the MMR system, without the need to provide an inhibitor. In certain embodiments, the DNA mismatch repair (MMR) system can be inhibited, blocked, or otherwise inactivated by inhibiting one or more proteins of the MMR system, including, but not limited to, MLH1, PMS2 (or MutL alpha), PMS1 (or MutL beta), MLH3 (or MutL gamma), MutS alpha (MSH2-MSH6), MutS beta (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, and POLδ.
[0279] Thus, in one aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) by delivering an inhibitor of the MMR pathway and a prime editor. [0280] In another aspect, the present disclosure provides a method for editing a nucleotide molecule (e.g., a genome) by delivering an inhibitor of the MMR system, e.g., MLH1, PMS2 (or MutL alpha), PMS1 (or MutL beta), MLH3 (or MutL gamma), MutS alpha (MSH2- MSH6), MutS beta (MSH2-MSH3), MSH2, MSH6, PCNA, RFC, EXO1, POLδ, and PCNA, and a prime editor. [0281] In one aspect, the present disclosure contemplates delivery of a prime editor and an inhibitor of MLH1 or a variant thereof. Without being bound by theory, MLH1 is an MMR protein that heterodimerizes with PMS2 to form MutL alpha, a component of the post-
replicative DNA mismatch repair system (MMR). DNA repair is initiated by MutS alpha (MSH2-MSH6) or MutS beta (MSH2-MSH3) binding to a dsDNA mismatch, then MutL alpha is recruited to the heteroduplex. Assembly of the MutL-MutS-heteroduplex ternary complex in presence of RFC and PCNA is sufficient to activate endonuclease activity of PMS2. It introduces single-strand breaks near the mismatch and thus generates new entry points for the exonuclease EXO1 to degrade the strand containing the mismatch. DNA methylation would prevent cleavage and therefore assure that only the newly mutated DNA strand is going to be corrected. MutL alpha (MLH1-PMS2) interacts physically with the clamp loader subunits of DNA polymerase III, suggesting that it may play a role to recruit the DNA polymerase III to the site of the MMR. Also implicated in DNA damage signaling, a process which induces cell cycle arrest and can lead to apoptosis in case of major DNA damages. MLH1 also heterodimerizes with MLH3 to form MutL gamma which plays a role in meiosis. The “canonical” human MLH1 amino acid sequence is represented by: [0282] >sp|P40692|MLH1_HUMAN DNA mismatch repair protein Mlh1 OS=Homo sapiens OX=9606 GN=MLH1 PE=1 SV=1 MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIVKEGGLKLI Q IQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTA DGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIATRRKALKNPSEEYGKIL EVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRSIFGNAVSRELIEIGCED KTLAF KMNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEIS P QNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE MVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTED KTDIS SGRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHRED SDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGC VNPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLAL DSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVP PLEGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSI PNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC(SEQ ID NO: 1)
[0283] MLH1 also may include other human isoforms, including P40692-2, which differs from the canonical sequence in that residues 1-241 of the canonical sequence are missing: [0284] >sp|P40692-2|MLH1_HUMAN Isoform 2 of DNA mismatch repair protein Mlh1 OS=Homo sapiens OX=9606 GN=MLH1 MNGYISNANYSVKKCIFLLFINHRLVESTSLRKAIETVYAAYLPKNTHPFLYLSLEISP Q NVDVNVHPTKHEVHFLHEESILERVQQHIESKLLGSNSSRMYFTQTLLPGLAGPSGE MVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLSSQPQAIVTED KTDISS GRARQQDEEMLELPAPAEVAAKNQSLEGDTTKGTSEMSEKRGPTSSNPRKRHREDS DVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQEEINEQGHEVLREMLHNHSFVGCV NPQWALAQHQTKLYLLNTTKLSEELFYQILIYDFANFGVLRLSEPAPLFDLAMLALD SPESGWTEEDGPKEGLAEYIVEFLKKKAEMLADYFSLEIDEEGNLIGLPLLIDNYVPPL EGLPIFILRLATEVNWDEEKECFESLSKECAMFYSIRKQYISEESTLSGQQSEVPGSIPN SWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC(SEQ ID NO: 2) [0285] MLH1 also may include a third known isoform known as P40692-3, which differs from the canonical sequence in that residues 1-101 (of MSFVAGVIRR…ASISTYGFRG (SEQ ID NO: 3)) are replaced with MAF: [0286] >sp|P40692-3|MLH1_HUMAN Isoform 3 of DNA mismatch repair protein Mlh1 OS=Homo sapiens OX=9606 GN=MLH1 MAFEALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDL FYNIATRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNAST VDNIRSIFGNAVSRELIEIGCEDKTLAFKMNGYISNANYSVKKCIFLLFINHRLVESTSL RKAIET VYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILERVQQHIESKLLG SN SSRMYFTQTLLPGLAGPSGEMVKSTTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDA FLQPLSKPLSSQPQAIVTEDKTDISSGRARQQDEEMLELPAPAEVAAKNQSLEGDTTK GTSEMSEKRGPTSSNPRKRHREDSDVEMVEDDSRKEMTAACTPRRRIINLTSVLSLQ EEINEQGHEVLREMLHNHSFVGCVNPQWALAQHQTKLYLLNTTKLSEELFYQILIYD FANFGVLRLSEPAPLFDLAMLALDSPESGWTEEDGPKEGLAEYIVEFLKKKAEMLAD YFSLEIDEEGNLIGLPLLIDNYVPPLEGLPIFILRLATEVNWDEEKECFESLSKECAMFY SIRKQYISEESTLS
GQQSEVPGSIPNSWKWTVEHIVYKALRSHILPPKHFTEDGNILQLANLPDLYKVFERC (SEQ ID NO: 4). [0287] The disclosure contemplates that inhibitors of any of the following proteins may be delivered to inhibit the MMR pathway during prime editing. In addition, such exemplary proteins may also be used to engineer or otherwise make a dominant negative variant that may be used as a type of inhibitor when administered in an effective amount which blocks, inactivates, or inhibits the MMR. Without being bound by theory, it is believed that MLH1 dominant negative mutants can saturate binding of MutS. Exemplary MLH1 proteins include the following amino acid sequences, or amino acid sequences having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or up to 100% sequence identity with any of the following sequences:
[0288] MLH1 mutants or truncated variants may be provided with the prime editors for inhibition of the MMR pathway of the present disclosure. In some embodiments, the mutants and truncated variants of the human MLH1 wild-type protein are utilized. [0289] In one aspect, a truncated variant of human MLH1 is delivered along with a prime editor. In some embodiments, amino acids 754-756 of the wild-type human MLH1 protein are truncated (Δ754-756, hereinafter referred to as MLH1dn). In some embodiments, a truncated variant of human MLH1 comprising only the N-terminal domain (amino acids 1-335) is provided (hereinafter referred to as MLH1dnNTD). In various embodiments, the following MLH1 variants are provided in this disclosure:
[0290] In various embodiments, the MMR pathway inhibitor can be a small molecule inhibitor. In other embodiments, the inhibitor can be an anti-MLH1 antibody, e.g., a neutralizing antibody that inactivates MLH1. In still other embodiments, the inhibitor can be a dominant negative mutant of MLH1. In still other embodiments, the inhibitor can be targeted at the level of transcription of MLH1, e.g., an siRNA or other nucleic acid agent that knocks down the level of a transcript encoding MLH1. [0291] In still other aspects, the present disclosure provides methods for prime editing whereby correction by the MMR pathway of the alterations introduced into a target nucleic acid molecule is evaded, without the need to provide an inhibitor of the MMR pathway. pegRNAs designed with consecutive nucleotide mismatches compared to a target site on the target nucleic acid, for example, pegRNAs that have three or more consecutive mismatching nucleotides, can evade correction by the MMR pathway, resulting in an increase in prime editing efficiency and/or a decrease in the frequency of indel formation compared to the introduction of a single nucleotide mismatch using prime editing. In addition, insertions and deletions of 10 or more nucleotides in length introduced by prime editing may also evade correction by the MMR pathway, resulting in an increase in prime editing efficiency and/or a decrease in the frequency of indel formation compared to the introduction of an insertion or deletion of less than 10 nucleotides in length using prime editing. [0292] Thus, in one aspect, the present disclosure provides methods for editing a nucleic acid molecule by prime editing using a prime editor and a pegRNA comprising a DNA synthesis template on its extension arm comprising three or more consecutive nucleotide mismatches
relative to a target site on the nucleic acid molecule. At least one of the consecutive nucleotide mismatches results in an alteration in the amino acid sequence of a protein expressed from the nucleic acid molecule. In some embodiments, more than one of the consecutive nucleotide mismatches results in an alteration in the amino acid sequence of a protein expressed from the nucleic acid molecule. On the other hand, at least one of the remaining nucleotide mismatches (i.e., those that do not result in an alteration in the amino acid sequence of a protein expressed from the nucleic acid molecule) is a silent mutation. The silent mutations may be present in coding regions of the target nucleic acid molecule or in non-coding regions of the target nucleic acid molecule. When the silent mutations are present in a coding region, they introduce into the nucleic acid molecule one or more alternate codons encoding the same amino acid as the unedited nucleic acid molecule. Alternatively, when the silent mutations are in a non-coding region, the silent mutations may be present in a region of the nucleic acid molecule that does not influence splicing, gene regulation, RNA lifetime, or other biological properties of the target site on the nucleic acid molecule. [0293] Any number of consecutive nucleotide mismatches of three or more can be used to achieve the benefits of evading correction by the MMR pathway. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 consecutive nucleotide mismatches relative to the endogenous sequence of a target site in the nucleic acid molecule edited by prime editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 3, 4, or 5 consecutive nucleotide mismatches relative to the endogenous sequence of a target site in the nucleic acid molecule edited by prime editing.
In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 consecutive nucleotide mismatches relative to the endogenous sequence of a target site in the nucleic acid molecule edited by prime editing. In some embodiments, the DNA synthesis template of the extension arm on the pegRNA comprises four or more, five or more, six or more, seven or more, eight or more, nine or more, or ten or more consecutive nucleotide mismatches relative to a target site on the nucleic acid molecule. [0294] In another aspect, the present disclosure provides methods for editing a nucleic acid molecule by prime editing comprising delivering a prime editor as described herein and a pegRNA comprising a DNA synthesis template on its extension arm comprising an insertion
or deletion of 10 or more nucleotides relative to a target site on the nucleic acid molecule. Insertions and deletions of 10 or more nucleotides in length evade correction by the MMR pathway when introduced by prime editing and thus can benefit from the inhibition of the MMR pathway without the need to provide an inhibitor of MMR. Insertions and deletions of any length greater than 10 nucleotides can be used to achieve the benefits of naturally evading correction by the MMR pathway. In some embodiments, the DNA synthesis template comprises an insertion or deletion of 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides relative to the endogenous sequence at a target site of the nucleic acid molecule edited by prime editing. In some embodiments, the DNA synthesis template comprises an insertion or deletion of 11 or more nucleotides, 12 or more nucleotides, 13 or more nucleotides, 14 or more nucleotides, 15 or more nucleotides, 16 or more nucleotides, 17 or more nucleotides, 18 or more nucleotides, 19 or more nucleotides, 20 or more nucleotides, 21 or more nucleotides, 22 or more nucleotides, 23 or more nucleotides, 24 or more nucleotides, or 25 or more nucleotides relative to a target site on a nucleic acid molecule. In certain embodiments, the DNA synthesis template comprises an insertion or deletion of 15 or more nucleotides relative to a target site on the nucleic acid molecule. [0295] In some embodiments, pegRNAs comprising MMR-evading silent edits or mutations in addition to desired non-silent edit(s) can be used to design PE3b (or PE5b) second strand nicking guide RNAs (ngRNAs) for tighter temporal control of second strand nicking, and hence reduced indel formation, as compared to prime editing using pegRNAs that do not include additional silent edits and PE3b (or PE5b) ngRNAs designed accordingly. In some embodiments, provided herein are prime editing systems comprising pegRNAs and ngRNAs for MMR evasion and reduced indel frequency. 
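The two design thresholds stated above (three or more consecutive nucleotide mismatches, or an insertion or deletion of 10 or more nucleotides) can be illustrated with a minimal Python sketch. The function names and the simplified treatment of insertions and deletions (comparing only sequence lengths) are assumptions made for illustration and are not part of the disclosure; the sketch does not predict actual MMR behavior in a cell.

```python
# Hypothetical helper names; thresholds are taken from the surrounding text.
MIN_CONSECUTIVE_MISMATCHES = 3   # "three or more consecutive nucleotide mismatches"
MIN_INDEL_LENGTH = 10            # "insertion or deletion of 10 or more nucleotides"


def longest_mismatch_run(reference: str, edited: str) -> int:
    """Return the length of the longest run of consecutive mismatching positions."""
    if len(reference) != len(edited):
        raise ValueError("substitution comparison requires equal-length sequences")
    longest = current = 0
    for ref_base, new_base in zip(reference.upper(), edited.upper()):
        current = current + 1 if ref_base != new_base else 0
        longest = max(longest, current)
    return longest


def meets_mmr_evasion_design_rule(reference: str, edited: str) -> bool:
    """Check an intended edit against the two thresholds described in the text.

    Assumption: a length difference is treated as a pure insertion or deletion
    of that many nucleotides; mixed edits are not modeled.
    """
    if len(reference) != len(edited):
        return abs(len(reference) - len(edited)) >= MIN_INDEL_LENGTH
    return longest_mismatch_run(reference, edited) >= MIN_CONSECUTIVE_MISMATCHES


if __name__ == "__main__":
    # Three contiguous substitutions (positions 3-5) in a toy sequence.
    print(meets_mmr_evasion_design_rule("ACGTACGT", "ACCAGCGT"))
    # A single substitution does not meet the threshold.
    print(meets_mmr_evasion_design_rule("ACGT", "ACCT"))
```

A check of this kind could be run on a candidate DNA synthesis template design before synthesis; the sequences shown are toy inputs, not sequences from the disclosure.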
[0296] In some embodiments, a prime editing system comprises (i) a prime editor comprising a nucleic acid programmable DNA binding protein (napDNAbp) and a reverse transcriptase, (ii) a prime editing guide RNA (PEgRNA) comprising: (a) a spacer sequence that comprises a region of complementarity to a non-edit strand of a double-stranded target DNA sequence; (b) an extension arm that comprises a DNA synthesis template and a primer binding site in a 5′ to 3′ orientation, wherein the primer binding site comprises a region of complementarity to a region upstream of a first nick site in the edit strand of the double-stranded target DNA sequence, and wherein the DNA synthesis template encodes a single strand DNA sequence that comprises two or more nucleotide changes compared to a region downstream of the first
nick site in the edit strand of the double-stranded target DNA sequence, wherein the two or more nucleotide changes include a non-silent mutation and at least one silent mutation, and (c) a gRNA core that interacts with the napDNAbp; and (iii) a second strand nicking guide RNA (ngRNA) comprising (a) a spacer sequence that comprises a region of complementarity to a region in the single strand DNA sequence comprising at least two of the two or more nucleotide changes, and (b) a gRNA core that interacts with the napDNAbp. [0297] In some embodiments, contacting the double stranded target DNA with the prime editing composition installs the two or more nucleotide changes in the double-stranded target DNA sequence, thereby modifying the double-stranded target DNA sequence. [0298] In some embodiments, contacting the double stranded target DNA sequence with the prime editing composition results in increased modification efficiency and/or reduced indel frequency as compared to contacting the double stranded target DNA sequence with a control prime editing composition that comprises a control pegRNA and a control ngRNA, wherein the control pegRNA comprises a DNA synthesis template that encodes a single stranded DNA comprising only the non-silent mutation and not the at least one silent mutation compared to the region downstream of the first nick site in the edit strand of the double-stranded target DNA sequence, and wherein the control ngRNA comprises a spacer sequence that comprises a region of complementarity to a region of a single stranded DNA encoded by the DNA synthesis template of the control pegRNA comprising the non-silent mutation.
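The distinction between a non-silent mutation and a silent mutation in a coding region can be sketched in Python by translating each codon before and after the intended edit with the standard genetic code. This is an illustrative sketch only: the function name is hypothetical, and it assumes that both sequences are given on the coding strand, in frame, with equal length and only unambiguous bases.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(reference: str, edited: str):
    """Label each changed codon as 'silent' or 'non-silent'.

    Assumes in-frame coding-strand DNA of equal length (a multiple of three).
    Returns tuples of (codon_index, reference_codon, edited_codon, label).
    """
    reference, edited = reference.upper(), edited.upper()
    if len(reference) != len(edited) or len(reference) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    changes = []
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3]
        new_codon = edited[start:start + 3]
        if ref_codon == new_codon:
            continue
        same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[new_codon]
        label = "silent" if same_residue else "non-silent"
        changes.append((start // 3, ref_codon, new_codon, label))
    return changes


if __name__ == "__main__":
    # Toy example: Leu-Lys-Gly edited to Leu-Arg-Gly with two synonymous changes.
    for change in classify_codon_changes("CTGAAAGGT", "CTCAGAGGC"):
        print(change)
```

In the toy input, the middle codon carries the non-silent change (Lys to Arg) and the flanking codons carry synonymous changes, mirroring an edit that combines one non-silent mutation with nearby silent mutations.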
[0299] In some embodiments, provided herein is a method of prime editing of a double-stranded target DNA sequence, the method comprising: contacting the double-stranded target DNA sequence, which comprises an edit strand and a non-edit strand, with a prime editing system, wherein the prime editing system comprises (i) a prime editor comprising a nucleic acid programmable DNA binding protein (napDNAbp) and a reverse transcriptase, (ii) a prime editing guide RNA (PEgRNA) comprising: (a) a spacer sequence that comprises a region of complementarity to a non-edit strand of a double-stranded target DNA sequence; (b) an extension arm that comprises a DNA synthesis template and a primer binding site in a
5′ to 3′ orientation, wherein the primer binding site comprises a region of complementarity to a region upstream of a first nick site in the edit strand of the double-stranded target DNA sequence, and wherein the DNA synthesis template encodes a single strand DNA sequence that comprises two or more nucleotide changes compared to a region downstream of the first nick site in the edit strand of the double-stranded target DNA sequence, wherein the two or more nucleotide changes include a non-silent mutation and at least one silent mutation, and (c) a gRNA core that interacts with the napDNAbp; and (iii) a second strand nicking guide RNA (ngRNA) comprising (a) a spacer sequence that comprises a region of complementarity to a region in the single strand DNA sequence comprising at least two of the two or more nucleotide changes, and (b) a gRNA core that interacts with the napDNAbp, wherein the contacting installs the two or more nucleotide changes in the double-stranded target DNA sequence, thereby modifying the double-stranded target DNA sequence, [0300] wherein the modification efficiency is increased and wherein the indel frequency is reduced as compared to contacting the double stranded target DNA sequence with a control prime editing composition that comprises a control pegRNA and a control ngRNA, wherein the control pegRNA comprises a DNA synthesis template that encodes a single stranded DNA comprising only the non-silent mutation and not the at least one silent mutation compared to the region downstream of the first nick site in the edit strand of the double-stranded target DNA sequence, and wherein the control ngRNA comprises a spacer sequence that comprises a region of complementarity to a region of a single stranded DNA encoded by the DNA synthesis template of the control pegRNA comprising the non-silent mutation. [0301] In some embodiments, the contacting results in nicking the edit strand to form a free 3′ end at the first nick site.
In some embodiments, the contacting results in annealing the primer binding site with the region of the edit strand upstream of the first nick site. In some embodiments, the contacting results in synthesizing the single strand DNA sequence encoded by the DNA synthesis template from the free 3′ end of the edit strand. In some embodiments, the contacting results in annealing the single strand DNA sequence to the non-edit strand downstream of the first nick site in the edit strand, thereby displacing said region downstream of the first nick site. In some embodiments, the contacting results in nicking the non-edit strand and replicating the DNA, thereby incorporating the two or more nucleotide changes of the single strand DNA sequence to the non-edit strand.
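The sequence-level consequence of the steps described above (nicking the edit strand, annealing the primer binding site upstream of the nick, synthesizing the encoded single strand DNA from the free 3′ end, and replacing the region downstream of the nick) can be sketched in Python as a simple string model. The function names, the toy sequences, and the restriction to substitution edits that leave the strand length unchanged are assumptions for illustration; the sketch does not model strand invasion, flap resolution, or second strand nicking.

```python
# RNA base -> complementary DNA base, as copied by a reverse transcriptase.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_transcribe(rna: str) -> str:
    """Return the DNA (5'->3') that is the reverse complement of an RNA (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def simulate_prime_edit(edit_strand: str, nick_index: int,
                        pbs_rna: str, synthesis_template_rna: str) -> str:
    """Model the edit strand (5'->3') after a substitution-type prime edit.

    edit_strand[:nick_index] is the region upstream of the first nick site.
    Assumption: the newly synthesized DNA replaces an equal-length stretch
    downstream of the nick (no insertions or deletions).
    """
    upstream = edit_strand[:nick_index].upper()
    downstream = edit_strand[nick_index:].upper()

    # Annealing check: the PBS must pair with the 3' end of the nicked strand.
    primer_region = reverse_transcribe(pbs_rna)
    if not upstream.endswith(primer_region):
        raise ValueError("primer binding site is not complementary to the "
                         "region upstream of the nick")

    # Synthesis from the free 3' end, templated by the DNA synthesis template.
    new_dna = reverse_transcribe(synthesis_template_rna)
    if len(new_dna) > len(downstream):
        raise ValueError("synthesized sequence extends past the modeled strand")

    # The new DNA takes the place of the displaced downstream region.
    return upstream + new_dna + downstream[len(new_dna):]


if __name__ == "__main__":
    edited = simulate_prime_edit(
        edit_strand="AAACCCGGGTTTACGTACGT",  # toy sequence, nick after position 12
        nick_index=12,
        pbs_rna="AAACCCGG",                  # pairs with CCGGGTTT upstream of the nick
        synthesis_template_rna="ACGUAGGU",   # encodes ACCTACGT (G-to-C change)
    )
    print(edited)
```

With these toy inputs the model returns the edit strand carrying a single G-to-C substitution three nucleotides downstream of the nick.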
[0302] In some embodiments, the modification efficiency and/or the indel frequency is determined by contacting the double stranded target DNA sequence in a population of cells each comprising at least one copy of the double stranded target DNA sequence and calculating the percentage of editing and indels based on high throughput sequencing of the population of cells after the contacting. PEgRNAs [0303] The prime editing systems described herein contemplate the use of any suitable PEgRNAs. PEgRNA architecture [0304] In some embodiments, an extended guide RNA, or pegRNA, used in the prime editing systems disclosed herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core region, which binds with the napDNAbp. In some embodiments, the pegRNA includes an extended RNA segment, i.e., an extension arm, at the 5´ end, i.e., a 5´ extension. In some embodiments, the 5´ extension includes a reverse transcription template sequence, a primer binding site, and an optional 5-20 nucleotide linker sequence. The RT primer binding site hybridizes to the free 3´ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming reverse transcriptase for DNA polymerization in the 5´-3´ direction. [0305] In another embodiment, an extended guide RNA, i.e., a pegRNA, usable in the prime editing systems, methods, and compositions disclosed herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core, which binds with the napDNAbp. In some embodiments, the pegRNA includes an extended RNA segment, i.e., an extension arm, at the 3´ end, i.e., a 3´ extension. In some embodiments, the 3´ extension includes a reverse transcription template sequence and a reverse transcription primer binding site. The RT primer binding site hybridizes to the free 3´ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming reverse transcriptase for DNA polymerization in the 5´-3´ direction.
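The 5´ extension and 3´ extension architectures described above can be sketched in Python as a concatenation of named components. The class and function names are hypothetical, the component sequences in the example are short placeholders rather than functional sequences, and the placement of the optional linker between the 5´ extension arm and the spacer is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNAParts:
    """Components of a pegRNA, each written 5'->3' as RNA."""
    spacer: str        # e.g., a ~20 nt spacer sequence
    grna_core: str     # region that binds the napDNAbp
    rt_template: str   # reverse transcription (DNA synthesis) template
    pbs: str           # primer binding site
    linker: str = ""   # optional linker; used only for the 5' extension here


def assemble_pegrna(parts: PegRNAParts, extension: str = "3prime") -> str:
    """Concatenate the components for a 3' or 5' extension architecture.

    The extension arm is written as template followed by primer binding site
    in the 5'->3' direction, as described in the text.
    """
    arm = parts.rt_template + parts.pbs
    guide = parts.spacer + parts.grna_core
    if extension == "3prime":
        return guide + arm
    if extension == "5prime":
        # Assumed layout: arm, optional linker, then spacer and gRNA core.
        return arm + parts.linker + guide
    raise ValueError("extension must be '3prime' or '5prime'")


if __name__ == "__main__":
    toy = PegRNAParts(spacer="GGCC", grna_core="GUUU",
                      rt_template="ACGU", pbs="AAAC", linker="UU")
    print(assemble_pegrna(toy, "3prime"))
    print(assemble_pegrna(toy, "5prime"))
```

Keeping the components separate until assembly makes it straightforward to vary the length of the template or primer binding site independently when comparing designs.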
[0306] In another embodiment, an extended guide RNA, i.e., a pegRNA, usable in the prime editing systems, methods, and compositions disclosed herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core, which binds with the napDNAbp. In some embodiments, the pegRNA includes an extended RNA segment, i.e., an extension arm, at an intramolecular position within the gRNA core, i.e., an intramolecular extension. In some embodiments, the intramolecular extension includes a reverse transcription template sequence and a reverse transcription primer binding site. The RT primer binding site hybridizes to the free 3´ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming reverse transcriptase for DNA polymerization in the 5´-3´ direction. [0307] In one embodiment, the position of the intramolecular RNA extension is not in the spacer sequence of the guide RNA. In another embodiment, the position of the intramolecular RNA extension is in the gRNA core. In still another embodiment, the position of the intramolecular RNA extension is anywhere within the guide RNA molecule except within the spacer sequence, or at a position which disrupts the spacer sequence. In one embodiment, the intramolecular RNA extension is inserted downstream from the 3´ end of the spacer sequence. In another embodiment, the intramolecular RNA extension is inserted at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, or at least 25 nucleotides downstream of the 3´ end of the spacer sequence. [0308] In other embodiments, the intramolecular RNA extension is inserted into the gRNA core, which refers to the portion of a traditional guide RNA corresponding to or comprising the tracrRNA, which binds and/or interacts with the napDNAbp, e.g., a Cas9 protein or equivalent thereof (i.e., a different napDNAbp).
Preferably the insertion of the intramolecular RNA extension does not disrupt or minimally disrupts the interaction between the tracrRNA portion and the napDNAbp. [0309] The length of the RNA extension (which includes at least the RT template and primer binding site) can be any useful length. In various embodiments, the RNA extension is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least
100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length. [0310] The RT template sequence can also be any suitable length. For example, the RT template sequence can be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length. [0311] In still other embodiments, the reverse transcription primer binding site sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length. 
[0312] In other embodiments, the optional linker or spacer sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length. [0313] The RT template sequence, in certain embodiments, encodes a single-stranded DNA molecule which is homologous to the non-target strand (and thus, complementary to the corresponding site of the target strand) but includes one or more nucleotide changes. The one
or more nucleotide changes may include one or more single-base nucleotide changes, one or more deletions, and/or one or more insertions. [0314] The synthesized single-stranded DNA product of the RT template sequence is homologous to the non-target strand except that it contains one or more nucleotide changes. The single-stranded DNA product of the RT template sequence hybridizes in equilibrium with the complementary target strand sequence, thereby displacing the homologous endogenous target strand sequence. The displaced endogenous strand may be referred to in some embodiments as a 5´ endogenous DNA flap species. This 5´ endogenous DNA flap species can be removed by a 5´ flap endonuclease (e.g., FEN1) and the single-stranded DNA product, now hybridized to the endogenous target strand, may be ligated, thereby creating a mismatch between the endogenous sequence and the newly synthesized strand. The mismatch may be resolved by the cell’s innate DNA repair and/or replication processes. [0315] In various embodiments, the nucleotide sequence of the RT template sequence corresponds to the nucleotide sequence of the non-target strand that becomes displaced as the 5´ flap species and that overlaps with the site to be edited. [0316] In various embodiments of the extended guide RNAs, the reverse transcription template sequence may encode a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to a nick site, wherein the single-strand DNA flap comprises a desired nucleotide change. The single-stranded DNA flap may displace an endogenous single-strand DNA at the nick site. The displaced endogenous single-strand DNA at the nick site can have a 5´ end and form an endogenous flap, which can be excised by the cell. 
In various embodiments, excision of the 5´ end endogenous flap can help drive product formation since removing the 5´ end endogenous flap encourages hybridization of the single-strand 3´ DNA flap to the corresponding complementary DNA strand, and the incorporation or assimilation of the desired nucleotide change carried by the single-strand 3´ DNA flap into the target DNA. [0317] The terms “cleavage site,” “nick site,” and “cut site,” as used interchangeably herein in the context of prime editing, refer to a specific position in between two nucleotides or two base pairs in the double-stranded target DNA sequence. In some embodiments, the position of a nick site is determined relative to the position of a specific PAM sequence. In some embodiments, the nick site is the particular position where a nick will occur when the double-stranded target DNA is contacted with a napDNAbp, e.g., a nickase such as a Cas nickase, that recognizes a specific PAM sequence. For each PEgRNA described herein, a nick site
(e.g., the “first nick site” when referred to in the context of PE3, PE5 and similar approaches), is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates, and is characteristic of the particular PAM required for recognition and function of the napDNAbp. For example, for a PEgRNA that comprises a gRNA core that associates with a SpCas9, the nick site is in the phosphodiester bond between bases three (“-3” position relative to position 1 of the PAM sequence) and four (“-4” position relative to position 1 of the PAM sequence). [0318] In some embodiments, a nick site is in a target strand of the double-stranded target DNA sequence. In some embodiments, a nick site is in a non-target strand of the double-stranded target DNA sequence. In some embodiments, the nick site is in a protospacer sequence. In some embodiments, the nick site is adjacent to a protospacer sequence. In some embodiments, a nick site is downstream of a region, e.g., on a non-target strand, that is complementary to a primer binding site of a PEgRNA. In some embodiments, a nick site is downstream of a region, e.g., on a non-target strand, that binds to a primer binding site of a PEgRNA. In some embodiments, a nick site is immediately downstream of a region, e.g., on a non-target strand, that is complementary to a primer binding site of a PEgRNA. In some embodiments, the nick site is upstream of a specific PAM sequence on the non-target strand of the double-stranded target DNA, wherein the PAM sequence is specific for recognition by a napDNAbp that associates with the gRNA core of a PEgRNA. In some embodiments, the nick site is downstream of a specific PAM sequence on the non-target strand of the double-stranded target DNA, wherein the PAM sequence is specific for recognition by a napDNAbp that associates with the gRNA core of a PEgRNA. 
In some embodiments, the nick site is 3 nucleotides upstream of the PAM sequence, and the PAM sequence is recognized by a Streptococcus pyogenes Cas9 nickase, a P. lavamentivorans Cas9 nickase, a C. diphtheriae Cas9 nickase, a N. cinerea Cas9, a S. aureus Cas9, or a N. lari Cas9 nickase. In some embodiments, the nick site is 3 nucleotides upstream of the PAM sequence, and the PAM sequence is recognized by a Cas9 nickase, wherein the Cas9 nickase comprises a nuclease active HNH domain and a nuclease inactive RuvC domain. In some embodiments, the nick site is 2 base pairs upstream of the PAM sequence, and the PAM sequence is recognized by a S. thermophilus Cas9 nickase. [0319] In various embodiments of the extended guide RNAs, the cellular repair of the single-strand DNA flap results in installation of the desired nucleotide change, thereby forming a desired product.
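As an illustration only, the relationship between a PAM and the SpCas9 nick site described above (a nick between the -4 and -3 positions, i.e., 3 nucleotides upstream of the PAM) can be sketched in Python. The function name, the 20-nucleotide protospacer default, and the 0-based index convention are assumptions made for this sketch and are not taken from the disclosure; the input is assumed to be the PAM-containing (non-target) strand written 5' to 3'.

```python
import re


def find_spcas9_nick_sites(non_target_strand: str, protospacer_len: int = 20):
    """List candidate SpCas9 sites on the PAM-containing strand (5'->3').

    Returns tuples (protospacer_start, pam_start, nick_index), all 0-based.
    nick_index is the index of the first base 3' of the nick, so the nick
    lies between non_target_strand[nick_index - 1] and
    non_target_strand[nick_index] (the -4 and -3 positions relative to the PAM).
    """
    seq = non_target_strand.upper()
    sites = []
    # A lookahead is used so that overlapping NGG PAMs are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        protospacer_start = pam_start - protospacer_len
        if protospacer_start < 0:
            continue  # not enough sequence 5' of the PAM for a full protospacer
        nick_index = pam_start - 3  # 3 nucleotides upstream of the PAM
        sites.append((protospacer_start, pam_start, nick_index))
    return sites
```

For other napDNAbps (for example, the S. thermophilus Cas9 nickase mentioned above, with a nick 2 base pairs upstream of its PAM), the PAM pattern and the offset would need to be changed accordingly.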
[0320] In still other embodiments, the desired nucleotide change is installed in an editing window that is between about -5 to +5 of the nick site, or between about -10 to +10 of the nick site, or between about -20 to +20 of the nick site, or between about -30 to +30 of the nick site, or between about -40 to +40 of the nick site, or between about -50 to +50 of the nick site, or between about -60 to +60 of the nick site, or between about -70 to +70 of the nick site, or between about -80 to +80 of the nick site, or between about -90 to +90 of the nick site, or between about -100 to +100 of the nick site, or between about -200 to +200 of the nick site. [0321] In other embodiments, the desired nucleotide change is installed in an editing window that is between about +1 to +2 from the nick site, or about +1 to +3, +1 to +4, +1 to +5, +1 to +6, +1 to +7, +1 to +8, +1 to +9, +1 to +10, +1 to +11, +1 to +12, +1 to +13, +1 to +14, +1 to +15, +1 to +16, +1 to +17, +1 to +18, +1 to +19, +1 to +20, +1 to +21, +1 to +22, +1 to +23, +1 to +24, +1 to +25, +1 to +26, +1 to +27, +1 to +28, +1 to +29, +1 to +30, +1 to +31, +1 to +32, +1 to +33, +1 to +34, +1 to +35, +1 to +36, +1 to +37, +1 to +38, +1 to +39, +1 to +40, +1 to +41, +1 to +42, +1 to +43, +1 to +44, +1 to +45, +1 to +46, +1 to +47, +1 to +48, +1 to +49, +1 to +50, +1 to +51, +1 to +52, +1 to +53, +1 to +54, +1 to +55, +1 to +56, +1 to +57, +1 to +58, +1 to +59, +1 to +60, +1 to +61, +1 to +62, +1 to +63, +1 to +64, +1 to +65, +1 to +66, +1 to +67, +1 to +68, +1 to +69, +1 to +70, +1 to +71, +1 to +72, +1 to +73, +1 to +74, +1 to +75, +1 to +76, +1 to +77, +1 to +78, +1 to +79, +1 to +80, +1 to +81, +1 to +82, +1 to +83, +1 to +84, +1 to +85, +1 to +86, +1 to +87, +1 to +88, +1 to +89, +1 to +90, +1 to +91, +1 to +92, +1 to +93, +1 to +94, +1 to +95, +1 to +96, +1 to +97, +1 to +98, +1 to +99, +1 to +100, +1 to +101, +1 to +102, +1 to +103, +1 to +104, +1 to +105, +1 to +106, +1 to +107, +1 to +108, +1 to
+109, +1 to +110, +1 to +111, +1 to +112, +1 to +113, +1 to +114, +1 to +115, +1 to +116, +1 to +117, +1 to +118, +1 to +119, +1 to +120, +1 to +121, +1 to +122, +1 to +123, +1 to +124, or +1 to +125 from the nick site. [0322] In still other embodiments, the desired nucleotide change is installed in an editing window that is between about +1 to +2 from the nick site, or about +1 to +5, +1 to +10, +1 to +15, +1 to +20, +1 to +25, +1 to +30, +1 to +35, +1 to +40, +1 to +45, +1 to +50, +1 to +55, +1 to +100, +1 to +105, +1 to +110, +1 to +115, +1 to +120, +1 to +125, +1 to +130, +1 to +135, +1 to +140, +1 to +145, +1 to +150, +1 to +155, +1 to +160, +1 to +165, +1 to +170, +1 to +175, +1 to +180, +1 to +185, +1 to +190, +1 to +195, or +1 to +200, from the nick site.
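By way of a hedged example, checking whether an intended edit falls inside one of the editing windows recited above can be sketched as follows. The convention used here is an assumption of this sketch rather than a definition from the disclosure: position +1 is taken to be the first nucleotide 3' of the nick, position -1 the first nucleotide 5' of the nick, and there is no position 0. The helper names are hypothetical.

```python
def nick_relative_position(index: int, nick_index: int) -> int:
    """Convert a 0-based strand index to a nick-relative position.

    nick_index is the 0-based index of the first base 3' of the nick.
    Assumed convention: that base is +1, the base before it is -1,
    and position 0 does not exist.
    """
    offset = index - nick_index
    return offset + 1 if offset >= 0 else offset


def in_editing_window(index: int, nick_index: int, window=(1, 30)) -> bool:
    """Return True if the base at `index` lies within `window` (inclusive)."""
    low, high = window
    return low <= nick_relative_position(index, nick_index) <= high
```

For instance, with a nick index of 17, the base at index 46 is at position +30 and is inside a +1 to +30 window, while the base at index 47 is outside it.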
[0323] In various aspects, the extended guide RNAs are modified versions of an extended guide RNA. pegRNAs (i.e., extended guide RNAs) and ngRNAs may be expressed from an encoding nucleic acid, or synthesized chemically. Methods are well known in the art for obtaining or otherwise synthesizing guide RNAs, and for determining the appropriate sequence of the pegRNA, including the protospacer sequence which interacts and hybridizes with the target strand of a genomic target site of interest. [0324] In various embodiments, the particular design aspects of a pegRNA sequence and ngRNA sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the desired site to be edited) and the type of napDNAbp (e.g., Cas9 protein) present in the prime editing systems utilized in the methods and compositions described herein, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc. [0325] In general, a spacer sequence (i.e., a guide sequence) of a pegRNA or ngRNA can be any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. 
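As a simplified illustration of a degree of complementarity, the following Python sketch scores an ungapped, equal-length, antiparallel pairing between a guide sequence and the strand it hybridizes to. This is a deliberate simplification and an assumption of this sketch: it is not an optimal alignment and does not stand in for the alignment algorithms named in this disclosure. The function name is hypothetical.

```python
# Watson-Crick partner (written as a DNA base) for each guide base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def percent_complementarity(guide: str, target_strand: str) -> float:
    """Percent of guide bases that pair with the target strand.

    Both sequences are written 5'->3' and must have equal length; pairing is
    antiparallel, so guide base i is compared with target base (n - 1 - i).
    Ungapped comparison only (no alignment is performed).
    """
    guide = guide.upper()
    target = target_strand.upper().replace("U", "T")
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(
        COMPLEMENT[g] == t for g, t in zip(guide, reversed(target))
    )
    return 100.0 * matches / len(guide)
```

A guide of `ACGU` scored against the strand `ACGT` (its reverse complement) gives 100%, while a single mismatched position in a four-base comparison gives 75%.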
Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. [0326] In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a prime editor to a target sequence may be assessed by any suitable assay. For example, the components of a prime editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of a prime editor disclosed herein,
followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a prime editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art. [0327] A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome. For example, for the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG where NNNNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything). A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGG where NNNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything). For the S. thermophilus CRISPR1Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXXAGAAW where NNNNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T). A unique target sequence in a genome may include an S. thermophilus CRISPR 1 Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXXAGAAW where NNNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T). For the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG where NNNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything). A unique target sequence in a genome may include an S. 
pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGGXG where NNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything). In each of these sequences “M” may be A, G, T, or C, and need not be considered in identifying a sequence as unique. [0328] In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148). Another example folding algorithm is the online
webserver RNAfold, developed at Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and GM Church, 2009, Nature Biotechnology 27(12): 1151-62). Further algorithms may be found in U.S. Application Ser. No. 61/836,080, incorporated herein by reference. [0329] In some embodiments, the scaffold or gRNA core portion of a pegRNA comprises sequences corresponding to the tracr sequence and tracr mate sequence of a traditional guide RNA. In general, a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. 
In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin. Preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG. In an embodiment of the invention, the transcript or transcribed polynucleotide sequence has at least two or more hairpins. In preferred embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins. In some embodiments, the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides.
Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5′ to 3′), where “N” represents a base of a guide sequence, the first block of lower case letters represent the tracr mate sequence, and the second block of lower case letters represent the tracr sequence, and the final poly-T sequence represents the transcription terminator: (1)NNNNNNNNGTTTTTGTACTCTCAAGATTTAGAAATAAATCTTGCAGAAGCTACA AAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTC GTTATTTAATTTTTT (SEQ ID NO: 123); (2)NNNNNNNNNNNNNNNNNNGTTTTTGTACTCTCAGAAATGCAGAAGCTACAAA GATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTCGT TATTTAATTTTTT (SEQ ID NO: 124); (3)NNNNNNNNNNNNNNNNNNNNGTTTTTGTACTCTCAGAAATGCAGAAGCTACA AAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTT T (SEQ ID NO: 125); (4)NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAA GGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTT (SEQ ID NO: 126); (5)NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAA GGCTAGTCCGTTATCAACTTGAAAAAGTGTTTTTTT (SEQ ID NO: 127); AND (6) NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCATTTTTTTT (SEQ ID NO: 128). [0330] In some embodiments, sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from S. pyogenes. In some embodiments, the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence. [0331] It will be apparent to those of skill in the art that in order to target any of the fusion proteins comprising a Cas9 domain and a single-stranded DNA binding protein, as disclosed herein, to a target site, e.g., a site comprising a point mutation to be edited, it is typically necessary to co-express the fusion protein together with a guide RNA, e.g., an sgRNA. 
As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein.
[0332] In some embodiments, a pegRNA comprises a structure 5ʹ-[guide sequence]- GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAAGGCUAGUCCGUUAUCAACU UGAAAAAGUGGCACCGAGUCGGUGCUUUUU (SEQ ID NO: 129)-extension arm-3ʹ, wherein the guide sequence comprises a sequence that is complementary to the target sequence. The guide sequence, also referred to herein as the spacer sequence, is typically 20 nucleotides long. The sequences of suitable guide RNAs for targeting Cas9:nucleic acid editing enzyme/domain fusion proteins to specific genomic target sites will be apparent to those of skill in the art based on the instant disclosure. Such suitable guide RNA sequences typically comprise guide sequences that are complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited. Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein. Additional guide sequences are well known in the art and can be used with the prime editors utilized in the methods and compositions described herein. [0333] In some embodiments, a PEgRNA comprises three main component elements ordered in the 5ʹ to 3ʹ direction, namely: a spacer, a gRNA core, and an extension arm at the 3ʹ end. In some embodiments, the extension arm may further be divided into the following structural elements in the 5ʹ to 3ʹ direction, namely: an edit template, a homology arm, and a primer binding site. In some embodiments, the extension arm may further be divided into the following structural elements in the 5ʹ to 3ʹ direction, namely: a homology arm, an edit template, and a primer binding site. In some embodiments, the extension arm may further be divided into the following structural elements in the 5ʹ to 3ʹ direction, namely: a DNA synthesis template (e.g., a RT template), and a primer binding site. 
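The 5ʹ to 3ʹ ordering of elements described above (spacer, gRNA core, then an extension arm made up of a DNA synthesis template followed by a primer binding site) can be illustrated with a small Python sketch. The class and field names are hypothetical and chosen for this example only; the sequences are supplied by the caller, and none of the sequences used in the usage note is taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNA:
    """Illustrative container for pegRNA elements, each written 5'->3'."""

    spacer: str
    grna_core: str
    dna_synthesis_template: str  # e.g., an RT template
    primer_binding_site: str

    def extension_arm(self) -> str:
        # Within the 3' extension arm, the DNA synthesis template
        # precedes the primer binding site.
        return self.dna_synthesis_template + self.primer_binding_site

    def sequence(self) -> str:
        # Overall order: spacer, gRNA core, extension arm.
        return self.spacer + self.grna_core + self.extension_arm()
```

For example, `PegRNA("GGGG", "CCCC", "AAAA", "UU").sequence()` concatenates the four placeholder elements in the order given. Optional 5ʹ or 3ʹ modifier regions and terminators are left out of this sketch.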
In addition, the PEgRNA may comprise an optional 3ʹ end modifier region and an optional 5ʹ end modifier region. Still further, the PEgRNA may comprise a transcriptional termination signal at the 3ʹ end of the PEgRNA. These structural elements are further defined herein. The depiction of the structure of the PEgRNA is not meant to be limiting and embraces variations in the arrangement of the elements. For example, the optional sequence modifiers could be positioned within or between any of the other regions shown, and are not limited to being located at the 3ʹ and 5ʹ ends. PEgRNA modifications [0334] The PEgRNAs may also include additional design modifications that may alter the properties and/or characteristics of PEgRNAs, thereby improving the efficacy of prime editing. In various embodiments, these modifications may belong to one or more of a number
of different categories, including but not limited to: (1) designs to enable efficient expression of functional PEgRNAs from non-polymerase III (pol III) promoters, which would enable the expression of longer PEgRNAs without burdensome sequence requirements; (2) modifications to the core, Cas9-binding PEgRNA scaffold, which could improve efficacy; (3) modifications to the PEgRNA to improve RT processivity, allowing the insertion of longer sequences at targeted genomic loci; and (4) addition of RNA motifs to the 5ʹ or 3ʹ termini of the PEgRNA that improve PEgRNA stability, enhance RT processivity, prevent misfolding of the PEgRNA, or recruit additional factors important for genome editing. [0335] In one embodiment, PEgRNA could be designed with polIII promoters to improve the expression of longer-length PEgRNA with larger extension arms. sgRNAs are typically expressed from the U6 snRNA promoter. This promoter recruits pol III to express the associated RNA and is useful for expression of short RNAs that are retained within the nucleus. However, pol III is not highly processive and is unable to express RNAs longer than a few hundred nucleotides in length at the levels required for efficient genome editing. Additionally, pol III can stall or terminate at stretches of U’s, potentially limiting the sequence diversity that could be inserted using a PEgRNA. Other promoters that recruit polymerase II (such as pCMV) or polymerase I (such as the U1 snRNA promoter) have been examined for their ability to express longer sgRNAs. However, these promoters are typically partially transcribed, which would result in extra sequence 5ʹ of the spacer in the expressed PEgRNA, which has been shown to result in markedly reduced Cas9:sgRNA activity in a site-dependent manner. Additionally, while pol III-transcribed PEgRNAs can simply terminate in a run of 6-7 U’s, PEgRNAs transcribed from pol II or pol I would require a different termination signal. 
Often such signals also result in polyadenylation, which would result in undesired transport of the PEgRNA from the nucleus. Similarly, RNAs expressed from pol II promoters such as pCMV are typically 5ʹ-capped, also resulting in their nuclear export. [0336] In some aspects, the present disclosure provides next-generation modified pegRNAs (also referred to herein as “engineered pegRNAs” or “epegRNAs”) with improved properties, including but not limited to, increased stability and cellular lifespan, and improved binding affinity for a napDNAbp. These modified pegRNAs result in improved genome editing as demonstrated by increased editing efficiency at a wide variety of genomic sites. By appending certain nucleic acid structural motifs to the terminus of the extension arm of a pegRNA, including but not limited to, a prequeosin1-1 riboswitch aptamer (“evopreQ1-1”) or variant thereof, a pseudoknot from the MMLV viral genome (“evopreQ1-1”) or variant thereof, a modified tRNA used by MMLV RT as a primer for reverse transcription or variant thereof, and a G quadruplex or variant thereof, a consistent increase in editing activity may be achieved. [0337] In one embodiment, the modified pegRNAs include a nucleic acid moiety at the 3′ end of the pegRNA. Optionally, the 3′ end of the pegRNA is fused to the nucleic acid moiety through a nucleotide linker. In various embodiments, it will be appreciated that a wide variety of nucleotide sequences will work reasonably well for each genomic target site. Linker length can also be variable. In some cases, linkers ranging in length from 3-18 nucleotides will work. In other cases, the linker may be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, or at least 30 nucleotides. [0338] In general, the nucleic acid moieties that may be used to modify a pegRNA, for example, by attaching it to the 3′ end of a pegRNA, may include any nucleic acid moiety, including, for instance, a nucleic acid molecule comprising or which forms a double-helix moiety, toeloop moiety, hairpin moiety, stem-loop moiety, pseudoknot moiety, aptamer moiety, G quadruplex moiety, tRNA moiety, or a ribozyme moiety. The nucleic acid moiety may be characterized as forming a secondary nucleic acid structure, a tertiary nucleic acid structure, or a quaternary nucleic acid structure. 
In other words, the nucleic acid moiety may form any two-dimensional or three-dimensional structure known to be formed by such structures. The nucleic acid moiety may be DNA or RNA. [0339] Without restriction, the following are specific examples of nucleotide motifs that may be appended to the terminus of the extension arm of a pegRNA. Thus, in the case of a 3′ extension arm, the nucleotide motif would be coupled, attached, or otherwise linked to the 3′ end of the pegRNA, optionally via a linker. In the case of a 5′ extension arm, the nucleotide motif would be coupled, attached, or otherwise linked to the 5′ end of the pegRNA, optionally via a linker.
[0340] As indicated above, these motifs may be coupled, attached, or otherwise joined to a canonical pegRNA via a linker. Exemplary linkers include, but are not limited to:
site being targeted by prime editing and the modified pegRNA. [0342] In various embodiments, it will be appreciated that a wide variety of nucleotide sequences will work reasonably well for each genomic target site. Linker length is also likely to be variable. In some cases, linkers ranging in length from 3-18 nucleotides will work. In other cases, the linker may be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 26 nucleotides, at least 27 nucleotides, at least 28 nucleotides, at least 29 nucleotides, or at least 30 nucleotides. [0343] In one embodiment, the linker is 8 nucleotides in length. [0344] The present disclosure also contemplates variants of the above nucleotide motifs and linkers that have at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% sequence identity with any of the above motif and linker sequences. [0345] The pegRNAs may also include additional design improvements that may modify the properties and/or characteristics of pegRNAs thereby improving the efficacy of prime editing. 
In various embodiments, these improvements may belong to one or more of a number of different categories, including but not limited to: (1) designs to enable efficient expression of functional pegRNAs from non-polymerase III (pol III) promoters, which would enable the expression of longer pegRNAs without burdensome sequence requirements; (2) improvements to the core, Cas9-binding pegRNA scaffold, which could improve efficacy; (3) modifications to the pegRNA to improve RT processivity, allowing the insertion of longer sequences at targeted genomic loci; and (4) addition of RNA motifs to the 5ʹ or 3ʹ termini of the pegRNA that improve pegRNA stability, enhance RT processivity, prevent misfolding of the pegRNA, or recruit additional factors important for genome editing.
[0346] In one embodiment, pegRNA could be designed with pol III promoters to improve the expression of longer-length pegRNA with larger extension arms. sgRNAs are typically expressed from the U6 snRNA promoter. This promoter recruits pol III to express the associated RNA and is useful for expression of short RNAs that are retained within the nucleus. However, pol III is not highly processive and is unable to express RNAs longer than a few hundred nucleotides in length at the levels required for efficient genome editing. Additionally, pol III can stall or terminate at stretches of U’s, potentially limiting the sequence diversity that could be inserted using a pegRNA. Other promoters that recruit polymerase II (such as pCMV) or polymerase I (such as the U1 snRNA promoter) have been examined for their ability to express longer sgRNAs. However, these promoters are typically partially transcribed, which would result in extra sequence 5ʹ of the spacer in the expressed pegRNA, which has been shown to result in markedly reduced Cas9:sgRNA activity in a site-dependent manner. Additionally, while pol III-transcribed pegRNAs can simply terminate in a run of 6-7 U’s, pegRNAs transcribed from pol II or pol I would require a different termination signal. Often such signals also result in polyadenylation, which would result in undesired transport of the pegRNA from the nucleus. Similarly, RNAs expressed from pol II promoters such as pCMV are typically 5ʹ-capped, also resulting in their nuclear export. 
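Because pol III can stall or terminate at stretches of U’s, as noted above, a candidate pegRNA sequence can be screened for such stretches before it is placed under a pol III promoter. The following Python sketch is illustrative only: the function name is hypothetical, and the default threshold of four consecutive U’s is an assumption of this sketch, not a value stated in this disclosure.

```python
import re


def find_u_runs(rna: str, min_run: int = 4):
    """Return (start_index, run_length) for each run of U's >= min_run.

    DNA-style input is accepted; T is treated as U. The min_run default
    is an assumed screening threshold and should be tuned by the user.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    seq = rna.upper().replace("T", "U")
    pattern = "U{%d,}" % min_run
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, seq)]
```

For example, `find_u_runs("ACGUUUUUGCAUUU")` reports the five-U run starting at index 3 and ignores the trailing three-U run, which is below the threshold. The intended terminal run of U’s at the 3ʹ end of a pol III transcript would also be reported and can be disregarded.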
[0347] Exemplary U6 promoters include, but are not limited to: [0348] U6 promoter: GAGGGCCTATTTCCCATGATTCCTTCATATTTGCATATACGATACAAGGCTGTTA GAGAGATAATTAGAATTAATTTGACTGTAAACACAAAGATATTAGTACAAAATA CGTGACGTAGAAAGTAATAATTTCTTGGGTAGTTTGCAGTTTTAAAATTATGTTTT AAAATGGACTATCATATGCTTACCGTAACTTGAAAGTATTTCGATTTCTTGGCTTT ATATATCTTGTGGAAAGGACGAAACACCG (SEQ ID NO: 166) [0349] U6v9 promoter: GCCTGAGGCGTGGGGCCGCCTCCCAAAGACTTCTGGGAGGGCGGTGCGGCTCAG GCTCTGCCCCGCCTCCGGGGCTATTTGCATACGACCATTTCCAGTAATTCCCAGC AGCCACCGTAGCTATATTTGGTAGAACAACGAGCACTTTCTCAACTCCAGTCAAT AACTACGTTAGTTGCATTACACATTGGGCTAATATAAATAGAGGTTAAATCTCTA GGTCATTTAAGAGAAGTCGGCCTATGTGTACAGACATTTGTTCCAGGGGCTTTAA ATAGCTGGTGGTGGAACTCAATATTCG (SEQ ID NO: 167) [0350] U6v7 promoter: AAGTCCGCGGCACGAGAAATCAAAGCCCCGGGGCCTGGGTCCCACGCGGGGTCC
CTTACCCAGGGTGCCCCGGGCGCTCATTTGCATGTCCCACCCAACAGGTAAACCT GACAGATCGGTCGCGGCCAGGTACGGCCTGGCGGTCAGAGCACCAAACTTACGA GCCTTGTGATGAGTTCCGTTACATGAAATTCTCCTAAAGGCTCCAAGATGGACAG GAAAGCGCTCGATTAGGTTACCGTAAGGAAAACAAATGAGAAACTCCCGTGCCT TATAAGACCTGGGGACGGACTTATTTGCG (SEQ ID NO: 168) [0351] U6v4 promoter: AAATTGAGTCATCTGACAGAAATTATCTTTGGCAAGGTTTTAGTCCTAGGGTTAC CAGATGGAATACAGGACATCCATTTAAATTTGAATTTCAGATAAACAGTTAACAC TTCTCAAGGATAAATATGCCTCAAATATTGCACGGGACATATTTATACTAAAAAA AAAGTGTTTTTTTTTTTCCTGCGATTCAAACTTAACTGGTGTCCTGCATTTGTATTT GTTAAATCTGTCAATCCTATCTCAGTTTCCTTTGATGGAATGTACCTCTGTGCTAA TATTTAAAAATAGGTTACATTTG (SEQ ID NO: 169) [0352] One of ordinary skill in the art will appreciate that these promoter sequences can be trimmed at the 5′ end and still function at the same or nearly the same level. For example, any of the U6 promoters could be trimmed by removing up to 1, 10, 20, 30, 40, 50, 60, 70, 80, 90, or 100 nucleotides from the 5′ end, i.e., approximately 30% of the promoter length. In other embodiments, up to 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, or up to 30% of the length of the promoter may be removed from the 5′ end. [0353] One of ordinary skill in the art will also appreciate that other promoters could be used to improve the expression of longer pegRNAs with larger extension arms. For example, in different cell types, other promoters may be preferred and may result in greater expression of the longer pegRNAs. [0354] Previously, Rinn and coworkers screened a variety of expression platforms for the production of long noncoding RNA (lncRNA)-tagged sgRNAs. These platforms include RNAs expressed from pCMV that terminate in the ENE element from the human MALAT1 ncRNA, the PAN ENE element from KSHV, or the 3ʹ box from U1 snRNA. Notably, the MALAT1 ncRNA and PAN ENEs form triple helices protecting the polyA tail. These constructs could also enhance RNA stability. 
It is contemplated that these expression systems will also allow the expression of longer PEgRNAs. [0355] In addition, a series of methods have been designed for the cleavage of the portion of the pol II promoter that would be transcribed as part of the PEgRNA, adding either a self-cleaving ribozyme such as the hammerhead, pistol, hatchet, hairpin, VS, twister, or twister
sister ribozymes, or other self-cleaving elements to process the transcribed guide, or a hairpin that is recognized by Csy4 and also leads to processing of the guide. Also, it is hypothesized that incorporation of multiple ENE motifs could lead to improved PEgRNA expression and stability, as previously demonstrated for the KSHV PAN RNA and element. It is also anticipated that circularizing the PEgRNA in the form of a circular intronic RNA (ciRNA) could also lead to enhanced RNA expression and stability, as well as nuclear localization. [0356] In various embodiments, the PEgRNA may include various above elements, as exemplified by the following sequences. [0357] Non-limiting example 1 - PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and MALAT1 ENE TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTC CGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCC ATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAA GTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGC CTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTA CGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCG TGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATG GGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTC CGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGC AGAGCTGGTTTAGTGAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGA CTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGT TATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAG TCTGTTTTAGGGTCATGAAGGTTTTTCTTTTCCTGAGAAAACAACACGTATTGTTTT CTCAGGTTTTGCTTTTTGGCCTTTTTCTAGCTTAAAAAAAAAAAAAGCAAAAGAT GCTGGTGGTTGGCACTCCTGGTTTCCAGGACGGGGTTCAAATCCCTGCGGCGTCT TTGCTTTGACT (SEQ ID NO: 170) [0358] Non-limiting example 2 - PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and PAN ENE TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTC CGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC 
GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCC ATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAA
GTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGC CTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTA CGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCG TGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATG GGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTC CGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGC AGAGCTGGTTTAGTGAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGA CTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGT TATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAG TCTGTTTTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCA GACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAAT TTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAAC ATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAA (SEQ ID NO: 171) [0359] Non-limiting example 3 - PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and 3xPAN ENE TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTC CGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCC ATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAA GTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGC CTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTA CGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCG TGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATG GGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTC CGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGC AGAGCTGGTTTAGTGAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGA CTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGT TATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAG TCTGTTTTGTTTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCA GACGGCAAGGTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAAT TTAACGTGCCTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAAC ATAAATAAAGGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAAACACACTGT
TTTGGCTGGGTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGGCAAG GTTTTTATCCCAGTGTATATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGC CTAGAGCTCAAATTAAACTAATACCATAACGTAATGCAACTTACAACATAAATAAA GGTCAATGTTTAATCCATAAAAAAAAAAAAAAAAAAATCTCTCTGTTTTGGCTGG GTTTTTCCTTGTTCGCACCGGACACCTCCAGTGACCAGACGGCAAGGTTTTTATCC CAGTGTATATTGGAAAAACATGTTATACTTTTGACAATTTAACGTGCCTAGAGCTCA AATTAAACTAATACCATAACGTAATGCAACTTACAACATAAATAAAGGTCAATGTTT AATCCATAAAAAAAAAAAAAAAAAAA (SEQ ID NO: 172) [0360] Non-limiting example 4 - PEgRNA expression platform consisting of pCMV, Csy4 hairpin, the PEgRNA, and 3ʹ box TAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTC CGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCC GCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCC ATTGACGTCAATGGGTGGAGTATTTACGGTAAACTGCCCACTTGGCAGTACATCAA GTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGC CTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTA CGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCG TGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATG GGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTC CGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGC AGAGCTGGTTTAGTGAACCGTCAGATCGTTCACTGCCGTATAGGCAGGGCCCAGA CTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGT TATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAG TCTGTTTGTTTCAAAAGTAGACTGTACGCTAAGGGTCATATCTTTTTTTGTTTGGTT TGTGTCTTGGTTGGCGTCTTAAA (SEQ ID NO: 173) [0361] Non-limiting example 5 - PEgRNA expression platform consisting of pU1, Csy4 hairpin, the PEgRNA, and 3ʹ box CTAAGGACCAGCTTCTTTGGGAGAGAACAGACGCAGGGGCGGGAGGGAAAAAG GGAGAGGCAGACGTCACTTCCCCTTGGCGGCTCTGGCAGCAGATTGGTCGGTTGA GTGGCAGAAAGGCAGACGGGGACTGGGCAAGGCACTGTCGGTGACATCACGGAC AGGGCGACTTCTATGTAGATGAGGCAGCGCAGAGGCTGCTGCTTCGCCACTTGCT GCTTCACCACGAAGGAGTTCCCGTGCCCTGGGAGCGGGTTCAGGACCGCTGATCG GAAGTGAGAATCCCAGCTGTGTGTCAGGGCTGGAAAGGGCTCGGGAGTGCGCGG
GGCAAGTGACCGTGTGTGTAAAGAGTGAGGCGTATGAGGCTGTGTCGGGGCAGA GGCCCAAGATCTCAGTTCACTGCCGTATAGGCAGGGCCCAGACTGAGCACGTGAG TTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAA AGTGGGACCGAGTCGGTCCTCTGCCATCAAAGCGTGCTCAGTCTGTTTCAGCAAG TTCAGAGAAATCTGAACTTGCTGGATTTTTGGAGCAGGGAGATGGAATAGGAGCT TGCTCCGTCCACTCCACGCATCGACCTGGTATTGCAGTACCTCCAGGAACGGTGC ACCCACTTTCTGGAGTTTCAAAAGTAGACTGTACGCTAAGGGTCATATCTTTTTTT GTTTGGTTTGTGTCTTGGTTGGCGTCTTAAA (SEQ ID NO: 174). [0362] In various other embodiments, the PEgRNA may be improved by introducing modifications to the scaffold or core sequences. The core, Cas9-binding PEgRNA scaffold can likely be improved to enhance PE activity. Several such approaches have already been demonstrated. For instance, the first pairing element of the scaffold (P1) contains a GTTTT- AAAAC (SEQ ID NO: 175) pairing element. Such runs of Ts have been shown to result in pol III pausing and premature termination of the RNA transcript. Rational mutation of one of the T-A pairs to a G-C pair in this portion of P1 has been shown to enhance sgRNA activity, suggesting this approach would also be feasible for PEgRNAs. Additionally, increasing the length of P1 has also been shown to enhance sgRNA folding and lead to improved activity, suggesting it as another avenue for the modification of PEgRNA activity. Example modifications to the core can include: [0363] PEgRNA containing a 6 nt extension to P1 GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGCTCATGAAAATGAGCTAGCAA GTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCC TCTGCCATCAAAGCGTGCTCAGTCTGTTTTTTT (SEQ ID NO: 176) [0364] PEgRNA containing a T-A to G-C mutation within P1 GGCCCAGACTGAGCACGTGAGTTTGAGAGCTAGAAATAGCAAGTTTAAATAAGGC TAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGC GTGCTCAGTCTGTTTTTTT (SEQ ID NO: 177) [0365] In various other embodiments, the PEgRNA may be modified at the edit template region. 
As the size of the insertion templated by the PEgRNA increases, the PEgRNA is more likely to be degraded by endonucleases, to undergo spontaneous hydrolysis, or to fold into secondary structures that cannot be reverse-transcribed by the RT or that disrupt folding of the PEgRNA scaffold and subsequent Cas9-RT binding. Accordingly, it is likely that modification of the template of the PEgRNA might be necessary to effect large insertions, such as the insertion of whole genes. Some strategies to do so include the incorporation of modified nucleotides within a synthetic or semi-synthetic PEgRNA that render the RNA more resistant to degradation or hydrolysis or less likely to adopt inhibitory secondary structures. Such modifications could include 8-aza-7-deazaguanosine, which would reduce RNA secondary structure in G-rich sequences; locked nucleic acids (LNAs), which reduce degradation and enhance certain kinds of RNA secondary structure; and 2ʹ-O-methyl, 2ʹ-fluoro, or 2ʹ-O-methoxyethoxy modifications, which enhance RNA stability. Such modifications could also be included elsewhere in the PEgRNA to enhance stability and activity. Alternatively, or additionally, the template of the PEgRNA could be designed such that it is more likely to adopt simple secondary structures that still allow processing by the RT. Such simple structures would act as a thermodynamic sink, making it less likely that more complicated structures that would prevent reverse transcription would form. Finally, one could also split the template into two separate PEgRNAs. In such a design, a prime editor protein, e.g., an nCas9-RT fusion protein, would be used to initiate reverse transcription and also to recruit a separate template RNA to the targeted site via an RNA-binding protein fused to Cas9 or an RNA recognition element on the PEgRNA itself, such as the MS2 aptamer. The RT could either directly bind to this separate template RNA or initiate reverse transcription on the original PEgRNA before swapping to the second template. Such an approach could allow long insertions both by preventing misfolding of the PEgRNA upon addition of the long template and by not requiring dissociation of Cas9 from the genome for long insertions to occur, a requirement that could possibly inhibit PE-based long insertions.
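As an illustration of how candidate positions for structure-reducing modifications might be located, the following Python sketch flags G-rich windows in an edit template, since the paragraph above notes that 8-aza-7-deazaguanosine would reduce secondary structure in G-rich sequences. This is a hedged example rather than a procedure from this disclosure: the function name `g_rich_windows`, the window size, and the G-fraction threshold are all hypothetical values chosen for demonstration.

```python
def g_rich_windows(template, window=20, min_g_fraction=0.5):
    """Return the 0-based start index of every window of length `window`
    whose fraction of G bases is at least `min_g_fraction`.

    window and min_g_fraction are illustrative assumptions, not values
    stated in the disclosure.
    """
    seq = template.upper()
    hits = []
    # No windows are examined when the template is shorter than `window`.
    for start in range(max(len(seq) - window + 1, 0)):
        chunk = seq[start:start + window]
        if chunk.count("G") / window >= min_g_fraction:
            hits.append(start)
    return hits


# Toy template: ten G's followed by ten A's, scanned with a 10-nt window.
toy_template = "G" * 10 + "A" * 10
print(g_rich_windows(toy_template, window=10, min_g_fraction=0.75))
```

A scan of this kind identifies only sequence composition; whether a flagged region actually forms an inhibitory structure would need to be assessed separately.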
[0366] In still other embodiments, the PEgRNA may be modified by introducing additional RNA motifs at the 5ʹ and 3ʹ termini of the PEgRNA, or even at positions in between (e.g., in the gRNA core region or the spacer). Several such motifs, such as the PAN ENE from KSHV and the ENE from MALAT1, were discussed above as possible means to terminate expression of longer PEgRNAs from non-pol III promoters. These elements form RNA triple helices that engulf the polyA tail, resulting in their being retained within the nucleus. However, by forming complex structures at the 3ʹ terminus of the PEgRNA that occlude the terminal nucleotide, these structures would also likely help prevent exonuclease-mediated degradation of PEgRNAs. [0367] Other structural elements inserted at the 3ʹ terminus could also enhance RNA stability, albeit without allowing for termination from non-pol III promoters. Such motifs could include hairpins or RNA quadruplexes that would occlude the 3ʹ terminus, or self-cleaving ribozymes such as HDV that would result in the formation of a 2ʹ-3ʹ-cyclic phosphate at the 3ʹ terminus and also potentially render the PEgRNA less likely to be degraded by exonucleases. Inducing the PEgRNA to cyclize via incomplete splicing, to form a ciRNA, could also increase PEgRNA stability and result in the PEgRNA being retained within the nucleus. [0368] Additional RNA motifs could also improve RT processivity or enhance PEgRNA activity by enhancing RT binding to the DNA-RNA duplex. Addition of the native sequence bound by the RT in its cognate retroviral genome could enhance RT activity. This could include the native primer binding site (PBS), polypurine tract (PPT), or kissing loops involved in retroviral genome dimerization and initiation of transcription. [0369] Addition of dimerization motifs, such as kissing loops or a GNRA tetraloop/tetraloop receptor pair, at the 5ʹ and 3ʹ termini of the PEgRNA could also result in effective circularization of the PEgRNA, improving stability. Additionally, it is envisioned that addition of these motifs could allow the physical separation of the PEgRNA spacer and primer, preventing occlusion of the spacer, which would hinder PE activity. Short 5ʹ or 3ʹ extensions to the PEgRNA that form a small toehold hairpin in the spacer region or along the primer binding site could also compete favorably against the annealing of intracomplementary regions along the length of the PEgRNA, e.g., the interaction between the spacer and the primer binding site. Finally, kissing loops could also be used to recruit other template RNAs to the genomic site and enable swapping of RT activity from one RNA to the other. A number of secondary RNA structures may be engineered into any region of the PEgRNA, including in the terminal portions of the extension arm (i.e., e1 and e2), as shown.
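The spacer/primer binding site interaction mentioned above can be quantified with a simple complementarity check. The Python sketch below is a minimal illustration under stated assumptions, not a method from this disclosure: the function names are hypothetical, only Watson-Crick pairs are considered (G-U wobble pairs are ignored), and the 13-nucleotide PBS used in the example is assumed to be the stretch immediately preceding the terminal T run of SEQ ID NO: 176.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement in the DNA alphabet (U is read as T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_spacer_pbs_pairing(spacer, pbs):
    """Length of the longest contiguous stretch of the spacer that is
    Watson-Crick complementary to some stretch of the PBS."""
    target = reverse_complement(pbs)
    s = spacer.upper().replace("U", "T")
    best = 0
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            # Only test substrings longer than the current best match.
            if j - i > best and s[i:j] in target:
                best = j - i
    return best


# Spacer of SEQ ID NO: 176 and the assumed 13-nt PBS from its 3' end.
spacer = "GGCCCAGACTGAGCACGTGA"
pbs = "CGTGCTCAGTCTG"
print(longest_spacer_pbs_pairing(spacer, pbs))
```

Because the PBS is designed to anneal to the nicked strand, which shares its sequence with the spacer, a long pairing stretch is expected by construction; motifs that separate or shield these regions are intended to reduce this intramolecular interaction.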
Example modifications include, but are not limited to: [0370] PEgRNA-HDV fusion GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC TAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTCGGTCCTCTGCCATCAAAGC GTGCTCAGTCTGGGCCGGCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAA CATGCTTCGGCATGGCGAATGGGACTTTTTTT (SEQ ID NO: 178) [0371] PEgRNA-MMLV kissing loop GGTGGGAGACGTCCCACCGGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAA TAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACCGAGTC
GGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGTCTGGTGGGAGACGTCCCACC TTTTTTT (SEQ ID NO: 179) [0372] PEgRNA-VS ribozyme kissing loop GAGCAGCATGGCGTCGCTGCTCACGGCCCAGACTGAGCACGTGAGTTTTAGAGCT AGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGGACC GAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGTCTCCATCAGTTGACA CCCTGAGGTTTTTTT (SEQ ID NO: 180) [0373] PEgRNA-GNRA tetraloop/tetraloop receptor GCAGACCTAAGTGGUGACATATGGTCTGGGCCCAGACTGAGCACGTGAGTTTTAG AGCTAUACGTAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTUACGAAGTGG GACCGAGTCGGTCCTCTGCCATCAAAGCTTCGACCGTGCTCAGTCTGCATGCGATT AGAAATAATCGCATGTTTTTTT (SEQ ID NO: 181) [0374] PEgRNA template switching secondary RNA-HDV fusion TCTGCCATCAAAGCTGCGACCGTGCTCAGTCTGGTGGGAGACGTCCCACCGGCCG GCATGGTCCCAGCCTCCTCGCTGGCGCCGGCTGGGCAACATGCTTCGGCATGGCG AATGGGACTTTTTTT (SEQ ID NO: 182) [0375] PEgRNA scaffolds could be further improved via directed evolution, in an analogous fashion to how SpCas9 and prime editors (PE) have been improved. Directed evolution could enhance PEgRNA recognition by Cas9 or evolved Cas9 variants. Additionally, it is likely that different PEgRNA scaffold sequences would be optimal at different genomic loci, either enhancing PE activity at the site in question, reducing off-target activities, or both. Finally, evolution of PEgRNA scaffolds to which other RNA motifs have been added would almost certainly improve the activity of the fused PEgRNA relative to the unevolved, fusion RNA. For instance, evolution of allosteric ribozymes composed of c-di-GMP-I aptamers and hammerhead ribozymes led to dramatically improved activity, suggesting that evolution would improve the activity of hammerhead-PEgRNA fusions as well. In addition, while Cas9 currently does not generally tolerate 5ʹ extension of the sgRNA as well as tolerating 3’ extensions, directed evolution will likely generate mutations that mitigate this intolerance, allowing additional RNA motifs to be utilized. 
[0376] In various embodiments, other scaffolds that have been shown to improve activity relative to canonical sgRNA scaffolds may be used in pegRNAs and epegRNAs as described herein. Such improvements may include, for example, those disclosed in Chen, B. et al. Dynamic Imaging of Genomic Loci in Living Human Cells by an Optimized CRISPR/Cas
System. Cell 2013, 155(7), 1479-1491 and Jost, M. et al. Titrating expression using libraries of systematically attenuated CRISPR guide RNAs. Nat. Biotechnol. 2020, 38, 355-364, each of which is herein incorporated by reference in its entirety. These improvements may enhance epegRNA activity through improved binding to the prime editor and/or improved expression. Stabilization of the sgRNA scaffold could also reduce PBS/spacer interactions that inhibit pegRNA and epegRNA activity. [0377] Example epegRNAs incorporating improved sgRNA scaffolds include, but are not limited to: [0378] HEK3 1-15del standard scaffold evopreQ1 GGCCCAGACTGAGCACGTGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGCCCTCTGGAGG AAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTTCTATCTAGTTA CGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 183) [0379] HEK3 1-15del cr748 evopreQ1 GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTG CCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGT TCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 184) [0380] HEK3 1-15del cr289 evopreQ1 GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTG CCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGT TCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 185) [0381] HEK3 1-15del cr622 evopreQ1 GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTG CCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGT TCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 186) [0382] HEK3 1-15del cr772 evopreQ1 GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTG CCCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGT TCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 187)
[0383] HEK31-15del cr532 evopreQ1 GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTGC CCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTT CTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 188) [0384] HEK31-15del cr961 evopreQ1 GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTGC CCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTT CTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 189) [0385] HEK31-15del flip and extension scaffold evopreQ1 GGCCCAGACTGAGCACGTGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGC CCTCTGGAGGAAGCAGGGCTTCCCGTGCTCAGTCTGTCTCTCTCTTGACGCGGTT CTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 190) [0386] RNF21-15del cr748 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTAT GGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 191) [0387] RNF21-15del cr289 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTAT GGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 192) [0388] RNF21-15del cr622 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTAT GGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 193) [0389] RNF21-15del cr772 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTAT
GGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 194) [0390] RNF21-15del cr532 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTAT GGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 195) [0391] RNF21-15del cr961 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTAT GGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 196) [0392] RNF21-15del flip and extension scaffold evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTAT GGGAACTCAGTTTATATGAGTTAGTAATGACTAAGATGTCATCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 197) [0393] RUNX11-15del standard scaffold evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTACGAAGGAAAT GACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGT TACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 198) [0394] RUNX11-15del cr748 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTA CGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 199) [0395] RUNX11-15del cr289 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTA CGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 200) [0396] RUNX11-15del cr622 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTT ACGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 201) [0397] RUNX11-15del cr772 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTA CGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 202) [0398] RUNX11-15del cr532 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTA CGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 203) [0399] RUNX11-15del cr961 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTA CGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 204) [0400] RUNX11-15del flip and extension scaffold evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTA CGAAGGAAATGACTCAAATATGCCTTCCTCCTGAAAATAACTCTCTTTGACGCGG TTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 205) [0401] RUNX1 +5G-T standard scaffold evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGTCTGAAGCAAT CGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAA CCAACTAGAAATTTTTT (SEQ ID NO: 206) [0402] RUNX1 +5G-T cr748 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTG
TCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGT TACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 207) [0403] RUNX1 +5G-T cr289 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTG TCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGT TACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 208) [0404] RUNX1 +5G-T cr622 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTT GTCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTA GTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 209) [0405] RUNX1 +5G-T cr772 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTG TCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGT TACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 210) [0406] RUNX1 +5G-T cr532 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTG TCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGT TACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 211) [0407] RUNX1 +5G-T cr961 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTG TCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGT TACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 212) [0408] RUNX1 +5G-T flip and extension scaffold evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTG TCTGAAGCAATCGCTTCCTCCTGAAAATAACTCTCTTTGACGCGGTTCTATCTAGT TACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 213) [0409] DNMT11-15del standard scaffold evopreQ1
GATTCCTGGTGCCAGAAACAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGCTAAGGACTA GTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGC GTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 214) [0410] DNMT11-15del cr748 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTG CTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTAT CTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 215) [0411] DNMT11-15del cr289 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTG CTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTAT CTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 216) [0412] DNMT11-15del cr622 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTT GCTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 217) [0413] DNMT11-15del cr772 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTG CTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTAT CTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 218) [0414] DNMT11-15del cr532 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTG CTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTAT CTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 219) [0415] DNMT11-15del cr961 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTG
CTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTAT CTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 220) [0416] DNMT11-15del flip and extension scaffold evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTG CTAAGGACTAGTTCTGCCCTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTAT CTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 221) [0417] DNMT1 +5 G--T standard scaffold evopreQ1 GATTCCTGGTGCCAGAAACAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGTCACCACTGTTT CTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAA CTAGAAATTTTTT (SEQ ID NO: 222) [0418] DNMT1 +5 G--T cr748 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTGT CACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGC GTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 223) [0419] DNMT1 +5 G--T cr289 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTGT CACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGC GTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 224) [0420] DNMT1 +5 G--T cr622 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTG TCACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACG CGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 225) [0421] DNMT1 +5 G--T cr772 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTGT CACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGC GTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 226) [0422] DNMT1 +5 G--T cr532 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTGT CACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGC GTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 227) [0423] DNMT1 +5 G--T cr961 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTGT CACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGC GTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 228) [0424] DNMT1 +5 G--T flip and extension scaffold evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGT CACCACTGTTTCTGGCACCAGGACCTCTTCTTTGACGCGGTTCTATCTAGTTACGC GTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 229) [0425] FANCF 1-15del standard scaffold evopreQ1 GGAATCCCTTCTGCAGCACCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTAGTGCTTGAGAC CGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCTAG TTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 230) [0426] FANCF 1-15del cr748 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTA GTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 231) [0427] FANCF 1-15del cr289 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTA GTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 232) [0428] FANCF 1-15del cr622 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTT
AGTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGC GGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 233) [0429] FANCF 1-15del cr772 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTA GTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 234) [0430] FANCF 1-15del cr532 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTA GTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 235) [0431] FANCF 1-15del cr961 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTA GTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 236) [0432] FANCF 1-15del flip and extension scaffold evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTA GTGCTTGAGACCGCCAGAAGCTCGGGCTGCAGAAGGGACAATCACTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 237) [0433] FANCF +5 G--T cr748 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTG GAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCT AGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 238) [0434] FANCF +5 G--T cr289 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTG GAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCT AGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 239) [0435] FANCF +5 G--T cr622 evopreQ1
GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTG GAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCT AGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 240) [0436] FANCF +5 G--T cr772 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTG GAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCT AGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 241) [0437] FANCF +5 G--T cr532 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTG GAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCT AGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 242) [0438] FANCF +5 G--T cr961 evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTG GAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCT AGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 243) [0439] FANCF +5 G--T flip and extension scaffold evopreQ1 GGAATCCCTTCTGCAGCACCGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTG GAAAAGCGATCAAGGTGCTGCAGAAGGGACAATCACTTTGACGCGGTTCTATCT AGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 244) [0440] EMX1 1-15del standard scaffold evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTCGTGGCAATGCG CCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTT ACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 245) [0441] EMX1 1-15del cr748 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTT
CGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 246) [0442] EMX1 1-15del cr289 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTT CGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 247) [0443] EMX1 1-15del cr622 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTT CGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 248) [0444] EMX1 1-15del cr772 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTT CGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 249) [0445] EMX1 1-15del cr532 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTT CGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 250) [0446] EMX1 1-15del cr961 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTT CGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 251) [0447] EMX1 1-15del flip and extension scaffold evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTT CGTGGCAATGCGCCACCGGTTGATGTTCTTCTGCTCGGAAACAATCTTTGACGCG GTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 252) [0448] EMX1 +5 G--T standard scaffold evopreQ1
GAGTCCGAGCAGAAGAAGAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGTGATGGGAGCA CTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTAGTTACGCGTTAA ACCAACTAGAAATTTTTT (SEQ ID NO: 253) [0449] EMX1 +5 G--T cr748 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTG TGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTA GTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 254) [0450] EMX1 +5 G--T cr289 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTG TGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTA GTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 255) [0451] EMX1 +5 G--T cr622 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCT GTGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCT AGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 256) [0452] EMX1 +5 G--T cr772 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTG TGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTA GTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 257) [0453] EMX1 +5 G--T cr532 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTG TGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTA GTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 258) [0454] EMX1 +5 G--T cr961 evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTG
TGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTA GTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 259) [0455] EMX1 +5 G--T flip and extension scaffold evopreQ1 GAGTCCGAGCAGAAGAAGAAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAG TTTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTG TGATGGGAGCACTTCTTCTTCTGCTCGGAAACAATCTTTGACGCGGTTCTATCTA GTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 260) [0456] RNF2 +1FLAG standard scaffold evopreQ1 GTCATCTTAGTCATTACCTGGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGC TAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGAGTTACAACGA ACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAAGATGTCATCTCT TTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 261) [0457] RNF2 +1FLAG cr748 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTG AGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAA GATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAAT TTTTT (SEQ ID NO: 262) [0458] RNF2 +1FLAG cr289 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTG AGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAA GATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAAT TTTTT (SEQ ID NO: 263) [0459] RNF2 +1FLAG cr622 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTG AGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAA GATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAAT TTTTT (SEQ ID NO: 264) [0460] RNF2 +1FLAG cr772 evopreQ1
GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTG AGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAA GATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAAT TTTTT (SEQ ID NO: 265) [0461] RNF2 +1FLAG cr532 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTG AGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAA GATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAAT TTTTT (SEQ ID NO: 266) [0462] RNF2 +1FLAG cr961 evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTG AGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAA GATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAAT TTTTT (SEQ ID NO: 267) [0463] RNF2 +1FLAG flip and extension scaffold evopreQ1 GTCATCTTAGTCATTACCTGGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGTT TAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTG AGTTACAACGAACACCTCAGCTTATCGTCGTCATCCTTGTAATCGTAATGACTAA GATGTCATCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAAT TTTTT (SEQ ID NO: 268) [0464] VEGFA +5 G--T cr748 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 269) [0465] VEGFA +5 G--T cr289 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 270)
[0466] VEGFA +5 G--T cr622 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 271) [0467] VEGFA +5 G--T cr772 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 272) [0468] VEGFA +5 G--T cr532 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTA ATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 273) [0469] VEGFA +5 G--T cr961 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTA ATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 274) [0470] VEGFA +5 G--T flip and extension scaffold evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCATCTGGCCTGCAGAACAATCTCTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 275) [0471] VEGFA +1FLAG standard scaffold evopreQ1 GATGTCTGCAGGCCAGATGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCAATGTGCCATCTG GAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCAGAACAATCTCTT GACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 276) [0472] VEGFA +1FLAG cr748 evopreQ1
GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCA GAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTT TTT (SEQ ID NO: 277) [0473] VEGFA +1FLAG cr289 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCA GAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTT TTT (SEQ ID NO: 278) [0474] VEGFA +1FLAG cr622 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCA GAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTT TTT (SEQ ID NO: 279) [0475] VEGFA +1FLAG cr772 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCA GAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTT TTT (SEQ ID NO: 280) [0476] VEGFA +1FLAG cr532 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTA ATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCA GAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTT TTT (SEQ ID NO: 281) [0477] VEGFA +1FLAG cr961 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTA ATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCA
GAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTT TTT (SEQ ID NO: 282) [0478] VEGFA +1FLAG flip and extension scaffold evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTA ATGTGCCATCTGGAGCACTCACTTATCGTCGTCATCCTTGTAATCTCTGGCCTGCA GAACAATCTCTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTT TTT (SEQ ID NO: 283) [0479] VEGFA 1-15 del standard scaffold evopreQ1 GATGTCTGCAGGCCAGATGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCGTGTGTCCCTCTG ACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTATCTAGTTACGC GTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 284) [0480] VEGFA 1-15 del cr748 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTGT GTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTA TCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 285) [0481] VEGFA 1-15 del cr289 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTGT GTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTA TCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 286) [0482] VEGFA 1-15 del cr622 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTG TGTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCT ATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 287) [0483] VEGFA 1-15 del cr772 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTGT GTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTA TCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 288)
[0484] VEGFA 1-15 del cr532 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTGT GTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTA TCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 289) [0485] VEGFA 1-15 del cr961 evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTGT GTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTA TCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 290) [0486] VEGFA 1-15 del flip and extension scaffold evopreQ1 GATGTCTGCAGGCCAGATGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGT GTGTCCCTCTGACAATGTGCTCTGGCCTGCAGAACAATCTCTTGACGCGGTTCTA TCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 291) [0487] RUNX1 +1FLAG standard scaffold evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTGTCTGAAGCCAT CCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACTCTCTTTGACGC GGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 292) [0488] RUNX1 +1FLAG cr748 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTG TCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACT CTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 293) [0489] RUNX1 +1FLAG cr289 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTG TCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACT CTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 294) [0490] RUNX1 +1FLAG cr622 evopreQ1
GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTT GTCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAAC TCTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 295) [0491] RUNX1 +1FLAG cr772 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTG TCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACT CTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 296) [0492] RUNX1 +1FLAG cr532 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTG TCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACT CTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 297) [0493] RUNX1 +1FLAG cr961 evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTG TCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACT CTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 298) [0494] RUNX1 +1FLAG flip and extension scaffold evopreQ1 GCATTTTCAGGAGGAAGCGAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTG TCTGAAGCCATCCCTTATCGTCGTCATCCTTGTAATCCTTCCTCCTGAAAATAACT CTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 299) [0495] DNMT1 +1FLAG standard scaffold evopreQ1 GATTCCTGGTGCCAGAAACAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGG CTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTCTGCCCTCCCGT CACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAGGACCTCTTCTT
TGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTTTT (SEQ ID NO: 300) [0496] DNMT1 +1FLAG cr748 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAGAGAGTGGCACCGAGTCGGTGCTTC TGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAG GACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTT TT (SEQ ID NO: 301) [0497] DNMT1 +1FLAG cr289 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTGGAAACAGTGGCACCGAGTCGGTGCTTC TGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAG GACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTT TT (SEQ ID NO: 302) [0498] DNMT1 +1FLAG cr622 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAGCTGGAAACAGCGGCACCGAGTCGGTGCTTC TGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAG GACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTT TT (SEQ ID NO: 303) [0499] DNMT1 +1FLAG cr772 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGGCACCGAGTCGGTGCTTC TGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAG GACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTT TT (SEQ ID NO: 304) [0500] DNMT1 +1FLAG cr532 evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACGTGAAAACGTGACTCCGAGTCGGAGTTTC TGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAG GACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTT TT (SEQ ID NO: 305) [0501] DNMT1 +1FLAG cr961 evopreQ1
GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTCGAAAGAGTGCAACCGAGTCGGTTGTTC TGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAG GACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTT TT (SEQ ID NO: 306) [0502] DNMT1 +1FLAG flip and extension scaffold evopreQ1 GATTCCTGGTGCCAGAAACAGTTTAAGAGCTAAGCTGGAAACAGCATAGCAAGT TTAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTC TGCCCTCCCGTCACCCCTGTCTTATCGTCGTCATCCTTGTAATCTTCTGGCACCAG GACCTCTTCTTTGACGCGGTTCTATCTAGTTACGCGTTAAACCAACTAGAAATTTT TT (SEQ ID NO: 307) [0503] The present disclosure contemplates any such ways to further improve the efficacy of the prime editing systems utilized in the methods and compositions disclosed herein. [0504] In various embodiments, it may be advantageous to limit the appearance of runs of consecutive T’s in the extension arm, as consecutive series of T’s may limit the capacity of the PEgRNA to be transcribed. For example, strings of at least three consecutive T’s, at least four consecutive T’s, at least five consecutive T’s, at least six consecutive T’s, at least seven consecutive T’s, at least eight consecutive T’s, at least nine consecutive T’s, at least ten consecutive T’s, at least eleven consecutive T’s, at least twelve consecutive T’s, at least thirteen consecutive T’s, at least fourteen consecutive T’s, or at least fifteen consecutive T’s should be avoided when designing the PEgRNA, or should at least be removed from the final designed sequence. In one embodiment, one can avoid the inclusion of unwanted strings of consecutive T’s in PEgRNA extension arms by avoiding target sites that are rich in consecutive A:T nucleobase pairs. Pharmaceutical compositions [0505] Other aspects of the present disclosure relate to pharmaceutical compositions comprising any of the guide RNAs (including PEgRNAs and ePEgRNAs), fusion proteins, and polynucleotides described herein. 
The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g., for specific delivery, increasing half-life, or other therapeutic compounds).
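By way of non-limiting illustration of the design guidance in paragraph [0504], a candidate extension arm sequence could be screened computationally for runs of consecutive T’s before it is used. The following Python sketch is offered only as an example under stated assumptions and is not part of the disclosed methods: the function name `find_t_runs`, the `min_run` parameter, and its default value of 3 (the smallest run length named in paragraph [0504]) are hypothetical choices made for illustration.

```python
import re


def find_t_runs(sequence, min_run=3):
    """Return (start, length) pairs for runs of >= min_run consecutive T's.

    `sequence` is a DNA-alphabet string; whitespace left over from line
    wrapping is removed and the sequence is upper-cased before scanning.
    `min_run` is an illustrative threshold, not a value fixed by the text.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    cleaned = "".join(sequence.split()).upper()
    pattern = re.compile("T{%d,}" % min_run)
    # Each match is one maximal run of T's; report its 0-based start and length.
    return [(m.start(), len(m.group())) for m in pattern.finditer(cleaned)]


if __name__ == "__main__":
    # Hypothetical fragment used only to exercise the function.
    fragment = "ACGT TTTG ACTT"
    for start, length in find_t_runs(fragment, min_run=4):
        print("run of", length, "T's at position", start)
```

A design workflow might apply such a check only to the extension arm, since a T-run elsewhere in a construct may be intentional; which regions to screen and which threshold to use are left to the practitioner.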
[0506] As used herein, the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., organ, tissue or portion of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.). Some examples of materials which can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL and LDL; (24) C2-C12 alcohols, such as 
ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives, and antioxidants can also be present in the formulation. The terms such as “excipient”, “carrier”, “pharmaceutically acceptable carrier” or the like are used interchangeably herein. [0507] In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular,
intravenous, intravascular, intraosseous, periocular, intratumoral, intracerebral, and intracerebroventricular administration. [0508] In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site). In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a silastic membrane, or a fiber. [0509] In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used. (See, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Langer and Peppas, 1983, J. Macromol. Sci. Rev. Macromol. Chem. 23:61. See also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). Other controlled release systems are discussed, for example, in Langer, supra. [0510] In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. 
Where necessary, the pharmaceutical composition can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the pharmaceutical composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
[0511] A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s or Hanks’ solution. In addition, the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated. [0512] The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration. The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid and stabilized by a polyethylene glycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S. Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference. [0513] The pharmaceutical compositions described herein may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle. 
[0514] Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use, or sale for human administration. [0515] In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of
materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierceable by a hypodermic injection needle. The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use. Kits and cells [0516] The guide RNAs (including pegRNAs and epegRNAs), fusion proteins, and compositions of the present disclosure may be assembled into kits. In some embodiments, the kit comprises polynucleotides for expression of the prime editors and/or pegRNAs and epegRNAs described herein. In other embodiments, the kit further comprises appropriate guide nucleotide sequences or nucleic acid vectors for the expression of such guide nucleotide sequences, to target the Cas9 protein of the prime editors to the desired target sequence. [0517] The kits described herein may include one or more containers housing components for performing the methods described herein, and optionally instructions for use. Any of the kits described herein may further comprise components needed for performing the prime editing methods described herein. Each component of the kits, where applicable, may be provided in liquid form (e.g., in solution) or in solid form (e.g., a dry powder). 
In certain cases, some of the components may be reconstitutable or otherwise processible (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water), which may or may not be provided with the kit. [0518] In some embodiments, the kits may optionally include instructions and/or promotion for use of the components provided. As used herein, “instructions” can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc. The written instructions may be in a form prescribed by a
governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which can also reflect approval by the agency of manufacture, use or sale for animal administration. As used herein, “promoted” includes all methods of doing business including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity including pharmaceutical sales, and any advertising or other promotional activity including written, oral, and electronic communication of any form, associated with the disclosure. Additionally, the kits may include other components depending on the specific application, as described herein. [0519] The kits may contain any one or more of the components described herein in one or more containers. The components may be prepared sterilely, packaged in a syringe, and shipped refrigerated. Alternatively, they may be housed in a vial or other container for storage. A second container may have other components prepared sterilely. Alternatively, the kits may include the active agents premixed and shipped in a vial, tube, or other container. [0520] The kits may have a variety of forms, such as a blister pouch, a shrink-wrapped pouch, a vacuum sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box, or a bag. The kits may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped. The kits can be sterilized using any appropriate sterilization techniques, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art. 
The kits may also include other components, depending on the specific application, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric, such as gauze, for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, etc. Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the prime editor systems described herein, or various components thereof (e.g., including, but not limited to, the napDNAbps, reverse transcriptase domains, and pegRNAs/epegRNAs). In some embodiments, the nucleotide sequence(s) comprises a heterologous promoter (or more than a single promoter) that drives expression of the prime editor system components. [0521] Other aspects of this disclosure provide kits comprising one or more nucleic acid constructs encoding the various components of the prime editing system described herein. In
some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the prime editing system components. [0522] Cells that may contain any of the guide RNAs, fusion proteins, and compositions described herein include prokaryotic cells and eukaryotic cells. The methods described herein may be used to deliver a prime editor and/or guide RNA into a eukaryotic cell (e.g., a mammalian cell, such as a human cell). In some embodiments, the cell is in vitro (e.g., cultured cell). In some embodiments, the cell is in vivo (e.g., in a subject such as a human subject). In some embodiments, the cell is ex vivo (e.g., isolated from a subject and may be administered back to the same or a different subject). [0523] Mammalian cells of the present disclosure include human cells, primate cells (e.g., vero cells), rat cells (e.g., GH3 cells, OC23 cells) or mouse cells (e.g., MC3T3 cells). There are a variety of human cell lines, including, without limitation, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, Lncap (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma) and Saos-2 (bone cancer) cells. In some embodiments, prime editors and/or guide RNAs are delivered into human embryonic kidney (HEK) cells (e.g., HEK293 or HEK293T cells). In some embodiments, prime editors and/or guide RNAs are delivered into stem cells (e.g., human stem cells) such as, for example, pluripotent stem cells (e.g., human pluripotent stem cells including human induced pluripotent stem cells (hiPSCs)). A stem cell refers to a cell with the ability to divide for indefinite periods in culture and to give rise to specialized cells. 
A pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism, but is not alone capable of sustaining full organismal development. A human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by being forced to express genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663–76, 2006, incorporated by reference herein). Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm). [0524] Additional non-limiting examples of cell lines that may be used in accordance with the present disclosure include 293-T, 3T3, 4T1, 721, 9L, A-549, A172, A20, A253,
A2780, A2780ADR, A2780cis, A431, ALC, B16, B35, BCP-1, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C2C12, C3H-10T1/2, C6, C6/36, Cal-27, CGR8, CHO, CML T1, CMT, COR-L23, COR-L23/5010, COR-L23/CPR, COR-L23/R23, COS-7, COV-434, CT26, D17, DH82, DU145, DuCaP, E14Tg2a, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, Hepa1c1c7, High Five cells, HL-60, HMEC, HT-29, HUVEC, J558L cells, Jurkat, JY cells, K562 cells, KCL22, KG1, Ku812, KYO1, LNCap, Ma-Mel 1, 2, 3....48, MC-38, MCF-10A, MCF-7, MDA-MB-231, MDA-MB-435, MDA- MB-468, MDCK II, MG63, MONO-MAC 6, MOR/0.2R, MRC5, MTD-1A, MyEnd, NALM- 1, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NW-145, OPCN/OPCT Peer, PNT-1A/PNT 2, PTK2, Raji, RBL cells, RenCa, RIN-5F, RMA/RMAS, S2, Saos-2 cells, Sf21, Sf9, SiHa, SKBR3, SKOV-3, T-47D, T2, T84, THP1, U373, U87, U937, VCaP, WM39, WT-49, X63, YAC-1, and YAR cells. [0525] Some aspects of this disclosure provide cells comprising any of the constructs disclosed herein. In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art. 
Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD- 3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts; 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A 172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293. BxPC3. C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr −/−, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468,
MDA-MB-435, MDCK II, MDCK 11, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof. [0526] Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells, are used in assessing one or more test compounds. EXAMPLES Example 1. Designing Prime Editing Experiments in Mammalian Cells [0527] Prime editing (PE) is a precision gene editing technology that enables the programmable installation of substitutions, insertions, and deletions in cells and animals without requiring double-stranded DNA breaks (DSBs). The mechanism of prime editing makes it less dependent on cellular replication and endogenous DNA repair than HDR-based approaches, and its ability to precisely install edits without creating DSBs minimizes indels and other undesired outcomes. The capabilities of prime editing have also expanded since its original disclosure.
Enhanced prime editing systems, PE4 and PE5, manipulate DNA repair pathways to increase prime editing efficiency and reduce indels. Other advances that improve prime editing efficiency include engineered pegRNAs (epegRNAs), which include a structured RNA motif to stabilize and protect pegRNA 3′ ends, and the PEmax architecture, which improves editor expression and nuclear localization. New applications such as twin prime editing (twinPE) can precisely insert or delete hundreds of base pairs of DNA and can be used in tandem with recombinases to achieve gene-sized (>5 kb) insertions and inversions. Achieving optimal prime editing requires careful experimental design, and there are a large number of parameters that influence prime editing outcomes. The present disclosure describes methods for optimizing such parameters for conducting prime editing and twinPE experiments, as well as for the design and optimization of pegRNAs. Guidelines and methods for how to select the proper PE system (PE1 to PE5, and twinPE) for a given application are also provided. Finally, detailed methods and instructions on how to perform prime editing in mammalian cells are provided as well. Compared to other procedures for editing human cells, prime editing offers greater precision and versatility and can be completed within 2-4 weeks. Prime editing developments and comparisons with other methods [0528] The mechanism of prime editing involves a complex series of events, each of which is influenced by the structure of the prime editor and pegRNA, as well as cellular factors. Since the initial disclosure of prime editing, several aspects of the PE system have been targeted for optimization. When combined, these optimizations are often additive, offering on average a 3.5-fold (in HEK293T cells) to 72-fold (in HeLa cells) increase in editing efficiency relative to the originally published PE3 system
30,31. These optimizations are particularly helpful when applying prime editing in vivo or in difficult-to-transfect cell types
31,37. Various enhancements and their potential use cases are summarized below and in Table 1. PEgRNA optimizations [0529] The pegRNA is responsible for both targeting the editor and encoding the desired edit. Because the elements of the pegRNA that encode the edit are located at the 3′ end for commonly used 3′-extended pegRNAs, exonucleolytic degradation may be a concern. Indeed, it was recently discovered that cellular degradation of pegRNAs can result in truncated, editing-incompetent pegRNAs that poison prime editing in cells by occupying target DNA sites and prime editor proteins without the possibility of productive editing. To address this issue, engineered pegRNAs (epegRNAs) were developed. epegRNAs contain a structured 3′ motif that enhances stability and prevents 3′ degradation, which in turn results in an average improvement in editing efficiency of 1.5-fold to 4-fold over traditional pegRNAs
31. epegRNAs may be used for all prime editing applications. In the original report of epegRNAs, two different 3′ structural motifs were described: mpknot and tevopreQ1. Both greatly enhance prime editing, and the use of tevopreQ1 is recommended throughout this protocol simply to decrease the number of epegRNAs that must be tested. [0530] Similarly, it has also been found that the “flip and extension” (F+E) sgRNA scaffold modification, which was previously shown to enhance Cas9 activity
31,38, can also improve prime editing in some circumstances. This sgRNA scaffold modification, which extends one
of the scaffold hairpins and disrupts a spacer-proximal UUUU sequence that may act as a Pol III transcriptional terminator, significantly increased editing at a subset of the sites tested
31. Because this improvement is less generalizable across sites, using an unmodified scaffold for initial epegRNA screening is recommended. However, testing an F+E-modified version of the eventual optimized epegRNA could further increase editing efficiency. To summarize, using an epegRNA harboring the tevopreQ1 motif is generally recommended (for the sake of simplifying the protocol only – the use of mpknot works equally well in prime editing applications), including during PBS and RTT screening. After optimized PBS and RTT lengths have been achieved, changing the 3′ modification to the mpknot motif or changing the scaffold to the F+E sequence could further enhance editing. Manipulating the cellular determinants of prime editing [0531] The PE3 system uses an additional sgRNA to nick the unedited strand of the genome, which directs nick-directed eukaryotic MMR to favor an edited outcome. Due to the importance of DNA repair events during prime editing, the Repair-seq CRISPRi screening platform
39 was applied to identify the cellular determinants of prime editing outcomes
30. Strikingly, knockdown of MMR proteins led to substantial increases in prime editing efficiencies and decreases in indel frequencies, even when the PE3 system is used. [0532] Based on this observation, MLH1dn, a dominant-negative variant of the MMR protein MLH1, was engineered. When transiently co-expressed with prime editing machinery, MLH1dn temporarily inhibits MMR, which greatly enhances prime editing efficiency and minimizes indels across several cell types. When the PE2 or PE3 systems are used with MLH1dn, they are referred to as PE4 and PE5, respectively
30. It was also demonstrated that careful design of pegRNAs can cause prime editing intermediates to evade MMR, without requiring a secondary nick or MLH1dn, by installing silent or benign mutations near the target edit
30. Larger distortions of the DNA double helix are less efficiently recognized by MMR proteins, so introducing additional mutations adjacent to the desired edit impedes engagement of prime editing heteroduplex intermediates by MMR, thereby increasing prime editing efficiencies. Suggestions on when and how to use various MMR manipulation tools are provided in the experimental design section, in FIG.3, and in Table 1. Improvements to the prime editor protein architecture [0533] Engineering the architecture of the editor protein has also improved prime editing efficiency. The PEmax architecture was recently developed and contains four improvements
relative to the original editor: optimization of the nuclear localization signals (NLSs), codon usage, and linkers, as well as two Cas9 mutations that were previously shown to increase Cas9 nuclease activity
30,40. The original prime editor architecture has also been manipulated to create systems such as PE2*
37 and hyPE
41. The PEmax architecture is generally recommended for prime editing applications. Larger genomic changes with twinPE, PEDAR, prime-del, paired pegRNAs, HOPE and GRAND [0534] Traditional prime editing can mediate the efficient insertion or deletion of several dozen base pairs. To increase the size of insertions and deletions that are possible with prime editing, twin prime editing (twinPE) was recently developed. In twinPE, two prime editing events occur on opposite strands of DNA, such that the newly synthesized genomic flaps are complementary to each other (FIG.4). This method directly installs the edit on both DNA strands instead of requiring the cell to synthesize the non-reverse-transcribed strand. TwinPE is capable of making larger edits (for example, ≥780 bp deletions and ≥108 bp insertions) more efficiently than traditional prime editing methods
42. [0535] Several additional dual pegRNA prime editing approaches have been described, including PrimeDel
43, PEDAR
44, paired pegRNAs
45, HOPE
46, and GRAND
47. These systems differ in the extent and location of complementarity between the two new DNA strands, and in how the starting DNA sequence between the two nicks is manipulated. In twinPE and GRAND, the inter-nick sequence is deleted and replaced with the new sequence encoded by pegRNAs (FIG.4). These newly synthesized strands comprise partial or complete complementarity to each other and can thus spontaneously anneal following reverse transcription. PrimeDel is similar to twinPE, except the newly-synthesized DNA strands are not only complementary to each other but are also complementary to the sequences 5′ of each nick. PEDAR is similar as well, but instead of using a Cas9 nickase, a Cas9 nuclease is used in the prime editor protein. Finally, the paired pegRNA method and HOPE differ from the other three methods in that they do not delete any sequence in between the two nicks. Prime editing and site-specific recombinases to mediate gene insertion and inversion [0536] It has also been shown that PE and twinPE can install recombinase recognition sequences, and following the installation of these sequences, recombinases can mediate kb- scale changes
42. In a sequential-transfection strategy, twinPE was first used to generate cells with a homozygous attB site at CCR5, and then this site was used as a substrate in a second transfection of Bxb1 recombinase and an attP 5.6-kb donor plasmid, achieving up to 17%
donor knock-in efficiency. In a single-transfection strategy, unedited cells were treated with prime editor, twinPE pegRNAs encoding the attB recombinase site, the corresponding Bxb1 recombinase, and a 5.6-kb attP donor plasmid to achieve up to 5.5% donor plasmid knock-in efficiency. A similar single-transfection strategy was used to insert factor IX cDNA into the human albumin locus, and editing-dependent production of human factor IX protein was detected in culture media. Two simultaneous twinPE editing events were also used to install both the attB and attP sequences into the HEK293T genome, flanking a 39-kb inversion at the IDS locus that has been shown to cause Hunter syndrome. In a single RNA nucleofection of all PE and recombinase elements, 2.1-2.6% correction of this 39-kb pathogenic inversion was achieved. Prime editing has also been used to incorporate recombinase sites to support gene-sized targeted insertion in a system called PASTE
48. Alternate Cas9 and reverse transcriptase homologs [0537] For other genome editing tools, the primary motivation for using alternate Cas9 domains is to access a wider array of PAM sequences. However, PAM flexibility is not critical for PE, as it offers a much wider range of distances between the PAM and the desired edit than base editing, and either DNA strand can be targeted to achieve a desired edit. Due to this flexibility, using SpCas9 for all prime editing applications may provide advantages. If an NGG PAM is not present, alternate Cas9 domains can be tested, but editing efficiency may be lower. Instead, using twinPE to install the target mutation from two distal NGG PAMs is recommended. Similarly, different RT domains such as the cauliflower mosaic virus RT (RT-CaMV) and the E. coli BL21 retron RT (RT-retron) have been used for prime editing
49. However, these reverse transcriptases yielded lower editing efficiencies than the engineered M-MLV RT used in PE2. While alternate reverse transcriptase domains could eventually prove useful, their prime editing properties may need to be improved before they should be chosen over PE2’s engineered M-MLV. Applications of prime editing [0538] Despite being published just over two years ago, prime editing has already been used in a wide variety of studies. These applications have included editing in many workhorse cell lines such as HEK293T, HeLa, U2OS, and K562 cells
15,24,30,31, as well as more therapeutically-relevant cells including patient-derived fibroblasts, iPSCs, and T cells
30,31 and in animals
34,37,50-55. Using PE4 and PE5, up to 40% editing has been achieved in patient-derived iPSCs, and up to 60% editing has been achieved in primary human T cells
30. Prime editing has also been used for basic research applications such as lineage tracing
56 and
saturation mutagenesis in human cells
57 and plants
58. Many model organisms have also been created using prime editors; prime editing in rabbit embryos yielded an animal model of Tay-Sachs disease
59, and PE has been used to install edits in mouse zygotes
34,52,53. RNP-mediated delivery of the prime editor into zebrafish embryos has also generated up to 30% editing
60. Finally, in vivo prime editing has been shown using hydrodynamic injection, adenovirus, and adeno-associated virus (AAV) delivery methods
37,50,51,54,61. [0539] Editing efficiencies may be dependent on the PBS and RTT of the pegRNA, and the optimal choices for each component are not evident for most sites or edits. General guidelines to overcome the challenge of pegRNA design are provided (see experimental design section and FIGs.2-4); but within these guidelines, typically dozens of potential pegRNAs could be used for a given edit. A recent study attempted to use libraries of edits and corresponding pegRNAs to identify additional design principles
62. The data suggests that, if one can only design a single pegRNA, a 13-nt PBS and a 12-nt RTT would be ideal; this recommendation is helpful in situations where pegRNA screening is not possible. However, many sites such as the RNF2 and HEK4 loci where a PBS of 13 is not optimal have also been encountered, and it has frequently been found that a 12-nt RTT is not desirable, especially for edits that are distal from the nick or mutate more than one base. Thus, when it is essential to achieve the highest editing possible, empirical screening of PBS and RTT lengths is recommended, even when using current-generation prime editing systems. This process is resource-intensive, as many pegRNA constructs may need to be generated and evaluated. To facilitate this screening, performing a pegRNA screen in an easy-to-transfect cell line such as HEK293T cells for human mutations or N2A cells for mouse mutations is recommended. If an easy-to-transfect cell line with a given mutation is not available, cell lines can be made to facilitate screening. [0540] Finally, prime editing precision and in vivo prime editing efficiency can be further optimized. In vivo delivery of a prime editor, particularly using AAV, is more challenging than delivery of Cas9 nuclease or a base editor due to the prime editor’s large size. Removing the RNaseH domain of the RT has allowed AAV delivery, but in vivo editing efficiencies reported to date have been low
37,49,50,53. In addition, while prime editing is very precise overall, it can produce undesired byproducts. Like other genome-editing methods, prime editing can produce indels at the target site. Prime editing generally results in substantially fewer indels than nuclease-based approaches such as Cas9-mediated HDR, but indels can still occur, especially for the PE3 and PE5 systems. Comparatively, the PE2 and PE4 systems typically minimize indel frequencies, though they may be less efficient. Another type of
prime editing byproduct results from reverse transcription into the pegRNA scaffold. Fortunately, the frequency of these scaffold insertions is typically low (1.7% on average)
15, likely because the cell usually excises flaps that are unable to hybridize to the unedited DNA strand due to their mismatched 3′ termini. Finally, while MLH1dn is extremely useful for short-term editing, long-term MMR inhibition could potentially lead to adverse cellular effects or mutagenesis. Therefore, optimization of in vivo editing efficiency, improved editor size and precision, and analysis of off-target PE4/PE5 effects will further expand the application scope of prime editing. Prime editing experimental design [0541] There are four main decisions to make when designing a prime editing experiment: (1) pegRNA design, (2) selection of the prime editing system, (3) selection of prime editor architecture, and (4) installation of silent mutations. While some aspects of these decisions are relatively straightforward (for example, the PEmax architecture and the epegRNA modification often provide higher editing efficiency), other decisions are dependent on the edit, target cell type, and delivery method. Guidelines for making these decisions are explained below and in Table 1. Designing candidate epegRNAs [0542] When considering pegRNA design, epegRNAs may be used over unmodified pegRNAs at times due to their increased efficiency. A standard epegRNA has five components: the spacer, scaffold, RTT, PBS, and tevopreQ1 motif (FIG.2). The scaffold and tevopreQ1 portions are constant, but the spacer, PBS, and RTT should be optimized for each new edit. The first step of epegRNA optimization is to scan the target locus for candidate protospacer sequences that are immediately 5′ of an appropriate PAM sequence (NGG for SpCas9). Only bases 3′ of the nick induced by the Cas9 domain of the editor can be edited. Therefore, as a frame of reference, the first base 3′ of the epegRNA-induced nick (the first editable base) is considered to be the +1 position. While the mechanism of PE enables a broad editing window, it has been found that targeting protospacers more proximal to the desired editing site generally yields higher editing efficiencies. Ideal candidate protospacer sequences may therefore be as close to the desired editing site as possible while keeping the target site in the editable region of PE (i.e., 3′ of the nick, see FIG.3). If the epegRNA will be expressed from a plasmid via the U6 RNA polymerase III promoter, a 5′ G at the start of the spacer may be included to initiate transcription efficiently and should be incorporated into the epegRNA design.
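As a non-limiting illustration of the protospacer scan described in paragraph [0542], the following Python sketch lists candidate SpCas9 protospacers (20 nt immediately 5′ of an NGG PAM) on both strands and reports where the desired edit falls relative to the nick, with +1 being the first base 3′ of the nick. The function names, the dictionary keys, and the max_distance cutoff of 30 are hypothetical choices made for this example and are not part of the disclosure. The sketch also assumes the conventional SpCas9 nick position 3 nt 5′ of the PAM, and it prepends a 5′ G only when the spacer does not already begin with G, which is one possible way to satisfy the U6 promoter consideration.

```python
import re


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq, edit_index, max_distance=30):
    """List SpCas9 protospacers that place the edit 3' of the nick.

    seq          top-strand sequence, 5'->3'
    edit_index   0-based top-strand index of the first edited base
    max_distance hypothetical cutoff on the edit position (assumption)
    """
    seq = seq.upper()
    n = len(seq)
    candidates = []
    # Scan the top strand, then the bottom strand via its reverse complement.
    for strand, s, edit in (("+", seq, edit_index),
                            ("-", revcomp(seq), n - 1 - edit_index)):
        # The lookahead finds overlapping 20-nt protospacer + NGG sites.
        for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", s):
            plus_one = m.start() + 17      # index of the +1 base (assumed nick 3 nt 5' of PAM)
            position = edit - plus_one + 1  # +1 is the first editable base
            if 1 <= position <= max_distance:
                spacer = m.group(1)
                if not spacer.startswith("G"):
                    spacer = "G" + spacer   # extra 5' G for U6 transcription (assumption)
                candidates.append({"strand": strand,
                                   "spacer": spacer,
                                   "edit_position": position})
    # Protospacers closest to the edit are listed first.
    return sorted(candidates, key=lambda c: c["edit_position"])
```

Sorting by edit position reflects the guidance above that protospacers nearer the desired edit generally edit more efficiently; the ranking is only a starting point for empirical screening.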
[0543] After identifying candidate protospacers, PBS and RTT lengths may be optimized. The rules governing the best PBS and RTT lengths for a given locus and edit are not completely understood, but optimizing these lengths empirically for a specific edit can help maximize editing efficiency. The number of PBS and RTT lengths that should be screened for a given application depends on the editing efficiency needed and resources available. The number of possible combinations can be large. Optimal PBS lengths have ranged from 8 to 15 nt, and the optimal RTT range is even larger (10 to 74 nt). Screening this entire matrix for a given edit would maximize the likelihood of identifying the optimal epegRNA, but may not be practical for most applications. Sufficiently active epegRNAs can often be determined with a less intensive screening campaign. For a typical epegRNA screen, examining a small matrix of PBS and RTT lengths for each protospacer is recommended. PBS lengths of 10, 13, and 15 are promising candidates for most sites. [0544] Unlike the PBS, the RTT design is dictated by the edit to be installed
15. For small changes such as SNPs, the shortest RTT length tested should encode at least 7 nt of homology downstream of the edit to promote hybridization to the complementary genomic strand. For larger edits such as the insertion of epitope tags, a longer stretch of downstream homology (~20 nt minimum) is recommended. In addition to this edit-dependent minimum RTT length, trying two longer RTT lengths (~4-10 nt longer than the minimum) is recommended as well. This creates a 3 PBS x 3 RTT matrix, representing 9 epegRNAs total for a first-pass assessment. This process is summarized in FIG.3. Screening should be performed in a workhorse cell line such as HEK293T cells for human targets and N2A cells for murine targets. Additionally, screening epegRNAs on the exact target sequence for editing is recommended (this may require creating a cell line that harbors the target mutation—which can often be created), as small changes in the target sequence or epegRNA sequence can lead to large changes in editing outcomes. [0545] Several potential pitfalls should be avoided when designing epegRNAs. For epegRNAs expressed from a plasmid using the U6 RNA polymerase III promoter, four or more consecutive uridines in the pegRNA sequence may act as a transcriptional terminator and prematurely truncate the epegRNA
63. Therefore, the sequences of the spacer, PBS, and RTT should avoid such poly(U) tracts if possible. Additionally, it has been observed that beginning the RTT sequence with a cytosine lowers editing, likely because it disturbs the structure of the epegRNA scaffold
15. Therefore, designing the 3′ extension to not begin with cytosine and omitting designs that would do so is recommended when screening for optimal
RTTs. Online tools such as PrimeDesign
64 and other similar resources
65-67 have also been developed to aid in pegRNA sequence generation. Choice of prime editing system (PE1-PE5) and prime editor architecture [0546] Five prime editing systems have been reported. PE1 lacks the substantial benefits of reverse transcriptase engineering and other improvements and is often not preferred over other systems to achieve prime editing. PE2, PE3, PE4, and PE5 can each be favored for different applications. See Table 1 for a summary of each editing system and detailed guidelines for when to use each one. When performing the epegRNA screen described above, PE2 or PE4 may be used to simplify the screening process, as they do not require simultaneous nicking sgRNA optimization. [0547] When using the PE3 or PE5 system, a secondary nicking guide will need to be designed. Several nicking guide protospacers should be tried to maximize editing efficiency while minimizing the incorporation of indels. Generally, the optimal secondary nick is 50-90 nt upstream or downstream of the epegRNA-induced nick. However, if a PAM is positioned near the desired edit, a PE3b/PE5b nicking sgRNA, which only nicks after prime editing occurs, can be used. To design a PE3b/PE5b nicking sgRNA, positioning the protospacer of the nicking sgRNA such that it overlaps with the edited base(s) on the other strand is recommended, as shown in FIG.5. Because the PE3b/PE5b systems tend to generate fewer indels than PE3/PE5, trying PE3b or PE5b whenever possible is recommended—that is, whenever a properly positioned PAM exists on the unedited strand. For the PE3, PE3b, PE5, and PE5b systems, the U6 RNA polymerase III promoter may be used for nicking sgRNA expression; if this is the case, a 5′ G at the start of the spacer is required for transcription initiation. 
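As a non-limiting illustration of the secondary nick selection described above for PE3 and PE5, the following Python sketch enumerates NGG-adjacent protospacers on the strand opposite the epegRNA and keeps those whose nick lies 50-90 nt from the epegRNA-induced nick. The function names, the coordinate convention (a nick is identified by the top-strand index of the base immediately to its right), and the choice to prepend a 5′ G when the spacer lacks one are assumptions made for this example. PE3b/PE5b guides, which must overlap the edited bases, are not handled here.

```python
import re


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def nick_sites(seq, strand):
    """Yield (spacer, nick_coordinate) for each 20-nt protospacer + NGG on one strand.

    The nick coordinate is the top-strand index of the base just to the
    right of the nick (assumed to lie 3 nt 5' of the PAM).
    """
    s = seq if strand == "+" else revcomp(seq)
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", s):
        cut = m.start() + 17
        yield m.group(1), (cut if strand == "+" else len(seq) - cut)


def nicking_guides(seq, peg_nick, peg_strand="+", min_dist=50, max_dist=90):
    """Candidate nicking sgRNAs on the strand opposite the epegRNA.

    peg_nick is the epegRNA nick in the same top-strand coordinate system.
    """
    seq = seq.upper()
    other = "-" if peg_strand == "+" else "+"
    guides = []
    for spacer, nick in nick_sites(seq, other):
        distance = abs(nick - peg_nick)
        if min_dist <= distance <= max_dist:
            if not spacer.startswith("G"):
                spacer = "G" + spacer  # extra 5' G for U6 transcription (assumption)
            guides.append({"spacer": spacer, "nick": nick, "distance": distance})
    return sorted(guides, key=lambda g: g["distance"])
```

The default window of 50 to 90 nt is taken from the guidance above; because the optimal nick may differ between cell types, several of the returned candidates would typically be tested rather than only the first.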
A final consideration for design of the nicking sgRNA is that differences in DNA repair between cell types may require re-optimization of the nicking sgRNA after transitioning between different cell lines, even for the same edit. [0548] Converting PE2 to PE4, or PE3 to PE5, is simple experimentally; an extra plasmid or other construct providing MLH1dn is added to the transfection mixture. While the addition of MLH1dn may not be as helpful for some edits in MMR-deficient cells such as HEK293T cells, it may improve editing efficiency for the same edit in a more MMR-competent cell type. Therefore, even if using PE4 or PE5 in initial screening in HEK293T cells shows modest benefits, testing these PE systems again later on in the target cell type is recommended. Short-term expression of MLH1dn has been shown to be minimally perturbative to cells, but long-term expression effects have not been evaluated
30. Therefore,
delivery methods in which PE machinery would be constitutively expressed for a long period of time may warrant selecting PE2 and PE3 over PE4 and PE5, especially if the phenomenon being investigated is sensitive to MMR. Finally, regarding the architecture of the protein component of the prime editor, using the PEmax improvements is generally recommended. Compared to the originally described prime editor, PEmax has improved nuclear localization, codons, and linkers, in addition to mutations in the Cas9 domain that increase activity
30. Introduction of silent mutations [0549] Two categories of silent mutations can be installed to achieve higher editing efficiencies. The first class is mutations that disrupt either the PAM or the seed region of the target site. PAM- or seed-disrupting edits partially prevent Cas9 from re-binding and re-nicking the target strand, which otherwise could result in indels or the reversion of a desired edit back to the wild-type sequence
15. To include PAM- or seed-disrupting mutations, encode them in the RTT of the epegRNA along with the original target edit (FIG.3). The +1 through +3 positions will be seed edits, and the +5 and +6 positions will be PAM edits. PAM-disrupting and seed-disrupting mutations are almost always beneficial, and including them if possible is recommended. [0550] The second class of silent mutations is MMR-evading target-adjacent mutations. Because the inclusion of additional mutations adjacent to the target mutation results in more significant helix distortion, these regions are less likely to be recognized by cellular MMR proteins. This strategy is particularly useful for desired edits that are point mutations and insertions and deletions under 13 nt
30. To include MMR-evading mutations, encode them in the RTT of the epegRNA along with the desired edit (FIG.3). Silent mismatches (particularly C•C mismatches) within about 5 nt of the desired edit are typically the most beneficial. Notably, the effects of MMR-evading mutations are less consistent than those of PAM-disrupting mutations, and certain mismatch types are more effective than others. For this reason, first optimizing the epegRNA without any MMR-evading silent mutations and then adding these mutations afterward is recommended. For both MMR-evading mutations and PAM- or seed-disrupting mutations in coding regions, a codon usage table should be checked to ensure that the additional mutations do not create a highly disfavored codon. MMR-evading silent mutations may also be useful for enabling a PE3b approach by creating a new nicking sgRNA protospacer that is not present before the edited strand has been generated.
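The sequence-level recommendations given above for candidate epegRNAs (avoiding poly(U) tracts in the spacer, PBS, and RTT; not beginning the RTT with cytosine; and providing a 5′ G at the start of a spacer expressed from the U6 promoter) can be sketched as a simple screening filter. The following Python example is an illustrative sketch only, not part of the described protocol. The function name `check_epegrna_design` and the parameter `max_u_run` are hypothetical, and the default threshold of more than three consecutive U residues is an assumption that should be adjusted to the poly(U) definition being applied.

```python
import re


def check_epegrna_design(spacer, pbs, rtt, max_u_run=3, u6_promoter=True):
    """Return a list of warnings for one candidate design.

    Sequences may be given as DNA or RNA, written 5'->3'. The 3' extension
    is taken to be the RTT followed by the PBS. `max_u_run` is an assumed,
    user-adjustable threshold, not a value stated in the protocol.
    """
    # Normalize to upper-case RNA letters.
    spacer, pbs, rtt = (s.upper().replace("T", "U") for s in (spacer, pbs, rtt))
    regions = {"spacer": spacer, "3' extension (RTT+PBS)": rtt + pbs}
    poly_u = re.compile("U{%d,}" % (max_u_run + 1))

    warnings = []
    for name, seq in regions.items():
        if poly_u.search(seq):
            warnings.append(f"{name} contains more than {max_u_run} consecutive U")
    if rtt.startswith("C"):
        warnings.append("RTT begins with C")
    if u6_promoter and not spacer.startswith("G"):
        warnings.append("spacer lacks a 5' G for U6 transcription initiation")
    return warnings


if __name__ == "__main__":
    # Hypothetical sequences used only to exercise the checks.
    for problem in check_epegrna_design(
        spacer="ACGTTTTTACGTACGTACGT", pbs="ACGTACGTAC", rtt="CATGCATGCA"
    ):
        print(problem)
```

Designs that return an empty list pass these three checks only; they still need to be screened experimentally as described above.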
Iteration to maximize editing efficiency
[0551] For applications in which editing efficiency must be maximized, several iterative rounds of optimization are recommended. Initially, one should screen for PBS and RTT lengths using the PE2 or PE4 systems, which do not require a nicking sgRNA. Typically, this initial panel will reveal an optimal PBS and/or RTT length; these optimal lengths can then be carried forward in a more refined screen. For instance, if the optimal PBS length is found to be 10 nt in the initial screen, PBS lengths of 9 and 11 nt can be tried, or many different RTT lengths can be screened with the 10-nt PBS. Using optimized PBS and RTT lengths, other aspects of the epegRNA can then be tested. For instance, PAM-disrupting mutations and/or MMR-evading mutations can be encoded in the RTT, and the mpknot motif and F+E scaffold can be evaluated. Finally, nicking sgRNAs and the PE system (PE2-PE5) can be optimized. Even after editing has been optimized in a workhorse cell line, it is beneficial to re-optimize some aspects, such as the PE system and nicking sgRNAs, because the effects of these changes are cell-type specific. This cycle of iterative improvements, summarized in FIG.6, can be repeated until editing efficiencies plateau.
Experimental design for twin prime editing
[0552] Experimental design of twin prime editing experiments is explained in the following steps. First, using epegRNAs and the PEmax architecture is generally recommended. One exception is when the additional sequence length from a 3′ motif would make chemical synthesis of an unusually long epegRNA, or its expression from the U6 promoter, impractical. Second, unlike other PE schemes, twin prime editing does not require the design of nicking sgRNAs or the use of MLH1dn. The only aspect that should be optimized is a pair of epegRNAs, which have the same architecture as epegRNAs used for typical prime editing. The first step is to identify protospacer combinations to use.
However, many possible protospacers typically exist due to the flexibility of the twinPE system. To prioritize protospacers that are likely to yield high editing efficiency, using the CRISPick design tool (portals.broadinstitute.org/gpp/public/analysis-tools/sgrna-design), which can predict the Cas9 nuclease cutting efficiency at a particular protospacer
68, is recommended. Because Cas9 nuclease efficiency is the best predictor of prime editing efficiency
62, it makes sense that a loose correlation between a protospacer’s CRISPick score and the PE efficiency at that protospacer has been observed. [0553] Out of the list of promising protospacers, appropriately spaced pairs of protospacers on opposite DNA strands should be selected. The distance between the two nicks should be at
least 30 bp, as inter-nick distances smaller than this can lead to steric clashes between the two editor proteins. The upper limit of the inter-nick distance is dependent on the desired edit; protospacers as far as 800 bp apart have been used, although most high-efficiency inter-nick distances are between 40 and 150 bp
42. Trying ~5 protospacer combinations in total is recommended. For each protospacer, PBS lengths should be optimized, following the same general guidelines used for traditional epegRNA design (10, 13, and 15 nt to start). By contrast, the RTT in twinPE does not require extensive optimization or screening. The RTTs for a pair of twinPE epegRNAs are typically each other’s reverse complement, as shown in FIG.4. Under these guidelines, experimenters will need to screen 9 epegRNA combinations for each pair of protospacers (3 PBS lengths for the top protospacer × 3 PBS lengths for the bottom protospacer). Finally, if the desired edit is a deletion, editing efficiency can be overestimated due to bias during sample preparation and sequencing. While this bias has been found to be relatively small (<10%) for deletions 50 bp or less in length, bias increases as deletion size increases. Therefore, when performing large deletions, or when quantification must be highly accurate, using unique molecular identifiers (UMIs)
42 is recommended. UMIs, which barcode individual molecules during the first step of HTS sample preparation, allow for PCR duplicates to be detected during downstream analysis. De-duplication mitigates the bias that arises during sample preparation and enables more accurate quantification. Choice of delivery method [0554] Efficient delivery of prime editing components is necessary to achieve efficient editing. During pegRNA optimization, using an easily transfected cell line, such as HEK293T cells for human genome editing or N2A cells for mouse genome editing, is strongly recommended. The efficiency and high-throughput nature of lipid transfection greatly expedites initial rounds of pegRNA screening and prime editor optimization. For other cell types, the most efficient method for delivery will vary, and many therapeutically relevant cell types are not easily transfected. One way to improve editing efficiency in such cell types is to instead deliver plasmids encoding editing systems by electroporation and include a selectable or screenable marker on the prime editor plasmid. Following electroporation, cells harboring the prime editor can be enriched using the marker to increase editing levels among the selected or screened cells. More promisingly, it has been found that in vitro-transcribed mRNA encoding the prime editor protein, co-electroporated with chemically modified synthetic epegRNAs and (if needed) nicking sgRNAs, can support efficient editing in cell
types such as patient-derived iPSCs, primary human T cells, and patient-derived fibroblasts
30,31. In this protocol, procedures for plasmid transfection into HEK293T cells and electroporation of mRNA into patient-derived fibroblasts are described. These methods are promising starting points, but some parameters will need to be re-optimized for other cell types. RNP delivery of prime editors has also been reported, but is not covered in this protocol
60. Experimental controls [0555] In all prime editing experiments, an unedited negative control should be included. This control allows experimenters to be confident that desired editing or other observed mutations at the target locus are PE-dependent. This control is particularly important when attempting to edit a mutation for which cells are heterozygous or contain genetic variability before treatment. Irregularities such as SNPs or indels that endogenously occur at the target locus can be identified using this control. It is also important to note that plasmid quality, transfection efficiency, and the health of the edited cells can affect editing efficiency. For this reason, it is important to include internal controls when comparing two different editing approaches. For example, when comparing two pegRNAs designed to make the same edit, the two should ideally be tested side-by-side in the same experiment. Finally, if attempting to edit a new target locus for the first time, it is helpful to include a positive control using a previously validated pegRNA to edit a well-characterized site (such as the HEK3 locus in human cells or the DNMT1 locus in mouse cells). The editing efficiency achieved for this positive control should be compared to previously published values to ensure that experimental techniques and analyses are being performed correctly. Materials Reagents [0556] Prime editor, epegRNA, and sgRNA preparation
• Plasmids: PEmax (pCMV-PEmax, Addgene ID: 174820), tevopreq1 epegRNA cloning vector (pU6-tevopreq1-GG-acceptor, Addgene ID: 174038), sgRNA cloning vector (pU6-pegRNA-GG-acceptor, Addgene ID: 132777), PEmax mRNA IVT template plasmid (pT7-PEmax, Addgene ID: 178113), hMLH1dn (pEF1a-MLH1dn, Addgene ID: 174824), hMLH1dn mRNA IVT template plasmid (pT7-hMLH1dn, Addgene ID: 178114).
• Oligos for sgRNA, pegRNA, and epegRNA Golden Gate cloning can be designed as shown in Table 2. Alternatively, eBlocks from IDT or similar gene fragment products from other vendors can be used for a simple isothermal assembly reaction with the gene fragment overhangs and PCR primers listed in Table 2. Custom chemically modified sgRNAs and epegRNAs can also be ordered from Agilent, IDT, or other vendors.
• PCR primers for sequencing edited DNA and amplifying template DNA for mRNA transcription can also be designed as shown in Table 2. • Nuclease-free water (Qiagen, cat. no.129115)
• Phusion U Green Multiplex PCR Master Mix, 2x (Thermo Fisher Scientific, cat. no. F564L) or any other high-fidelity polymerase. • SeaKem LE Agarose (Lonza, cat. no.50004) • Ethidium bromide solution, 10 mg/ml (Millipore Sigma, cat. no. E1510-10ML) • UltraPure TAE Buffer, 10× (Thermo Fisher Scientific, cat. no.15558026) • TriDye 1 kb Plus DNA Ladder (NEB, cat. no. N3270S) • Gel Loading Dye, Purple (6X) (NEB, cat. no. B7024S)
• QIAquick PCR purification kit (Qiagen, cat. no.28104) • S.O.C. medium (Thermo Fisher Scientific, cat. no.15544034) • LB medium (United States Biological, cat. no. L1505)
• LB agar medium (Millipore Sigma, cat. no. L2897) • Carbenicillin, 50 mg/ml, sterile filtered (Gold Biotechnology, cat. no. C-103) • Illustra TempliPhi 100 Amplification Kit (Cytiva, cat. no.25640010) • Qiagen Plasmid Plus Midi Kit (Qiagen, cat. no.12945) • PureYield Plasmid Miniprep System (Promega, cat. no. A1222) • TE Buffer, 1× (Thermo Fisher Scientific, cat. no.12090015) [0557] Golden Gate cloning of epegRNAs and sgRNAs
• BsaI-HFv2 (NEB, cat. no. R3733S) • NcoI-HF (NEB, cat. no. R3193S) • PvuII-HF (NEB, cat. no. R3151S)
• rCutsmart Buffer, 10× (NEB, cat. no. B6004S or provided with restriction enzymes) • Tris-HCl, pH 8.0, 1 M solution (Thermo Fisher Scientific, cat. no.15568025) • NaCl, 5M solution (Thermo Fisher Scientific, cat. no. AM9760G) • T4 DNA Ligase (NEB, cat. no. M0202S) • T4 DNA Ligase Reaction Buffer, 10x provided with the T4 DNA ligase, but can also be ordered separately (NEB, cat. no. B0202S). • T4 Polynucleotide Kinase, necessary if sgRNA scaffold oligos for Golden Gate method will be manually phosphorylated (NEB, cat. no. M0201S)
• QIAquick Gel Extraction Kit (Qiagen, cat. no.28704) [0558] Isothermal assembly of epegRNAs and sgRNAs
• NEBuilder HiFi DNA Assembly Master Mix (NEB, cat. no. E2621S) or other preferred isothermal assembly mastermix • DpnI (NEB, cat. no. R0176S)
• rCutsmart Buffer, 10× is provided with the restriction enzyme, but can also be ordered separately (NEB, cat. no. B6004S). • Phusion High-Fidelity PCR Master Mix with HF Buffer (NEB, cat. no. M0531S) or any other high-fidelity polymerase with a DpnI-compatible reaction buffer. [0559] In vitro transcription of prime editor mRNA • HiScribe T7 High Yield RNA Synthesis Kit (NEB cat. no. E2040S) • CleanCap Reagent AG (Trilink, cat. no. N-7113)
• N1-Methylpseudouridine-5′-Triphosphate (Trilink, cat. no. N-1081)
• 7.5 M LiCl Precipitation Solution (Thermo Fisher Scientific, cat. no. AM9480)
• RNaseZap RNase Decontamination Solution (Thermo Fisher Scientific, cat. no. AM9782) or equivalent product
• DNase I, RNase-free (NEB, cat. no. M0303S)
• Gel Loading Buffer II, Denaturing PAGE (Thermo Fisher, cat. no. AM8546G)
• Millennium RNA Markers (Thermo Fisher, cat. no. AM7150)
• SYBR Gold Nucleic Acid Gel Stain (Thermo Fisher Scientific, cat. no. S11494)
[0560] Mammalian cell culture
• All cell lines should be regularly tested for mycoplasma with a kit such as MycoAlert Plus (Lonza, cat. no. LT07-710)
• DMEM, high glucose, GlutaMAX Supplement (Thermo Fisher Scientific, cat. no. 10566016; phenol-red free: 21063029)
• Fetal bovine serum (FBS) (Thermo Fisher Scientific, cat. no.16000044) FBS should be divided into aliquots and frozen at −20 °C if not in use for culture medium. • PBS, pH 7.4 (1×) (Thermo Fisher Scientific, cat. no.10010023) • TrypLE Express Enzyme (1×), phenol red (Thermo Fisher Scientific, cat. no. 12605036; phenol-red free: 12604021) • Lipofectamine 2000 Transfection Reagent (Thermo Fisher Scientific, cat. no. 11668019)
• Opti-MEM Reduced Serum Medium (Thermo Fisher Scientific, cat. no.31985070)
• GFP transfection marker: pmaxGFP (provided in Lonza Nucleofector kits such as SE Cell Line 4D-Nucleofector X Kit S; Lonza, cat. no. V4XC-1032)
• Proteinase K (NEB, cat. no. P8107S) • Tris-HCl, pH 8.0, 1 M solution (Thermo Fisher Scientific, cat. no.15568025) • SDS, 10% (wt/vol) solution (Thermo Fisher Scientific, cat. no.15553027)
• SE Cell Line 4D-Nucleofector X Kit S, for electroporation of editor mRNA (Lonza, cat. no. V4XC-1032) [0561] Biological materials • One Shot Mach1 T1 Phage-Resistant Chemically Competent Escherichia coli (Thermo Fisher, cat. no. C862003) or preferred cloning strain • HEK293T cell line (ATCC, cat. no. CRL-3216; RRID: CVCL_0063) • Primary human fibroblasts can be purchased from a biobank such as the Coriell Institute. Primary Tay–Sachs disease patient fibroblast cells were previously obtained from the Coriell Institute (cat. no. GM00221). [0562] High-throughput sequencing analysis • Phusion U Green Multiplex PCR Master Mix, 2x (Thermo Fisher Scientific, cat. no. F564L) or any other high-fidelity polymerase.
• QIAquick Gel Extraction Kit (Qiagen, cat. no.28704) • Qubit double-stranded DNA High-Sensitivity Assay Kit (Thermo Fisher Scientific, cat. no. Q32854)
• MiSeq Reagent Kit v2 (300-cycles) (Illumina, cat. no. MS-103-1002—Micro kit, ~4 million reads; MS-102-2002—standard kit, ~15 million reads) [0563] Equipment • Filtered sterile pipette tips (e.g., Biotix, Rainin, VWR) • Serological pipettes, assorted (Corning) • Standard microcentrifuge tubes, 1.5 ml (Neptune Scientific, cat. no.4445.X) • Standard PCR tube strips, 8 tubes/strip, 0.2 ml (Corning, cat. no. PCR-0208-C)
• Standard PCR 1 × 8 strip caps, for 0.2-ml PCR tubes (Corning, cat. no. PCR-2CP-RT-C) • Falcon centrifuge tubes, polypropylene, 15 ml (VWR, cat. no.62406-200)
• Falcon centrifuge tubes, polypropylene, 50 ml (VWR, cat. no.21008-940) • Corning 50-ml Mini Bioreactor (Corning, cat. no.431720)
• VWR 96-Well Deep-Well Plates with Automation Notches (VWR, cat. no.76329-998)
• Corning vacuum filter/storage bottle system, 0.22-μm pore, 33.2 cm2 polyethersulfone (PES) membrane (Corning, cat. no.431097) • 8-tube PCR strips, white for qPCR (Bio-rad, cat. no. TLS0851)
• Flat PCR tube 8-cap strips, optical, ultraclear (Bio-rad, cat. no. TCS0803)
• VWR 96-well PCR plate (VWR, cat. no.89218-296)
• Hard-shell 96-well PCR plates, for qPCR (Bio-rad, cat. no. HSP9655)
• Microseal ‘F’ PCR plate seal, foil (Bio-rad, cat. no. MSF1001)
• PCR plate heat seal, clear, optical, for qPCR reactions (Bio-rad, cat. no.1814030)
• Plastic inoculating loops, 10 μl (Copan, cat. no. COP-S10)
• Non-tissue culture–treated bacteriological Petri dish, 100 × 15 mm (VWR, cat. no. 470210-568)
• 96-well clear flat-bottom TC-treated microplates with lids (Corning, cat. no.353075)
• Falcon TC-treated cell culture flask with vented cap, 75 cm2 (Corning, cat. no. 353136)
• Light microscope with filters for fluorescence (Zeiss Axio Observer or comparable system)
• Gel casting system (Bio-rad, cat. no.1704412 for the caster and Bio-rad, cat. no. 1704416 for the gel tray)
• Gel electrophoresis system (Bio-rad, cat. no.1704401)
• Power supply for gel electrophoresis (Bio-rad, cat. no.1645070)
• PCR thermocycler with 48- and/or 96-well heating blocks (Bio-rad, cat. no.1851197) • Real-time PCR detection system (e.g., Bio-rad CFX96 or comparable system) • Benchtop microcentrifuge (Eppendorf, cat. no.5405000441)
• Tabletop centrifuge with rotor fitting 50-ml conical tubes (Eppendorf, cat. no. 022623508 or comparable system) • Qubit 4 fluorometer (Thermo Fisher Scientific, cat. no. Q33238) • Nucleocounter NC-300 (Chemometec), or other cell counter • Lonza 4D Nucleofector with X unit (Lonza, cat. no. AAF-1002X and AAF-1002B) • 37 °C, humidity- and CO2-regulated incubator (Thermo Fisher Scientific, cat. no. 51030284 or comparable system)
• Benchtop vortexer (Fisher Scientific, cat. no.02-215-414 or comparable system)
• Blue-light transilluminator for gel cutting (VWR, cat. no.76151-834 or comparable system)
• Gel-imaging system (Bio-rad ChemiDoc or comparable system) • NanoDrop One Microvolume UV-Vis Spectrophotometer (Thermo Fisher cat. no. ND-ONE-W)
• MiSeq system (Illumina, cat. no. SY-410-1003)
[0564] Software
• CRISPResso2 (github.com/pinellolab/CRISPResso2)
• Docker (docker.com/products/docker-desktop)
• Geneious or preferred comparable software (geneious.com)
Reagent setup
[0565] Oligonucleotide annealing buffer for Golden Gate cloning
• To prepare 50 ml of annealing buffer, combine 500 µl 1 M Tris-HCl, pH 8.0 with 500 µl 5 M NaCl. Add nuclease-free water to a final volume of 50 ml. This solution can be stored at room temperature (25 °C) indefinitely.
[0566] Mammalian cell lysis buffer for gDNA extraction from HEK293Ts and primary fibroblasts
• Mix 10 ml of 1 M Tris-HCl, pH 8.0, 5 ml of 10% (wt/vol) SDS solution, and nuclease-free water to a total volume of 1 liter. Store this incomplete buffer at room temperature (25 °C) for <6 months. Immediately before lysis, make a small aliquot of complete mammalian cell lysis buffer by adding a 1:1,000 (vol/vol) dilution of proteinase K (NEB).
[0567] DMEM culture medium with FBS for culturing HEK293T cells and primary human fibroblasts
• Refer to the final FBS concentration suggested for growth media by cell line vendors, especially when growing primary fibroblasts.
• For HEK293T cells, prepare 500 ml of 10% FBS-supplemented culture medium by adding 50 ml FBS to 450 ml DMEM and sterile filtering.
• For primary human fibroblasts, prepare 500 ml of 20% FBS-supplemented culture medium by adding 100 ml FBS to 400 ml DMEM and sterile filtering.
• After supplementing with FBS, DMEM should be stored for a maximum of 3 weeks at 4 °C.
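As a check on the reagent recipes above, the final concentration of each component follows from the dilution relation C1·V1 = C2·V2. The short Python sketch below works through the stated volumes; the helper name `final_concentration` is hypothetical, and the computed values are derived here from the recipe volumes rather than quoted from the protocol.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Dilution relation C2 = C1 * V1 / V2.

    The result has the units of `stock_conc`; both volumes must share
    one unit (ml is used throughout this example).
    """
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return stock_conc * stock_volume / total_volume


# Annealing buffer: 500 µl (0.5 ml) of each stock in 50 ml total.
annealing_tris_mM = final_concentration(1000.0, 0.5, 50.0)   # 1 M Tris-HCl stock
annealing_nacl_mM = final_concentration(5000.0, 0.5, 50.0)   # 5 M NaCl stock

# Incomplete lysis buffer: 10 ml Tris-HCl and 5 ml SDS in 1,000 ml total.
lysis_tris_mM = final_concentration(1000.0, 10.0, 1000.0)    # 1 M Tris-HCl stock
lysis_sds_pct = final_concentration(10.0, 5.0, 1000.0)       # 10% (wt/vol) SDS stock

# Culture media: FBS treated as a 100% stock in 500 ml total.
hek_fbs_pct = final_concentration(100.0, 50.0, 500.0)
fibroblast_fbs_pct = final_concentration(100.0, 100.0, 500.0)

if __name__ == "__main__":
    print(f"Annealing buffer: {annealing_tris_mM:g} mM Tris-HCl, {annealing_nacl_mM:g} mM NaCl")
    print(f"Lysis buffer: {lysis_tris_mM:g} mM Tris-HCl, {lysis_sds_pct:g}% SDS")
    print(f"FBS: {hek_fbs_pct:g}% (HEK293T), {fibroblast_fbs_pct:g}% (fibroblasts)")
```

Under these volumes the annealing buffer works out to 10 mM Tris-HCl and 50 mM NaCl, and the incomplete lysis buffer to 10 mM Tris-HCl and 0.05% (wt/vol) SDS.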
Procedure
[0568] Design of epegRNAs and nicking sgRNAs (Timing 1 day)
1. Follow the process outlined in the “Designing candidate epegRNAs” section to create a list of epegRNA spacer and RTT/PBS 3′ extension sequences.
2. Follow the process outlined in “Choice of prime editing system (PE1-PE5) and editor architecture” to design nicking sgRNAs.
[0569] Preparation of epegRNA or sgRNA constructs
3. When delivering epegRNAs and nicking sgRNAs as plasmids, either Golden Gate cloning (option A) or isothermal assembly (option B) can be used to generate constructs. If pegRNAs, epegRNAs, or nicking sgRNAs will instead be delivered as RNA, they should be purchased with chemical modifications that enhance editing (option C).
A. Generation of epegRNAs by Golden Gate cloning (Timing 3 days) This method is most useful for altering spacers and RTT/PBS 3′ extensions while keeping the scaffold and tevopreQ1 motif constant. The modified version of this procedure noted throughout is also useful for cloning nicking sgRNAs.
I. Design Golden Gate cloning oligonucleotides, following the examples listed in Table 2. Briefly, these oligos include:
• Top and bottom oligos with cloning overhangs to insert the spacer sequence (Golden Gate part 1).
• Top and bottom oligos with cloning overhangs to insert the SpCas9 sgRNA scaffold sequence (Golden Gate part 2). These can either be ordered with 5′ phosphorylation or they can be phosphorylated by the experimenter. Note: Golden Gate part 2 will be different between epegRNAs and nicking sgRNAs to account for the absence of an epegRNA RTT/PBS 3′ extension in nicking sgRNAs.
• Top and bottom oligos with cloning overhangs to insert the desired epegRNA RTT/PBS 3′ extension (Golden Gate part 3). This is not required if cloning a nicking sgRNA.
II. In a separate reaction for each of Golden Gate parts 1, 2, and 3, anneal ssDNA oligonucleotides to create the dsDNA parts necessary for Golden Gate assembly. Prepare the annealing reactions as follows:
If cloning an sgRNA, only two annealing reactions (Part 1 for the spacer and the modified Part 2 for the scaffold) are necessary. III.Perform the annealing reaction under the following conditions in a thermocycler:
IV. Dilute annealed oligonucleotides by adding 75 µL H2O. The final concentration of each dsDNA Golden Gate part will now be 1 µM. Do not dilute the sgRNA scaffold (Golden Gate part 2) if oligos were purchased without 5′ phosphorylation. Golden Gate parts can be stored at −20 °C indefinitely.
V. (Optional) If Golden Gate part 2 oligos were not purchased with 5′ phosphorylation, phosphorylate the annealed scaffold oligos (Golden Gate part 2) from step 3A(III). This step is not necessary if top and bottom oligos were purchased with 5′ phosphorylation. Assemble the T4 Polynucleotide Kinase reaction as follows:
VI.In a thermocycler, incubate at 37 °C for 60 minutes. Following this phosphorylation reaction, annealed scaffold oligonucleotides are now at a concentration of 1 µM. [Pause Point] Phosphorylated and annealed oligonucleotides can be stored at −20 °C and reused indefinitely for future reactions. VII.Predigestion and agarose gel extraction of the epegRNA expression vector. Cloning epegRNAs using the plasmid pU6-tevopreq1-GG-acceptor (Addgene ID: 174038), which
already contains the tevopreQ1 3′ structural motif and a human U6 promoter, is recommended. If cloning a nicking sgRNA, use the plasmid pU6-pegRNA-GG-acceptor (Addgene ID: 132777), which is a U6 promoter mammalian expression vector without the tevopreQ1 3′ structural motif.
VIII. Prepare a triple restriction enzyme digest of 5 µg of pU6-tevopreq1-GG-acceptor as follows:
IX.Incubate the reaction for 4-16 hours at 37 °C. X.After the restriction digest, use agarose gel electrophoresis to verify successful digestion and gel extract the linearized cloning vector. Make a 1% agarose gel supplemented with 1:10,000 (vol/vol) ethidium bromide or preferred nucleic acid staining reagent. Mix the 40 µl restriction digest with 8 µl 6x purple loading dye (1x final concentration) and load all 48 µl into the gel along with a DNA ladder in a separate lane. Run the gel in a 1× TAE buffer at 140 V/cm for 20 min. Successfully digested plasmids will yield a prominent 2.2 kb restriction fragment corresponding to the desired backbone and an 825bp RFP dropout cassette. XI.Selectively excise and purify the 2.2 kb restriction fragment products using the QIAquick Gel Extraction kit (Qiagen) according to the manufacturer’s protocols. This 2.2 kb restriction fragment is Golden Gate part 4. XII.Elute in EB buffer (provided in the kit) or water and normalize the concentration to 30 ng/μl. Purified restriction fragments can be stored at −20 °C for several months. XIII.Set up the Golden Gate reaction to assemble an epegRNA as follows: If cloning a nicking sgRNA, there will be no Golden Gate part 3, a different part 2 (see Table 2) than shown below, and a different part 4 than shown below.
XIV.Perform the assembly reaction under the following conditions in a thermocycler:
XV. Following the completion of the Golden Gate assembly reaction, place the reactions on ice.
XVI. Transform the Golden Gate assembly into chemically competent E. coli. Combine 1 µl of each reaction and 10 μl of chemically competent E. coli Mach1 cells or another chemically competent strain.
XVII. Incubate the assembly/cell mix on ice for 10 min, heat-shock the mix at 42 °C for 30 s, and then immediately return the mix to ice for 1 min.
XVIII. Add 100 μl of S.O.C. media to the mix and plate the entire volume on an LB-agar plate containing 50 μg/ml carbenicillin. (Additional outgrowth after heat shock is not required.) Incubate overnight at 37 °C. Transformed E. coli can be stored at 4 °C for 1 week.
XIX. Perform a rolling circle amplification (RCA) according to the manufacturer’s (Cytiva) instructions. Briefly, pick individual colonies into 5 μl of sample buffer. Heat the mixture to 95 °C for 5 minutes in a thermocycler, and then add 5 μl of reaction buffer and 0.2 μl of enzyme. Incubate at 30 °C for at least 5 hours. Do not pick red colonies. These are colonies with undigested or reassembled pU6-tevopreq1-GG-acceptor plasmids.
XX. Sequencing of epegRNA or sgRNA expression plasmid. Using a preferred Sanger sequencing vendor, submit completed RCA reactions for sequence validation. Be sure to use a sequencing primer that will provide coverage of the epegRNA spacer, sgRNA scaffold, and RTT/PBS 3′ extension. Sequencing verification of the entire cloned epegRNA (or nicking sgRNA) sequence is necessary to avoid junction mutations or mutations from impure oligos.
XXI. In single wells of a 96-well deep-well plate, inoculate 1 ml cultures of sequence-verified colonies. LB media with 50 μg/ml carbenicillin should be used. Incubate at 37 °C with shaking for 20 h.
XXII. Use a Promega PureYield Plasmid Miniprep kit or another endotoxin-free plasmid preparation kit to isolate plasmid DNA from each 1 ml culture, according to the manufacturer’s instructions. Purified plasmids can be stored at −20 °C indefinitely.
B. Generation of an epegRNA by isothermal assembly (Timing 3 days) This method is recommended when one prefers a simpler two-component assembly with complete control over the entire epegRNA or nicking sgRNA sequence.
I. Design and order isothermal assembly gene fragments following the examples listed in Table 2.
These fragments should include all epegRNA elements between the two adapter sequences: spacer, sgRNA scaffold, RTT, PBS, and 3′ structural motif. II.Perform a PCR using the isothermal assembly primers listed in Table 2 and the template pU6-tevopreq1-GG-acceptor (Addgene ID: 174038). The reaction is assembled as follows:
Phusion High-Fidelity PCR Master Mix with HF Buffer is specifically used because its buffer is compatible with a later DpnI digestion in step 3B(V). III.Perform PCR using the following program:
IV.Make a 1% agarose gel supplemented with 1:10,000 (vol/vol) ethidium bromide (or other DNA gel stain). Mix 1 µl of the PCR reaction with 4 µl water and 1 µl of 6x purple loading dye. Load this mix into the gel along with a ladder and run the gel in a 1× TAE buffer at 140 V/cm for 20 min. The correct PCR product will yield a prominent 2 kb band. V.Digest the PCR reaction with DpnI (NEB), which removes the template plasmid input. This digestion is essential to minimize re-transformation of the PCR template. Add 1 µl of DpnI (20 U / µl) to the unpurified PCR and incubate at 37 °C for 15 minutes on a thermocycler. DpnI can be added directly to this reaction as it is active in the HF buffer supplied with the Phusion HF 2x Mastermix. VI.Purify the PCR products using the QIAquick PCR purification kit (Qiagen) according to the manufacturer’s instruction. Elute in water and dilute the PCR products to a concentration of 70 ng/μl. Purified amplicons can be stored at −20 °C indefinitely and reused for different cloning projects. VII.Set up the isothermal reaction as follows:
VIII. Incubate the isothermal assembly at 50 °C for 15-60 minutes on a thermocycler.
IX. Following the completion of the isothermal assembly, place the reactions on ice.
X. For transformation and sequence verification, follow the same procedure used for the Golden Gate assembly (Steps 3A(XVI)-(XXII)). In this method, the entire pU6-tevopreq1-GG-acceptor plasmid is amplified using PCR, which risks generating mutations throughout the entire plasmid. Therefore, when sequence validating, be sure to use a sequencing primer or primers that will provide coverage of the vector’s entire U6 promoter, all epegRNA/sgRNA elements, and terminator. Mutations in any of these could yield ineffective constructs.
C. Acquiring purified, chemically modified, synthetic epegRNAs, pegRNAs, or sgRNAs (Timing 1–6 weeks) In general, researchers can deliver epegRNAs, pegRNAs, and nicking sgRNAs either as plasmids or as chemically modified synthetic RNAs. Delivery of chemically modified synthetic RNAs is preferred if the PE protein components will be delivered as in vitro transcribed mRNAs (Step 4). The use of in vitro transcribed mRNA and synthetic guide RNAs can enable higher editing than plasmid delivery in certain cell types. When ordering synthetic epegRNAs from Agilent, IDT, or other vendors, the ends of the RNA may be chemically modified to prevent degradation in cells. Include 2′ O-methyl groups on the first three and last three nucleotides and replace the first three and last three phosphodiester bonds with phosphorothioate bonds. Ordering enough synthetic RNA to use 90 pmol of epegRNA and 60 pmol of nicking sgRNA per sample is recommended, but these amounts may need optimization for each different electroporation system and cell type.
I. Dissolve lyophilized synthetic epegRNAs and/or sgRNAs in TE buffer. Resuspend RNAs to a concentration of 100–300 μM and store at −20 °C for ≤ 1 year.
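Because 1 µM equals 1 pmol/µl, the stock volume needed to deliver the per-sample amounts recommended above (90 pmol of epegRNA and 60 pmol of nicking sgRNA) is the amount in pmol divided by the stock concentration in µM. The Python sketch below illustrates this arithmetic; the function name `stock_volume_ul` and the example of 12 samples are hypothetical and serve only as an aid for planning orders and dilutions.

```python
def stock_volume_ul(amount_pmol, stock_conc_uM, n_samples=1):
    """Volume of RNA stock (in µl) delivering `amount_pmol` per sample.

    Uses 1 µM = 1 pmol/µl, so volume (µl) = pmol / µM.
    """
    if stock_conc_uM <= 0:
        raise ValueError("stock concentration must be positive")
    return n_samples * amount_pmol / stock_conc_uM


if __name__ == "__main__":
    # Per-sample volumes at the two ends of the 100-300 µM resuspension range.
    for conc in (100.0, 300.0):
        epeg = stock_volume_ul(90, conc)
        nick = stock_volume_ul(60, conc)
        print(f"{conc:g} µM stock: {epeg:.2f} µl epegRNA, {nick:.2f} µl nicking sgRNA")

    # Hypothetical experiment of 12 samples using a 100 µM epegRNA stock.
    print(f"12 samples: {stock_volume_ul(90, 100.0, n_samples=12):.1f} µl epegRNA stock")
```

At 100 µM, the recommended amounts correspond to 0.9 µl of epegRNA stock and 0.6 µl of nicking sgRNA stock per sample, so more dilute working stocks may be easier to pipette accurately.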
[0570] Preparation of in vitro transcribed PEmax mRNA (Optional) (Timing 1-2 Days) This step is only necessary when delivering prime editors as mRNA transcripts. mRNA delivery can greatly enhance editing in some cell types, as shown in FIG.7H. 4. DNA templates for in vitro transcription should be linear, not circular. To generate a linear in vitro transcription template, PCR amplify PEmax and/or MLH1dn from mRNA
transcription template plasmids (Addgene ID: 178113 and 178114, respectively) using the primers listed in Table 2. Set up the following reaction:
DNA yields from this PCR can be relatively low; pooling multiple 50 µL PCRs into a single PCR purification column (Step 6) provides enough template for the later in vitro transcription (Step 8). Using typical equipment, this 300 µL mastermix will need to be divided into six individual 50 µL reactions on a thermocycler. 5. Perform the PCR under the following conditions:
6. Purify the PCR products from the 300 µL mastermix using a single silica column from the QIAquick PCR purification kit (Qiagen) according to the manufacturer’s protocols. Elute in EB (provided with the kit) and quantify purified product concentration by UV-Vis spectrophotometry (NanoDrop) or equivalent method. The mRNA transcription template plasmid contains a T7 promoter disabled by a single nucleotide mutation. PCR amplification with the mRNA-Fw primer generates an amplicon with a repaired T7 promoter. The disabled T7 promoter on the template plasmid prevents transcription initiation and obviates the need to remove the template plasmid via DpnI digest or gel purification. Instead, a simple silica column cleanup can be used in this step. 7. After PCR purification, verify amplification via agarose gel (0.7%, supplemented with 1:10,000 (vol/vol) ethidium bromide or other nucleic acid stain) electrophoresis. Dilute
100 ng of purified PCR product in 5 µl of nuclease-free water and mix with 1 µL of 6x purple loading dye. Load this mix into the gel along with a ladder in a separate lane. Run the gel in a 1× TAE buffer at 140 V/cm for 20 min. Successfully amplified in vitro transcription templates will yield a distinct 6.5 kb amplicon. 8. Using the HiScribe T7 High Yield RNA Synthesis Kit (NEB), set up an in vitro transcription reaction as follows, scaling the reaction up or down as needed: This reaction follows the manufacturer-suggested protocol for HiScribe T7 High Yield RNA Synthesis Kit when using TriLink’s CleanCap Reagent AG to enable co-transcriptional capping. However, the kit’s 100 mM UTP is additionally replaced with TriLink’s 100 mM N1-Methylpseudouridine-5′-Triphosphate. RNase-free technique is essential during this step and all subsequent in vitro transcription steps. RNase contamination will compromise mRNA integrity and produce sub-optimal results. Before starting an in vitro transcription reaction setup, decontaminate all work surfaces, pipettes, and other materials with an RNase decontamination solution, such as RNaseZap (Thermo Fisher), and ensure that tubes, pipette tips, and other disposables are RNase-free.
9. Incubate the in vitro transcription reaction at 37 °C for 2 hours in a thermocycler or a dry air incubator.
10. Remove template DNA by DNase (NEB) treatment. Set up DNAse digest as listed below:
11. Incubate the DNAse I treatment at 37 °C for 15 minutes in a thermocycler. 12. Purify the synthesized RNA by lithium chloride precipitation: mix the 200 µl reaction from Step 10 with 100 µl 7.5 M LiCl. 13. Incubate the mixture at -20 °C for 30 minutes. 14. Centrifuge at top speed in a microcentrifuge for 15 minutes. A temperature-controlled microcentrifuge set to 4 °C is preferred, if available. 15. A white pellet of precipitated RNA will form in the tube. Pipette off the supernatant and wash the pellet with ice-cold 70% ethanol. Do not remove the 70% ethanol. 16. Centrifuge again at top speed in a microcentrifuge for 5 minutes. 17. Remove all the 70% ethanol without disturbing the pellet. Resuspend the pellet in nuclease-free water or 10 mM Tris, 1 mM EDTA. Quantify purified mRNA concentration by UV-Vis spectrophotometry (NanoDrop) or equivalent method. 18. Verify successful and precise transcription via agarose gel electrophoresis (2.0%, supplemented with 1:10,000 (vol/vol) SYBR Gold nucleic acid staining reagent, Thermo Fisher Scientific): dilute 300ng of purified Step 17 product in 5 µl of nuclease-free water and mix with 5 µL 2x Gel Loading Buffer II (Thermo Fisher). Also dilute 2.5 µl of Millennium RNA Markers (Thermo Fisher) in 2.5 µl nuclease-free water and mix with 5 µL 2x Gel Loading Buffer II. Heat both 10 µl mixtures on a thermocycler for 10 minutes at 70 °C. Load both mixtures into separate lanes of the 2% gel and perform electrophoresis in a 1× TAE buffer at 140 V/cm for 20-30 min. Successfully transcribed mRNAs will yield a distinct 6.5 kb (PEmax) or 2.4 kb (MLH1dn) mRNA transcript. 19. If gel electrophoresis confirms that the transcribed mRNA is high quality, distribute the purified mRNA into working aliquots of 5 - 20 µl.
Multiple freeze-thaw cycles will result in mRNA degradation and should be avoided whenever possible. Preparing multiple aliquots is essential to maximizing the shelf life of in vitro transcribed mRNAs. Purified mRNA transcripts can be stored at −80 °C for several months if not subjected to multiple freeze-thaw cycles. [0571] Verification of prime editing in HEK293T cells or primary human fibroblasts 20. Prime editing can be verified in a variety of mammalian cell types, including HEK293T cells (option A) or primary human fibroblasts (option B). HEK293T cells are recommended as a workhorse cell line for prime editing epegRNA optimization. Primary cells, such as primary human fibroblasts, can be used to verify prime editing correction of pathogenic mutations in patient cells. A. Prime editing in HEK293T cells via plasmid transfection (Timing 4-5 Days) In this example transfection protocol, a PE5 transfection is described, which typically yields the highest editing efficiency out of all PE systems and drastically reduces indels relative to PE3. PE5 requires expression plasmids for four PE components: (1) PEmax (2) an epegRNA (3) a nicking sgRNA (4) MLH1dn. In systems such as PE2, PE3, PE3b, and PE4, the nicking sgRNA and/or MLH1dn are not included and would be excluded from this protocol. For twinPE transfections, two epegRNAs are used instead of an epegRNA and a nicking sgRNA. (See Table 3 for plasmid amounts to be used for each PE system.) i. Plasmid preparation. Order or clone expression plasmids for all desired prime editing components: prime editor (PEmax architecture, Addgene #174820), epegRNA, nicking sgRNA, and MLH1dn (Addgene #174824). See Steps 3A or 3B for epegRNA and nicking sgRNA cloning instructions. ii. Generate transfection-grade preparations of expression vectors using endotoxin-free plasmid isolation kits such as Qiagen Plasmid Plus Midi Kit (Qiagen) or PureYield Plasmid Miniprep System (Promega) according to the manufacturer’s protocol. iii. 
HEK293T cell culture. Follow the vendor-specified (ATCC) protocol to culture HEK293T cells. Briefly, use DMEM (Thermo Fisher Scientific) supplemented with 10% FBS (vol/vol) and grow HEK293Ts in T75 tissue culture flasks maintained at 37 °C and 5% CO2. Penicillin and streptomycin can be included during the culture of HEK293Ts. However, they should be avoided when plating cells for transfection: using antibiotics during transfections can affect both transfection efficiency and cell viability.
iv. Culture HEK293T cells until 70% confluent. When 70% confluent, passage cells by removing growth medium, washing the cell monolayer with 1x PBS, and then removing the PBS wash, being careful to not detach the monolayer from the surface of the flask. v. Add 2 ml of TrypLE (Thermo Fisher Scientific) and incubate at 37 °C and 5% CO2 for 5 minutes to dissociate the adherent cells. vi. After incubation, add 10 mL of pre-warmed media to the flask. Pipette up and down to detach cells from the flask’s growth surface and to disperse clumps of cells. vii. Continue to subculture the cells by reseeding into a new T75 flask and/or preparing 96-well plates for plasmid transfection. Do not grow HEK293T cell cultures beyond 80% confluency and dispose of cells after passage 20. HEK293T cell cultures are generally passaged at a ratio between 1:5 and 1:10 every 2-3 days. viii. Plate HEK293T cells for transfection. Experiments are performed in 96-well plates, using 1.6–1.8 x 10^4 cells in 100 µl of FBS-supplemented DMEM per well. ix. Count the dissociated cells (Step 20A(vi)) using a Nucleocounter NC-3000 (Chemometec) or other cell counter according to manufacturer instructions. Dilute the cells to a concentration of 1.6-1.8 x 10^5 cells/mL in FBS-supplemented DMEM. x. Plate 100 µl of the diluted cell mix (Step 20A(ix)) into each well of a 96-well plate. This will result in 1.6-1.8 x 10^4 cells per well. Cell viability and transfection efficiency are affected by the density at which cells are plated. Plating too many cells will reduce transfection efficiency, and plating too few cells will result in excessive cell death. xi. Perform transfection 18-24 h after plating (Step 20A(x)), at which point cells should be approximately 70-80% confluent. xii. Transfection Mix Preparation. For the transfection of each well, mix the desired combinations of prime editor, epegRNA, nicking sgRNA, and MLH1dn expression plasmids following the transfection setup below: Every well of a PE5 editing experiment will receive a plasmid dose of each PE5 editing component: prime editor, epegRNA, nicking sgRNA, MLH1dn. When screening epegRNAs, normalizing the concentration of all epegRNA plasmids and making a mastermix of the other PE5 components to simplify the experimental workflow is recommended. For example, if screening 15 epegRNAs in a PE5 experiment, make a mastermix of 15 equivalents (plus overage) of prime editor plasmid, MLH1dn plasmid, sgRNA plasmid, and Opti-MEM.
Including an unedited negative control at this stage is crucial. To do so, one can either omit the pegRNA and nicking sgRNA, or include a non-targeting pegRNA and nicking sgRNA pair.
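The plating dilution and the mastermix scaling described for the HEK293T transfection reduce to simple arithmetic, sketched below for illustration only. The function names are hypothetical, and the 10% overage is an assumed value, since the protocol asks for overage without specifying an amount.

```python
import math


def dilution_volumes(counted_cells_per_ml, target_cells_per_ml, final_volume_ml):
    """Return (ml of cell suspension, ml of medium) needed to reach the target density."""
    if counted_cells_per_ml < target_cells_per_ml:
        raise ValueError("suspension is already below the target density")
    cells_ml = final_volume_ml * target_cells_per_ml / counted_cells_per_ml
    return cells_ml, final_volume_ml - cells_ml


def mastermix_equivalents(n_wells, overage_fraction=0.10):
    """Per-well equivalents to prepare, rounded up (overage fraction is an assumption)."""
    return math.ceil(n_wells * (1 + overage_fraction))


# Example: a count of 1.7e6 cells/mL diluted to 1.7e5 cells/mL in 10 mL,
# which gives 1.7e4 cells per 100 µl well.
cells_ml, medium_ml = dilution_volumes(1.7e6, 1.7e5, 10.0)
print(f"{cells_ml:.2f} mL cells + {medium_ml:.2f} mL medium")
print(mastermix_equivalents(15), "equivalents for a 15-epegRNA screen")
```

Rounding the number of equivalents up rather than to the nearest integer errs toward preparing slightly too much mastermix, which is usually the cheaper mistake.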
xiii. Prepare a lipid solution of 0.5 µl of Lipofectamine 2000 (Thermo Fisher Scientific) per well diluted into 4.5 µl of Opti-MEM per well, following the manufacturer’s instructions. This protocol describes the use of Lipofectamine 2000 in HEK293T cells. Amounts of lipid and DNA will vary based on the transfection reagent and target cell type. xiv. Add 5 µl of the separately prepared lipid mixture to each well of the plasmid mixture (Step 20A(xii)) to a total volume of 10 µl and incubate for 10 minutes. xv. Transfer all 10 µL of the mix from Step 20A(xiv) to each well of the previously prepared 96-well tissue culture plate (Step 20A(x)). Return the plate to the incubator at 37 °C and 5% CO2 when all wells have been treated. Take care to gently add the DNA and lipid mixture to the culture well. Forcefully ejecting liquid against the plated cell monolayer may dislodge cells from the growth surface or lead to toxicity. B. Prime editing in primary human fibroblasts via RNA electroporation (Timing 4-5 Days) In this procedure, PEmax and MLH1dn are delivered as in vitro transcribed mRNAs (Step 4), and the epegRNA and nicking sgRNA are delivered as chemically modified synthetic RNAs (Step 3c).
It has been observed that prime editing efficiency is highest in primary human fibroblasts when mRNA and synthetic RNA prime editing reagents are delivered by electroporation. In this example, a PE5 electroporation is described, which typically yields the highest editing efficiency out of all PE systems and reduces indels relative to PE3. A PE5 editing experiment requires four PE components: (1) PEmax, (2) an epegRNA, (3) a nicking sgRNA, and (4) MLH1dn. In systems such as PE2, PE3, PE3b, and PE4, the nicking sgRNA and/or MLH1dn are not included. Here, electroporation is conducted using the Lonza 4D Nucleofector with X unit (Lonza) but can be completed with an alternative electroporation system. The conditions described here were optimized for primary human fibroblasts; considerable optimization of electroporation conditions for other cell types should be expected. Protocols for optimization are available from electroporation equipment manufacturers. i. Primary human fibroblast cell culture. Follow the vendor-specified protocol to maintain fibroblasts (Coriell Institute) in cell culture. Briefly, grow fibroblasts in T75 tissue culture flasks in DMEM (Thermo Fisher Scientific) supplemented with 20% (vol/vol) FBS (Thermo Fisher Scientific) at 37 °C and 5% CO2. It has been found that, in general, DMEM supplemented with 20% FBS is suitable for most primary fibroblasts, but always reference vendor-recommended growth instructions. ii. Culture fibroblasts until 70% confluent. When 70% confluent, passage cells by removing growth medium, washing the cell monolayer with 1x PBS, and then removing the PBS wash. iii. Add 3 ml of TrypLE (Thermo Fisher Scientific) and incubate at 37 °C and 5% CO2 for 5 minutes to dissociate the adherent cells. iv. After incubation, add 10 mL of FBS-supplemented DMEM to the flask. Pipette repeatedly to detach cells from the flask’s growth surface and dissociate the cells. v. Reseed dissociated cells into a fresh flask to continue subculture or use the cells immediately for an RNA electroporation. Common maintenance antibiotics such as penicillin and streptomycin can be included during the fibroblast culture; however, they may affect cell physiology. Do not allow cells to reach a confluency higher than 80%. For most primary fibroblast cell lines utilized, passaging at a 1:5 ratio every 2-3 days is sufficient. However, growth characteristics will likely vary between cell lines, and the passaging schedule may need to be adjusted. vi. Cell preparation for Lonza electroporation. Count dissociated cells from Step 20B(v), using a Nucleocounter NC-3000 (Chemometec) or other cell counter to determine the density of the dissociated cells.
vii. Calculate the total number of cells required, using 1.0 x 10^5 fibroblasts per electroporation well. Centrifuge the total number of required cells in an appropriately sized tube at 150g for 5 minutes. viii. A pellet of cells will form. Remove and discard supernatant and wash the pellet of cells with 1 ml of PBS. Resuspend the cells in the PBS. ix. Repeat the 5 min centrifugation (Step 20B(vii)) to pellet the cells again. Remove and discard the supernatant. x. Prepare the electroporation buffer for the Lonza SE nucleofection kit (Lonza) during the centrifugation steps. For each electroporation, mix 16.4 µl of Lonza SE nucleofector solution with 3.6 µl of Lonza SE supplement solution, for a total of 20 µl prepared electroporation buffer per electroporation. xi. Resuspend the pelleted cells from Step 20B(ix) with the prepared nucleofection solution from Step 20B(x). For example, if one intended to prepare 5 electroporations, a washed pellet of 5 x 10^5 cells would be resuspended in 100 µl of prepared electroporation buffer. xii. Prepare the prime editor reagent mixture. Prepare the following final reagent mix for the electroporation reaction:
Holding cells in the nucleofection buffer for extended periods of time reduces cell viability and electroporation efficiency. Work as quickly as possible once the washed pellet from Step 20B(ix) is resuspended in the nucleofection buffer from step 20B(x). If preparing many electroporations, premix the RNA components from Step 20B(xii) and hold them on ice until step 20B(xi) is complete.
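Because cells should spend as little time as possible in nucleofection buffer, it can help to work out the required cell number and buffer volumes before starting. The sketch below scales the per-electroporation amounts given in Steps 20B(vii) and 20B(x). The function name and dictionary keys are hypothetical, and no overage is included, which matches the five-electroporation example in Step 20B(xi) but may not suit every workflow.

```python
CELLS_PER_ELECTROPORATION = 1.0e5  # fibroblasts per well, Step 20B(vii)
SE_SOLUTION_UL = 16.4              # Lonza SE nucleofector solution, Step 20B(x)
SE_SUPPLEMENT_UL = 3.6             # Lonza SE supplement solution, Step 20B(x)


def electroporation_prep(n_electroporations: int) -> dict:
    """Scale cell number and buffer volumes to n electroporations (no overage)."""
    if n_electroporations < 1:
        raise ValueError("at least one electroporation is required")
    n = n_electroporations
    return {
        "cells": n * CELLS_PER_ELECTROPORATION,
        "se_solution_ul": round(n * SE_SOLUTION_UL, 1),
        "se_supplement_ul": round(n * SE_SUPPLEMENT_UL, 1),
        "buffer_total_ul": round(n * (SE_SOLUTION_UL + SE_SUPPLEMENT_UL), 1),
    }


# Five electroporations: 5 x 10^5 cells resuspended in 100 µl of buffer.
print(electroporation_prep(5))
```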
Including an unedited negative control at this stage is crucial. To do so, one can either omit the pegRNA and nicking sgRNA, or include a non-targeting pegRNA and nicking sgRNA pair. xiii. Transfer the 22 µl reagent mix into the 20-µl nucleocuvette wells included in the Lonza SE kit. Air bubbles in the cuvette will disrupt the electroporation. Use a thin pipette tip (e.g., a common 10 µL tip) to disrupt bubbles or drag bubbles out of the cuvette. xiv. Electroporate the reaction mix using program CM-130 on a Lonza 4D nucleofector. xv. After electroporation, add 80 µl of warm FBS-supplemented DMEM growth media to each electroporation reaction and gently mix. Incubate for 10 min at room temperature to allow cells to recover. xvi. Following the incubation at room temperature, gently mix and transfer 40 µL of the recovered cell mix to a 48-well tissue culture plate filled with 250 µl of prewarmed FBS-supplemented DMEM growth media and transfer it to an incubator at 37 °C and 5% CO2.
[0572] Preparation of mammalian cells for HTS (Timing 1 Day) 21. 72 hours after lipid transfection of plasmids into HEK293T cells (Step 20A(xv)) or electroporation of RNA into primary human fibroblasts (Step 20B(xvi)), cells are lysed for gDNA harvesting and HTS analysis. Here, a simple cell lysis method for harvesting gDNA without further purification steps is described. Many alternative methods for harvesting gDNA can be used. 22. Prepare a fresh aliquot of complete mammalian cell lysis buffer (See Reagent Setup) by diluting proteinase K (NEB) 1:1,000 (vol/vol) into stored incomplete cell lysis buffer. 23. Remove media from edited cells from Step 20A(xv) or Step 20B(xvi) and carefully wash with PBS. Do not disturb the plated monolayers. Remove any residual PBS. 24. Cell lysis. Add lysis buffer directly to PBS-washed plates from Step 23. For lysis of 96-well plates, use 50 µl lysis buffer per well. Lysis buffer volume may need to be adjusted for different cell types or different cell densities. 25. Incubate plates at 37 °C for 1 hour after adding lysis buffer. Adding fresh lysis buffer to cell monolayers will generate a viscous solution that is difficult to pipette. This incubation can be completed on a thermocycler but will be complicated by difficult liquid transfers. Lysing cells directly in culture plates is recommended.
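The lysis volume chosen in Step 24 determines how many cell equivalents each microlitre of lysate carries into PCR1. As a rough, illustrative estimate only, the sketch below reproduces the protocol's figure of about 1,280 cells/µl from 1.6 x 10^4 plated cells, two doublings between seeding and lysis, and 50 µl of lysis buffer. The function name is hypothetical, and real yields depend on cell type, viability, and transfection toxicity.

```python
def cells_per_ul_lysate(cells_plated: float, doublings: int, lysis_volume_ul: float) -> float:
    """Estimated cell equivalents per µl of lysate, assuming all cells survive to lysis."""
    if lysis_volume_ul <= 0:
        raise ValueError("lysis volume must be positive")
    return cells_plated * 2 ** doublings / lysis_volume_ul


low = cells_per_ul_lysate(1.6e4, doublings=2, lysis_volume_ul=50)   # 1280.0
high = cells_per_ul_lysate(1.8e4, doublings=2, lysis_volume_ul=50)  # 1440.0
print(f"{low:.0f}-{high:.0f} cells per µl of lysate")
```

Halving the lysis volume would double this density, which is one way to raise the number of genomes sampled per PCR without increasing template volume.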
26. After incubation, inactivate proteinase K by transferring the lysate into PCR plates or strips and heating at 80 °C for 30 minutes on a thermocycler. Heat-inactivated lysis mix can be used as a PCR template in subsequent HTS analysis. Cell lysis mix can be stored at 4 °C for 1 week or -20 °C for several months. [0573] HTS preparation for prime editing analysis (Timing 1-2 Days) 27. Design and order PCR1 primers to amplify the target genomic locus. Using NCBI’s Primer-BLAST tool to aid with the design of PCR1 primers is recommended. Primers should amplify a region spanning at least from 25 bp upstream of the epegRNA- induced nick to 25 bp downstream of the 3′ flap generated by the RT or any secondary nick (whichever is longer). If PCR1 primers are too close to either nick site, accurate indel quantification with CRISPResso2 will not be possible (see Table 4). PCR1 primers require 5′ adaptor sequences (see Table 2) so that individual samples can be barcoded in a second PCR (PCR2; see Step 32). These barcodes enable the identification of individual samples during later HTS analysis. 28. Prepare the PCR1 reaction as follows:
Starting with 1 µl of lysis mix as a PCR template is recommended, but optimization of this volume may be required. Post-transfection cell density, cell type, and lysis volume will influence gDNA yields from the lysis mix (Step 26) and may affect PCR performance. Assuming cells divide twice between seeding and lysis, there will be ~1,280 cells/µl of lysis buffer. Adding less than 1 µl of lysis mix to PCR1 risks bottlenecking downstream analysis by the number of cells analyzed, as opposed to the detection limit of the MiSeq. Phusion U Green Multiplex Mastermix is typically used for PCR1 and PCR2. It includes a density reagent and two electrophoresis tracking dyes for direct loading of PCR products into gels, which saves considerable time during the HTS library preparation. While convenient,
these properties are not critical, and any other comparable high-fidelity DNA polymerase may be used. 29. Perform PCR1 under the following conditions:
Excessive cycles of amplification at this step and PCR2 (Step 33) can introduce amplification bias. Bias can be minimized (but not completely removed) by performing as few PCR cycles as possible. qPCR should be used to determine this minimum cycle number, which corresponds to the top of the linear range. 24–29 cycles are sufficient for most loci. The optimal number of cycles for PCR1 will vary between amplicons. If the target edit is a large deletion, PCR bias is more likely to occur. It has been found that for deletions of 50 bp or less, bias is typically in the single-digit percentage range, but for larger deletions, the amount of bias can increase to 30-40%
42. 30. Confirm efficient and precise amplification of PCR1 amplicons using gel electrophoresis. Run 5 µL of each PCR1 reaction on a 1% (wt/vol) agarose gel at 140 V/cm for 10 minutes. Amplicons should be the length of the amplified genomic locus plus approximately 70 bp. The additional ~70 bp in length is from the included 5′ adaptors appended to the PCR1 primers (See Table 2). Unoptimized PCR1 primers can bind nonspecifically throughout the genome and produce multiple amplification bands after PCR1. Generally, 3-5 pairs of PCR1 primers are tested for each new site to find a specific, high-efficiency pair. If a specific primer pair cannot be found, gel extraction of the desired band is possible following PCR2. 31. Dilute PCR2 primers to 10 µM. Forward and reverse primer sequences for PCR2 are designated by Illumina (support.illumina.com/downloads/illumina-adapter-sequences-document-1000000002694.html). 32. Use PCR1 products (Step 30) as a PCR template for PCR2. This second amplification appends Illumina indices that uniquely barcode individual samples. The PCR2 primers bind to the 5′ adaptor sequences appended to the PCR1 primers (See Table 2). Prepare the PCR2 reaction:
Use a unique combination of PCR2-Forward and PCR2-Reverse Illumina indices for each sample. This will enable their identification for use in later HTS steps. 33. Perform PCR2 under the following conditions:
It has been found that 7–10 cycles are generally a good starting point. 34. Confirm efficient and precise amplification of PCR2 amplicons using gel electrophoresis. Run 5 µL of each PCR2 reaction on a 1% (wt/vol) agarose gel at 140 V/cm for 10 minutes. Amplicons should be the length of the amplified genomic locus plus approximately 130 bp. The additional 130 bp in length is the sum of the 5′ adaptors appended to the PCR1 primers (~70 bp, See Table 2) and the appended PCR2 Illumina indices (~60 bp). 35. If all PCR2 products are approximately the same length (<100 bp difference), pool 2 µL of each PCR2 product into a single mastermix. This mastermix will be used for a subsequent gel extraction (Step 36) and should have a minimum volume of 40 µL to ensure enough PCR product is present for an efficient gel extraction. Increase the volume of each individual pooled PCR2 product as needed to reach the 40 µL minimum volume (e.g., 4 µl of each PCR2 product if there are only 10 PCR2 reactions). If PCR2 products have variable length (>100 bp difference), pool like amplicons into separate mastermixes based on size similarity. Sequencing coverage for an individual PCR2 product will be directly related to the molar amount of that product pooled into the gel extraction mastermix (Step 36). PCR2 yields
(evaluated via agarose gel band intensity) and desired sequencing coverage of each PCR2 sample should be considered jointly when pooling individual samples into the gel extraction mastermix. Volume inputs into the gel extraction mastermix can be varied to approximately achieve the desired level of sequencing coverage for each sample. 36. Load 40-60 µl of the gel extraction mastermix onto a 1% (wt/vol) agarose gel for gel extraction. Run the gel for 20-30 min at 140 V. 37. Excise the desired PCR2 band from the gel using a razor blade and purify the size-separated amplicon from the agarose using the QIAquick Gel Extraction Kit (Qiagen) or equivalent gel extraction kit. Elute the gel-extracted DNA in nuclease-free water. It is important to perform this gel extraction precisely. Shorter amplicons bind more efficiently to the MiSeq flow cell, so contamination with low-molecular-weight primer dimer will cause the loss of many reads in the subsequent MiSeq run. Therefore, be careful to excise only the desired amplicon and exclude primer dimer. If PCR1 or PCR2 produced several bands, only the band of the desired length should be gel extracted. If a large insertion or deletion was performed, gel extract an inclusive range that would contain both the starting and ending amplicon lengths. 38. Quantify the concentration of the eluted DNA using a Qubit kit or similar technique, following manufacturer instructions. Incorrectly determining the concentration of a library could result in a failed MiSeq run or insufficient sequencing coverage. Underestimating the concentration will cause overloading of the sequencer in downstream steps, which can cause the run to fail due to over-clustering. Overestimating the concentration will lead to too little sample being loaded onto the sequencer, yielding fewer sequencing reads per sample. It is essential to determine the library concentration accurately. 39. Dilute the library to precisely 4 nM using the concentration determined in Step 38. 40. 
Illumina MiSeq DNA sequencing. Follow the instructions in the Illumina user manual to complete the remaining library-preparation steps and load the sequencer. [0574] HTS analysis (Timing 1-4 Hours) [0575] A variety of computational pipelines are suitable for analyzing sequencing data generated by genome editing experiments. Here, a typical workflow for batch quantification of prime editing efficiencies using CRISPResso2 is described. The following protocol assumes the user already has access to CRISPResso2 via Docker, Bioconda, or local
installation. Additional details for using CRISPResso2 can be found in the public code repository (github.com/pinellolab/CRISPResso2) or original publication. 41. Generate individual tab-delimited batch parameter files for each target amplicon. While CRISPResso2 can perform batch analysis on multiple amplicons in the same run, doing so will prevent the generation of certain summary tables and plots. Populate the files according to the guidelines in Table 4. The workflow for quantifying prime editing efficiency using CRISPResso2 differs slightly between quantifying single point mutations (requiring standard mode) versus insertions, deletions, or substitutions of multiple base pairs (requiring HDR mode). For ease of analysis, further sort samples into separate batch files for analysis using only standard mode or only HDR mode. 42. Run CRISPResso2 in batch mode for a specific amplicon by calling the appropriate batch parameter file. 43. Quantifying single point mutations. Open the “Nucleotide_percentage_summary.txt” file and collect the frequency of the desired edit for each sample. Using the “CRISPRessoBatch_quantification_of_editing_frequency.txt” file, the frequency of alleles containing only the desired edit (without indels) for each sample may be derived by dividing the number of reads under “Reads aligned” by the number of reads under “Reads_aligned_all_amplicons,” and then multiplying by the previously collected edit frequency. 44. Quantifying insertions, deletions, or multiple-base pair substitutions. Using the “CRISPRessoBatch_quantification_of_editing_frequency.txt” file, the frequency of alleles containing only the desired edit (without indels) for each sample may be derived by dividing the number of reads under “Reads aligned” for the HDR amplicon by the number of reads under “Reads_aligned_all_amplicons.” 45. Quantifying indels. 
Using the “CRISPRessoBatch_quantification_of_editing_frequency.txt” file, the frequency of alleles containing an indel for each sample may be derived by dividing the number of reads under “Discarded” (if running CRISPResso2 in HDR mode, sum the discarded reads aligning to the reference or the edited sequence) by the number of reads under “Reads_aligned_all_amplicons,” provided that “discard_indel_reads” was set to TRUE for the analysis. 46. Repeat steps 41-45 as necessary for each amplicon to be analyzed.
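The calculations in Steps 43-45 can be expressed as a few small functions, sketched below for illustration. This is not part of CRISPResso2 and does not parse its output files; the function names are hypothetical, the read counts in the example are invented, and the edit frequency is assumed to be expressed as a fraction between 0 and 1 rather than as a percentage.

```python
from typing import Iterable


def precise_edit_frequency(reads_aligned: float,
                           reads_aligned_all_amplicons: float,
                           edit_fraction: float = 1.0) -> float:
    """Fraction of alleles carrying only the desired edit (no indels).

    Standard mode (Step 43): pass the edit frequency collected from the
    nucleotide percentage summary as edit_fraction.
    HDR mode (Step 44): pass the reads aligned to the HDR amplicon and
    leave edit_fraction at 1.0.
    """
    if reads_aligned_all_amplicons <= 0:
        raise ValueError("no aligned reads")
    return reads_aligned / reads_aligned_all_amplicons * edit_fraction


def indel_frequency(discarded_reads: Iterable[float],
                    reads_aligned_all_amplicons: float) -> float:
    """Fraction of alleles containing an indel (Step 45).

    In HDR mode, supply the discarded counts for both the reference and
    the edited amplicon; they are summed here.
    """
    if reads_aligned_all_amplicons <= 0:
        raise ValueError("no aligned reads")
    return sum(discarded_reads) / reads_aligned_all_amplicons


# Invented example counts, for illustration only.
print(precise_edit_frequency(9000, 10000, edit_fraction=0.40))  # standard mode
print(precise_edit_frequency(3500, 10000))                      # HDR mode
print(indel_frequency([600, 400], 10000))                       # HDR mode indels
```

These ratios are only meaningful when the analysis was run with indel-containing reads discarded, as Step 45 requires.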
[0576] Troubleshooting advice is summarized in Table 5.
Timing
[0577] Steps 1-2, Design of epegRNAs and nicking sgRNAs: 1 d
[0578] Step 3a, Generation of epegRNAs by Golden Gate cloning: 3 d
[0579] Step 3b, Generation of epegRNAs by isothermal assembly: 3 d
[0580] Step 3c, Acquiring purified, chemically modified, synthetic epegRNAs, pegRNAs, or sgRNAs: 7-42 d
[0581] Steps 4-19, Preparation of in vitro transcribed PEmax mRNA: 1-2 d
[0582] Step 20A, Prime editing in HEK293T cells via plasmid transfection: 4-5 d
[0583] Step 20B, Prime editing in primary human fibroblasts via RNA electroporation: 4-5 d
[0584] Steps 21-26, Preparation of mammalian cells for HTS: 1 d
[0585] Steps 27-40, HTS preparation for prime editing analysis: 1-2 d
[0586] Steps 41-46, HTS analysis: 1-4 h
Anticipated results
[0587] With a few optimizations for the desired edit, prime editing can enable highly efficient and precise genome editing in mammalian cells. Here, the anticipated results from screening pegRNAs and nicking sgRNAs for prime editing in an amenable cell line (HEK293T) are shown, which highlights the importance of optimizing pegRNA PBS and RTT length and sgRNA spacer sites (FIGs.7A-7D). In a less amenable cell line (HeLa), it was also demonstrated that the use of PEmax, epegRNAs, PE4/PE5 systems, and additional MMR-evading benign edits can substantially elevate editing efficiency compared to the original prime editing approaches (FIGs.7E-7F). Analysis of high-throughput sequencing data with CRISPResso2 yields the allelic outcomes from editing, revealing the on-target purity of the intended genomic change (FIG.7G). As shown in induced pluripotent stem cells
69, the efficiency of prime editing can vary widely across delivery methods (plasmid DNA, mRNA; FIG.7H) and should be optimized for the desired application. [0588] Table 1. Use cases for various PE systems and modifications.
[0589] For a given prime editing experiment, one option from each category above should be selected. When selecting PE systems and the incorporation of silent mutations, the optimal version will depend on the edit, cell type, and application. For these decisions, empirical testing for each site and mutation is needed to ensure optimal editing.
[0590] Table 2. Example oligonucleotide sequences for prime editing procedure
[0591] Table 3. DNA amounts for lipid transfection, based on prime editor system.
[0592] All of these amounts, except for those associated with single-transfection integration, have been optimized for 96-well plate transfections of HEK293T cells using 0.5 µl per well of Lipofectamine 2000. The single-transfection integration amounts have been optimized for 48-well plate transfections of HEK293T cells using 1 µl per well of Lipofectamine 2000. [0593] Table 4. CRISPResso2 common batch parameters
[0594] Table 5. Troubleshooting advice
[0595] Table 6. pegRNA and nicking sgRNA sequences used in FIG.7
Example 2. MMR-evading silent edits can double as new, more effective PE3b protospacers
[0596] Previously, it has been demonstrated that the inclusion of continuous or semi-continuous silent edits near a prime edit can increase edit installation efficiency and reduce indels. Here, it is demonstrated that these silent mutations used to evade MMR can also create a new protospacer to enable a PE3b approach. In a PE3b approach, the secondary nick only occurs after prime editing has occurred on the edit strand but before the edit has been copied to the non-edit strand.
[0597] In the experiment shown in FIG.8C, three prime editing conditions were used: (None), which is a PE2 approach with no secondary nicking sgRNA; (3), in which the nicking sgRNA named “nick 3” was used in a PE3 approach; (13), in which the nicking sgRNA named “nick 13” was targeted to a protospacer with installed MMR-evading silent edits in a PE3b approach. A (No Edit) control is also included, in which cells were not edited. Two schematics of this experiment are shown in FIG.9. The nick 13 PE3b approach yields editing efficiencies that are improved over, or at least comparable to, those of a PE3 approach, while also showing reduced indel rates compared to a PE3 approach; these indel rates are comparable to those of a PE2 approach.
[0598] Because several MMR-evading silent edits can be installed around a desired edit, PE3b nicking protospacers that use these multiple silent edits can more effectively discriminate between edited and unedited DNA. PE3b sgRNAs that rely on several installed silent edits are much less likely to unintentionally nick DNA before a prime edit has been installed (compared to originally described PE3b nicks, which generally only rely on a single edit to selectively nick after an edit has been installed). The effect of this enhanced selectivity is a reduced rate of observed indels. This is reflected in the plot shown in FIG.8C, in which the nick 13 PE3b approach shows exceptionally low rates of observed indels. 
Because these newly described PE3b nicks rely on the installation of several silent edits to make nicking possible, the temporal order of nicking is tightly controlled (compared to a PE3b strategy that only relies on a single edit). As a result, nicking only occurs after the initial pegRNA edit has occurred, greatly reducing indels.
Example 3. Nicking sgRNAs can use non-canonical PAMs
[0599] In an experiment to correct the AHC D801N pathogenic c.2401A mutation, three nicking sgRNAs (named “nick 13”, “nick 14”, and “nick 15”, respectively) were tested using the PE3b approach. A schematic of the sequence of the target locus after installation of the intended edits is shown in FIG.10. All three nicks use the PE3b approach: the spacers for
these PE3b nicking guide RNAs will not bind to the target locus until the encoded silent edits (grey) and the encoded corrective edit (green) have been installed. Nick 14 and nick 15 rely on NGG PAMs (nick 14 GGG; nick 15 TGG) on the top strand of DNA to permit protospacer recognition. Nick 13 relies on an NAG PAM (specifically, a TAG PAM, which is not shown in the schematic in FIG.10) that exists before the A>G silent edit is installed on both DNA strands.
[0600] The unedited target with the incomplete nick 13 protospacer (without installed silent edits) and the NAG (specifically, TAG) PAM are shown in FIG.11. After the initial pegRNA edit has occurred on the bottom strand of DNA, a heteroduplex of mismatched DNA exists with an edited bottom strand and an unedited top strand, shown in FIG.12. The NAG PAM on the unedited top strand is recognized first by the nick 13 sgRNA + PE RNP complex. Then, the nick 13 spacer base pairs with the nick 13 target locus on the bottom edited strand (which now contains the corrective edit and MMR-evading silent edits, which in turn permit binding between the nick 13 spacer and the edit strand and nicking of the non-edit strand). A simple diagram of the nick 13 + PE RNP recognizing this DNA heteroduplex is shown in FIG.13, with edits on the bottom strand shown in red. PAM recognition and spacer base pairing lead to nicking by the RNP complex, and both DNA strands are converted to the final edited product (FIG.10).
[0601] Nick 13 recognizes an NAG PAM, a non-canonical PAM of SpCas9, to introduce a secondary nick. The use of a non-canonical PAM for a secondary nicking sgRNA has not been previously considered for the PE3, PE3b, PE5, and PE5b approaches. Introducing non-canonical PAMs into consideration for PE3, PE3b, PE5, and PE5b secondary nicks broadens the possibility of viable nicking sgRNAs.
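As a rough illustration of how widening the PAM set enlarges the pool of candidate nicking protospacers, the following Python sketch enumerates PAM-adjacent sites on both strands of a sequence. It is a minimal sketch under stated assumptions, not the search procedure used in this Example: the function names, the 20-nt spacer length, the 3-nt PAM placed immediately 3' of the protospacer, and the toy sequence are all hypothetical choices made for illustration.

```python
# Hypothetical helper names. A 20-nt spacer and a 3-nt PAM immediately 3' of
# the protospacer are assumed here for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_matches(pam: str, pattern: str) -> bool:
    """True if `pam` fits `pattern`; an N in the pattern matches any base."""
    return len(pam) == len(pattern) and all(
        p == "N" or p == b for p, b in zip(pattern, pam)
    )


def find_protospacers(seq, pam_patterns, spacer_len=20):
    """List (strand, start, protospacer, pam) for every PAM-adjacent site.

    `start` is the 0-based offset of the protospacer on the scanned strand
    ("+" is `seq` as given, "-" is its reverse complement).
    """
    hits = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Stop early enough that the 3-nt PAM still fits inside the sequence.
        for i in range(len(s) - spacer_len - 2):
            pam = s[i + spacer_len : i + spacer_len + 3]
            if any(pam_matches(pam, p) for p in pam_patterns):
                hits.append((strand, i, s[i : i + spacer_len], pam))
    return hits


# Made-up toy sequence (not the AHC locus): one site followed by a TAG PAM.
toy = "A" * 20 + "TAG"
canonical = find_protospacers(toy, ["NGG"])
expanded = find_protospacers(toy, ["NGG", "NAG", "NGA"])
print(len(canonical), len(expanded))
```

The sketch only lists PAM-adjacent sites. It does not model the PE3b requirement that the spacer match the edited sequence, nor does it rank candidates by expected activity.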
The availability of secondary nicking sgRNAs in PE3, PE3b, PE5, and PE5b approaches can be limited when secondary nicking sgRNAs are designed based on a canonical NGG PAM recognized by a nickase version of SpCas9. The observation that these prime editors can also take advantage of WT SpCas9’s NAG and NGA non-canonical PAMs broadens the possibility of available nicking sgRNAs.
[0602] The editing efficiency of PE3b approaches using nicks 13, 14, and 15 to correct the AHC D801N pathogenic c.2401A mutation illustrates this point (FIG.14).
[0603] Of the three PE3b nicks tested, nick 13 shows the greatest improvement over a PE2 approach with no secondary nick (none). If one were to consider only PE3b protospacers with an NGG PAM, nick 13 (and its optimal editing efficiency) would not have been identified; only nicks 14 and 15 would have been tested. The inclusion of protospacers
with a non-canonical PAM in the search for nicking sgRNAs, which was not previously considered, increases the possibility of identifying nicking sgRNAs that are optimal for a given prime editing approach.
[0604] While the example shown demonstrates the use of a nicking protospacer with a non-canonical PAM for a PE3b approach, the use of a nicking protospacer with a non-canonical PAM for PE3, PE5, and PE5b approaches may also be envisioned. The consideration of nicking protospacers with a non-canonical PAM for PE3, PE3b, PE5, and PE5b approaches will increase the number of available nicking sgRNAs to screen and improve the chances of finding an optimal nick for a given application.
Example 4. Methods for optimizing PE parameters for a specific edit
[0605] For a single edit, there is an extremely large total number of possible pegRNAs that could be used. Based on previous work, optimal PBS lengths have ranged from 8 to 15 nt, and the optimal RTT range is even larger (10 to 74 nt). This PBS x RTT length matrix alone (8 PBS lengths x 65 RTT lengths) produces 520 possible pegRNAs. There are also two epegRNA modifications, mpknot and evopreQ1, as well as unmodified pegRNAs. This increases the total number to 1,560 possible pegRNAs per edit. Layered onto this is the potential to encode silent edits (MMR-evading edits or PAM edits) in the RT template of the pegRNA. For silent edits, it is more difficult to give an estimated number of potential options, because it depends on the local sequence (i.e., what changes will and will not be silent). Assuming that two potential PAM edits and three different potential MMR-evading edits are combined, the combination leads to six different combinations of silent edits, bringing the total possible number of pegRNAs to 9,360. Regarding the pegRNA scaffold, either the canonical scaffold or the flip and extension scaffold may be used, bringing the final total of pegRNAs to 18,720. Beyond the pegRNA, there are many different nicking sgRNAs that can be used for a single edit.
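The running pegRNA totals stated above can be reproduced with a short calculation. The following Python sketch simply multiplies the option counts given in the text; the variable and function names are illustrative, and the silent-edit count rests on the same assumption made in the text (two PAM edits combined with three MMR-evading edits).

```python
from math import prod

# Option counts as stated in the text above; the labels are descriptive only.
pegrna_factors = {
    "PBS lengths (8 to 15 nt)": len(range(8, 16)),      # 8 lengths
    "RTT lengths (10 to 74 nt)": len(range(10, 75)),    # 65 lengths
    "3' motif (mpknot, evopreQ1, unmodified)": 3,
    "silent-edit combinations (2 PAM x 3 MMR-evading, assumed)": 2 * 3,
    "scaffold (canonical, flip and extension)": 2,
}


def running_totals(factors):
    """Return (label, cumulative product) pairs in insertion order."""
    totals = []
    total = 1
    for name, n_options in factors.items():
        total *= n_options
        totals.append((name, total))
    return totals


for name, total in running_totals(pegrna_factors):
    print(f"{total:>7,}  after {name}")

total_pegrnas = prod(pegrna_factors.values())
print(f"Total candidate pegRNAs: {total_pegrnas:,}")
```

Further multiplicative factors can be appended to the dictionary in the same way, which makes it easy to see how quickly each added design choice inflates the search space.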
It has typically been found that there are around 5 potential nicking sgRNAs for a given edit. There are also two editor architectures (the original architecture and the max’ed architecture). Finally, one can choose between many PE systems (PE2-PE5, leading to 4 options). Accordingly, in an optimized set of pegRNA and PE system designs, there are 18,720 pegRNAs x 5 nicking sgRNAs x 2 editor architectures x 4 PE systems = 748,800 potential ways to make a single prime edit. [0606] The methods provided herein (summarized in FIG.15) significantly reduce the number of PE approaches to test, making it feasible to find a high-efficiency prime editing
approach. The specific insight employed in these methods is knowing how to prioritize the various parameters when optimizing pegRNAs and other PE parameters.
REFERENCES
[0607] 1. Jinek, M. et al. A Programmable Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science 337, 816–821 (2012).
[0608] 2. Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).
[0609] 3. Frangoul, H. et al. CRISPR-Cas9 Gene Editing for Sickle Cell Disease and β-Thalassemia. N. Engl. J. Med. 384, 252–260 (2021).
[0610] 4. Gillmore, J. D. et al. CRISPR-Cas9 In Vivo Gene Editing for Transthyretin Amyloidosis. N. Engl. J. Med. 385, 493–502 (2021).
[0611] 5. Giannoukos, G. et al. UDiTaS™, a genome editing detection method for indels and genome rearrangements. BMC Genomics 19, 212 (2018).
[0612] 6. Stadtmauer, E. A. et al. CRISPR-engineered T cells in patients with refractory cancer. Science 367, eaba7365 (2020).
[0613] 7. Webber, B. R. et al. Highly efficient multiplex human T cell engineering without double-strand breaks using Cas9 base editors. Nat. Commun. 10, 5222 (2019).
[0614] 8. Turchiano, G. et al. Quantitative evaluation of chromosomal rearrangements in gene-edited human stem cells by CAST-Seq. Cell Stem Cell 28, 1136-1147.e5 (2021).
[0615] 9. Kosicki, M., Tomberg, K. & Bradley, A. Repair of double-strand breaks induced by CRISPR–Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 36, 765–771 (2018).
[0616] 10. Song, Y. et al. Large-Fragment Deletions Induced by Cas9 Cleavage while Not in the BEs System. Mol. Ther. - Nucleic Acids 21, 523–526 (2020).
[0617] 11. Zuccaro, M. V. et al. Allele-Specific Chromosome Removal after Cas9 Cleavage in Human Embryos. Cell 183, 1650-1664.e15 (2020).
[0618] 12. Alanis-Lobato, G. et al. Frequent loss of heterozygosity in CRISPR-Cas9–edited early human embryos. Proc. Natl. Acad. Sci. 118, e2004832117 (2021).
[0619] 13. Leibowitz, M. L. et al. Chromothripsis as an on-target consequence of CRISPR–Cas9 genome editing. Nat. Genet. 53, 895–905 (2021).
[0620] 14. Enache, O. M. et al. Cas9 activates the p53 pathway and selects for p53-inactivating mutations. Nat. Genet. 52, 662–668 (2020).
[0621] 15. Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).
[0622] 16. Cox, D. B. T., Platt, R. J. & Zhang, F. Therapeutic genome editing: prospects and challenges. Nat. Med. 21, 121–131 (2015).
[0623] 17. Chapman, J. R., Taylor, M. R. G. & Boulton, S. J. Playing the End Game: DNA Double-Strand Break Repair Pathway Choice. Mol. Cell 47, 497–510 (2012).
[0624] 18. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
[0625] 19. Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, aaf8729 (2016).
[0626] 20. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
[0627] 21. Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 883–891 (2020).
[0628] 22. Newby, G. A. et al. Base editing of haematopoietic stem cells rescues sickle cell disease in mice. Nature 595, 295–302 (2021).
[0629] 23. Koblan, L. W. et al. In vivo base editing rescues Hutchinson–Gilford progeria syndrome in mice. Nature 589, 608–614 (2021).
[0630] 24. Newby, G. A. & Liu, D. R. In vivo somatic cell base editing and prime editing. Mol. Ther. 29, 3107–3124 (2021).
[0631] 25. Koblan, L. W. et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat. Biotechnol. 39, 1414–1425 (2021).
[0632] 26. Kurt, I. C. et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat. Biotechnol. 39, 41–46 (2021).
[0633] 27. Chen, L. et al. Programmable C:G to G:C genome editing with CRISPR-Cas9-directed base excision repair proteins. Nat. Commun. 12, 1384 (2021).
[0634] 28. Yuan, T. et al. Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods. Nat. Commun. 12, 4902 (2021).
[0635] 29. Zhao, D. et al. Glycosylase base editors enable C-to-A and C-to-G base changes. Nat. Biotechnol. 39, 35–40 (2021).
[0636] 30. Chen, P. J. et al. Enhanced prime editing systems by manipulating cellular determinants of editing outcomes. Cell 184, 5635-5652.e29 (2021).
[0637] 31. Nelson, J. W. et al. Engineered pegRNAs improve prime editing efficiency. Nat. Biotechnol. (2021) doi:10.1038/s41587-021-01039-7.
[0638] 32. Kim, D. Y., Moon, S. B., Ko, J.-H., Kim, Y.-S. & Kim, D. Unbiased investigation of specificities of prime editing systems in human cells. Nucleic Acids Res. 48, 10576–10589 (2020).
[0639] 33. Schene, I. F. et al. Prime editing for functional repair in patient-derived disease models. Nat. Commun. 11, 5352 (2020).
[0640] 34. Gao, P. et al. Prime editing in mice reveals the essentiality of a single base in driving tissue-specific gene expression. Genome Biol. 22, 83 (2021).
[0641] 35. Jin, S. et al. Genome-wide specificity of prime editors in plants. Nat. Biotechnol. 39, 1292–1299 (2021).
[0642] 36. Habib, O., Habib, G., Hwang, G.-H. & Bae, S. Comprehensive analysis of prime editing outcomes in human embryonic stem cells. Nucleic Acids Res. 50, 1187–1197 (2022).
[0643] 37. Liu, P. et al. Improved prime editors enable pathogenic allele correction and cancer modelling in adult mice. Nat. Commun. 12, 2121 (2021).
[0644] 38. Chen, B. et al. Dynamic Imaging of Genomic Loci in Living Human Cells by an Optimized CRISPR/Cas System. Cell 155, 1479–1491 (2013).
[0645] 39. Hussmann, J. A. et al. Mapping the genetic landscape of DNA double-strand break repair. Cell 184, 5653-5669.e25 (2021).
[0646] 40. Spencer, J. M. & Zhang, X. Deep mutational scanning of S. pyogenes Cas9 reveals important functional domains. Sci. Rep. 7, 16836 (2017).
[0647] 41. Song, M. et al. Generation of a more efficient prime editor 2 by addition of the Rad51 DNA-binding domain. Nat. Commun. 12, 5617 (2021).
[0648] 42. Anzalone, A. V. et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat. Biotechnol. 1–10 (2021) doi:10.1038/s41587-021-01133-w.
[0649] 43. Choi, J. et al. Precise genomic deletions using paired prime editing. Nat. Biotechnol. 1–9 (2021) doi:10.1038/s41587-021-01025-z.
[0650] 44. Jiang, T., Zhang, X.-O., Weng, Z. & Xue, W. Deletion and replacement of long genomic sequences using prime editing. Nat. Biotechnol. 1–8 (2021) doi:10.1038/s41587-021-01026-y.
[0651] 45. Lin, Q. et al. High-efficiency prime editing with optimized, paired pegRNAs in plants. Nat. Biotechnol. 39, 923–927 (2021).
[0652] 46. Zhuang, Y. et al. Increasing the efficiency and precision of prime editing with guide RNA pairs. Nat. Chem. Biol. 18, 29–37 (2022).
[0653] 47. Wang, J. Efficient targeted insertion of large DNA fragments without DNA donors. Nat. Methods 19, 25 (2022).
[0654] 48. Ioannidi, E. I. et al. Drag-and-drop genome insertion without DNA cleavage with CRISPR-directed integrases. biorxiv.org/lookup/doi/10.1101/2021.11.01.466786 (2021) doi:10.1101/2021.11.01.466786.
[0655] 49. Lin, Q. et al. Prime genome editing in rice and wheat. Nat. Biotechnol. 38, 582–585 (2020).
[0656] 50. Zheng, C. et al. A flexible split prime editor using truncated reverse transcriptase improves dual-AAV delivery in mouse liver. Mol. Ther. S1525001622000053 (2022) doi:10.1016/j.ymthe.2022.01.005.
[0657] 51. Zhi, S. et al. Dual-AAV delivering split prime editor system for in vivo genome editing. Mol. Ther. 30, 283–294 (2022).
[0658] 52. Liu, Y. et al. Efficient generation of mouse models with the prime editing system. Cell Discov. 6, 1–4 (2020).
[0659] 53. Lin, J. et al. Modeling a cataract disorder in mice with prime editing. Mol. Ther. - Nucleic Acids 25, 494–501 (2021).
[0660] 54. Böck, D. et al. In vivo prime editing of a metabolic liver disease in mice. Sci. Transl. Med. 14, 636 (2021).
[0661] 55. Kim, Y. et al. Adenine base editing and prime editing of chemically derived hepatic progenitors rescue genetic liver disease. Cell Stem Cell 28, 1614-1624.e5 (2021).
[0662] 56. Choi, J. et al. A temporally resolved, multiplex molecular recorder based on sequential genome editing. biorxiv.org/lookup/doi/10.1101/2021.11.05.467388 (2021) doi:10.1101/2021.11.05.467388.
[0663] 57. Erwood, S. et al. Saturation variant interpretation using CRISPR prime editing. Nat. Biotechnol. 1–11 (2022) doi:10.1038/s41587-021-01201-1.
[0664] 58. Xu, R., Liu, X., Li, J., Qin, R. & Wei, P. Identification of herbicide resistance OsACC1 mutations via in planta prime-editing-library screening in rice. Nat. Plants 7, 888–892 (2021).
[0665] 59. Qian, Y. et al. Efficient and precise generation of Tay–Sachs disease model in rabbit by prime editing system. Cell Discov. 7, 50 (2021).
[0666] 60. Petri, K. et al. CRISPR prime editing with ribonucleoprotein complexes in zebrafish and primary human cells. Nat. Biotechnol. (2021) doi:10.1038/s41587-021-00901-y.
[0667] 61. Jang, H. et al. Application of prime editing to the correction of mutations and phenotypes in adult mice with liver and eye diseases. Nat. Biomed. Eng. 6, 181–194 (2022).
[0668] 62. Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).
[0669] 63. Gao, Z., Herrera-Carrillo, E. & Berkhout, B. Delineation of the Exact Transcription Termination Signal for Type 3 Polymerase III. Mol. Ther. - Nucleic Acids 10, 36–44 (2018).
[0670] 64. Hsu, J. Y. et al. PrimeDesign software for rapid and simplified design of prime editing guide RNAs. Nat. Commun. 12, 1034 (2021).
[0671] 65. Hwang, G.-H. et al. PE-Designer and PE-Analyzer: web-based design and analysis tools for CRISPR prime editing. Nucleic Acids Res. 49, W499–W504 (2021).
[0672] 66. Anderson, M. V., Haldrup, J., Thomsen, E. A., Wolff, J. H. & Mikkelsen, J. G. pegIT - a web-based design tool for prime editing. Nucleic Acids Res. 49, W505–W509 (2021).
[0673] 67. Chow, R. D., Chen, J. S., Shen, J. & Chen, S. A web tool for the design of prime-editing guide RNAs. Nat. Biomed. Eng. 5, 190–194 (2021).
[0674] 68. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
[0675] 69. Chen, P.-F. et al. Generation and characterization of human induced pluripotent stem cells (iPSCs) from three male and three female patients with CDKL5 Deficiency Disorder (CDD). Stem Cell Res. 53, 102276 (2021).
EQUIVALENTS AND SCOPE
[0676] In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one,
more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.
[0677] Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included.
Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.
[0678] This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.
[0679] Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.