
WO2020209959A1 - Nucleobase-editing fusion protein systems, compositions, and uses thereof - Google Patents


Info

Publication number
WO2020209959A1
WO2020209959A1 PCT/US2020/021388 US2020021388W WO2020209959A1 WO 2020209959 A1 WO2020209959 A1 WO 2020209959A1 US 2020021388 W US2020021388 W US 2020021388W WO 2020209959 A1 WO2020209959 A1 WO 2020209959A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
befp
seq
sequence
domain
Prior art date
Application number
PCT/US2020/021388
Other languages
French (fr)
Inventor
Philipp KNYPHAUSEN
Original Assignee
Crispr Therapeutics Ag
Bayer Healthcare, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Crispr Therapeutics Ag, Bayer Healthcare, Llc filed Critical Crispr Therapeutics Ag
Publication of WO2020209959A1 publication Critical patent/WO2020209959A1/en


Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102Mutagenizing nucleic acids
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00Structure or type of the nucleic acid
    • C12N2310/10Type of nucleic acid
    • C12N2310/20Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • the present application relates to base editing using a novel protein.
  • the novel proteins can bind to apurinic/apyrimidinic (AP) sites or abasic sites within single- or double-stranded DNA, and are useful for genome editing and other applications.
  • AP apurinic/apyrimidinic
  • the converted uridine is paired with adenine in the complementary DNA strand, and as part of the repair mechanism, the uridine is replaced with the normal complement of adenine, a thymidine, thus effecting a cytidine-to-uridine conversion that can lead (in the case of base editing) to a desired permanent cytidine-to-thymidine change in the genome if, during DNA repair or replication, the uridine-containing DNA strand serves as the template and a thymidine is subsequently incorporated opposite the adenine.
  • a problem with this system is that cells have a natural mechanism for responding to uridine lesions that reduces the efficiency of the base editing process. This mechanism is based on the enzyme uracil-DNA N-glycosylase (UNG), which removes the uracil base, producing an AP site or abasic site (together termed “AP site” unless otherwise distinguished).
  • UNG uracil-DNA N-glycosylase
  • processing of the AP site by DNA-(apurinic or apyrimidinic site) lyase (AP lyase) results in a gap that is amenable to DNA repair with the base opposite the gap serving as the repair template.
  • Processing of the AP site by AP lyase is particularly undesirable in base editing applications because the formation of a gap in proximity to the intentionally introduced nick on the opposing strand effectively leads to a staggered double-stranded DNA break, which may, for example, induce cell death or poorly controllable formation of insertions and deletions at or close to the targeted site.
  • An alternative DNA repair process that can also lead to undesirable outcomes during base editing involves non-templated DNA synthesis at the AP site (translesion synthesis; TLS), which frequently results in edits other than C-to-T.
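The intended edit outcome described above can be illustrated with a minimal Python sketch that models deamination of a single cytidine to uridine, followed by the uridine being fixed as thymidine. The function name, the toy sequence, and the 0-based position are illustrative assumptions and are not taken from the application; the sketch ignores UNG, AP lyase, and TLS entirely.

```python
def c_to_t_edit(strand: str, position: int) -> str:
    """Return the strand after a C-to-T edit at a 0-based position (toy model)."""
    strand = strand.upper()
    if strand[position] != "C":
        raise ValueError(
            f"expected C at position {position}, found {strand[position]}"
        )
    # Step 1: the cytidine deaminase domain converts the targeted C to U.
    deaminated = strand[:position] + "U" + strand[position + 1:]
    # Step 2: repair or replication pairs A opposite the U, and the U is
    # ultimately replaced by T, giving a permanent C-to-T change.
    return deaminated.replace("U", "T")


if __name__ == "__main__":
    # Hypothetical 8-nt DNA stretch; position 3 is a cytidine.
    print(c_to_t_edit("GATCCAGT", 3))
```

Any edit other than C-to-T at the targeted position, or an insertion or deletion nearby, would correspond to the undesirable outcomes discussed above and is deliberately not modeled here.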
  • AAVs adeno-associated viruses
  • PAMs protospacer adjacent motifs
  • novel base editing system exhibits advantageous characteristics over existing base editing systems.
  • a base-editing fusion protein comprising: a) an AP-binding domain; b) a cytidine deaminase domain; and c) a nucleic acid recognition domain.
  • the AP-binding domain comprises an SOS response-associated peptidase (SRAP) domain.
  • the SRAP domain is from 5-hydroxymethylcytosine binding, ESC specific (HMCES) or YedK, or a variant thereof.
  • HMCES 5-hydroxymethylcytosine binding, ESC specific
  • the AP-binding domain comprises an SRAP domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% sequence identity to SEQ ID NO: 5 or 6.
  • the AP-binding domain comprises an SRAP domain from the amino acid sequence of SEQ ID NO: 5 or 6.
  • the cytidine deaminase domain is from a deaminase selected from the group consisting of: APOBEC2, APOBEC3, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4.
  • the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease or a variant thereof.
  • the nucleic acid recognition domain is from a modified CRISPR-Cas9 protein that can cleave only one strand of the target DNA or has no endonuclease activity.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or a N582A mutation, with respect to SEQ ID NO: 7.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or 4, or a variant amino sequence having at least about 85% sequence identity to SEQ ID NO: 2 or 4. In some embodiments, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or 4.
  • nucleic acid encoding a BEFP according to any of the embodiments described above.
  • the nucleic acid comprises the nucleotide sequence of SEQ ID NO: 1 or 3, or a variant nucleotide sequence having at least about 85% sequence identity to SEQ ID NO: 1 or 3.
  • nucleic acid comprises the nucleotide sequence of SEQ ID NO: 1 or 3.
  • a system comprising: (i) a BEFP according to any of the embodiments described above or a nucleic acid encoding the BEFP according to any of the embodiments described above; and (ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA.
  • a method of modifying a targeted site of a double-stranded DNA comprising contacting the double-stranded DNA with: (i) a BEFP according to any of the embodiments described above or a nucleic acid encoding the BEFP according to any of the embodiments described above; and (ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA.
  • the double-stranded DNA encodes a protein-of-interest (POI) or derivative thereof.
  • the double-stranded DNA is in a cell.
  • a genetically modified cell in which the genome of the cell is edited by a method according to any of the embodiments described above.
  • a method of treating a disease or condition associated with a protein-of-interest (POI) in a subject comprising providing to a cell in the subject: (i) a BEFP according to any of the embodiments described above or a nucleic acid encoding the BEFP according to any of the embodiments described above; and (ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA.
  • the subject is a patient having or suspected of having the disease or condition or the subject is diagnosed with a risk of the disease or condition.
  • kits comprising one or more elements of a system according to any of the embodiments described above, and further comprising instructions for use.
  • RNA-programmable nucleic acid recognition domain or other suitable nucleic acid recognition domain, e.g., an RNA-programmable nucleic acid recognition domain from a Cas protein
  • a nucleobase editing domain, e.g., a nucleobase editing domain from a cytidine deaminase
  • a domain capable of binding to apurinic/apyrimidinic (AP) sites or abasic sites within single- or double-stranded DNA, e.g., an SOS response-associated peptidase, or SRAP, domain
  • One embodiment according to the invention is a BEFP comprising an RNA-programmable nucleic acid recognition domain from a Cas protein (e.g., Staphylococcus lugdunensis (Slu) Cas9), a cytidine deaminase domain, and an AP-binding domain from 5-hydroxymethylcytosine binding, ESC specific (HMCES).
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 4 or a variant thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%,
  • a BEFP comprising an RNA-programmable nucleic acid recognition domain from a Cas protein (e.g., Staphylococcus lugdunensis (Slu) Cas9), a cytidine deaminase domain, and an AP-binding domain from YedK, an SOS response-associated peptidase.
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or a variant thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2.
  • SRAP domains bind to AP sites that can be generated as an undesired side product during base editing through the action of uracil-DNA glycosylase (UNG or UDG). Binding of an SRAP domain to an AP site is thought to protect the site from endonuclease activity and translesion synthesis, and therefore to diminish the generation of double-strand breaks and diversifying edits. Mechanistically, this might lead to reversion to wildtype sequences and might allow for re-targeting of the same site by the base editor. Thus, this novel BEFP provides a solution for base editing in the absence of a uracil glycosylase inhibitor (UGI).
  • UNG or UDG uracil-DNA glycosylase
  • the base-editing fusion protein comprises
  • a domain that can covalently bind to an AP site (an AP-binding domain);
  • the BEFP contains a suitable linker polypeptide between domains a, b, and c.
  • the BEFP comprises one or more nuclear localization signals.
  • the domain with covalent AP site binding activity comprises an SOS response-associated peptidase (SRAP) domain or a 5-hydroxymethyl cytosine binding, ES cell specific (HMCES) protein and is placed in front (N-terminally) of the other domains (the cytidine deaminase domain and the nucleic acid recognition domain) and components of the BEFP.
  • SRAP SOS response-associated peptidase
  • HMCES 5-hydroxymethyl cytosine binding, ES cell specific
  • the BEFP starts with this domain at the N-terminus.
  • a BEFP comprises the following components:
  • g. a nucleic acid recognition domain
  • a BEFP comprises the following components in the following order from N-terminus to C-terminus:
  • g. a nucleic acid recognition domain
  • a BEFP according to the invention comprises the amino acid sequence of SEQ ID NO: 2 or 4, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2 or 4.
  • a nucleic acid encoding a BEFP according to the invention comprises a nucleic acid sequence encoding the amino acid sequence of SEQ ID NO: 2 or 4 or a variant amino acid sequence thereof.
  • a nucleic acid encoding a BEFP comprises the nucleic acid sequence of SEQ ID NO: 1 or 3, or a variant nucleic acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 1 or 3.
  • the domains of the polypeptides according to the invention may either be connected directly or via a linker peptide.
  • the linker peptides may be the same or different.
  • Suitable linker peptides include oligopeptide or polypeptide sequences.
  • Linker peptides may be rigid or flexible, and may contain sites designed to be cleaved by protease activity. Such linker peptides may function to increase stability or folding of the domains, increase expression, enable targeting, or improve other biological activity.
  • Various linker peptides are known in the art. See, e.g., Chen, et al. Adv. Drug Deliv. Rev.
  • linker peptides comprise one or more of the amino acid sequences listed in paragraph 0025 of WO2017070632 (A2).
  • Nuclear localization signals are polypeptide sequences in a protein that enable transport of the protein into the nucleus of eukaryotic cells. When two or more NLS sequences are present in the protein, the NLS sequences may be the same or different.
  • Various NLS sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in WO/2001/038547, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences.
  • Such NLSs include, without limitation, the nucleoplasmin bipartite NLS, the c-myc nuclear localization sequence, and the hnRNPA1 M9 nuclear localization sequence.
  • Exemplary NLSs include those listed in paragraph 00204 of WO2017070632 (A2).
  • polynucleotide refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides.
  • this term includes, but is not limited to, single-, double-, and multi-stranded DNA and RNA, genomic DNA, cDNA, DNA-RNA hybrids/triple helices, and polymers including purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases.
  • the terms “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiments being described, single-stranded (such as sense or antisense) and double-stranded nucleic acids.
  • Oligonucleotide generally refers to single- or double-stranded polynucleotides at least about 5 nucleotides in length, unless otherwise indicated. Oligonucleotides are also known as “oligomers” or “oligos” and may be isolated from genes or chemically synthesized by methods known in the art.
  • Genomic DNA refers to the DNA of a genome of an organism including, but not limited to, the DNA of the genome of a bacterium, fungus, archaeon, protist, virus, plant, or animal.
  • Manipulating DNA encompasses binding, nicking one strand, or cleaving, e.g., cutting both strands of the DNA; or encompasses modifying or editing the DNA or a polypeptide associated with the DNA.
  • Manipulating DNA can silence, activate, or modulate (either increase or decrease) the expression of an RNA or polypeptide encoded by the DNA, or prevent or enhance the binding of a polypeptide to DNA.
  • nucleic acid, e.g., RNA or DNA
  • a nucleic acid includes a sequence of nucleotides that enables it to non-covalently bind, e.g., form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel manner (e.g., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength.
  • a guanine (G) of a protein-binding segment (dsRNA duplex) of a guide RNA molecule is considered complementary to a uracil (U), and vice versa.
  • when a G/U base-pair can be made at a given nucleotide position of a protein-binding segment (dsRNA duplex) of a guide RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.
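The G/U rule stated above can be sketched as a small pairing check for RNA bases. This is only an illustrative sketch: the names `WATSON_CRICK`, `WOBBLE`, and `is_complementary`, and the `allow_wobble` switch, are assumptions introduced here and do not appear in the application.

```python
# Canonical RNA Watson-Crick pairs.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# G/U pairs, treated as complementary in the protein-binding segment
# (dsRNA duplex) of a guide RNA according to the definition above.
WOBBLE = {("G", "U"), ("U", "G")}


def is_complementary(base_a: str, base_b: str, allow_wobble: bool = True) -> bool:
    """Return True if two RNA bases count as complementary."""
    pair = (base_a.upper(), base_b.upper())
    if pair in WATSON_CRICK:
        return True
    return allow_wobble and pair in WOBBLE


if __name__ == "__main__":
    print(is_complementary("G", "U"))                      # wobble allowed
    print(is_complementary("G", "U", allow_wobble=False))  # strict Watson-Crick
```

The `allow_wobble` flag reflects that the rule is stated for the protein-binding segment; whether to apply it elsewhere is left to the caller.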
  • the sequence of a nucleic acid need not be 100% complementary to that of a target nucleic acid to be specifically hybridizable. Moreover, a nucleic acid may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure).
  • a nucleic acid can include at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it is targeted.
  • an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity.
  • the remaining non-complementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using methods known in the art, for example, a BLAST program (basic local alignment search tools) and/or PowerBLAST program (Altschul et al., J. Mol. Biol.
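The 18-of-20 example above can be reproduced with a short Python sketch that counts antiparallel Watson-Crick matches between two equal-length DNA strings. This is not the BLAST procedure mentioned in the text; it is a simplified, ungapped comparison, and the function names and toy sequences are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(probe: str, target: str) -> float:
    """Percent of probe positions that pair with the target.

    Both sequences are written 5'->3' and must have equal length;
    pairing is evaluated in antiparallel orientation, without gaps.
    """
    if len(probe) != len(target):
        raise ValueError("sequences must have equal length")
    target_3_to_5 = target.upper()[::-1]
    matches = sum(
        1
        for p, t in zip(probe.upper(), target_3_to_5)
        if COMPLEMENT.get(p) == t
    )
    return 100.0 * matches / len(probe)


if __name__ == "__main__":
    target = "ATGCATGCATGCATGCATGC"        # hypothetical 20-nt target region
    probe = "ACATGCATGCCTGCATGCAT"         # perfect probe with 2 mismatches
    print(percent_complementarity(probe, target))
```

With the two mismatched positions in the hypothetical probe, 18 of 20 nucleotides pair, which corresponds to the 90 percent figure given in the text.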
  • polypeptide generally refers to a chain of 50 amino acids or fewer.
  • polypeptide and protein are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
  • Binding refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). While in a state of non-covalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner).
  • Binding interactions are generally characterized by a dissociation constant (Kd) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M. “Affinity” refers to the strength of binding, increased binding affinity being correlated with a lower Kd.
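The relationship between a lower Kd and stronger binding can be illustrated with the standard single-site equilibrium expression, fraction bound = [L] / ([L] + Kd). This textbook model is an assumption added here for illustration and is not part of the application; it applies only to reversible, non-covalent binding, and the function name and concentrations are hypothetical.

```python
def fraction_bound(ligand_conc_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied at equilibrium (single-site model)."""
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)


if __name__ == "__main__":
    ligand = 1e-8  # hypothetical free ligand concentration, in M
    for kd in (1e-6, 1e-8, 1e-10):
        # A lower Kd yields a higher occupied fraction at the same concentration.
        print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(ligand, kd):.3f}")
```

When the free ligand concentration equals Kd, half of the sites are occupied, which is why Kd is a convenient single-number summary of affinity.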
  • by “binding domain” it is meant a protein domain that can bind non-covalently to another molecule.
  • a binding domain can bind to, for example, a DNA molecule (and can be termed a “DNA-binding protein”), an RNA molecule (and can be termed an “RNA-binding protein”) and/or a protein molecule (and can be termed a “protein-binding protein”).
  • the binding domain can bind to itself (forming homo dimers, homo-trimers, etc.) and/or it can bind to one or more molecules of a different protein or proteins.
  • a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; a group of amino acids having acidic side chains consists of glutamate and aspartate; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine.
  • Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine
  • a nucleic acid or polypeptide has a certain percent “sequence identity” to another nucleic acid or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence identity can be determined using a number of different methods.
  • sequences can be aligned using various methods and computer programs (e.g., BLAST, T-COFFEE, MUSCLE, MAFFT, etc.), available over the worldwide-web at sites including ncbi.nlm.nih.gov/BLAST, ebi.ac.uk/Tools/msa/tcoffee, ebi.ac.uk/Tools/msa/muscle, mafft.cbrc.jp/alignment/software. See, e.g., Altschul et al. (1990), J. Mol. Biol.
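As a minimal sketch of the sequence identity definition above, the following Python function computes percent identity for two sequences that have already been aligned (for example, by one of the programs listed). The choice of denominator (all alignment columns that are not gap-only), the gap character, the function names, and the toy peptide sequences are assumptions made for illustration; alignment tools differ in how they report identity.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 85.0) -> bool:
    """Check a pair against an identity threshold such as 'at least about 85%'."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    reference = "MKTAYIAKQR"   # hypothetical reference fragment
    variant = "MKTAYIAKQA"     # one substitution in ten positions
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant))
```

The sketch does not perform the alignment itself; in practice the aligned strings would come from an alignment program.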
  • sequence alignments standard in the art are used according to the disclosure to determine amino acid residues in a BEFP domain that “correspond to” amino acid residues in another polypeptide from which the BEFP domain is derived, e.g., a Cas9 endonuclease.
  • the amino acid residues of a BEFP that correspond to amino acid residues of one or more other polypeptides appear at the same position in alignments of the sequences.
  • a DNA sequence that “encodes” a particular RNA is a DNA nucleic acid sequence that is transcribed into the RNA.
  • a polydeoxyribonucleotide may encode an RNA (mRNA) containing a sequence that is translated into protein, or a polydeoxyribonucleotide may encode an RNA that is not translated into protein (e.g., tRNA, rRNA, siRNA, miRNA, or guide RNA; also called “non-coding” RNA or “ncRNA”).
  • A “protein coding sequence” or a sequence that encodes a particular protein or polypeptide is a nucleic acid sequence that is transcribed into mRNA (in the case of DNA) and is translated (in the case of mRNA) into a polypeptide in vitro or in vivo when placed under the control of appropriate regulatory sequences.
  • the boundaries of the coding sequence are determined by a start codon at the 5' terminus (N-terminus) and a translation stop nonsense codon at the 3' terminus (C-terminus).
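The boundary rule above (a start codon at the 5' end and an in-frame stop codon at the 3' end) can be sketched as a simple scan over a DNA string. The sketch assumes the standard genetic code (start ATG; stops TAA, TAG, TGA), reads only the given strand, and takes the first ATG it finds; the function name and the example sequence are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_sequence(dna: str) -> Optional[str]:
    """Return the first ATG-to-stop coding sequence on the given strand, or None."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None  # no start codon present
    # Walk codon by codon in the reading frame set by the start codon.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]  # include the stop codon as the 3' boundary
    return None  # no in-frame stop codon found


if __name__ == "__main__":
    print(find_coding_sequence("CCATGAAATTTTAGGG"))
```

Note that the out-of-frame TGA overlapping the start codon in the example is ignored, because only codons in the frame of the ATG are examined.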
  • a coding sequence can include, but is not limited to, cDNA from prokaryotic or eukaryotic mRNA, genomic DNA sequences from prokaryotic or eukaryotic DNA, and synthetic nucleic acids.
  • a transcription termination sequence is generally located at 3' of the coding sequence.
  • a “promoter sequence” or “promoter” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3' direction) coding or non-coding sequence.
  • the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background.
  • within the promoter sequence is found a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase.
  • Eukaryotic promoters often, but not always, contain “TATA” boxes and “CAAT” boxes.
  • a promoter can be a constitutively active promoter (e.g., a promoter that is constitutively in an active “ON” state), it may be an inducible promoter (e.g., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (e.g., a transcriptional control element such as a tissue specific promoter, cell type specific promoter, etc.), and it may be a temporally restricted promoter (e.g., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).
  • Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms.
  • Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III, pol IV, and pol V).
  • Exemplary promoters include, but are not limited to, the SV40 early promoter, mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)), an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res.
  • LTR mouse mammary tumor virus long terminal repeat
  • Ad MLP adenovirus major late promoter
  • HSV herpes simplex virus
  • CMV cytomegalovirus
  • CMVIE CMV immediate early promoter region
  • RSV Rous sarcoma virus
  • inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose induced promoter, heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.
  • Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; RNA polymerase, e.g., T7 RNA polymerase; an estrogen receptor; an estrogen receptor fusion; etc.
  • the promoter is a spatially restricted promoter (e.g., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (e.g., “ON”) in a subset of specific cells.
  • spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc.
  • any suitable spatially restricted promoter may be used and the choice of suitable promoter (e.g., a brain specific promoter, a promoter that drives expression in a subset of neurons, a promoter that drives expression in the germline, a promoter that drives expression in the lungs, a promoter that drives expression in muscles, a promoter that drives expression in islet cells of the pancreas, etc.) will depend on the organism.
  • a spatially restricted promoter can be used to regulate the expression of a nucleic acid encoding a site-specific modifying enzyme in a wide variety of different tissues and cell types, depending on the organism.
  • Some spatially restricted promoters are also temporally restricted such that the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process (e.g., hair follicle cycle in mice).
  • examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor- specific promoters, etc.
  • Neuron-specific spatially restricted promoters include, but are not limited to, a neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSEN02, X51956); an aromatic amino acid decarboxylase (AADC) promoter; a neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); a synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); a thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn, et al. (2010) Nat. Med.
  • NSE neuron-specific enolase
  • AADC aromatic amino acid decarboxylase
  • DNA regulatory sequences refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, and the like, that provide for and/or regulate transcription of a nucleic acid sequence (e.g., a sequence encoding a guide RNA or a sequence encoding a BEFP) and/or regulate translation of an encoded polypeptide.
  • naturally occurring, as applied to a nucleic acid, polypeptide, cell, or organism, refers to a nucleic acid, polypeptide, cell, or organism that is found in nature.
  • a polypeptide or nucleic acid sequence that is present in an organism (including in a virus) that can be isolated from a source in nature and that has not been intentionally modified by a human in the laboratory is naturally occurring.
  • Heterologous means a nucleotide or peptide that is not found in the native nucleic acid or protein, respectively.
  • a BEFP described herein may comprise the RNA-binding domain of the BEFP (or a variant thereof) fused to a heterologous polypeptide sequence (e.g., a polypeptide sequence from a protein other than BEFP).
  • the heterologous polypeptide may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the BEFP (e.g., methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.).
  • a heterologous nucleic acid may be linked to a naturally-occurring nucleic acid (or a variant thereof) (e.g., by genetic engineering) to generate a fusion nucleic acid encoding a fusion polypeptide.
  • a variant BEFP may be fused to a heterologous polypeptide (e.g., a polypeptide other than BEFP), which exhibits an activity that will also be exhibited by the fusion variant BEFP.
  • a heterologous nucleic acid may be linked to a variant BEFP (e.g., by genetic engineering) to generate a nucleic acid encoding a fusion variant BEFP. “Heterologous,” as used herein, additionally means a nucleotide or polypeptide in a cell that is not its native cell.
  • cognate refers to two biomolecules that normally interact or co-exist in nature.
  • Recombinant means that a particular nucleic acid (DNA or RNA) or vector is the product of various combinations of cloning, restriction, polymerase chain reaction (PCR) and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems.
  • DNA sequences encoding polypeptides can be assembled from cDNA fragments or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid that is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system.
  • Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5' or 3' from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences”, below).
  • a DNA sequence encoding RNA, e.g., guide RNA
  • the term “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention.
  • This artificial combination can be accomplished by chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is generally done to replace a codon with a codon encoding the same amino acid, a conservative amino acid, or a non-conservative amino acid. In addition or alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
  • a“recombinant” polypeptide When a recombinant nucleic acid encodes a polypeptide, the sequence of the encoded polypeptide can be naturally occurring (“wild type”) or can be a variant (e.g., a mutant) of the naturally occurring sequence.
  • a “recombinant” polypeptide is encoded by a recombinant DNA sequence, but the sequence of the polypeptide can be naturally occurring (“wild type”) or non-naturally occurring (e.g., a variant, a mutant, etc.).
  • a “recombinant” polypeptide is the result of human intervention, but may be a naturally occurring amino acid sequence.
  • the term “non-naturally occurring” includes molecules that are markedly different from their naturally occurring counterparts, including chemically modified or mutated molecules.
  • A “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, e.g., an “insert”, may be attached so as to bring about the replication of the attached segment in a cell.
  • An “expression cassette” includes a DNA coding sequence operably linked to a promoter. “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner.
  • a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression.
  • the terms “recombinant expression vector” or “DNA construct” are used interchangeably herein to refer to a DNA molecule comprising a vector and at least one insert. Recombinant expression vectors are generally generated for the purpose of expressing and/or propagating the insert(s), or for the construction of other recombinant nucleotide sequences.
  • the nucleic acid(s) may or may not be operably linked to a promoter sequence and may or may not be operably linked to DNA regulatory sequences.
  • operably linked denotes a physical or functional linkage between two or more elements, e.g., polypeptide sequences or nucleic acid sequences, which permits them to operate in their intended fashion.
  • an operable linkage between a nucleic acid of interest and a regulatory sequence is a functional link that allows for expression of the nucleic acid of interest.
  • the term “operably linked” refers to the positioning of a regulatory region and a coding sequence to be transcribed so that the regulatory region is effective for regulating transcription or translation of the coding sequence of interest.
  • the term “operably linked” denotes a configuration in which a regulatory sequence is placed at an appropriate position relative to a sequence that encodes a polypeptide or functional RNA such that the control sequence directs or regulates the expression or cellular localization of the mRNA encoding the polypeptide, the polypeptide, and/or the functional RNA.
  • a promoter is in operable linkage with a nucleic acid sequence if it can mediate transcription of the nucleic acid sequence.
  • Operably linked elements may be contiguous or non-contiguous.
  • a cell has been “genetically modified” or “transformed” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell.
  • the presence of the exogenous DNA results in permanent or transient genetic change.
  • the transforming DNA may or may not be integrated (covalently linked) into the genome of the cell.
  • a transforming DNA can be maintained on an episomal element such as a plasmid.
  • a stably transformed cell is one in which the transforming DNA is integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that include a population of daughter cells containing the transforming DNA integrated into chromosomal DNA.
  • A “clone” is a population of cells derived from a single cell or common ancestor by mitosis.
  • A “cell line” is a clone of a primary cell that is capable of stable growth in vitro for many generations.
  • Suitable methods of genetic modification include, but are not limited to, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
  • PEI polyethyleneimine
  • A “host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell (e.g., bacterial or archaeal cell), or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid, and include the progeny of the original cell which has been transformed by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement as the original parent, due to natural, accidental, or deliberate mutation.
  • A “recombinant host cell” is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector.
  • a bacterial host cell is a genetically modified bacterial host cell by virtue of introduction into a suitable bacterial host cell of an exogenous nucleic acid (e.g., a plasmid or recombinant expression vector) and a eukaryotic host cell is a genetically modified eukaryotic host cell (e.g., a mammalian germ cell), by virtue of introduction into a suitable eukaryotic host cell of an exogenous nucleic acid.
  • A “target DNA” as used herein is a polydeoxyribonucleotide that includes a “target site” or “target sequence.”
  • the terms “target site,” “target sequence,” “target protospacer DNA,” or “protospacer-like sequence” are used interchangeably herein to refer to a nucleic acid sequence present in a target DNA to which a DNA-targeting segment (also referred to as a “spacer”) of a guide RNA can bind, provided permissive conditions for binding exist.
  • the target site (or target sequence) 5'-GAGCATATC-3' within a target DNA is targeted by (or is bound by, or hybridizes with, or is complementary to) the RNA sequence 5'-GAUAUGCUC-3'.
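The pairing in the example above can be checked with a minimal sketch. The function name `pairing_rna` is a hypothetical helper introduced here for illustration only; it assumes standard Watson-Crick pairing (A with U, T with A, G with C) and antiparallel strands.

```python
# Map each DNA base to the RNA base that pairs with it (Watson-Crick pairing).
DNA_TO_PAIRING_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}


def pairing_rna(target_dna: str) -> str:
    """Return the RNA, written 5'->3', that hybridizes with a DNA strand written 5'->3'.

    Hybridized strands run antiparallel, so the DNA is read in reverse
    while each base is replaced by its pairing RNA base.
    """
    return "".join(DNA_TO_PAIRING_RNA[base] for base in reversed(target_dna.upper()))


print(pairing_rna("GAGCATATC"))  # prints GAUAUGCUC
```

Reading the target site in reverse and complementing each base reproduces the RNA sequence given in the text.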
  • Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell.
  • Other suitable DNA/RNA binding conditions, e.g., conditions in a cell-free system
  • the strand of the target DNA that is complementary to and hybridizes with the guide RNA is referred to as the “complementary strand” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the guide RNA) is referred to as the “noncomplementary strand” or “non-complementary strand.”
  • By “RNA-binding site-specific modifying enzyme” is meant a polypeptide that binds RNA and is targeted to a specific DNA sequence, such as a BEFP.
  • a site-specific modifying enzyme as described herein is targeted to a specific DNA sequence by the RNA molecule to which it is bound.
  • the RNA molecule includes a sequence that binds, hybridizes to, or is complementary to a target sequence within the target DNA, thus targeting the bound polypeptide to a specific location within the target DNA (the target sequence).
  • By “cleavage” it is meant the breakage of the covalent backbone of a DNA molecule.
  • Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in the production of either blunt ends or staggered ends.
  • a complex comprising a guide RNA and a site-specific modifying enzyme is used for targeted double-stranded DNA cleavage.
  • “Nuclease” and “endonuclease” are used interchangeably herein to mean an enzyme that possesses endonucleolytic catalytic activity for nucleic acid cleavage.
  • By “cleavage domain” or “active domain” or “nuclease domain” of a nuclease it is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for DNA cleavage.
  • a cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides.
  • a single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.
  • The “guide sequence” or “DNA-targeting segment” or “DNA-targeting sequence” or “spacer” includes a nucleotide sequence that is complementary to a specific sequence within a target DNA (the complementary strand of the target DNA) designated the “protospacer-like” sequence herein.
  • the protein-binding segment (or “protein-binding sequence”) interacts with a site-specific modifying enzyme.
  • site-specific modifying enzyme is a BEFP or BEFP-related polypeptide (described in more detail below)
  • site-specific cleavage of the target DNA occurs at locations determined by both (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif (referred to as the protospacer adjacent motif (PAM)) in the target DNA.
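As a rough illustration of the two determinants named above (a protospacer that the guide can pair with, plus an adjacent PAM), the sketch below scans one DNA strand for candidate sites. Everything specific in it is an assumption for illustration: the function name `find_target_sites`, the default 20-nucleotide protospacer length, the placement of the PAM immediately 3' of the protospacer, and the `NGG` motif. The text here does not state the PAM for any particular enzyme.

```python
import re

# IUPAC codes used in the illustrative PAM; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_target_sites(dna: str, spacer_len: int = 20, pam: str = "NGG"):
    """List (start, protospacer, pam_sequence) for each protospacer on the
    given strand that is immediately followed (3') by a PAM match.

    Assumptions: PAM sits directly 3' of the protospacer; only the strand
    passed in is scanned (the opposite strand is not searched).
    """
    pam_regex = "".join(IUPAC[ch] for ch in pam.upper())
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]


# Toy example with an 8-nt protospacer so the input stays short.
sites = find_target_sites("ACGTACGTTGGA", spacer_len=8)
```

A fuller search would also scan the reverse complement and use the PAM actually recognized by the nucleic acid recognition domain in question.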
  • the protein-binding segment of a guide RNA includes, in part, two complementary stretches of nucleotides that hybridize to one another to form a double-stranded RNA duplex (dsRNA duplex).
  • a nucleic acid (e.g., a guide RNA, a nucleic acid encoding a guide RNA; a nucleic acid encoding a site-specific modifying enzyme; etc.) includes a modification or sequence that provides for an additional desirable feature (e.g., modified or regulated stability; subcellular targeting; tracking, e.g., a fluorescent label; a binding site for a protein or protein complex, etc.).
  • Non-limiting examples include: a 5' cap (e.g., a 7-methylguanylate cap (m7G)); a 3' polyadenylated tail (e.g., a 3' poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (e.g., a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA
  • a guide RNA includes an additional segment at either the 5' or 3' end that provides for any of the features described above.
  • a suitable third segment can include a 5' cap (e.g., a 7-methylguanylate cap (m7G)); a 3' polyadenylated tail (e.g., a 3' poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (e.g., a hairpin); a sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding
  • a guide RNA and a site-specific modifying enzyme such as a BEFP may form a ribonucleoprotein complex (e.g., bind via non-covalent interactions).
  • the guide RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target DNA.
  • the site-specific modifying enzyme of the complex provides the modifying activity.
  • the site-specific modifying enzyme is guided to a target DNA sequence (e.g., a target sequence in a chromosomal nucleic acid; a target sequence in an extrachromosomal nucleic acid, e.g., an episomal nucleic acid, a minicircle, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; etc.) by virtue of its association with the protein-binding segment of the guide RNA.
  • RNA aptamers are known in the art and are generally a synthetic version of a riboswitch.
  • The terms “RNA aptamer” and “riboswitch” are used interchangeably herein to encompass both synthetic and natural nucleic acid sequences that provide for inducible regulation of the structure (and therefore the availability of specific sequences) of the RNA molecule of which they are part.
  • RNA aptamers generally include a sequence that folds into a particular structure (e.g., a hairpin), which specifically binds a particular drug (e.g., a small molecule). Binding of the drug causes a structural change in the folding of the RNA, which changes a feature of the nucleic acid of which the aptamer is a part.
  • an activator-RNA with an aptamer may not be able to bind to the cognate targeter RNA unless the aptamer is bound by the appropriate drug;
  • a targeter-RNA with an aptamer may not be able to bind to the cognate activator-RNA unless the aptamer is bound by the appropriate drug;
  • a targeter-RNA and an activator-RNA, each comprising a different aptamer that binds a different drug may not be able to bind to each other unless both drugs are present.
  • a two-molecule guide RNA can be designed to be inducible.
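The inducible behavior described for a two-molecule guide RNA amounts to a logical AND over the drugs required by each aptamer. The sketch below is a deliberately simplified model: the function name `can_form_guide` and the drug labels are hypothetical, and the text's “may not be able to bind” is treated as a strict requirement for the sake of illustration.

```python
from typing import Iterable, Optional


def can_form_guide(
    activator_drug: Optional[str],
    targeter_drug: Optional[str],
    drugs_present: Iterable[str],
) -> bool:
    """Simplified model of an inducible two-molecule guide RNA.

    Each *_drug argument names the drug that the aptamer on that RNA must
    bind before pairing is possible, or None if that RNA carries no aptamer.
    The activator-RNA and targeter-RNA are modeled as able to pair only
    when every required drug is present.
    """
    required = {drug for drug in (activator_drug, targeter_drug) if drug is not None}
    return required <= set(drugs_present)


# Two different aptamers: both drugs are needed before the RNAs can pair.
both = can_form_guide("drug_A", "drug_B", {"drug_A", "drug_B"})
only_one = can_form_guide("drug_A", "drug_B", {"drug_A"})
```

In this model `both` is true and `only_one` is false, matching the case where each RNA carries a different aptamer.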
  • Examples of aptamers and riboswitches can be found, for example, in: Nakamura et al., Genes Cells. 2012 May; 17(5):344-64; Vavalle et al., Future Cardiol. 2012 May; 8(3):371-82; Citartan et al., Biosens Bioelectron. 2012 Apr 15; 34(1):1-11; and Liberman et al., Wiley Interdiscip Rev RNA. 2012 May-Jun; 3(3):369-84; all of which are herein incorporated by reference in their entireties.
  • The term “stem cell” is used herein to refer to a cell (e.g., plant stem cell, vertebrate stem cell) that has the ability both to self-renew and to generate a differentiated cell type (see Morrison et al. (1997) Cell 88:287-298).
  • the adjective “differentiated”, or “differentiating” is a relative term.
  • A “differentiated cell” is a cell that has progressed further down the developmental pathway than the cell it is being compared with.
  • pluripotent stem cells can differentiate into lineage-restricted progenitor cells (e.g., mesodermal stem cells), which in turn can differentiate into cells that are further restricted (e.g., neuron progenitors), which can differentiate into end-stage cells (e.g., terminally differentiated cells, e.g., neurons, cardiomyocytes, etc.), which play a characteristic role in a certain tissue type, and may or may not retain the capacity to proliferate further.
  • Stem cells may be characterized by both the presence of specific markers (e.g., proteins, RNAs, etc.) and the absence of specific markers.
  • Stem cells may also be identified by functional assays both in vitro and in vivo, particularly assays relating to the ability of stem cells to give rise to multiple differentiated progeny.
  • Stem cells of interest include pluripotent stem cells (PSCs).
  • the term “pluripotent stem cell” or “PSC” is used herein to mean a stem cell capable of producing all cell types of the organism. Therefore, a PSC can give rise to cells of all germ layers of the organism (e.g., the endoderm, mesoderm, and ectoderm of a vertebrate).
  • Pluripotent cells are capable of forming teratomas and of contributing to ectoderm, mesoderm, or endoderm tissues in a living organism.
  • Pluripotent stem cells of plants are capable of giving rise to all cell types of the plant (e.g., cells of the root, stem, leaves, etc.).
  • PSCs of animals can be derived in a number of different ways.
  • embryonic stem cells ESCs
  • iPSCs induced pluripotent stem cells
  • somatic cells (Takahashi et al., Cell. 2007 Nov 30;131(5):861-72; Takahashi et al., Nat Protoc. 2007;2(12):3081-9; Yu et al., Science. 2007 Dec 21;318(5858):1917-20. Epub 2007 Nov 20).
  • PSC refers to pluripotent stem cells regardless of their derivation
  • the term PSC encompasses the terms ESC and iPSC, as well as the term embryonic germ stem cells (EGSC), which are another example of a PSC.
  • PSCs may be in the form of an established cell line, they may be obtained directly from primary embryonic tissue, or they may be derived from a somatic cell. PSCs can be target cells of the methods described herein.
  • ESC embryonic stem cell
  • ESC lines are listed in the NIH Human Embryonic Stem Cell Registry, e.g., hESBGN-01, hESBGN-02, hESBGN-03, hESBGN-04 (BresaGen, Inc.); HES-1, HES-2, HES-3, HES-4, HES-5, HES-6 (ES Cell International); Miz-hES1 (MizMedi Hospital-Seoul National University); HSF-1, HSF-6 (University of California at San Francisco); and H1, H7, H9, H13, H14 (Wisconsin Alumni Research Foundation (WiCell Research Institute)).
  • Stem cells of interest also include embryonic stem cells from other primates, such as Rhesus stem cells and marmoset stem cells.
  • the stem cells may be obtained from any mammalian species, e.g., human, equine, bovine, porcine, canine, feline, rodent, e.g., mice, rats, hamster, primate, etc. (Thomson et al. (1998) Science 282:1145; Thomson et al. (1995) Proc. Natl. Acad. Sci. USA 92:7844; Thomson et al. (1996) Biol. Reprod. 55:254;
  • In culture, ESCs generally grow as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nucleoli. In addition, ESCs express SSEA-3, SSEA-4, TRA-1-60, TRA-1-81, and Alkaline Phosphatase, but not SSEA-1. Examples of methods of generating and characterizing ESCs may be found in, for example, US Patent No. 7,029,913, US Patent No. 5,843,780, and US Patent No. 6,200,806, the disclosures of which are incorporated herein by reference.
  • By “embryonic germ stem cell” (EGSC) or “EG cell” is meant a PSC that is derived from germ cells and/or germ cell progenitors, e.g., primordial germ cells, e.g., those that would become sperm and eggs.
  • Embryonic germ cells (EG cells) are thought to have properties similar to embryonic stem cells as described above. Examples of methods of generating and characterizing EG cells may be found in, for example, US Patent No.
  • iPSC induced pluripotent stem cell
  • iPSCs can be derived from multiple different cell types, including terminally differentiated cells.
  • iPSCs have an ES cell-like morphology, growing as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nuclei.
  • iPSCs express one or more key pluripotency markers known by one of ordinary skill in the art, including but not limited to Alkaline
  • Examples of methods of generating and characterizing iPSCs may be found in, for example, US Patent Publication Nos. US20090047263, US20090068742, US20090191159, US20090227032, US20090246875, and US20090304646, the disclosures of which are incorporated herein by reference.
  • somatic cells are provided with reprogramming factors (e.g., Oct4, SOX2, KLF4, MYC, Nanog, Lin28, etc.) known in the art to reprogram the somatic cells to become pluripotent stem cells.
  • By “somatic cell” it is meant any cell in an organism that, in the absence of experimental manipulation, does not ordinarily give rise to all types of cells in an organism.
  • somatic cells are cells that have differentiated sufficiently that they will not naturally generate cells of all three germ layers of the body, e.g., ectoderm, mesoderm and endoderm.
  • somatic cells would include both neurons and neural progenitors, the latter of which may be able to naturally give rise to all or some cell types of the central nervous system but cannot give rise to cells of the mesoderm or endoderm lineages.
  • By “mitotic cell” it is meant a cell undergoing mitosis.
  • Mitosis is the process by which a eukaryotic cell separates the chromosomes in its nucleus into two identical sets in two separate nuclei. It is generally followed immediately by cytokinesis, which divides the nuclei, cytoplasm, organelles and cell membrane into two cells containing roughly equal shares of these cellular components.
  • By “post-mitotic cell” is meant a cell that has exited from mitosis (is in G0), e.g., the cell is “quiescent,” e.g., it is no longer undergoing cell division. This quiescent state may be temporary, e.g., reversible, or it may be permanent.
  • treatment covers any treatment of a disease or symptom in a mammal, and includes: (a) preventing the disease or symptom from occurring in a subject which may be predisposed to acquiring the disease or symptom but has not yet been diagnosed as having it; (b) inhibiting the disease or symptom, e.g., arresting its development; (c) relieving the disease, e.g., causing regression of the disease, or reducing the risk of disease or a symptom of a disease.
  • the therapeutic agent may be administered before, during, or after the onset of disease or injury.
  • the treatment of ongoing disease, where the treatment stabilizes or reduces the undesirable clinical symptoms of the subject, is of particular interest. Such treatment is desirably performed prior to complete loss of function in affected tissues.
  • therapy is administered to a subject having at least one disease symptom. In some cases the treatment is administered after the subject is not experiencing one or more symptoms of the disease.
  • the terms “individual,” “subject,” “host,” and “patient,” are used interchangeably herein and refer to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans.
  • the AP-binding domain of a BEFP can comprise any domain with covalent binding activity at an AP site, for example, an SOS response-associated peptidase (SRAP) domain, such as the SRAP domain of 5-hydroxymethylcytosine (5hmC) binding, ESC-specific (HMCES) (see Mohni et al. 2019, Cell 176, 144-153).
  • the AP-binding domain comprises an SRAP domain from any of the proteins identified in Table 1 with Uniprot ID numbers (https://www.uniprot.org/, as accessed on January 18, 2019), or functional fragments and/or derivatives thereof.
  • the AP-binding domain comprises an AP-binding domain from HMCES.
  • the HMCES is a human HMCES.
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5.
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5.
  • the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5.
  • the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5.
  • the AP-binding domain comprises an AP-binding domain from YedK.
  • the YedK is an Escherichia coli YedK.
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 6 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 6.
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 6.
  • the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 6 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 6.
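The “at least about 85% sequence identity” thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned (for example by a pairwise alignment tool) and uses one common convention, identical residues divided by aligned columns; how identity is to be calculated for the purposes of this document is not stated in this passage, so the convention, the function name `percent_identity`, and the toy peptides are all assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a residue counts as a mismatch. This is one
    common convention, not necessarily the one intended by the source.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy 10-residue peptides (not taken from any SEQ ID NO) differing at one position.
identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQL")
meets_threshold = identity >= 85.0
```

With one mismatch in ten aligned positions the toy pair is 90% identical and would satisfy an 85% threshold under this convention.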
  • a nucleic acid recognition domain according to the invention can specifically bind to a target nucleotide sequence in a selected double-stranded DNA.
  • the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease (for example, a CRISPR class 2 type II (Cas9) or CRISPR class 2 type V (Cas12a and Cas12b) nuclease) or a variant thereof, and in a complex with a guide RNA (gRNA) is capable of targeting the BEFP to a target nucleotide sequence in a DNA molecule (Stella et al. Nature Structural Biology, 24 (11), pp. 882-892).
  • the nucleic acid recognition domain is from a Cas9 protein or a variant thereof. In some embodiments, the nucleic acid recognition domain is from a Cas9 protein or a variant thereof and comprises two domains associated with nuclease activity, most commonly denoted as (i) a RuvC domain and (ii) an HNH domain. In some embodiments, the nuclease activity of the RuvC domain and/or the HNH domain is attenuated (e.g., inactivated), such as by introducing appropriate mutations (Jinek et al. Science. 2012 Aug 17; 337(6096): 816-821).
  • the nucleic acid recognition domain is a derivative of a Cas9 protein containing an inactivating mutation in only one of the two nuclease domains, resulting in a nickase Cas9 (nCas9), which cleaves only one of the two strands of the target DNA.
  • the nucleic acid recognition domain is a derivative of a Cas9 protein containing inactivating mutations in both of the nuclease domains, resulting in a nuclease-dead Cas9 (dCas9).
  • the nCas9 is only able to cleave the DNA strand that is contacted through base pairing with bases of the gRNA. This is generally achieved by introducing one or more nuclease-inactivating mutations into the RuvC nuclease domain, which naturally cleaves the DNA strand not contacted by the gRNA.
  • One example of such a mutation is D10A in SluCas9.
  • the SluCas9 domains of SEQ ID NOs: 2 and 4 include such D10A mutations.
  • the nCas9 is only able to cleave the DNA strand that is not contacted through base pairing with bases of the gRNA. This is generally achieved by introducing one or more nuclease inactivating mutations in the HNH nuclease domain, which naturally cleaves the DNA strand contacted by the gRNA.
  • One example of such a mutation is H559A in SluCas9 (SEQ ID NO: 7).
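The relationship between the example mutations, the nuclease domain each one inactivates, and the strand that remains cleavable can be summarized in a small lookup. This is a sketch of the logic stated above for SluCas9 only: D10A is taken as the RuvC-inactivating example and H559A as the HNH-inactivating example. The function name `classify_cas9` and the returned labels are hypothetical.

```python
# Example SluCas9 mutations from the text and the domain each one inactivates.
INACTIVATED_DOMAIN = {"D10A": "RuvC", "H559A": "HNH"}


def classify_cas9(mutations):
    """Return (variant, strand still cleaved) for a set of mutations.

    Per the text: RuvC naturally cleaves the strand NOT base-paired with the
    gRNA, and HNH naturally cleaves the strand that IS base-paired with it.
    Mutations not listed in INACTIVATED_DOMAIN are ignored by this sketch.
    """
    dead = {INACTIVATED_DOMAIN[m] for m in mutations if m in INACTIVATED_DOMAIN}
    if dead == {"RuvC", "HNH"}:
        return ("dCas9", None)  # nuclease-dead: neither strand is cleaved
    if dead == {"RuvC"}:
        return ("nCas9", "strand base-paired with the gRNA")
    if dead == {"HNH"}:
        return ("nCas9", "strand not base-paired with the gRNA")
    return ("Cas9", "both strands")


variant, strand = classify_cas9({"D10A"})
```

Here `classify_cas9({"D10A"})` yields a nickase that cuts only the gRNA-paired strand, while supplying both mutations yields the nuclease-dead form.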
  • a nucleic acid recognition domain is from a SluCas9 (e.g., SEQ ID NO: 7) or variant thereof that can further comprise one or more modification(s) or mutation(s) that result in a SluCas9 with a significantly reduced or no detectable nuclease activity, e.g., including, but not limited to, i) a modification or mutation at position 10 with respect to SEQ ID NO: 7 leading to a significantly reduced nuclease activity (e.g., a D10A mutation), ii) a modification or mutation at position 559 with respect to SEQ ID NO: 7 leading to a significantly reduced nuclease activity (e.g., an H559A mutation), and iii) a modification or mutation at position 582 with respect to SEQ ID NO: 7 leading to a significantly reduced nuclease activity (e.g., a N582A mutation).
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising any contiguous sequence of 265 amino acids from position 789 to position 1053 of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to any contiguous sequence of 265 amino acids from position 789 to position 1053 of SEQ ID NO: 7.
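A quick arithmetic check helps when reading the region above. Assuming positions are 1-based and inclusive (an assumption, since the numbering convention is not restated here), positions 789 to 1053 span 1053 - 789 + 1 = 265 residues, so a 265-residue window fits in exactly one place. The helper name `contiguous_windows` is hypothetical.

```python
def contiguous_windows(start: int, end: int, window: int) -> int:
    """Count distinct contiguous windows of a given length inside a region.

    Positions are treated as 1-based and inclusive at both ends.
    """
    region_length = end - start + 1
    return max(region_length - window + 1, 0)


region_length = 1053 - 789 + 1
windows = contiguous_windows(789, 1053, 265)
```

Under that convention `region_length` is 265 and `windows` is 1, i.e. the recited contiguous sequence corresponds to the whole region from position 789 to position 1053.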
  • the nucleic acid recognition domain is a zinc finger protein, for example:
  • an engineered zinc finger nickase (ZFNickases), in which one monomer of a zinc finger nuclease dimer comprises a FokI cleavage domain that had its nuclease activity inactivated by one or more introduced mutations, as for example described in Kim et al. Genome Res. 2012 Jul;22(7):1327-33. doi: 10.1101/gr.138792.112. Epub 2012 Apr 20.
  • the nucleic acid recognition domain is a TALEN protein, for example:
  • an engineered TAL effector nickase (TALENickases), in which one monomer of a TALE nuclease dimer comprises a FokI cleavage domain that had its nuclease activity inactivated by one or more introduced mutations, as for example described in Biochem Biophys Res Commun. 2014 Mar 28;446(1):261-6.
Cytidine deaminase domain:
  • the cytidine deaminase domain includes any protein or domain that is able to convert a cytidine base within a nucleic acid to a uracil.
  • the cytidine deaminase domain is from an APOBEC deaminase (Trends Biochem Sci. 2016 Jul; 41(7): 578-594).
  • the cytidine deaminase domain is from a deaminase including, without limitation, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and an activation-induced deaminase (AID).
  • the deaminase may be from any suitable organism (e.g., a human or a rat).
  • the deaminase is from a human, non-human primate (e.g., chimpanzee, gorilla, orangutan, or monkey), cow, pig, dog, rat, or mouse.
  • the deaminase is a rat APOBEC1.
  • the deaminase is a human APOBEC1.
  • the deaminase is pmCDA1.
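The activity that defines the cytidine deaminase domain, conversion of a cytidine to a uracil, can be shown on a toy sequence. The sketch below is purely illustrative: the function name `deaminate`, the 0-based indexing, and the choice of which positions are converted are assumptions, and it does not model which cytidines a real deaminase would reach.

```python
def deaminate(strand: str, positions) -> str:
    """Return the strand with the cytidines at the given 0-based positions
    converted to uracil (C -> U).

    Raises ValueError if a requested position does not hold a C, since a
    cytidine deaminase acts on cytidine only.
    """
    bases = list(strand.upper())
    for index in positions:
        if bases[index] != "C":
            raise ValueError(f"position {index} is {bases[index]}, not C")
        bases[index] = "U"
    return "".join(bases)


edited = deaminate("ACGCTA", [1])
```

In this toy case only the first cytidine is converted, giving `AUGCTA`; the second cytidine is left untouched because its position was not selected.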
  • the BEFP comprises, from N-terminus to C-terminus, an AP-binding domain, a cytidine deaminase domain, and a nucleic acid recognition domain.
  • the BEFP further comprises an NLS sequence, for example, C-terminal to the nucleic acid recognition domain.
  • the BEFP further comprises a linker, such as a peptide linker, between any of the domains contained therein.
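The domain order described above can be sketched as a simple concatenation. The sequences used are placeholders, not the sequences of any SEQ ID NO, and the names `Domain` and `assemble_befp` are hypothetical. The sketch assumes one optional linker between each pair of adjacent domains and an optional NLS appended C-terminal to the nucleic acid recognition domain, which is the example placement given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    sequence: str  # amino acid sequence, N-terminus to C-terminus


def assemble_befp(ap_binding: Domain, deaminase: Domain, recognition: Domain,
                  linker: str = "", nls: str = "") -> str:
    """Concatenate domains in the order given in the text:
    AP-binding domain, cytidine deaminase domain, nucleic acid recognition
    domain (N-terminus to C-terminus), with an optional peptide linker
    between adjacent domains and an optional NLS at the C-terminus.
    """
    ordered = [ap_binding.sequence, deaminase.sequence, recognition.sequence]
    return linker.join(ordered) + nls


# Placeholder sequences chosen only to make the ordering visible.
fusion = assemble_befp(
    Domain("AP-binding", "AAA"),
    Domain("cytidine deaminase", "CCC"),
    Domain("nucleic acid recognition", "DDD"),
    linker="GGS",
    nls="KKK",
)
```

With these placeholders `fusion` is `AAAGGSCCCGGSDDDKKK`, which makes the N-to-C order and the linker and NLS positions easy to read off.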
  • the AP-binding domain is from HMCES (e.g., human HMCES, such as SEQ ID NO: 5) or YedK (e.g., E.
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6.
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6.
  • the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or an N582A mutation, with respect to SEQ ID NO: 7.
  • the cytidine deaminase domain is from an APOBEC deaminase.
  • the cytidine deaminase domain is from a deaminase including, without limitation, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and an activation-induced deaminase (AID).
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2 or SEQ ID NO: 4.
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4.
  • a nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3, or a variant nucleotide sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 1 or SEQ ID NO: 3.
  • the nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3.
  • a protein-of-interest (POI), such as by targeted modification of a nucleic acid (e.g., conversion of a cytidine to a thymidine) encoding the POI or a derivative thereof in the genome of the cell.
  • the POI is a protein associated with a disorder or health condition.
  • systems for treating a subject having or suspected of having a disorder or health condition associated with a POI employing ex vivo and/or in vivo genome editing.
  • a system comprises (i) a BEFP comprising:
  • guide RNA (gRNA)
  • the AP-binding domain comprises an SRAP domain, e.g., the SRAP domain of an HMCES or YedK protein.
  • the AP-binding domain comprises an SRAP domain from a protein identified in Table 1 or a functional derivative thereof.
  • the AP-binding domain is from HMCES (e.g., human HMCES, such as SEQ ID NO: 5) or YedK (e.g.,
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6.
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6.
  • the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6.
  • the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease or variant thereof.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or an N582A mutation, with respect to SEQ ID NO: 7.
  • the cytidine deaminase domain is from an APOBEC deaminase.
  • the cytidine deaminase domain is from a deaminase including, without limitation, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and an activation-induced deaminase (AID).
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2 or SEQ ID NO: 4.
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4.
  • a nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3, or a variant nucleotide sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 1 or SEQ ID NO: 3.
  • the nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3.
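The "at least about 85% sequence identity" criterion recited above can be illustrated with a minimal sketch. The source does not specify an alignment method or how "about" is interpreted, so this example assumes the two sequences are already aligned to equal length (with `-` marking gaps) and defines identity as identical columns divided by aligned columns. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns in which both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are identical and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 85.0) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 20-residue sequences, not taken from any SEQ ID NO.
reference = "MKTAYIAKQRQISFVKSHFS"
variant_2_subs = "MKTAYIAKQRQISFVKAHFA"   # 2 substitutions, 18/20 identical
variant_4_subs = "MATAYIAKQRQLSFVKAHFA"   # 4 substitutions, 16/20 identical

print(percent_identity(reference, variant_2_subs))          # 90.0
print(meets_identity_threshold(reference, variant_4_subs))  # False
```

In practice the alignment step (global or local, and the scoring scheme) changes the computed identity, so a real assessment would depend on the alignment tool and parameters chosen.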
  • the methods of the disclosure may be employed to induce DNA modification in mitotic or post-mitotic cells in vivo, and/or ex vivo, and/or in vitro (e.g., to produce genetically modified cells that can be reintroduced into an individual).
  • a mitotic and/or post-mitotic cell can be any of a variety of host cells, where suitable host cells include, but are not limited to, a bacterial cell; an archaeal cell; a single-celled eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens (C.
  • a fungal cell; an animal cell; a cell from an invertebrate animal (e.g., an insect, a cnidarian, an echinoderm, a nematode, etc.); a eukaryotic parasite (e.g., a malarial parasite, e.g., Plasmodium falciparum; a helminth; etc.); a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a mammalian cell, e.g., a rodent cell, a human cell, a non-human primate cell, etc.
  • the host cell can be any human cell. Suitable host cells include naturally occurring cells; genetically modified cells (e.g., cells genetically modified in a laboratory, e.g., by the “hand of man”); and cells manipulated in vitro in any way.
  • a host cell is isolated.
  • a stem cell, e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell; a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.
  • Cells may be from established cell lines or they may be primary cells, where “primary cells”, “primary cell lines”, and “primary cultures” are used
  • primary cultures include cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage.
  • Primary cell lines can be maintained for fewer than 10 passages in vitro.
  • Target cells are, in some embodiments, unicellular organisms, or are grown in culture.
  • if the cells are primary cells, such cells may be harvested from an individual by any method.
  • leukocytes may be harvested by apheresis, leukocytapheresis, density gradient separation, etc.
  • cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. may be harvested by biopsy.
  • An appropriate solution may be used for dispersion or suspension of the harvested cells.
  • Such solution will generally be a balanced salt solution, e.g., normal saline, phosphate-buffered saline (PBS), Hank's balanced salt solution, etc., supplemented with fetal calf serum or other naturally occurring factors, in conjunction with an acceptable buffer at low concentration, e.g., from 5-25 mM.
  • Useful buffers include HEPES, phosphate buffers, lactate buffers, etc.
  • the cells may be used immediately, or they may be stored, e.g., frozen, for long periods of time, being thawed and capable of being reused.
  • the cells may be frozen in 10% dimethyl sulfoxide (DMSO), 50% serum, 40% buffered medium, or some other such solution as is commonly used in the art to preserve cells at such freezing temperatures, and thawed in a manner as commonly known in the art for thawing frozen cultured cells.
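The freezing mixture described above (10% DMSO, 50% serum, 40% buffered medium) can be turned into component volumes for a chosen batch size. This is a minimal sketch that assumes the percentages are volume fractions (v/v), which the source does not state explicitly; the function name and the 10 ml batch size are hypothetical.

```python
# Fractions taken from the text; treating them as v/v is an assumption.
FREEZING_MIX = {"DMSO": 0.10, "serum": 0.50, "buffered medium": 0.40}


def freezing_medium_volumes(total_ml: float) -> dict[str, float]:
    """Volume of each component (ml) needed for total_ml of freezing medium."""
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    # Guard against a recipe whose fractions do not add up to 100%.
    if abs(sum(FREEZING_MIX.values()) - 1.0) > 1e-9:
        raise ValueError("component fractions must sum to 1")
    return {name: round(fraction * total_ml, 3)
            for name, fraction in FREEZING_MIX.items()}


for component, volume in freezing_medium_volumes(10.0).items():
    print(f"{component}: {volume} ml")
```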
  • the BEFP system herein described can be used in eukaryotic cells, such as mammalian cells, for example, human cells. Any human cell is suitable for use with the BEFP system disclosed herein.
  • the BEFP system components of the present disclosure can be formulated into compositions (e.g., pharmaceutical compositions) by combination with appropriate carriers or diluents (e.g., pharmaceutically acceptable carriers or diluents).
  • the composition is a pharmaceutical composition.
  • the BEFP system components include a BEFP or a nucleic acid encoding the BEFP and/or a gRNA or nucleic acid encoding the gRNA as described herein.
  • compositions intended for in vivo use are generally sterile. To the extent that a given compound must be synthesized prior to use, the resulting product is generally substantially free of any potentially toxic agents, particularly any endotoxins, which may be present during the synthesis or purification process.
  • compositions for parenteral administration are also sterile, substantially isotonic and made under GMP conditions.
  • the BEFP present in a composition is at least about 75% (such as at least about any of 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) pure, where “% pure” means that the BEFP is the recited percent free from other proteins, other macromolecules, or contaminants that may be present during the production of the BEFP.
  • compositions are provided herein.
  • compositions comprising components of a BEFP system including (i) a guide RNA or nucleic acid encoding the gRNA; and/or (ii) a BEFP or nucleic acid encoding the BEFP; wherein the BEFP system components are present in a pharmaceutically acceptable vehicle.
  • “Pharmaceutically acceptable vehicles” may be vehicles approved by a regulatory agency of the Federal or a state government or listed in the US Pharmacopeia or other generally recognized pharmacopeia for use in mammals, such as humans.
  • vehicle refers to a diluent, adjuvant, excipient, or carrier with which a compound of the disclosure is formulated for administration to a mammal.
  • Such pharmaceutical vehicles can be lipids, e.g., liposomes, e.g., liposome dendrimers; liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like, saline; gum acacia, gelatin, starch paste, talc, keratin, colloidal silica, urea, and the like.
  • auxiliary, stabilizing, thickening, lubricating and coloring agents may be used.
  • compositions may be formulated into preparations in solid, semisolid, liquid or gaseous forms, such as tablets, capsules, powders, granules, ointments, solutions, suppositories, injections, inhalants, gels, microspheres, and aerosols.
  • administration of the BEFP system components can be achieved in various ways, including oral, buccal, rectal, parenteral, intraperitoneal, intradermal, transdermal, intra-tracheal, intraocular, etc., administration.
  • the active agent may be systemic after administration or may be localized by the use of regional administration, intramural administration, or use of an implant that acts to retain the active dose at the site of implantation.
  • the active agent may be formulated for immediate activity or it may be formulated for sustained release.
  • blood-brain barrier (BBB)
  • osmotic means such as mannitol or leukotrienes
  • vasoactive substances such as bradykinin.
  • a BBB disrupting agent can be co-administered with the therapeutic compositions of the disclosure when the compositions are administered by intravascular injection.
  • the composition is stored in unit or multi-dose containers, for example, sealed ampules or vials, as an aqueous solution or as a lyophilized formulation for reconstitution.
  • as an example of a lyophilized formulation, 10 ml vials are filled with 5 ml of a sterile-filtered 1% (w/v) aqueous solution of compound, and the resulting mixture is lyophilized.
  • the infusion solution is prepared by reconstituting the lyophilized compound using bacteriostatic Water-for-Injection.
  • compositions can include, depending on the formulation desired, pharmaceutically acceptable, non-toxic carriers or diluents, which are defined as vehicles commonly used to formulate pharmaceutical compositions for animal or human administration.
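The lyophilized-formulation example above implies a fixed mass of compound per vial. As a short worked sketch: 1% (w/v) means 1 g per 100 ml, i.e. 10 mg/ml, so a 5 ml fill contains 50 mg. The function name is hypothetical, and the reconstitution volume is left as a parameter because the source does not specify it.

```python
def compound_mass_mg(fill_volume_ml: float, percent_w_v: float) -> float:
    """Mass of compound (mg) in a fill, given a % (w/v) solution.

    1% (w/v) = 1 g per 100 ml = 10 mg per ml.
    """
    if fill_volume_ml < 0 or percent_w_v < 0:
        raise ValueError("volume and concentration must be non-negative")
    return percent_w_v * 10.0 * fill_volume_ml


def reconstituted_conc_mg_per_ml(mass_mg: float, reconstitution_ml: float) -> float:
    """Concentration after reconstituting the lyophilized cake in a chosen volume."""
    if reconstitution_ml <= 0:
        raise ValueError("reconstitution volume must be positive")
    return mass_mg / reconstitution_ml


mass = compound_mass_mg(fill_volume_ml=5.0, percent_w_v=1.0)
print(mass)  # 50.0 mg of compound per vial
# The 10 ml reconstitution volume here is an illustrative assumption.
print(reconstituted_conc_mg_per_ml(mass, reconstitution_ml=10.0))  # 5.0 mg/ml
```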
  • the diluent is selected so as not to affect the biological activity of the combination. Examples of such diluents are distilled water, buffered water, physiological saline, phosphate buffered saline (PBS), Ringer's solution, dextrose solution, and Hank's solution.
  • the pharmaceutical composition or formulation can include other carriers, adjuvants, or non-toxic, nontherapeutic, non-immunogenic stabilizers, excipients and the like.
  • the compositions can also include additional substances to approximate physiological conditions, such as pH adjusting and buffering agents, toxicity adjusting agents, wetting agents and detergents.
  • the composition can also include any of a variety of stabilizing agents, such as an antioxidant for example.
  • the pharmaceutical composition includes a polypeptide
  • the polypeptide can be complexed with various well-known compounds that enhance the in vivo stability of the polypeptide, or otherwise enhance its pharmacological properties (e.g., increase the half-life of the polypeptide, reduce its toxicity, and enhance solubility or uptake). Examples of such modifications or complexing agents include sulfate, gluconate, citrate, and phosphate.
  • nucleic acids or polypeptides of a composition can also be complexed with molecules that enhance their in vivo attributes.
  • molecules include, for example, carbohydrates, polyamines, amino acids, other peptides, ions (e.g., sodium, potassium, calcium, magnesium, manganese), and lipids.
  • kits for carrying out a method described herein can include one or more of: a BEFP or nucleic acid encoding the BEFP; and a gRNA or nucleic acid encoding the gRNA.
  • a kit may include a complex that includes two or more of: a BEFP; a nucleic acid encoding a BEFP; a guide RNA; a nucleic acid encoding a guide RNA.
  • a kit includes: (a) a BEFP or nucleic acid encoding the BEFP; and (b) a gRNA or nucleic acid encoding the gRNA, wherein the gRNA is capable of guiding the BEFP to a target nucleic acid sequence.
  • the kit comprises the BEFP.
  • the kit comprises nucleic acid encoding the BEFP.
  • the kit comprises the gRNA.
  • the kit comprises nucleic acid encoding the gRNA.
  • the kit further comprises one or more additional gRNAs or nucleic acid encoding the one or more additional gRNAs.
  • the kit further comprises one or more additional reagents, where such additional reagents can be selected from: a buffer for introducing the BEFP into a cell; a wash buffer; a control reagent; a control expression vector or polyribonucleotide; a reagent for in vitro production of the BEFP from DNA, and the like.
  • a gRNA (including, e.g., two or more guide RNAs) can be provided as an array (e.g., an array of RNA molecules, an array of DNA molecules encoding the guide RNA(s), etc.).
  • Such kits can be useful, for example, for use in any of the methods described herein.
  • components of a kit can be in separate containers or can be combined in a single container.
  • kits described herein can further include one or more additional reagents, where such additional reagents can be selected from: a dilution buffer; a reconstitution solution; a wash buffer; a control reagent; a control expression vector or polyribonucleotide; a reagent for in vitro production of the BEFP from DNA, and the like.
  • a kit can further include instructions for using the components of the kit to practice the methods.
  • the instructions for practicing the methods are generally recorded on a suitable recording medium.
  • the instructions may be printed on a substrate, such as paper or plastic, etc.
  • the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (e.g., associated with the packaging or subpackaging), etc.
  • the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, diskette, flash drive, etc.
  • the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g., via the internet, are provided.
  • An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
  • the method involves providing (i) a BEFP or nucleic acid encoding the BEFP; and (ii) a gRNA or nucleic acid encoding the gRNA, wherein the gRNA is capable of guiding the BEFP to a target nucleic acid sequence, such that a complex (a “targeting complex”) comprising the BEFP and the gRNA is formed and comes in contact with the target DNA comprising a target nucleic acid sequence.
  • the BEFP comprises: (a) an AP-binding domain, (b) a cytidine deaminase domain, and (c) a nucleic acid recognition domain.
  • the AP-binding domain comprises an SRAP domain, e.g., the SRAP domain of an HMCES or YedK protein.
  • the AP-binding domain comprises an SRAP domain from a protein identified in Table 1 or a functional derivative thereof.
  • the AP-binding domain is from HMCES (e.g., human HMCES, such as SEQ ID NO: 5) or YedK (e.g.,
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6.
  • the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6.
  • the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6.
  • the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease or variant thereof.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
  • the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or an N582A mutation, with respect to SEQ ID NO: 7.
  • the cytidine deaminase domain is from an APOBEC deaminase.
  • the cytidine deaminase domain is from a deaminase including, without limitation, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and an activation-induced deaminase (AID).
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2 or SEQ ID NO: 4.
  • the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4.
  • a nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3, or a variant nucleotide sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 1 or SEQ ID NO: 3.
  • the nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3.
  • a method of targeting, editing, modifying, or manipulating a target DNA at one or more locations in a cell or in vitro environment comprising introducing into the cell or in vitro environment (a) a BEFP or nucleic acid encoding the BEFP; and (b) a gRNA or nucleic acid encoding the gRNA, wherein the gRNA is capable of guiding the BEFP to a target nucleic acid sequence in the target DNA.
  • the method comprises introducing into the cell or in vitro environment the BEFP.
  • the method comprises introducing into the cell or in vitro environment nucleic acid encoding the BEFP.
  • the method comprises introducing into the cell or in vitro environment the gRNA. In some embodiments, the method comprises introducing into the cell or in vitro environment nucleic acid encoding the gRNA. In some embodiments, the gRNA is a single guide RNA (sgRNA). In some embodiments, the method comprises introducing into the cell or in vitro environment one or more additional gRNAs or nucleic acid encoding the one or more additional gRNAs targeting the target DNA.
  • a method for modifying a targeted site of a double-stranded DNA comprising
  • the BEFP is a BEFP according to any of the embodiments described herein.
  • the nucleic acid recognition domain has a nickase activity capable of cleaving only one strand of the double-stranded DNA.
  • a gRNA or sgRNA and a BEFP may form a ribonucleoprotein (RNP) complex.
  • the guide RNA provides target specificity to the RNP complex by including a nucleotide sequence that is complementary to a sequence of a target DNA.
  • the BEFP of the RNP complex provides the nucleobase-editing activity.
  • the RNP complex modifies a target DNA, leading to, for example, conversion of a cytidine base within the target DNA to a thymidine.
  • the target DNA may be, for example, naked (e.g., unbound by DNA associated proteins) DNA in vitro, chromosomal DNA in cells in vitro, chromosomal DNA in cells in vivo, etc.
  • a heterologous sequence can provide for subcellular localization of the BEFP (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like).
  • a heterologous sequence can provide a tag for ease of tracking or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).
  • the heterologous sequence can provide for increased or decreased stability.
  • multiple guide RNAs are used to simultaneously modify different locations on the same target DNA or on different target DNAs.
  • two or more guide RNAs target the same gene or transcript or locus.
  • two or more guide RNAs target different unrelated loci.
  • two or more guide RNAs target different, but related loci.
  • the BEFP is provided directly as a protein.
  • a BEFP can be introduced into a cell (provided to the cell) by any method; such methods are known to those of ordinary skill in the art.
  • a method for DNA modification or base editing according to the present disclosure finds use in a variety of applications, which are also provided. Applications include research applications; diagnostic applications; industrial applications; and therapeutic applications.
  • the guide RNA and/or BEFP are employed to modify cellular DNA in vivo, for purposes such as gene therapy, e.g., to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic, for the production of genetically modified organisms in agriculture, or for biological research.
  • components of a BEFP system including (i) a guide RNA or nucleic acid encoding the gRNA; and/or (ii) a BEFP or nucleic acid encoding the BEFP are administered to a subject. Administration may be by any well-known method in the art for the administration of peptides, small molecules and nucleic acids to a subject.
  • the BEFP system components can be incorporated into a variety of formulations.
  • an effective amount of components of a BEFP system including (i) a guide RNA or nucleic acid encoding the gRNA; and/or (ii) a BEFP or nucleic acid encoding the BEFP are provided.
  • the final amount to be administered will be dependent upon the route of administration and upon the nature of the disorder or condition that is to be treated.
  • the effective amount given to a particular subject will depend on a variety of factors, several of which will differ from subject to subject.
  • the BEFP system components may be obtained from a suitable commercial source.
  • the total pharmaceutically effective amount of the BEFP system components administered parenterally per dose will be in a range that can be measured by a dose response curve.
  • Therapies based on the BEFP system components, e.g., preparations of (i) a guide RNA or nucleic acid encoding the gRNA; and/or (ii) a BEFP or nucleic acid encoding the BEFP to be used for therapeutic administration, must be sterile. Sterility is readily accomplished by filtration through a sterile filtration membrane (e.g., 0.2 micrometer membrane).
  • compositions generally are placed into a container having a sterile access port, for example, an intravenous solution bag or vial having a stopper pierceable by a hypodermic injection needle.
  • a sterile access port for example, an intravenous solution bag or vial having a stopper pierceable by a hypodermic injection needle.
  • the data obtained from cell culture and/or animal studies can be used in formulating a range of dosages for humans.
  • the dosage of the active ingredient generally lies within a range of circulating concentrations that include the ED50 with low toxicity.
  • the dosage can vary within this range depending upon the dosage form employed and the route of administration utilized.
  • the pharmaceutical compositions can be administered for prophylactic and/or therapeutic treatments. Toxicity and therapeutic efficacy of the active ingredient can be determined according to standard pharmaceutical procedures in cell cultures and/or experimental animals, including, for example, determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD50/ED50. Therapies that exhibit large therapeutic indices are desirable.
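The therapeutic index defined above as the ratio LD50/ED50 is a simple quotient. A minimal sketch follows; the function name and the dose values are hypothetical and are not data from the source. Both doses must be expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index as the ratio LD50 / ED50 (same dose units for both)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive doses")
    return ld50 / ed50


# Hypothetical doses in mg/kg, chosen only to illustrate the calculation.
index = therapeutic_index(ld50=200.0, ed50=10.0)
print(index)  # 20.0
```

A larger value indicates a wider margin between the dose that is effective and the dose that is lethal in half of the population.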
  • the number of administrations of treatment to a subject may vary. Introducing the pharmaceutical compositions into the subject may be a one-time event; but in certain situations, such treatment may elicit improvement for a limited period of time and require an on-going series of repeated treatments. In certain situations, multiple administrations of pharmaceutical compositions may be required before an effect is observed.
  • the exact protocols may depend upon the disease or condition, the stage of the disease, and parameters of the individual subject being treated.
  • Base editor variants containing an AP-binding domain from HMCES (SEQ ID NO: 3) or YedK (SEQ ID NO: 1) are compared to AncBE4max as well as AncBE4max lacking both uracil glycosylase inhibitors (UGIs) in a mammalian cell transfection assay followed by next-generation amplicon sequencing.
  • AncBE4max (https://www.ncbi.nlm.nih.gov/pubmed/29813047, as retrieved on February 4, 2019) is a codon-optimized base editor comprising N- and C-terminal bipartite nuclear localization signals (bis-bpNLS), an engineered APOBEC1 (Anc689) obtained by ancestral reconstruction from 468 APOBEC homologs, an S.
  • nCas9 D10A nickase
  • two UGI moieties and connecting linker sequences (32 AA XTEN linker between Anc689 and nCas9; 10 AA GS-rich linker between nCas9 and the first UGI; 10 AA GS-rich linker between the first and the second UGI; 4 AA GS-rich linker between the second UGI and the C-terminal bpNLS).
  • after transfection with plasmid DNA encoding the variants and plasmid DNA encoding a single guide RNA for the target loci VEGF-A (SEQ ID NO: 8) and FANCF (SEQ ID NO: 9), the cells are incubated for several hours or days to allow base editing to occur at the target loci. The cells are then harvested, the genomic DNA is extracted, and amplicons are generated using locus-specific barcoded primers for each sample.
  • the rate of base editing at each locus is quantified from the sequencing reads using the wildtype sequence of the locus as a reference.
  • With AncBE4max, a majority of edited bases are C-to-T conversions, as expected. Lack of UGIs in AncBE4max increases the amount of undesired side products (i.e., non-C-to-T conversions and insertions or deletions).
  • the SRAP-BE variants, albeit having slightly decreased editing efficiency, display a low level of such side products.
  • SRAP-BEs yield high product purity in base editing applications, circumventing the need to inhibit UNG through co-expression of/fusion to UGI.
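
The read-level quantification described above (editing rate at a locus, and the share of edits that are the desired C-to-T product) can be sketched as follows. This is a minimal illustration only, assuming reads are already trimmed to the amplicon and that any read whose length differs from the wildtype reference is an insertion or deletion. The function names, the toy reference, and the choice to count indels as edited reads are hypothetical and not taken from the source.

```python
from collections import Counter


def classify_read(read: str, reference: str, target_pos: int) -> str:
    """Classify one amplicon read relative to the wildtype reference.

    target_pos is the 0-based index of the targeted C in the reference.
    """
    if len(read) != len(reference):
        return "indel"  # length change: insertion or deletion
    base = read[target_pos]
    if base == reference[target_pos]:
        return "unedited"
    if reference[target_pos] == "C" and base == "T":
        return "C_to_T"  # desired product
    return "other_substitution"  # undesired, e.g. C-to-G or C-to-A


def summarize(reads, reference, target_pos):
    """Return editing rate, product purity, and raw class counts."""
    counts = Counter(classify_read(r, reference, target_pos) for r in reads)
    total = sum(counts.values())
    edited = total - counts["unedited"]  # assumption: indels count as edited
    return {
        "editing_rate": edited / total if total else 0.0,
        # product purity: share of edited reads carrying the desired C-to-T
        "product_purity": counts["C_to_T"] / edited if edited else 0.0,
        "counts": dict(counts),
    }
```

In this sketch, a variant with high product purity would show most edited reads in the `C_to_T` class and few in `other_substitution` or `indel`.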

Abstract

Disclosed are new nucleic acid base-editing systems that are based on a fusion protein comprising a) an RNA-programmable nucleic acid recognition domain or other suitable nucleic acid recognition domain, b) a nucleobase editing domain, and c) a domain capable of binding to apurinic/apyrimidinic (AP) sites or abasic sites within single- or double-stranded DNA, and their uses in genome editing and other applications.

Description

NUCLEOBASE-EDITING FUSION PROTEIN SYSTEMS, COMPOSITIONS,
AND USES THEREOF
Related Applications
[0001] This application claims the benefit of U.S. Provisional Patent Application Serial No. 62/815,999, filed March 8, 2019, the entire contents of which are incorporated herein by reference.
Field
[0002] The present application relates to base editing using a novel protein. In particular, the novel proteins can bind to apurinic/apyrimidinic (AP) sites or abasic sites within single- or double-stranded DNA, and are useful for genome editing and other applications.
Background
[0003] Recent developments in gene editing aim at site-specific editing of single base pairs in DNA by employing a combination of a target-specific endonuclease, in particular an RNA-programmable endonuclease (e.g., a CRISPR Cas9), and a nucleobase editing enzyme such as, e.g., a cytidine deaminase (JP6206893). Such editing is believed to be obtained by a first step in which the cytidine deaminase converts a cytidine base into a uridine. The converted uridine is paired with adenine in the complementary DNA strand, and as part of the repair mechanism, the uridine is replaced with the normal complement of adenine, a thymidine, thus effecting a cytidine-to-uridine conversion that can lead (in case of base editing) to a desired permanent cytidine-to-thymidine change in the genome if, during DNA repair or replication, the uridine-containing DNA strand serves as the template and a thymidine is subsequently incorporated opposite the adenine. To promote usage of the uridine-containing strand as the template strand in nick-directed mismatch repair and to increase the efficiency of the base editing process, existing base editing systems bear a nickase function in the RNA-programmable endonuclease, which specifically introduces a nick in the DNA strand not targeted by the deaminase. A problem with this system is that cells have a natural mechanism for responding to uridine lesions that reduces the efficiency of the base editing process. This process is based on the enzyme uracil-DNA N-glycosylase (UNG), which removes the uracil base, producing an AP site or abasic site (together termed “AP site” unless otherwise distinguished). Subsequent excision of the AP site by the enzyme DNA-(apurinic or apyrimidinic site) lyase (AP lyase) results in a gap that is amenable to DNA repair with the base opposite the gap serving as the repair template.
Processing of the AP site by AP lyase is particularly undesirable in base editing applications because the formation of a gap in proximity to the intentionally introduced nick on the opposing strand effectively leads to a staggered double-stranded DNA break, which may, for example, induce cell death or poorly controllable formation of insertions and deletions at or close to the targeted site. An alternative DNA repair process that can also lead to undesirable outcomes during base editing involves non-templated DNA synthesis at the AP site (translesion synthesis; TLS), which frequently results in edits other than C-to-T. This is undesirable in base editing and, as reported in the art, this repair process is sought to be suppressed by inhibiting the UNG enzyme (e.g., by co-expression or co-delivery of a uracil-glycosylase inhibitor (UGI); WO2017070632, US9840699), resulting in an increase in the overall efficiency of the base editing process. However, because the UNG pathway constitutes a major mechanism that prevents cells from accumulating unwanted mutations, inhibition of this pathway is undesirable. Accordingly, providing an alternative to UGI that prevents the repair to the wildtype sequence via UNG is highly desirable.
[0004] Existing base editing systems including CRISPR-Cas9 systems have one or more of the following disadvantages:
a) They are too large to be carried in the genome of established therapeutically suitable viral transfection systems such as adeno-associated viruses (AAVs).
b) Their activity in non-host environments, for example in eukaryotic, and in particular in mammalian, environments is too low to provide a therapeutic effect.
c) Their nucleic acid sequence recognition lacks fidelity, leading to unwanted off-target effects that can, for example, make them unsuitable for use in gene therapy methods or other applications requiring high precision.
d) Their immunogenicity is too high, which can cause problems or limit their use for in vivo applications in mammals.
e) They require complex and/or long PAMs (protospacer adjacent motifs) that restrict target selection for the DNA targeting segments.
f) They inhibit important DNA repair processes of the cell, which is generally undesirable.
The novel base editing system provided herein exhibits advantageous characteristics over existing base editing systems.
Summary
[0005] In one aspect, provided herein is a base-editing fusion protein (BEFP) comprising: a) an AP-binding domain; b) a cytidine deaminase domain; and c) a nucleic acid recognition domain.
In some embodiments, the AP-binding domain comprises an SOS response-associated peptidase (SRAP) domain. In some embodiments, the SRAP domain is from 5-hydroxymethylcytosine binding, ESC specific (HMCES) or YedK, or a variant thereof. In some embodiments, the AP-binding domain comprises an SRAP domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises an SRAP domain from the amino acid sequence of SEQ ID NO: 5 or 6. In some embodiments, the cytidine deaminase domain is from a deaminase selected from the group consisting of: APOBEC2, APOBEC3, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4. In some embodiments, the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease or a variant thereof. In some embodiments, the nucleic acid recognition domain is from a modified CRISPR-Cas9 protein that can cleave only one strand of the target DNA or has no endonuclease activity. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or an N582A mutation, with respect to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
[0006] In some embodiments, according to any of the BEFPs described above, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or 4, or a variant amino sequence having at least about 85% sequence identity to SEQ ID NO: 2 or 4. In some embodiments, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or 4.
[0007] In another aspect, provided herein is a nucleic acid encoding a BEFP according to any of the embodiments described above. In some embodiments, the nucleic acid comprises the nucleotide sequence of SEQ ID NO: 1 or 3, or a variant nucleotide sequence having at least about 85% sequence identity to SEQ ID NO: 1 or 3. In some embodiments, the nucleic acid comprises the nucleotide sequence of SEQ ID NO: 1 or 3.
[0008] In another aspect, provided herein is a system comprising: (i) a BEFP according to any of the embodiments described above or a nucleic acid encoding the BEFP according to any of the embodiments described above; and (ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA.
[0009] In another aspect, provided herein is a method of modifying a targeted site of a double-stranded DNA, the method comprising contacting the double-stranded DNA with: (i) a BEFP according to any of the embodiments described above or a nucleic acid encoding the BEFP according to any of the embodiments described above; and (ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA. In some embodiments, the double-stranded DNA encodes a protein-of-interest (POI) or derivative thereof. In some embodiments, the double-stranded DNA is in a cell.
[0010] In another aspect, provided herein is a genetically modified cell in which the genome of the cell is edited by a method according to any of the embodiments described above.
[0011] In another aspect, provided herein is a method of treating a disease or condition associated with a protein-of-interest (POI) in a subject, comprising providing to a cell in the subject: (i) a BEFP according to any of the embodiments described above or a nucleic acid encoding the BEFP according to any of the embodiments described above; and (ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA. In some embodiments, the subject is a patient having or suspected of having the disease or condition or the subject is diagnosed with a risk of the disease or condition.
[0012] In another aspect, provided herein is a kit comprising one or more elements of a system according to any of the embodiments described above, and further comprising instructions for use.
Detailed Description of the Invention
[0013] Disclosed herein is a new base-editing system based on a base-editing fusion protein (BEFP) comprising a) an RNA-programmable nucleic acid recognition domain or other suitable nucleic acid recognition domain (e.g., an RNA-programmable nucleic acid recognition domain from a Cas protein), b) a nucleobase editing domain (e.g., a nucleobase editing domain from a cytidine deaminase), and c) a domain capable of binding to apurinic/apyrimidinic (AP) sites or abasic sites within single- or double-stranded DNA (e.g., an SOS response-associated peptidase, or SRAP, domain), and the use of such systems and BEFPs in genome editing and other applications.
[0014] One embodiment according to the invention is a BEFP comprising an RNA-programmable nucleic acid recognition domain from a Cas protein (e.g., Staphylococcus lugdunensis (Slu) Cas9), a cytidine deaminase domain, and an AP-binding domain from 5-hydroxymethylcytosine binding, ESC specific (HMCES). For example, in some embodiments, the BEFP comprises the amino acid sequence of SEQ ID NO: 4 or a variant thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 4.
[0015] Another embodiment according to the invention is a BEFP comprising an RNA-programmable nucleic acid recognition domain from a Cas protein (e.g., Staphylococcus lugdunensis (Slu) Cas9), a cytidine deaminase domain, and an AP-binding domain from YedK, an SOS response-associated peptidase. For example, in some embodiments, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or a variant thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2.
[0016] SRAP domains bind to AP sites that can be generated as an undesired side product during the process of base editing through the action of uracil-DNA glycosylase (UNG or UDG). Binding of a SRAP domain to an AP site is thought to protect this site from endonuclease activity and translesion synthesis and therefore, to diminish the generation of double-strand breaks and diversifying edits. Mechanistically, this might lead to reversion to wildtype sequences and might allow for re-targeting of the same site by the base editor. Thus, this novel BEFP provides a solution for base editing in the absence of UGI.
[0017] Disclosed herein are novel base editing systems, methods, reagents, and kits containing the same that allow editing of single bases in a target DNA with high efficiency and high precision. In some embodiments, the base-editing fusion protein (BEFP) comprises
a) a domain that can covalently bind to an AP site (an AP-binding domain);
b) a domain having a cytidine deaminase activity (a cytidine deaminase domain); and
c) a domain having a nucleic acid recognition activity (a nucleic acid recognition domain).
[0018] Optionally, the BEFP contains a suitable linker polypeptide between domains a, b, and c.
[0019] Further, optionally, the BEFP comprises one or more nuclear localization signals.
[0020] In some cases, the domain with covalent AP site binding activity comprises an SOS response-associated peptidase (SRAP) domain or a 5-hydroxymethyl cytosine binding, ES cell specific (HMCES) protein and is placed in front (N-terminally) of the other domains (the cytidine deaminase domain and the nucleic acid recognition domain) and components of the BEFP. In some embodiments, the BEFP starts with this domain at the N-terminus.
[0021] In some embodiments, a BEFP comprises the following components:
a. an AP-binding domain;
b. a linker peptide;
c. a first nuclear localization signal;
d. a first linker peptide;
e. a cytidine deaminase domain;
f. a second linker peptide;
g. a nucleic acid recognition domain;
h. a linker peptide; and
i. a second nuclear localization signal.
[0022] In some embodiments, a BEFP comprises the following components in the following order from N-terminus to C-terminus:
a. an AP-binding domain;
b. a first linker peptide;
c. a first nuclear localization signal;
d. a second linker peptide;
e. a cytidine deaminase domain;
f. a third linker peptide;
g. a nucleic acid recognition domain;
h. a fourth linker peptide; and
i. a second nuclear localization signal.
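
The N-terminus to C-terminus arrangement listed above can be expressed as a simple ordered data structure. The following is a minimal sketch only: the `Component` class, the `assemble` helper, and the placeholder sequence strings are hypothetical illustrations and do not represent the actual sequences of SEQ ID NO: 2 or 4.

```python
from dataclasses import dataclass

# Component order from N-terminus to C-terminus, as listed in the text.
BEFP_ORDER = [
    "AP-binding domain",
    "first linker peptide",
    "first nuclear localization signal",
    "second linker peptide",
    "cytidine deaminase domain",
    "third linker peptide",
    "nucleic acid recognition domain",
    "fourth linker peptide",
    "second nuclear localization signal",
]


@dataclass(frozen=True)
class Component:
    role: str      # one of the entries in BEFP_ORDER
    sequence: str  # amino acid sequence (placeholder strings in this sketch)


def assemble(components):
    """Concatenate components N-to-C after checking they follow BEFP_ORDER."""
    roles = [c.role for c in components]
    if roles != BEFP_ORDER:
        raise ValueError(f"unexpected component order: {roles}")
    return "".join(c.sequence for c in components)
```

A check of this kind would reject, for example, a construct in which the cytidine deaminase domain precedes the AP-binding domain.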
[0023] In some embodiments, a BEFP according to the invention comprises the amino acid sequence of SEQ ID NO: 2 or 4, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2 or 4. In some embodiments, a nucleic acid encoding a BEFP according to the invention comprises a nucleic acid sequence encoding the amino acid sequence of SEQ ID NO: 2 or 4 or a variant amino acid sequence thereof. In some embodiments, a nucleic acid encoding a BEFP comprises the nucleic acid sequence of SEQ ID NO: 1 or 3, or a variant nucleic acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 1 or 3.
[0024] The domains of the polypeptides according to the invention may either be connected directly or via a linker peptide. When two or more linker peptides are present in the domain, the linker peptides may be the same or different. Suitable linker peptides include oligopeptide or polypeptide sequences. Linker peptides may be rigid or flexible, and may contain sites designed to be cleaved by protease activity. Such linker peptides may function to increase stability or folding of the domains, increase expression, enable targeting, or improve other biological activity. Various linker peptides are known in the art. See, e.g., Chen, et al. Adv. Drug Deliv. Rev. 2013, 65(10): 1357-1369. In some embodiments, those sequences allow some flexibility between the domains they connect. In some embodiments, the linker peptides comprise one or more of the amino acid sequences listed in paragraph 0025 of WO2017070632 (A2).
[0025] Nuclear localization signals (NLS) are polypeptide sequences in a protein that enable transport of the protein into the nucleus of eukaryotic cells. When two or more NLS sequences are present in the protein, the NLS sequences may be the same or different. Various NLS sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in WO/2001/038547, the contents of which are incorporated herein by reference for their disclosure of exemplary nuclear localization sequences. Such NLSs include, without limitation, the nucleoplasmin bipartite NLS, the c-myc nuclear localization sequence, and the hRNPA1 M9 nuclear localization sequence. Exemplary NLSs include those listed in paragraph 00204 of WO2017070632 (A2).
DEFINITIONS
[0026] The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. Thus, this term includes, but is not limited to, single-, double-, and multi-stranded DNA and RNA, genomic DNA, cDNA, DNA-RNA hybrids/triple helices, and polymers including purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The terms “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiments being described, single-stranded (such as sense or antisense) and double-stranded nucleic acids.
[0027] “Oligonucleotide” generally refers to single- or double-stranded polynucleotides at least about 5 nucleotides in length, unless otherwise indicated. Oligonucleotides are also known as “oligomers” or “oligos” and may be isolated from genes or chemically synthesized by methods known in the art.
[0028] “Genomic DNA” refers to the DNA of a genome of an organism including, but not limited to, the DNA of the genome of a bacterium, fungus, archaeon, protist, virus, plant, or animal.
[0029] The term “manipulating” DNA encompasses binding, nicking one strand, or cleaving, e.g., cutting both strands of the DNA; or encompasses modifying or editing the DNA or a polypeptide associated with the DNA. Manipulating DNA can silence, activate, or modulate (either increase or decrease) the expression of an RNA or polypeptide encoded by the DNA, or prevent or enhance the binding of a polypeptide to DNA.
[0030] By “hybridizable” or “complementary” or “substantially complementary” it is meant that a nucleic acid (e.g., RNA or DNA) includes a sequence of nucleotides that enables it to non-covalently bind, e.g., form Watson-Crick base pairs and/or G/U base pairs, “anneal”, or “hybridize,” to another nucleic acid in a sequence-specific, antiparallel, manner (e.g., a nucleic acid specifically binds to a complementary nucleic acid) under the appropriate in vitro and/or in vivo conditions of temperature and solution ionic strength. In the context of this disclosure, a guanine (G) of a protein-binding segment (dsRNA duplex) of a guide RNA molecule is considered complementary to a uracil (U), and vice versa. As such, when a G/U base-pair can be made at a given nucleotide position of a protein-binding segment (dsRNA duplex) of a guide RNA molecule, the position is not considered to be non-complementary, but is instead considered to be complementary.
[0031] It is understood in the art that the sequence of a nucleic acid need not be 100% complementary to that of a target nucleic acid to be specifically hybridizable. Moreover, a nucleic acid may hybridize over one or more segments such that intervening or adjacent segments are not involved in the hybridization event (e.g., a loop structure or hairpin structure). A nucleic acid can include at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% sequence complementarity to a target region within the target nucleic acid sequence to which it is targeted. For example, an antisense nucleic acid in which 18 of 20 nucleotides of the antisense compound are complementary to a target region, and would therefore specifically hybridize, would represent 90 percent complementarity. In this example, the remaining non-complementary nucleotides may be clustered or interspersed with complementary nucleotides and need not be contiguous to each other or to complementary nucleotides. Percent complementarity between particular stretches of nucleic acid sequences within nucleic acids can be determined routinely using methods known in the art, for example, a BLAST program (basic local alignment search tools) and/or PowerBLAST program (Altschul et al., J. Mol. Biol., 1990, 215, 403-410; Zhang and Madden, Genome Res., 1997, 7, 649-656) or by using the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison, Wis.), using default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2, 482-489).
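
The 18-of-20 example above can be reproduced with a short calculation. This is a minimal sketch, assuming two equal-length DNA strands written 5' to 3', ungapped antiparallel pairing, and Watson-Crick pairs only; the function name is hypothetical, and tools such as BLAST or Gap perform a full alignment rather than this position-by-position comparison.

```python
# Watson-Crick pairing for DNA (assumption: DNA only, no G/U wobble pairs).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(antisense: str, target: str) -> float:
    """Percent of antisense positions that pair with the target region.

    Both strands are written 5'->3'; pairing is antiparallel, so the
    antisense strand is compared against the reversed target.
    """
    if len(antisense) != len(target):
        raise ValueError("sequences must have equal length")
    paired = sum(
        COMPLEMENT[a] == t for a, t in zip(antisense, reversed(target))
    )
    return 100.0 * paired / len(antisense)
```

For a 20-nucleotide antisense strand with two mismatched positions, the function returns 90.0 regardless of whether the mismatches are adjacent or interspersed.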
[0032] The term “peptide” generally refers to a chain of 50 amino acids or fewer. The terms “polypeptide” and “protein” are used interchangeably herein, and refer to a polymeric form of amino acids of any length, which can include coded and non-coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones.
[0033] “Binding” as used herein (e.g., with reference to an RNA-binding domain of a polypeptide) refers to a non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). While in a state of non-covalent interaction, the macromolecules are said to be “associated” or “interacting” or “binding” (e.g., when a molecule X is said to interact with a molecule Y, it is meant the molecule X binds to molecule Y in a non-covalent manner). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), but some portions of a binding interaction may be sequence-specific. Binding interactions are generally characterized by a dissociation constant (Kd) of less than 10⁻⁶ M, less than 10⁻⁷ M, less than 10⁻⁸ M, less than 10⁻⁹ M, less than 10⁻¹⁰ M, less than 10⁻¹¹ M, less than 10⁻¹² M, less than 10⁻¹³ M, less than 10⁻¹⁴ M, or less than 10⁻¹⁵ M. “Affinity” refers to the strength of binding, increased binding affinity being correlated with a lower Kd.
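
The inverse relation between Kd and affinity can be illustrated numerically. The sketch below assumes a simple one-site equilibrium binding model with the ligand in excess, which is a textbook simplification and not something stated in the source; the function name and the example concentrations are hypothetical.

```python
def fraction_bound(ligand_conc_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied at equilibrium (one-site model).

    Assumes free ligand concentration is approximately the total ligand
    concentration, so that fraction bound = [L] / ([L] + Kd).
    """
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)
```

Under this model, half of the sites are occupied when the ligand concentration equals Kd, and at a fixed ligand concentration a lower Kd gives a higher occupied fraction.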
[0034] By “binding domain” it is meant a protein domain that can bind non-covalently to another molecule. A binding domain can bind to, for example, a DNA molecule (and can be termed a “DNA-binding protein”), an RNA molecule (and can be termed an “RNA-binding protein”) and/or a protein molecule (and can be termed a “protein-binding protein”). In the case of a protein domain-binding protein, the binding domain can bind to itself (forming homodimers, homotrimers, etc.) and/or it can bind to one or more molecules of a different protein or proteins.
[0035] The term “conservative amino acid substitution” refers to the interchangeability in proteins of amino acid residues having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; a group of amino acids having acidic side chains consists of glutamate and aspartate; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
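
The side-chain groups listed above can be encoded as a lookup table to test whether a given substitution is conservative. This is a minimal sketch using standard one-letter amino acid codes; the dictionary keys and the function name are hypothetical labels, and treating any two members of the same group as interchangeable is a simplifying assumption.

```python
# Side-chain groups as listed in the text, in one-letter amino acid codes.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),         # Gly, Ala, Val, Leu, Ile
    "aliphatic-hydroxyl": set("ST"),   # Ser, Thr
    "amide": set("NQ"),                # Asn, Gln
    "aromatic": set("FYW"),            # Phe, Tyr, Trp
    "basic": set("KRH"),               # Lys, Arg, His
    "acidic": set("ED"),               # Glu, Asp
    "sulfur": set("CM"),               # Cys, Met
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues share one of the side-chain groups."""
    return original != replacement and any(
        original in group and replacement in group
        for group in SIDE_CHAIN_GROUPS.values()
    )
```

For example, a valine-to-leucine change is conservative under this table, whereas an aspartate-to-lysine change is not.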
[0036] A nucleic acid or polypeptide has a certain percent “sequence identity” to another nucleic acid or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence identity can be determined using a number of different methods. To determine sequence identity, sequences can be aligned using various methods and computer programs (e.g., BLAST, T-COFFEE, MUSCLE, MAFFT, etc.), available over the worldwide-web at sites including ncbi.nlm.nih.gov/BLAST, ebi.ac.uk/Tools/msa/tcoffee, ebi.ac.uk/Tools/msa/muscle, mafft.cbrc.jp/alignment/software. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10. In some embodiments of the disclosure, sequence alignments standard in the art are used according to the disclosure to determine amino acid residues in a BEFP domain that “correspond to” amino acid residues in another polypeptide from which the BEFP domain is derived, e.g., a Cas9 endonuclease. The amino acid residues of a BEFP that correspond to amino acid residues of one or more other polypeptides appear at the same position in alignments of the sequences.
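
Once two sequences have been aligned (for instance with one of the programs named above), percent sequence identity reduces to counting identical aligned positions. The following sketch assumes the alignment is supplied as two equal-length strings with `-` as the gap character, and it divides by the number of aligned columns. Other denominator conventions exist (for example, the length of the shorter sequence), so this choice is an assumption, as is the function name.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(a == b and a != gap for a, b in columns)
    return 100.0 * matches / len(columns)
```

A variant sequence would meet an "at least about 85% sequence identity" threshold under this convention if the returned value is roughly 85.0 or higher.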
[0037] A DNA sequence that “encodes” a particular RNA is a DNA nucleic acid sequence that is transcribed into the RNA. A polydeoxyribonucleotide may encode an RNA (mRNA) containing a sequence that is translated into protein, or a polydeoxyribonucleotide may encode an RNA that is not translated into protein (e.g., tRNA, rRNA, siRNA, miRNA, or guide RNA; also called “non-coding” RNA or “ncRNA”). A “protein coding sequence” or a sequence that encodes a particular protein or polypeptide, is a nucleic acid sequence that is transcribed into mRNA (in the case of DNA) and is translated (in the case of mRNA) into a polypeptide in vitro or in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5' terminus (N-terminus) and a translation stop nonsense codon at the 3' terminus (C-terminus). A coding sequence can include, but is not limited to, cDNA from prokaryotic or eukaryotic mRNA, genomic DNA sequences from prokaryotic or eukaryotic DNA, and synthetic nucleic acids. A transcription termination sequence is generally located at 3' of the coding sequence.
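
The coding-sequence boundaries described above (a start codon at the 5' end and an in-frame stop codon at the 3' end) can be located with a simple scan. This is an illustrative sketch only, assuming a DNA coding strand written 5' to 3', the ATG start codon, and the three stop codons of the standard genetic code; it ignores introns, alternative start codons, and the reverse strand, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def find_coding_sequence(dna: str):
    """Return (start, end) of the first ATG...stop open reading frame.

    Indices are 0-based, end is exclusive and includes the stop codon.
    Returns None if no ATG is followed by an in-frame stop codon.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon in the frame set by this ATG.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = dna.find("ATG", start + 1)
    return None
```

For the toy input `CCATGAAATTTTAGGG`, the scan returns the span covering `ATGAAATTTTAG`.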
[0038] As used herein, a “promoter sequence” or “promoter” is a DNA regulatory region capable of binding RNA polymerase and initiating transcription of a downstream (3' direction) coding or non-coding sequence. As used herein, the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence is a transcription initiation site, as well as protein binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters often, but not always, contain “TATA” boxes and “CAAT” boxes. Various promoters, including inducible promoters, may be used to drive the various vectors of the present disclosure. A promoter can be a constitutively active promoter (e.g., a promoter that is constitutively in an active “ON” state), it may be an inducible promoter (e.g., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (e.g., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.), and it may be a temporally restricted promoter (e.g., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice). Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III, pol IV, and pol V). Exemplary promoters include, but are not limited to, the SV40 early promoter, mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter, a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE), a Rous sarcoma virus (RSV) promoter, a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)), an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep 1;31(17)), a human H1 promoter (H1), and the like. Examples of inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose induced promoter, heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc. Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; RNA polymerase, e.g., T7 RNA polymerase; an estrogen receptor; an estrogen receptor fusion; etc.
[0039] In some embodiments, the promoter is a spatially restricted promoter (e.g., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (e.g,“ON”) in a subset of specific cells. Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc. Any suitable spatially restricted promoter may be used and the choice of suitable promoter (e.g, a brain specific promoter, a promoter that drives expression in a subset of neurons, a promoter that drives expression in the germline, a promoter that drives expression in the lungs, a promoter that drives expression in muscles, a promoter that drives expression in islet cells of the pancreas, etc ) will depend on the organism. For example, various spatially restricted promoters are known for plants, flies, worms, mammals, mice, etc. Thus, a spatially restricted promoter can be used to regulate the expression of a nucleic acid encoding a site-specific modifying enzyme in a wide variety of different tissues and cell types, depending on the organism. Some spatially restricted promoters are also temporally restricted such that the promoter is in the“ON” state or“OFF” state during specific stages of embryonic development or during specific stages of a biological process ( e.g ., hair follicle cycle in mice). For illustration purposes, examples of spatially restricted promoters include, but are not limited to, neuron-specific promoters, adipocyte-specific promoters, cardiomyocyte-specific promoters, smooth muscle-specific promoters, photoreceptor- specific promoters, etc. 
Neuron-specific spatially restricted promoters include, but are not limited to, a neuron-specific enolase (NSE) promoter (see, e.g., EMBL HSEN02, X51956); an aromatic amino acid decarboxylase (AADC) promoter; a neurofilament promoter (see, e.g., GenBank HUMNFL, L04147); a synapsin promoter (see, e.g., GenBank HUMSYNIB, M55301); a thy-1 promoter (see, e.g., Chen et al. (1987) Cell 51:7-19; and Llewellyn, et al. (2010) Nat. Med. 16(10):1161-1166); a serotonin receptor promoter (see, e.g., GenBank S62283); a tyrosine hydroxylase promoter (TH) (see, e.g., Oh et al. (2009) Gene Ther. 16:437; Sasaoka et al. (1992) Mol. Brain Res. 16:274; Boundy et al. (1998) J. Neurosci. 18:9989; and Kaneda et al. (1991) Neuron 6:583-594); a GnRH promoter (see, e.g., Radovick et al. (1991) Proc. Natl. Acad. Sci. USA 88:3402-3406); an L7 promoter (see, e.g., Oberdick et al. (1990) Science 248:223-226); a DNMT promoter (see, e.g., Bartge et al. (1988) Proc. Natl. Acad. Sci. USA 85:3648-3652); an enkephalin promoter (see, e.g., Comb et al. (1988) EMBO J. 17:3793-3805); a myelin basic protein (MBP) promoter; a Ca2+-calmodulin-dependent protein kinase II-alpha (CamKIIα) promoter (see, e.g., Mayford et al. (1996) Proc. Natl. Acad. Sci. USA 93:13250; and Casanova et al. (2001) Genesis 31:37); a CMV enhancer/platelet-derived growth factor-β promoter (see, e.g., Liu et al. (2004) Gene Therapy 11:52-60); and the like.
[0040] The terms “DNA regulatory sequences,” “control elements,” and “regulatory elements,” used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, and the like, that provide for and/or regulate transcription of a nucleic acid sequence (e.g., a sequence encoding a guide RNA or a sequence encoding a BEFP) and/or regulate translation of an encoded polypeptide.
[0041] The term “naturally-occurring” or “unmodified” as used herein as applied to a nucleic acid, a polypeptide, a cell, or an organism, refers to a nucleic acid, polypeptide, cell, or organism that is found in nature. For example, a polypeptide or nucleic acid sequence that is present in an organism (including in a virus) that can be isolated from a source in nature and that has not been intentionally modified by a human in the laboratory is naturally occurring.
[0042] “Heterologous,” as used herein, means a nucleotide or peptide that is not found in the native nucleic acid or protein, respectively. A BEFP described herein may comprise the RNA-binding domain of the BEFP (or a variant thereof) fused to a heterologous polypeptide sequence (e.g., a polypeptide sequence from a protein other than BEFP). The heterologous polypeptide may exhibit an activity (e.g., enzymatic activity) that will also be exhibited by the BEFP (e.g., methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.). A heterologous nucleic acid may be linked to a naturally-occurring nucleic acid (or a variant thereof) (e.g., by genetic engineering) to generate a fusion nucleic acid encoding a fusion polypeptide. As another example, in a fusion variant BEFP, a variant BEFP may be fused to a heterologous polypeptide (e.g., a polypeptide other than BEFP), which exhibits an activity that will also be exhibited by the fusion variant BEFP. A heterologous nucleic acid may be linked to a variant BEFP (e.g., by genetic engineering) to generate a nucleic acid encoding a fusion variant BEFP. “Heterologous,” as used herein, additionally means a nucleotide or polypeptide in a cell that is not its native cell.
[0043] The term “cognate” refers to two biomolecules that normally interact or co-exist in nature.
[0044] “Recombinant,” as used herein, means that a particular nucleic acid (DNA or RNA) or vector is the product of various combinations of cloning, restriction, polymerase chain reaction (PCR) and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. DNA sequences encoding polypeptides can be assembled from cDNA fragments or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid that is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system. Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5' or 3' from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences”, below). In addition or alternatively, a DNA sequence encoding RNA (e.g., guide RNA) that is not translated may also be considered recombinant. Thus, e.g., the term “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination can be accomplished by chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is generally done to replace a codon with a codon encoding the same amino acid, a conservative amino acid, or a non-conservative amino acid. In addition or alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.
When a recombinant nucleic acid encodes a polypeptide, the sequence of the encoded polypeptide can be naturally occurring (“wild type”) or can be a variant (e.g., a mutant) of the naturally occurring sequence. Thus, the term “recombinant” polypeptide does not necessarily refer to a polypeptide whose sequence does not naturally occur. Instead, a “recombinant” polypeptide is encoded by a recombinant DNA sequence, but the sequence of the polypeptide can be naturally occurring (“wild type”) or non-naturally occurring (e.g., a variant, a mutant, etc.). Thus, a “recombinant” polypeptide is the result of human intervention, but may be a naturally occurring amino acid sequence. The term “non-naturally occurring” includes molecules that are markedly different from their naturally occurring counterparts, including chemically modified or mutated molecules.
[0045] A “vector” or “expression vector” is a replicon, such as plasmid, phage, virus, or cosmid, to which another DNA segment, e.g., an “insert”, may be attached so as to bring about the replication of the attached segment in a cell.
[0046] An “expression cassette” includes a DNA coding sequence operably linked to a promoter. “Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For example, a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression. The terms “recombinant expression vector,” or “DNA construct” are used interchangeably herein to refer to a DNA molecule comprising a vector and at least one insert. Recombinant expression vectors are generally generated for the purpose of expressing and/or propagating the insert(s), or for the construction of other recombinant nucleotide sequences. The nucleic acid(s) may or may not be operably linked to a promoter sequence and may or may not be operably linked to DNA regulatory sequences.
[0047] The term “operably linked”, as used herein, denotes a physical or functional linkage between two or more elements, e.g., polypeptide sequences or nucleic acid sequences, which permits them to operate in their intended fashion. For example, an operable linkage between a nucleic acid of interest and a regulatory sequence (for example, a promoter) is a functional link that allows for expression of the nucleic acid of interest. In this sense, the term “operably linked” refers to the positioning of a regulatory region and a coding sequence to be transcribed so that the regulatory region is effective for regulating transcription or translation of the coding sequence of interest. In some embodiments disclosed herein, the term “operably linked” denotes a configuration in which a regulatory sequence is placed at an appropriate position relative to a sequence that encodes a polypeptide or functional RNA such that the control sequence directs or regulates the expression or cellular localization of the mRNA encoding the polypeptide, the polypeptide, and/or the functional RNA. Thus, a promoter is in operable linkage with a nucleic acid sequence if it can mediate transcription of the nucleic acid sequence. Operably linked elements may be contiguous or non-contiguous.
[0048] A cell has been “genetically modified” or “transformed” or “transfected” by exogenous DNA, e.g., a recombinant expression vector, when such DNA has been introduced inside the cell. The presence of the exogenous DNA results in permanent or transient genetic change. The transforming DNA may or may not be integrated (covalently linked) into the genome of the cell.
[0049] In prokaryotes, yeast, and mammalian cells, for example, a transforming DNA can be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA is integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones that include a population of daughter cells containing the transforming DNA integrated into chromosomal DNA. A “clone” is a population of cells derived from a single cell or common ancestor by mitosis. A “cell line” is a clone of a primary cell that is capable of stable growth in vitro for many generations.
[0050] Suitable methods of genetic modification (also referred to as “transformation”) include, but are not limited to, e.g., viral or bacteriophage infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.
[0051] A “host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell (e.g., bacterial or archaeal cell), or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid, and include the progeny of the original cell which has been transformed by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement as the original parent, due to natural, accidental, or deliberate mutation. A “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector. For example, a bacterial host cell is a genetically modified bacterial host cell by virtue of introduction into a suitable bacterial host cell of an exogenous nucleic acid (e.g., a plasmid or recombinant expression vector) and a eukaryotic host cell is a genetically modified eukaryotic host cell (e.g., a mammalian germ cell), by virtue of introduction into a suitable eukaryotic host cell of an exogenous nucleic acid.
[0052] A “target DNA” as used herein is a polydeoxyribonucleotide that includes a “target site” or “target sequence.” The terms “target site,” “target sequence,” “target protospacer DNA,” or “protospacer-like sequence” are used interchangeably herein to refer to a nucleic acid sequence present in a target DNA to which a DNA-targeting segment (also referred to as a “spacer”) of a guide RNA can bind, provided permissive conditions for binding exist. For example, the target site (or target sequence) 5'-GAGCATATC-3' within a target DNA is targeted by (or is bound by, or hybridizes with, or is complementary to) the RNA sequence 5'-GAUAUGCUC-3'. Suitable DNA/RNA binding conditions include physiological conditions normally present in a cell. Other suitable DNA/RNA binding conditions (e.g., conditions in a cell-free system) are known in the art; see, e.g., Sambrook, supra. The strand of the target DNA that is complementary to and hybridizes with the guide RNA is referred to as the “complementary strand” and the strand of the target DNA that is complementary to the “complementary strand” (and is therefore not complementary to the guide RNA) is referred to as the “non-complementary strand.”
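The complementarity in the example above can be checked with a short sketch. It is illustrative only and not part of the disclosed methods; the function and variable names are hypothetical, and only standard Watson-Crick pairing (with U in place of T in RNA) is assumed.

```python
# Illustrative sketch only; names are hypothetical and not from the disclosure.
# Watson-Crick pairing of a DNA base with its complementary RNA base.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(target_dna: str) -> str:
    """Return the RNA (written 5'->3') complementary to a DNA site (written 5'->3')."""
    # Hybridized strands are antiparallel, so the DNA is read in reverse
    # while each base is replaced by its pairing partner.
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(target_dna.upper()))


if __name__ == "__main__":
    print(complementary_rna("GAGCATATC"))  # GAUAUGCUC
```

Applied to the target site 5'-GAGCATATC-3', the sketch returns the RNA sequence 5'-GAUAUGCUC-3' given in the example.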
[0053] By “site-specific modifying enzyme” or “RNA-binding site-specific modifying enzyme” is meant a polypeptide that binds RNA and is targeted to a specific DNA sequence, such as a BEFP. A site-specific modifying enzyme as described herein is targeted to a specific DNA sequence by the RNA molecule to which it is bound. The RNA molecule includes a sequence that binds, hybridizes to, or is complementary to a target sequence within the target DNA, thus targeting the bound polypeptide to a specific location within the target DNA (the target sequence). By “cleavage” it is meant the breakage of the covalent backbone of a DNA molecule. Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in the production of either blunt ends or staggered ends. In certain embodiments, a complex comprising a guide RNA and a site-specific modifying enzyme is used for targeted double-stranded DNA cleavage.
[0054] “Nuclease” and “endonuclease” are used interchangeably herein to mean an enzyme that possesses endonucleolytic catalytic activity for nucleic acid cleavage.
[0055] By “cleavage domain” or “active domain” or “nuclease domain” of a nuclease it is meant the polypeptide sequence or domain within the nuclease which possesses the catalytic activity for DNA cleavage. A cleavage domain can be contained in a single polypeptide chain or cleavage activity can result from the association of two (or more) polypeptides. A single nuclease domain may consist of more than one isolated stretch of amino acids within a given polypeptide.
[0056] The “guide sequence” or “DNA-targeting segment” or “DNA-targeting sequence” or “spacer” includes a nucleotide sequence that is complementary to a specific sequence within a target DNA (the complementary strand of the target DNA) designated the “protospacer-like” sequence herein. The protein-binding segment (or “protein-binding sequence”) interacts with a site-specific modifying enzyme. When the site-specific modifying enzyme is a BEFP or BEFP-related polypeptide (described in more detail below), site-specific cleavage of the target DNA occurs at locations determined by both (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif (referred to as the protospacer adjacent motif (PAM)) in the target DNA. The protein-binding segment of a guide RNA includes, in part, two complementary stretches of nucleotides that hybridize to one another to form a double-stranded RNA duplex (dsRNA duplex). In some embodiments, a nucleic acid (e.g., a guide RNA, a nucleic acid encoding a guide RNA; a nucleic acid encoding a site-specific modifying enzyme; etc.) includes a modification or sequence that provides for an additional desirable feature (e.g., modified or regulated stability; subcellular targeting; tracking, e.g., a fluorescent label; a binding site for a protein or protein complex, etc.).
Non-limiting examples include: a 5' cap (e.g., a 7-methylguanylate cap (m7G)); a 3' polyadenylated tail (e.g., a 3' poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and/or protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (e.g., a hairpin); a modification or sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.
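The two determinants of site selection noted in paragraph [0056], base-pairing complementarity and a PAM, can be sketched as a simple sequence search. This is a minimal illustration under stated assumptions, not the disclosed method: the PAM pattern (any base followed by GG), its placement immediately 3' of the protospacer on the non-complementary strand, the requirement for a perfect match, and all names are hypothetical choices made only for the example.

```python
import re
from typing import List


def find_candidate_sites(strand: str, spacer_rna: str, pam_pattern: str = "[ACGT]GG") -> List[int]:
    """Return 0-based start positions of spacer-matching sites followed by a PAM.

    `strand` is the non-complementary strand written 5'->3'. Because the spacer
    is complementary to the complementary strand, the site on this strand reads
    like the spacer itself with T in place of U.
    ASSUMPTION: the PAM pattern and its 3' placement are illustrative only.
    """
    protospacer = spacer_rna.upper().replace("U", "T")
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(f"(?={re.escape(protospacer)}{pam_pattern})")
    return [match.start() for match in pattern.finditer(strand.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: the first site is followed by AGG, the second by TTT.
    example = "TTGAGCATATCAGGCCGAGCATATCTTT"
    print(find_candidate_sites(example, "GAGCAUAUC"))  # [2]
```

In this toy sequence only the first occurrence is reported, because the second lacks the assumed motif. The sketch scans a single strand and ignores mismatches, so it understates what a real search would need to consider.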
[0057] In some embodiments, a guide RNA includes an additional segment at either the 5' or 3' end that provides for any of the features described above. For example, a suitable third segment can include a 5' cap (e.g., a 7-methylguanylate cap (m7G)); a 3' polyadenylated tail (e.g., a 3' poly(A) tail); a riboswitch sequence (e.g., to allow for regulated stability and/or regulated accessibility by proteins and protein complexes); a stability control sequence; a sequence that forms a dsRNA duplex (e.g., a hairpin); a sequence that targets the RNA to a subcellular location (e.g., nucleus, mitochondria, chloroplasts, and the like); a modification or sequence that provides for tracking (e.g., direct conjugation to a fluorescent molecule, conjugation to a moiety that facilitates fluorescent detection, a sequence that allows for fluorescent detection, etc.); a modification or sequence that provides a binding site for proteins (e.g., proteins that act on DNA, including transcriptional activators, transcriptional repressors, DNA methyltransferases, DNA demethylases, histone acetyltransferases, histone deacetylases, and the like); and combinations thereof.
[0058] A guide RNA and a site-specific modifying enzyme such as a BEFP may form a ribonucleoprotein complex (e.g., bind via non-covalent interactions). The guide RNA provides target specificity to the complex by comprising a nucleotide sequence that is complementary to a sequence of a target DNA. The site-specific modifying enzyme of the complex provides the modifying activity. In other words, the site-specific modifying enzyme is guided to a target DNA sequence (e.g., a target sequence in a chromosomal nucleic acid; a target sequence in an extrachromosomal nucleic acid, e.g., an episomal nucleic acid, a minicircle, etc.; a target sequence in a mitochondrial nucleic acid; a target sequence in a chloroplast nucleic acid; a target sequence in a plasmid; etc.) by virtue of its association with the protein-binding segment of the guide RNA. RNA aptamers are known in the art and are generally a synthetic version of a riboswitch. The terms “RNA aptamer” and “riboswitch” are used interchangeably herein to encompass both synthetic and natural nucleic acid sequences that provide for inducible regulation of the structure (and therefore the availability of specific sequences) of the RNA molecule of which they are part. RNA aptamers generally include a sequence that folds into a particular structure (e.g., a hairpin), which specifically binds a particular drug (e.g., a small molecule). Binding of the drug causes a structural change in the folding of the RNA, which changes a feature of the nucleic acid of which the aptamer is a part. As non-limiting examples:
(i) an activator-RNA with an aptamer may not be able to bind to the cognate targeter RNA unless the aptamer is bound by the appropriate drug; (ii) a targeter-RNA with an aptamer may not be able to bind to the cognate activator-RNA unless the aptamer is bound by the appropriate drug; and (iii) a targeter-RNA and an activator-RNA, each comprising a different aptamer that binds a different drug, may not be able to bind to each other unless both drugs are present. As illustrated by these examples, a two-molecule guide RNA can be designed to be inducible.
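The three aptamer examples above amount to a simple gating rule, which can be sketched as follows. This is a deliberately simplified model offered only for illustration: it treats binding as all-or-nothing, whereas the text says only that pairing "may not" occur without the drug, and the function name, parameters, and drug labels are hypothetical.

```python
from typing import Optional, Set


def guide_can_assemble(
    activator_drug: Optional[str],
    targeter_drug: Optional[str],
    drugs_present: Set[str],
) -> bool:
    """Return True if the activator-RNA and targeter-RNA are modeled as able to pair.

    Each argument names the drug bound by that RNA's aptamer, or None if the
    RNA carries no aptamer. ASSUMPTION: pairing is blocked whenever a required
    drug is absent; real behavior is unlikely to be this binary.
    """
    required = {drug for drug in (activator_drug, targeter_drug) if drug is not None}
    # Pairing is permitted only when every required drug is present.
    return required <= drugs_present


if __name__ == "__main__":
    print(guide_can_assemble("drugA", None, {"drugA"}))     # case (i) with drug present
    print(guide_can_assemble("drugA", "drugB", {"drugA"}))  # case (iii) with one drug missing
```

Under this model, case (iii) behaves as a logical AND of the two drugs, which is the sense in which a two-molecule guide RNA can be designed to be inducible.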
[0059] Examples of aptamers and riboswitches can be found, for example, in: Nakamura et al., Genes Cells. 2012 May; 17(5):344-64; Vavalle et al., Future Cardiol. 2012 May; 8(3):371-82; Citartan et al., Biosens Bioelectron. 2012 Apr 15; 34(1):1-11; and Liberman et al., Wiley Interdiscip Rev RNA. 2012 May-Jun; 3(3):369-84; all of which are herein incorporated by reference in their entireties.
[0060] The choice of method of genetic modification is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (e.g., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.
[0062] The term “stem cell” is used herein to refer to a cell (e.g., plant stem cell, vertebrate stem cell) that has the ability both to self-renew and to generate a differentiated cell type (see Morrison et al. (1997) Cell 88:287-298). In the context of cell ontogeny, the adjective “differentiated”, or “differentiating” is a relative term. A “differentiated cell” is a cell that has progressed further down the developmental pathway than the cell it is being compared with. Thus, pluripotent stem cells (described below) can differentiate into lineage-restricted progenitor cells (e.g., mesodermal stem cells), which in turn can differentiate into cells that are further restricted (e.g., neuron progenitors), which can differentiate into end-stage cells (e.g., terminally differentiated cells, e.g., neurons, cardiomyocytes, etc.), which play a characteristic role in a certain tissue type, and may or may not retain the capacity to proliferate further. Stem cells may be characterized by both the presence of specific markers (e.g., proteins, RNAs, etc.) and the absence of specific markers. Stem cells may also be identified by functional assays both in vitro and in vivo, particularly assays relating to the ability of stem cells to give rise to multiple differentiated progeny.
[0063] Stem cells of interest include pluripotent stem cells (PSCs). The term “pluripotent stem cell” or “PSC” is used herein to mean a stem cell capable of producing all cell types of the organism. Therefore, a PSC can give rise to cells of all germ layers of the organism (e.g., the endoderm, mesoderm, and ectoderm of a vertebrate). Pluripotent cells are capable of forming teratomas and of contributing to ectoderm, mesoderm, or endoderm tissues in a living organism. Pluripotent stem cells of plants are capable of giving rise to all cell types of the plant (e.g., cells of the root, stem, leaves, etc.).
[0064] PSCs of animals can be derived in a number of different ways. For example, embryonic stem cells (ESCs) are derived from the inner cell mass of an embryo (Thomson et al., Science. 1998 Nov 6;282(5391):1145-7) whereas induced pluripotent stem cells (iPSCs) are derived from somatic cells (Takahashi et al., Cell. 2007 Nov 30;131(5):861-72; Takahashi et al., Nat Protoc. 2007;2(12):3081-9; Yu et al., Science. 2007 Dec 21;318(5858):1917-20. Epub 2007 Nov 20).
[0065] Because the term PSC refers to pluripotent stem cells regardless of their derivation, the term PSC encompasses the terms ESC and iPSC, as well as the term embryonic germ stem cells (EGSC), which are another example of a PSC. PSCs may be in the form of an established cell line, they may be obtained directly from primary embryonic tissue, or they may be derived from a somatic cell. PSCs can be target cells of the methods described herein.
[0066] By “embryonic stem cell” (ESC) is meant a PSC that was isolated from an embryo, such as from the inner cell mass of the blastocyst. ESC lines are listed in the NIH Human Embryonic Stem Cell Registry, e.g., hESBGN-01, hESBGN-02, hESBGN-03, hESBGN-04 (BresaGen, Inc.); HES-1, HES-2, HES-3, HES-4, HES-5, HES-6 (ES Cell International); Miz-hES1 (MizMedi Hospital-Seoul National University); HSF-1, HSF-6 (University of California at San Francisco); and H1, H7, H9, H13, H14 (Wisconsin Alumni Research Foundation (WiCell Research Institute)). Stem cells of interest also include embryonic stem cells from other primates, such as Rhesus stem cells and marmoset stem cells. The stem cells may be obtained from any mammalian species, e.g., human, equine, bovine, porcine, canine, feline, rodent, e.g., mice, rats, hamster, primate, etc. (Thomson et al. (1998) Science 282:1145; Thomson et al. (1995) Proc. Natl. Acad. Sci. USA 92:7844; Thomson et al. (1996) Biol. Reprod. 55:254; Shamblott et al., Proc. Natl. Acad. Sci. USA 95:13726, 1998). In culture, ESCs generally grow as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nucleoli. In addition, ESCs express SSEA-3, SSEA-4, TRA-1-60, TRA-1-81, and Alkaline Phosphatase, but not SSEA-1. Examples of methods of generating and characterizing ESCs may be found in, for example, US Patent No. 7,029,913, US Patent No. 5,843,780, and US Patent No. 6,200,806, the disclosures of which are incorporated herein by reference. Methods for proliferating hESCs in the undifferentiated form are described in WO 99/20741, WO 01/51616, and WO 03/020920. By “embryonic germ stem cell” (EGSC) or “embryonic germ cell” or “EG cell” is meant a PSC that is derived from germ cells and/or germ cell progenitors, e.g., primordial germ cells, e.g., those that would become sperm and eggs. Embryonic germ cells (EG cells) are thought to have properties similar to embryonic stem cells as described above. Examples of methods of generating and characterizing EG cells may be found in, for example, US Patent No. 7,153,684; Matsui, Y., et al., (1992) Cell 70:841; Shamblott, M., et al. (2001) Proc. Natl. Acad. Sci. USA 98:113; Shamblott, M., et al. (1998) Proc. Natl. Acad. Sci. USA, 95:13726; and Koshimizu, U., et al. (1996) Development, 122:1235, the disclosures of which are incorporated herein by reference.
[0067] By “induced pluripotent stem cell” or “iPSC” it is meant a PSC that is derived from a cell that is not a PSC (e.g., from a cell that is differentiated relative to a PSC). iPSCs can be derived from multiple different cell types, including terminally differentiated cells. iPSCs have an ES cell-like morphology, growing as flat colonies with large nucleo-cytoplasmic ratios, defined borders and prominent nuclei. In addition, iPSCs express one or more key pluripotency markers known by one of ordinary skill in the art, including but not limited to Alkaline Phosphatase, SSEA3, SSEA4, Sox2, Oct3/4, Nanog, TRA160, TRA181, TDGF 1, Dnmt3b, FoxD3, GDF3, Cyp26a1, TERT, and zfp42.
[0068] Examples of methods of generating and characterizing iPSCs may be found in, for example, US Patent Publication Nos. US20090047263, US20090068742, US20090191159, US20090227032, US20090246875, and US20090304646, the disclosures of which are incorporated herein by reference. Generally, to generate iPSCs, somatic cells are provided with reprogramming factors (e.g., Oct4, SOX2, KLF4, MYC, Nanog, Lin28, etc.) known in the art to reprogram the somatic cells to become pluripotent stem cells.
[0069] By “somatic cell” it is meant any cell in an organism that, in the absence of experimental manipulation, does not ordinarily give rise to all types of cells in an organism. In other words, somatic cells are cells that have differentiated sufficiently that they will not naturally generate cells of all three germ layers of the body, e.g., ectoderm, mesoderm and endoderm. For example, somatic cells would include both neurons and neural progenitors, the latter of which may be able to naturally give rise to all or some cell types of the central nervous system but cannot give rise to cells of the mesoderm or endoderm lineages.
[0070] By “mitotic cell” it is meant a cell undergoing mitosis. Mitosis is the process by which a eukaryotic cell separates the chromosomes in its nucleus into two identical sets in two separate nuclei. It is generally followed immediately by cytokinesis, which divides the nuclei, cytoplasm, organelles and cell membrane into two cells containing roughly equal shares of these cellular components.
[0071] By “post-mitotic cell” is meant a cell that has exited from mitosis (is in G0), e.g., the cell is “quiescent,” e.g., it is no longer undergoing cell division. This quiescent state may be temporary, e.g., reversible, or it may be permanent.
[0072] The terms “treatment”, “treating” and the like are used herein to generally mean obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of a partial or complete cure for a disease and/or adverse effect attributable to the disease. “Treatment” as used herein covers any treatment of a disease or symptom in a mammal, and includes: (a) preventing the disease or symptom from occurring in a subject which may be predisposed to acquiring the disease or symptom but has not yet been diagnosed as having it; (b) inhibiting the disease or symptom, e.g., arresting its development; or (c) relieving the disease, e.g., causing regression of the disease, or reducing the risk of disease or a symptom of a disease. The therapeutic agent may be administered before, during, or after the onset of disease or injury. The treatment of ongoing disease, where the treatment stabilizes or reduces the undesirable clinical symptoms of the subject, is of particular interest. Such treatment is desirably performed prior to complete loss of function in affected tissues. In some cases, therapy is administered to a subject having at least one disease symptom. In some cases the treatment is administered after the subject is not experiencing one or more symptoms of the disease.
[0073] The terms “individual,” “subject,” “host,” and “patient,” are used interchangeably herein and refer to any mammalian subject for whom diagnosis, treatment, or therapy is desired, particularly humans.
[0074] General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Green and Sambrook, Molecular Cloning: A Laboratory Manual, Fourth Edition, 2012, Cold Spring Harbor Laboratory Press; Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.
[0075] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the disclosure. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.
[0076] Certain ranges are presented herein with numerical values being preceded by the term “about.” The term “about” is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.
[0077] It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the disclosure are specifically embraced by the present disclosure and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present disclosure and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.
BEFP DOMAINS
AP-binding domain:
[0078] The AP-binding domain of a BEFP can comprise any domain with covalent binding activity at an AP site, for example, an SOS response-associated peptidase (SRAP) domain, such as the SRAP domain of 5-hydroxymethylcytosine (5hmC) binding, ESC-specific (HMCES) (see Mohni et al. 2019, Cell 176, 144-153).
[0079] In some embodiments, the AP-binding domain comprises an SRAP domain from any of the proteins identified in Table 1 with Uniprot ID numbers (https://www.uniprot.org/, as accessed on January 18, 2019), or functional fragments and/or derivatives thereof.
Table 1:
[Table 1 is reproduced as images in the original publication (imgf000026_0001, imgf000027_0001).]
[0080] In some embodiments, the AP-binding domain comprises an AP-binding domain from HMCES. In some embodiments, the HMCES is a human HMCES. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5.
[0081] In some embodiments, the AP-binding domain comprises an AP-binding domain from YedK. In some embodiments, the YedK is an Escherichia coli YedK. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 6 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 6. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 6 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 6.
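The sequence identity thresholds recited above (e.g., at least about 85%) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it counts identity over aligned columns, which is only one of several conventions; the disclosure does not specify an alignment tool or a denominator. The function names and the toy sequences are hypothetical and are not SEQ ID NO: 5 or 6.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: both strings come from the same alignment, so they have equal
    length and use '-' for gaps. Columns that are gaps in both are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    # A column counts as a match only when both residues are present and equal.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 85.0) -> bool:
    """True when the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 20-residue sequences differing at a single position (hypothetical data).
toy_reference = "MKTAYIAKQRQISFVKSHFS"
toy_variant = "MKTAYIAKQRQLSFVKSHFS"
toy_identity = percent_identity(toy_reference, toy_variant)
```

With one mismatch in 20 aligned positions, the toy pair has 95% identity and therefore passes an 85% cutoff under this convention.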
Nucleic acid recognition domain:
[0082] A nucleic acid recognition domain according to the invention can specifically bind to a target nucleotide sequence in a selected double-stranded DNA. In an embodiment of the invention, the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease (for example, a CRISPR class 2 type II (Cas9) or CRISPR class 2 type V (Cas12a and Cas12b) nuclease) or a variant thereof, and in a complex with a guide RNA (gRNA) is capable of targeting the BEFP to a target nucleotide sequence in a DNA molecule (Stella et al. Nature Structural Biology, 24 (11), pp. 882-892). In some embodiments, the nucleic acid recognition domain is from a Cas9 protein or a variant thereof. In some embodiments, the nucleic acid recognition domain is from a Cas9 protein or a variant thereof and comprises two domains associated with nuclease activity, most commonly denoted as (i) a RuvC domain and (ii) an HNH domain. In some embodiments, the nuclease activity of the RuvC domain and/or the HNH domain is attenuated (e.g., inactivated), such as by introducing appropriate mutations (Jinek et al. Science. 2012 Aug 17; 337(6096): 816-821). In some embodiments, the nucleic acid recognition domain is a derivative of a Cas9 protein containing an inactivating mutation in only one of the two nuclease domains, resulting in a nickase Cas9 (nCas9), which cleaves only one of the two strands of the target DNA. In some embodiments, the nucleic acid recognition domain is a derivative of a Cas9 protein containing inactivating mutations in both of the nuclease domains, resulting in a nuclease-dead Cas9 (dCas9).
[0083] In some embodiments, the nCas9 is only able to cleave the DNA strand that is contacted through base pairing with bases of the gRNA. This is generally achieved by introducing one or more nuclease-inactivating mutations into the RuvC nuclease domain, which naturally cleaves the DNA strand not contacted by the gRNA. One example of such a mutation is D10A in SluCas9. The SluCas9 domains of SEQ ID NOs: 2 and 4 include such D10A mutations.
[0084] In some embodiments, the nCas9 is only able to cleave the DNA strand that is not contacted through base pairing with bases of the gRNA. This is generally achieved by introducing one or more nuclease inactivating mutations in the HNH nuclease domain, which naturally cleaves the DNA strand contacted by the gRNA. One example of such a mutation is H559A in SluCas9 (SEQ ID NO: 7).
[0085] In some embodiments, a nucleic acid recognition domain is from a SluCas9 (e.g., SEQ ID NO: 7) or variant thereof that can further comprise one or more modification(s) or mutation(s) that result in a SluCas9 with a significantly reduced or no detectable nuclease activity, e.g., including, but not limited to, i) a modification or mutation at position 10 with respect to SEQ ID NO: 7 leading to a significantly reduced nuclease activity (e.g., a D10A mutation), ii) a modification or mutation at position 559 with respect to SEQ ID NO: 7 leading to a significantly reduced nuclease activity (e.g., an H559A mutation), and iii) a modification or mutation at position 582 with respect to SEQ ID NO: 7 leading to a significantly reduced nuclease activity (e.g., an N582A mutation).
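The mutation notation used above (e.g., D10A: wild-type residue, 1-based position, replacement residue) can be sketched as a small helper that applies a substitution and checks that the expected wild-type residue is actually present at that position. This is an illustrative sketch only; the function name is hypothetical and the toy sequence is a placeholder, not SEQ ID NO: 7.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a substitution written like 'D10A' to a protein sequence.

    The position is 1-based with respect to the supplied reference sequence.
    Raises ValueError if the notation is malformed, the position is out of
    range, or the residue at that position is not the stated wild type.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation notation: {mutation!r}")
    wild_type, replacement = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # convert to 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position out of range for {mutation}")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {index + 1}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# Placeholder sequence with an aspartate (D) at position 10 (hypothetical).
toy_sequence = "G" * 9 + "D" + "G" * 5
toy_mutant = apply_substitution(toy_sequence, "D10A")
```

Checking the wild-type residue before substituting guards against numbering a mutation relative to the wrong reference sequence.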
[0086] Used in the context of an enzymatic activity, “significantly reduced” means that such enzymatic activity is lower than 10% of the activity of the reference protein (e.g., SEQ ID NO: 7), for example, lower than 5%, lower than 2%, lower than 1%, or lower than 0.1% of such reference enzymatic activity.
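The definition of “significantly reduced” can be expressed as a simple comparison. The sketch below assumes that the variant and reference activities were measured in the same assay and in the same (arbitrary) units; the function names and the numeric values are hypothetical.

```python
def relative_activity_percent(variant_activity: float,
                              reference_activity: float) -> float:
    """Variant activity as a percentage of the reference protein's activity."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * variant_activity / reference_activity


def is_significantly_reduced(variant_activity: float,
                             reference_activity: float,
                             cutoff_percent: float = 10.0) -> bool:
    """True when activity is strictly lower than the cutoff (default 10%)."""
    return relative_activity_percent(
        variant_activity, reference_activity
    ) < cutoff_percent


# Hypothetical measurements in arbitrary units.
example_reduced = is_significantly_reduced(4.0, 100.0)       # 4% of reference
example_borderline = is_significantly_reduced(10.0, 100.0)   # exactly 10%
```

Because the definition reads “lower than 10%”, a variant at exactly 10% of the reference activity is not classified as significantly reduced here. Passing `cutoff_percent=5.0`, `2.0`, `1.0`, or `0.1` gives the stricter thresholds listed in the paragraph.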
[0087] In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
[0088] In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising any contiguous sequence of 265 amino acids from position 789 to position 1053 of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to any contiguous sequence of 265 amino acids from position 789 to position 1053 of SEQ ID NO: 7.
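As a worked check of the positions recited above: the span from position 789 to position 1053, counted inclusively, contains 1053 - 789 + 1 = 265 residues, so a contiguous sequence of 265 amino acids within that span is the span itself. The sketch below illustrates the 1-based, inclusive indexing with a placeholder sequence; the function names are hypothetical and the toy sequence is not SEQ ID NO: 7.

```python
from typing import List


def region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive positions."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions out of range")
    return sequence[start - 1:end]


def contiguous_windows(sequence: str, start: int, end: int,
                       length: int) -> List[str]:
    """All contiguous windows of the given length inside start..end."""
    span = region(sequence, start, end)
    if length > len(span):
        raise ValueError("window is longer than the region")
    return [span[i:i + length] for i in range(len(span) - length + 1)]


# Placeholder sequence of 1053 residues built by repeating the 20 amino acids.
toy_protein = ("ACDEFGHIKLMNPQRSTVWY" * 53)[:1053]
toy_windows = contiguous_windows(toy_protein, 789, 1053, 265)
```

For these positions the list contains exactly one window of 265 residues.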
[0089] In some embodiments, the nucleic acid recognition domain is a zinc finger protein, for example:
an engineered or naturally-occurring zinc finger protein with specific binding activity for a target nucleotide sequence of the DNA molecule, as for example described in Choo et al, Nature. 1994 Dec 15;372(6507):642-645; or
an engineered zinc finger nickase (ZFNickases), in which one monomer of a zinc finger nuclease dimer comprises a FokI cleavage domain that had its nuclease activity inactivated by one or more introduced mutations, as for example described in Kim et al. Genome Res. 2012 Jul;22(7):1327-33. doi: 10.1101/gr.138792.112. Epub 2012 Apr 20.
[0090] In some embodiments, the nucleic acid recognition domain is a TALEN protein, for example:
an engineered TAL effector protein with specific binding activity for a target nucleotide sequence of the DNA molecule, as for example described in Moscou, M. J., & Bogdanove, A. J. (2009). Science, 326(5959): 1501; or
an engineered TAL effector nickase (TALENickases), in which one monomer of a TALE nuclease dimer comprises a FokI cleavage domain that had its nuclease activity inactivated by one or more introduced mutations, as for example described in Biochem Biophys Res Commun. 2014 Mar 28;446(1):261-6.
Cytidine deaminase domain:
[0091] The cytidine deaminase domain according to the invention includes any protein or domain that is able to convert a cytidine base within a nucleic acid to a uracil.
[0092] In some embodiments, the cytidine deaminase domain is from an APOBEC deaminase (Trends Biochem Sci. 2016 Jul; 41(7): 578-594).
[0093] In some cases, the cytidine deaminase domain is from a deaminase including, without limitation, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and an activation-induced deaminase (AID).
[0094] The deaminase may be from any suitable organism (e.g., a human or a rat). In some embodiments, the deaminase is from a human, non-human primate (e.g., chimpanzee, gorilla, orangutan, or monkey), cow, pig, dog, rat, or mouse. In some embodiments, the deaminase is a rat APOBEC1. In some embodiments, the deaminase is a human APOBEC1. In some embodiments, the deaminase is pmCDA1.
EXEMPLARY NUCLEIC ACID AND PROTEIN SEQUENCES
[The exemplary sequences are reproduced as images in the original publication (imgf000031_0001, imgf000032_0001).]
[0095] In some embodiments, the BEFP comprises, from N-terminus to C-terminus, an AP-binding domain, a cytidine deaminase domain, and a nucleic acid recognition domain. In some embodiments, the BEFP further comprises an NLS sequence, for example, C-terminal to the nucleic acid recognition domain. In some embodiments, the BEFP further comprises a linker, such as a peptide linker, between any of the domains contained therein. In some embodiments, the AP-binding domain is from HMCES (e.g., human HMCES, such as SEQ ID NO: 5) or YedK (e.g., E. coli YedK, such as SEQ ID NO: 6), or a variant thereof. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or an N582A mutation, with respect to SEQ ID NO: 7. In some embodiments, the cytidine deaminase domain is from an APOBEC deaminase. In some embodiments, the cytidine deaminase domain is from a deaminase including, without limitation, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and an activation-induced deaminase (AID).
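The N-terminus to C-terminus domain order described in this paragraph can be sketched as a simple concatenation. The sketch below is illustrative only: the class and function names are hypothetical, the domain sequences are short placeholders rather than any SEQ ID NO, the "GGS" linker is an assumed example of a peptide linker, and "PKKKRKV" (the SV40 large T antigen NLS) is an assumed example of an NLS; the paragraph above does not specify a particular linker or NLS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Domain:
    """A named protein domain with its amino acid sequence."""
    name: str
    sequence: str


def assemble_fusion(domains: List[Domain], linker: str = "",
                    nls: Optional[Domain] = None) -> str:
    """Join domains in the given N-to-C order, with an optional linker
    between adjacent domains and an optional NLS appended C-terminally."""
    fusion = linker.join(domain.sequence for domain in domains)
    if nls is not None:
        fusion += nls.sequence
    return fusion


# Placeholder sequences (hypothetical), listed from N-terminus to C-terminus.
ap_binding = Domain("AP-binding domain", "AAAA")
deaminase = Domain("cytidine deaminase domain", "CCCC")
recognition = Domain("nucleic acid recognition domain", "DDDD")
example_nls = Domain("NLS (assumed SV40 example)", "PKKKRKV")

toy_befp = assemble_fusion(
    [ap_binding, deaminase, recognition],
    linker="GGS",          # assumed example linker
    nls=example_nls,
)
```

Keeping the domain order as an explicit list makes it easy to model alternative arrangements by reordering the list rather than rewriting the assembly logic.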
[0096] In some embodiments, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2 or SEQ ID NO: 4. In some embodiments, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4.
[0097] In some embodiments, a nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3, or a variant nucleotide sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 1 or SEQ ID NO: 3. In some embodiments, the nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3.
Systems for Genome Editing
[0098] Provided herein are systems for genome editing in a cell to modulate the expression, function, and/or activity of a protein-of-interest (POI), such as by targeted modification of a nucleic acid (e.g., conversion of a cytidine to a thymidine) encoding the POI or a derivative thereof in the genome of the cell. In some embodiments, the POI is a protein associated with a disorder or health condition. Also provided are, inter alia, systems for treating a subject having or suspected of having a disorder or health condition associated with a POI, employing ex vivo and/or in vivo genome editing.
[0099] In some embodiments, provided herein is a system comprising (i) a BEFP comprising: (a) an AP-binding domain, (b) a cytidine deaminase domain, and (c) a nucleic acid recognition domain, or nucleic acid encoding the BEFP; and (ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA, wherein the gRNA is capable of guiding the BEFP to a target sequence in a nucleic acid.
[0100] In some embodiments, according to any of the systems described herein, the AP-binding domain comprises an SRAP domain, e.g., the SRAP domain of an HMCES or YedK protein. In some embodiments, the AP-binding domain comprises an SRAP domain from a protein identified in Table 1 or a functional derivative thereof. In some embodiments, the AP-binding domain is from HMCES (e.g., human HMCES, such as SEQ ID NO: 5) or YedK (e.g., E. coli YedK, such as SEQ ID NO: 6), or a variant thereof. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6.
[0101] In some embodiments, according to any of the systems described herein, the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease or variant thereof. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or a N582A mutation, with respect to SEQ ID NO: 7.
[0102] In some embodiments, according to any of the systems described herein, the cytidine deaminase domain is from an APOBEC deaminase. In some embodiments, the cytidine deaminase domain is from a deaminase including, without limitation, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and an activation-induced deaminase (AID).
[0103] In some embodiments, according to any of the systems described herein, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2 or SEQ ID NO: 4. In some embodiments, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4.
[0104] In some embodiments, according to any of the systems described herein, a nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3, or a variant nucleotide sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 1 or SEQ ID NO: 3. In some embodiments, the nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3.
HOST CELLS
[0105] The methods of the disclosure may be employed to induce DNA modification in mitotic or post-mitotic cells in vivo, and/or ex vivo, and/or in vitro (e.g., to produce genetically modified cells that can be reintroduced into an individual).
[0106] Because the guide RNA provides specificity by hybridizing to target DNA, a mitotic and/or post-mitotic cell can be any of a variety of host cells, where suitable host cells include, but are not limited to, a bacterial cell; an archaeal cell; a single-celled eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens (C. Agardh), and the like; a fungal cell; an animal cell; a cell from an invertebrate animal (e.g., an insect, a cnidarian, an echinoderm, a nematode, etc.); a eukaryotic parasite (e.g., a malarial parasite, e.g., Plasmodium falciparum; a helminth; etc.); a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a mammalian cell, e.g., a rodent cell, a human cell, a non-human primate cell, etc. In some embodiments, the host cell can be any human cell. Suitable host cells include naturally occurring cells; genetically modified cells (e.g., cells genetically modified in a laboratory, e.g., by the “hand of man”); and cells manipulated in vitro in any way. In some embodiments, a host cell is isolated.
[0107] Any type of cell may be of interest (e.g., a stem cell, e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell; a somatic cell, e.g., a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.). Cells may be from established cell lines or they may be primary cells, where “primary cells”, “primary cell lines”, and “primary cultures” are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, e.g., splittings, of the culture. For example, primary cultures include cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Primary cell lines can be maintained for fewer than 10 passages in vitro. Target cells are, in some embodiments, unicellular organisms, or are grown in culture.
[0108] If the cells are primary cells, such cells may be harvested from an individual by any method. For example, leukocytes may be harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. may be harvested by biopsy. An appropriate solution may be used for dispersion or suspension of the harvested cells. Such solution will generally be a balanced salt solution, e.g., normal saline, phosphate-buffered saline (PBS), Hank's balanced salt solution, etc., supplemented with fetal calf serum or other naturally occurring factors, in conjunction with an acceptable buffer at low concentration, e.g., from 5-25 mM. Useful buffers include HEPES, phosphate buffers, lactate buffers, etc. The cells may be used immediately, or they may be stored, e.g., frozen, for long periods of time, being thawed and capable of being reused. In such cases, the cells may be frozen in 10% dimethyl sulfoxide (DMSO), 50% serum, 40% buffered medium, or some other such solution as is commonly used in the art to preserve cells at such freezing temperatures, and thawed in a manner as commonly known in the art for thawing frozen cultured cells.
[0109] In some embodiments, the BEFP system herein described can be used in eukaryotic cells, such as mammalian cells, for example, human cells. Any human cell is suitable for use with the BEFP system disclosed herein.
COMPOSITIONS
[0110] In some embodiments, the BEFP system components of the present disclosure can be formulated into compositions (e.g., pharmaceutical compositions) by combination with appropriate carriers or diluents (e.g., pharmaceutically acceptable carriers or diluents). In some embodiments, the composition is a pharmaceutical composition. In some embodiments, the BEFP system components include a BEFP or a nucleic acid encoding the BEFP and/or a gRNA or nucleic acid encoding the gRNA as described herein.
[0111] The components used to formulate the pharmaceutical compositions are generally of high purity and are substantially free of potentially harmful contaminants (e.g., at least National Food (NF) grade, at least analytical grade, or at least pharmaceutical grade). Moreover, compositions intended for in vivo use are generally sterile. To the extent that a given compound must be synthesized prior to use, the resulting product is generally substantially free of any potentially toxic agents, particularly any endotoxins, which may be present during the synthesis or purification process. Compositions for parenteral administration are also sterile, substantially isotonic and made under GMP conditions.
[0112] In some embodiments, the BEFP present in a composition is at least about 75% (such as at least about any of 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) pure, where “% pure” means that the BEFP is the recited percent free from other proteins, other macromolecules, or contaminants that may be present during the production of the BEFP.
[0113] In some embodiments, provided herein are pharmaceutical preparations or compositions comprising components of a BEFP system including (i) a guide RNA or nucleic acid encoding the gRNA; and/or (ii) a BEFP or nucleic acid encoding the BEFP; wherein the BEFP system components are present in a pharmaceutically acceptable vehicle.
[0114] “Pharmaceutically acceptable vehicles” may be vehicles approved by a regulatory agency of the Federal or a state government or listed in the US Pharmacopeia or other generally recognized pharmacopeia for use in mammals, such as humans. The term “vehicle” refers to a diluent, adjuvant, excipient, or carrier with which a compound of the disclosure is formulated for administration to a mammal. Such pharmaceutical vehicles can be lipids, e.g., liposomes, e.g., liposome dendrimers; liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like, saline; gum acacia, gelatin, starch paste, talc, keratin, colloidal silica, urea, and the like. In addition, auxiliary, stabilizing, thickening, lubricating and coloring agents may be used.
Pharmaceutical compositions may be formulated into preparations in solid, semisolid, liquid or gaseous forms, such as tablets, capsules, powders, granules, ointments, solutions, suppositories, injections, inhalants, gels, microspheres, and aerosols. As such, administration of the BEFP system components can be achieved in various ways, including oral, buccal, rectal, parenteral, intraperitoneal, intradermal, transdermal, intra-tracheal, intraocular, etc., administration. The active agent may be systemic after administration or may be localized by the use of regional administration, intramural administration, or use of an implant that acts to retain the active dose at the site of implantation. The active agent may be formulated for immediate activity or it may be formulated for sustained release.
[0115] For some conditions, particularly central nervous system conditions, it may be necessary to formulate agents to cross the blood-brain barrier (BBB). One strategy for drug delivery through the BBB entails disruption of the BBB, either by osmotic means such as mannitol or leukotrienes, or biochemically by the use of vasoactive substances such as bradykinin. The potential for using BBB opening to target specific agents to brain tumors is also an option. A BBB disrupting agent can be co-administered with the therapeutic compositions of the disclosure when the compositions are administered by intravascular injection. Other strategies to go through the BBB may entail the use of endogenous transport systems, including Caveolin-1 mediated transcytosis, carrier-mediated transporters such as glucose and amino acid carriers, receptor-mediated transcytosis for insulin or transferrin, and active efflux transporters such as P-glycoprotein. Active transport moieties may also be conjugated to the therapeutic compounds for use in the disclosure to facilitate transport across the endothelial wall of the blood vessel. In addition or alternatively, drug delivery of therapeutic agents behind the BBB may be by local delivery, for example by intrathecal delivery, e.g., through an Ommaya reservoir (see, e.g., US Patent Nos. 5,222,982 and 5,385,582, incorporated herein by reference); by bolus injection, e.g., by a syringe, e.g., intravitreally or intracranially; by continuous infusion, e.g., by cannulation, e.g., with convection (see, e.g., US Application No. 20070254842, incorporated herein by reference); or by implanting a device upon which the agent has been reversibly affixed (see, e.g., US Application Nos. 20080081064 and 20090196903, incorporated herein by reference).
[0116] In some embodiments, the composition is stored in unit or multi-dose containers, for example, sealed ampules or vials, as an aqueous solution or as a lyophilized formulation for reconstitution. As an example of a lyophilized formulation, 10 ml vials are filled with 5 ml of sterile-filtered 1% (w/v) aqueous solution of compound, and the resulting mixture is lyophilized. The infusion solution is prepared by reconstituting the lyophilized compound using bacteriostatic Water-for-Injection.
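As a worked example of the fill described above: a 1% (w/v) solution contains 1 g of compound per 100 ml, i.e., 10 mg/ml, so a 5 ml fill contains 50 mg of compound per vial before lyophilization. The short sketch below performs the same arithmetic; the function name is hypothetical.

```python
def solute_mass_mg(percent_w_v: float, fill_volume_ml: float) -> float:
    """Mass of solute in a fill volume of a % (w/v) solution.

    1% (w/v) means 1 g per 100 ml, which equals 10 mg per ml.
    """
    if percent_w_v < 0 or fill_volume_ml < 0:
        raise ValueError("inputs must be non-negative")
    mg_per_ml = percent_w_v * 10.0
    return mg_per_ml * fill_volume_ml


# The fill from the example: 5 ml of a 1% (w/v) solution per vial.
mass_per_vial_mg = solute_mass_mg(percent_w_v=1.0, fill_volume_ml=5.0)
```

The concentration of the reconstituted infusion solution then depends on the reconstitution volume, which the paragraph does not specify.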
[0117] Pharmaceutical compositions can include, depending on the formulation desired, pharmaceutically acceptable, non-toxic carriers or diluents, which are defined as vehicles commonly used to formulate pharmaceutical compositions for animal or human administration. The diluent is selected so as not to affect the biological activity of the combination. Examples of such diluents are distilled water, buffered water, physiological saline, phosphate buffered saline (PBS), Ringer's solution, dextrose solution, and Hank's solution. In addition, the pharmaceutical composition or formulation can include other carriers, adjuvants, or non-toxic, nontherapeutic, non-immunogenic stabilizers, excipients and the like. The compositions can also include additional substances to approximate physiological conditions, such as pH adjusting and buffering agents, toxicity adjusting agents, wetting agents and detergents.
[0118] The composition can also include any of a variety of stabilizing agents, such as an antioxidant for example. When the pharmaceutical composition includes a polypeptide, the polypeptide can be complexed with various well-known compounds that enhance the in vivo stability of the polypeptide, or otherwise enhance its pharmacological properties (e.g., increase the half-life of the polypeptide, reduce its toxicity, and enhance solubility or uptake). Examples of such modifications or complexing agents include sulfate, gluconate, citrate, and phosphate.
The nucleic acids or polypeptides of a composition can also be complexed with molecules that enhance their in vivo attributes. Such molecules include, for example, carbohydrates, polyamines, amino acids, other peptides, ions (e.g., sodium, potassium, calcium, magnesium, manganese), and lipids.
[0119] Further guidance regarding formulations that are suitable for various types of administration can be found in Remington's Pharmaceutical Sciences, Mace Publishing Company, Philadelphia, Pa., 17th ed. (1985). For a brief review of methods for drug delivery, see Langer, Science 249: 1527-1533 (1990).
KITS
[0120] In some embodiments, provided herein are kits for carrying out a method described herein. A kit can include one or more of: a BEFP or nucleic acid encoding the BEFP; and a gRNA or nucleic acid encoding the gRNA. A kit may include a complex that includes two or more of: a BEFP; a nucleic acid encoding a BEFP; a guide RNA; a nucleic acid encoding a guide RNA.
[0121] In some embodiments, a kit includes: (a) a BEFP or nucleic acid encoding the BEFP; and (b) a gRNA or nucleic acid encoding the gRNA, wherein the gRNA is capable of guiding the BEFP to a target nucleic acid sequence. In some embodiments, the kit comprises the BEFP. In some embodiments, the kit comprises nucleic acid encoding the BEFP. In some embodiments, the kit comprises the gRNA. In some embodiments, the kit comprises nucleic acid encoding the gRNA. In some embodiments, the kit further comprises one or more additional gRNAs or nucleic acid encoding the one or more additional gRNAs. In some embodiments, the kit further comprises one or more additional reagents, where such additional reagents can be selected from: a buffer for introducing the BEFP into a cell; a wash buffer; a control reagent; a control expression vector or polyribonucleotide; a reagent for in vitro production of the BEFP from DNA, and the like.
[0122] In some embodiments of any of the kits described herein, a gRNA (including, e.g., two or more guide RNAs) can be provided as an array (e.g., an array of RNA molecules, an array of DNA molecules encoding the guide RNA(s), etc.). Such kits can be useful, for example, for use in any of the methods described herein.
[0123] Components of a kit can be in separate containers; or can be combined in a single container.
[0124] Any of the kits described herein can further include one or more additional reagents, where such additional reagents can be selected from: a dilution buffer; a reconstitution solution; a wash buffer; a control reagent; a control expression vector or polyribonucleotide; a reagent for in vitro production of the BEFP from DNA, and the like.
[0125] In addition to the above-mentioned components, a kit can further include instructions for using the components of the kit to practice the methods. The instructions for practicing the methods are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (e.g., associated with the packaging or subpackaging), etc. In some embodiments, the instructions are present as an electronic storage data file present on a suitable computer readable storage medium, e.g., CD-ROM, diskette, flash drive, etc. In yet other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g., via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded. As with the instructions, this means for obtaining the instructions is recorded on a suitable substrate.
METHODS OF THE DISCLOSURE
Methods of Editing a Genome
[0126] In some embodiments, provided herein are methods for modifying a target DNA and/or a polypeptide encoded by a target DNA. In some embodiments, the method involves providing (i) a BEFP or nucleic acid encoding the BEFP; and (ii) a gRNA or nucleic acid encoding the gRNA, wherein the gRNA is capable of guiding the BEFP to a target nucleic acid sequence, such that a complex (a "targeting complex") comprising the BEFP and the gRNA is formed and comes in contact with the target DNA comprising a target nucleic acid sequence.
[0127] In some embodiments, according to any of the methods described herein employing a BEFP or nucleic acid encoding the BEFP, the BEFP comprises: (a) an AP-binding domain, (b) a cytidine deaminase domain, and (c) a nucleic acid recognition domain.
[0128] In some embodiments, according to any of the methods described herein, the AP-binding domain comprises an SRAP domain, e.g., the SRAP domain of an HMCES or YedK protein. In some embodiments, the AP-binding domain comprises an SRAP domain from a protein identified in Table 1 or a functional derivative thereof. In some embodiments, the AP-binding domain is from HMCES (e.g., human HMCES, such as SEQ ID NO: 5) or YedK (e.g., E. coli YedK, such as SEQ ID NO: 6), or a variant thereof. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises an AP-binding domain from the amino acid sequence of SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 5 or 6. In some embodiments, the AP-binding domain comprises the amino acid sequence of SEQ ID NO: 5 or 6.
[0129] In some embodiments, according to any of the methods described herein, the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease or variant thereof. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7. In some embodiments, the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or a N582A mutation, with respect to SEQ ID NO: 7.
[0130] In some embodiments, according to any of the methods described herein, the cytidine deaminase domain is from an APOBEC deaminase. In some embodiments, the cytidine deaminase domain is from a deaminase including, without limitation, APOBEC1, APOBEC2, APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D, APOBEC3F, APOBEC3G, APOBEC3H, APOBEC4, and an activation-induced deaminase (AID).
[0131] In some embodiments, according to any of the methods described herein, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4, or a variant amino acid sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 2 or SEQ ID NO: 4. In some embodiments, the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or SEQ ID NO: 4.
[0132] In some embodiments, according to any of the methods described herein, a nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3, or a variant nucleotide sequence thereof having at least about 85% (such as at least about any of 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or greater) sequence identity to SEQ ID NO: 1 or SEQ ID NO: 3. In some embodiments, the nucleic acid encoding the BEFP comprises the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 3.
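By way of non-limiting illustration, a percent sequence identity threshold such as the "at least about 85%" recited above can be evaluated as sketched below. The sketch assumes that the two sequences have already been aligned (gaps written as "-") by a separate alignment tool, and it uses one of several possible conventions for counting identity (matches divided by aligned columns); the alignment method and counting convention actually intended are not specified in this section. The function names and the toy sequences are hypothetical and are not any of SEQ ID NOs: 1-7.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns in which both sequences carry a gap are
    ignored; a column counts as a match only if both residues are identical
    and neither is a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 85.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 20-residue sequences (hypothetical), differing at two positions.
reference = "MKTAYIAKQRQISFVKSHFS"
variant = "MKSAYIAKQRQISFVKSHFA"
print(percent_identity(reference, variant))          # 90.0
print(meets_identity_threshold(reference, variant))  # True
```

Different conventions (for example, dividing by the length of the shorter sequence, or excluding gapped columns) can give different values for the same pair, so the convention should be fixed before a threshold is applied.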
[0133] In some embodiments, provided herein is a method of targeting, editing, modifying, or manipulating a target DNA at one or more locations in a cell or in vitro environment, comprising introducing into the cell or in vitro environment (a) a BEFP or nucleic acid encoding the BEFP; and (b) a gRNA or nucleic acid encoding the gRNA, wherein the gRNA is capable of guiding the BEFP to a target nucleic acid sequence in the target DNA. In some embodiments, the method comprises introducing into the cell or in vitro environment the BEFP. In some embodiments, the method comprises introducing into the cell or in vitro environment nucleic acid encoding the BEFP. In some embodiments, the method comprises introducing into the cell or in vitro environment the gRNA. In some embodiments, the method comprises introducing into the cell or in vitro environment nucleic acid encoding the gRNA. In some embodiments, the gRNA is a single guide RNA (sgRNA). In some embodiments, the method comprises introducing into the cell or in vitro environment one or more additional gRNAs or nucleic acid encoding the one or more additional gRNAs targeting the target DNA.
[0134] In another aspect, provided herein is a method for modifying a targeted site of a double-stranded DNA, the method comprising:
A. providing a BEFP comprising
a. an AP-binding domain;
b. a cytidine deaminase domain; and
c. a nucleic acid recognition domain;
B. contacting the double-stranded DNA with the BEFP, thereby converting one or more nucleotides in the targeted site to different one or more nucleotides. In some embodiments, the BEFP is a BEFP according to any of the embodiments described herein. In some embodiments, the nucleic acid recognition domain has a nickase activity capable of cleaving only one strand of the double-stranded DNA.
[0135] A gRNA or sgRNA and a BEFP may form a ribonucleoprotein (RNP) complex. The guide RNA provides target specificity to the RNP complex by including a nucleotide sequence that is complementary to a sequence of a target DNA. The BEFP of the RNP complex provides the nucleobase-editing activity. In some embodiments, the RNP complex modifies a target DNA, leading to, for example, conversion of a cytidine base within the target DNA to a thymidine. The target DNA may be, for example, naked (e.g., unbound by DNA associated proteins) DNA in vitro, chromosomal DNA in cells in vitro, chromosomal DNA in cells in vivo, etc.
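By way of non-limiting illustration, the targeting and C-to-T conversion described above can be modeled with a toy simulation. The sketch makes several simplifying assumptions that are not taken from this disclosure: the guide's spacer is written in DNA letters and searched for on the displayed strand, PAM requirements are ignored, every cytidine inside an editing window is converted, and the window (protospacer positions 4 to 8, 1-based) is an arbitrary placeholder. All names and sequences are hypothetical.

```python
def simulate_c_to_t(target_dna: str, spacer_dna: str,
                    window: tuple = (4, 8)) -> str:
    """Return `target_dna` with every C inside the editing window changed to T.

    `spacer_dna` is the guide spacer written as DNA; the site is located by an
    exact match on the given strand. `window` holds 1-based, inclusive
    positions counted from the first base of the matched site (an assumed
    window, chosen only for illustration).
    """
    start = target_dna.find(spacer_dna)
    if start < 0:
        return target_dna  # no matching site: nothing is edited
    first, last = window
    bases = list(target_dna)
    for position in range(first, last + 1):
        index = start + position - 1
        if index < start + len(spacer_dna) and bases[index] == "C":
            bases[index] = "T"
    return "".join(bases)


# Hypothetical 20-nt spacer embedded in a short toy target.
spacer = "GACCATCCGGTACGATCGAT"
target = "TT" + spacer + "TGGAA"
print(simulate_c_to_t(target, spacer))
```

In a real experiment only a fraction of target molecules are edited and outcomes vary between cytidines, so a model of this kind is useful mainly for enumerating the products that a given guide could produce.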
[0136] In some embodiments, the methods described herein employ a BEFP further comprising one or more additional heterologous sequences. In some embodiments, a heterologous sequence can provide for subcellular localization of the BEFP (e.g., a nuclear localization signal (NLS) for targeting to the nucleus; a mitochondrial localization signal for targeting to the mitochondria; a chloroplast localization signal for targeting to a chloroplast; an ER retention signal; and the like). In some embodiments, a heterologous sequence can provide a tag for ease of tracking or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6XHis tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like). In some embodiments, the heterologous sequence can provide for increased or decreased stability.
[0137] In some embodiments, multiple guide RNAs are used to simultaneously modify different locations on the same target DNA or on different target DNAs. In some embodiments, two or more guide RNAs target the same gene or transcript or locus. In some embodiments, two or more guide RNAs target different unrelated loci. In some embodiments, two or more guide RNAs target different, but related loci. In some embodiments, the BEFP is provided directly as a protein. A BEFP can be introduced into a cell (provided to the cell) by any method; such methods are known to those of ordinary skill in the art.
Methods of Use
[0138] A method for DNA modification or base editing according to the present disclosure finds use in a variety of applications, which are also provided. Applications include research applications; diagnostic applications; industrial applications; and therapeutic applications.
Methods of Treating a Disease or Condition
[0139] In some aspects of the disclosure, the guide RNA and/or BEFP are employed to modify cellular DNA in vivo, for purposes such as gene therapy, e.g., to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic, for the production of genetically modified organisms in agriculture, or for biological research. In some of these in vivo embodiments, components of a BEFP system including (i) a guide RNA or nucleic acid encoding the gRNA; and/or (ii) a BEFP or nucleic acid encoding the BEFP are administered to a subject. Administration may be by any well-known method in the art for the administration of peptides, small molecules and nucleic acids to a subject. The BEFP system components can be incorporated into a variety of formulations.
[0140] Generally, an effective amount of components of a BEFP system including (i) a guide RNA or nucleic acid encoding the gRNA; and/or (ii) a BEFP or nucleic acid encoding the BEFP are provided. The final amount to be administered will be dependent upon the route of administration and upon the nature of the disorder or condition that is to be treated. The effective amount given to a particular subject will depend on a variety of factors, several of which will differ from subject to subject.
[0141] For inclusion in a medicament, the BEFP system components may be obtained from a suitable commercial source. In general, the total pharmaceutically effective amount of the BEFP system components administered parenterally per dose will be in a range that can be measured by a dose response curve.
[0142] Therapies based on the BEFP system components, e.g., preparations of (i) a guide RNA or nucleic acid encoding the gRNA; and/or (ii) a BEFP or nucleic acid encoding the BEFP to be used for therapeutic administration, must be sterile. Sterility is readily accomplished by filtration through a sterile filtration membrane (e.g., 0.2 micrometer membrane). Therapeutic compositions generally are placed into a container having a sterile access port, for example, an intravenous solution bag or vial having a stopper pierceable by a hypodermic injection needle.
[0143] The data obtained from cell culture and/or animal studies can be used in formulating a range of dosages for humans. The dosage of the active ingredient generally lies within a range of circulating concentrations that include the ED50 with low toxicity. The dosage can vary within this range depending upon the dosage form employed and the route of administration utilized.
[0144] The pharmaceutical compositions can be administered for prophylactic and/or therapeutic treatments. Toxicity and therapeutic efficacy of the active ingredient can be determined according to standard pharmaceutical procedures in cell cultures and/or experimental animals, including, for example, determining the LD50 (the dose lethal to 50% of the population) and the ED50 (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD50/ED50. Therapies that exhibit large therapeutic indices are desirable.
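By way of non-limiting illustration, the therapeutic index defined above as the ratio LD50/ED50 can be computed as follows. The function name and the dose values are hypothetical and are not data from this disclosure; both doses must be expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index, expressed as the ratio LD50 / ED50.

    Both doses must be positive and in the same units (e.g., mg/kg).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Hypothetical values: LD50 of 200 mg/kg and ED50 of 10 mg/kg.
print(therapeutic_index(ld50=200.0, ed50=10.0))  # 20.0
```

A larger ratio indicates a wider margin between the dose that is therapeutically effective and the dose that is lethal in half of the population.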
[0145] The number of administrations of treatment to a subject may vary. Introducing the pharmaceutical compositions into the subject may be a one-time event; but in certain situations, such treatment may elicit improvement for a limited period of time and require an on-going series of repeated treatments. In certain situations, multiple administrations of pharmaceutical compositions may be required before an effect is observed. The exact protocols may depend upon the disease or condition, the stage of the disease, and parameters of the individual subject being treated.
Examples
[0146] The activity of the BEFP according to the invention is assessed in the following nonlimiting example:
Example 1:
[0147] Base editor variants containing an AP-binding domain from HMCES (SEQ ID NO: 3) or YedK (SEQ ID NO: 1) are compared to AncBE4max as well as AncBE4max lacking both uracil glycosylase inhibitors (UGIs) in a mammalian cell transfection assay followed by next-generation amplicon sequencing.
[0148] AncBE4max (https://www.ncbi.nlm.nih.gov/pubmed/29813047, as retrieved on February 4, 2019) is a codon-optimized base editor comprising N- and C-terminal bipartite nuclear localization signals (bis-bpNLS), an engineered APOBEC1 (Anc689) obtained by ancestral reconstruction from 468 APOBEC homologs, an S. pyogenes Cas9 D10A nickase (nCas9), two UGI moieties and connecting linker sequences (32 AA XTEN linker between Anc689 and nCas9; 10 AA GS-rich linker between nCas9 and the first UGI; 10 AA GS-rich linker between the first and the second UGI; 4 AA GS-rich linker between the second UGI and the C-terminal bpNLS).
[0149] Upon co-transfection of plasmid DNA encoding the variants and plasmid DNA encoding a single guide RNA for the target loci VEGF-A (SEQ ID NO: 8) and FANCF (SEQ ID NO: 9), the cells are incubated for several hours or days to allow for base editing to occur at the target loci. The cells are then harvested, the genomic DNA is extracted and amplicons are generated using loci-specific barcoded primers for each sample.
[0150] The rate of base editing at each locus is quantified from the sequencing reads using the wildtype sequence of the locus as a reference. For AncBE4max, a majority of edited bases are C-to-T conversions, as expected. Lack of UGIs in AncBE4max increases the amount of undesired side products (i.e., non-C-to-T conversions and insertions or deletions). Compared to AncBE4max, the SRAP-BE variants, albeit having slightly decreased editing efficiency, display a low level of such side products. Thus, SRAP-BEs yield high product purity in base editing applications, circumventing the need to inhibit UNG through co-expression of/fusion to UGI.
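By way of non-limiting illustration, the quantification of editing outcomes against the wildtype reference can be sketched as follows. The sketch is a deliberate simplification and not the analysis pipeline of the example: it assumes reads already trimmed to the amplicon and oriented like the reference, treats any read whose length differs from the reference as an insertion or deletion, ignores sequencing errors, and defines product purity as the fraction of edited reads that carry only C-to-T conversions. The function names, the toy reference, and the read counts are hypothetical.

```python
from collections import Counter


def classify_read(reference: str, read: str, window: range) -> str:
    """Classify one read as 'unedited', 'c_to_t', 'other_substitution' or 'indel'."""
    if len(read) != len(reference):
        return "indel"  # crude proxy: a length change implies an insertion/deletion
    has_c_to_t = False
    for i in window:
        ref_base, read_base = reference[i], read[i]
        if ref_base == read_base:
            continue
        if ref_base == "C" and read_base == "T":
            has_c_to_t = True
        else:
            return "other_substitution"  # any non-C-to-T change is a side product
    return "c_to_t" if has_c_to_t else "unedited"


def summarize_editing(reference: str, reads: list, window: range) -> dict:
    """Editing rate, C-to-T rate and product purity over a set of reads."""
    counts = Counter(classify_read(reference, r, window) for r in reads)
    total = len(reads)
    edited = total - counts["unedited"]
    return {
        "editing_rate": edited / total if total else 0.0,
        "c_to_t_rate": counts["c_to_t"] / total if total else 0.0,
        "product_purity": counts["c_to_t"] / edited if edited else 0.0,
        "counts": dict(counts),
    }


# Toy data: 10 reads against an 8-nt reference, whole amplicon as window.
reference = "GACCTGCA"
reads = (["GACCTGCA"] * 5      # unedited
         + ["GATCTGCA"] * 3    # C-to-T at index 2
         + ["GAGCTGCA"]        # C-to-G (side product)
         + ["GACTGCA"])        # one base shorter (indel)
summary = summarize_editing(reference, reads, range(len(reference)))
print(summary["editing_rate"], summary["c_to_t_rate"], summary["product_purity"])
```

With these toy reads, half of the reads are edited and three of the five edited reads are clean C-to-T products; comparing such purity values between editors is the kind of comparison described in the example.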
SEQUENCE LISTING
[Sequence listing presented as images (imgf000048_0001 through imgf000054_0001) in the original publication; the sequences are not reproduced in this text.]

Claims

1. A base-editing fusion protein (BEFP) comprising:
a) an AP-binding domain;
b) a cytidine deaminase domain; and
c) a nucleic acid recognition domain.
2. The BEFP of claim 1, wherein the AP-binding domain comprises an SOS response-associated peptidase (SRAP) domain.
3. The BEFP of claim 2, wherein the SRAP domain is from 5-hydroxymethylcytosine binding, ESC specific (HMCES) or YedK, or a variant thereof.
4. The BEFP of claim 2, wherein the AP-binding domain comprises an SRAP domain from the amino acid sequence of SEQ ID NO: 5 or 6, or a variant amino acid sequence thereof having at least about 85% sequence identity to SEQ ID NO: 5 or 6.
5. The BEFP of claim 4, wherein the AP-binding domain comprises an SRAP domain from the amino acid sequence of SEQ ID NO: 5 or 6.
6. The BEFP of any one of claims 1 to 5, wherein the cytidine deaminase domain is from a deaminase selected from the group consisting of: APOBEC2, APOBEC3, APOBEC3E, APOBEC3F, APOBEC3G, APOBEC3H, and APOBEC4.
7. The BEFP of any one of claims 1 to 6, wherein the nucleic acid recognition domain is from an RNA-programmable CRISPR-associated nuclease or a variant thereof.
8. The BEFP of claim 7, wherein the nucleic acid recognition domain is from a modified CRISPR-Cas9 protein that can cleave only one strand of the target DNA or has no endonuclease activity.
9. The BEFP of claim 7 or 8, wherein the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7 or a variant amino acid sequence thereof having at least about 85% sequence identity to SEQ ID NO: 7.
10. The BEFP of claim 9, wherein the nucleic acid recognition domain comprises a D10A mutation, an H559A mutation, and/or a N582A mutation, with respect to SEQ ID NO: 7.
11. The BEFP of claim 9, wherein the nucleic acid recognition domain is from a SluCas9 polypeptide comprising the amino acid sequence of SEQ ID NO: 7.
12. The BEFP of claim 1, wherein the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or 4, or a variant amino acid sequence having at least about 85% sequence identity to SEQ ID NO: 2 or 4.
13. The BEFP of claim 1, wherein the BEFP comprises the amino acid sequence of SEQ ID NO: 2 or 4.
14. A nucleic acid encoding the BEFP of any one of claims 1 to 13.
15. The nucleic acid of claim 14, comprising the nucleotide sequence of SEQ ID NO: 1 or 3, or a variant nucleotide sequence having at least about 85% sequence identity to SEQ ID NO: 1 or 3.
16. The nucleic acid of claim 15, comprising the nucleotide sequence of SEQ ID NO: 1 or 3.
17. A system comprising:
(i) the BEFP of any one of claims 1 to 13 or the nucleic acid of any one of claims 14 to 16; and
(ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA.
18. A method of modifying a targeted site of a double-stranded DNA, the method comprising contacting the double-stranded DNA with:
(i) the BEFP of any one of claims 1 to 13 or the nucleic acid of any one of claims 14 to 16; and
(ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA.
19. The method of claim 18, wherein the double-stranded DNA encodes a protein-of-interest (POI) or derivative thereof.
20. The method of claim 18 or 19, wherein the double-stranded DNA is in a cell.
21. A genetically modified cell in which the genome of the cell is edited by the method of any one of claims 18 to 20.
22. A method of treating a disease or condition associated with a protein-of-interest (POI) in a subject, comprising providing to a cell in the subject:
(i) the BEFP of any one of claims 1 to 13 or the nucleic acid of any one of claims 14 to 16; and
(ii) a guide RNA (gRNA) or nucleic acid encoding the gRNA.
23. The method of claim 22, wherein the subject is a patient having or suspected of having the disease or condition or the subject is diagnosed with a risk of the disease or condition.
24. A kit comprising one or more elements of the system of claim 17, and further comprising instructions for use.
PCT/US2020/021388 2019-03-08 2020-03-06 Nucleobase-editing fusion protein systems, compositions, and uses thereof WO2020209959A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962815999P 2019-03-08 2019-03-08
US62/815,999 2019-03-08

Publications (1)

Publication Number Publication Date
WO2020209959A1 true WO2020209959A1 (en) 2020-10-15

Family

ID=71948659

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2020/021388 WO2020209959A1 (en) 2019-03-08 2020-03-06 Nucleobase-editing fusion protein systems, compositions, and uses thereof

Country Status (1)

Country Link
WO (1) WO2020209959A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10894812B1 (en) 2020-09-30 2021-01-19 Alpine Roads, Inc. Recombinant milk proteins
US10947552B1 (en) 2020-09-30 2021-03-16 Alpine Roads, Inc. Recombinant fusion proteins for producing milk proteins in plants
US11840717B2 (en) 2020-09-30 2023-12-12 Nobell Foods, Inc. Host cells comprising a recombinant casein protein and a recombinant kinase protein
US12139737B2 (en) 2023-09-08 2024-11-12 Nobell Foods, Inc. Host cells comprising a recombinant casein protein and a recombinant kinase protein

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS626893B2 (en) 1977-07-08 1987-02-14 Taisei Corp
US5222982A (en) 1991-02-11 1993-06-29 Ommaya Ayub K Spinal fluid driven artificial organ
US5385582A (en) 1991-02-11 1995-01-31 Ommaya; Ayub K. Spinal fluid driven artificial organ
US5843780A (en) 1995-01-20 1998-12-01 Wisconsin Alumni Research Foundation Primate embryonic stem cells
WO1999020741A1 (en) 1997-10-23 1999-04-29 Geron Corporation Methods and materials for the growth of primate-derived primordial stem cells
WO2001038547A2 (en) 1999-11-24 2001-05-31 Mcs Micro Carrier Systems Gmbh Polypeptides comprising multimers of nuclear localization signals or of protein transduction domains and their use for transferring molecules into cells
WO2001051616A2 (en) 2000-01-11 2001-07-19 Geron Corporation Techniques for growth and differentiation of human pluripotent stem cells
WO2003020920A1 (en) 2001-09-05 2003-03-13 Geron Corporation Culture system for rapid expansion of human embryonic stem cells
US7153684B1 (en) 1992-10-08 2006-12-26 Vanderbilt University Pluripotential embryonic stem cells and methods of making same
US20070254842A1 (en) 2006-04-25 2007-11-01 The Regents Of The University Of California Administration of growth factors for the treatment of cns disorders
US20080081064A1 (en) 2006-09-28 2008-04-03 Surmodics, Inc. Implantable Medical Device with Apertures for Delivery of Bioactive Agents
US20090047263A1 (en) 2005-12-13 2009-02-19 Kyoto University Nuclear reprogramming factor and induced pluripotent stem cells
US20090068742A1 (en) 2005-12-13 2009-03-12 Shinya Yamanaka Nuclear Reprogramming Factor
US20090191159A1 (en) 2007-06-15 2009-07-30 Kazuhiro Sakurada Multipotent/pluripotent cells and methods
US20090196903A1 (en) 2008-01-29 2009-08-06 Kliman Gilbert H Drug delivery devices, kits and methods therefor
US20090227032A1 (en) 2005-12-13 2009-09-10 Kyoto University Nuclear reprogramming factor and induced pluripotent stem cells
US20090246875A1 (en) 2007-12-10 2009-10-01 Kyoto University Efficient method for nuclear reprogramming
WO2017070632A2 (en) 2015-10-23 2017-04-27 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US9840699B2 (en) 2013-12-12 2017-12-12 President And Fellows Of Harvard College Methods for nucleic acid editing
WO2019023680A1 (en) * 2017-07-28 2019-01-31 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (pace)

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS626893B2 (en) 1977-07-08 1987-02-14 Taisei Corp
US5222982A (en) 1991-02-11 1993-06-29 Ommaya Ayub K Spinal fluid driven artificial organ
US5385582A (en) 1991-02-11 1995-01-31 Ommaya; Ayub K. Spinal fluid driven artificial organ
US7153684B1 (en) 1992-10-08 2006-12-26 Vanderbilt University Pluripotential embryonic stem cells and methods of making same
US6200806B1 (en) 1995-01-20 2001-03-13 Wisconsin Alumni Research Foundation Primate embryonic stem cells
US7029913B2 (en) 1995-01-20 2006-04-18 Wisconsin Alumni Research Foundation Primate embryonic stem cells
US5843780A (en) 1995-01-20 1998-12-01 Wisconsin Alumni Research Foundation Primate embryonic stem cells
WO1999020741A1 (en) 1997-10-23 1999-04-29 Geron Corporation Methods and materials for the growth of primate-derived primordial stem cells
WO2001038547A2 (en) 1999-11-24 2001-05-31 Mcs Micro Carrier Systems Gmbh Polypeptides comprising multimers of nuclear localization signals or of protein transduction domains and their use for transferring molecules into cells
WO2001051616A2 (en) 2000-01-11 2001-07-19 Geron Corporation Techniques for growth and differentiation of human pluripotent stem cells
WO2003020920A1 (en) 2001-09-05 2003-03-13 Geron Corporation Culture system for rapid expansion of human embryonic stem cells
US20090047263A1 (en) 2005-12-13 2009-02-19 Kyoto University Nuclear reprogramming factor and induced pluripotent stem cells
US20090068742A1 (en) 2005-12-13 2009-03-12 Shinya Yamanaka Nuclear Reprogramming Factor
US20090227032A1 (en) 2005-12-13 2009-09-10 Kyoto University Nuclear reprogramming factor and induced pluripotent stem cells
US20070254842A1 (en) 2006-04-25 2007-11-01 The Regents Of The University Of California Administration of growth factors for the treatment of cns disorders
US20080081064A1 (en) 2006-09-28 2008-04-03 Surmodics, Inc. Implantable Medical Device with Apertures for Delivery of Bioactive Agents
US20090191159A1 (en) 2007-06-15 2009-07-30 Kazuhiro Sakurada Multipotent/pluripotent cells and methods
US20090304646A1 (en) 2007-06-15 2009-12-10 Kazuhiro Sakurada Multipotent/Pluripotent Cells and Methods
US20090246875A1 (en) 2007-12-10 2009-10-01 Kyoto University Efficient method for nuclear reprogramming
US20090196903A1 (en) 2008-01-29 2009-08-06 Kliman Gilbert H Drug delivery devices, kits and methods therefor
US9840699B2 (en) 2013-12-12 2017-12-12 President And Fellows Of Harvard College Methods for nucleic acid editing
WO2017070632A2 (en) 2015-10-23 2017-04-27 President And Fellows Of Harvard College Nucleobase editors and uses thereof
WO2019023680A1 (en) * 2017-07-28 2019-01-31 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (pace)

Non-Patent Citations (56)

* Cited by examiner, † Cited by third party
Title
"Immunology Methods Manual", 1997, ACADEMIC PRESS
"Remington's Pharmaceutical Sciences", 1985, MACE PUBLISHING COMPANY
"Short Protocols in Molecular Biology", 1999, JOHN WILEY & SONS
"Viral Vectors", 1995, ACADEMIC PRESS
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403 - 410
ARAVIND L ET AL: "Novel autoproteolytic and DNA-damage sensing components in the bacterial SOS response and oxidized methylcytosine-induced eukaryotic DNA demethylation systems", BIOLOGY DIRECT, BIOMED CENTRAL, vol. 8, no. 1, 15 August 2013 (2013-08-15), pages 20, XP021160719, ISSN: 1745-6150, DOI: 10.1186/1745-6150-8-20 *
AYMAN EID ET AL: "CRISPR base editors: genome editing without double-stranded breaks", BIOCHEMICAL JOURNAL, vol. 475, no. 11, 11 June 2018 (2018-06-11), pages 1955 - 1964, XP055638645, ISSN: 0264-6021, DOI: 10.1042/BCJ20170793 *
BARTGE ET AL., PROC. NATL. ACAD. SCI. USA, vol. 85, 1988, pages 3648 - 3652
BIOCHEM BIOPHYS RES COMMUN., vol. 446, no. 1, 28 March 2014 (2014-03-28), pages 261 - 6
BOLLAG ET AL.: "Protein Methods", 1996, JOHN WILEY & SONS
BOUNDY ET AL., J. NEUROSCI., vol. 18, 1998, pages 9989
CASANOVA ET AL., GENESIS, vol. 31, 2001, pages 37
CHEN ET AL., ADV. DRUG DELIV. REV., vol. 65, no. 10, 2013, pages 1357 - 1369
CHEN ET AL., CELL, vol. 51, 1987, pages 7 - 19
CHOO ET AL., NATURE, vol. 372, no. 6507, 15 December 1994 (1994-12-15), pages 642 - 645
CITARTAN ET AL., BIOSENS BIOELECTRON., vol. 34, no. 1, 15 April 2012 (2012-04-15), pages 1 - 11
COMB ET AL., EMBO J., vol. 17, 1988, pages 3793 - 3805
DOYLEGRIFFITHS: "Cell and Tissue Culture: Laboratory Procedures in Biotechnology", 1998, JOHN WILEY & SONS
EIRIK ADIM MOREB ET AL: "Managing the SOS Response for Enhanced CRISPR-Cas-Based Recombineering in E.?coli through Transient Inhibition of Host RecA Activity", ACS SYNTHETIC BIOLOGY, vol. 6, no. 12, 15 September 2017 (2017-09-15), Washington, DC,USA, pages 2209 - 2218, XP055442204, ISSN: 2161-5063, DOI: 10.1021/acssynbio.7b00174 *
GREENBERGSAMBROOK: "Molecular Cloning: A Laboratory Manual", 2012, COLD SPRING HARBOR LABORATORY PRESS
JINEK ET AL., SCIENCE, vol. 337, no. 6096, 17 August 2012 (2012-08-17), pages 816 - 821
KANEDA ET AL., NEURON, vol. 6, 1991, pages 583 - 594
KIM ET AL., GENOME RES., vol. 22, no. 7, July 2012 (2012-07-01), pages 1327 - 33
KOSHIMIZU, U. ET AL., DEVELOPMENT, vol. 122, 1996, pages 1235
LIBERMAN ET AL.: "Interdiscip Rev RNA.", vol. 3, May 2012, WILEY, pages: 369 - 84
LIU ET AL., GENE THERAPY, vol. 11, 2004, pages 52 - 60
LLEWELLYN ET AL., NAT. MED., vol. 16, no. 10, 2010, pages 1161 - 1166
MATSUI, Y. ET AL., CELL, vol. 70, 1992, pages 841
MAYFORD ET AL., PROC. NATL. ACAD. SCI. USA, vol. 93, 1996, pages 13250
MIYAGISHI ET AL., NATURE BIOTECHNOLOGY, vol. 20, 2002, pages 497 - 500
MOHNI ET AL., CELL, vol. 176, 2019, pages 144 - 153
MORRISON ET AL., CELL, vol. 88, 1997, pages 287 - 298
MOSCOU, M. J. & BOGDANOVE, A. J., SCIENCE, vol. 326, no. 5959, 2009, pages 1501
NAKAMURA ET AL., GENES CELLS., vol. 17, no. 5, May 2012 (2012-05-01), pages 344 - 64
OBERDICK ET AL., SCIENCE, vol. 249, 1990, pages 1527 - 1533
OH ET AL., GENE THER., vol. 16, 2009, pages 437
PANYAM ET AL., ADV DRUG DELIV REV., 13 September 2012 (2012-09-13)
RADOVICK ET AL., PROC. NATL. ACAD. SCI. USA, vol. 88, 1991, pages 3402 - 3406
REES HOLLY A ET AL: "Base editing: precision chemistry on the genome and transcriptome of living cells", NATURE REVIEWS GENETICS, NATURE PUBLISHING GROUP, GB, vol. 19, no. 12, 15 October 2018 (2018-10-15), pages 770 - 788, XP036637435, ISSN: 1471-0056, [retrieved on 20181015], DOI: 10.1038/S41576-018-0059-1 *
SASAOKA ET AL., MOL. BRAIN RES., vol. 16, 1992, pages 274
SHAMBLOTT, M. ET AL., PROC. NATL. ACAD. SCI. USA, vol. 95, 1998, pages 13726
SHAMBLOTT, M. ET AL., PROC. NATL. ACAD. SCI. USA, vol. 98, pages 113
SMITH & WATERMAN, ADV. APPL. MATH., vol. 2, 1981, pages 482 - 489
STELLA ET AL., NATURE STRUCTURAL BIOLOGY, vol. 24, no. 11, pages 882 - 892
TAKAHASHI, CELL, vol. 131, no. 5, 30 November 2007 (2007-11-30), pages 861 - 72
TAKAHASHI, NAT PROTOC., vol. 2, no. 12, 2007, pages 3081 - 9
THOMSON ET AL., BIOL. REPROD., vol. 55, 1996, pages 254
THOMSON ET AL., SCIENCE, vol. 282, 1998, pages 1145
THOMSON, PROC. NATL. ACAD. SCI. USA, vol. 92, 1995, pages 7844
THOMSON, SCIENCE, vol. 282, no. 5391, 6 November 1998 (1998-11-06), pages 1145 - 7
TRENDS BIOCHEM SCI., vol. 41, no. 7, July 2016 (2016-07-01), pages 578 - 594
VAVALLE ET AL., FUTURE CARDIOL., vol. 8, no. 3, May 2012 (2012-05-01), pages 371 - 82
XIA ET AL., NUCLEIC ACIDS RES., vol. 31, no. 17, 1 September 2003 (2003-09-01)
XIAO DING ET AL: "Improving CRISPR-Cas9 Genome Editing Efficiency by Fusion with Chromatin-Modulating Peptides", THE CRISPR JOURNAL, vol. 2, no. 1, 21 February 2019 (2019-02-21), pages 51 - 63, XP055722923, ISSN: 2573-1599, DOI: 10.1089/crispr.2018.0036 *
YU, SCIENCE, vol. 318, no. 5858, 21 December 2007 (2007-12-21), pages 1917 - 20
ZHANG & MADDEN, GENOME RES., vol. 7, 1997, pages 649 - 656

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10894812B1 (en) 2020-09-30 2021-01-19 Alpine Roads, Inc. Recombinant milk proteins
US10947552B1 (en) 2020-09-30 2021-03-16 Alpine Roads, Inc. Recombinant fusion proteins for producing milk proteins in plants
US10988521B1 (en) 2020-09-30 2021-04-27 Alpine Roads, Inc. Recombinant milk proteins
US11034743B1 (en) 2020-09-30 2021-06-15 Alpine Roads, Inc. Recombinant milk proteins
US11072797B1 (en) 2020-09-30 2021-07-27 Alpine Roads, Inc. Recombinant fusion proteins for producing milk proteins in plants
US11142555B1 (en) 2020-09-30 2021-10-12 Nobell Foods, Inc. Recombinant milk proteins
US11401526B2 (en) 2020-09-30 2022-08-02 Nobell Foods, Inc. Recombinant fusion proteins for producing milk proteins in plants
US11685928B2 (en) 2020-09-30 2023-06-27 Nobell Foods, Inc. Recombinant fusion proteins for producing milk proteins in plants
US11840717B2 (en) 2020-09-30 2023-12-12 Nobell Foods, Inc. Host cells comprising a recombinant casein protein and a recombinant kinase protein
US11952606B2 (en) 2020-09-30 2024-04-09 Nobell Foods, Inc. Food compositions comprising recombinant milk proteins
US12077798B2 (en) 2020-09-30 2024-09-03 Nobell Foods, Inc. Food compositions comprising recombinant milk proteins
US12139737B2 (en) 2023-09-08 2024-11-12 Nobell Foods, Inc. Host cells comprising a recombinant casein protein and a recombinant kinase protein

Similar Documents

Publication Publication Date Title
US20220042047A1 (en) Compositions and methods for modifying a target nucleic acid
EP3352795B1 (en) Compositions and methods for target nucleic acid modification
US20240156989A1 (en) Methods and Compositions for Modifying a Mutant Dystrophin Gene in a Cell's Genome
JP2021506251A (en) New RNA programmable endonuclease system, as well as its use in genome editing and other applications
WO2016106239A1 (en) Methods and compositions for nucleic acid integration
JP2021518139A (en) New RNA Programmable Endonuclease System and Its Use
US20200291368A1 (en) Improved CRISPR-Cpf1 Genome Editing Tool
WO2020209959A1 (en) Nucleobase-editing fusion protein systems, compositions, and uses thereof
JP2023508362A (en) CRISPR-CAS EFFECTOR POLYPEPTIDES AND METHODS OF USE THEREOF
US20220145274A1 (en) Novel high fidelity rna-programmable endonuclease systems and uses thereof
JP2024526062A (en) V-type RNA programmable endonuclease system
US20230348872A1 (en) Crispr-cas effector polypeptides and methods of use thereof
EP4101928A1 (en) Type v rna programmable endonuclease systems
WO2023118068A1 (en) Novel small type v rna programmable endonuclease systems
WO2023237587A1 (en) Novel small type v rna programmable endonuclease systems
CN117940560A (en) Novel small RNA programmable endonuclease system with improved PAM specificity and uses thereof
CN118103502A (en) V-type RNA programmable endonuclease system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20751323; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20751323; Country of ref document: EP; Kind code of ref document: A1)