WO2019120310A1 - Base editing system and method based on cpf1 protein - Google Patents

Publication number
WO2019120310A1
Authority
WO
WIPO (PCT)
Prior art keywords
deaminase
base
fusion protein
dna
cleavage activity
Application number
PCT/CN2018/123158
Inventor
Caixia Gao
Yanpeng WANG
Original Assignee
Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences
Application filed by Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences
Publication of WO2019120310A1

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102Mutagenizing nucleic acids
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/11DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8213Targeted insertion of genes into the plant genome by homologous recombination
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8216Methods for controlling, regulating or enhancing expression of transgenes in plant cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/85Vectors or expression systems specially adapted for eukaryotic hosts for animal cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/87Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90Stable introduction of foreign DNA into chromosome
    • C12N15/902Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/907Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14Hydrolases (3)
    • C12N9/16Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22Ribonucleases RNAses, DNAses
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14Hydrolases (3)
    • C12N9/78Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12YENZYMES
    • C12Y305/00Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04001Cytosine deaminase (3.5.4.1)
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K2319/00Fusion polypeptide
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00Structure or type of the nucleic acid
    • C12N2310/10Type of nucleic acid
    • C12N2310/20Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • the invention relates to the field of genetic engineering.
  • the invention relates to a base editing system and method based on CPF1 protein.
  • the present invention relates to a system and method for efficient base editing of a target sequence in the genome of an organism (e.g., a plant) by a guide RNA-directed Cpf1-deaminase fusion protein, and to the genetically modified organism (e.g., a plant) produced by the method and progeny thereof.
  • TILLING targeting induced local lesions in genomes
  • Genome editing techniques, particularly those based on the CRISPR/Cas9 system, enable the introduction of specific base substitutions at genomic loci through homologous recombination (HR)-mediated DNA repair pathways.
  • HR homologous recombination
  • the successful use of this method is currently limited, mainly because of the low frequency of HR-mediated double-strand break repair in plants.
  • effectively providing a sufficient amount of DNA repair templates is also a major difficulty.
  • Cas9 and deaminase can be fused to achieve precise conversion of cytosine (C) to thymine (T) and conversion of adenine (A) to guanine (G) in a target gene.
  • the system for C to T transformation mainly includes fusions of SpnCas9-BE3, SpnCas9-AID and Cas9 variants, such as VQR-BE3, EQR-BE3 and VRER-BE3, as well as SaCas9-BE3 and variant SaKKH-BE3.
  • the PAMs for Cas9 and Cas9 variants are generally restricted to G/C-rich regions, so the range of PAM types available to single-base editing systems still needs to be broadened.
  • the single-base editing system still needs to be improved in terms of specificity.
  • nCas9-BE3 and its variants, and nCas9-ABE, usually produce single-strand nicks on the non-targeting strand of the target site, and tend to generate DNA indels alongside single-base mutations during mismatch repair, so the fidelity of single-base editing can still be improved. Therefore, new systems and methods for base editing of plant genomes are still needed in the art.
  • DSBs DNA double-strand breaks
  • crRNA guide RNA
  • the Cpf1 protein contains a DNA cleavage domain and an independent RNA cleavage domain.
  • the RNA cleavage domain of the Cpf1 protein is capable of processing pre-crRNA to form a mature crRNA.
  • “Guide RNA” and “gRNA” can be used interchangeably herein.
  • the guide RNA of the Cpf1-mediated genome editing system is typically composed only of a mature crRNA molecule, wherein the crRNA comprises a sequence that is sufficiently identical to the target sequence to hybridize to the complement of the target sequence and direct the complex (Cpf1+crRNA) to bind the target sequence in a sequence-specific manner.
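The relationship between the crRNA spacer, the target sequence, and its complement can be illustrated with a minimal sketch. The function names and the toy target below are hypothetical and are not taken from the patent; perfect identity is assumed, whereas the text only requires "sufficient" identity.

```python
# Hypothetical helpers; the target sequence is a toy example, not from the patent.
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
RNA_TO_DNA_PAIR = {"A": "T", "C": "G", "G": "C", "U": "A"}


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def spacer_for_target(target_dna: str) -> str:
    """RNA spacer identical in sequence to the target strand (T replaced by U)."""
    return target_dna.upper().replace("T", "U")


def hybridizes(spacer_rna: str, dna_strand: str) -> bool:
    """True if the spacer pairs perfectly (antiparallel) with the DNA strand."""
    expected = "".join(RNA_TO_DNA_PAIR[base] for base in reversed(spacer_rna))
    return expected == dna_strand.upper()


target = "GATTACAGGCTTAACCGTAGCTA"
spacer = spacer_for_target(target)
print(spacer)
print(hybridizes(spacer, reverse_complement(target)))
```

A spacer built this way is identical to the target strand and pairs with the opposite strand, which is the arrangement the definition describes.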
  • Deaminase refers to an enzyme that catalyzes a deamination reaction.
  • the deaminase refers to a cytosine deaminase that catalyzes the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively.
  • the deaminase refers to an adenine deaminase, which is capable of catalyzing the formation of inosine (I) from adenosine or deoxyadenosine (A).
  • “Genome” as used herein encompasses not only chromosomal DNA present in the nucleus, but also organellar DNA present in the subcellular components (e.g., mitochondria, plastids) of the cell.
  • “Organism” includes any organism that is suitable for genome editing; eukaryotes are preferred. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks, geese; plants including monocots and dicots such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis and the like.
  • a “genetically modified organism” or “genetically modified cell” includes the organism or the cell which comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence.
  • the exogenous polynucleotide is stably integrated within the genome of the organism or the cell such that the polynucleotide is passed on to successive generations.
  • the exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct.
  • the modified gene or expression regulatory sequence means that, in the organism genome or the cell genome, said sequence comprises one or more nucleotide substitution, deletion, or addition.
  • a genetically modified organism obtained by the present invention may comprise one or more substitutions of C to T or A to G relative to a wild type (corresponding organism without such genetic modification) .
  • “Exogenous” in reference to a sequence means a sequence from a foreign species or, if from the same species, a sequence in which significant changes in composition and/or locus have occurred from its native form through deliberate human intervention.
  • nucleic acid sequence RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases.
  • Nucleotides are referred to by their single letter names as follows: “A” is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively) , “C” means cytidine or deoxycytidine, “G” means guanosine or deoxyguanosine, “U” represents uridine, “T” means deoxythymidine, “R” means purine (A or G) , “Y” means pyrimidine (C or T) , “K” means G or T, “H” means A or C or T, “I” means inosine, and “N” means any nucleotide.
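The degenerate single-letter codes listed above can be applied mechanically when matching a sequence motif. The sketch below encodes only the codes named in the text (R, Y, K, H, N); the `matches` helper is a hypothetical name introduced for illustration.

```python
# Degenerate codes exactly as listed in the text; other IUPAC codes are omitted.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, dna: str) -> bool:
    """True if every base in `dna` is allowed by the code at the same position."""
    if len(pattern) != len(dna):
        return False
    return all(
        base in CODES[code]
        for code, base in zip(pattern.upper(), dna.upper())
    )


print(matches("RYKHN", "GCTAT"))
print(matches("H", "G"))
```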
  • “Polypeptide,” “peptide,” and “protein” are used interchangeably in the present invention to refer to a polymer of amino acid residues. The terms apply to an amino acid polymer in which one or more amino acid residues is an artificial chemical analogue of a corresponding naturally occurring amino acid, as well as to a naturally occurring amino acid polymer.
  • the terms “polypeptide,” “peptide,” “amino acid sequence,” and “protein” may also include modified forms including, but not limited to, glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, and ADP-ribosylation.
  • expression construct refers to a vector such as a recombinant vector that is suitable for expression of a nucleotide sequence of interest in an organism.
  • “Expression” refers to the production of a functional product.
  • expression of a nucleotide sequence may refer to the transcription of a nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or the translation of an RNA into a precursor or mature protein.
  • the "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector or, in some embodiments, an RNA that is capable of translation (such as mRNA) .
  • the "expression construct" of the present invention may comprise regulatory sequences and nucleotide sequences of interest from different origins, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from that normally occurring in nature.
  • regulatory sequence and “regulatory element” are used interchangeably to refer to a nucleotide sequence that is located upstream (5 'non-coding sequence) , middle or downstream (3' non-coding sequence) of a coding sequence and affects the transcription, RNA processing or stability or translation of the relevant coding sequence.
  • Regulatory sequences may include, but are not limited to, promoters, translation leaders, introns and polyadenylation recognition sequences.
  • Promoter refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment.
  • the promoter is a promoter capable of controlling the transcription of a gene in a cell, whether or not it is derived from the cell.
  • the promoter may be a constitutive promoter or tissue-specific promoter or developmentally-regulated promoter or inducible promoter.
  • “Constitutive promoter” refers to a promoter that will generally cause a gene to be expressed in most cell types under most conditions.
  • tissue-specific promoter and “tissue-preferred promoter” are used interchangeably and mean that they are expressed primarily but not necessarily exclusively in one tissue or organ, but also in a specific cell or cell type.
  • Developmentally-regulated promoter refers to a promoter whose activity is dictated by developmental events.
  • an “inducible promoter” selectively expresses operably linked DNA sequences in response to an endogenous or exogenous stimulus (environment, hormones, chemical signals, etc.).
  • operably linked refers to the linkage of a regulatory element (e.g., but not limited to, a promoter sequence, a transcription termination sequence, etc. ) to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element.
  • “Introduction" of a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc. ) or protein into an organism means that the nucleic acid or protein is used to transform a cell of the organism such that the nucleic acid or protein is capable of functioning in the cell.
  • “transformation” includes both stable and transient transformations.
  • “Stable transformation” refers to the introduction of exogenous nucleotide sequences into the genome, resulting in the stable inheritance of foreign genes. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any of its successive generations.
  • Transient transformation refers to the introduction of a nucleic acid molecule or protein into a cell, performing a function without the stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequences are not integrated into the genome.
  • plant includes a whole plant and any descendant, cell, tissue, or part of a plant.
  • plant parts include any part (s) of a plant, including, for example and without limitation: seed (including mature seed and immature seed) ; a plant cutting; a plant cell; a plant cell culture; a plant organ (e.g., pollen, embryos, flowers, fruits, shoots, leaves, roots, stems, and explants) .
  • a plant tissue or plant organ may be a seed, protoplast, callus, or any other group of plant cells that is organized into a structural or functional unit.
  • a plant cell or tissue culture may be capable of regenerating a plant having the physiological and morphological characteristics of the plant from which the cell or tissue was obtained, and of regenerating a plant having substantially the same genotype as the plant. In contrast, some plant cells are not capable of being regenerated to produce plants.
  • Regenerable cells in a plant cell or tissue culture may be embryos, protoplasts, meristematic cells, callus, pollen, leaves, anthers, roots, root tips, silk, flowers, kernels, ears, cobs, husks, or stalks.
  • Plant parts include harvestable parts and parts useful for propagation of progeny plants.
  • Plant parts useful for propagation include, for example and without limitation: seed; fruit; a cutting; a seedling; a tuber; and a rootstock.
  • a harvestable part of a plant may be any useful part of a plant, including, for example and without limitation: flower; pollen; seedling; tuber; leaf; stem; fruit; seed; and root.
  • a plant cell is the structural and physiological unit of the plant, and includes protoplast cells without a cell wall and plant cells with a cell wall.
  • a plant cell may be in the form of an isolated single cell, or an aggregate of cells (e.g., a friable callus and a cultured cell) , and may be part of a higher organized unit (e.g., a plant tissue, plant organ, and plant) .
  • a plant cell may be a protoplast, a gamete producing cell, or a cell or collection of cells that can regenerate into a whole plant.
  • a seed which comprises multiple plant cells and is capable of regenerating into a whole plant, is considered a "plant part" in embodiments herein.
  • protoplast refers to a plant cell that had its cell wall completely or partially removed, with the lipid bilayer membrane thereof naked.
  • a protoplast is an isolated plant cell without cell walls which has the potency for regeneration into cell culture or a whole plant.
  • “Progeny” of a plant comprises any subsequent generation of the plant.
  • Trait refers to the physiological, morphological, biochemical, or physical characteristics of a plant or a particular plant material or cell.
  • the characteristic is visible to the human eye, such as seed or plant size, or can be measured by biochemical techniques, such as detecting the protein, starch, or oil content of seed or leaves, or by observation of a metabolic or physiological process, e.g. by measuring tolerance to water deprivation or particular salt or sugar concentrations, or by the observation of the expression level of a gene or genes, or by agricultural observations such as osmotic stress tolerance or yield.
  • trait also includes ploidy of a plant, such as haploidy which is important for plant breeding.
  • trait also includes resistance of a plant to herbicides.
  • “Agronomic trait” is a measurable parameter including but not limited to, leaf greenness, yield, growth rate, biomass, fresh weight at maturation, dry weight at maturation, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content in a vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content in a vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content in a vegetative tissue, drought tolerance, nitrogen uptake, root lodging, harvest index, stalk lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt tolerance, and tiller number and so on.
  • the present invention provides a system for base editing of a target sequence in the genome of an organism, comprising at least one of the following i) to v) :
  • an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and a guide RNA;
  • a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
  • an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
  • the base-editing fusion protein comprises a Cpf1 lacking DNA cleavage activity, and a deaminase, the guide RNA being capable of targeting the base-editing fusion protein to a target sequence in the genome, resulting in one or more C to T or A to G substitution (s) in the target sequence.
  • Cpf1 contains a DNA cleavage domain (RuvC), which can be mutated to abolish the DNA cleavage activity of Cpf1 and form a "Cpf1 lacking DNA cleavage activity".
  • the Cpf1 lacking DNA cleavage activity still retains gRNA-directed DNA binding ability.
  • the Cpf1 lacking DNA cleavage activity can therefore readily target an additional, fused protein to almost any DNA sequence simply by co-expression with a suitable guide RNA.
  • the Cpf1 lacking DNA cleavage activity of the present invention may be derived from Cpf1 of different species, for example, Cpf1 proteins derived from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006, designated FnCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 19) , AsCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 18) and LbCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 20) , respectively.
  • the Cpf1 lacking DNA cleavage activity is the FnCpf1 lacking DNA cleavage activity. In some embodiments, the FnCpf1 lacking DNA cleavage activity comprises a D917A mutation relative to wild-type FnCpf1.
  • the Cpf1 lacking DNA cleavage activity is the AsCpf1 lacking DNA cleavage activity. In some embodiments, the AsCpf1 lacking DNA cleavage activity comprises a D908A mutation relative to wild-type AsCpf1.
  • the Cpf1 lacking DNA cleavage activity is the LbCpf1 lacking DNA cleavage activity. In some embodiments, the LbCpf1 lacking DNA cleavage activity comprises a D832A mutation relative to wild-type LbCpf1.
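Mutation names such as D917A follow the pattern wild-type residue, 1-based position, replacement residue. The sketch below shows one way to apply such a substitution to a protein sequence while checking the wild-type residue. The function name and the toy sequence are hypothetical; the real Cpf1 sequences (SEQ ID NO: 18 to 20) are not reproduced here.

```python
import re

# Inactivating substitutions named in the text, keyed by Cpf1 ortholog.
DEAD_CPF1_MUTATIONS = {"FnCpf1": "D917A", "AsCpf1": "D908A", "LbCpf1": "D832A"}


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a substitution such as 'D917A' (1-based position) to a sequence."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognized mutation name: {mutation!r}")
    wild_type, position, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    index = position - 1
    if not 0 <= index < len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[index]}"
        )
    return protein[:index] + replacement + protein[index + 1:]


# Toy sequence for illustration only; position 4 is an aspartate (D).
toy_protein = "MKTDLVE"
print(apply_substitution(toy_protein, "D4A"))
```

Checking the wild-type residue before substituting guards against applying a mutation name to the wrong ortholog or to a sequence with a different numbering.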
  • an expression construct comprising a nucleotide sequence encoding a guide RNA in a system of the invention may comprise a sequence encoding a plurality of different guide RNA (crRNA) precursors in tandem, which may be processed by the Cpf1 lacking DNA cleavage activity to form a plurality of different guide RNAs (crRNAs) upon transcription to simultaneously target a plurality of different target sequences.
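The processing of a tandem crRNA precursor into individual guide RNAs can be pictured with a deliberately simplified model. The sketch assumes that each mature crRNA is one full direct repeat followed by its spacer, and it uses a placeholder repeat and placeholder spacers that are not taken from the patent; real Cpf1 processing trims the repeat and is not modeled here.

```python
# Placeholder direct repeat and spacers; hypothetical values for illustration only.
DIRECT_REPEAT = "GUCAAUAGG"
SPACERS = ["ACGUACGUAC", "UUGGCCAAUU"]


def build_array(repeat: str, spacers: list[str]) -> str:
    """Concatenate repeat-spacer units into a single precursor transcript."""
    return "".join(repeat + spacer for spacer in spacers)


def process_array(array: str, repeat: str) -> list[str]:
    """Split the precursor at each repeat and return repeat+spacer units."""
    pieces = array.split(repeat)
    return [repeat + piece for piece in pieces if piece]


precursor = build_array(DIRECT_REPEAT, SPACERS)
for crrna in process_array(precursor, DIRECT_REPEAT):
    print(crrna)
```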
  • the deaminase in the fusion protein is a cytidine deaminase, such as the apolipoprotein B mRNA editing complex (APOBEC) family deaminase.
  • APOBEC apolipoprotein B mRNA editing complex
  • Cytidine deaminase catalyzes the deamination of cytidine (C) in the DNA to form uracil (U) .
  • the present inventors have surprisingly found that the fusion of a Cpf1 lacking DNA cleavage activity and a cytidine deaminase, under the guidance of a guide RNA, can target a target sequence in the genome.
  • the DNA double strands are not cleaved, and the cytidine deaminase in the fusion protein can deaminate the cytidine of the single-stranded DNA produced during the formation of the Cpf1-guide RNA-DNA complex into U; the C to T substitution is then achieved through base mismatch repair.
  • the cytidine deaminase of the present invention is particularly a cytidine deaminase which can accept single-stranded DNA as a substrate.
  • examples of cytidine deaminases useful in the present invention include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID), APOBEC3G or CDA1.
  • the cytidine deaminase comprises the amino acid sequence set forth in SEQ ID NO: 1.
  • the base editing system of the present invention can mutate one or more C (s) to T (s) in the genomic target sequence, thus also referred to as the Cpf1-PBE system.
  • uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER) , resulting in the repair of U: G to C: G.
  • BER base excision repair
  • the base-editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI) .
  • the uracil DNA glycosylase inhibitor comprises the amino acid sequence set forth in SEQ ID NO: 2.
  • the deaminase is an adenine deaminase.
  • the naturally occurring adenine deaminase converts adenosine in single-stranded RNA into inosine (I) by deamination using RNA as a substrate.
  • a DNA-dependent adenine deaminase that converts deoxyadenosine in single-stranded DNA to inosine (I), using single-stranded DNA as a substrate, has been obtained from the E. coli tRNA adenine deaminase TadA by directed evolution. See Nicole M. Gaudelli et al., doi: 10.1038/nature24644, 2017.
  • the present inventors have surprisingly found that when Cpf1 lacking DNA cleavage activity is fused to a DNA-dependent adenine deaminase, under the guidance of a guide RNA, the fusion protein can target a target sequence in the plant genome. Because the Cpf1 lacks DNA cleavage activity, the DNA double strands are not cleaved, and the DNA-dependent adenine deaminase in the fusion protein is capable of deaminating the adenosine of the single-stranded DNA produced during the formation of the Cpf1-guide RNA-DNA complex into an inosine (I). Inosine is read as G during DNA repair and replication, which yields the A to G substitution.
  • the base editing system of the present invention can mutate one or more A in the genomic target sequence to G, and thus is also called Cpf1-ABE system.
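The net sequence outcome of the two systems (C to T for Cpf1-PBE, A to G for Cpf1-ABE) can be sketched as a simple string transformation. The editing window, the function name, and the toy target are assumptions made for illustration; the text above does not specify which positions of the target are edited, and the intermediate U and I bases are collapsed into their final products.

```python
# Net base change produced by each system, as described in the text.
CONVERSIONS = {
    "Cpf1-PBE": ("C", "T"),  # C -> U by cytidine deaminase, repaired to T
    "Cpf1-ABE": ("A", "G"),  # A -> I by adenine deaminase, read as G
}


def simulate_edit(target: str, system: str, window: range) -> str:
    """Convert every editable base inside `window` (0-based indices).

    The window is a hypothetical parameter; it is not defined in the source.
    """
    source_base, product_base = CONVERSIONS[system]
    return "".join(
        product_base if (index in window and base == source_base) else base
        for index, base in enumerate(target.upper())
    )


toy_target = "TTTACCAGTCCA"
print(simulate_edit(toy_target, "Cpf1-PBE", range(4, 8)))
print(simulate_edit(toy_target, "Cpf1-ABE", range(4, 8)))
```

Bases outside the window are left unchanged, which mirrors the idea that only the single-stranded region exposed by the Cpf1-guide RNA-DNA complex is accessible to the deaminase.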
  • the DNA-dependent adenine deaminase is a variant of the E. coli tRNA adenine deaminase TadA (ecTadA) , in particular a variant which can accept single-stranded DNA as a substrate.
  • the variant comprises, relative to wild-type ecTadA, one or more sets of mutations selected from the group consisting of:
  • the DNA-dependent adenine deaminase (ABE version 7.9) comprises the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, A142N, D147Y, E155V and R152P.
  • the DNA-dependent adenine deaminase (ABE version 7.10) comprises the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, D147Y, E155V and R152P.
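The two mutation lists above differ in a single entry, which is easy to miss when reading them side by side. A short set comparison makes the difference explicit; the lists are copied exactly as printed in this text, and the variable names are hypothetical.

```python
# Mutation lists copied verbatim from the two entries above.
ABE_7_9 = (
    "W23R H36L R51L S146C K157N A106V D108N P48A L84F "
    "H123Y I156F A142N D147Y E155V R152P"
).split()
ABE_7_10 = (
    "W23R H36L R51L S146C K157N A106V D108N P48A L84F "
    "H123Y I156F D147Y E155V R152P"
).split()


def position(mutation: str) -> int:
    """Extract the residue number from a name such as 'A142N'."""
    return int(mutation[1:-1])


only_in_7_9 = sorted(set(ABE_7_9) - set(ABE_7_10), key=position)
only_in_7_10 = sorted(set(ABE_7_10) - set(ABE_7_9), key=position)
shared = sorted(set(ABE_7_9) & set(ABE_7_10), key=position)

print("only in 7.9: ", only_in_7_9)
print("only in 7.10:", only_in_7_10)
print("shared:      ", shared)
```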
  • the amino acid sequence of preferred ecTadA derived DNA-dependent adenosine deaminase is shown below: MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNR AIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIH SRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALL CYFFRMPRQVFNAQKKAQSSTD (SEQ ID NO: 4) .
  • the initiating methionine may be absent.
  • the deaminase is fused to the N-terminus of the Cpf1 lacking DNA cleavage activity. In some embodiments, the deaminase is fused to the C-terminus of the Cpf1 lacking DNA cleavage activity.
  • the N-terminus of the DNA-dependent adenine deaminase is fused with a corresponding wild-type adenine deaminase. It is expected that the formation of heterodimers by DNA-dependent adenine deaminase and wild-type adenine deaminase can significantly increase the A to G editing activity of fusion proteins.
  • the deaminase and the Cpf1 lacking DNA cleavage activity are fused via a linker.
  • the linker may be 1-50 (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 20-25, 25-50) amino acids in length, or non-functional amino acid sequences with more amino acids and without secondary or higher structures.
  • the linker can be a flexible linker such as GGGGS, GS, GAP, (GGGGS) x 3, GGS and (GGS) x7, and the like.
  • the linker is an XTEN linker.
  • the linker is 32 amino acids in length.
  • the amino acid sequence of the linker is: SGGSSGGSSGSETPGTSESATPESSGGSSGGS.
  • the base-editing fusion proteins of the present invention further comprise a nuclear localization sequence (NLS) .
  • one or more NLSs in the base-editing fusion protein should be of sufficient strength to drive accumulation of the base-editing fusion protein in the nucleus of a plant cell in an amount sufficient for base editing.
  • the strength of nuclear localization activity is determined by the number and location of the NLSs in the base-editing fusion protein, the specific NLS or NLSs used, or a combination of these factors.
  • the NLS of the base-editing fusion protein of the present invention may be located at the N-terminus and/or C-terminus.
  • the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus.
  • the base-editing fusion protein comprises a combination of these, such as comprises one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When there is more than one NLS, each can be selected to be independent of other NLSs. In some preferred embodiments of the present invention, the base-editing fusion protein comprises two NLSs, for example, the two NLSs are located at the N-terminus and the C-terminus, respectively.
  • in general, an NLS consists of one or more short sequences of positively charged lysine or arginine residues exposed on the surface of the protein, but other types of NLS are also known.
  • Non-limiting examples of NLS include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3') , PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or CCAAAGAAGAAGAGGAAGGTT) , or SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG -3') .
  • the N-terminus of the base-editing fusion protein comprises the NLS with the amino acid sequence set forth in PKKKRKV.
  • the C-terminus of the base-editing fusion protein comprises the NLS with the amino acid sequence set forth by SGGSPKKKRKV or KRPAATKKAGQAKKKK.
  • the base-editing fusion proteins of the present invention may also include other localization sequences, such as cytoplasmic localization sequences, chloroplast localization sequences, mitochondrial localization sequences, and the like.
  • the base editing fusion protein also contains a uracil DNA glycosylase inhibitor (UGI), and two NLSs flanking either the N-terminus or the C-terminus of the UGI.
  • the base editing fusion protein of the invention comprises an amino acid sequence selected from SEQ ID NO: 24-29.
  • the nucleotide sequence encoding the base-editing fusion protein is codon optimized for the biological species to be base edited.
  • Codon optimization refers to the replacement of at least one codon (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) of a native sequence with a codon that is used more frequently or most frequently in the genes of the host cell, modifying the nucleic acid sequence while maintaining the native amino acid sequence in order to enhance expression in the host cell of interest.
  • codon preference (differences in codon usage between organisms)
  • mRNA messenger RNA
  • tRNA transfer RNA
  • codon usage tables can be easily obtained, for example, from the Codon Usage Database available at www.kazusa.or.jp/codon/, and these tables can be adjusted in different ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res. 28: 292 (2000).
  • the base editing fusion protein of the invention is encoded by the nucleotide sequence selected from SEQ ID NO: 8-9, 11-12 or 14-15.
  • the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to an expression regulatory element such as a promoter.
  • examples of promoters include, but are not limited to, the polymerase (pol) I, pol II or pol III promoters.
  • examples of pol I promoters include the Gallus RNA pol I promoter.
  • the pol II promoters include, but are not limited to, the immediate-early cytomegalovirus (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and the immediate-early simian virus 40 (SV40) promoter.
  • pol III promoters include the U6 and H1 promoters.
  • An inducible promoter such as a metallothionein promoter can be used.
  • promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, the Sp6 phage promoter, and the like.
  • Promoters that can be used in plants include, but are not limited to, cauliflower mosaic virus 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, maize U3 promoter, rice actin promoter.
  • the guide RNA (crRNA) is expressed using the Ubi-1 promoter and cleaved to become mature with a ribozyme such as HDV ribozyme.
  • the addition of an intron after the Ubi-1 promoter enhances expression of the protein or RNA of interest.
  • the expression construct for expressing the base editing fusion protein of the invention comprises an expression cassette of SEQ ID NO: 10 or 13.
  • the expression construct comprises an expression regulating sequence set forth in SEQ ID NO: 30.
  • the present invention provides a method of producing a genetically modified organism (e.g. a plant) , comprising introducing a system of the present invention for base editing of a target sequence in the genome of an organism into a cell of the organism, whereby the guide RNA targets the base-editing fusion protein to a target sequence in the genome, resulting in one or more C to T or one or more A to G substitutions in the target sequence.
  • target sequences or crRNA coding sequences that can be recognized and targeted by the Cpf1 protein and the guide RNA (i.e., crRNA) complex can be found, for example, in Zhang et al., Cell 163, 1–13, October 22, 2015.
  • the 5'-terminus of the target sequence targeted by the genome editing system of the present invention needs to include a protospacer adjacent motif (PAM) 5'-TTTN or 5'-YTN, wherein N is independently selected from A, G, C and T, Y is selected from C and T.
  • the target sequence has the following structure: 5'-TTTN-NX-3' or 5'-YTN-NX-3', wherein N is independently selected from A, G, C and T; Y is selected from C and T; X is an integer with 15 ≤ X ≤ 35; and NX represents X consecutive nucleotides.
  • the target sequence to be modified may be located at any location in the genome, for example, in a functional gene such as a protein-encoding gene, or may be, for example, located in a gene expression regulatory region such as a promoter region or an enhancer region, thereby the gene functional modification or gene expression modification can be achieved.
  • A to G or C to T base editing in the target sequence of a cell can be detected by T7EI, PCR/RE or sequencing methods.
  • the base editing system can be introduced into cells by a variety of methods well known to those skilled in the art.
  • Methods that can be used to introduce a genome editing system of the present invention into a cell include, but are not limited to, calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses) , gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation.
  • a cell that can be edited by the method of the present invention can be a cell of mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, cat; a cell of poultry such as chicken, duck, goose; a cell of plants including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut and Arabidopsis thaliana and so on.
  • the methods of the invention are particularly suitable for producing genetically modified plants, such as crop plants.
  • the base editing system can be introduced into a plant by various methods well known to those skilled in the art. Methods that can be used to introduce a base editing system of the invention into a plant include, but are not limited to, gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, pollen tube pathway and ovary injection method.
  • the modification of the target sequence can be achieved by only introducing or producing the base-editing fusion protein and the guide RNA in the plant cell, and the modification can be stably inherited, without any need to stably transform the base editing system into plants. This avoids the potential off-target effect of the stable base editing system and also avoids the integration of the exogenous nucleotide sequence in the plant genome, thereby providing greater biosafety.
  • the introduction is carried out in the absence of selection pressure to avoid integration of the exogenous nucleotide sequence into the plant genome.
  • the introduction comprises transforming the base editing system of the present invention into an isolated plant cell or tissue and then regenerating the transformed plant cell or tissue into an intact plant.
  • the regeneration is carried out in the absence of selection pressure, i.e., no selection agent for the selection gene on the expression vector is used during tissue culture. Avoiding the use of a selection agent can increase the regeneration efficiency of the plant, obtaining a modified plant free of exogenous nucleotide sequences.
  • the base editing system of the present invention can be transformed into specific parts of an intact plant, such as leaves, shoot tips, pollen tubes, young ears or hypocotyls. This is particularly suitable for the transformation of plants that are difficult to regenerate in tissue culture.
  • the in vitro expressed protein and/or the in vitro transcribed RNA molecule are directly transformed into the plant.
  • the protein and/or RNA molecule is capable of performing base editing in plant cells and is subsequently degraded by the cell, avoiding integration of the exogenous nucleotide sequence in the plant genome.
  • Plants that can be base-edited by the methods of the invention include monocots and dicots.
  • the plant may be a crop plant such as wheat, rice, corn, soybean, sunflower, sorghum, canola, alfalfa, cotton, barley, millet, sugar cane, tomato, tobacco, tapioca or potato.
  • the target sequence is associated with a plant trait, such as an agronomic trait, whereby the base editing results in a plant having altered traits relative to a wild type plant.
  • the target sequence to be modified may be located at any position in the genome, for example, in a functional gene such as a protein-encoding gene, or may be, for example, located in a gene expression regulatory region such as a promoter region or an enhancer region, thereby gene functional modification or gene expression modification can be achieved.
  • the substitution of C to T or A to G results in an amino acid substitution in the target protein or a truncation of the target protein (a stop codon is generated) .
  • the substitution of C to T or A to G results in a change in expression of the target gene.
  • the method further comprises obtaining progeny of the genetically modified plant.
  • the present invention provides a genetically modified plant or a progeny thereof, or a part thereof, wherein the plant is obtained by the method of the invention described above.
  • the present invention provides a method of plant breeding comprising crossing a genetically modified first plant obtained by the above method of the present invention with a second plant not containing the genetic modification, whereby the genetic modification is introduced into the second plant.
  • the ABE, XTEN, and dCPF1 sequences were codon optimized for plants and ordered from GenScript (Nanjing) .
  • the full-length dCPF1-ABE fragment was amplified using the primer pair HindIII-F (with HindIII restriction site) and EcoRI (with EcoRI restriction site) .
  • the PCR product was digested with HindIII and EcoRI, and then inserted into the pJIT163-GFP vectors (this vector sequence is shown in SEQ ID NO: 16) digested with the two enzymes to generate the fusion expression vector dCPF1-ABE.
  • the PBE, XTEN, and dCPF1 sequences were codon optimized for plants and ordered from GenScript (Nanjing) .
  • the full-length dCPF1-PBE fragment was amplified using the primer pair HindIII-F (with HindIII restriction site) and EcoRI (with EcoRI restriction site) .
  • the PCR product was digested with HindIII and EcoRI, and then inserted into the pJIT163-GFP vectors (this vector sequence is shown in SEQ ID NO: 16) digested with the two enzymes to generate the fusion expression vector dCPF1-PBE.
  • an sgRNA expression vector was constructed based on pTaU6-sgRNA (Addgene ID53062) or pOsU3-sgRNA (Addgene ID53063) or pZmU3-sgRNA (Addgene ID5306) or OsU3/TaU6-tRNA-sgRNA (Zhang et al. 2017. Genome Biology. DOI: 10.1186/s13059-017-1325-9).
  • the hammerhead ribozyme and crRNA are transcribed from a type II (pol II) promoter to generate the crRNA (Tang et al. Nature Plants, doi: 10.1038/nplants.2017.18).
  • pUbi-mGFPP-crRNA pUbi-DEP1-sgRNA
  • pUbi-DEP1-crRNA pUbi-DME-crRNA.
  • Protoplast transformation was performed as described below. The average transformation efficiency is 55-70%.
  • Protoplast transformation was performed as described below. Transformation was carried out with 10 μg of each plasmid by the PEG-mediated method. Protoplasts were collected after 48 h, and DNA was extracted for T7EI and PCR-RE assays.
  • Protoplasts were suspended in an appropriate amount of MMG and placed on ice until transformation.
  • Leaf sheaths of the seedlings were used for protoplast isolation and were cut into strips about 0.5 mm wide with a sharp blade.
  • Enzymolysis was performed for 5-6 h in darkness with gentle shaking (decolorization shaker, speed 10).
  • Protoplasts were filtered through a 40 μm nylon membrane into a 50 ml round-bottom centrifuge tube and washed with W5 solution.
  • the general reaction conditions are: initial denaturation at 94°C for 5 min; 30 to 35 cycles of denaturation at 94°C for 30 s, annealing at 58°C for 30 s and extension at 72°C for 30 s; final extension at 72°C for 5 min; hold at 12°C. 5 μl of the PCR products were subjected to electrophoresis.
  • sgRNA expression vectors were each transformed into wheat and rice protoplasts together with the Ubi-CPF1-PBE/ABE expression vector. Protoplasts were collected 48 hours later, and DNA was extracted for deep sequencing. In the first round of PCR, the target region was amplified using site-specific primers. In the second round of PCR, forward and reverse tags were added to the ends of the PCR product for library construction. Equal amounts of the different PCR products were pooled. Samples were then sequenced using the Illumina HiSeq 4000 at the Beijing Genomics Institute.
  • Example 1 Optimization of CPF1 mediated genome cleavage activity in plants.
  • The editing activity of CPF1 reported in plant cells differs considerably between articles, and the cleavage activity also differs greatly between different types of CPF1.
  • the nuclear localization of AsCPF1, FnCPF1 and LbCPF1 was optimized, and the promoter for crRNA was also optimized, to improve the cleavage activity of CPF1 in plant cells.
  • Vectors of AsCPF1, FnCPF1 and LbCPF1 carrying 1-4 NLSs were constructed, and different vectors for generating crRNA by ribozyme, driven by U3/U6 or UBI promoter were constructed (Fig. 1) .
  • dCPF1-PBE systems were constructed: dAsCPF1-2NLS-PBE, dFnCPF1-2NLS-PBE, dLbCPF1-2NLS-PBE.
  • the NLSs at the C-terminus are placed at one end of the UGI or placed at both ends of the UGI.
  • the expression of crRNA is initiated with UBI1 promoter and cleaved with a ribozyme.
  • SEQ ID NO: 10 shows the dLbCPF1-PBE-2NLS expression cassette, which comprises the sequences of the ZmUbi-1 promoter and an intron.
  • CPF1-ABE systems are constructed: dAsCPF1-1NLS-ABE, dFnCPF1-NLS-ABE, dLbCPF1-1NLS-ABE, and dAsCPF1-2NLS-ABE, dFnCPF1-2NLS-ABE, dLbCPF1-2NLS-ABE, where ABE can be ABE7.9 or ABE7.10.
  • the crRNA is transcribed with UBI1 promoter and cleaved with a ribozyme.
  • the results from the GFP base editing reporter system of Fig. 3E indicate that dFnCPF1-ABE7.10 (SEQ ID NO: 11), dLbCPF1-ABE7.9 and dLbCPF1-ABE7.10 (SEQ ID NO: 12) can all work, and that the efficiency of ABE7.10 is higher than that of ABE7.9 (Fig. 3F) .
  • dCPF1-ABE2-X1 SEQ ID NO. 13
  • ABE was also constructed at the C-terminus of CPF1 (dCPF1-ABE2-X2/X3) (SEQ ID NO. 14, 15) .
  • the results from the GFP base editing reporter system of Fig. 3E indicate that the editing activity of dCPF1-ABE2-X2/X3 is higher than that of dLbCPF1-ABE7.10 (Fig. 3G) .
  • In order to further improve the editing efficiency of CPF1, we continued to optimize the CPF1 system. Firstly, all expression vectors for CPF1-mediated editing are driven by the BdUbi10 promoter to increase expression. In addition, the crRNA is transcribed using a type II (pol II) promoter, and the crRNA array is placed into the 5'UTR or 3'UTR region of a gene to be expressed, to improve the editing efficiency of CPF1 by increasing mRNA expression.
  • SEQ ID NO: 30 nucleotide sequence of promoter+intron
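
The target-sequence structure described above (a 5'-TTTN or 5'-YTN PAM followed by X consecutive nucleotides, with X between 15 and 35) can be sketched as a small scanner. This is an illustration only, not part of the disclosed method: the function name `find_cpf1_targets`, the default protospacer length of 23 and the restriction to one strand are assumptions made for the example.

```python
import re

# Degenerate symbols used in the PAM definitions (N = any base, Y = C or T).
PAM_SYMBOLS = {"N": "[ACGT]", "Y": "[CT]"}


def find_cpf1_targets(seq, pam="TTTN", spacer_len=23):
    """Return (pam_start, pam, protospacer) for each 5'-PAM-Nx-3' site.

    Positions are 0-based on the given strand. spacer_len is X and must
    satisfy 15 <= X <= 35. Only the supplied strand is scanned (assumption).
    """
    if not 15 <= spacer_len <= 35:
        raise ValueError("spacer_len must satisfy 15 <= X <= 35")
    pam_regex = "".join(PAM_SYMBOLS.get(base, base) for base in pam.upper())
    # A lookahead keeps overlapping candidate sites.
    pattern = re.compile(rf"(?=({pam_regex})([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]
```

Calling `find_cpf1_targets(sequence, pam="YTN", spacer_len=20)` would list candidate sites for the relaxed PAM. Scanning the reverse complement as well would be needed to cover both strands.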

Abstract

The invention relates to the field of genetic engineering. In particular, the invention relates to a base editing system and method based on CPF1 protein. More particularly, the invention relates to a system and method for efficient base editing of a target sequence in the genome of an organism (e.g., a plant) by a guide RNA-directed Cpf1-deaminase fusion protein, and the genetically modified organism (e.g., a plant) produced by the method and the progeny thereof.

Description

[Title established by the ISA under Rule 37.2] BASE EDITING SYSTEM AND METHOD BASED ON CPF1 PROTEIN
Technical field
The invention relates to the field of genetic engineering. In particular, the invention relates to a base editing system and method based on CPF1 protein. More particularly, the present invention relates to a system and method for efficient base editing of a target sequence in the genome of an organism (e.g., a plant) by a guide RNA-directed Cpf1-deaminase fusion protein, and the genetically modified organism (e.g., a plant) produced by the method and progeny thereof.
Technical Background
The prerequisite for efficient crop improvement is the capacity to obtain new genetic mutations that can be easily introduced into modern cultivars. Genetic studies, especially whole-genome studies, have shown that single-nucleotide changes are the main cause of differences in crop traits. Single base variations may result in amino acid substitutions, leading to the evolution of superior alleles and superior traits. Before the emergence of genome editing, targeting induced local lesions in genomes (TILLING) could be used to generate the mutations urgently needed in crop improvement. However, TILLING screening is time-consuming and laborious, and the identified point mutations are often limited in number and type. Genome editing techniques, particularly those based on the CRISPR/Cas9 system, enable the introduction of specific base substitutions at genomic loci through homologous recombination (HR)-mediated DNA repair pathways. However, the successful use of this method is currently limited, mainly due to the low frequency of HR-mediated double-strand break repair in plants. In addition, effectively providing a sufficient amount of DNA repair template is also a major difficulty. These problems make it a challenge to achieve site-directed mutagenesis in plants efficiently and simply through HR.
In recent years, the DNA-binding properties of Cas9 and the properties of DNA deaminases have been combined: Cas9 can be fused with a deaminase to achieve precise conversion of cytosine (C) to thymine (T) and of adenine (A) to guanine (G) in a target gene. Currently, the systems for C to T conversion mainly include SpnCas9-BE3, SpnCas9-AID and fusions with Cas9 variants, such as VQR-BE3, EQR-BE3 and VRER-BE3, as well as SaCas9-BE3 and its variant SaKKH-BE3. These combinations relax the PAM restriction for C to T transitions and offer a more variable range of editing windows. In addition, David Liu’s lab at Harvard University has recently developed, by artificial evolution, an adenine deaminase that acts on ssDNA. This deaminase can be fused with Cas9 to give the Cas9-ABE system, which converts A to G in DNA and further expands the scope of base editing. Although these studies have made extensive use of single-base editing of DNA, current single-base editing techniques still have many problems. Firstly, the PAMs for Cas9 and Cas9 variants are generally restricted to G/C-rich regions, so the types of PAM available to single-base editing systems still need to be broadened. Secondly, because of the poor specificity of Cas9-based editing, the specificity of single-base editing systems still needs to be improved. Thirdly, nCas9-BE3 and its variants, and nCas9-ABE, usually produce single-strand nicks on the non-target strand of the target site, which tends to generate DNA indels alongside single-base mutations during mismatch repair, so there is still room for improvement in the fidelity of single-base editing. Therefore, new systems and methods for base editing of plant genomes are still needed in the art.
Description of the drawings
Figure 1. Optimization of CPF1-mediated cleavage activity in plant genome.
Figure 2. CPF1-mediated C to T mutations in the plant genome.
Figure 3. CPF1-mediated A to G mutations in plant genomes.
Figure 4. Simultaneous base editing of multiple sites using the RNA cleavage activity of CPF1.
Description of the Invention
1. Definition
In the present invention, the scientific and technical terms used herein have the meaning as commonly understood by a person skilled in the art unless otherwise specified. Also, the protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology, immunology related terms, and laboratory procedures used herein are terms and routine steps that are widely used in the corresponding field. For example, standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are more fully described in the following document: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter referred to as "Sambrook" ) . In the meantime, in order to better understand the present invention, definitions and explanations of related terms are provided below.
"Cpf1 nuclease", "Cpf1 protein" and "Cpf1" are used interchangeably herein and refer to an RNA-directed nuclease including a Cpf1 protein or a fragment thereof. Cpf1 is a component of the CRISPR-Cpf1 genome editing system that targets and cleaves DNA target sequences to form DNA double-strand breaks (DSBs) under the guidance of a guide RNA (crRNA). The Cpf1 protein contains a DNA cleavage domain and an independent RNA cleavage domain. The RNA cleavage domain of the Cpf1 protein is capable of processing pre-crRNA to form a mature crRNA.
“guide RNA” and “gRNA” can be used interchangeably herein. The guide RNA of the Cpf1-mediated genome editing system is typically composed only of mature crRNA molecules, wherein the crRNA comprises a sequence that is sufficiently identical to the target sequence to hybridize to the complement of the target sequence and direct the complex (Cpf1+crRNA) to bind the target sequence in a sequence-specific manner.
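The relationship between the crRNA guide portion, the target sequence and its complement can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `spacer_rna`, `reverse_complement` and `identity` are hypothetical, sequences are written 5' to 3', and no threshold for "sufficiently identical" is implied.

```python
# DNA complement lookup for the four standard bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand of a DNA sequence, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def spacer_rna(target):
    """Guide portion of the crRNA: same bases as the target, with U for T."""
    return target.upper().replace("T", "U")


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Here `reverse_complement(target)` gives the strand that the guide portion returned by `spacer_rna(target)` would base-pair with, and `identity` can compare a designed guide (after converting U back to T) against a candidate target.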
"Deaminase" refers to an enzyme that catalyzes a deamination reaction. In some embodiments of the invention, the deaminase refers to a cytosine deaminase that catalyzes the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. In some embodiments of the invention, the deaminase refers to an adenine deaminase that is capable of catalyzing the formation of inosine (I) from adenosine or deoxyadenosine (A).
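The net sequence outcome of the two deamination reactions can be sketched in a few lines: uracil is read as T and inosine is read as G during replication, which gives the C to T and A to G substitutions. The function name `apply_base_edit`, the editor labels and the idea of a fixed editing window are assumptions made for this example, and the sketch converts every eligible base in the window, which real editing does not necessarily do.

```python
def apply_base_edit(protospacer, editor, window):
    """Convert every editable base inside a 1-based, inclusive window.

    editor "C>T" models a cytosine deaminase (C -> U, read as T).
    editor "A>G" models an adenine deaminase (A -> I, read as G).
    The window bounds are illustrative, not values from the document.
    """
    source, product = {"C>T": ("C", "T"), "A>G": ("A", "G")}[editor]
    start, end = window
    edited = []
    for pos, base in enumerate(protospacer.upper(), start=1):
        if start <= pos <= end and base == source:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)
```

For example, `apply_base_edit("ACCAGT", "C>T", (2, 3))` returns `"ATTAGT"`, while bases outside the window are left unchanged.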
"Genome" as used herein encompasses not only chromosomal DNA present in the nucleus, but also organellar DNA present in the subcellular components (e.g., mitochondria, plastids) of the cell.
As used herein, "organism" includes any organism that is suitable for genome editing, with eukaryotes being preferred. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle, cats; poultry such as chickens, ducks, geese; plants including monocots and dicots such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis and the like.
A “genetically modified organism” or “genetically modified cell” includes an organism or cell which comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, the exogenous polynucleotide is stably integrated within the genome of the organism or the cell such that the polynucleotide is passed on to successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence means that, in the genome of the organism or the cell, said sequence comprises one or more nucleotide substitutions, deletions, or additions. For example, a genetically modified organism obtained by the present invention may comprise one or more substitutions of C to T or A to G relative to a wild type (corresponding organism without such genetic modification).
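Identifying such substitutions relative to a wild-type sequence can be sketched as a position-by-position comparison. This is an illustrative sketch only: it assumes the two sequences are already aligned and of equal length, and the names `call_substitutions` and `is_base_edit` are hypothetical. Because a C to T change on one strand reads as G to A on the other (and A to G reads as T to C), both readings are accepted.

```python
# Substitutions consistent with C>T or A>G editing on either strand.
BASE_EDIT_CHANGES = {("C", "T"), ("A", "G"), ("G", "A"), ("T", "C")}


def call_substitutions(wild_type, edited):
    """List (position, ref, alt) differences; positions are 1-based."""
    if len(wild_type) != len(edited):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = zip(wild_type.upper(), edited.upper())
    return [(pos, ref, alt)
            for pos, (ref, alt) in enumerate(pairs, start=1)
            if ref != alt]


def is_base_edit(ref, alt):
    """True if the change matches a C>T or A>G edit on either strand."""
    return (ref, alt) in BASE_EDIT_CHANGES
```

Filtering the output of `call_substitutions` with `is_base_edit` separates changes compatible with base editing from other differences such as transversions.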
"Exogenous" in reference to a sequence means a sequence from a foreign species or, if from the same species, a sequence that has undergone significant changes in composition and/or locus from its native form through deliberate human intervention.
"Polynucleotide" , "nucleic acid sequence" , "nucleotide sequence" or "nucleic acid fragment" are used interchangeably and are single-stranded or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single letter names as follows: "A" is adenosine or deoxyadenosine (corresponding to RNA or DNA, respectively) , "C" means cytidine or deoxycytidine, "G" means guanosine or deoxyguanosine, "U" represents uridine, "T" means deoxythymidine, "R" means purine (A or G) , "Y" means pyrimidine (C or T) , "K" means G or T, "H" means A or C or T, "I" means inosine, and "N" means any nucleotide.
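The degenerate single-letter codes listed above can be expanded mechanically into the concrete sequences they stand for. The sketch below covers only the codes named in this paragraph, and the names `CODES` and `expand` are assumptions for the example.

```python
from itertools import product

# Only the codes named in the paragraph above; other ambiguity codes are omitted.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def expand(motif):
    """Return every concrete DNA sequence matched by a degenerate motif."""
    choices = (CODES[base] for base in motif.upper())
    return ["".join(combo) for combo in product(*choices)]
```

For instance, `expand("TTTN")` yields the four sequences TTTA, TTTC, TTTG and TTTT.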
"Polypeptide," "peptide," and "protein" are used interchangeably in the present invention to refer to a polymer of amino acid residues. The terms apply to an amino acid polymer in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acid(s), as well as to a naturally occurring amino acid polymer. The terms "polypeptide," "peptide," "amino acid sequence," and "protein" may also include modified forms including, but not limited to, glycosylation, lipid ligation, sulfation, γ carboxylation of glutamic acid residues, and ADP-ribosylation.
As used in the present invention, "expression construct" refers to a vector such as a recombinant vector that is suitable for expression of a nucleotide sequence of interest in an organism. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to the transcription of a nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or the translation of an RNA into a precursor or mature protein.
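Transcription and translation of a coding sequence can be illustrated with the NLS coding sequences quoted earlier in this document. The sketch below assumes the standard genetic code and a sequence that starts in frame; the names `transcribe` and `translate` are hypothetical, and any trailing bases that do not form a full codon are ignored.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna):
    """Return the mRNA corresponding to a coding-strand DNA sequence."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an in-frame mRNA into one-letter amino acids ('*' = stop)."""
    dna = mrna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))
```

With these helpers, `translate(transcribe("CCCAAGAAGAAGAGGAAGGTG"))` gives `"PKKKRKV"`, matching the NLS peptide that this nucleotide sequence is stated to encode.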
The "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector or, in some embodiments, an RNA that is capable of translation (such as mRNA) .
The "expression construct" of the present invention may comprise regulatory sequences and nucleotide sequences of interest from different origins, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from that normally occurring in nature.
"Regulatory sequence" and "regulatory element" are used interchangeably to refer to a nucleotide sequence that is located upstream (5 'non-coding sequence) , middle or downstream (3' non-coding sequence) of a coding sequence and affects the transcription, RNA processing or stability or translation of the relevant coding sequence.
Regulatory sequences may include, but are not limited to, promoters, translation leaders, introns and polyadenylation recognition sequences.
"Promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the present invention, the promoter is a promoter capable of controlling the transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive promoter or tissue-specific promoter or developmentally-regulated promoter or inducible promoter.
"Constitutive promoter" refers to a promoter that may in general cause the gene to be expressed in most cases in most cell types. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that drives expression primarily but not necessarily exclusively in one tissue or organ, and that may also drive expression in a specific cell or cell type. "Developmentally-regulated promoter" refers to a promoter whose activity is dictated by developmental events. "Inducible promoter" refers to a promoter that selectively expresses operably linked DNA sequences in response to an endogenous or exogenous stimulus (environment, hormones, chemical signals, etc.).
As used herein, the term "operably linked" refers to the linkage of a regulatory element (e.g., but not limited to, a promoter sequence, a transcription termination sequence, etc.) to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.
"Introduction" of a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc. ) or protein into an organism means that the nucleic acid or protein is used to transform a cell of the organism such that the nucleic acid or protein is capable of functioning in the cell. As used in the present invention, "transformation" includes both stable and transient transformations.
"Stable transformation" refers to the introduction of exogenous nucleotide sequences into the genome, resulting in the stable inheritance of foreign genes. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any of its successive generations.
"Transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell, performing a function without the stable inheritance of an exogenous gene. In transient transformation, the exogenous nucleic acid sequences are not integrated into the genome.
As used herein, the term "plant" includes a whole plant and any descendant, cell, tissue, or part of a plant. The term "plant parts" includes any part(s) of a plant, including, for example and without limitation: seed (including mature seed and immature seed); a plant cutting; a plant cell; a plant cell culture; a plant organ (e.g., pollen, embryos, flowers, fruits, shoots, leaves, roots, stems, and explants). A plant tissue or plant organ may be a seed, protoplast, callus, or any other group of plant cells that is organized into a structural or functional unit. A plant cell or tissue culture may be capable of regenerating a plant having the physiological and morphological characteristics of the plant from which the cell or tissue was obtained, and of regenerating a plant having substantially the same genotype as the plant. In contrast, some plant cells are not capable of being regenerated to produce plants. Regenerable cells in a plant cell or tissue culture may be embryos, protoplasts, meristematic cells, callus, pollen, leaves, anthers, roots, root tips, silk, flowers, kernels, ears, cobs, husks, or stalks.
Plant parts include harvestable parts and parts useful for propagation of progeny plants. Plant parts useful for propagation include, for example and without limitation: seed; fruit; a cutting; a seedling; a tuber; and a rootstock. A harvestable part of a plant may be any useful part of a plant, including, for example and without limitation: flower; pollen; seedling; tuber; leaf; stem; fruit; seed; and root.
A plant cell is the structural and physiological unit of the plant, and includes protoplast cells without a cell wall and plant cells with a cell wall. A plant cell may be in the form of an isolated single cell, or an aggregate of cells (e.g., a friable callus and a cultured cell) , and may be part of a higher organized unit (e.g., a plant tissue, plant organ, and plant) . Thus, a plant cell may be a protoplast, a gamete producing cell, or a cell or collection of cells that can regenerate into a whole plant. As such, a seed, which comprises multiple plant cells and is capable of regenerating into a whole plant, is considered a "plant part" in embodiments herein.
The term "protoplast", as used herein, refers to a plant cell that has had its cell wall completely or partially removed, leaving its lipid bilayer membrane exposed. Typically, a protoplast is an isolated plant cell without a cell wall which has the potential to regenerate into a cell culture or a whole plant.
“Progeny” of a plant comprises any subsequent generation of the plant.
“Trait” refers to the physiological, morphological, biochemical, or physical characteristics of a plant or a particular plant material or cell. In some embodiments, the characteristic is visible to the human eye, such as seed or plant size, or can be measured by biochemical techniques, such as detecting the protein, starch, or oil content of seed or leaves, or by observation of a metabolic or physiological process, e.g. by measuring tolerance to water deprivation or particular salt or sugar concentrations, or by the observation of the expression level of a gene or genes, or by agricultural observations such as osmotic stress tolerance or yield. In some embodiments, trait also includes ploidy of a plant, such as haploidy which is important for plant breeding. In some embodiments, trait also includes resistance of a plant to herbicides.
“Agronomic trait” is a measurable parameter including, but not limited to, leaf greenness, yield, growth rate, biomass, fresh weight at maturation, dry weight at maturation, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content in a vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content in a vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content in a vegetative tissue, drought tolerance, nitrogen uptake, root lodging, harvest index, stalk lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt tolerance, and tiller number.
2. Base editing system based on Cpf1 protein
The present invention provides a system for base editing of a target sequence in the genome of an organism, comprising at least one of the following i) to v) :
i) a base-editing fusion protein, and a guide RNA;
ii) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and a guide RNA;
iii) a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
iv) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
v) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein and a nucleotide sequence encoding a guide RNA;
wherein the base-editing fusion protein comprises a Cpf1 lacking DNA cleavage activity, and a deaminase, the guide RNA being capable of targeting the base-editing fusion protein to a target sequence in the genome, resulting in one or more C to T or A to G substitution (s) in the target sequence.
Cpf1 contains a DNA cleavage domain (RuvC), which can be mutated to abolish the DNA cleavage activity of Cpf1 and form a "Cpf1 lacking DNA cleavage activity". The Cpf1 lacking DNA cleavage activity still retains gRNA-directed DNA binding ability. Thus, in principle, when fused to an additional protein, the Cpf1 lacking DNA cleavage activity can readily target the additional protein to almost any DNA sequence simply by co-expression with a suitable guide RNA.
The Cpf1 lacking DNA cleavage activity of the present invention may be derived from Cpf1 of different species, for example, Cpf1 proteins derived from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006, designated FnCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 19) , AsCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 18) and LbCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 20) , respectively.
In some embodiments, the Cpf1 lacking DNA cleavage activity is the FnCpf1 lacking DNA cleavage activity. In some embodiments, the FnCpf1 lacking DNA cleavage activity comprises a D917A mutation relative to wild-type FnCpf1.
In some embodiments, the Cpf1 lacking DNA cleavage activity is the AsCpf1 lacking DNA cleavage activity. In some embodiments, the AsCpf1 lacking DNA cleavage activity comprises a D908A mutation relative to wild-type AsCpf1.
In some preferred embodiments, the Cpf1 lacking DNA cleavage activity is the LbCpf1 lacking DNA cleavage activity. In some embodiments, the LbCpf1 lacking DNA cleavage activity comprises a D832A mutation relative to wild-type LbCpf1.
In some embodiments, the Cpf1 lacking DNA cleavage activity retains its RNA cleavage activity such that the pre-crRNA can be processed to form a mature crRNA. Thus, in some embodiments, an expression construct comprising a nucleotide sequence encoding a guide RNA in a system of the invention may comprise a sequence encoding a plurality of different guide RNA (crRNA) precursors in tandem, which may be processed by the Cpf1 lacking DNA cleavage activity to form a plurality of different guide RNAs (crRNAs) upon transcription to simultaneously target a plurality of different target sequences.
In some embodiments of the invention, the deaminase in the fusion protein is a cytidine deaminase, such as a deaminase of the apolipoprotein B mRNA editing complex (APOBEC) family.
Cytidine deaminase catalyzes the deamination of cytidine (C) in DNA to form uracil (U). The present inventors have surprisingly found that a fusion of a Cpf1 lacking DNA cleavage activity and a cytidine deaminase can, under the guidance of a guide RNA, target a target sequence in the genome. Because the Cpf1 is deficient in DNA cleavage activity, the DNA double strand is not cleaved, and the cytidine deaminase in the fusion protein can deaminate cytidine in the single-stranded DNA produced during formation of the Cpf1-guide RNA-DNA complex into U; C to T replacement is then achieved through base mismatch repair.
The cytidine deaminase of the present invention is particularly a cytidine deaminase which can accept single-stranded DNA as a substrate. Examples of cytidine deaminase useful in the present invention include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID) , APOBEC3G or CDA1. In some embodiments of the invention, the cytidine deaminase comprises the amino acid sequence set forth in SEQ ID NO: 1.
Where the deaminase in the fusion protein is a cytidine deaminase, the base editing system of the present invention can mutate one or more C (s) to T (s) in the genomic target sequence, thus also referred to as the Cpf1-PBE system.
In cells, uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER) , resulting in the repair of U: G to C: G. Thus, without being bound by any theory, the inclusion of a uracil DNA glycosylase inhibitor in a base-editing fusion protein of the present invention or a system of the present invention will increase the efficiency of base editing.
Thus, in some embodiments of the invention involving a Cpf1-PBE system, the base-editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI) . In some embodiments, the uracil DNA glycosylase inhibitor comprises the amino acid sequence set forth in SEQ ID NO: 2.
In some embodiments of the invention, the deaminase is an adenine deaminase.
Naturally occurring adenine deaminases convert adenosine in single-stranded RNA into inosine (I) by deamination, using RNA as a substrate. Recently, DNA-dependent adenine deaminases that convert deoxyadenosine in single-stranded DNA to inosine (I), using single-stranded DNA as a substrate, have been obtained from the E. coli tRNA adenine deaminase TadA by means of directed evolution. See Nicole M. Gaudelli et al., doi: 10.1038/nature24644, 2017.
The present inventors have surprisingly found that when a Cpf1 lacking DNA cleavage activity is fused to a DNA-dependent adenine deaminase, the fusion protein can, under the guidance of a guide RNA, target a target sequence in the plant genome. Because the Cpf1 is deficient in DNA cleavage activity, the DNA double strand is not cleaved, and the DNA-dependent adenine deaminase in the fusion protein is capable of deaminating adenosine in the single-stranded DNA produced during formation of the Cpf1-guide RNA-DNA complex into inosine (I). Since DNA polymerase treats inosine (I) as guanine (G), substitution of A to G can be achieved by base mismatch repair. Therefore, where the deaminase in the fusion protein is a DNA-dependent adenine deaminase, the base editing system of the present invention can mutate one or more A(s) in the genomic target sequence to G(s), and is thus also referred to as the Cpf1-ABE system.
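The two editing outcomes described above (C to T for the Cpf1-PBE system, A to G for the Cpf1-ABE system) can be illustrated with a minimal Python sketch. It is an idealized model, not a description of the inventors' method: it assumes that every target base inside an accessible window is converted, and the window bounds `start` and `end` are hypothetical parameters that this passage does not specify.

```python
def base_edit(protospacer, system, start, end):
    """Return the protospacer after idealized base editing.

    system: "PBE" converts C to T (cytidine deaminase);
            "ABE" converts A to G (adenine deaminase).
    start, end: 1-based inclusive positions assumed to be accessible to
                the deaminase (hypothetical values, not from the text).
    """
    substitutions = {"PBE": ("C", "T"), "ABE": ("A", "G")}
    if system not in substitutions:
        raise ValueError(f"unknown system: {system}")
    source, product = substitutions[system]

    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        # Only bases of the substrate type inside the window are converted.
        if start <= position <= end and base == source:
            edited.append(product)
        else:
            edited.append(base)
    return "".join(edited)
```

In a real experiment only a fraction of the alleles would carry each substitution, so the function gives the maximal edit rather than an expected outcome.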
In some embodiments of the present invention, the DNA-dependent adenine deaminase is a variant of the E. coli tRNA adenine deaminase TadA (ecTadA) , in particular a variant which can accept single-stranded DNA as a substrate. The variant comprises, relative to wild-type ecTadA, one or more sets of mutations selected from the group consisting of:
1) A106V and D108N;
2) D147Y and E155V;
3) L84F, H123Y and I156F;
4) A142N;
5) H36L, R51L, S146C and K157N;
6) P48S/T/A;
7) A142N;
8) W23L/R;
9) R152H/P.
In a specific embodiment of the present invention, the DNA-dependent adenine deaminase (ABE version 7.9) comprises the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, A142N, D147Y, E155V and R152P.
In a specific embodiment of the present invention, the DNA-dependent adenine deaminase (ABE version 7.10) comprises the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, D147Y, E155V and R152P.
The amino acid sequence of wild-type ecTadA is shown below: MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 3). In some embodiments, the initiating methionine may be absent.
The amino acid sequence of the preferred ecTadA-derived DNA-dependent adenine deaminase (ABE version 7.10) is shown below: MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD (SEQ ID NO: 4). In some embodiments, the initiating methionine may be absent.
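As a consistency check, the mutation list given above for ABE version 7.10 can be applied to the wild-type ecTadA sequence (SEQ ID NO: 3) and the result compared with SEQ ID NO: 4. The following Python sketch does this; the function name `apply_mutations` and the constant names are illustrative choices, not part of the specification, and the sequences are written in four pieces only for readability.

```python
import re

# Wild-type ecTadA (SEQ ID NO: 3) and ABE version 7.10 (SEQ ID NO: 4).
WT_ECTADA = (
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNR"
    "PIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIH"
    "SRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALL"
    "SDFFRMRRQEIKAQKKAQSSTD"
)
ABE_7_10 = (
    "MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNR"
    "AIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIH"
    "SRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALL"
    "CYFFRMPRQVFNAQKKAQSSTD"
)
ABE_7_10_MUTATIONS = [
    "W23R", "H36L", "R51L", "S146C", "K157N", "A106V", "D108N",
    "P48A", "L84F", "H123Y", "I156F", "D147Y", "E155V", "R152P",
]


def apply_mutations(sequence, mutations):
    """Apply substitutions written as <wild type><position><replacement>.

    Positions are 1-based and include the initiating methionine.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation: {mutation}")
        wild_type, replacement = match.group(1), match.group(3)
        index = int(match.group(2)) - 1
        # Guard against numbering errors: the stated wild-type residue
        # must actually be present at that position.
        if residues[index] != wild_type:
            raise ValueError(f"{mutation}: found {residues[index]} instead")
        residues[index] = replacement
    return "".join(residues)
```

Calling `apply_mutations(WT_ECTADA, ABE_7_10_MUTATIONS)` reproduces `ABE_7_10`, which confirms that the listed substitutions and the two sequences agree.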
In some embodiments of the present invention, the deaminase is fused to the N-terminus of the Cpf1 lacking DNA cleavage activity. In some embodiments, the deaminase is fused to the C-terminus of the Cpf1 lacking DNA cleavage activity.
In some preferred embodiments, the N-terminus of the DNA-dependent adenine deaminase is fused with a corresponding wild-type adenine deaminase. It is expected that the formation of heterodimers by the DNA-dependent adenine deaminase and the wild-type adenine deaminase can significantly increase the A to G editing activity of the fusion protein.
In some embodiments of the present invention, the deaminase and the Cpf1 lacking DNA cleavage activity are fused via a linker. The linker may be 1-50 (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 20-25, 25-50) amino acids in length, or may be a longer non-functional amino acid sequence without secondary or higher-order structure. For example, the linker can be a flexible linker such as GGGGS, GS, GAP, (GGGGS)x3, GGS and (GGS)x7, and the like. In some specific embodiments, the linker is an XTEN linker. In some embodiments, the linker is 32 amino acids in length. In some specific embodiments, the amino acid sequence of the linker is: SGGSSGGSSGSETPGTSESATPESSGGSSGGS.
In some embodiments of the present invention, the base-editing fusion protein of the present invention further comprises a nuclear localization sequence (NLS). In general, the one or more NLSs in the base-editing fusion protein should be of sufficient strength to drive accumulation of the base-editing fusion protein in the nucleus of a plant cell in an amount sufficient for its base editing function. In general, the strength of the nuclear localization activity is determined by the number and location of the NLSs in the base-editing fusion protein, the one or more specific NLSs used, or a combination of these factors.
In some embodiments of the present invention, the NLS of the base-editing fusion protein of the present invention may be located at the N-terminus and/or C-terminus. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, the base-editing fusion protein comprises a combination of these, such as one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When there is more than one NLS, each can be selected independently of the others. In some preferred embodiments of the present invention, the base-editing fusion protein comprises two NLSs, for example, located at the N-terminus and the C-terminus, respectively.
In general, an NLS consists of one or more short sequences of positively charged lysine or arginine residues exposed on the surface of the protein, but other types of NLS are also known. Non-limiting examples of NLS include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3'), PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or 5'-CCAAAGAAGAAGAGGAAGGTT-3'), or SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3').
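The nucleotide sequences listed above can be checked against the corresponding NLS peptides by translating them with the standard genetic code. The Python sketch below builds the standard codon table and performs this check; the names `CODON_TABLE`, `translate` and `NLS_CODING` are illustrative and not part of the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence read 5' to 3' in frame 1."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Coding sequences quoted in the text, keyed by the peptide they encode.
NLS_CODING = {
    "KKRKV": ["AAGAAGAGAAAGGTC"],
    "PKKKRKV": ["CCCAAGAAGAAGAGGAAGGTG", "CCAAAGAAGAAGAGGAAGGTT"],
    "SGGSPKKKRKV": ["TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG"],
}
```

Every coding sequence in `NLS_CODING` translates to its key; the two sequences given for PKKKRKV differ only in synonymous codons.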
In some embodiments of the present invention, the N-terminus of the base-editing fusion protein comprises an NLS having the amino acid sequence PKKKRKV. In some embodiments of the present invention, the C-terminus of the base-editing fusion protein comprises an NLS having the amino acid sequence SGGSPKKKRKV or KRPAATKKAGQAKKKK.
Furthermore, depending on the location of the DNA to be edited, the base-editing fusion proteins of the present invention may also include other localization sequences, such as cytoplasmic localization sequences, chloroplast localization sequences, mitochondrial localization sequences, and the like.
In some embodiments of the present invention involving Cpf1-PBE system, the base editing fusion protein also contains a uracil DNA glycosylase inhibitor (UGI) , and two NLSs flanking either N-terminal or C-terminal of the UGI. In some preferred embodiments, the base editing fusion protein of the invention comprises an amino acid sequence selected from SEQ ID NO: 24-29.
To obtain efficient expression, in some embodiments of the present invention, the nucleotide sequence encoding the base-editing fusion protein is codon optimized for the biological species to be base edited.
Codon optimization refers to the replacement of at least one codon (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) of a native sequence by a codon that is used more frequently or most frequently in the genes of the host cell, modifying the nucleic acid sequence while maintaining the native amino acid sequence to enhance expression in the host cell of interest. Different species show specific preferences for certain codons of a particular amino acid. Codon preference (difference in codon usage between organisms) is often associated with the efficiency of translation of messenger RNA (mRNA), which is believed to depend on the nature of the translated codon and the availability of specific transfer RNA (tRNA) molecules. The predominance of selected tRNAs within a cell generally reflects the codons most frequently used in peptide synthesis. Therefore, genes can be tailored for optimal expression in a given organism based on codon optimization. Codon usage tables are readily available, for example, in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in different ways. See Nakamura Y. et al., "Codon usage tabulated from the international DNA sequence databases: status for the year 2000", Nucl. Acids Res., 28: 292 (2000).
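The principle of codon optimization described above (replacing codons with synonymous ones while keeping the amino acid sequence unchanged) can be sketched in Python as follows. The `PREFERRED_CODON` table is a hypothetical example chosen for illustration only; it is not taken from the Codon Usage Database, and a real optimization would use a usage table for the species to be edited.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TO_AA = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical preferences, for illustration only (not real usage data).
PREFERRED_CODON = {"P": "CCA", "K": "AAG", "R": "AGG", "V": "GTT"}


def recode(dna, preferred):
    """Swap each codon for the preferred synonymous codon, if one is given."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TO_AA[codon]
        new_codon = preferred.get(amino_acid, codon)
        # The replacement must be synonymous so the protein is unchanged.
        if CODON_TO_AA[new_codon] != amino_acid:
            raise ValueError(f"{new_codon} does not encode {amino_acid}")
        recoded.append(new_codon)
    return "".join(recoded)
```

With this toy table, `recode("CCCAAGAAGAAGAGGAAGGTG", PREFERRED_CODON)` converts one of the two PKKKRKV coding sequences given in this specification into the other, showing that both encode the same peptide.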
In some specific embodiments, the base editing fusion protein of the invention is encoded by the nucleotide sequence selected from SEQ ID NO: 8-9, 11-12 or 14-15.
In some embodiments of the present invention, the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to an expression regulatory element such as a promoter.
Examples of promoters that can be used in the present invention include, but are not limited to, the polymerase (pol) I, pol II or pol III promoters. Examples of the pol I promoter include the gallus RNA pol I promoter. Examples of the pol II promoters include, but are not limited to, the immediate-early cytomegalovirus (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and the immediate-early simian virus 40 (SV40) promoter. Examples of pol III promoters include the U6 and H1 promoters. An inducible promoter such as a metallothionein promoter can be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, and the Sp6 phage promoter, and the  like. Promoters that can be used in plants include, but are not limited to, cauliflower mosaic virus 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, maize U3 promoter, rice actin promoter.
Preferably, the guide RNA (crRNA) is expressed using the Ubi-1 promoter and cleaved to become mature with a ribozyme such as HDV ribozyme.
In one embodiment, the addition of an intron after the Ubi-1 promoter enhances expression of the protein or RNA of interest.
In some specific embodiments, the expression construct for expressing the base editing fusion protein of the invention comprises an expression cassette of SEQ ID NO: 10 or 13. Alternatively, the expression construct comprises an expression regulating sequence set forth in SEQ ID NO: 30.
3. The method of producing genetically modified organisms
In another aspect, the present invention provides a method of producing a genetically modified organism (e.g. a plant) , comprising introducing a system of the present invention for base editing of a target sequence in the genome of an organism into a cell of the organism, whereby the guide RNA targets the base-editing fusion protein to a target sequence in the genome, resulting in one or more C to T or one or more A to G substitutions in the target sequence.
The design of target sequences or crRNA coding sequences that can be recognized and targeted by the Cpf1 protein and the guide RNA (i.e., crRNA) complex can be found, for example, in Zhang et al., Cell 163, 1–13, October 22, 2015. In general, the 5'-terminus of the target sequence targeted by the genome editing system of the present invention needs to include a protospacer adjacent motif (PAM) 5'-TTTN or 5'-YTN, wherein N is independently selected from A, G, C and T, Y is selected from C and T.
For example, in some embodiments of the present invention, the target sequence has the following structure: 5'-TTTN-Nx-3' or 5'-YTN-Nx-3', wherein N is independently selected from A, G, C and T; Y is selected from C and T; X is an integer satisfying 15 ≤ X ≤ 35; and Nx represents X consecutive nucleotides.
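The target structure defined above lends itself to a simple pattern search. The following Python sketch scans one strand of a DNA sequence for a PAM (TTTN or YTN) followed by X consecutive nucleotides. It is an illustration under stated assumptions: the function name `find_targets` is hypothetical, the default `spacer_length=23` is merely one value within the stated range of 15 to 35, and the complementary strand is not searched.

```python
import re

# IUPAC symbols used in the PAM definitions, as regular-expression classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "Y": "[CT]"}


def find_targets(sequence, pam="TTTN", spacer_length=23):
    """Find 5'-PAM-Nx-3' sites on the given strand.

    Returns a list of (start, pam_sequence, target_sequence) tuples, where
    start is the 0-based index of the first PAM nucleotide.
    """
    if not 15 <= spacer_length <= 35:
        raise ValueError("spacer_length must satisfy 15 <= X <= 35")
    pam_pattern = "".join(IUPAC[symbol] for symbol in pam.upper())
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(
        f"(?=({pam_pattern})([ACGT]{{{spacer_length}}}))"
    )
    return [
        (match.start(), match.group(1), match.group(2))
        for match in pattern.finditer(sequence.upper())
    ]
```

To cover both strands, the same function could be applied to the reverse complement of the sequence, with coordinates converted back accordingly.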
In the present invention, the target sequence to be modified may be located at any location in the genome, for example, in a functional gene such as a protein-encoding gene, or may be, for example, located in a gene expression regulatory region such as a promoter region or an enhancer region, so that modification of gene function or gene expression can be achieved.
A to G or C to T base editing in the target sequence of a cell can be detected by T7EI, PCR/RE or sequencing methods.
In the methods of the present invention, the base editing system can be introduced into cells by a variety of methods well known to those skilled in the art. Methods that can be used to introduce a genome editing system of the present invention into a cell include, but are not limited to, calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses) , gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation.
A cell that can be edited by the method of the present invention can be a cell of mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, cat; a cell of poultry such as chicken, duck, goose; a cell of plants including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut and Arabidopsis thaliana and so on.
The methods of the invention are particularly suitable for producing genetically modified plants, such as crop plants. In the method of producing a genetically modified plant of the present invention, the base editing system can be introduced into a plant by various methods well known to those skilled in the art. Methods that can be used to introduce a base editing system of the invention into a plant include, but are not limited to, gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, pollen tube pathway and ovary injection method.
In the method for producing a genetically modified plant of the present invention, the modification of the target sequence can be achieved by only introducing or producing the base-editing fusion protein and the guide RNA in  the plant cell, and the modification can be stably inherited, without any need to stably transform the base editing system into plants. This avoids the potential off-target effect of the stable base editing system and also avoids the integration of the exogenous nucleotide sequence in the plant genome, thereby providing greater biosafety.
In some preferred embodiments, the introduction is carried out in the absence of selection pressure to avoid integration of the exogenous nucleotide sequence into the plant genome.
In some embodiments, the introduction comprises transforming the base editing system of the present invention into an isolated plant cell or tissue and then regenerating the transformed plant cell or tissue into an intact plant. Preferably, the regeneration is carried out in the absence of selection pressure, i.e., no selection agent for the selection gene on the expression vector is used during tissue culture. Avoiding the use of a selection agent can increase the regeneration efficiency of the plant, obtaining a modified plant free of exogenous nucleotide sequences.
In other embodiments, the base editing system of the present invention can be transformed into specific parts of an intact plant, such as leaves, shoot tips, pollen tubes, young ears or hypocotyls. This is particularly suitable for the transformation of plants that are difficult to regenerate in tissue culture.
In some embodiments of the invention, the in vitro expressed protein and/or the in vitro transcribed RNA molecule are directly transformed into the plant. The protein and/or RNA molecule is capable of performing base editing in plant cells and is subsequently degraded by the cell, avoiding integration of the exogenous nucleotide sequence in the plant genome.
Plants that can be base-edited by the methods of the invention include monocots and dicots. For example, the plant may be a crop plant such as wheat, rice, corn, soybean, sunflower, sorghum, canola, alfalfa, cotton, barley, millet, sugar cane, tomato, tobacco, tapioca or potato.
In some embodiments of the present invention, the target sequence is associated with a plant trait, such as an agronomic trait, whereby the base  editing results in a plant having altered traits relative to a wild type plant.
In the present invention, the target sequence to be modified may be located at any position in the genome, for example, in a functional gene such as a protein-encoding gene, or may be, for example, located in a gene expression regulatory region such as a promoter region or an enhancer region, so that modification of gene function or gene expression can be achieved. Accordingly, in some embodiments of the present invention, the substitution of C to T or A to G results in an amino acid substitution in the target protein or a truncation of the target protein (a stop codon is generated). In other embodiments of the present invention, the substitution of C to T or A to G results in a change in expression of the target gene.
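The truncation case mentioned above (a stop codon generated by a C to T substitution) can be illustrated with a short Python sketch that lists the codons of a coding sequence which a single C to T change on the coding strand would turn into a stop codon. The function name is hypothetical, and the sketch deliberately ignores edits on the opposite strand (which read as G to A on the coding strand, for example TGG to TAG) as well as any constraint on PAM position or editing window.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_candidates(cds):
    """List codons that become a stop codon after one C-to-T substitution.

    Returns (codon_index, original_codon, edited_codon) tuples, where
    codon_index is 0-based and the sequence is read in frame 1.
    """
    cds = cds.upper()
    candidates = []
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable_length, 3):
        codon = cds[start:start + 3]
        for offset, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:offset] + "T" + codon[offset + 1:]
            if edited in STOP_CODONS:
                candidates.append((start // 3, codon, edited))
    return candidates
```

On the coding strand only CAA, CAG and CGA qualify, giving TAA, TAG and TGA respectively.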
In some embodiments of the present invention, the method further comprises obtaining progeny of the genetically modified plant.
In another aspect, the present invention provides a genetically modified plant or a progeny thereof, or a part thereof, wherein the plant is obtained by the method of the invention described above.
In another aspect, the present invention provides a method of plant breeding comprising crossing a genetically modified first plant obtained by the above method of the present invention with a second plant not containing the genetic modification, whereby the genetic modification is introduced into the second plant.
Example
Construction of Ubi-CPF1-PBE/ABE expression vector
The ABE, XTEN, and dCPF1 sequences were codon optimized for plants and ordered from GenScript (Nanjing). The full-length dCPF1-ABE fragment was amplified using the primer pair HindIII-F (with a HindIII restriction site) and EcoRI (with an EcoRI restriction site). The PCR product was digested with HindIII and EcoRI, and then inserted into the pJIT163-GFP vector (the vector sequence is shown in SEQ ID NO: 16) digested with the same two enzymes to generate the fusion expression vector dCPF1-ABE.
The PBE, XTEN, and dCPF1 sequences were codon optimized for plants and ordered from GenScript (Nanjing). The full-length dCPF1-PBE fragment was amplified using the primer pair HindIII-F (with a HindIII restriction site) and EcoRI (with an EcoRI restriction site). The PCR product was digested with HindIII and EcoRI, and then inserted into the pJIT163-GFP vector (the vector sequence is shown in SEQ ID NO: 16) digested with the same two enzymes to generate the fusion expression vector dCPF1-PBE.
Construction of sgRNA expression vector
According to previous descriptions (Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947-951, 2014; Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688, 2013; and Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J Genet Genomics. 41, 63-68, 2014), an sgRNA expression vector was constructed based on pTaU6-sgRNA (Addgene ID 53062), pOsU3-sgRNA (Addgene ID 53063), pZmU3-sgRNA (Addgene ID 5306) or OsU3/TaU6-tRNA-sgRNA (Zhang et al. 2017. Genome Biology. DOI: 10.1186/s13059-017-1325-9). In addition, a hammerhead ribozyme and the crRNA were expressed from a pol II promoter to generate the crRNA (Tang et al. Nature Plants, doi: 10.1038/nplants.2017.18).
pUbi-mGFPP-crRNA, pUbi-DEP1-sgRNA, pUbi-DEP1-crRNA, pUbi-DME-crRNA.
BFP and GFP expression vectors
pUbi-mGFP, the vector sequence is shown in SEQ ID NO: 17.
Protoplast assays
Wheat Bobwhite and rice Nipponbare were used in this study. Protoplast transformation was performed as described below, with an average transformation efficiency of 55-70%. Transformation was carried out with 10 μg of each plasmid by the PEG-mediated method. Protoplasts were collected after 48 h and DNA was extracted for T7EI and PCR-RE assays.
Preparation and transformation of wheat protoplasts
1) The middle parts of tender wheat leaves were cut into strips 0.5-1 mm in width. The strips were placed into 0.6 M mannitol solution for 10 minutes, filtered, and then placed in 50 ml enzyme solution at 20-25℃ in darkness, with gentle shaking (10 rpm), for 5 hours.
2) 10 ml W5 was added to dilute the enzymolysis products, and the products were filtered through a 75 μm nylon filter into a round bottom centrifuge tube (50 ml).
3) The products were centrifuged at 100 g and 23℃ for 3 min, and the supernatant was discarded.
4) The products were gently suspended in 10 ml W5 and placed on ice for 30 min to allow the protoplasts to settle gradually, and the supernatant was discarded.
5) The protoplasts were suspended by adding an appropriate amount of MMG and placed on ice until transformation.
6) 10-20 μg plasmid, 200 μl protoplasts (about 4×10⁵ cells) and 220 μl fresh PEG solution were added into a 2 ml centrifuge tube, mixed, and placed at room temperature in darkness for 10-20 minutes to induce transformation.
7) After the induction of transformation, 880 μl W5 solution was slowly added, the tubes were gently inverted to mix and then centrifuged horizontally at 100 g for 3 min, and the supernatant was discarded.
8) The products were resuspended in 2 ml W5 solution, transferred to a six-well plate, and cultivated at room temperature (or 25℃) in darkness. For protoplast genomic DNA extraction, the products need to be cultivated for 48 h.
Preparation and transformation of rice protoplast
1) Leaf sheaths of seedlings were used for protoplast isolation and were cut into strips about 0.5 mm wide with a sharp blade.
2) Immediately after cutting, the strips were transferred into 0.6 M mannitol solution and placed in the dark for 10 min.
3) The mannitol solution was removed by filtration, and the products were transferred into enzymolysis solution and kept under vacuum for 30 min.
4) Enzymolysis was performed for 5-6 h in darkness with gentle shaking (decolorization shaker, speed 10).
5) After completion of enzymolysis, an equal volume of W5 was added, and the mixture was shaken horizontally for 10 s to release the protoplasts.
6) Protoplasts were filtered into a 50 ml round bottom centrifuge tube with a 40μm nylon membrane and washed with W5 solution.
7) The protoplasts were pelleted by horizontal centrifugation at 250 g for 3 min, and the supernatant was discarded.
8) Protoplasts were resuspended by adding 10 ml W5 and then centrifuged at 250 g for 3 min, and the supernatant was discarded.
9) An appropriate amount of MMG solution was added to resuspend the protoplasts to a concentration of 2×10^6/ml.
Note: All the above steps were carried out at room temperature.
10) 10-20 μg plasmid, 200 μl protoplasts (about 4×10^5 cells) and 220 μl fresh PEG solution were added into a 2 ml centrifuge tube, mixed, and placed at room temperature in darkness for 10-20 minutes to induce transformation.
11) After completion of the transformation, 880 μl W5 solution was slowly added, and the tubes were gently inverted for mixing, then centrifuged horizontally at 250 g for 3 min, and the supernatant was discarded.
12) The products were resuspended in 2 ml WI solution, transferred to a six-well plate, and cultured at room temperature (or 25℃) in darkness. For protoplast genomic DNA extraction, the products need to be cultured for 48 h.
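The cell numbers quoted in the rice protocol follow from simple arithmetic: 200 μl of a suspension at 2×10^6 protoplasts/ml contains about 4×10^5 cells. The following is a minimal sketch of that calculation; the function names and the counted total of 6×10^6 cells in the usage lines are illustrative assumptions, not part of the described method.

```python
# Minimal sketch: protoplast numbers for the transformation step.
# Function names are hypothetical; density and volume come from the protocol.

def cells_in_aliquot(density_per_ml: float, volume_ul: float) -> float:
    """Number of protoplasts contained in an aliquot of volume_ul microlitres."""
    return density_per_ml * volume_ul / 1000.0


def mmg_volume_ml(total_cells: float, target_density_per_ml: float = 2e6) -> float:
    """Volume of MMG (ml) needed to resuspend a counted pellet at the target density."""
    if target_density_per_ml <= 0:
        raise ValueError("target density must be positive")
    return total_cells / target_density_per_ml


if __name__ == "__main__":
    # 200 ul at 2x10^6 cells/ml, as used for one transformation
    print(cells_in_aliquot(2e6, 200))
    # Assumed example: a pellet counted at 6x10^6 protoplasts
    print(mmg_volume_ml(6e6))
```

In practice the total cell count would come from a haemocytometer count of the washed pellet, which is an assumption about how the "appropriate amount of MMG" is determined.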
PCR/RE:
1) Plant genomic DNA was extracted.
2) Fragments containing the target sites, the length of which is between 350-1000 bp, were amplified with synthetic gene-specific primers:
10×EasyTaq Buffer 5 μl
dNTP (2.5 mM) 4 μl
Forward primer (10 μM) 2 μl
Reverse primer (10 μM) 2 μl
EasyTaq 0.5 μl
DNA 2 μl
ddH2O To 50 μl
3) The general reaction conditions were: initial denaturation at 94℃ for 5 min; 30 to 35 cycles of denaturation at 94℃ for 30 s, annealing at 58℃ for 30 s and extension at 72℃ for 30 s; final extension at 72℃ for 5 min; hold at 12℃. 5 μl of the PCR products were subjected to electrophoresis.
4) PCR products were digested with restriction endonuclease as follows:
10×FastDigest Buffer μl
Restriction enzyme 1 μl
PCR product 3-5 μl
ddH2O To 20 μl
5) Digestion was performed at 37℃ for 2-3 h. Products were analyzed by 1.2% agarose gel electrophoresis.
6) The uncut mutant bands in the PCR products were recovered and purified, and subjected to TA cloning as follows:
Figure PCTCN2018123158-appb-000001
7) The ligation was performed at 22℃ for 12 min, and the products were transformed into E. coli competent cells, which were then plated on LB plates (Amp100, IPTG, and X-gal) and incubated at 22℃ for 12-16 h. White colonies were picked for identification of positive clones and sequencing.
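The 50 μl PCR recipe in step 2 can be checked with a short calculation of the water volume and the final reagent concentrations. The following is a minimal sketch under stated assumptions: the dictionary and function names are hypothetical, and the second primer entry of the recipe is taken to be the reverse primer.

```python
# Minimal sketch: water volume and final concentrations for the 50 ul PCR mix.
# Volumes (ul) are copied from the recipe in step 2; names are hypothetical.
# The second primer entry of the recipe is assumed to be the reverse primer.

PCR_MIX_UL = {
    "10x EasyTaq Buffer": 5.0,
    "dNTP (2.5 mM)": 4.0,
    "Forward primer (10 uM)": 2.0,
    "Reverse primer (10 uM)": 2.0,
    "EasyTaq": 0.5,
    "DNA": 2.0,
}


def water_volume_ul(components: dict, final_volume_ul: float) -> float:
    """Volume of ddH2O needed to bring the mix up to final_volume_ul."""
    used = sum(components.values())
    if used > final_volume_ul:
        raise ValueError("components exceed the final reaction volume")
    return final_volume_ul - used


def final_concentration(stock: float, volume_ul: float, final_volume_ul: float) -> float:
    """Final concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / final_volume_ul


if __name__ == "__main__":
    print("ddH2O (ul):", water_volume_ul(PCR_MIX_UL, 50.0))
    print("dNTP (mM):", final_concentration(2.5, 4.0, 50.0))
    print("each primer (uM):", final_concentration(10.0, 2.0, 50.0))
```

With these volumes the mix needs 34.5 μl of water, and the final concentrations are 0.2 mM dNTP and 0.4 μM of each primer.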
Deep sequencing
Different sgRNA expression vectors were each co-transformed with the Ubi-CPF1-PBE/ABE expression vector into wheat and rice protoplasts. Protoplasts were collected 48 hours later, and DNA was extracted for deep sequencing. In the first round of PCR, the target region was amplified using site-specific primers. In the second round of PCR, forward and reverse tags were added to the ends of the PCR product for library construction. Equal amounts of the different PCR products were pooled. Samples were then sequenced using the Illumina HiSeq 4000 at the Beijing Genomics Institute.
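Once amplicon reads are available, base editing at a target site can be summarized as the fraction of reads carrying the substituted base at each editable position. The following is a minimal sketch, not the analysis pipeline used in this study: it assumes the reads have already been demultiplexed, aligned and trimmed to the target window without indels, and the function name and example sequences are made up for illustration.

```python
# Minimal sketch: per-position substitution frequency in a target window.
# Assumes reads are already aligned and trimmed to the reference window and
# contain no indels; the function name and the sequences are hypothetical.

def substitution_frequency(reference, reads, ref_base="C", edited_base="T"):
    """Return {1-based position: fraction of reads with edited_base}
    for every position where the reference carries ref_base."""
    usable = [r for r in reads if len(r) == len(reference)]
    freqs = {}
    if not usable:
        return freqs
    for index, base in enumerate(reference):
        if base != ref_base:
            continue
        edited = sum(1 for r in usable if r[index] == edited_base)
        freqs[index + 1] = edited / len(usable)
    return freqs


if __name__ == "__main__":
    reference = "TTTACCGATC"  # made-up target window
    reads = ["TTTACCGATC", "TTTATCGATC", "TTTATTGATC", "TTTACCGATC"]
    # C to T editing (PBE-type editors)
    print(substitution_frequency(reference, reads))
    # A to G editing (ABE-type editors) uses the same function
    print(substitution_frequency(reference, reads, ref_base="A", edited_base="G"))
```

Reads whose length differs from the reference window are skipped here for simplicity; a real analysis would handle indels and sequencing errors explicitly.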
Example 1. Optimization of CPF1 mediated genome cleavage activity in plants.
The editing activity reported for CPF1 in plant cells varies considerably between publications, and the cleavage activity also differs greatly between different types of CPF1.
In this example, the nuclear localization of AsCPF1, FnCPF1 and LbCPF1 was optimized, and the promoter driving the crRNA was also optimized, to improve the cleavage activity of CPF1 in plant cells. Vectors of AsCPF1, FnCPF1 and LbCPF1 carrying 1-4 NLSs were constructed, as were different vectors for generating crRNA by ribozyme processing, driven by the U3/U6 or UBI promoter (Fig. 1). The PCR/RE results show that all three CPF1s work when carrying two NLSs, and that the efficiency of LbCPF1 is high (SEQ ID NO: 5-7 are the coding sequences of ASCPF1-2NLS, FNCPF1-2NLS and LBCPF1-2NLS, respectively; the corresponding amino acid sequences can be easily obtained). For the target site of the OsPDS gene, the efficiency of 2NLS-LbCPF1 is higher than that of NLS-LbCPF1 and higher than that of other reported constructs.
Example 2. CPF1-mediated C to T mutation of plant genome (CPF1-PBE)
Based on the characteristics of CPF1's cleavage activity in plant cells, the following dCPF1-PBE systems were constructed: dAsCPF1-2NLS-PBE, dFnCPF1-2NLS-PBE and dLbCPF1-2NLS-PBE. The NLSs at the C-terminus were placed either at one end of the UGI or at both ends of the UGI. Expression of the crRNA was driven by the UBI1 promoter, and the transcript was processed by a ribozyme. The PCR/RE results indicated that editing activity was detected for dFnCPF1 and dLbCPF1, and that constructs with an NLS at only one end of the UGI had higher activity (SEQ ID NO: 8 and 9 show the coding sequences of dFNCPF1-PBE-2NLS and dLbCPF1-2NLS-PBE, respectively; the corresponding amino acid sequences can be easily obtained). In addition, an enhanced dCPF1-PBE2-X was constructed, i.e. an intron was added after the ZmUbi-1 promoter to increase the expression of dCPF1-PBE (SEQ ID NO: 10 shows the dLBCPF1-PBE-2NLS expression cassette, which comprises the sequences of the ZmUbi-1 promoter and an intron).
Example 3. CPF1-mediated A to G mutation in the plant genome (CPF1-ABE)
The following CPF1-ABE systems were constructed: dAsCPF1-1NLS-ABE, dFnCPF1-1NLS-ABE, dLbCPF1-1NLS-ABE, dAsCPF1-2NLS-ABE, dFnCPF1-2NLS-ABE and dLbCPF1-2NLS-ABE, where ABE can be ABE7.9 or ABE7.10. The crRNA was transcribed from the UBI1 promoter and processed by a ribozyme.
The results from the GFP base editing reporter system of Fig. 3E indicate that dFnCPF1-ABE7.10 (SEQ ID NO: 11), dLbCPF1-ABE7.9 and dLbCPF1-ABE7.10 (SEQ ID NO: 12) all work, and that the efficiency of ABE7.10 is higher than that of ABE7.9 (Fig. 3F).
The PCR/RE results showed that activity was detected for dLbCPF1-ABE7.10, and that 2NLS was better than 1NLS. In addition, two kinds of enhanced dCPF1-ABE2 were constructed: an intron was added after the UBI1 promoter to increase the expression of dCPF1-ABE (dCPF1-ABE2-X1) (SEQ ID NO. 13), and ABE was also placed at the C-terminus of CPF1 (dCPF1-ABE2-X2/X3) (SEQ ID NO. 14, 15). The results from the GFP base editing reporter system of Fig. 3E indicate that the editing activity of dCPF1-ABE2-X2/X3 is higher than that of dLbCPF1-ABE7.10 (Fig. 3G).
Example 4. Optimization of CPF1-mediated gene editing
To further improve the editing efficiency of CPF1, we continued to optimize the CPF1 system. First, all expression vectors for CPF1-mediated editing were driven by the BdUbi10 promoter to increase expression. In addition, the crRNA was transcribed using a type II promoter, and the crRNA array was placed in the 5'UTR or 3'UTR region of a gene to be expressed, so as to improve the editing efficiency of CPF1 by increasing mRNA expression.
Description of Relevant Sequences:
SEQ ID NO. 1 amino acid sequence of cytidine deaminase
SEQ ID NO. 2 amino acid sequence of uracil DNA glycosylase inhibitor (UGI)
SEQ ID NO. 3 amino acid sequence of WT ecTadA 
SEQ ID NO. 4 amino acid sequence of ecTadA-derived DNA-dependent adenine deaminase (ABE 7.10)
SEQ ID NO. 5 encoding sequence of ASCPF1-2NLS
SEQ ID NO. 6 encoding sequence of FNCPF1-2NLS
SEQ ID NO. 7 encoding sequence of LBCPF1-2NLS
SEQ ID NO. 8 encoding sequence of dFNCPF1-PBE-2NLS
SEQ ID NO. 9 encoding sequence of dLBCPF1-PBE-2NLS
SEQ ID NO. 10 encoding sequence of promoter+intron+dLBCPF1-PBE-2NLS
SEQ ID NO. 11 encoding sequence of dFNCPF1-ABE7.10-2NLS
SEQ ID NO. 12 encoding sequence of dLBCPF1-ABE7.10-2NLS
SEQ ID NO. 13 encoding sequence of promoter+intron+dLBCPF1-ABE2-X
SEQ ID NO. 14 encoding sequence of LBCPF1-ABE2-X2
SEQ ID NO. 15 encoding sequence of LBCPF1-ABE2-X3
SEQ ID NO. 16 PJIT163-GFP
SEQ ID NO: 17 pBUI-mGFP
SEQ ID NO: 18 amino acid sequence of ASCPF1
SEQ ID NO: 19 amino acid sequence of FNCPF1
SEQ ID NO: 20 amino acid sequence of LBCPF1
SEQ ID NO: 21 amino acid sequence of ASCPF1-2NLS
SEQ ID NO: 22 amino acid sequence of FNCPF1-2NLS
SEQ ID NO: 23 amino acid sequence of LBCPF1-2NLS
SEQ ID NO: 24 amino acid sequence of dFNCPF1-PBE-2NLS
SEQ ID NO: 25 amino acid sequence of dLBCPF1-PBE-2NLS
SEQ ID NO: 26 amino acid sequence of dFNCPF1-ABE7.10-2NLS
SEQ ID NO: 27 amino acid sequence of dLBCPF1-ABE7.10-2NLS
SEQ ID NO: 28 amino acid sequence of LBCPF1-ABE2-X2
SEQ ID NO: 29 amino acid sequence of LBCPF1-ABE2-X3
SEQ ID NO: 30 nucleotide sequence of promoter+intron

Claims (20)

  1. A system for base editing of a target sequence in the genome of an organism, comprising at least one of the following i) to v) :
    i) a base-editing fusion protein, and a guide RNA;
    ii) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and a guide RNA;
    iii) a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
    iv) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;
    v) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein and a nucleotide sequence encoding a guide RNA;
    wherein the base-editing fusion protein comprises a Cpf1 lacking DNA cleavage activity and a deaminase, the guide RNA being capable of targeting the base-editing fusion protein to a target sequence in the genome, resulting in one or more C to T or A to G substitutions in the target sequence.
  2. The system of claim 1, wherein the Cpf1 lacking DNA cleavage activity is FnCpf1 lacking DNA cleavage activity, for example the FnCpf1 lacking DNA cleavage activity comprises a D917A mutation relative to wild-type FnCpf1.
  3. The system of claim 1, wherein the Cpf1 lacking DNA cleavage activity is AsCpf1 lacking DNA cleavage activity, for example the AsCpf1 lacking DNA cleavage activity comprises a D908A mutation relative to wild-type AsCpf1.
  4. The system of claim 1, wherein the Cpf1 lacking DNA cleavage activity is LbCpf1 lacking DNA cleavage activity, for example the LbCpf1 lacking DNA cleavage activity comprises a D832A mutation relative to wild-type LbCpf1.
  5. The system of claim 1, wherein the deaminase is a cytidine deaminase, such as the apolipoprotein B mRNA editing complex (APOBEC) family deaminase.
  6. The system of claim 5, wherein the cytidine deaminase is APOBEC1 deaminase or activation-induced cytidine deaminase (AID).
  7. The system of claim 5, wherein the base-editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI).
  8. The system of claim 1, wherein the deaminase is a DNA-dependent adenine deaminase, preferably a single-stranded DNA-dependent adenine deaminase.
  9. The system of claim 8, wherein the DNA-dependent adenine deaminase is a variant of the E. coli tRNA adenine deaminase TadA (ecTadA), in particular a variant which can accept single-stranded DNA as a substrate.
  10. The system of claim 9, wherein the DNA-dependent adenine deaminase comprises, relative to wild-type ecTadA, one or more sets of mutations selected from the group consisting of:
    1) A106V and D108N;
    2) D147Y and E155V;
    3) L84F, H123Y and I156F;
    4) A142N;
    5) H36L, R51L, S146C and K157N;
    6) P48S/T/A;
    7) A142N;
    8) W23L/R;
    9) R152H/P.
  11. The system of claim 10, wherein the DNA-dependent adenine deaminase comprises the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, D147Y, E155V and R152P.
  12. The system of claim 9, wherein the N-terminus of the DNA-dependent adenine deaminase is fused with a corresponding wild-type adenine deaminase, preferably the N-terminus of the DNA-dependent adenine deaminase is fused to a corresponding wild-type adenine deaminase via a linker.
  13. The system of claim 1, wherein the deaminase is fused to the N-terminus of the Cpf1 lacking DNA cleavage activity, or the deaminase is fused to the C-terminus of the Cpf1 lacking DNA cleavage activity.
  14. The system of claim 1, wherein the deaminase and the Cpf1 lacking DNA cleavage activity are fused via a linker.
  15. The system of claim 1, wherein the base-editing fusion protein further comprises a nuclear localization sequence (NLS) at its N-terminus and/or C-terminus.
  16. The system of claim 1, wherein the nucleotide sequence encoding the base-editing fusion protein is codon optimized for the organism to be base edited.
  17. The system of claim 1, wherein the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to an expression regulatory element.
  18. The system of claim 17, wherein the regulatory element is a promoter, such as 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter or maize U3 promoter.
  19. A method of producing a genetically modified organism comprising introducing the system of any of claims 1-18 into a cell of the organism, whereby the guide RNA targets the base editing fusion protein to a target sequence in the genome of the cell, resulting in one or more C to T or A to G substitutions in the target sequence.
  20. The method of claim 19, wherein the organism is selected from a mammal such as human, mouse, rat, monkey, dog, pig, sheep, cow, cat; a poultry such as chicken, duck, goose; a plant, including a monocot and a dicot, such as rice, corn, wheat, sorghum, barley, soybeans, peanuts, Arabidopsis.
PCT/CN2018/123158 2017-12-22 2018-12-24 Base editing system and method based on cpf1 protein WO2019120310A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711403490 2017-12-22
CN201711403490.X 2017-12-22

Publications (1)

Publication Number Publication Date
WO2019120310A1 true WO2019120310A1 (en) 2019-06-27

Family

ID=66992485

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123158 WO2019120310A1 (en) 2017-12-22 2018-12-24 Base editing system and method based on cpf1 protein

Country Status (3)

Country Link
CN (1) CN109957569B (en)
AR (1) AR114014A1 (en)
WO (1) WO2019120310A1 (en)

Also Published As

Publication number Publication date
AR114014A1 (en) 2020-07-08
CN109957569A (en) 2019-07-02
CN109957569B (en) 2022-10-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18891685

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11/11/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18891685

Country of ref document: EP

Kind code of ref document: A1