WO2019120310A1

WO2019120310A1 - Base editing system and method based on cpf1 protein

Info

Publication number: WO2019120310A1
Application number: PCT/CN2018/123158
Authority: WO
Inventors: Caixia Gao; Yanpeng WANG
Original assignee: Institute Of Genetics And Developmental Biology, Chinese Academy Of Sciences
Priority date: 2017-12-22
Filing date: 2018-12-24
Publication date: 2019-06-27
Also published as: AR114014A1; CN109957569A; CN109957569B

Abstract

The present invention relates to the field of genetic engineering. In particular, it relates to a base editing system and method based on the Cpf1 protein. More particularly, it relates to a system and method for efficient base editing of a target sequence in the genome of an organism (e.g., a plant) by a guide RNA-directed Cpf1-deaminase fusion protein, and to the genetically modified organisms (e.g., plants) produced by the method and their progeny.

Description

[Title established by the ISA under Rule 37.2] BASE EDITING SYSTEM AND METHOD BASED ON CPF1 PROTEIN

Technical field

The invention relates to the field of genetic engineering. In particular, the invention relates to a base editing system and method based on the Cpf1 protein. More particularly, the present invention relates to a system and method for efficient base editing of a target sequence in the genome of an organism (e.g., a plant) by a guide RNA-directed Cpf1-deaminase fusion protein, and to the genetically modified organism (e.g., a plant) produced by the method and progeny thereof.

Technical Background

The prerequisite for efficient crop improvement is the ability to obtain new genetic mutations that can be easily introduced into modern cultivars. Genetic studies, especially genome-wide studies, have shown that single-nucleotide changes are the main cause of differences in crop traits. Single-base variations may result in amino acid substitutions, leading to the evolution of superior alleles and superior traits. Before the emergence of genome editing, Targeting Induced Local Lesions IN Genomes (TILLING) could be used to generate the mutations urgently needed for crop improvement. However, TILLING screening is time-consuming and laborious, and the point mutations identified are often limited in number and type. Genome editing techniques, particularly those based on the CRISPR/Cas9 system, enable the introduction of specific base substitutions at genomic loci through homologous recombination (HR)-mediated DNA repair. However, the successful use of this approach is currently limited, mainly because of the low frequency of HR-mediated double-strand break repair in plants. In addition, supplying a sufficient amount of DNA repair template is also a major difficulty. These problems make it challenging to achieve site-directed mutagenesis in plants efficiently and simply through HR.

In recent years, by exploiting the DNA-binding ability of Cas9 and the catalytic activity of DNA deaminases, Cas9 has been fused with deaminases to achieve precise conversion of cytosine (C) to thymine (T) and of adenine (A) to guanine (G) in target genes. Current systems for C to T conversion mainly include SpnCas9-BE3, SpnCas9-AID and fusions with Cas9 variants such as VQR-BE3, EQR-BE3 and VRER-BE3, as well as SaCas9-BE3 and its variant SaKKH-BE3. These combinations relax the PAM requirement for cytosine (C) to thymine (T) transitions and provide more varied editing windows. In addition, David Liu's lab at Harvard University recently developed, by directed evolution, an adenine deaminase that acts on ssDNA. Fusing this deaminase with Cas9 yields the Cas9-ABE system, which converts A to G in DNA and further expands the scope of base editing. Although these studies have greatly advanced single-base editing of DNA, current single-base editing techniques still have many problems. First, the PAMs recognized by Cas9 and its variants are generally G/C-rich, so the range of PAMs available to single-base editing systems still needs to be broadened. Second, because Cas9-based editing has limited specificity, the specificity of single-base editing systems still needs to be improved. Third, nCas9-BE3 and its variants, as well as nCas9-ABE, usually nick the unedited strand at the target site, and indels tend to arise alongside the intended single-base mutations during mismatch repair; there is therefore still room to improve the fidelity of single-base editing. Hence, new systems and methods for base editing of plant genomes are still needed in the art.

Description of the drawings

Figure 1. Optimization of CPF1-mediated cleavage activity in the plant genome.

Figure 2. CPF1-mediated C to T mutations in the plant genome.

Figure 3. CPF1-mediated A to G mutations in plant genomes.

Figure 4. Simultaneous base editing of multiple sites using the RNA cleavage activity of CPF1.

Description of the Invention

1. Definition

In the present invention, unless otherwise specified, the scientific and technical terms used herein have the meanings commonly understood by a person skilled in the art. Also, the terms relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology and immunology, and the laboratory procedures used herein, are terms and routine steps widely used in the corresponding fields. For example, the standard recombinant DNA and molecular cloning techniques used in the present invention are well known to those skilled in the art and are more fully described in: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter referred to as "Sambrook"). In addition, to aid understanding of the present invention, definitions and explanations of related terms are provided below.

"Cpf1 nuclease", "Cpf1 protein" and "Cpf1" are used interchangeably herein and refer to an RNA-directed nuclease including a Cpf1 protein or a fragment thereof. Cpf1 is a component of the CRISPR-Cpf1 genome editing system; under the guidance of a guide RNA (crRNA), it targets and cleaves DNA target sequences to form DNA double-strand breaks (DSBs). The Cpf1 protein contains a DNA cleavage domain and an independent RNA cleavage domain. The RNA cleavage domain of the Cpf1 protein is capable of processing pre-crRNA into mature crRNA.

"Guide RNA" and "gRNA" are used interchangeably herein. The guide RNA of the Cpf1-mediated genome editing system typically consists only of a mature crRNA molecule, wherein the crRNA comprises a sequence that is sufficiently identical to the target sequence to hybridize to the complement of the target sequence and direct the Cpf1-crRNA complex to bind the target sequence in a sequence-specific manner.
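To make the targeting step concrete, the sketch below scans one DNA strand for candidate Cpf1 target sites. It assumes the T-rich TTTV PAM commonly reported for AsCpf1 and LbCpf1, lying immediately 5' of the protospacer, and a default spacer length of 23 nt; neither value is specified in this section, and `find_cpf1_sites` is a hypothetical helper, not part of any published tool. Only the given strand is scanned; a real search would also scan the reverse complement.

```python
def find_cpf1_sites(seq: str, spacer_len: int = 23) -> list[tuple[int, str]]:
    """Return (pam_start, protospacer) pairs for TTTV PAMs on the given strand.

    Assumptions (not from the source text): PAM = TTTV (V = A, C or G),
    located immediately 5' of the protospacer; fixed spacer length.
    """
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 3):
        pam = seq[i:i + 4]
        if pam[:3] == "TTT" and pam[3] in "ACG":
            start = i + 4
            # keep only sites with a full-length protospacer after the PAM
            if start + spacer_len <= len(seq):
                sites.append((i, seq[start:start + spacer_len]))
    return sites


# Toy sequence with a shortened spacer length, for illustration only.
print(find_cpf1_sites("ATTTACGTACTTTTG", spacer_len=5))  # [(1, 'CGTAC')]
```

The TTTG at position 11 is skipped because no full-length protospacer follows it.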

"Deaminase" refers to an enzyme that catalyzes a deamination reaction. In some embodiments of the invention, the deaminase is a cytosine deaminase that catalyzes the deamination of cytidine or deoxycytidine to uridine or deoxyuridine, respectively. In some embodiments of the invention, the deaminase is an adenine deaminase that catalyzes the deamination of adenosine or deoxyadenosine (A) to inosine (I).

"Genome" as used herein encompasses not only chromosomal DNA present in the nucleus, but also organellar DNA present in the subcellular components (e.g., mitochondria, plastids) of the cell.

As used herein, "organism" includes any organism suitable for genome editing, preferably a eukaryote. Examples of organisms include, but are not limited to, mammals such as humans, mice, rats, monkeys, dogs, pigs, sheep, cattle and cats; poultry such as chickens, ducks and geese; and plants, including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut, Arabidopsis and the like.

A "genetically modified organism" or "genetically modified cell" is an organism or cell that comprises within its genome an exogenous polynucleotide or a modified gene or expression regulatory sequence. For example, the exogenous polynucleotide is stably integrated within the genome of the organism or cell such that the polynucleotide is passed on to successive generations. The exogenous polynucleotide may be integrated into the genome alone or as part of a recombinant DNA construct. A modified gene or expression regulatory sequence is one that, in the genome of the organism or cell, comprises one or more nucleotide substitutions, deletions, or additions. For example, a genetically modified organism obtained by the present invention may comprise one or more substitutions of C to T or A to G relative to the wild type (the corresponding organism without such genetic modification).

"Exogenous" in reference to a sequence means a sequence from a foreign species or, if from the same species, a sequence whose composition and/or locus has been significantly altered from its native form by deliberate human intervention.

"Polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably and are single-stranded or double-stranded RNA or DNA polymers, optionally containing synthetic, non-natural or altered nucleotide bases. Nucleotides are referred to by their single-letter names as follows: "A" means adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" means cytidine or deoxycytidine, "G" means guanosine or deoxyguanosine, "U" means uridine, "T" means deoxythymidine, "R" means purine (A or G), "Y" means pyrimidine (C or T), "K" means G or T, "H" means A or C or T, "I" means inosine, and "N" means any nucleotide.
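The one-letter codes defined above can be held in a lookup table, which makes degenerate sequence patterns easy to match. This minimal sketch covers only the DNA codes listed in this definition (A, C, G, T, R, Y, K, H, N); `IUPAC_DNA` and `matches_pattern` are illustrative names.

```python
# DNA one-letter codes from the definition above, mapped to the bases they allow.
IUPAC_DNA = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "K": {"G", "T"},
    "H": {"A", "C", "T"},
    "N": {"A", "C", "G", "T"},  # any nucleotide
}


def matches_pattern(pattern: str, seq: str) -> bool:
    """True if seq has the pattern's length and each base is allowed by its code."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC_DNA[code]
               for code, base in zip(pattern.upper(), seq.upper()))


print(matches_pattern("TTTN", "TTTA"))  # True
print(matches_pattern("RYK", "GCA"))    # False: K does not allow A
```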

"Polypeptide", "peptide" and "protein" are used interchangeably in the present invention to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of the corresponding naturally occurring amino acid(s), as well as to naturally occurring amino acid polymers. The terms "polypeptide", "peptide", "amino acid sequence" and "protein" may also include modified forms, including but not limited to glycosylation, lipid attachment, sulfation, γ-carboxylation of glutamic acid residues, and ADP-ribosylation.

As used in the present invention, "expression construct" refers to a vector such as a recombinant vector that is suitable for expression of a nucleotide sequence of interest in an organism. "Expression" refers to the production of a functional product. For example, expression of a nucleotide sequence may refer to the transcription of a nucleotide sequence (e.g., transcription to produce mRNA or functional RNA) and/or the translation of an RNA into a precursor or mature protein.

The "expression construct" of the present invention may be a linear nucleic acid fragment, a circular plasmid, a viral vector or, in some embodiments, an RNA that is capable of translation (such as mRNA).

The "expression construct" of the present invention may comprise regulatory sequences and nucleotide sequences of interest from different origins, or regulatory sequences and nucleotide sequences of interest from the same source but arranged in a manner different from that normally occurring in nature.

"Regulatory sequence" and "regulatory element" are used interchangeably to refer to a nucleotide sequence that is located upstream (5' non-coding sequence), within, or downstream (3' non-coding sequence) of a coding sequence and affects the transcription, RNA processing or stability, or translation of the associated coding sequence.

Regulatory sequences may include, but are not limited to, promoters, translation leaders, introns and polyadenylation recognition sequences.

"Promoter" refers to a nucleic acid fragment capable of controlling the transcription of another nucleic acid fragment. In some embodiments of the present invention, the promoter is a promoter capable of controlling the transcription of a gene in a cell, whether or not it is derived from the cell. The promoter may be a constitutive promoter or tissue-specific promoter or developmentally-regulated promoter or inducible promoter.

"Constitutive promoter" refers to a promoter that generally causes a gene to be expressed in most cell types under most conditions. "Tissue-specific promoter" and "tissue-preferred promoter" are used interchangeably and refer to a promoter that drives expression primarily, but not necessarily exclusively, in one tissue or organ, and possibly also in a specific cell or cell type. "Developmentally-regulated promoter" refers to a promoter whose activity is dictated by developmental events. An "inducible promoter" selectively expresses an operably linked DNA sequence in response to an endogenous or exogenous stimulus (environment, hormones, chemical signals, etc.).

As used herein, the term "operably linked" refers to the linkage of a regulatory element (e.g., but not limited to, a promoter sequence, a transcription termination sequence, etc.) to a nucleic acid sequence (e.g., a coding sequence or an open reading frame) such that transcription of the nucleotide sequence is controlled and regulated by the transcriptional regulatory element. Techniques for operably linking regulatory element regions to nucleic acid molecules are known in the art.

"Introduction" of a nucleic acid molecule (e.g., plasmid, linear nucleic acid fragment, RNA, etc.) or protein into an organism means that the nucleic acid or protein is used to transform a cell of the organism such that the nucleic acid or protein is capable of functioning in the cell. As used in the present invention, "transformation" includes both stable and transient transformations.

"Stable transformation" refers to the introduction of exogenous nucleotide sequences into the genome, resulting in the stable inheritance of foreign genes. Once stably transformed, the exogenous nucleic acid sequence is stably integrated into the genome of the organism and any of its successive generations.

"Transient transformation" refers to the introduction of a nucleic acid molecule or protein into a cell, where it performs its function without stable inheritance of the exogenous gene. In transient transformation, the exogenous nucleic acid sequences are not integrated into the genome.

As used herein, the term "plant" includes a whole plant and any descendant, cell, tissue, or part of a plant. The term "plant parts" includes any part(s) of a plant, including, for example and without limitation: seed (including mature and immature seed); a plant cutting; a plant cell; a plant cell culture; and a plant organ (e.g., pollen, embryos, flowers, fruits, shoots, leaves, roots, stems, and explants). A plant tissue or plant organ may be a seed, protoplast, callus, or any other group of plant cells that is organized into a structural or functional unit. A plant cell or tissue culture may be capable of regenerating a plant having the physiological and morphological characteristics of the plant from which the cell or tissue was obtained, and of regenerating a plant having substantially the same genotype as the plant. In contrast, some plant cells are not capable of being regenerated to produce plants. Regenerable cells in a plant cell or tissue culture may be embryos, protoplasts, meristematic cells, callus, pollen, leaves, anthers, roots, root tips, silk, flowers, kernels, ears, cobs, husks, or stalks.

Plant parts include harvestable parts and parts useful for propagation of progeny plants. Plant parts useful for propagation include, for example and without limitation: seed; fruit; a cutting; a seedling; a tuber; and a rootstock. A harvestable part of a plant may be any useful part of a plant, including, for example and without limitation: flower; pollen; seedling; tuber; leaf; stem; fruit; seed; and root.

A plant cell is the structural and physiological unit of the plant, and includes protoplast cells without a cell wall and plant cells with a cell wall. A plant cell may be in the form of an isolated single cell, or an aggregate of cells (e.g., a friable callus and a cultured cell), and may be part of a higher organized unit (e.g., a plant tissue, plant organ, and plant). Thus, a plant cell may be a protoplast, a gamete producing cell, or a cell or collection of cells that can regenerate into a whole plant. As such, a seed, which comprises multiple plant cells and is capable of regenerating into a whole plant, is considered a "plant part" in embodiments herein.

The term "protoplast", as used herein, refers to a plant cell whose cell wall has been completely or partially removed, leaving its lipid bilayer membrane exposed. Typically, a protoplast is an isolated plant cell without a cell wall that has the potential to regenerate into a cell culture or a whole plant.

“Progeny” of a plant comprises any subsequent generation of the plant.

“Trait” refers to the physiological, morphological, biochemical, or physical characteristics of a plant or a particular plant material or cell. In some embodiments, the characteristic is visible to the human eye, such as seed or plant size, or can be measured by biochemical techniques, such as detecting the protein, starch, or oil content of seed or leaves, or by observation of a metabolic or physiological process, e.g. by measuring tolerance to water deprivation or particular salt or sugar concentrations, or by the observation of the expression level of a gene or genes, or by agricultural observations such as osmotic stress tolerance or yield. In some embodiments, trait also includes ploidy of a plant, such as haploidy which is important for plant breeding. In some embodiments, trait also includes resistance of a plant to herbicides.

"Agronomic trait" is a measurable parameter including but not limited to leaf greenness, yield, growth rate, biomass, fresh weight at maturation, dry weight at maturation, fruit yield, seed yield, total plant nitrogen content, fruit nitrogen content, seed nitrogen content, nitrogen content in a vegetative tissue, total plant free amino acid content, fruit free amino acid content, seed free amino acid content, free amino acid content in a vegetative tissue, total plant protein content, fruit protein content, seed protein content, protein content in a vegetative tissue, drought tolerance, nitrogen uptake, root lodging, harvest index, stalk lodging, plant height, ear height, ear length, disease resistance, cold resistance, salt tolerance, and tiller number.

2. Base editing system based on Cpf1 protein

The present invention provides a system for base editing of a target sequence in the genome of an organism, comprising at least one of the following i) to v):

i) a base-editing fusion protein, and a guide RNA;

ii) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and a guide RNA;

iii) a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

iv) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

v) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein and a nucleotide sequence encoding a guide RNA;

wherein the base-editing fusion protein comprises a Cpf1 lacking DNA cleavage activity, and a deaminase, the guide RNA being capable of targeting the base-editing fusion protein to a target sequence in the genome, resulting in one or more C to T or A to G substitution(s) in the target sequence.
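The five alternatives i) to v) differ only in whether each component is supplied directly or encoded by an expression construct, and whether a single construct carries both. The following sketch records that as a small data structure; the class, enum and field names are hypothetical and serve only to make the combinations explicit.

```python
from dataclasses import dataclass
from enum import Enum


class Supply(Enum):
    DIRECT = "supplied directly as protein or RNA"
    CONSTRUCT = "encoded by an expression construct"


@dataclass(frozen=True)
class EditingSystemConfig:
    label: str
    fusion_protein: Supply
    guide_rna: Supply
    single_construct: bool = False  # True only when one construct encodes both

    def __post_init__(self) -> None:
        both_encoded = (self.fusion_protein is Supply.CONSTRUCT
                        and self.guide_rna is Supply.CONSTRUCT)
        if self.single_construct and not both_encoded:
            raise ValueError("a shared construct must encode both components")


# Alternatives i) to v) as listed above.
CONFIGURATIONS = [
    EditingSystemConfig("i", Supply.DIRECT, Supply.DIRECT),
    EditingSystemConfig("ii", Supply.CONSTRUCT, Supply.DIRECT),
    EditingSystemConfig("iii", Supply.DIRECT, Supply.CONSTRUCT),
    EditingSystemConfig("iv", Supply.CONSTRUCT, Supply.CONSTRUCT),
    EditingSystemConfig("v", Supply.CONSTRUCT, Supply.CONSTRUCT, single_construct=True),
]

for cfg in CONFIGURATIONS:
    print(cfg.label, cfg.fusion_protein.name, cfg.guide_rna.name, cfg.single_construct)
```

Rejecting a "single construct" that does not encode both components keeps the model from describing a combination the list does not contain.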

Cpf1 contains a DNA cleavage domain (RuvC), which can be mutated to abolish the DNA cleavage activity of Cpf1, yielding a "Cpf1 lacking DNA cleavage activity". The Cpf1 lacking DNA cleavage activity still retains gRNA-directed DNA binding ability. Thus, in principle, when fused to an additional protein, the Cpf1 lacking DNA cleavage activity can target that protein to almost any DNA sequence simply through co-expression with a suitable guide RNA.

The Cpf1 lacking DNA cleavage activity of the present invention may be derived from Cpf1 of different species, for example, Cpf1 proteins derived from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006, designated FnCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 19), AsCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 18) and LbCpf1 (amino acid sequence of the wild type is set forth in SEQ ID NO: 20), respectively.

In some embodiments, the Cpf1 lacking DNA cleavage activity is FnCpf1 lacking DNA cleavage activity. In some embodiments, the FnCpf1 lacking DNA cleavage activity comprises a D917A mutation relative to wild-type FnCpf1.

In some embodiments, the Cpf1 lacking DNA cleavage activity is AsCpf1 lacking DNA cleavage activity. In some embodiments, the AsCpf1 lacking DNA cleavage activity comprises a D908A mutation relative to wild-type AsCpf1.

In some preferred embodiments, the Cpf1 lacking DNA cleavage activity is LbCpf1 lacking DNA cleavage activity. In some embodiments, the LbCpf1 lacking DNA cleavage activity comprises a D832A mutation relative to wild-type LbCpf1.

In some embodiments, the Cpf1 lacking DNA cleavage activity retains its RNA cleavage activity, so that pre-crRNA can be processed into mature crRNA. Thus, in some embodiments, an expression construct comprising a nucleotide sequence encoding a guide RNA in a system of the invention may comprise a sequence encoding a plurality of different guide RNA (crRNA) precursors in tandem; upon transcription, these may be processed by the Cpf1 lacking DNA cleavage activity into a plurality of different guide RNAs (crRNAs) that simultaneously target a plurality of different target sequences.
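A simplified model of this processing step: the transcript of a tandem array is cut at each direct repeat, giving one mature crRNA per spacer. The sketch assumes each mature crRNA is the repeat followed by its spacer and ignores the exact cleavage position within the repeat; the repeat string is a made-up placeholder, not the repeat of any particular Cpf1 ortholog, and the helper name is hypothetical.

```python
PLACEHOLDER_REPEAT = "GAUCGAUC"  # hypothetical direct repeat, for illustration only


def process_pre_crrna(pre_crrna: str, repeat: str = PLACEHOLDER_REPEAT) -> list[str]:
    """Split a repeat-spacer-repeat-spacer... transcript into mature crRNAs.

    Simplification: each mature crRNA is modeled as repeat + spacer; any
    leader before the first repeat and any empty trailing segment are dropped.
    """
    segments = pre_crrna.split(repeat)
    return [repeat + spacer for spacer in segments[1:] if spacer]


# Two toy spacers in a tandem array, closed by a final repeat.
spacers = ["ACGUACGUACGUACGUACGU", "UUGGCCAAUUGGCCAAUUGG"]
array = "".join(PLACEHOLDER_REPEAT + s for s in spacers) + PLACEHOLDER_REPEAT
for crrna in process_pre_crrna(array):
    print(crrna)
```

Each spacer ends up in its own guide, which is what allows several target sequences to be addressed from a single transcript.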

In some embodiments of the invention, the deaminase in the fusion protein is a cytidine deaminase, such as a deaminase of the apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC) family.

Cytidine deaminase catalyzes the deamination of cytidine (C) in DNA to form uracil (U). The present inventors have surprisingly found that, under the guidance of a guide RNA, a fusion of a Cpf1 lacking DNA cleavage activity and a cytidine deaminase can target a target sequence in the genome. Because the Cpf1 lacks DNA cleavage activity, the DNA double strand is not cleaved; instead, the cytidine deaminase in the fusion protein deaminates cytidines in the single-stranded DNA produced during formation of the Cpf1-guide RNA-DNA complex to U, and C to T substitution is then achieved through base mismatch repair.
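The two-step chemistry described above (C deaminated to U, then U read as T) can be traced with a toy model. The sketch assumes a hypothetical editing window given as 1-based inclusive positions (this section does not define one), treats the displaced single strand as the only deaminase substrate, and stands in for repair and replication by simply copying the strand twice. Complementary strands are written position-aligned rather than reverse-complemented to keep the example short.

```python
# Pairing rules used when a strand is copied; U is read like T and pairs with A.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate_cytidines(strand: str, window: tuple[int, int]) -> str:
    """Convert every C inside a 1-based, inclusive window to U (deamination step)."""
    start, end = window
    return "".join(
        "U" if start <= pos <= end and base == "C" else base
        for pos, base in enumerate(strand, start=1)
    )


def copy_strand(template: str) -> str:
    """Return the position-aligned complementary strand (not reverse-complemented)."""
    return "".join(PAIR[base] for base in template)


original = "GACCTGACGT"
# Hypothetical window over positions 3-6; the C at position 8 lies outside it.
intermediate = deaminate_cytidines(original, (3, 6))
# Two rounds of copying: U templates A, then A templates T.
edited = copy_strand(copy_strand(intermediate))
print(intermediate)  # GAUUTGACGT
print(edited)        # GATTTGACGT
```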

The cytidine deaminase of the present invention is particularly a cytidine deaminase which can accept single-stranded DNA as a substrate. Examples of cytidine deaminase useful in the present invention include, but are not limited to, APOBEC1 deaminase, activation-induced cytidine deaminase (AID) , APOBEC3G or CDA1. In some embodiments of the invention, the cytidine deaminase comprises the amino acid sequence set forth in SEQ ID NO: 1.

Where the deaminase in the fusion protein is a cytidine deaminase, the base editing system of the present invention can mutate one or more C(s) in the genomic target sequence to T(s), and is thus also referred to as the Cpf1-PBE system.

In cells, uracil DNA glycosylase catalyzes the removal of U from DNA and initiates base excision repair (BER), resulting in the repair of U:G to C:G. Thus, without being bound by any theory, the inclusion of a uracil DNA glycosylase inhibitor in a base-editing fusion protein of the present invention or a system of the present invention will increase the efficiency of base editing.

Thus, in some embodiments of the invention involving a Cpf1-PBE system, the base-editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI) . In some embodiments, the uracil DNA glycosylase inhibitor comprises the amino acid sequence set forth in SEQ ID NO: 2.

In some embodiments of the invention, the deaminase is an adenine deaminase.

The naturally occurring adenine deaminase uses RNA as a substrate and converts adenosine in single-stranded RNA into inosine (I) by deamination. Recently, a DNA-dependent adenine deaminase that uses single-stranded DNA as a substrate and converts deoxyadenosine in single-stranded DNA to inosine (I) was obtained by directed evolution of the E. coli tRNA adenine deaminase TadA. See Nicole M. Gaudelli et al., doi: 10.1038/nature24644, 2017.

The present inventors have surprisingly found that when a Cpf1 lacking DNA cleavage activity is fused to a DNA-dependent adenine deaminase, the fusion protein can, under the guidance of a guide RNA, target a target sequence in the plant genome. Because the Cpf1 lacks DNA cleavage activity, the DNA double strand is not cleaved, and the DNA-dependent adenine deaminase in the fusion protein deaminates adenosines in the single-stranded DNA produced during formation of the Cpf1-guide RNA-DNA complex to inosine (I). Since DNA polymerase treats inosine (I) as guanine (G), A to G substitution can be achieved through base mismatch repair. Therefore, when the deaminase in the fusion protein is a DNA-dependent adenine deaminase, the base editing system of the present invention can mutate one or more A(s) in the genomic target sequence to G(s), and is thus also called the Cpf1-ABE system.
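Because inosine is read as G, each edited A in the target appears as a G in the product. If, as a simplifying assumption, every A inside a hypothetical editing window is independently either edited or left unchanged, the possible alleles at a site can be enumerated as below. The window and the helper name are illustrative and are not taken from the source.

```python
from itertools import product


def possible_a_to_g_alleles(seq: str, window: tuple[int, int]) -> list[str]:
    """Enumerate alleles when each A in a 1-based, inclusive window is
    independently left unchanged or converted to G (inosine read as G)."""
    start, end = window
    a_positions = [i for i in range(start - 1, end) if seq[i] == "A"]
    alleles = []
    for choice in product("AG", repeat=len(a_positions)):
        bases = list(seq)
        for pos, base in zip(a_positions, choice):
            bases[pos] = base
        alleles.append("".join(bases))
    return alleles


# Hypothetical window over positions 2-3 of a toy target.
print(possible_a_to_g_alleles("TAAGT", (2, 3)))
# ['TAAGT', 'TAGGT', 'TGAGT', 'TGGGT']
```

With k editable As the list has 2**k entries, including the unedited sequence.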

In some embodiments of the present invention, the DNA-dependent adenine deaminase is a variant of the E. coli tRNA adenine deaminase TadA (ecTadA) , in particular a variant which can accept single-stranded DNA as a substrate. The variant comprises, relative to wild-type ecTadA, one or more sets of mutations selected from the group consisting of:

1) A106V and D108N;

2) D147Y and E155V;

3) L84F, H123Y and I156F;

4) A142N;

5) H36L, R51L, S146C and K157N;

6) P48S/T/A;

7) A142N;

8) W23L/R;

9) R152H/P.

In a specific embodiment of the present invention, the DNA-dependent adenine deaminase (ABE version 7.9) comprises the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, A142N, D147Y, E155V and R152P.

In a specific embodiment of the present invention, the DNA-dependent adenine deaminase (ABE version 7.10) comprises the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, D147Y, E155V and R152P.

The amino acid sequence of wild-type ecTadA is shown below: MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNRPIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIHSRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALLSDFFRMRRQEIKAQKKAQSSTD (SEQ ID NO: 3). In some embodiments, the initiating methionine may be absent.

The amino acid sequence of a preferred ecTadA-derived DNA-dependent adenine deaminase (ABE version 7.10) is shown below: MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNRAIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIHSRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALLCYFFRMPRQVFNAQKKAQSSTD (SEQ ID NO: 4). In some embodiments, the initiating methionine may be absent.
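The mutation lists for ABE versions 7.9 and 7.10 can be checked mechanically against SEQ ID NO: 3 and SEQ ID NO: 4. This minimal sketch assumes standard one-letter notation (wild-type residue, position, new residue) with positions numbered from the initiating methionine as residue 1; `apply_mutations` is an illustrative helper that refuses any mutation whose wild-type residue does not match the reference.

```python
import re

WT_ECTADA = (  # SEQ ID NO: 3
    "MSEVEFSHEYWMRHALTLAKRAWDEREVPVGAVLVHNNRVIGEGWNR"
    "PIGRHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTLEPCVMCAGAMIH"
    "SRIGRVVFGARDAKTGAAGSLMDVLHHPGMNHRVEITEGILADECAALL"
    "SDFFRMRRQEIKAQKKAQSSTD"
)
ABE_7_10 = (  # SEQ ID NO: 4
    "MSEVEFSHEYWMRHALTLAKRARDEREVPVGAVLVLNNRVIGEGWNR"
    "AIGLHDPTAHAEIMALRQGGLVMQNYRLIDATLYVTFEPCVMCAGAMIH"
    "SRIGRVVFGVRNAKTGAAGSLMDVLHYPGMNHRVEITEGILADECAALL"
    "CYFFRMPRQVFNAQKKAQSSTD"
)
MUTATIONS_7_10 = ("W23R H36L R51L S146C K157N A106V D108N P48A "
                  "L84F H123Y I156F D147Y E155V R152P").split()
MUTATIONS_7_9 = MUTATIONS_7_10 + ["A142N"]

_MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_mutations(seq: str, mutations: list[str]) -> str:
    """Apply point mutations such as 'A106V', checking each wild-type residue."""
    residues = list(seq)
    for mutation in mutations:
        match = _MUTATION.fullmatch(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues) or residues[pos - 1] != wt:
            raise ValueError(f"{mutation} does not match the reference sequence")
        residues[pos - 1] = new
    return "".join(residues)


print(apply_mutations(WT_ECTADA, MUTATIONS_7_10) == ABE_7_10)  # True
```

Applying the fourteen ABE 7.10 mutations to SEQ ID NO: 3 reproduces SEQ ID NO: 4 exactly; ABE 7.9 additionally carries A142N.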

In some embodiments of the present invention, the deaminase is fused to the N-terminus of the Cpf1 lacking DNA cleavage activity. In some embodiments, the deaminase is fused to the C-terminus of the Cpf1 lacking DNA cleavage activity.

In some preferred embodiments, the N-terminus of the DNA-dependent adenine deaminase is fused with a corresponding wild-type adenine deaminase. It is expected that the formation of heterodimers by DNA-dependent adenine deaminase and wild-type adenine deaminase can significantly increase the A to G editing activity of fusion proteins.

In some embodiments of the present invention, the deaminase and the Cpf1 lacking DNA cleavage activity are fused via a linker. The linker may be 1-50 (for example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 20-25 or 25-50) amino acids in length, or may be a longer non-functional amino acid sequence without secondary or higher-order structure. For example, the linker can be a flexible linker such as GGGGS, GS, GAP, (GGGGS)x3, GGS, (GGS)x7, and the like. In some specific embodiments, the linker is an XTEN linker. In some embodiments, the linker is 32 amino acids in length. In some specific embodiments, the amino acid sequence of the linker is SGGSSGGSSGSETPGTSESATPESSGGSSGGS.
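The domain order of a fusion (deaminase N-terminal or C-terminal to the Cpf1 lacking DNA cleavage activity, joined by one of the linkers above, optionally flanked by NLSs) can be sketched as simple string assembly. The bracketed domain strings are placeholders standing in for the real sequences given by the SEQ ID NOs, the NLS sequences are the ones named later in this section, and `assemble_fusion` is a hypothetical helper.

```python
# Linker sequences named in the text above.
LINKERS = {
    "GGGGS": "GGGGS",
    "(GGGGS)x3": "GGGGS" * 3,
    "(GGS)x7": "GGS" * 7,
    "XTEN": "SGGSSGGSSGSETPGTSESATPESSGGSSGGS",
}


def assemble_fusion(deaminase: str, dead_cpf1: str, linker: str,
                    deaminase_first: bool = True,
                    n_nls: str = "", c_nls: str = "") -> str:
    """Concatenate domains N->C: [N-NLS] + deaminase/dCpf1 joined by linker + [C-NLS]."""
    core = (deaminase + linker + dead_cpf1 if deaminase_first
            else dead_cpf1 + linker + deaminase)
    return n_nls + core + c_nls


# Placeholder domains; real sequences would come from the SEQ ID NOs.
fusion = assemble_fusion("[DEAMINASE]", "[dCpf1]", LINKERS["XTEN"],
                         n_nls="PKKKRKV", c_nls="SGGSPKKKRKV")
print(len(LINKERS["XTEN"]))  # 32
print(fusion)
```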

In some embodiments of the present invention, the base-editing fusion proteins of the present invention further comprise a nuclear localization sequence (NLS). In general, the one or more NLSs in the base-editing fusion protein should be strong enough to drive accumulation of the base-editing fusion protein in the nucleus of a plant cell in an amount sufficient for base editing. In general, the strength of nuclear localization activity is determined by the number of NLSs in the base-editing fusion protein, their locations, the specific NLS(s) used, or a combination of these factors.

In some embodiments of the present invention, the NLS of the base-editing fusion protein of the present invention may be located at the N-terminus and/or C-terminus. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the N-terminus. In some embodiments, the base-editing fusion protein comprises about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more NLSs at or near the C-terminus. In some embodiments, the base-editing fusion protein comprises a combination of these, such as one or more NLSs at the N-terminus and one or more NLSs at the C-terminus. When there is more than one NLS, each can be selected independently of the others. In some preferred embodiments of the present invention, the base-editing fusion protein comprises two NLSs, for example, located at the N-terminus and the C-terminus, respectively.

In general, an NLS consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface, but other types of NLS are also known. Non-limiting examples of NLS include: KKRKV (nucleotide sequence 5'-AAGAAGAGAAAGGTC-3'), PKKKRKV (nucleotide sequence 5'-CCCAAGAAGAAGAGGAAGGTG-3' or 5'-CCAAAGAAGAAGAGGAAGGTT-3'), or SGGSPKKKRKV (nucleotide sequence 5'-TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG-3').
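The listed nucleotide sequences can be checked against their NLS peptides by translating them with the standard genetic code. In this sketch the codon table is generated by enumerating codons in TCAG order, a common compact encoding of the standard code; `translate` is an illustrative helper, not a library function.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks stop codons.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    cds = cds.replace(" ", "").upper()
    if len(cds) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# NLS coding sequences and peptides as listed in the text above.
NLS_CODING = {
    "AAGAAGAGAAAGGTC": "KKRKV",
    "CCCAAGAAGAAGAGGAAGGTG": "PKKKRKV",
    "CCAAAGAAGAAGAGGAAGGTT": "PKKKRKV",
    "TCGGGGGGGAGCCCAAAGAAGAAGCGGAAGGTG": "SGGSPKKKRKV",
}

for dna, peptide in NLS_CODING.items():
    print(dna, translate(dna), translate(dna) == peptide)
```

All four sequences translate to the stated peptides; the two PKKKRKV versions differ only in synonymous codons for Pro and Val.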

In some embodiments of the present invention, the N-terminus of the base-editing fusion protein comprises an NLS with the amino acid sequence PKKKRKV. In some embodiments of the present invention, the C-terminus of the base-editing fusion protein comprises an NLS with the amino acid sequence SGGSPKKKRKV or KRPAATKKAGQAKKKK.

Furthermore, depending on the location of the DNA to be edited, the base-editing fusion proteins of the present invention may also include other localization sequences, such as cytoplasmic localization sequences, chloroplast localization sequences, mitochondrial localization sequences, and the like.

In some embodiments of the present invention involving the Cpf1-PBE system, the base-editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI) and two NLSs located on the N-terminal and C-terminal sides of the UGI, respectively. In some preferred embodiments, the base-editing fusion protein of the invention comprises an amino acid sequence selected from SEQ ID NOs: 24-29.

To obtain efficient expression, in some embodiments of the present invention, the nucleotide sequence encoding the base-editing fusion protein is codon optimized for the biological species to be base edited.

Codon optimization refers to modifying a nucleic acid sequence to enhance its expression in a host cell of interest by replacing at least one codon of the native sequence (e.g., about or more than about 1, 2, 3, 4, 5, 10, 15, 20, 25, 50 or more codons) with a codon that is used more frequently, or most frequently, in the genes of that host cell, while keeping the native amino acid sequence unchanged. Different species show specific preferences for certain codons of a particular amino acid. Codon preference (the difference in codon usage between organisms) is often correlated with the efficiency of messenger RNA (mRNA) translation, which is believed to depend on the nature of the codons being translated and the availability of particular transfer RNA (tRNA) molecules. The predominance of selected tRNAs in a cell generally reflects the codons used most frequently in peptide synthesis. Genes can therefore be tailored, by codon optimization, for optimal expression in a given organism. Codon usage tables are readily available, for example in the Codon Usage Database at www.kazusa.or.jp/codon/, and these tables can be adapted in a number of ways. See Nakamura, Y. et al. "Codon usage tabulated from the international DNA sequence databases: status for the year 2000." Nucl. Acids Res. 28: 292 (2000).
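The "most frequently used codon" strategy described above can be sketched as a simple back-translation: for each residue, pick the synonymous codon with the highest usage value for the target species. This is a minimal illustration, not the optimization procedure actually used for the sequences of this invention. The usage numbers below are hypothetical placeholders, not values from the Codon Usage Database, and the sketch ignores all other sequence features.

```python
from collections import defaultdict

# Standard genetic code (T, C, A, G ordering).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group synonymous codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def optimize(protein: str, usage: dict) -> str:
    """Back-translate `protein`, choosing the most frequent synonymous codon.

    Codons absent from `usage` count as 0; ties go to the alphabetically
    first codon so the result is deterministic.
    """
    out = []
    for aa in protein:
        if aa not in SYNONYMS:
            raise ValueError(f"unknown amino acid: {aa}")
        out.append(max(sorted(SYNONYMS[aa]), key=lambda c: usage.get(c, 0.0)))
    return "".join(out)


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical usage values (per thousand codons) for a few codons only.
toy_usage = {"AAG": 30.0, "AAA": 12.0, "AGG": 15.0, "CGC": 9.0,
             "GTG": 25.0, "CCG": 14.0}

cds = optimize("PKKKRKV", toy_usage)
print(cds)  # CCGAAGAAGAAGAGGAAGGTG
```

Because only synonymous codons are exchanged, translating the optimized sequence always returns the original protein, which is the defining constraint of codon optimization.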

In some specific embodiments, the base editing fusion protein of the invention is encoded by the nucleotide sequence selected from SEQ ID NO: 8-9, 11-12 or 14-15.

In some embodiments of the present invention, the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to an expression regulatory element such as a promoter.

Examples of promoters that can be used in the present invention include, but are not limited to, the polymerase (pol) I, pol II or pol III promoters. Examples of the pol I promoter include the chicken (Gallus) RNA pol I promoter. Examples of the pol II promoters include, but are not limited to, the immediate-early cytomegalovirus (CMV) promoter, the Rous sarcoma virus long terminal repeat (RSV-LTR) promoter, and the immediate-early simian virus 40 (SV40) promoter. Examples of pol III promoters include the U6 and H1 promoters. An inducible promoter such as the metallothionein promoter can be used. Other examples of promoters include the T7 phage promoter, the T3 phage promoter, the β-galactosidase promoter, the Sp6 phage promoter, and the like. Promoters that can be used in plants include, but are not limited to, the cauliflower mosaic virus 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter, maize U3 promoter and rice actin promoter.

Preferably, the guide RNA (crRNA) is expressed from the Ubi-1 promoter and processed into its mature form by a ribozyme such as the HDV ribozyme.

In one embodiment, the addition of an intron after the Ubi-1 promoter enhances expression of the protein or RNA of interest.

In some specific embodiments, the expression construct for expressing the base editing fusion protein of the invention comprises the expression cassette of SEQ ID NO: 10 or 13. Alternatively, the expression construct comprises the expression regulatory sequence set forth in SEQ ID NO: 30.

3. The method of producing genetically modified organisms

In another aspect, the present invention provides a method of producing a genetically modified organism (e.g. a plant) , comprising introducing a system of the present invention for base editing of a target sequence in the genome of an organism into a cell of the organism, whereby the guide RNA targets the base-editing fusion protein to a target sequence in the genome, resulting in one or more C to T or one or more A to G substitutions in the target sequence.

The design of target sequences, or of crRNA coding sequences, that can be recognized and targeted by the complex of the Cpf1 protein and the guide RNA (i.e., crRNA) is described, for example, in Zhang et al., Cell 163, 1–13, October 22, 2015. In general, the target sequence targeted by the genome editing system of the present invention needs to include, at its 5' end, a protospacer adjacent motif (PAM) 5'-TTTN or 5'-YTN, wherein N is independently selected from A, G, C and T, and Y is selected from C and T.

For example, in some embodiments of the present invention, the target sequence has the structure 5'-TTTN-Nx-3' or 5'-YTN-Nx-3', wherein N is independently selected from A, G, C and T; Y is selected from C and T; X is an integer with 15 ≤ X ≤ 35; and Nx represents X consecutive nucleotides.
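The 5'-PAM-Nx-3' structure above can be turned directly into a pattern search. The sketch below scans one strand of a DNA sequence for every TTTN (or YTN) PAM followed by X nucleotides, using a regular-expression lookahead so that overlapping sites are all reported. The function name and the default X = 20 are illustrative assumptions; a full design tool would also scan the reverse complement.

```python
import re

# IUPAC codes used in the PAM definitions in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "Y": "[CT]"}


def find_targets(seq: str, pam: str = "TTTN", x: int = 20):
    """Return (start, pam, protospacer) for each 5'-PAM-Nx-3' site on this strand.

    `start` is the 0-based index of the first PAM base. The lookahead lets
    re.finditer report overlapping candidate sites.
    """
    if not 15 <= x <= 35:
        raise ValueError("X must satisfy 15 <= X <= 35")
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    pattern = re.compile(f"(?=({pam_regex})([ACGT]{{{x}}}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


# Toy sequence (hypothetical): one TTTA PAM followed by a 20-nt protospacer.
example = "G" + "TTTA" + "ACGTACGTACGTACGTACGT" + "C"
print(find_targets(example))  # [(1, 'TTTA', 'ACGTACGTACGTACGTACGT')]
print(find_targets(example, pam="YTN"))
```

The relaxed 5'-YTN PAM matches more positions than 5'-TTTN (here at positions 1 and 2), which is why it expands the range of targetable sites.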

In the present invention, the target sequence to be modified may be located anywhere in the genome, for example in a functional gene such as a protein-encoding gene, or in a gene expression regulatory region such as a promoter or enhancer region, so that the function or the expression of the gene can be modified.

A to G or C to T base editing in the target sequence of a cell can be detected by the T7 endonuclease I (T7EI) assay, the PCR/restriction enzyme (PCR/RE) assay, or sequencing.
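In a PCR/RE assay, the target is chosen so that the intended edit destroys a restriction site; wild-type amplicons are cut while edited amplicons remain uncut. The sketch below predicts fragment lengths for a toy amplicon carrying an EcoRI site (G^AATTC), before and after a C to T change inside that site. The amplicon is hypothetical, and searching one strand is sufficient only because EcoRI's site is palindromic.

```python
def digest_fragments(amplicon: str, site: str, cut_offset: int):
    """Fragment lengths after cutting `amplicon` at every occurrence of `site`.

    `cut_offset` is the top-strand cut position within the site
    (1 for EcoRI, G^AATTC).
    """
    amplicon, site = amplicon.upper(), site.upper()
    cuts, start = [], amplicon.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = amplicon.find(site, start + 1)
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 300-bp amplicons: wild type versus a C->T edit in the site.
wt = "A" * 100 + "GAATTC" + "T" * 194
edited = "A" * 100 + "GAATTT" + "T" * 194

print(digest_fragments(wt, "GAATTC", 1))      # [101, 199]
print(digest_fragments(edited, "GAATTC", 1))  # [300]
```

The undigested 300-bp band is the signature of an edited allele, which is the band recovered for cloning and sequencing in the PCR/RE protocol.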

In the methods of the present invention, the base editing system can be introduced into cells by a variety of methods well known to those skilled in the art. Methods that can be used to introduce a genome editing system of the present invention into a cell include, but are not limited to, calcium phosphate transfection, protoplast fusion, electroporation, lipofection, microinjection, viral infection (e.g., baculovirus, vaccinia virus, adenovirus, adeno-associated virus, lentivirus and other viruses) , gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation.

A cell that can be edited by the method of the present invention can be a cell of mammals such as human, mouse, rat, monkey, dog, pig, sheep, cattle, cat; a cell of poultry such as chicken, duck, goose; a cell of plants including monocots and dicots, such as rice, corn, wheat, sorghum, barley, soybean, peanut and Arabidopsis thaliana and so on.

The methods of the invention are particularly suitable for producing genetically modified plants, such as crop plants. In the method of producing a genetically modified plant of the present invention, the base editing system can be introduced into a plant by various methods well known to those skilled in the art. Methods that can be used to introduce a base editing system of the invention into a plant include, but are not limited to, gene gun method, PEG-mediated protoplast transformation, Agrobacterium-mediated transformation, plant virus-mediated transformation, pollen tube pathway and ovary injection method.

In the method for producing a genetically modified plant of the present invention, the target sequence can be modified simply by introducing, or producing, the base-editing fusion protein and the guide RNA in the plant cell, and the modification can be stably inherited without any need to stably transform the base editing system into the plant. This avoids potential off-target effects of a stably present base editing system and also avoids integration of exogenous nucleotide sequences into the plant genome, thereby providing greater biosafety.

In some preferred embodiments, the introduction is carried out in the absence of selection pressure to avoid integration of the exogenous nucleotide sequence into the plant genome.

In some embodiments, the introduction comprises transforming the base editing system of the present invention into an isolated plant cell or tissue and then regenerating the transformed plant cell or tissue into an intact plant. Preferably, the regeneration is carried out in the absence of selection pressure, i.e., no selection agent for the selection gene on the expression vector is used during tissue culture. Avoiding the use of a selection agent can increase the regeneration efficiency of the plant and allows modified plants free of exogenous nucleotide sequences to be obtained.

In other embodiments, the base editing system of the present invention can be transformed into specific parts of an intact plant, such as leaves, shoot tips, pollen tubes, young ears or hypocotyls. This is particularly suitable for the transformation of plants that are difficult to regenerate in tissue culture.

In some embodiments of the invention, the in vitro expressed protein and/or the in vitro transcribed RNA molecule are directly transformed into the plant. The protein and/or RNA molecule is capable of performing base editing in plant cells and is subsequently degraded by the cell, avoiding integration of the exogenous nucleotide sequence in the plant genome.

Plants that can be base-edited by the methods of the invention include monocots and dicots. For example, the plant may be a crop plant such as wheat, rice, corn, soybean, sunflower, sorghum, canola, alfalfa, cotton, barley, millet, sugar cane, tomato, tobacco, cassava (tapioca) or potato.

In some embodiments of the present invention, the target sequence is associated with a plant trait, such as an agronomic trait, whereby the base editing results in a plant having altered traits relative to a wild type plant.

In the present invention, the target sequence to be modified may be located anywhere in the genome, for example in a functional gene such as a protein-encoding gene, or in a gene expression regulatory region such as a promoter or enhancer region, so that the function or the expression of the gene can be modified. Accordingly, in some embodiments of the present invention, the C to T or A to G substitution results in an amino acid substitution in the target protein or in truncation of the target protein (by generating a stop codon). In other embodiments of the present invention, the C to T or A to G substitution changes the expression of the target gene.
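Which codons can be converted to a premature stop codon by a single base edit follows directly from the standard genetic code. The sketch below enumerates them. A C to T edit on the template strand appears as G to A in the codon read on the coding strand, and likewise A to G on the template strand appears as T to C. The function names are illustrative.

```python
# Standard genetic code (T, C, A, G ordering).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def single_base_variants(codon: str, ref: str, alt: str):
    """Yield each codon obtained by changing exactly one `ref` base to `alt`."""
    for i, base in enumerate(codon):
        if base == ref:
            yield codon[:i] + alt + codon[i + 1:]


def stop_creating_codons(ref: str, alt: str):
    """Sense codons that become a stop codon after one ref->alt change."""
    return {
        codon
        for codon, aa in CODON_TABLE.items()
        if aa != "*"
        and any(CODON_TABLE[v] == "*" for v in single_base_variants(codon, ref, alt))
    }


# C->T on the coding strand: Gln (CAA, CAG) and Arg (CGA) codons.
print(sorted(stop_creating_codons("C", "T")))  # ['CAA', 'CAG', 'CGA']
# C->T on the template strand reads as G->A in the codon: Trp (TGG).
print(sorted(stop_creating_codons("G", "A")))  # ['TGG']
# A->G on either strand (A->G or T->C in the codon) creates no stop codon.
print(stop_creating_codons("A", "G"), stop_creating_codons("T", "C"))  # set() set()
```

Under the standard genetic code, protein truncation by base editing is therefore reachable only with C to T editing (on either strand), whereas A to G editing yields amino acid substitutions or regulatory changes.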

In some embodiments of the present invention, the method further comprises obtaining progeny of the genetically modified plant.

In another aspect, the present invention provides a genetically modified plant or a progeny thereof, or a part thereof, wherein the plant is obtained by the method of the invention described above.

In another aspect, the present invention provides a method of plant breeding comprising crossing a genetically modified first plant obtained by the above method of the present invention with a second plant not containing the genetic modification, thereby introducing the genetic modification into the second plant.

Example

Construction of Ubi-CPF1-PBE/ABE expression vector

The ABE, XTEN and dCPF1 sequences were codon-optimized for plants and ordered from GenScript (Nanjing). The full-length dCPF1-ABE fragment was amplified using the primer pair HindIII-F (carrying a HindIII restriction site) and EcoRI (carrying an EcoRI restriction site). The PCR product was digested with HindIII and EcoRI and inserted into the pJIT163-GFP vector (sequence shown in SEQ ID NO: 16) digested with the same two enzymes, generating the fusion expression vector dCPF1-ABE.

The PBE, XTEN and dCPF1 sequences were codon-optimized for plants and ordered from GenScript (Nanjing). The full-length dCPF1-PBE fragment was amplified using the primer pair HindIII-F (carrying a HindIII restriction site) and EcoRI (carrying an EcoRI restriction site). The PCR product was digested with HindIII and EcoRI and inserted into the pJIT163-GFP vector (sequence shown in SEQ ID NO: 16) digested with the same two enzymes, generating the fusion expression vector dCPF1-PBE.
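Directional cloning with a HindIII/EcoRI double digest works only if the insert itself contains no internal HindIII (A^AGCTT) or EcoRI (G^AATTC) site; otherwise the fragment is cut during digestion. A minimal pre-cloning check could look like the following. The test sequences are hypothetical, and searching a single strand suffices because both recognition sites are palindromic.

```python
# Recognition sequences (top strand) of the two enzymes used for cloning.
SITES = {"HindIII": "AAGCTT", "EcoRI": "GAATTC"}


def site_positions(seq: str, site: str):
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def internal_sites(insert: str):
    """Map enzyme name -> positions of its sites inside the insert.

    An empty result means the insert (before primer-borne sites are added)
    would survive the HindIII + EcoRI double digest intact.
    """
    hits = {}
    for name, site in SITES.items():
        positions = site_positions(insert, site)
        if positions:
            hits[name] = positions
    return hits


print(internal_sites("ATGCCCGGGTAA"))   # {}
print(internal_sites("ATGAAAGCTTCCC"))  # {'HindIII': [4]}
```

If an internal site is found, a common remedy is a silent codon change that removes it, which is compatible with the codon-optimization step because the encoded protein is unchanged.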

Construction of sgRNA expression vector

As previously described (Wang, Y. et al. Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat. Biotechnol. 32, 947-951, 2014; Shan, Q. et al. Targeted genome modification of crop plants using a CRISPR-Cas system. Nat. Biotechnol. 31, 686-688, 2013; and Liang, Z. et al. Targeted mutagenesis in Zea mays using TALENs and the CRISPR/Cas system. J. Genet. Genomics 41, 63-68, 2014), an sgRNA expression vector was constructed based on pTaU6-sgRNA (Addgene ID53062), pOsU3-sgRNA (Addgene ID53063), pZmU3-sgRNA (Addgene ID5306) or OsU3/TaU6-tRNA-sgRNA (Zhang et al. 2017. Genome Biology. DOI: 10.1186/s13059-017-1325-9). In addition, a cassette containing the hammerhead ribozyme and the crRNA was expressed from a type II promoter to generate crRNA (Tang et al. Nature Plants, doi: 10.1038/nplants.2017.18).

pUbi-mGFPP-crRNA, pUbi-DEP1-sgRNA, pUbi-DEP1-crRNA, pUbi-DME-crRNA.

BFP and GFP expression vectors

pUbi-mGFP, the vector sequence is shown in SEQ ID NO: 17.

Protoplast assays

Wheat (Bobwhite) and rice (Nipponbare) were used in this study. Protoplast transformation was performed as described below, with an average transformation efficiency of 55-70%. Transformation was carried out by the PEG-mediated method with 10 μg of each plasmid. Protoplasts were collected after 48 h and DNA was extracted for T7EI and PCR/RE assays.

Preparation and transformation of wheat protoplasts

1) The middle parts of young wheat leaves were cut into strips 0.5-1 mm wide. The strips were placed in 0.6 M mannitol solution for 10 minutes, filtered, and then incubated in 50 ml enzyme solution at 20-25 ℃ in darkness with gentle shaking (10 rpm) for 5 hours.

2) 10 ml W5 was added to dilute the digest, which was then filtered through a 75 μm nylon filter into a 50 ml round-bottom centrifuge tube.

3) The protoplasts were centrifuged at 100 g for 3 min at 23 ℃, and the supernatant was discarded.

4) The pellet was gently resuspended in 10 ml W5 and placed on ice for 30 min to let the protoplasts settle gradually, and the supernatant was discarded.

5) Protoplasts were resuspended in an appropriate amount of MMG and kept on ice until transformation.

6) 10-20 μg plasmid, 200 μl protoplasts (about 4×10⁵ cells) and 220 μl fresh PEG solution were added to a 2 ml centrifuge tube, mixed, and kept at room temperature in darkness for 10-20 minutes to induce transformation.

7) After transformation, 880 μl W5 solution was slowly added and the tubes were gently inverted to mix, then centrifuged at 100 g for 3 min in a horizontal rotor, and the supernatant was discarded.

8) The protoplasts were resuspended in 2 ml W5 solution, transferred to a six-well plate and incubated at room temperature (or 25 ℃) in darkness. For protoplast genomic DNA extraction, incubation is continued for 48 h.
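The wheat and rice protocols specify centrifugation as relative centrifugal force (100 g and 250 g), while many centrifuges are set in rpm. The standard conversion is RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. In the sketch below, the 10 cm radius is a placeholder; substitute the radius of the rotor actually used.

```python
import math

K = 1.118e-5  # constant in RCF = K * r_cm * rpm**2


def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that produces `rcf_g` at radius `radius_cm`."""
    return math.sqrt(rcf_g / (K * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at `rpm` and `radius_cm`."""
    return K * radius_cm * rpm ** 2


# Placeholder rotor radius of 10 cm (assumption, not from the protocol).
for g in (100, 250):
    print(f"{g} g -> {rpm_for_rcf(g, radius_cm=10.0):.0f} rpm")
```

Because the speed scales with the square root of the force, doubling the rpm quadruples the force on the fragile protoplasts, so the g value rather than the rpm should be matched when changing centrifuges.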

Preparation and transformation of rice protoplast

1) Leaf sheaths of seedlings were used for protoplast isolation and cut into strips about 0.5 mm wide with a sharp blade.

2) Immediately after cutting, the strips were transferred into 0.6 M mannitol solution and kept in the dark for 10 min.

3) The mannitol solution was removed by filtration, and the strips were transferred into enzyme solution and vacuum-infiltrated for 30 min.

4) Enzymatic digestion was performed for 5-6 h in darkness with gentle shaking (decolorization shaker, speed 10).

5) After digestion, an equal volume of W5 was added and the mixture was shaken horizontally for 10 s to release the protoplasts.

6) Protoplasts were filtered through a 40 μm nylon membrane into a 50 ml round-bottom centrifuge tube and washed with W5 solution.

7) The protoplasts were pelleted by centrifugation at 250 g for 3 min in a horizontal rotor, and the supernatant was discarded.

8) Protoplasts were resuspended in 10 ml W5, centrifuged again at 250 g for 3 min, and the supernatant was discarded.

9) An appropriate amount of MMG solution was added to resuspend the protoplasts to a concentration of 2×10⁶/ml.

Note: All the above steps were carried out at room temperature.

10) 10-20 μg plasmid, 200 μl protoplasts (about 4×10⁵ cells) and 220 μl fresh PEG solution were added to a 2 ml centrifuge tube, mixed, and kept at room temperature in darkness for 10-20 minutes to induce transformation.

11) After transformation, 880 μl W5 solution was slowly added and the tubes were gently inverted to mix, then centrifuged at 250 g for 3 min in a horizontal rotor, and the supernatant was discarded.

12) The protoplasts were resuspended in 2 ml WI solution, transferred to a six-well plate and incubated at room temperature (or 25 ℃) in darkness. For protoplast genomic DNA extraction, incubation is continued for 48 h.
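The cell numbers in the rice protocol are linked by simple arithmetic: 200 μl of a suspension at 2×10⁶ protoplasts/ml contains 4×10⁵ cells per transformation. The sketch below encodes that relationship and estimates how many 200 μl transformations a given protoplast yield supports. The function names and the example yield of 6×10⁶ cells are hypothetical.

```python
def cells_in_aliquot(density_per_ml: float, volume_ul: float) -> float:
    """Number of cells in `volume_ul` of a suspension at `density_per_ml`."""
    return density_per_ml * volume_ul / 1000.0


def resuspension_volume_ml(total_cells: float,
                           target_density_per_ml: float = 2e6) -> float:
    """MMG volume (ml) that brings `total_cells` to the target density."""
    return total_cells / target_density_per_ml


def aliquots_available(total_cells: float, aliquot_ul: float = 200.0,
                       target_density_per_ml: float = 2e6) -> int:
    """Whole 200-ul transformations obtainable from `total_cells`."""
    volume_ul = resuspension_volume_ml(total_cells, target_density_per_ml) * 1000.0
    return int(volume_ul // aliquot_ul)


print(cells_in_aliquot(2e6, 200))   # 400000.0, i.e. about 4x10^5 cells
print(aliquots_available(6e6))      # 15 (hypothetical yield of 6x10^6 cells)
```

Counting the protoplast yield before step 9 and resuspending to a fixed density keeps the cell number per transformation constant, so editing frequencies from different samples remain comparable.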

PCR/RE:

1) Plant genomic DNA was extracted.

2) Fragments of 350-1000 bp containing the target sites were amplified with gene-specific primers:

10×EasyTaq Buffer	5 μl
dNTP (2.5 mM)	4 μl
Forward primer (10 μM)	2 μl
Reverse primer (10 μM)	2 μl
EasyTaq	0.5 μl
DNA	2 μl
ddH₂O	To 50 μl

3) General reaction conditions: initial denaturation at 94 ℃ for 5 min; 30 to 35 cycles of denaturation at 94 ℃ for 30 s, annealing at 58 ℃ for 30 s and extension at 72 ℃ for 30 s; final extension at 72 ℃ for 5 min; hold at 12 ℃. 5 μl of each PCR product was analyzed by electrophoresis.

4) PCR products were digested with a restriction endonuclease as follows:

10×FastDigest Buffer	2 μl
Restriction enzyme	1 μl
PCR product	3-5 μl
ddH₂O	To 20 μl

5) Digestion was carried out at 37 ℃ for 2-3 h, and the products were analyzed by 1.2% agarose gel electrophoresis.

6) The uncut (mutant) bands of the PCR products were recovered, purified, and subjected to TA cloning as follows:

7) Ligation was performed at 22 ℃ for 12 min, and the products were transformed into E. coli competent cells, which were plated on LB plates (Amp100, IPTG and X-gal) and incubated at 22 ℃ for 12-16 h. White colonies were picked to identify positive clones, which were then sequenced.
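The 50 μl PCR recipe in step 2 fixes the water volume and the final reagent concentrations, and is usually scaled into a master mix. The sketch below performs that arithmetic. It assumes the second primer in the recipe is the reverse primer, adds a hypothetical 10% pipetting overage, and leaves template DNA out of the master mix because it differs per sample.

```python
# Per-reaction volumes (ul) from the 50-ul PCR recipe (second primer
# assumed to be the reverse primer).
RECIPE = {
    "10x EasyTaq Buffer": 5.0,
    "dNTP (2.5 mM)": 4.0,
    "Forward primer (10 uM)": 2.0,
    "Reverse primer (10 uM)": 2.0,
    "EasyTaq": 0.5,
    "DNA": 2.0,
}
TOTAL_UL = 50.0


def water_volume(recipe=RECIPE, total=TOTAL_UL) -> float:
    """ddH2O needed to bring one reaction to `total` ul."""
    return total - sum(recipe.values())


def final_concentration(stock: float, volume_ul: float, total=TOTAL_UL) -> float:
    """C1*V1 = C2*V2: final concentration of a stock added at `volume_ul`."""
    return stock * volume_ul / total


def master_mix(n_reactions: int, overage: float = 0.10,
               recipe=RECIPE, total=TOTAL_UL) -> dict:
    """Volumes for n reactions plus overage, excluding per-sample template DNA."""
    scale = n_reactions * (1 + overage)
    mix = {k: v * scale for k, v in recipe.items() if k != "DNA"}
    mix["ddH2O"] = water_volume(recipe, total) * scale
    return mix


print(water_volume())                  # 34.5
print(final_concentration(10.0, 2.0))  # final primer concentration in uM
for reagent, vol in master_mix(10).items():
    print(f"{reagent}: {vol:.1f} ul")
```

Each reaction then receives 48 μl of master mix plus 2 μl of template DNA, giving the 50 μl total of the original recipe.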

Deep sequencing

Different sgRNA expression vectors were each co-transformed with the Ubi-CPF1-PBE/ABE expression vector into wheat and rice protoplasts. Protoplasts were collected 48 hours later, and DNA was extracted for deep sequencing. In the first round of PCR, the target region was amplified using site-specific primers. In the second round of PCR, forward and reverse tags were added to the ends of the PCR product for library construction. Equal amounts of the different PCR products were pooled, and the samples were sequenced on an Illumina HiSeq 4000 at the Beijing Genomics Institute.
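From deep-sequencing data, the base-editing frequency at a given position is commonly expressed as the fraction of reads carrying the substituted base among all reads covering that position. The sketch below computes this for reads that are assumed to be already demultiplexed by their tags and aligned to the reference without insertions or deletions. Real pipelines must handle alignment, indels and quality filtering, which are omitted here. The reads are hypothetical.

```python
from collections import Counter


def substitution_frequency(reads, reference: str, position: int,
                           ref_base: str = "C", alt_base: str = "T") -> float:
    """Fraction of reads with `alt_base` at 0-based `position`.

    Assumes reads start at reference position 0 and contain no indels.
    """
    if reference[position] != ref_base:
        raise ValueError("reference base does not match ref_base")
    covering = [r for r in reads if len(r) > position]
    if not covering:
        return 0.0
    counts = Counter(r[position] for r in covering)
    return counts[alt_base] / len(covering)


# Hypothetical aligned reads: two of four carry C->T at position 1.
reference = "ACGTC"
reads = ["ACGTC", "ATGTC", "ATGTC", "ACGTC"]
print(substitution_frequency(reads, reference, 1))  # 0.5
print(substitution_frequency(reads, reference, 0, ref_base="A", alt_base="G"))
```

Computing the frequency position by position across the protospacer shows where in the target window the deaminase acts, which is how C to T (PBE) and A to G (ABE) editing windows are usually characterized.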

Example 1. Optimization of CPF1 mediated genome cleavage activity in plants.

Reported editing activities of CPF1 in plant cells vary considerably between studies, and cleavage activity also differs greatly among different types of CPF1.

In this example, the nuclear localization of AsCPF1, FnCPF1 and LbCPF1 was optimized, and the promoter used for crRNA expression was also optimized, to improve the cleavage activity of CPF1 in plant cells. Vectors of AsCPF1, FnCPF1 and LbCPF1 carrying 1-4 NLSs were constructed, together with different vectors in which crRNA is generated by ribozyme processing and driven by a U3/U6 or UBI promoter (Fig. 1). The PCR/RE results show that all three CPF1 proteins with two NLSs are active and that LbCPF1 has high efficiency (SEQ ID NO: 5-7 are the coding sequences of ASCPF1-2NLS, FNCPF1-2NLS and LBCPF1-2NLS, respectively; the corresponding amino acid sequences can be readily deduced). At the target site of the OsPDS gene, the efficiency of 2NLS-LbCPF1 is higher than that of NLS-LbCPF1 and higher than that of other reported constructs.

Example 2. CPF1-mediated C to T mutation of plant genome (CPF1-PBE)

Based on the cleavage activity of CPF1 in plant cells described above, the following dCPF1-PBE systems were constructed: dAsCPF1-2NLS-PBE, dFnCPF1-2NLS-PBE and dLbCPF1-2NLS-PBE. The C-terminal NLSs were placed either at one end of the UGI or at both ends of the UGI. crRNA expression was driven by the UBI1 promoter, and the transcript was cleaved by a ribozyme. The PCR/RE results indicated that editing activity was detected for dFnCPF1 and dLbCPF1, and that constructs with the NLS at only one end of the UGI had higher activity (SEQ ID NO: 8 and 9 show the coding sequences of dFNCPF1-PBE-2NLS and dLbCPF1-2NLS-PBE, respectively; the corresponding amino acid sequences can be readily deduced). In addition, an enhanced dCPF1-PBE2-X was constructed by adding an intron after the ZmUbi-1 promoter to increase the expression of dCPF1-PBE (SEQ ID NO: 10 shows the dLBCPF1-PBE-2NLS expression cassette, which comprises the ZmUbi-1 promoter and intron sequences).

Example 3. CPF1-mediated A to G mutation in the plant genome (CPF1-ABE)

The following CPF1-ABE systems were constructed: dAsCPF1-1NLS-ABE, dFnCPF1-1NLS-ABE, dLbCPF1-1NLS-ABE, dAsCPF1-2NLS-ABE, dFnCPF1-2NLS-ABE and dLbCPF1-2NLS-ABE, where ABE can be ABE7.9 or ABE7.10. The crRNA was transcribed from the UBI1 promoter and cleaved by a ribozyme.

Results from the GFP base-editing reporter system of Fig. 3E indicate that dFnCPF1-ABE7.10 (SEQ ID NO: 11), dLbCPF1-ABE7.9 and dLbCPF1-ABE7.10 (SEQ ID NO: 12) are all active, and that ABE7.10 is more efficient than ABE7.9 (Fig. 3F).

The PCR/RE results showed that activity was detected for dLbCPF1-ABE7.10, and that 2NLS performed better than 1NLS. In addition, two types of enhanced dCPF1-ABE2 were constructed: one with an intron added after the UBI1 promoter to increase the expression of dCPF1-ABE (dCPF1-ABE2-X1; SEQ ID NO: 13), and one with ABE fused to the C-terminus of CPF1 (dCPF1-ABE2-X2/X3; SEQ ID NO: 14, 15). Results from the GFP base-editing reporter system of Fig. 3E indicate that the editing activity of dCPF1-ABE2-X2/X3 is higher than that of dLbCPF1-ABE7.10 (Fig. 3G).

Example 4. Optimization of CPF1-mediated gene editing

To further improve the editing efficiency of CPF1, we continued to optimize the CPF1 system. First, all expression vectors for CPF1-mediated editing are driven by the BdUbi10 promoter to increase expression. In addition, the crRNA is transcribed from a type II promoter, and the crRNA array is placed in the 5' UTR or 3' UTR region of a gene to be expressed, to improve the editing efficiency of CPF1 by increasing mRNA expression.

Description of Relevant Sequences:

SEQ ID NO. 1 amino acid sequence of cytidine deaminase

SEQ ID NO. 2 amino acid sequence of uracil DNA glycosylase inhibitor (UGI)

SEQ ID NO. 3 amino acid sequence of WT ecTadA

SEQ ID NO. 4 amino acid sequence of ecTadA-derived DNA-dependent adenine deaminase (ABE 7.10)

SEQ ID NO. 5 encoding sequence of ASCPF1-2NLS

SEQ ID NO. 6 encoding sequence of FNCPF1-2NLS

SEQ ID NO. 7 encoding sequence of LBCPF1-2NLS

SEQ ID NO. 8 encoding sequence of dFNCPF1-PBE-2NLS

SEQ ID NO. 9 encoding sequence of dLBCPF1-PBE-2NLS

SEQ ID NO. 10 encoding sequence of promoter+intron+dLBCPF1-PBE-2NLS

SEQ ID NO. 11 encoding sequence of dFNCPF1-ABE7.10-2NLS

SEQ ID NO. 12 encoding sequence of dLBCPF1-ABE7.10-2NLS

SEQ ID NO. 13 encoding sequence of promoter+intron+dLBCPF1-ABE2-X

SEQ ID NO. 14 encoding sequence of LBCPF1-ABE2-X2

SEQ ID NO. 15 encoding sequence of LBCPF1-ABE2-X3

SEQ ID NO. 16 pJIT163-GFP

SEQ ID NO: 17 pUbi-mGFP

SEQ ID NO: 18 amino acid sequence of ASCPF1

SEQ ID NO: 19 amino acid sequence of FNCPF1

SEQ ID NO: 20 amino acid sequence of LBCPF1

SEQ ID NO: 21 amino acid sequence of ASCPF1-2NLS

SEQ ID NO: 22 amino acid sequence of FNCPF1-2NLS

SEQ ID NO: 23 amino acid sequence of LBCPF1-2NLS

SEQ ID NO: 24 amino acid sequence of dFNCPF1-PBE-2NLS

SEQ ID NO: 25 amino acid sequence of dLBCPF1-PBE-2NLS

SEQ ID NO: 26 amino acid sequence of dFNCPF1-ABE7.10-2NLS

SEQ ID NO: 27 amino acid sequence of dLBCPF1-ABE7.10-2NLS

SEQ ID NO: 28 amino acid sequence of LBCPF1-ABE2-X2

SEQ ID NO: 29 amino acid sequence of LBCPF1-ABE2-X3

SEQ ID NO: 30 nucleotide sequence of promoter+intron

Claims

A system for base editing of a target sequence in the genome of an organism, comprising at least one of the following i) to v) :

i) a base-editing fusion protein, and a guide RNA;

ii) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and a guide RNA;

iii) a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

iv) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein, and an expression construct comprising a nucleotide sequence encoding a guide RNA;

v) an expression construct comprising a nucleotide sequence encoding a base-editing fusion protein and a nucleotide sequence encoding a guide RNA;

wherein the base-editing fusion protein comprises a Cpf1 lacking DNA cleavage activity and a deaminase, the guide RNA being capable of targeting the base-editing fusion protein to a target sequence in the genome, resulting in one or more C to T or A to G substitutions in the target sequence.
The system of claim 1, wherein the Cpf1 lacking DNA cleavage activity is FnCpf1 lacking DNA cleavage activity, for example the FnCpf1 lacking DNA cleavage activity comprises a D917A mutation relative to wild-type FnCpf1.
The system of claim 1, wherein the Cpf1 lacking DNA cleavage activity is AsCpf1 lacking DNA cleavage activity, for example the AsCpf1 lacking DNA cleavage activity comprises a D908A mutation relative to wild-type AsCpf1.
The system of claim 1, wherein the Cpf1 lacking DNA cleavage activity is LbCpf1 lacking DNA cleavage activity, for example the LbCpf1 lacking DNA cleavage activity comprises a D832A mutation relative to wild-type LbCpf1.
The system of claim 1, wherein the deaminase is a cytidine deaminase, such as the apolipoprotein B mRNA editing complex (APOBEC) family deaminase.
The system of claim 5, wherein the cytidine deaminase is APOBEC1 deaminase or activation-induced cytidine deaminase (AID).
The system of claim 5, wherein the base-editing fusion protein further comprises a uracil DNA glycosylase inhibitor (UGI).
The system of claim 1, wherein the deaminase is a DNA-dependent adenine deaminase, preferably a single-stranded DNA-dependent adenine deaminase.
The system of claim 8, wherein the DNA-dependent adenine deaminase is a variant of the E. coli tRNA adenine deaminase TadA (ecTadA), in particular a variant which can accept single-stranded DNA as a substrate.
The system of claim 9, wherein the DNA-dependent adenine deaminase comprises, relative to wild-type ecTadA, one or more sets of mutations selected from the group consisting of:

1) A106V and D108N;

2) D147Y and E155V;

3) L84F, H123Y and I156F;

4) A142N;

5) H36L, R51L, S146C and K157N;

6) P48S/T/A;

7) A142N;

8) W23L/R;

9) R152H/P.
The system of claim 10, wherein the DNA-dependent adenine deaminase comprises the following mutations relative to wild-type ecTadA: W23R, H36L, R51L, S146C, K157N, A106V, D108N, P48A, L84F, H123Y, I156F, D147Y, E155V and R152P.
The system of claim 9, wherein the N-terminus of the DNA-dependent adenine deaminase is fused with a corresponding wild-type adenine deaminase, preferably the N-terminus of the DNA-dependent adenine deaminase is fused to a corresponding wild-type adenine deaminase via a linker.
The system of claim 1, wherein the deaminase is fused to the N-terminus of the Cpf1 lacking DNA cleavage activity, or the deaminase is fused to the C-terminus of the Cpf1 lacking DNA cleavage activity.
The system of claim 1, wherein the deaminase and the Cpf1 lacking DNA cleavage activity are fused via a linker.
The system of claim 1, wherein the base-editing fusion protein further comprises a nuclear localization sequence (NLS) at its N-terminus and/or C-terminus.
The system of claim 1, wherein the nucleotide sequence encoding the base-editing fusion protein is codon optimized for the organism to be base edited.
The system of claim 1, wherein the nucleotide sequence encoding the base-editing fusion protein and/or the nucleotide sequence encoding the guide RNA is operably linked to an expression regulatory element.
The system of claim 17, wherein the regulatory element is a promoter, such as 35S promoter, maize Ubi-1 promoter, wheat U6 promoter, rice U3 promoter or maize U3 promoter.
A method of producing a genetically modified organism comprising introducing the system of any of claims 1-18 into a cell of the organism, whereby the guide RNA targets the base editing fusion protein to a target sequence in the genome of the cell, resulting in one or more C to T or A to G substitutions in the target sequence.
The method of claim 19, wherein the organism is selected from a mammal such as human, mouse, rat, monkey, dog, pig, sheep, cow, cat; a poultry such as chicken, duck, goose; a plant, including a monocot and a dicot, such as rice, corn, wheat, sorghum, barley, soybeans, peanuts, Arabidopsis.