WO2024186806A2 - Plant pathogen resistance genes - Google Patents
- Publication number
- WO2024186806A2 (application PCT/US2024/018501)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- plant
- rot
- maize
- gene
- seq
- Prior art date
- 108090000623 proteins and genes Proteins 0.000 title claims description 204
- 244000000003 plant pathogen Species 0.000 title description 4
- 206010034133 Pathogen resistance Diseases 0.000 title description 2
- 238000000034 method Methods 0.000 claims abstract description 115
- 108700026215 vpr Genes Proteins 0.000 claims abstract description 110
- 101150090155 R gene Proteins 0.000 claims abstract description 79
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims abstract description 64
- 201000010099 disease Diseases 0.000 claims abstract description 62
- 230000001965 increasing effect Effects 0.000 claims abstract description 43
- 230000009261 transgenic effect Effects 0.000 claims abstract description 18
- 238000010362 genome editing Methods 0.000 claims abstract description 16
- 230000004048 modification Effects 0.000 claims abstract description 11
- 238000012986 modification Methods 0.000 claims abstract description 11
- 241000196324 Embryophyta Species 0.000 claims description 314
- 150000007523 nucleic acids Chemical class 0.000 claims description 126
- 240000008042 Zea mays Species 0.000 claims description 106
- 239000003550 marker Substances 0.000 claims description 105
- 235000002017 Zea mays subsp mays Nutrition 0.000 claims description 104
- 235000016383 Zea mays subsp huehuetenangensis Nutrition 0.000 claims description 89
- 235000009973 maize Nutrition 0.000 claims description 88
- 102000039446 nucleic acids Human genes 0.000 claims description 69
- 108020004707 nucleic acids Proteins 0.000 claims description 69
- 108091028043 Nucleic acid sequence Proteins 0.000 claims description 68
- 230000014509 gene expression Effects 0.000 claims description 65
- 108700028369 Alleles Proteins 0.000 claims description 64
- 108091033319 polynucleotide Proteins 0.000 claims description 51
- 102000040430 polynucleotide Human genes 0.000 claims description 51
- 239000002157 polynucleotide Substances 0.000 claims description 51
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 41
- 239000000463 material Substances 0.000 claims description 33
- 108091026890 Coding region Proteins 0.000 claims description 22
- 241000233679 Peronosporaceae Species 0.000 claims description 18
- 240000007594 Oryza sativa Species 0.000 claims description 16
- 235000007164 Oryza sativa Nutrition 0.000 claims description 16
- 235000005824 Zea mays ssp. parviglumis Nutrition 0.000 claims description 16
- 235000005822 corn Nutrition 0.000 claims description 16
- 235000009566 rice Nutrition 0.000 claims description 16
- 230000001105 regulatory effect Effects 0.000 claims description 13
- 230000001580 bacterial effect Effects 0.000 claims description 12
- 241000219194 Arabidopsis Species 0.000 claims description 11
- 238000012216 screening Methods 0.000 claims description 11
- 244000068988 Glycine max Species 0.000 claims description 8
- 235000010469 Glycine max Nutrition 0.000 claims description 8
- 240000005979 Hordeum vulgare Species 0.000 claims description 8
- 235000007340 Hordeum vulgare Nutrition 0.000 claims description 8
- 241000209140 Triticum Species 0.000 claims description 8
- 235000021307 Triticum Nutrition 0.000 claims description 8
- 108020004511 Recombinant DNA Proteins 0.000 claims description 7
- 244000062793 Sorghum vulgare Species 0.000 claims description 7
- 241000223218 Fusarium Species 0.000 claims description 6
- 241000209510 Liliopsida Species 0.000 claims description 6
- JEIPFZHSYJVQDO-UHFFFAOYSA-N iron(III) oxide Inorganic materials O=[Fe]O[Fe]=O JEIPFZHSYJVQDO-UHFFFAOYSA-N 0.000 claims description 6
- 240000000111 Saccharum officinarum Species 0.000 claims description 5
- 235000007201 Saccharum officinarum Nutrition 0.000 claims description 5
- 235000011684 Sorghum saccharatum Nutrition 0.000 claims description 5
- 235000013339 cereals Nutrition 0.000 claims description 5
- 235000019713 millet Nutrition 0.000 claims description 5
- UHPMCKVQTMMPCG-UHFFFAOYSA-N 5,8-dihydroxy-2-methoxy-6-methyl-7-(2-oxopropyl)naphthalene-1,4-dione Chemical compound CC1=C(CC(C)=O)C(O)=C2C(=O)C(OC)=CC(=O)C2=C1O UHPMCKVQTMMPCG-UHFFFAOYSA-N 0.000 claims description 4
- 241000371644 Curvularia ravenelii Species 0.000 claims description 4
- 241000935926 Diplodia Species 0.000 claims description 4
- 206010028851 Necrosis Diseases 0.000 claims description 4
- 240000000275 Persicaria hydropiper Species 0.000 claims description 4
- 235000017337 Persicaria hydropiper Nutrition 0.000 claims description 4
- 241000233639 Pythium Species 0.000 claims description 4
- 241001361634 Rhizoctonia Species 0.000 claims description 4
- 231100000518 lethal Toxicity 0.000 claims description 4
- 230000001665 lethal effect Effects 0.000 claims description 4
- 230000017074 necrotic cell death Effects 0.000 claims description 4
- 235000014698 Brassica juncea var multisecta Nutrition 0.000 claims description 3
- 235000006008 Brassica napus var napus Nutrition 0.000 claims description 3
- 240000000385 Brassica napus var. napus Species 0.000 claims description 3
- 235000006618 Brassica rapa subsp oleifera Nutrition 0.000 claims description 3
- 235000004977 Brassica sinapistrum Nutrition 0.000 claims description 3
- 229920000742 Cotton Polymers 0.000 claims description 3
- 241000219146 Gossypium Species 0.000 claims description 3
- 244000020551 Helianthus annuus Species 0.000 claims description 3
- 235000003222 Helianthus annuus Nutrition 0.000 claims description 3
- 240000004658 Medicago sativa Species 0.000 claims description 3
- 235000017587 Medicago sativa ssp. sativa Nutrition 0.000 claims description 3
- 241001520808 Panicum virgatum Species 0.000 claims description 3
- 241000228212 Aspergillus Species 0.000 claims description 2
- 241000190146 Botryosphaeria Species 0.000 claims description 2
- 241001619326 Cephalosporium Species 0.000 claims description 2
- 241001157813 Cercospora Species 0.000 claims description 2
- 241000222290 Cladosporium Species 0.000 claims description 2
- 241000030549 Claviceps gigantea Species 0.000 claims description 2
- 241001529717 Corticium <basidiomycota> Species 0.000 claims description 2
- 240000008067 Cucumis sativus Species 0.000 claims description 2
- 235000010799 Cucumis sativus var sativus Nutrition 0.000 claims description 2
- 241000223208 Curvularia Species 0.000 claims description 2
- 241000555695 Didymella Species 0.000 claims description 2
- 241000308375 Graminicola Species 0.000 claims description 2
- 241000744855 Holcus Species 0.000 claims description 2
- 241000441510 Hormodendrum Species 0.000 claims description 2
- 244000309480 Hyalothyridium Species 0.000 claims description 2
- 241000289619 Macropodidae Species 0.000 claims description 2
- 241000611254 Maize rayado fino virus Species 0.000 claims description 2
- 206010027146 Melanoderma Diseases 0.000 claims description 2
- 241000228143 Penicillium Species 0.000 claims description 2
- 235000002233 Penicillium roqueforti Nutrition 0.000 claims description 2
- 241000286137 Phaeocytostroma Species 0.000 claims description 2
- 241000555275 Phaeosphaeria Species 0.000 claims description 2
- 241001115351 Physalospora Species 0.000 claims description 2
- 241000812330 Pyrenochaeta Species 0.000 claims description 2
- 241001183191 Sclerophthora macrospora Species 0.000 claims description 2
- 241001558929 Sclerotium <basidiomycota> Species 0.000 claims description 2
- 241000986481 Selenophoma Species 0.000 claims description 2
- 240000002439 Sorghum halepense Species 0.000 claims description 2
- 241000223259 Trichoderma Species 0.000 claims description 2
- 206010000210 abortion Diseases 0.000 claims description 2
- 231100000176 abortion Toxicity 0.000 claims description 2
- GDTBXPJZTBHREO-UHFFFAOYSA-N bromine Chemical compound BrBr GDTBXPJZTBHREO-UHFFFAOYSA-N 0.000 claims description 2
- 239000003610 charcoal Substances 0.000 claims description 2
- 235000019219 chocolate Nutrition 0.000 claims description 2
- 230000001172 regenerating effect Effects 0.000 claims description 2
- 239000004460 silage Substances 0.000 claims description 2
- 238000012360 testing method Methods 0.000 claims description 2
- 210000003462 vein Anatomy 0.000 claims description 2
- 241000209072 Sorghum Species 0.000 claims 3
- 208000035240 Disease Resistance Diseases 0.000 abstract description 40
- 239000012634 fragment Substances 0.000 abstract description 39
- 230000001488 breeding effect Effects 0.000 abstract description 13
- 238000009395 breeding Methods 0.000 abstract description 12
- 238000004519 manufacturing process Methods 0.000 abstract description 3
- 102000054767 gene variant Human genes 0.000 abstract 1
- 210000004027 cell Anatomy 0.000 description 66
- 102000004169 proteins and genes Human genes 0.000 description 63
- 102000012064 NLR Proteins Human genes 0.000 description 61
- 108020004414 DNA Proteins 0.000 description 43
- 230000002068 genetic effect Effects 0.000 description 39
- 125000003729 nucleotide group Chemical group 0.000 description 39
- 210000001519 tissue Anatomy 0.000 description 38
- 239000002773 nucleotide Substances 0.000 description 37
- 210000000349 chromosome Anatomy 0.000 description 36
- 229920001184 polypeptide Polymers 0.000 description 36
- 108090000765 processed proteins & peptides Proteins 0.000 description 36
- 102000004196 processed proteins & peptides Human genes 0.000 description 36
- 229920003266 Leaf® Polymers 0.000 description 35
- 230000006798 recombination Effects 0.000 description 32
- 238000005215 recombination Methods 0.000 description 30
- 239000000523 sample Substances 0.000 description 26
- 239000013615 primer Substances 0.000 description 24
- 238000005516 engineering process Methods 0.000 description 22
- 238000001514 detection method Methods 0.000 description 20
- 239000012636 effector Substances 0.000 description 20
- 238000003752 polymerase chain reaction Methods 0.000 description 19
- 241000894007 species Species 0.000 description 19
- 108091092878 Microsatellite Proteins 0.000 description 18
- 238000003780 insertion Methods 0.000 description 17
- 230000037431 insertion Effects 0.000 description 17
- 230000000295 complement effect Effects 0.000 description 16
- 238000009396 hybridization Methods 0.000 description 16
- 244000052769 pathogen Species 0.000 description 16
- 102000054766 genetic haplotypes Human genes 0.000 description 15
- 102000054765 polymorphisms of proteins Human genes 0.000 description 15
- 238000004458 analytical method Methods 0.000 description 14
- 230000001717 pathogenic effect Effects 0.000 description 13
- 238000012217 deletion Methods 0.000 description 12
- 230000037430 deletion Effects 0.000 description 12
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 11
- 230000002759 chromosomal effect Effects 0.000 description 11
- 230000000694 effects Effects 0.000 description 11
- 230000009466 transformation Effects 0.000 description 11
- 108091008109 Pseudogenes Proteins 0.000 description 10
- 102000057361 Pseudogenes Human genes 0.000 description 10
- 230000008859 change Effects 0.000 description 10
- 238000009826 distribution Methods 0.000 description 10
- 238000012163 sequencing technique Methods 0.000 description 10
- 230000005945 translocation Effects 0.000 description 10
- 239000002299 complementary DNA Substances 0.000 description 9
- 239000003623 enhancer Substances 0.000 description 9
- 230000003993 interaction Effects 0.000 description 9
- 230000021121 meiosis Effects 0.000 description 8
- 239000000203 mixture Substances 0.000 description 8
- 239000003147 molecular marker Substances 0.000 description 8
- 238000006467 substitution reaction Methods 0.000 description 8
- 238000003559 RNA-seq method Methods 0.000 description 7
- 230000009418 agronomic effect Effects 0.000 description 7
- 230000002349 favourable effect Effects 0.000 description 7
- 230000006870 function Effects 0.000 description 7
- 238000002703 mutagenesis Methods 0.000 description 7
- 231100000350 mutagenesis Toxicity 0.000 description 7
- 230000008569 process Effects 0.000 description 7
- 238000005204 segregation Methods 0.000 description 7
- 125000006850 spacer group Chemical group 0.000 description 7
- 230000002103 transcriptional effect Effects 0.000 description 7
- 108091034117 Oligonucleotide Proteins 0.000 description 6
- 108091000080 Phosphotransferase Proteins 0.000 description 6
- -1 SNP Proteins 0.000 description 6
- 230000003321 amplification Effects 0.000 description 6
- 238000003556 assay Methods 0.000 description 6
- 238000013507 mapping Methods 0.000 description 6
- 238000003199 nucleic acid amplification method Methods 0.000 description 6
- 102000020233 phosphotransferase Human genes 0.000 description 6
- 230000000306 recurrent effect Effects 0.000 description 6
- 238000012935 Averaging Methods 0.000 description 5
- 108010044467 Isoenzymes Proteins 0.000 description 5
- 238000013459 approach Methods 0.000 description 5
- 238000010276 construction Methods 0.000 description 5
- 230000000875 corresponding effect Effects 0.000 description 5
- 238000000338 in vitro Methods 0.000 description 5
- 238000011160 research Methods 0.000 description 5
- 238000007894 restriction fragment length polymorphism technique Methods 0.000 description 5
- 108091035707 Consensus sequence Proteins 0.000 description 4
- 108091060211 Expressed sequence tag Proteins 0.000 description 4
- 241000760038 Monosis Species 0.000 description 4
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 4
- 238000007792 addition Methods 0.000 description 4
- 230000004075 alteration Effects 0.000 description 4
- 150000001413 amino acids Chemical class 0.000 description 4
- 210000004899 c-terminal region Anatomy 0.000 description 4
- 230000001413 cellular effect Effects 0.000 description 4
- 239000002131 composite material Substances 0.000 description 4
- 241001233957 eudicotyledons Species 0.000 description 4
- 230000001976 improved effect Effects 0.000 description 4
- 230000001404 mediated effect Effects 0.000 description 4
- 230000035772 mutation Effects 0.000 description 4
- 108010089193 pattern recognition receptors Proteins 0.000 description 4
- 102000007863 pattern recognition receptors Human genes 0.000 description 4
- 210000001938 protoplast Anatomy 0.000 description 4
- 230000002441 reversible effect Effects 0.000 description 4
- 230000001131 transforming effect Effects 0.000 description 4
- 238000011144 upstream manufacturing Methods 0.000 description 4
- 239000013598 vector Substances 0.000 description 4
- 206010001513 AIDS related complex Diseases 0.000 description 3
- 102000004190 Enzymes Human genes 0.000 description 3
- 108090000790 Enzymes Proteins 0.000 description 3
- 108091092195 Intron Proteins 0.000 description 3
- 108091026898 Leader sequence (mRNA) Proteins 0.000 description 3
- 108010006444 Leucine-Rich Repeat Proteins Proteins 0.000 description 3
- 238000007476 Maximum Likelihood Methods 0.000 description 3
- 108091036066 Three prime untranslated region Proteins 0.000 description 3
- 125000000539 amino acid group Chemical group 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 3
- 230000007613 environmental effect Effects 0.000 description 3
- 230000004927 fusion Effects 0.000 description 3
- 230000036541 health Effects 0.000 description 3
- 210000004901 leucine-rich repeat Anatomy 0.000 description 3
- 108020004999 messenger RNA Proteins 0.000 description 3
- 210000003205 muscle Anatomy 0.000 description 3
- 238000003976 plant breeding Methods 0.000 description 3
- 230000008488 polyadenylation Effects 0.000 description 3
- 239000002987 primer (paints) Substances 0.000 description 3
- 230000002829 reductive effect Effects 0.000 description 3
- 230000010076 replication Effects 0.000 description 3
- 238000013515 script Methods 0.000 description 3
- 230000017260 vegetative to reproductive phase transition of meristem Effects 0.000 description 3
- 239000001707 (E,7R,11R)-3,7,11,15-tetramethylhexadec-2-en-1-ol Substances 0.000 description 2
- 241000589155 Agrobacterium tumefaciens Species 0.000 description 2
- 238000001712 DNA sequencing Methods 0.000 description 2
- 102000000589 Interleukin-1 Human genes 0.000 description 2
- 108010002352 Interleukin-1 Proteins 0.000 description 2
- 108010036473 NLR Proteins Proteins 0.000 description 2
- 108700026244 Open Reading Frames Proteins 0.000 description 2
- BLUHKGOSFDHHGX-UHFFFAOYSA-N Phytol Natural products CC(C)CCCC(C)CCCC(C)CCCC(C)C=CO BLUHKGOSFDHHGX-UHFFFAOYSA-N 0.000 description 2
- 108091028664 Ribonucleotide Proteins 0.000 description 2
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 2
- 238000010459 TALEN Methods 0.000 description 2
- HNZBNQYXWOLKBA-UHFFFAOYSA-N Tetrahydrofarnesol Natural products CC(C)CCCC(C)CCCC(C)=CCO HNZBNQYXWOLKBA-UHFFFAOYSA-N 0.000 description 2
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 2
- 108700019146 Transgenes Proteins 0.000 description 2
- 108090000848 Ubiquitin Proteins 0.000 description 2
- 102000044159 Ubiquitin Human genes 0.000 description 2
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 2
- BOTWFXYSPFMFNR-OALUTQOASA-N all-rac-phytol Natural products CC(C)CCC[C@H](C)CCC[C@H](C)CCCC(C)=CCO BOTWFXYSPFMFNR-OALUTQOASA-N 0.000 description 2
- 239000011324 bead Substances 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 230000004071 biological effect Effects 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 238000010367 cloning Methods 0.000 description 2
- 230000008602 contraction Effects 0.000 description 2
- 230000008260 defense mechanism Effects 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 238000004520 electroporation Methods 0.000 description 2
- 238000010195 expression analysis Methods 0.000 description 2
- 239000012014 frustrated Lewis pair Substances 0.000 description 2
- 238000003205 genotyping method Methods 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 230000000977 initiatory effect Effects 0.000 description 2
- 230000003834 intracellular effect Effects 0.000 description 2
- 210000001161 mammalian embryo Anatomy 0.000 description 2
- 230000000877 morphologic effect Effects 0.000 description 2
- 101150113790 nlr gene Proteins 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 230000037361 pathway Effects 0.000 description 2
- BOTWFXYSPFMFNR-PYDDKJGSSA-N phytol Chemical compound CC(C)CCC[C@@H](C)CCC[C@@H](C)CCC\C(C)=C\CO BOTWFXYSPFMFNR-PYDDKJGSSA-N 0.000 description 2
- 230000010152 pollination Effects 0.000 description 2
- 238000002360 preparation method Methods 0.000 description 2
- 238000003259 recombinant expression Methods 0.000 description 2
- 230000008439 repair process Effects 0.000 description 2
- 230000003252 repetitive effect Effects 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 239000002336 ribonucleotide Substances 0.000 description 2
- 125000002652 ribonucleotide group Chemical group 0.000 description 2
- 230000001568 sexual effect Effects 0.000 description 2
- 238000002741 site-directed mutagenesis Methods 0.000 description 2
- 238000007619 statistical method Methods 0.000 description 2
- 238000000123 temperature gradient gel electrophoresis Methods 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000011426 transformation method Methods 0.000 description 2
- 230000001052 transient effect Effects 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 230000017105 transposition Effects 0.000 description 2
- 101150092328 22 gene Proteins 0.000 description 1
- 101150090724 3 gene Proteins 0.000 description 1
- 108091006112 ATPases Proteins 0.000 description 1
- 102000011932 ATPases Associated with Diverse Cellular Activities Human genes 0.000 description 1
- 108010075752 ATPases Associated with Diverse Cellular Activities Proteins 0.000 description 1
- 102000057290 Adenosine Triphosphatases Human genes 0.000 description 1
- 244000291564 Allium cepa Species 0.000 description 1
- 235000002732 Allium cepa var. cepa Nutrition 0.000 description 1
- 229910000497 Amalgam Inorganic materials 0.000 description 1
- 108091093088 Amplicon Proteins 0.000 description 1
- 241000219195 Arabidopsis thaliana Species 0.000 description 1
- 241000972773 Aulopiformes Species 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 206010006187 Breast cancer Diseases 0.000 description 1
- 208000026310 Breast neoplasm Diseases 0.000 description 1
- 108091033409 CRISPR Proteins 0.000 description 1
- 238000010354 CRISPR gene editing Methods 0.000 description 1
- 108020004705 Codon Proteins 0.000 description 1
- 108020004635 Complementary DNA Proteins 0.000 description 1
- 241000195493 Cryptophyta Species 0.000 description 1
- 108010066133 D-octopine dehydrogenase Proteins 0.000 description 1
- 239000003155 DNA primer Substances 0.000 description 1
- 239000003298 DNA probe Substances 0.000 description 1
- 230000004543 DNA replication Effects 0.000 description 1
- 108010042407 Endonucleases Proteins 0.000 description 1
- 102000004533 Endonucleases Human genes 0.000 description 1
- 241000221785 Erysiphales Species 0.000 description 1
- 241000588724 Escherichia coli Species 0.000 description 1
- 108090000652 Flap endonucleases Proteins 0.000 description 1
- 102000004150 Flap endonucleases Human genes 0.000 description 1
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 1
- 208000034951 Genetic Translocation Diseases 0.000 description 1
- 206010071602 Genetic polymorphism Diseases 0.000 description 1
- 241000238631 Hexapoda Species 0.000 description 1
- 101000878605 Homo sapiens Low affinity immunoglobulin epsilon Fc receptor Proteins 0.000 description 1
- 230000010740 Hormone Receptor Interactions Effects 0.000 description 1
- 206010020649 Hyperkeratosis Diseases 0.000 description 1
- 102100034343 Integrase Human genes 0.000 description 1
- 102000019223 Interleukin-1 receptor Human genes 0.000 description 1
- 108050006617 Interleukin-1 receptor Proteins 0.000 description 1
- 240000008415 Lactuca sativa Species 0.000 description 1
- 235000003228 Lactuca sativa Nutrition 0.000 description 1
- 102000003960 Ligases Human genes 0.000 description 1
- 108090000364 Ligases Proteins 0.000 description 1
- 241000234280 Liliaceae Species 0.000 description 1
- 102100038007 Low affinity immunoglobulin epsilon Fc receptor Human genes 0.000 description 1
- 108020005196 Mitochondrial DNA Proteins 0.000 description 1
- 241000244206 Nematoda Species 0.000 description 1
- 101710163270 Nuclease Proteins 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 108700001094 Plant Genes Proteins 0.000 description 1
- 102000001253 Protein Kinase Human genes 0.000 description 1
- 102000004022 Protein-Tyrosine Kinases Human genes 0.000 description 1
- 108090000412 Protein-Tyrosine Kinases Proteins 0.000 description 1
- 101150035080 RGA5 gene Proteins 0.000 description 1
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 1
- 108010008281 Recombinant Fusion Proteins Proteins 0.000 description 1
- 102000007056 Recombinant Fusion Proteins Human genes 0.000 description 1
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 1
- 102000002278 Ribosomal Proteins Human genes 0.000 description 1
- 108010000605 Ribosomal Proteins Proteins 0.000 description 1
- 244000061456 Solanum tuberosum Species 0.000 description 1
- 235000002595 Solanum tuberosum Nutrition 0.000 description 1
- 108091023040 Transcription factor Proteins 0.000 description 1
- 102000040945 Transcription factor Human genes 0.000 description 1
- 108091023045 Untranslated Region Proteins 0.000 description 1
- 244000083398 Zea diploperennis Species 0.000 description 1
- 235000007241 Zea diploperennis Nutrition 0.000 description 1
- 235000007244 Zea mays Nutrition 0.000 description 1
- 235000017556 Zea mays subsp parviglumis Nutrition 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 208000005652 acute fatty liver of pregnancy Diseases 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000012742 biochemical analysis Methods 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 230000003197 catalytic effect Effects 0.000 description 1
- 230000021164 cell adhesion Effects 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 208000030499 combat disease Diseases 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 230000006378 damage Effects 0.000 description 1
- 230000034994 death Effects 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000004665 defense response Effects 0.000 description 1
- 230000006735 deficit Effects 0.000 description 1
- 230000003111 delayed effect Effects 0.000 description 1
- 230000002939 deleterious effect Effects 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 229910003460 diamond Inorganic materials 0.000 description 1
- 239000010432 diamond Substances 0.000 description 1
- 238000002845 discoloration Methods 0.000 description 1
- 230000005782 double-strand break Effects 0.000 description 1
- 230000010318 early mammalian development Effects 0.000 description 1
- 230000008030 elimination Effects 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 210000002257 embryonic structure Anatomy 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 230000034964 establishment of cell polarity Effects 0.000 description 1
- 210000003527 eukaryotic cell Anatomy 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 239000013604 expression vector Substances 0.000 description 1
- 238000013213 extrapolation Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 239000007850 fluorescent dye Substances 0.000 description 1
- 230000004907 flux Effects 0.000 description 1
- 108020001507 fusion proteins Proteins 0.000 description 1
- 102000037865 fusion proteins Human genes 0.000 description 1
- 238000001502 gel electrophoresis Methods 0.000 description 1
- 108091008053 gene clusters Proteins 0.000 description 1
- 238000011331 genomic analysis Methods 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/415—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01H—NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
- A01H1/00—Processes for modifying genotypes ; Plants characterised by associated natural traits
- A01H1/04—Processes of selection involving genotypic or phenotypic markers; Methods of using phenotypic markers for selection
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01H—NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
- A01H1/00—Processes for modifying genotypes ; Plants characterised by associated natural traits
- A01H1/12—Processes for modifying agronomic input traits, e.g. crop yield
- A01H1/122—Processes for modifying agronomic input traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance
- A01H1/1245—Processes for modifying agronomic input traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance for biotic stress resistance, e.g. pathogen, pest or disease resistance
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/82—Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
- C12N15/8241—Phenotypically and genetically modified plants via recombinant DNA technology
- C12N15/8261—Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield
- C12N15/8271—Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance
- C12N15/8279—Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance for biotic stress resistance, pathogen resistance, disease resistance
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/6895—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/13—Plant traits
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- sequence listing is submitted electronically as an XML-formatted sequence listing file named 9667-US-PSP ST26 created on March 2, 2023, and having a size of 9,668,699 bytes which is filed concurrently with the specification.
- sequence listing comprised in this xml-formatted document is part of the specification and is herein incorporated by reference in its entirety.
- the disclosure relates to disease resistance genes, plant breeding and methods of identifying and selecting disease resistance genes.
- Plant pathogens cause significant crop loss world-wide, and new resistance genes deployed to combat diseases can be overcome quickly. Plant disease resistance gene complements are the result of millions of years of coevolution with pathogens. Resistance genes encode proteins which form a multi-layer defense mechanism that can detect pathogen-associated molecular patterns (PAMPs) or damage-associated molecular patterns (DAMPs) through extracellular pattern recognition receptors (PRRs), as well as small, secreted pathogen effectors, through intracellular nucleotide-binding leucine-rich repeat receptors (NLRs) (Zipfel 2014 Trends Immunol, 35:345-51; Monteiro and Nishimura 2018 Annu Rev Phytopathol, 56:243-267; Jones and Dangl, 2006 Nature, 444:323-9).
- PAMPs pathogen-associated molecular patterns
- DAMPs damage-associated molecular patterns
- NLRs intracellular nucleotide-binding leucine-rich repeat receptors
- PRRs are primarily comprised of trans-membrane domain-containing proteins, in which extracellular domains interact with PAMPs or DAMPs. This interaction can cause a conformational change that initiates a signaling cascade through the action of an intracellular kinase domain (Tang et al. 2017 Plant Cell, 29:618-637). Effectors are excreted by plant pathogens for a variety of purposes, including suppression of plant defense responses that are triggered by PRRs (Irieda et al.
- NLRs have been found to underlie dominant resistance phenotypes in many crop species, including rice, soybean, wheat and maize (Liu et al. 2020 Plant Biotechnol J, 18:1376-1383; Wang et al. 2021 Nat Commun, 12:6263; Saintenac et al. 2013 Science, 341:783-786; Deng et al. 2022 Mol Plant, 15:904-912; Thatcher et al. 2022 Mol Plant Pathol DOI: 10.1111/mpp.13267).
- Maize pathogens cause significant crop loss annually, and thus there is significant interest in identifying new sources of resistance genes (Mueller 2016 Plant Health Progress, 17:12). Maize is thought to have been domesticated during a single event roughly 9,000 years ago, implying that a significant portion of the resistance gene diversity in maize’s wild ancestors may have been lost in modern-day varieties through the initial domestication event and subsequent breeding (Yang et al. 2019 Proc Natl Acad Sci U S A, 116:5643-5652; Matsuoka et al. 2002 Proc Natl Acad Sci USA, 99, 6080-4).
- compositions and methods are based on the discovery disclosed herein of a large number of new maize genes, including genes that have the structural features and expression patterns making them suitable for use as disease resistance genes.
- These disease resistance genes (“R genes”) can provide increased resistance to a disease.
- the compositions and methods disclosed herein are thus useful in selecting disease resistant plants, breeding for disease resistant plants, creating transgenic disease resistant plants, and/or using genome editing to introduce or improve disease resistance in plants.
- plants and methods for making plants having the disclosed markers and/or genes associated with disease resistance that is enhanced as compared to control plants.
- the compositions and methods are useful in selecting disease resistant plants, introgressing disease resistance into plants, creating transgenic disease resistant plants, and/or creating disease resistant genome edited plants.
- the methods for identifying and/or selecting comprise detecting or selecting one or more plant materials having a genomic region comprising a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
- the identified or selected plant may possess plant disease resistance that is newly conferred or enhanced relative to a control plant that does not have a genomic region comprising a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
- methods are provided to identify and/or select plant materials with a QTL containing an R gene or marker allele associated with R gene that can confer increased resistance to plant disease.
- such methods can include obtaining a nucleic acid sample from a plant, seed, tissue or germplasm thereof; and screening the sample for the presence of a QTL containing the R gene or a marker allele associated with the R gene, wherein the R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
- the method can include screening the sample for the presence of a marker allele linked to the R gene, e.g., by 10 cM, 9 cM, 8 cM, 7 cM, 6 cM, 5 cM, 4 cM, 3 cM, 2 cM, 1 cM, 0.9 cM, 0.8 cM, 0.7 cM, 0.6 cM, 0.5 cM, 0.4 cM, 0.3 cM, 0.2 cM, 0.1 cM, or less on a single meiosis-based genetic map, and associated.
- the method can further include detecting one or more R genes or one or more marker alleles linked to R genes, where the one or more R genes (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478, thereby identifying the plant material as comprising a QTL or marker allele associated with increased resistance to plant disease. Additionally, the method can include selecting the plant material identified as comprising one or more R genes or one or more marker alleles linked to R genes.
- the foregoing method of identifying and/or selecting plant materials with a QTL or marker allele associated with increased resistance to plant disease can include obtaining a nucleic acid sample from each of one or more plants, seeds, tissues or germplasm in a population; screening each sample for the presence of one or more R genes, a QTL comprising one or more R genes, or a marker allele associated with the R gene, wherein each R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; and selecting one or more of the plants, seeds, tissues or germplasm having the R gene associated with increased resistance to plant disease.
- the foregoing methods of identifying and/or selecting plant materials with increased resistance to plant disease can include obtaining a nucleic acid sample from one or more plants, seeds, tissues or germplasm, each sample being representative of a plurality (e.g., a population) of plants, seeds, tissues or germplasm; screening each sample for the presence of one or more of the foregoing R genes, a QTL comprising one or more of the foregoing R genes, or a marker allele associated with the foregoing R genes, wherein each R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; and selecting one or more plurality of plants, seeds, tissues or germplasm, wherein the representative sample for the selected plurality has the one or more of the foregoing R genes, a QTL comprising one or more of the foregoing R genes, or
- the foregoing methods of identifying and/or selecting plants can further include crossing at least one of the selected plants comprising an R gene to a second plant that does not have the R gene, thereby producing a progeny plant whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
- the second plant is one of a plant line (a “recurrent parent line”) and the method further includes crossing the progeny plant with another plant of the recurrent parent line to produce a second-generation progeny whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
- the second-generation progeny can be crossed with the recurrent parent line to produce a third-generation progeny whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
- This process can be repeated three, four, five, six, seven, or more times, such that each subsequent generation progeny is crossed with the recurrent parent line, thereby introgressing the R gene into the recurrent parent line.
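The effect of repeated backcrossing on the genetic background can be illustrated with a short calculation. The sketch below is not taken from the disclosure: it uses the standard textbook expectation that, for loci unlinked to the selected R gene and in the absence of background selection, each backcross halves the remaining donor genome. The function name `recurrent_parent_fraction` is hypothetical.

```python
def recurrent_parent_fraction(backcrosses: int) -> float:
    """Expected fraction of the genome derived from the recurrent parent.

    backcrosses = 0 is the initial F1 cross (donor x recurrent parent).
    Assumption: loci are unlinked to the selected R gene and no marker-based
    background selection is applied, so donor content halves each generation.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 1.0 - 0.5 ** (backcrosses + 1)


if __name__ == "__main__":
    for n in range(7):
        label = "F1" if n == 0 else f"BC{n}"
        print(f"{label}: {recurrent_parent_fraction(n):.4f}")
```

In practice the region flanking the introgressed R gene is retained from the donor longer than this expectation suggests (linkage drag), so the values are best read as an upper-level guide rather than a prediction for any specific cross.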
- a plant having the one or more R genes is crossed with a second plant to produce progeny plants.
- the progeny plants are screened for a QTL or marker allele associated with the R gene in accordance with the methods disclosed herein.
- screening includes obtaining a nucleic acid sample from each of the progeny plants and screening the sample for the presence of nucleic acid comprising one or more R genes that
- methods include expressing in a plant material a heterologous nucleic acid capable of increasing plant disease resistance.
- the method can include introducing into the plant material a nucleic acid sequence, e.g., by transgenic modification or genome editing approaches.
- the plant material is susceptible to plant disease prior to introducing the heterologous nucleic acid.
- the genome of a plant is altered by transgenic modification or gene editing to include one or more R genes that (i) have at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
- the transgenic or genome edited plant materials provide increased resistance to a plant disease relative to otherwise isogenic plant materials lacking the R gene introduced by transgenic modification or gene editing.
- the construct comprises a nucleic acid that is heterologous to the plant material, and the heterologous nucleic acid comprises an R gene sequence that (i) comprises at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encodes a polypeptide comprising at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
- the construct is a recombinant construct in which the foregoing R gene sequence is operably linked to at least one heterologous regulatory sequence.
- a method of introducing the foregoing construct comprising an R gene sequence into plant material wherein, for example, the construct comprises the R gene sequence operably linked to its native promoter and the construct is introduced into a heterologous genomic locus that did not comprise the R gene prior to the construct being introduced.
- the foregoing construct is a recombinant expression construct that comprises the R gene operably linked to a heterologous promoter; and the method comprises introducing the recombinant expression construct into the plant material.
- plant materials such as a plant (e.g. a maize plant), plant cell, plant tissue, seed, or germplasm thereof, comprising the isolated construct disclosed herein.
- the methods embodied by the present disclosure relate to a method for transforming a host cell, which can be a plant cell.
- the method comprises transforming the host or plant cell with the isolated construct disclosed herein.
- the method can further include producing a plant by transforming a plant cell with a construct of the present disclosure and regenerating a plant from the transformed plant cell, thereby producing a plant having the R gene disclosed herein.
- the regenerated plant has improved plant disease resistance, as compared to an isogenic plant lacking the R gene.
- compositions and methods relate to plant material modified to include an R gene, the plant material having increased resistance to a plant disease, wherein prior to modification the plant material lacked the R gene.
- the plant material is modified, e.g., by mutagenesis, transgene insertion, or gene editing, to include a nucleotide sequence (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
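The percent-identity thresholds recited above can be made concrete with a small example. The sketch below is illustrative only and does not reproduce any alignment method named in the disclosure: it assumes the two sequences have already been aligned to equal length with `-` as the gap character, and it assumes identity is counted over all alignment columns in which at least one sequence has a residue (other conventions exist). The names `percent_identity` and `highest_threshold_met` are hypothetical.

```python
# Identity thresholds as listed in the text (percent).
THRESHOLDS = (50, 75, 80, 85, 90, 95, 96, 97, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption: columns where both sequences are gapped are ignored; a gap
    opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no informative alignment columns")
    return 100.0 * matches / columns


def highest_threshold_met(identity: float):
    """Return the largest listed threshold that the identity satisfies, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    ident = percent_identity("ACGTACGT", "ACGTACGA")
    print(ident, highest_threshold_met(ident))
```

A production comparison would normally compute the alignment itself with an established tool and state which identity convention is used, since the choice of denominator changes the reported percentage.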
- a method of generating a variant of an R gene by gene shuffling one or more nucleotide sequences comprising an R gene (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
- Variants are then transiently or stably expressed in plant material and tested for whether they provide increased resistance to plant disease.
- one or more variants can be incorporated into construct(s), and the construct(s) can be introduced into a regenerable plant cell; and a plant comprising the variant(s) construct can be regenerated from the plant cell.
- Plants containing the variant(s) can be evaluated for their tolerance/susceptibility to gray leaf spot.
- Plants having a variant that provides increased resistance to plant disease, relative to an isogenic plant that lacks the variant construct can be selected.
- the plant can be maize, or the plant can be Arabidopsis, soybean, sunflower, sorghum, canola, wheat, alfalfa, cotton, rice, barley, millet, sugar cane, or switchgrass.
- the method comprises obtaining a plurality of plants each of which exhibits differing levels of plant disease resistance; screening nucleic acid samples from each plant for the presence of allelic variations in the R gene sequence (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478; evaluating the variations for genetic linkage to altered tolerance/susceptibility to the plant disease; and identifying one or
- the R gene can be associated with increased disease resistance to a plant disease.
- a plant disease includes bacterial leaf blight and stalk rot; bacterial leaf spot; bacterial stripe; chocolate spot; Goss's bacterial wilt and blight; holcus spot; purple leaf sheath; seed rot-seedling blight; bacterial wilt; corn stunt; anthracnose leaf blight; gray leaf spot; aspergillus ear and kernel rot; banded leaf and sheath spot; black bundle disease; black kernel rot; borde blanco; brown spot; black spot; stalk rot; cephalosporium kernel rot; charcoal rot; corticium ear rot; curvularia leaf spot; didymella leaf spot; diplodia ear rot and stalk rot; seed rot; corn seedling blight; diplodia leaf spot or leaf streak; downy mildews; brown strip
- Fig. 1 is a bar chart showing the distribution of different domain architectures involving canonical NLR domains in maize; abundance of indicated domain architectures are shown for the sum of all 26 maize NAM lines.
- CC coiled-coil
- NB NB-ARC
- LRR Leucine-rich repeat
- TIR Toll/interleukin 1 receptor
- RPW8 RPW8-like coiled-coil.
- Fig. 2 is a bar chart showing the distribution of integrated domains; atypical integrated domains identified via HMMer searches of NLR proteins are shown for all 26 NAM genomes.
- Fig. 3 is a graph showing the average Shannon Entropy across different NLR features of a composite constructed by averaging the entropy of all 158 NLR clusters.
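For readers unfamiliar with the metric plotted in Fig. 3, Shannon entropy at an alignment position measures how variable the residues are at that position. The following is a minimal sketch, not the analysis pipeline used for the figure: the logarithm base (2, giving bits), the exclusion of gap characters, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring '-' gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


def alignment_entropy(aligned_seqs):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]


if __name__ == "__main__":
    cluster = ["MKLV", "MKIV", "MRLV", "MK-V"]
    print(alignment_entropy(cluster))
```

A fully conserved column scores 0 and a column split evenly between two residues scores 1 bit, so averaging such values over the positions of a domain gives a single variability score of the kind compared across NLR features.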
- a gene or allele is “associated with” a trait when it is part of or linked to a DNA sequence or allele that affects the expression of the trait.
- the presence of the allele is an indicator of how the trait will be expressed.
- As used herein, “disease resistant”, “increased plant disease resistance”, “increased resistance to plant disease”, “plant disease resistance” and the like refer to a plant showing increased resistance to a disease compared to a control plant, e.g., a control plant can be one that lacks the QTL or R gene that provides disease resistance but is otherwise isogenic to the disease resistant plant. Disease resistance may manifest in fewer and/or smaller lesions, increased plant health, increased yield, increased root mass, increased plant vigor, less or no discoloration, increased growth, reduced necrotic area, or reduced wilting.
- an R gene or variant disclosed herein may show resistance to one or more diseases.
- a plant having disease resistance may have 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% increased resistance to a disease compared to a control plant.
- a plant may have 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% increased plant health in the presence of a disease compared to a control plant.
- chromosomal interval designates a contiguous linear span of genomic DNA that resides in planta on a single chromosome.
- the genetic elements or genes located on a single chromosomal interval are physically linked.
- the size of a chromosomal interval is not particularly limited.
- the genetic elements located within a single chromosomal interval are genetically linked, typically with a genetic recombination distance of, for example, less than or equal to 20 cM, or alternatively, less than or equal to 10 cM. That is, two genetic elements within a single chromosomal interval undergo recombination at a frequency of less than or equal to 20% or less than or equal to 10%.
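The passage above treats map distance in centimorgans and recombination frequency as interchangeable (20 cM corresponds to 20%), which is the usual approximation for short distances. The sketch below compares that linear approximation with Haldane's mapping function, a standard genetics formula that is not mentioned in the disclosure and is introduced here only for illustration; the function names are hypothetical.

```python
import math


def recomb_fraction_linear(cm: float) -> float:
    """Linear approximation used in the text: 1 cM ~ 1% recombination (capped at 50%)."""
    if cm < 0:
        raise ValueError("map distance must be non-negative")
    return min(cm / 100.0, 0.5)


def recomb_fraction_haldane(cm: float) -> float:
    """Haldane's mapping function (assumes no crossover interference)."""
    if cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))


if __name__ == "__main__":
    for distance in (1, 5, 10, 20, 50):
        print(
            f"{distance:>3} cM  linear={recomb_fraction_linear(distance):.4f}  "
            f"haldane={recomb_fraction_haldane(distance):.4f}"
        )
```

The two estimates nearly coincide below about 10 cM and diverge as distance grows, which is why the simple equivalence is reasonable for the 10 to 20 cM intervals discussed here.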
- crossed refers to a sexual cross and involves the fusion of two haploid gametes via pollination to produce diploid progeny (e.g., cells, seeds or plants).
- the term encompasses both the pollination of one plant by another and selfing (or self-pollination, e.g., when the pollen and ovule are from the same plant).
- An “elite line” is any line that has resulted from breeding and selection for superior agronomic performance.
- a “favorable allele” is the allele at a particular locus (a marker, a QTL, a gene etc.) that confers, or contributes to, an agronomically desirable phenotype, e.g., disease resistance, and that allows the identification of plants with that agronomically desirable phenotype.
- a favorable allele of a marker is a marker allele that segregates with the favorable phenotype.
- Genetic markers are nucleic acids that are polymorphic in a population and whose alleles can be detected and distinguished by one or more analytic methods, e.g., RFLP, AFLP, isozyme, SNP, SSR, and the like.
- the term also refers to nucleic acid sequences complementary to the genomic sequences, such as nucleic acids used as probes. Markers corresponding to genetic polymorphisms between members of a population can be detected by methods well-established in the art.
- such methods include, e.g., PCR-based sequence specific amplification methods, detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), detection of amplified variable sequences of the plant genome, detection of self-sustained sequence replication, detection of simple sequence repeats (SSRs), detection of single nucleotide polymorphisms (SNPs), or detection of amplified fragment length polymorphisms (AFLPs).
- ESTs expressed sequence tags
- SSR markers derived from EST sequences and randomly amplified polymorphic DNA (RAPD).
- germplasm refers to genetic material of or from an individual (e.g., a plant), a group of individuals (e.g., a plant line, variety or family), or a clone derived from a line, variety, species, or culture, or more generally, all individuals within a species or for several species (e.g., maize germplasm collection or Andean germplasm collection).
- the germplasm can be part of an organism, cell, or can be separate from the organism or cell.
- germplasm provides genetic material with a specific molecular makeup that provides a physical foundation for some or all of the hereditary qualities of an organism or cell culture.
- germplasm includes cells, seed or tissues from which new plants may be grown, or plant parts, such as leaves, stems, pollen, or cells, that can be cultured into a whole plant.
- a “haplotype” is the genotype of an individual at a plurality of genetic loci, i.e. a combination of alleles. Typically, the genetic loci described by a haplotype are physically and genetically linked, i.e., on the same chromosome segment.
- heterogeneity is used to indicate that individuals within the group differ in genotype at one or more specific loci.
- heterosis can be defined by performance which exceeds the average of the parents (or high parent) when crossed to other dissimilar or unrelated groups.
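The two reference points named in this definition (the parental average and the high parent) correspond to the conventional mid-parent and high-parent heterosis measures. The sketch below is an illustrative calculation only; the formulas are the standard percentage forms rather than anything specified in the disclosure, the function names are hypothetical, and it assumes a trait for which larger values are better.

```python
def mid_parent_heterosis(f1: float, parent1: float, parent2: float) -> float:
    """Percent by which the hybrid exceeds the mean of its two parents."""
    mid_parent = (parent1 + parent2) / 2.0
    if mid_parent == 0:
        raise ValueError("mid-parent value must be non-zero")
    return 100.0 * (f1 - mid_parent) / mid_parent


def high_parent_heterosis(f1: float, parent1: float, parent2: float) -> float:
    """Percent by which the hybrid exceeds the better-performing parent."""
    high_parent = max(parent1, parent2)
    if high_parent == 0:
        raise ValueError("high-parent value must be non-zero")
    return 100.0 * (f1 - high_parent) / high_parent


if __name__ == "__main__":
    # Hypothetical yields in arbitrary units, for illustration only.
    print(mid_parent_heterosis(12.0, 8.0, 10.0))
    print(high_parent_heterosis(12.0, 8.0, 10.0))
```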
- a “heterotic group” comprises a set of genotypes that perform well when crossed with genotypes from a different heterotic group (Hallauer et al. (1998) Corn breeding, p. 463-564. In G.F. Sprague and J.W. Dudley (ed.) Corn and corn improvement). Inbred lines are classified into heterotic groups, and are further subdivided into families within a heterotic group, based on several criteria such as pedigree, molecular marker-based associations, and performance in hybrid combinations (Smith et al. (1990) Theor. Appl. Gen. 80:833-840).
- Iowa Stiff Stalk Synthetic also referred to herein as “stiff stalk”
- Lancaster or “Lancaster Sure Crop” (sometimes referred to as NSS, or non-Stiff Stalk).
- BSSS Stiff Stalk Synthetic population
- NSS Non-Stiff Stalk.
- This group includes several major heterotic groups such as Lancaster Surecrop, Iodent, and Leaming Corn.
- homogeneity indicates that members of a group have the same genotype at one or more specific loci.
- hybrid refers to the progeny obtained between the crossing of at least two genetically dissimilar parents.
- inbred refers to a line that has been bred for genetic homogeneity.
- introgression refers to the transmission of a desired allele of a genetic locus from one genetic background to another.
- introgression of a desired R gene allele at a specified locus can be transmitted to at least one progeny via a sexual cross between two parents of the same species, where at least one of the parents has the desired allele in its genome.
- transmission of an allele can occur by recombination between two donor genomes, e.g., in a fused protoplast, where at least one of the donor protoplasts has the desired allele in its genome.
- the desired allele can be, e.g., detected by a marker that is associated with a phenotype, at a QTL, a transgene, or the like.
- Offspring comprising the desired allele may be repeatedly backcrossed to a line having a desired genetic background and selected for the desired allele, to result in the allele becoming fixed in a selected genetic background.
- a “line” or “strain” is a group of individuals of identical parentage that are generally inbred to some degree and that are generally homozygous and homogeneous at most loci (isogenic or near isogenic).
- a “subline” refers to an inbred subset of descendants that are genetically distinct from other similarly inbred subsets descended from the same progenitor.
- the term “linked” or “linkage” is used to describe the degree with which one marker locus is associated with another marker locus or some other locus.
- the linkage relationship between a molecular marker and a locus affecting a phenotype is given as a “probability” or “adjusted probability”.
- Linkage can be expressed as a desired limit or range.
- any marker is linked (genetically and physically) to any other marker when the markers are separated by less than 50, 40, 30, 25, 20, or 15 map units (or cM) of a single meiosis map (a genetic map based on a population that has undergone one round of meiosis, such as e.g.
- bracketed range of linkage for example, between 10 and 20 cM, between 10 and 30 cM, or between 10 and 40 cM.
- the phrase “closely linked”, in the present application, means that recombination between two linked loci occurs with a frequency of equal to or less than about 10% (i.e., are separated on a genetic map by not more than 10 cM). Put another way, the closely linked loci co-segregate at least 90% of the time.
- Marker loci are especially useful with respect to the subject matter of the current disclosure when they demonstrate a significant probability of co-segregation (linkage) with a desired trait (e.g., increased resistance to plant disease).
- “closely linked” loci such as a marker locus and a second locus can display an inter-locus recombination frequency of 10% or less, preferably about 9% or less, still more preferably about 8% or less, yet more preferably about 7% or less, still more preferably about 6% or less, yet more preferably about 5% or less, still more preferably about 4% or less, yet more preferably about 3% or less, and still more preferably about 2% or less.
- the relevant loci display a recombination frequency of about 1% or less, e.g., about 0.75% or less, more preferably about 0.5% or less, or yet more preferably about 0.25% or less.
- Two loci that are localized to the same chromosome, and at such a distance that recombination between the two loci occurs at a frequency of less than 10% (e.g., about 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.75%, 0.5%, 0.25%, or less) are also said to be “in proximity to” each other.
- any marker is closely linked (genetically and physically) to any other marker that is in close proximity, e.g., at or less than 10 cM distant.
- Two closely linked markers on the same chromosome can be positioned 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.75, 0.5 or 0.25 cM or less from each other.
- two different markers can have the same genetic map coordinates. In that case, the two markers are in such close proximity to each other that recombination occurs between them with such low frequency that it is undetectable.
- linkage disequilibrium refers to a non-random segregation of genetic loci or traits (or both). In either case, linkage disequilibrium implies that the relevant loci are within sufficient physical proximity along a length of a chromosome so that they segregate together with greater than random (i.e., non-random) frequency. Markers that show linkage disequilibrium are considered linked. Linked loci co-segregate more than 50% of the time, e.g., from about 51% to about 100% of the time.
- linkage can be between two markers, or alternatively between a marker and a locus affecting a phenotype.
- a marker locus can be “associated with” (linked to) a trait. The degree of linkage of a marker locus and a locus affecting a phenotypic trait is measured, e.g., as a statistical probability of co-segregation of that molecular marker with the phenotype (e.g., an F statistic or LOD score).
- Linkage disequilibrium is most commonly assessed using the measure r2, which is calculated using the formula described by Hill, W.G. and Robertson, A, Theor. Appl. Genet. 38:226-231(1968).
- when r2 = 1, complete LD exists between the two marker loci, meaning that the markers have not been separated by recombination and have the same allele frequency.
- the r2 value will be dependent on the population used. Values for r2 above 1/3 indicate sufficiently strong LD to be useful for mapping (Ardlie et al. 2002 Nature Reviews Genetics 3:299-309).
- alleles are in linkage disequilibrium when r2 values between pairwise marker loci are greater than or equal to 0.33, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0.
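As a worked illustration of the r2 measure discussed above, the following Python sketch computes r2 for two biallelic loci from haplotype and allele frequencies, using the standard definitions D = p_AB - p_A * p_B and r2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The function name and the example frequencies are hypothetical and are not taken from this disclosure.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A (locus 1) and allele B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic (0 < p < 1)")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom


# Hypothetical frequencies, for illustration only.
complete = ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3)      # A always travels with B
independent = ld_r2(p_ab=0.15, p_a=0.3, p_b=0.5)  # p_ab equals p_a * p_b
```

In the first case the two alleles always occur together and have the same frequency, so r2 is 1 (complete LD); in the second the haplotype frequency equals the product of the allele frequencies, so r2 is 0 (linkage equilibrium).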
- linkage equilibrium describes a situation where two markers independently segregate, i.e., sort among progeny randomly. Markers that show linkage equilibrium are considered unlinked (whether or not they lie on the same chromosome).
- a “locus” is a position on a chromosome, e.g. where a nucleotide, gene, sequence, or marker is located.
- the “logarithm of odds (LOD) value” or “LOD score” (Risch, 1992 Science 255(5046):803-804) is used in genetic interval mapping to describe the degree of linkage between two marker loci.
- a LOD score of three between two markers indicates that linkage is 1000 times more likely than no linkage, while a LOD score of two indicates that linkage is 100 times more likely than no linkage.
- LOD scores greater than or equal to two may be used to detect linkage.
- LOD scores can also be used to show the strength of association between marker loci and quantitative traits in “quantitative trait loci” mapping. In this case, the LOD score’s size is dependent on the closeness of the marker locus to the locus affecting the quantitative trait, as well as the size of the quantitative trait effect.
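The relationship between a LOD score and the odds of linkage can be sketched as follows. This is a minimal Python example assuming the simplest textbook case of phase-known meioses, where the likelihood of observing k recombinants among n meioses at recombination fraction theta is compared with the likelihood under no linkage (theta = 0.5). The function names and the counts used are hypothetical.

```python
import math


def lod_score(recombinants: int, total: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known meioses (simplifying assumption)."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = total - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = total * math.log10(0.5)  # no linkage: theta = 0.5
    return log_l_theta - log_l_null


def odds_in_favor_of_linkage(lod: float) -> float:
    """Convert a LOD score back to an odds ratio (LOD 3 corresponds to 1000:1)."""
    return 10.0 ** lod


# Hypothetical data: 1 recombinant among 20 meioses, evaluated at theta = 0.05.
example_lod = lod_score(recombinants=1, total=20, theta=0.05)
```

A LOD of 3 maps back to odds of 1000 to 1 and a LOD of 2 to odds of 100 to 1, matching the interpretation given above.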
- plant material includes whole plants, plant cells, plant protoplast, plant cell or tissue culture from which plants can be regenerated, plant calli, plant clumps and plant cells that are intact in plants, or parts of plants, such as seeds, flowers, cotyledons, leaves, stems, buds, roots, root tips and the like.
- a “modified plant” means any plant that has a genetic change due to human intervention.
- a modified plant may have genetic changes introduced through plant transformation, genome editing, mutagenesis, or conventional plant breeding.
- a “marker” is a means of finding a position on a genetic or physical map, or else linkages among markers and trait loci (loci affecting traits).
- the position that the marker detects may be known via detection of polymorphic alleles and their genetic mapping, or else by hybridization, sequence match or amplification of a sequence that has been physically mapped.
- a marker can be a DNA marker (detects DNA polymorphisms), a protein (detects variation at an encoded polypeptide), or a simply inherited phenotype (such as the ‘waxy’ phenotype).
- a DNA marker can be developed from genomic nucleotide sequence or from expressed nucleotide sequences (e.g., from a spliced RNA or a cDNA). Depending on the DNA marker technology, the marker may consist of primers complementary to sequence flanking the locus and/or probes that hybridize to polymorphic alleles at the locus.
- a DNA marker, or a genetic marker may also be used to describe the gene, DNA sequence or nucleotide on the chromosome itself (rather than the components used to detect the gene or DNA sequence) and is often used when that DNA marker is associated with a particular trait in human genetics (e.g. a marker for breast cancer).
- the term marker locus is the locus (gene, sequence or nucleotide) that the marker detects.
- Markers can be defined by the type of polymorphism that they detect and also the marker technology used to detect the polymorphism. Marker types include but are not limited to, e.g., detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLPs), detection of simple sequence repeats (SSRs), detection of amplified variable sequences of the plant genome, detection of self-sustained sequence replication, or detection of single nucleotide polymorphisms (SNPs). SNPs can be detected e.g. via DNA sequencing, PCR-based sequence specific amplification methods, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), dynamic allele-specific hybridization (DASH), molecular beacons, microarray hybridization, oligonucleotide ligase assays, Flap endonucleases, 5’ endonucleases, primer extension, single strand conformation polymorphism (SSCP) or temperature gradient gel electrophoresis (TGGE).
- DNA sequencing, such as pyrosequencing technology, has the advantage of being able to detect a series of linked SNP alleles that constitute a haplotype. Haplotypes tend to be more informative (detect a higher level of polymorphism) than SNPs.
- Marker assisted selection (or MAS) is a process by which individual plants are selected based on marker genotypes.
- Marker assisted counter-selection is a process by which marker genotypes are used to identify plants that will not be selected, allowing them to be removed from a breeding program or planting.
- a “marker haplotype” refers to a combination of alleles at a marker locus.
- molecular marker may be used to refer to a genetic marker, as defined above, or an encoded product thereof (e.g., a protein) used as a point of reference when identifying a linked locus.
- a molecular marker can be derived from genomic nucleotide sequences or from expressed nucleotide sequences (e.g., from a spliced RNA, a cDNA, etc.), or from an encoded polypeptide.
- the term also refers to nucleic acid sequences complementary to or flanking the marker sequences, such as nucleic acids used as probes or primer pairs capable of amplifying the marker sequence.
- a “molecular marker probe” is a nucleic acid sequence or molecule that can be used to identify the presence of a marker locus, e.g., a nucleic acid probe that is complementary to a marker locus sequence.
- a marker probe refers to a probe of any type that is able to distinguish (i.e., genotype) the particular allele that is present at a marker locus.
- Nucleic acids are “complementary” when they specifically hybridize in solution. Some of the markers described herein are also referred to as hybridization markers when located on an indel region, such as the non-collinear region described herein.
- the insertion region is, by definition, a polymorphism vis a vis a plant without the insertion.
- the marker need only indicate whether the indel region is present or absent. Any suitable marker detection technology may be used to identify such a hybridization marker, e.g. SNP technology is used in the examples provided herein.
- phenotype can refer to the observable expression of a gene or series of genes.
- the phenotype can be observable to the naked eye, or by any other means of evaluation, e.g., weighing, counting, measuring (length, width, angles, etc.), microscopy, biochemical analysis, or an electromechanical assay.
- a phenotype is directly controlled by a single gene or genetic locus, i.e., a “single gene trait” or a “simply inherited trait”.
- single gene traits can segregate in a population to give a “qualitative” or “discrete” distribution, i.e. the phenotype falls into discrete classes.
- a phenotype is the result of several genes and can be considered a “multigenic trait” or a “complex trait”.
- Multigenic traits segregate in a population to give a “quantitative” or “continuous” distribution, i.e. the phenotype cannot be separated into discrete classes. Both single gene and multigenic traits can be affected by the environment in which they are being expressed, but multigenic traits tend to have a larger environmental component.
- a “polymorphism” is a variation in the DNA between two or more individuals within a population.
- a polymorphism preferably has a frequency of at least 1% in a population.
- a useful polymorphism can include a single nucleotide polymorphism (SNP), a simple sequence repeat (SSR), or an insertion/deletion polymorphism, also referred to herein as an “indel”.
- QTL quantitative trait locus
- a “reference sequence” or a “consensus sequence” is a defined sequence used as a basis for sequence comparison.
- the reference sequence for a marker is obtained by sequencing a number of lines at the locus, aligning the nucleotide sequences in a sequence alignment program (e.g. Sequencher), and then obtaining the most common nucleotide sequence of the alignment. Polymorphisms found among the individual sequences are annotated within the consensus sequence.
- a reference sequence is not usually an exact copy of any individual DNA sequence, but represents an amalgam of available sequences and is useful for designing primers and probes to polymorphisms within the sequence.
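The consensus-building step described above (align sequences from several lines, take the most common nucleotide at each position, and annotate the polymorphic positions) can be sketched in Python as follows. This is only an illustration under the assumption that the sequences are already aligned to equal length; the function name and the toy sequences are hypothetical, and ties are resolved in favor of the nucleotide seen first.

```python
from collections import Counter


def consensus(aligned):
    """Return (consensus string, list of 0-based polymorphic column indices)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    cons = []
    polymorphic = []
    for i, column in enumerate(zip(*aligned)):
        counts = Counter(column)
        cons.append(counts.most_common(1)[0][0])  # most common nucleotide
        if len(counts) > 1:                       # more than one allele seen
            polymorphic.append(i)
    return "".join(cons), polymorphic


# Toy aligned sequences from four hypothetical lines.
lines = ["ACGTAC", "ACGTAC", "ACATAC", "ACGTCC"]
cons_seq, sites = consensus(lines)
```

Here the consensus is "ACGTAC" with polymorphic positions 2 and 4. The consensus happens to match two of the toy lines, but in general it need not match any single input sequence.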
- An “unfavorable allele” of a marker is a marker allele that segregates with the unfavorable plant phenotype, therefore providing the benefit of identifying plants that can be removed from a breeding program or planting.
- yield refers to the productivity per unit area of a particular plant product of commercial value. Yield is affected by both genetic and environmental factors. “Agronomics,” “agronomic traits,” and “agronomic performance” refer to the traits (and underlying genetic elements) of a given plant variety that contribute to yield over the course of a growing season. Individual agronomic traits include emergence vigor, vegetative vigor, stress tolerance, disease resistance or tolerance, herbicide resistance, branching, flowering, seed set, seed size, seed density, standability, threshability and the like. Yield is, therefore, the final culmination of all agronomic traits.
- [0062] NLR-genes.
- the NBS-LRR (“NLR”) group of R genes is the largest class of R genes discovered to date. In Arabidopsis thaliana, over 150 are predicted to be present in the genome (Meyers et al. 2003 Plant Cell, 15:809-834; Monosi et al. 2004 Theoretical and Applied Genetics, 109:1434-1447), while in rice, approximately 500 NLR genes have been predicted (Monosi 2004, supra).
- the NBS-LRR class of R genes comprises two subclasses. Class 1 NLR genes contain a TIR (Toll/Interleukin-1-like) domain at their N terminus and to date have only been found in dicots (Meyers 2003, supra; Monosi 2004, supra).
- the second class of NBS-LRR genes contains either a coiled-coil domain or an (nt) domain at the N terminus (Bai et al. 2002 Genome Research, 12:1871-1884; Monosi 2004, supra; Pan et al. 2000 Journal of Molecular Evolution, 50:203-213). Class 2 NBS-LRR genes have been found in both dicot and monocot species (Bai 2002, supra; Meyers 2003, supra; Monosi 2004, supra; Pan 2000, supra).
- the NBS domain of the gene appears to have a role in signaling in plant defense mechanisms (van der Biezen et al. 1998, Current Biology: CB, 8:R226-R227).
- the LRR region appears to be the region that interacts with the pathogen AVR products (Michelmore et al. 1998 Genome Res, 8:1113-1130; Meyers 2003, supra).
- This LRR region, in comparison with the NB-ARC (NBS) domain, is under a much greater selection pressure to diversify (Michelmore 1998, supra; Meyers 2003, supra; Palomino et al. 2002, Genome Research, 12:1305-1315).
- LRR domains are found in other contexts as well; these 20-29-residue motifs are present in tandem arrays in a number of proteins with diverse functions, such as hormone-receptor interactions, enzyme inhibition, cell adhesion and cellular trafficking.
- NLRs typically comprise a nucleotide-binding domain, a series of leucine-rich repeats, and an N-terminal region which may include a coiled-coil (CC), Toll/Interleukin-1 (TIR) or resistance to powdery mildew 8 (RPW8) domain (Shao et al. 2016 Plant Physiol, 170:2095-109).
- CC coiled-coil
- TIR Toll/Interleukin-1
- RPW8 resistance to powdery mildew 8
- NLRs may arise via rare recombination events which result in domains with high similarity to effector targets being integrated into NLR genes, which then detect the presence of effectors through direct interaction (Grund et al. 2019 Plant Physiol 179:1227-1235).
- NLRs can detect the presence of pathogen effectors through (i) direct interaction of an effector with canonical NLR domains, (ii) direct interaction of an effector with an integrated domain that mimics the effector’s host target or (iii) interaction with a host gene targeted by an effector (guardee) to detect alteration of its normal state by the pathogen (van der Hoorn and Kamoun, 2008 Plant Cell 20:2009-17; Cesari et al.
- helper NLRs typically contain all canonical NLR domains but do not function in direct detection of pathogen effectors; instead, they act to transmit the signal of a “sensor” NLR (Wu et al. 2017 Proc Natl Acad Sci USA 114:8113-8118). Helper NLRs have been found in a variety of plant species, and can be specific to a single sensor NLR, or interact with a wide variety of sensors (Saile et al. 2020 PLoS Biol 18:e3000783).
- NLR complements of a variety of plant species have been identified, including a nearly comprehensive set of Arabidopsis and rice NLR complements (Van de Weyer et al. 2019 Cell 178:1260-1272.e14; Shang et al. 2022 Cell Res, 32:878-896).
- NLR presence/absence variation (PAV) largely takes place through the expansion and contraction of large physically compact clusters of NLRs in a few locations within the genome (Meyers et al. 2003 Plant Cell, 15:809-34; Jacob et al. 2013 Front Immunol, 4:297). These regions are thought to represent evolutionary hotspots, where different NLRs may rapidly recombine to generate new sequence diversity (van Wersch and Li 2019 Trends Plant Sci, 24:688-699).
- NLRs are subject to differential evolutionary pressures (Meyers et al. 1998 Plant Cell, 10: 1833-46).
- NB-ARC domains, which are functional ATPases that control the activation states of NLRs, typically show high conservation, while LRRs and coiled-coil domains have much higher amino acid diversity (Qi et al. 2012 Plant Physiol, 158:1819-32).
- a common measure of linkage is the frequency with which traits cosegregate. This can be expressed as a percentage of cosegregation (recombination frequency) or in centiMorgans (cM).
- the cM is a unit of measure of genetic recombination frequency.
- One cM is equal to a 1% chance that a trait at one genetic locus will be separated from a trait at another locus due to crossing over in a single generation (meaning the traits segregate together 99% of the time). Because chromosomal distance is approximately proportional to the frequency of crossing over events between traits, there is an approximate physical distance that correlates with recombination frequency.
- Marker loci are themselves traits and can be assessed according to standard linkage analysis by tracking the marker loci during segregation. Thus, one cM is equal to a 1% chance that a marker locus will be separated from another locus, due to crossing over in a single generation.
- the closer a marker is to a gene (e.g., an R gene disclosed herein which (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739, (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478) controlling a trait of interest, the more effective and advantageous that marker is as an indicator for the desired trait.
- Closely linked loci display an inter-locus cross-over frequency of about 10% or less, preferably about 9% or less, still more preferably about 8% or less, yet more preferably about 7% or less, still more preferably about 6% or less, yet more preferably about 5% or less, still more preferably about 4% or less, yet more preferably about 3% or less, and still more preferably about 2% or less.
- the relevant loci (e.g., a marker locus and a target locus) are about 10 cM, 9 cM, 8 cM, 7 cM, 6 cM, 5 cM, 4 cM, 3 cM, 2 cM, 1 cM, 0.75 cM, 0.5 cM or 0.25 cM or less apart.
- two loci that are localized to the same chromosome, and at such a distance that recombination between the two loci occurs at a frequency of less than 10% (e.g., about 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.75%, 0.5%, 0.25%, or less) are said to be “proximal to” each other.
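The thresholds above can be expressed as a small Python sketch. It assumes the simple approximation stated in this section, that 1 cM corresponds to a 1% recombination frequency, which holds only for short distances because no mapping function is applied. All function names are hypothetical.

```python
def cm_to_recombination_frequency(cm: float) -> float:
    """Approximate recombination frequency for a short map distance (1 cM ~ 1%)."""
    return cm / 100.0


def cosegregation_frequency(recombination_frequency: float) -> float:
    """Fraction of meioses in which the two loci are inherited together."""
    return 1.0 - recombination_frequency


def is_closely_linked(recombination_frequency: float, limit: float = 0.10) -> bool:
    """Closely linked: recombination frequency at or below the limit (default 10%)."""
    return recombination_frequency <= limit


# A marker 1 cM from a target locus co-segregates with it about 99% of the time.
rf = cm_to_recombination_frequency(1.0)
coseg = cosegregation_frequency(rf)
```

The `limit` parameter can be tightened (for example to 0.01) to reflect the narrower ranges listed above.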
- the marker locus is not necessarily responsible for the expression of the disease resistance phenotype; it is not a requirement that the marker polynucleotide sequence be part of a gene that is responsible for the disease resistant phenotype (for example, is part of the gene open reading frame).
- the association between a specific marker allele and the disease resistance trait is due to the original “coupling” linkage phase between the marker allele and the allele in the ancestral line from which the allele originated. Eventually, with repeated recombination, crossing over events between the marker and genetic locus can change this orientation.
- the favorable marker allele may change depending on the linkage phase that exists within the parent having resistance to the disease that is used to create segregating populations. This does not change the fact that the marker can be used to monitor segregation of the phenotype. It only changes which marker allele is considered favorable in a given segregating population.
- Marker assisted selection
- Molecular markers can be used in a variety of plant breeding applications (e.g. see Staub et al. 1996 Hortscience 31:729-741; Tanksley 1983 Plant Molecular Biology Reporter 1:3-8).
- One of the main areas of interest is to increase the efficiency of backcrossing and introgressing genes using marker-assisted selection (MAS).
- MAS marker-assisted selection
- a molecular marker that demonstrates linkage with a locus affecting a desired phenotypic trait provides a useful tool for the selection of the trait in a plant population. This is particularly true where the phenotype is hard to assay.
- since DNA marker assays are less laborious and take up less physical space than field phenotyping, much larger populations can be assayed, increasing the chances of finding a recombinant with the target segment from the donor line moved to the recipient line.
- a marker is located within the gene itself, so that recombination cannot occur between the marker and the gene.
- the methods disclosed herein produce a marker in a disease resistance gene, wherein the gene was identified by inferring genomic location from clustering of conserved domains or a clustering analysis.
- When a gene is introgressed by MAS, it is not only the gene that is introduced but also the flanking regions (Gepts 2002 Crop Sci 42:1780-1790). This is referred to as “linkage drag.” In the case where the donor plant is highly unrelated to the recipient plant, these flanking regions carry additional genes that may code for agronomically undesirable traits. Linkage drag may also result in reduced yield or other negative agronomic characteristics even after multiple cycles of backcrossing into the elite line.
- the size of the flanking region can be decreased by additional backcrossing, although this is not always successful, as breeders do not have control over the size of the region or the recombination breakpoints (Young et al. 1998 Genetics 120:579-585). In classical breeding it is usually only by chance that recombinations are selected that contribute to a reduction in the size of the donor segment (Tanksley et al. 1989 Biotechnology 7:257-264). Even after 20 backcrosses of this type, one may expect to find a sizeable piece of the donor chromosome still linked to the gene being selected.
- with markers, however, it is possible to select those rare individuals that have experienced recombination near the gene of interest.
- in 150 backcross plants, there is a 95% chance that at least one plant will have experienced a crossover within 1 cM of the gene, based on a single meiosis map distance. Markers will allow unequivocal identification of those individuals.
- With one additional backcross of 300 plants there would be a 95% chance of a crossover within 1 cM single meiosis map distance of the other side of the gene, generating a segment around the target gene of less than 2 cM based on a single meiosis map distance. This can be accomplished in two generations with markers, while it would have required on average 100 generations without markers (See Tanksley et al., supra).
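The 95% figures quoted above can be reproduced with a simple binomial calculation. The Python sketch below is one reading that is consistent with the numbers, not a derivation stated in the source. It assumes that each backcross plant independently has about a 2% chance of a crossover within 1 cM on either side of the gene (first backcross, 150 plants) and about a 1% chance of a crossover within 1 cM on the one remaining side (second backcross, 300 plants). The function name and these per-plant probabilities are assumptions made for illustration.

```python
def prob_at_least_one_crossover(n_plants: int, per_plant_probability: float) -> float:
    """P(at least one of n independent plants carries the desired crossover)."""
    return 1.0 - (1.0 - per_plant_probability) ** n_plants


# First backcross: crossover within 1 cM on either side (assumed ~2% per plant).
first_backcross = prob_at_least_one_crossover(150, 0.02)

# Second backcross: crossover within 1 cM on the other side (assumed ~1% per plant).
second_backcross = prob_at_least_one_crossover(300, 0.01)
```

Under these assumptions both probabilities come out at roughly 0.95, which is why the second step needs about twice as many plants as the first.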
- flanking markers surrounding the gene can be utilized to select for recombinations in different population sizes. For example, in smaller population sizes, recombinations may be expected further away from the gene, so more distal flanking markers would be required to detect the recombination.
- the key components to the implementation of MAS are: (i) defining the population within which the marker-trait association will be determined, which can be a segregating population, or a random or structured population; (ii) monitoring the segregation or association of polymorphic markers relative to the trait, and determining linkage or association using statistical methods; (iii) defining a set of desirable markers based on the results of the statistical analysis, and (iv) the use and/or extrapolation of this information to the current set of breeding germplasm to enable marker-based selection decisions to be made.
- the markers described in this disclosure, as well as other marker types such as SSRs and FLPs, can be used in marker assisted selection protocols.
- SSRs can be defined as relatively short runs of tandemly repeated DNA with lengths of 6 bp or less (Tautz 1989 Nucleic Acid Research 17: 6463-6471; Wang et al. 1994 Theoretical and Applied Genetics, 88: 1-6). Polymorphisms arise due to variation in the number of repeat units, probably caused by slippage during DNA replication (Levinson and Gutman 1987 Mol Biol Evol 4: 203-221). The variation in repeat length may be detected by designing PCR primers to the conserved non-repetitive flanking regions (Weber and May 1989 Am J Hum Genet. 44:388-396).
- SSRs are highly suited to mapping and MAS as they are multi-allelic, codominant, reproducible and amenable to high throughput automation (Rafalski et al. 1996 Generating and using DNA markers in plants. In: Non-mammalian genomic analysis: a practical guide. Academic press, pp 75-135).
- SSR markers can be generated, and SSR profiles can be obtained by gel electrophoresis of the amplification products. Scoring of marker genotype is based on the size of the amplified fragment.
- FLP markers can also be generated. Most commonly, amplification primers are used to generate fragment length polymorphisms. Such FLP markers are in many ways similar to SSR markers, except that the region amplified by the primers is not typically a highly repetitive region.
- the amplified region, or amplicon, will have sufficient variability among germplasm, often due to insertions or deletions, such that the fragments generated by the amplification primers can be distinguished among polymorphic individuals, and such indels are known to occur frequently in maize (Bhattramakki et al. 2002 Plant Mol Biol 48, 539-547; Rafalski 2002b, supra).
- SNP markers detect single base pair nucleotide substitutions. Of all the molecular marker types, SNPs are the most abundant, thus having the potential to provide the highest genetic map resolution (Bhattramakki et al. 2002 Plant Molecular Biology 48:539-547). SNPs can be assayed at an even higher level of throughput than SSRs, in a so-called 'ultra-high-throughput' fashion, as SNPs do not require large amounts of DNA and automation of the assay may be straight-forward. SNPs also have the promise of being relatively low-cost systems. These three factors together make SNPs highly attractive for use in MAS.
- a number of SNPs together within a sequence, or across linked sequences, can be used to describe a haplotype for any particular genotype (Ching et al. 2002, BMC Genet. 3:19; Gupta et al. 2001, Rafalski 2002b, Plant Science 162:329-333).
- Haplotypes can be more informative than single SNPs and can be more descriptive of any particular genotype.
- a single SNP may be allele ‘T’ for a specific line or variety with disease resistance, but the allele ‘T’ might also occur in the breeding population being utilized for recurrent parents.
- a haplotype, e.g. a combination of alleles at linked SNP markers, may be more informative.
- Once a unique haplotype has been assigned to a donor chromosomal region, that haplotype can be used in that population or any subset thereof to determine whether an individual has a particular gene. Using automated high throughput marker detection platforms makes this process highly efficient and effective.
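The point that a haplotype can discriminate where a single SNP cannot is illustrated by the following Python sketch. The marker names, line names and alleles are entirely hypothetical, and the sketch assumes inbred (homozygous) lines so that each line carries one allele per marker.

```python
def carries_haplotype(genotype: dict, donor_haplotype: dict) -> bool:
    """True if the line matches the donor allele at every marker in the haplotype."""
    return all(genotype.get(marker) == allele
               for marker, allele in donor_haplotype.items())


# Hypothetical donor haplotype across three linked SNP markers.
donor = {"snp1": "T", "snp2": "G", "snp3": "A"}

# Hypothetical inbred lines; the recurrent parent shares the 'T' at snp1.
lines = {
    "donor_line":       {"snp1": "T", "snp2": "G", "snp3": "A"},
    "recurrent_parent": {"snp1": "T", "snp2": "C", "snp3": "A"},
    "progeny_7":        {"snp1": "T", "snp2": "G", "snp3": "A"},
    "progeny_9":        {"snp1": "C", "snp2": "C", "snp3": "G"},
}

single_snp_hits = sorted(n for n, g in lines.items() if g["snp1"] == "T")
haplotype_hits = sorted(n for n, g in lines.items() if carries_haplotype(g, donor))
```

Selecting on `snp1` alone also picks up the recurrent parent, whereas the three-marker haplotype identifies only the donor line and the progeny that inherited the donor segment.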
- SNP single nucleotide polymorphic
- the primers are used to amplify DNA segments from individuals (preferably inbred) that represent the diversity in the population of interest.
- the PCR products are sequenced directly in one or both directions.
- the resulting sequences are aligned and polymorphisms are identified.
- the polymorphisms are not limited to single nucleotide polymorphisms (SNPs), but also include indels, CAPS, SSRs, and VNTRs (variable number of tandem repeats).
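The final step of this workflow, identifying polymorphisms in the aligned sequences, can be sketched in Python as follows. The example assumes the amplicon sequences have already been aligned by an external tool, with "-" marking alignment gaps. It distinguishes only SNP-type and indel-type columns and makes no attempt to call SSRs, CAPS or VNTRs. The function name and the sequences are hypothetical.

```python
def classify_polymorphisms(aligned):
    """Return a list of (0-based position, kind, alleles) for variable columns."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for pos, column in enumerate(zip(*aligned)):
        alleles = set(column)
        if len(alleles) == 1:
            continue  # monomorphic column
        kind = "indel" if "-" in alleles else "SNP"
        results.append((pos, kind, "".join(sorted(alleles))))
    return results


# Hypothetical aligned amplicon sequences from four inbred lines.
reads = ["ACGTTAGC", "ACGTTAGC", "ACATTAGC", "ACGT-AGC"]
variants = classify_polymorphisms(reads)
```

For these toy sequences the sketch reports a G/A SNP at position 2 and a single-base indel at position 4.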
- markers within the described map region can be hybridized to BACs or other genomic libraries, or electronically aligned with genome sequences, to find new sequences in the same approximate location as the described markers.
- ESTs expressed sequence tags
- RAPD randomly amplified polymorphic DNA
- Isozyme profiles and linked morphological characteristics can, in some cases, also be indirectly used as markers. Even though they do not directly detect DNA differences, they are often influenced by specific genetic differences. However, markers that detect DNA variation are far more numerous and polymorphic than isozyme or morphological markers (Tanksley 1983 Plant Molecular Biology Reporter 1:3-8).
- Sequence alignments or contigs may also be used to find sequences upstream or downstream of the specific markers listed herein. These new sequences, close to the markers described herein, are then used to discover and develop functionally equivalent markers. For example, different physical and/or genetic maps are aligned to locate equivalent markers not described within this disclosure but that are within similar regions. These maps may be within the species, or even across other species that have been genetically or physically aligned.
- MAS uses polymorphic markers that have been identified as having a significant likelihood of co-segregation with a trait conferred by the R gene disclosed herein (e.g., a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739, (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478).
- markers are presumed to map near a gene or genes that give the plant its disease resistant phenotype, and are considered indicators for the desired trait, or markers.
- plants are tested for the presence of a desired allele in the marker, and plants containing a desired genotype at one or more loci are expected to transfer the desired genotype, along with a desired phenotype, to their progeny.
- plants with one or more disease resistance R-genes disclosed herein may be selected for by detecting one or more marker alleles, and in addition, progeny plants derived from those plants can also be selected.
- a plant containing a desired genotype in a given chromosomal region i.e. a genotype associated with disease resistance
- the progeny of such a cross would then be evaluated genotypically using one or more markers and the progeny plants with the same genotype in a given chromosomal region would then be selected as having disease resistance.
- SNPs could be used alone or in combination (i.e. a SNP haplotype) to select for a favorable R gene allele associated with disease resistance.
- a SNP haplotype can include a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 of the markers for the R gene disclosed herein.
- a SNP haplotype can also include a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 of such markers for one or more R gene disclosed herein.
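Selection on a SNP haplotype, as described above, amounts to checking each plant's marker calls against the favorable allele at every marker in the haplotype. The following is a minimal sketch under simplifying assumptions: each plant is treated as an inbred with one allele call per marker, and the marker names, alleles, and plant names are hypothetical placeholders rather than markers from this disclosure.

```python
from typing import Dict, List


def matches_haplotype(genotype: Dict[str, str], haplotype: Dict[str, str]) -> bool:
    """True if the plant carries the favorable allele at every haplotype marker."""
    return all(genotype.get(marker) == allele for marker, allele in haplotype.items())


def select_progeny(progeny: Dict[str, Dict[str, str]], haplotype: Dict[str, str]) -> List[str]:
    """Return the names of progeny plants whose calls match the whole haplotype."""
    return sorted(name for name, calls in progeny.items() if matches_haplotype(calls, haplotype))


# Hypothetical three-marker haplotype associated with the favorable R gene allele.
favorable = {"marker_1": "A", "marker_2": "G", "marker_3": "T"}

# Hypothetical marker calls for four progeny plants.
progeny_calls = {
    "plant_01": {"marker_1": "A", "marker_2": "G", "marker_3": "T"},
    "plant_02": {"marker_1": "A", "marker_2": "C", "marker_3": "T"},
    "plant_03": {"marker_1": "A", "marker_2": "G", "marker_3": "T"},
    "plant_04": {"marker_1": "A", "marker_2": "G"},  # marker_3 call missing
}

selected = select_progeny(progeny_calls, favorable)
print(selected)
```

A plant with a missing call is not selected here; whether missing data should instead trigger re-genotyping is a design choice left to the practitioner.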
- polymorphic sites at marker loci in and around a chromosome marker identified by the methods disclosed herein wherein one or more polymorphic sites is in linkage disequilibrium (LD) with an allele at one or more of the polymorphic sites in the haplotype and thus could be used in a marker assisted selection program to introgress a gene allele or genomic fragment of interest.
- LD linkage disequilibrium
- Two particular alleles at different polymorphic sites are said to be in LD if the presence of the allele at one of the sites tends to predict the presence of the allele at the other site on the same chromosome (Stevens 1999 Mol. Diag. 4:309-17).
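The notion that one allele "tends to predict" another can be quantified. The sketch below uses two standard population-genetics measures that are not defined in this disclosure and are introduced here only for illustration: the coefficient D (the observed frequency of the allele pair minus the frequency expected if the sites were independent) and its normalized form r². The input is assumed to be a list of phased two-site haplotypes; the function name and example data are hypothetical.

```python
from typing import Sequence, Tuple


def linkage_disequilibrium(
    haplotypes: Sequence[Tuple[str, str]], allele_a: str, allele_b: str
) -> Tuple[float, float]:
    """Return (D, r_squared) for allele_a at site 1 and allele_b at site 2.

    Each haplotype is a pair (allele at site 1, allele at site 2) observed
    on the same chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes if a == allele_a and b == allele_b) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either site is monomorphic in the sample.
    r_squared = d * d / denominator if denominator > 0 else float("nan")
    return d, r_squared


# Hypothetical data: the alleles always travel together (complete LD).
linked = [("A", "G")] * 4 + [("T", "C")] * 4
# Hypothetical data: every combination is equally frequent (no LD).
unlinked = [("A", "G"), ("A", "C"), ("T", "G"), ("T", "C")]

print(linkage_disequilibrium(linked, "A", "G"))
print(linkage_disequilibrium(unlinked, "A", "G"))
```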
- the marker loci can be located within 5 cM, 2 cM, or 1 cM (on a single meiosis based genetic map) of the disease resistance trait QTL comprising an R-gene disclosed herein.
- Allelic frequency can differ from one germplasm pool to another. Germplasm pools vary due to maturity differences, heterotic groupings, geographical distribution, etc. As a result, SNPs and other polymorphisms may not be informative in some germplasm pools.
- a “recombinant protein” is used herein to refer to a protein that is no longer in its natural environment, for example in vitro or in a recombinant bacterial or plant host cell; a protein that is expressed from a polynucleotide that has been edited from its native version; or a protein that is expressed from a polynucleotide in a different genomic position relative to the native sequence.
- R-gene encoded polypeptides including polypeptides having an amino acid sequence that has at least about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to one of SEQ ID NOs: 1740-3478.
- the sequence identity is against the full-length sequence of a polypeptide.
- the term “about” when used herein in context with percent sequence identity means +/- 1.0 percentage point, relative to the recited percentage.
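The two definitions above (identity measured against the full-length sequence, and "about" meaning +/- 1.0 percentage point) can be illustrated as follows. This is a sketch under stated assumptions: the two sequences are supplied already aligned with '-' as the gap character, the denominator is taken to be the ungapped length of the reference, and "at least about X%" is read as "at least X minus 1.0 percentage point". The function names and the example peptides are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percent identity over the full (ungapped) length of the reference."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    reference_length = sum(1 for r in aligned_reference if r != "-")
    if reference_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_reference) if r != "-" and q == r
    )
    return 100.0 * matches / reference_length


def meets_at_least_about(identity: float, recited_percent: float, tolerance: float = 1.0) -> bool:
    """One reading of 'at least about X%': identity >= X - tolerance."""
    return identity >= recited_percent - tolerance


# Hypothetical 10-residue reference and a query with one substitution.
reference = "MKTAYIAKQR"
query = "MKTAYIAKQK"
identity = percent_identity(query, reference)
print(identity, meets_at_least_about(identity, 90))
```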
- substantially free of cellular material refers to a polypeptide including preparations of protein having less than about 30%, 20%, 10% or 5% (by dry weight) of non-target protein (also referred to herein as a “contaminating protein”).
- “Fragments” or “biologically active portions” include polypeptide or polynucleotide fragments comprising sequences sufficiently identical to an R gene or R gene encoded polypeptide disclosed herein, respectively, and that exhibit disease resistance when expressed in a plant.
- “Variants” as used herein refers to proteins or polypeptides having an amino acid sequence that is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identical to the parental amino acid sequence, e.g., one of SEQ ID NOs: 1740-3478.
- amino acid sequence variants of a polypeptide may be prepared by mutations in the DNA. This may also be accomplished by one of several forms of mutagenesis, such as for example site-specific double strand break technology, and/or in directed evolution. In some aspects, the changes encoded in the amino acid sequence will not substantially affect the function of the protein. Such variants will possess the desired activity. However, it is understood that the ability of an R gene-encoded polypeptide to confer disease resistance may, in some cases, be improved by the use of such techniques upon the compositions of this disclosure.
- nucleic acid molecule refers to DNA molecules (e.g., recombinant DNA, cDNA, genomic DNA, plastid DNA, mitochondrial DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs.
- the nucleic acid molecule can be single-stranded. In some examples, the nucleic acid molecule can be double-stranded.
- nucleic acid molecule e.g., RNA or DNA
- isolated nucleic acid molecule e.g., RNA or DNA
- recombinant nucleic acid molecule e.g., RNA or DNA
- nucleic acid sequence e.g., RNA or DNA
- an “isolated” or “recombinant” nucleic acid is free of sequences (preferably protein encoding sequences) that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived.
- isolated or “recombinant” when used to refer to nucleic acid molecules excludes isolated chromosomes.
- the recombinant nucleic acid molecules can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleic acid sequences that naturally flank the R gene nucleic acid molecule in genomic DNA of the cell from which the R gene is derived.
- an isolated nucleic acid molecule comprising an R gene has one or more change in the nucleic acid sequence compared to the native or genomic nucleic acid sequence.
- the change in the native or genomic nucleic acid sequence includes but is not limited to: changes in the nucleic acid sequence due to the degeneracy of the genetic code; changes in the nucleic acid sequence due to the amino acid substitution, insertion, deletion and/or addition compared to the native or genomic sequence; removal of one or more intron; deletion of one or more upstream or downstream regulatory regions; and deletion of the 5’ and/or 3’ untranslated region associated with the genomic nucleic acid sequence.
- the nucleic acid molecule encoding one of SEQ ID NOs: 1740-3478 is a non-genomic sequence.
- polynucleotides comprising R gene disclosed herein are contemplated. Such polynucleotides are useful for production of encoded polypeptides in host cells when operably linked to a suitable promoter, transcription termination and/or polyadenylation sequences. Such polynucleotides are also useful as probes for isolating homologous or substantially homologous polynucleotides that are R genes or related to R genes disclosed herein.
- nucleic acid molecules encoding one of SEQ ID NOs: 1740-3478, and variants, fragments and complements thereof.
- “Complement” is used herein to refer to a nucleic acid sequence that is sufficiently complementary to a given nucleic acid sequence such that it can hybridize to the given nucleic acid sequence to thereby form a stable duplex.
- a reverse complement is a complement formed by exchanging each A with T, T with A, C with G, and G with C in a sequence and then reversing the 5’ to 3’ order of the exchanged sequence, such that the reverse complement of 5’-ACCTGAG-3’ is 5’-CTCAGGT-3’.
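The exchange-then-reverse rule just described translates directly into code. The sketch below handles only the four unambiguous DNA bases; the function name is illustrative.

```python
# Translation table that swaps A<->T and C<->G, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Exchange each base for its complement, then reverse the 5' to 3' order."""
    return sequence.translate(_COMPLEMENT)[::-1]


# The example given in the text.
print(reverse_complement("ACCTGAG"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check.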
- “Polynucleotide sequence variants” is used herein to refer to a nucleic acid sequence that except for the degeneracy of the genetic code encodes the same polypeptide.
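Degeneracy of the genetic code means that different nucleotide sequences can encode the same polypeptide. The sketch below builds the standard genetic code table and translates two sequences that differ at several third-codon positions; the two example sequences are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate a DNA coding sequence in frame 1, stopping at the first stop codon."""
    sequence = coding_sequence.upper()
    protein = []
    for i in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Two hypothetical coding sequences that differ only at synonymous positions.
native = "ATGGCTTTGAAA"
variant = "ATGGCCCTCAAG"
print(translate(native), translate(variant))
```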
- the nucleic acid molecule comprising an R gene is a non-genomic nucleic acid sequence.
- a “non-genomic nucleic acid sequence” or “non-genomic nucleic acid molecule” or “non-genomic polynucleotide” refers to a nucleic acid molecule that has one or more change in the nucleic acid sequence compared to a native or genomic nucleic acid sequence.
- the change to a native or genomic nucleic acid molecule includes but is not limited to: changes in the nucleic acid sequence due to the degeneracy of the genetic code; optimization of the nucleic acid sequence for expression in plants; changes in the nucleic acid sequence to introduce at least one amino acid substitution, insertion, deletion and/or addition compared to the native or genomic sequence; removal of one or more intron associated with the genomic nucleic acid sequence; insertion of one or more heterologous introns; deletion of one or more upstream or downstream regulatory regions associated with the genomic nucleic acid sequence; insertion of one or more heterologous upstream or downstream regulatory regions; deletion of the 5’ and/or 3’ untranslated region associated with the genomic nucleic acid sequence; insertion of a heterologous 5’ and/or 3’ untranslated region; and modification of a polyadenylation site.
- the non-genomic nucleic acid molecule is a synthetic nucleic acid sequence.
- the nucleic acid molecule comprising an R gene disclosed herein is a non-genomic polynucleotide having a nucleotide sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identity, to a nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478, wherein the R gene can confer disease resistance activity when expressed in a plant.
- the nucleic acid molecule encodes a polypeptide variant comprising one or more amino acid substitutions relative to the amino acid sequence of one of SEQ ID NOs: 1740-3478.
- Nucleic acid molecules that are fragments of these R gene nucleic acid sequences are also encompassed by the disclosure.
- “Fragment” as used herein refers to a portion of the nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478.
- a fragment of a nucleic acid sequence may encode a biologically active portion of one of SEQ ID NOs: 1740-3478 or it may be a fragment that can be used as a hybridization probe or PCR primer using methods disclosed below.
- Nucleic acid molecules that are fragments of a nucleic acid sequence encoding a polypeptide comprise at least about 150, 180, 210, 240, 270, 300, 330, 360, 400, 450, or 500 contiguous nucleotides or up to the number of nucleotides present in a full-length nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478 (e.g., one of SEQ ID NOs: 1-1739, respectively), depending upon the intended use.
- Contiguous nucleotides is used herein to refer to nucleotide residues that are immediately adjacent to one another.
- Fragments of the nucleic acid sequences will encode protein fragments that retain the biological activity of the R gene-encoded polypeptide and, hence, retain disease resistance.
- “Retains disease resistance” is used herein to refer to a polypeptide having at least about 10%, at least about 30%, at least about 50%, at least about 70%, 80%, 90%, 95% or higher of the disease resistance of the full-length R gene disclosed herein.
- a polynucleotide disclosed herein encodes a polypeptide comprising an amino acid sequence having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identity across the entire length of the amino acid sequence of one of SEQ ID NOs: 1740-3478.
- such a polynucleotide comprises genomic sequence, including introns, regulatory elements, and untranslated regions.
- the disclosure also provides nucleic acid molecules encoding variants of the R gene-encoded polypeptide disclosed herein.
- “Variants” of R gene include sequences that encode the R polypeptides disclosed herein (such as one of SEQ ID NOs: 1740-3478) or a fragment or variant thereof, but that differ conservatively because of the degeneracy of the genetic code as well as those that are sufficiently identical as discussed above.
- Naturally occurring allelic variants can be identified with the use of well-known molecular biology techniques, such as polymerase chain reaction (PCR) and hybridization techniques as outlined below.
- Variant nucleic acid sequences also include synthetically derived nucleic acid sequences that have been generated, for example, by using site-directed mutagenesis but which still encode the R-gene polypeptides disclosed herein.
- variant nucleic acid molecules can be created by introducing one or more nucleotide substitutions, additions and/or deletions into the corresponding nucleic acid sequence disclosed herein, such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein. Mutations can be introduced by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Such variant nucleic acid sequences are also encompassed by the present disclosure.
- variant nucleic acid sequences can be made by introducing mutations randomly along all or part of the coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for ability to confer activity to identify mutants that retain activity.
- the encoded protein can be expressed recombinantly, and the activity of the protein can be determined using standard assay techniques.
- polynucleotides of the disclosure and fragments thereof are optionally used as substrates for a variety of recombination and recursive recombination reactions, in addition to standard cloning methods as set forth in, e.g., Ausubel, Berger and Sambrook, i.e., to produce additional polypeptide homologues and fragments thereof with desired properties. A variety of such reactions are known.
- Methods for producing a variant of any nucleic acid listed herein comprising recursively recombining such polynucleotide with a second (or more) polynucleotide, thus forming a library of variant polynucleotides, are also examples of the disclosure, as are the libraries produced, the cells comprising the libraries and any recombinant polynucleotide produced by such methods. Additionally, such methods optionally comprise selecting a variant polynucleotide from such libraries based on activity, wherein such recursive recombination is done in vitro or in vivo.
- a variety of diversity generating protocols including nucleic acid recursive recombination protocols are available.
- the procedures can be used separately, and/or in combination to produce one or more variants of a nucleic acid or set of nucleic acids, as well as variants of encoded proteins.
- Individually and collectively, these procedures provide robust, widely applicable ways of generating diversified nucleic acids and sets of nucleic acids (including, e.g., nucleic acid libraries) useful, e.g., for the engineering or rapid evolution of nucleic acids, proteins, pathways, cells and/or organisms with new and/or improved characteristics.
- the result of any of the diversity generating procedures described herein can be the generation of one or more nucleic acids, which can be selected or screened for nucleic acids with or which confer desirable properties or that encode proteins with or which confer desirable properties.
- any nucleic acids that are produced can be selected for a desired activity or property, e.g. such activity at a desired pH, etc. This can include identifying any activity that can be detected, for example, in an automated or automatable format, by any of the assays in the art.
- a variety of related (or even unrelated) properties can be evaluated, in serial or in parallel, at the discretion of the practitioner.
- nucleotide sequences disclosed herein can also be used to isolate corresponding sequences from a different source. In this manner, methods such as PCR, hybridization, and the like can be used to identify such sequences based on their sequence homology to the sequences identified by the methods disclosed herein. Sequences that are selected based on their sequence identity to the entire sequences set forth herein or to fragments thereof are encompassed by the disclosure. Such sequences include sequences that are orthologs of the sequences.
- the term “orthologs” refers to genes derived from a common ancestral gene and which are found in different species as a result of speciation. Genes found in different species are considered orthologs when their nucleotide sequences and/or their encoded protein sequences share substantial identity as defined elsewhere herein.
- oligonucleotide primers can be designed for use in PCR reactions to amplify corresponding DNA sequences from cDNA or genomic DNA extracted from any organism of interest.
- Methods for designing PCR primers and PCR cloning are disclosed in Sambrook et al. 1989 Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York), hereinafter “Sambrook”. See also, Innis et al., eds. 1990 PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. 1999 PCR Methods Manual (Academic Press, New York).
- Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially-mismatched primers, and the like.
- hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments or other oligonucleotides and may be labeled with a detectable group such as 32P or any other detectable marker, such as other radioisotopes, a fluorescent compound, an enzyme or an enzyme cofactor.
- Probes for hybridization can be made by labeling synthetic oligonucleotides based on the known polypeptide-encoding nucleic acid sequences disclosed herein.
- the probe typically comprises a region of nucleic acid sequence that hybridizes under stringent conditions to at least about 12, at least about 25, at least about 50, 75, 100, 125, 150, 175 or 200 consecutive nucleotides of nucleic acid sequences encoding polypeptides or a fragment or variant thereof.
- the term “nucleotide constructs” is not intended to limit the disclosure to constructs comprising DNA.
- Polynucleotide constructs, particularly polynucleotides and oligonucleotides composed of ribonucleotides and combinations of ribonucleotides and deoxyribonucleotides, may also be employed in the methods disclosed herein.
- the isolated polynucleotide constructs, nucleic acids, and nucleotide sequences disclosed herein additionally encompass all complementary forms (e.g., the reverse complement) of each sequence disclosed for such a construct.
- polynucleotide constructs and nucleotide sequences disclosed herein can encompass any such constructs, molecules, and sequences suitable for use in a method for transforming plant material disclosed herein. Such constructs can include naturally occurring molecules and/or synthetic analogues.
- the disclosed nucleotide constructs, nucleic acids, and nucleotide sequences also encompass all forms of nucleotide constructs including, but not limited to, single-stranded forms, double-stranded forms, hairpins, stem-and-loop structures and the like.
- Transformed organisms disclosed herein include plant cells, bacteria, yeast, baculovirus, protozoa, nematodes and algae.
- the transformed organism comprises a disclosed sequence (e.g., as part of a construct, expression cassette, or vector comprising the nucleotide sequence disclosed herein) which is associated with increased disease resistance.
- the disclosed sequences can be used in constructs for expression in the organism of interest.
- Constructs can include 5’ and 3’ regulatory sequences operably linked to an R gene sequence, variant or fragment disclosed herein.
- operably linked refers to a functional linkage between a promoter and/or a regulatory sequence and a second sequence, wherein the promoter and/or regulatory sequence initiates, mediates, and/or affects transcription of the DNA sequence corresponding to the second sequence.
- operably linked means that the nucleic acid sequences being linked are contiguous and, where necessary to join two protein coding regions, in the same reading frame.
- the construct may additionally contain at least one additional gene to be cotransformed into the organism. Alternatively, the additional gene(s) can be provided on multiple DNA constructs.
- Such a DNA construct is provided with a plurality of restriction sites for insertion of the polypeptide gene sequence of the disclosure to be under the transcriptional regulation of the regulatory regions.
- the DNA construct may additionally contain selectable marker genes.
- the DNA construct will generally include in the 5' to 3' direction of transcription: a transcriptional and translational initiation region (e.g., a promoter), a DNA sequence of the embodiments, and a transcriptional and translational termination region (e.g., termination region) functional in the organism serving as a host.
- the transcriptional initiation region e.g., the promoter
- the transcriptional initiation region may be native, analogous, foreign or heterologous to the host organism and/or to the sequence of the embodiments.
- the promoter or regulatory sequence may be the natural sequence or alternatively a synthetic sequence.
- the term “foreign” as used herein indicates that the promoter is not found in the native organism into which the promoter is introduced.
- the term “heterologous” in reference to a sequence means a sequence that originates from a foreign species or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
- a chimeric gene comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence. Where the promoter is a native or natural sequence, the expression of the operably linked sequence is altered from the wild-type expression, which results in an alteration in phenotype.
- the DNA construct comprises a polynucleotide encoding one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof. In some embodiments the DNA construct comprises a polynucleotide encoding a fusion protein that includes one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof.
- a DNA construct may also include a transcriptional enhancer sequence.
- An “enhancer” refers to a DNA sequence which can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter.
- Various enhancers include, for example, introns with gene expression enhancing properties in plants (US Patent Application Publication Number 2009/0144863), the ubiquitin intron (i.e., the maize ubiquitin intron 1 (see, for example, NCBI sequence S94464)), the omega enhancer or the omega prime enhancer (Gallie et al. 1989 Molecular Biology of RNA ed.
- the termination region may be native with the transcriptional initiation region, may be native with the operably linked DNA sequence of interest, may be native with the plant host or may be derived from another source (i.e., foreign or heterologous to the promoter, the sequence of interest, the plant host or any combination thereof).
- Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also, Guerineau et al. 1991 Mol. Gen. Genet. 262: 141-144; Proudfoot 1991 Cell 64: 671-674; Sanfacon et al. 1991 Genes Dev. 5: 141-149; Mogen et al. 1990 Plant Cell 2: 1261-1272; Munroe et al. 1990 Gene 91: 151-158; Ballas et al. 1989 Nucleic Acids Res. 17: 7891-7903 and Joshi et al. 1987 Nucleic Acids Res. 15: 9627-9639.
- a nucleic acid may be optimized for increased expression in the host organism.
- the synthetic nucleic acids can be synthesized using plant-preferred codons for improved expression. See, for example, Campbell and Gowri 1990 Plant Physiol. 92: 1-11 for a discussion of host-preferred usage.
- although nucleic acid sequences of the embodiments may be expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons, as these preferences have been shown to differ (Murray et al. 1989 Nucleic Acids Res. 17:477-498).
- the plant-preferred codon for a particular amino acid may be derived from known gene sequences from plants.
- Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other well-characterized sequences that may be deleterious to gene expression.
- the GC content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell.
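The comparison implied above, between the GC content of a candidate sequence and an average calculated from known host genes, can be sketched as follows. The averaging method (an unweighted mean of per-gene GC fractions) is an assumption made for illustration, and all sequences shown are short hypothetical placeholders rather than real host genes.

```python
from typing import Iterable


def gc_content(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(1 for b in bases if b in "GC") / len(bases)


def host_average_gc(host_genes: Iterable[str]) -> float:
    """Unweighted mean GC fraction across a set of known host genes."""
    values = [gc_content(gene) for gene in host_genes]
    if not values:
        raise ValueError("at least one host gene is required")
    return sum(values) / len(values)


# Hypothetical host gene fragments and a hypothetical candidate sequence.
known_host_genes = ["ATGC", "GGCC"]
candidate = "ATATGC"

target = host_average_gc(known_host_genes)
difference = gc_content(candidate) - target
print(f"host average {target:.2f}, candidate differs by {difference:+.2f}")
```

A negative difference suggests that synonymous substitutions toward G- or C-ending codons would move the candidate closer to the host average.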
- host cell refers to a cell which contains a vector and supports the replication and/or expression of the expression vector. Host cells may be prokaryotic cells such as E. coli or eukaryotic cells such as yeast, insect, amphibian or mammalian cells or monocotyledonous or dicotyledonous plant cells.
- An example of a monocotyledonous host cell is a maize host cell.
- the sequence is modified to avoid predicted hairpin secondary mRNA structures.
- the various DNA fragments may be manipulated so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
- adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites or the like.
- in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
- a number of promoters can be used in the practice of the embodiments.
- the promoters can be selected based on the desired outcome.
- the nucleic acids can be combined with constitutive, tissue-preferred, inducible
- the methods of the embodiments involve introducing a polypeptide or polynucleotide into a plant.
- “Introducing” as used herein means presenting to the plant the polynucleotide or polypeptide in such a manner that the sequence gains access to the interior of a cell of the plant.
- the methods of the embodiments do not depend on a particular method for introducing a polynucleotide or polypeptide into a plant, only that the polynucleotide(s) or polypeptide(s) gains access to the interior of at least one cell of the plant.
- Methods for introducing polynucleotide(s) or polypeptide(s) into plants include, but are not limited to, stable transformation methods, transient transformation methods, and virus- mediated methods.
- “Stable transformation” as used herein means that the nucleotide construct introduced into a plant integrates into the genome of the plant and is capable of being inherited by the progeny thereof. “Transient transformation” as used herein means that a polynucleotide is introduced into the plant and does not integrate into the genome of the plant or a polypeptide is introduced into a plant. “Plant” as used herein refers to whole plants, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, propagules, embryos and progeny of the same. Plant cells can be differentiated or undifferentiated (e.g. callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells and pollen).
- Transformation protocols as well as protocols for introducing nucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing nucleotide sequences into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al. 1986 Biotechniques 4:320-334), electroporation (Riggs et al. 1986 Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (US Patent Numbers 5,563,055 and 5,981,840), direct gene transfer (Paszkowski et al. 1984 EMBO J.
- R genes e.g., encoding one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof
- the identified polynucleotides can be introduced into a desired location in the genome of a plant through the use of double-stranded break technologies such as TALENs, meganucleases, zinc finger nucleases, CRISPR-Cas, and the like.
- the R gene can be introduced into a desired location in a genome using a CRISPR-Cas system, for the purpose of site-specific insertion.
- the desired location in a plant genome can be any desired target site for insertion, such as a genomic region amenable for breeding or may be a target site located in a genomic window with an existing trait of interest.
- Existing traits of interest could be either an endogenous trait or a previously introduced trait.
- an R gene can be altered through gene editing in its native site to encode an R polypeptide having the amino acid sequence set forth in one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof.
- an R gene can be introduced by genome editing at a different genomic location.
- nucleotide construct comprising an R gene sequence encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478 (or a fragment or variant thereof) can be inserted at a genomic locus other than the R gene’s native genomic locus.
- genome editing technologies may be used to alter or modify the polynucleotide sequence to make it a favorable R gene allele.
- Site specific modifications can be introduced into the desired R gene allele using any method for introducing site specific modification, including, but not limited to, through the use of gene repair oligonucleotides (e.g. US Publication 2013/0019349), or through the use of double-stranded break technologies such as TALENs, meganucleases, zinc finger nucleases, CRISPR-Cas, and the like.
- Such technologies can be used to modify the previously introduced polynucleotide through the insertion, deletion or substitution of nucleotides within the introduced polynucleotide.
- double-stranded break technologies can be used to add additional nucleotide sequences to the introduced polynucleotide. Additional sequences that may be added include additional expression elements, such as enhancer and promoter sequences.
- genome editing technologies may be used to position additional disease resistant proteins in close proximity to the R gene sequence within the genome of a plant, in order to generate molecular stacks of disease resistant proteins.
- “altered target site,” “altered target sequence,” “modified target site,” and “modified target sequence” are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to a non-altered target sequence.
- Such “alterations” include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, or (iv) any combination of (i) - (iii).
- NB-ARC domain clustering was carried out by pairwise alignment using MUSCLE, followed by construction of a maximum likelihood phylogenetic tree with 50 bootstraps in MEGA software (version 10.0.5) (Kumar et al. 2018 Mol Biol Evol 35: 1547-1549).
- RNA-seq library construction used in Examples below. Leaves from plants grown under the conditions listed above were sampled at flowering stage (50 to 70 days, depending on the line), and their total RNA was isolated from ground frozen tissue with RNeasy (Qiagen Inc., Valencia, CA), according to the manufacturer’s protocol. Total RNA was then analyzed for quality and quantity with the Agilent Bioanalyzer RNA Nano kit (Agilent Technologies, Santa Clara, CA) and normalized to 1 µg input per sample. Sequencing libraries were prepared according to Illumina Inc. (San Diego, CA) TruSeq mRNA-Seq protocols.
- RNAs were isolated via attachment to oligo (dT) beads, fragmented and reverse transcribed into cDNA by random hexamer primers with Superscript II reverse transcriptase (Life Technologies, Carlsbad, CA). The resulting cDNAs were end repaired, 3 prime A-tailed and ligated with Illumina indexed TruSeq adapters. Ligated cDNA fragments were PCR amplified with Illumina TruSeq primers, purified with AmpureXP Beads (Beckman Coulter Genomics, Danvers, MA) and checked for quality and quantity with the Agilent TapeStation 4200 system with D1000 ScreenTape. Libraries were combined into one sequencing pool, which was normalized to 2 nM.
- the pool was denatured according to Illumina sequencing protocols, hybridized and clustered on two flow cell lanes of a NovaSP flow cell using the NovaSeq 6000. Single-end fifty base sequences and eight base dual-index sequences were generated on the NovaSeq 6000 according to Illumina protocols. Data was trimmed for quality with a minimum threshold of Q13 and the resulting sequences were split by index identifier. Sequencing data is available at the Sequence Read Archive (SRA) database, accession GSE206952.
- SRA Sequence Read Archive
- RNA-seq expression data was obtained from a short-read repository (SRA, https://www.ncbi.nlm.nih.gov/sra, accessions ERX3793507-ERX3793986).
- SRA short-read repository
- RNA-seq reads obtained from SRA, as well as those generated through our own library construction and sequencing, were then quantified by running Salmon (version 1.1.0) against the transcriptome of each NAM founder line, with GC bias correction (Patro et al., 2017).
- Transcript expression per library was then converted to gene expression per tissue using the DESeq2 package in R (version 1.31.6) (Love et al., 2014).
- Intra-cluster expression variability was assessed by calculating the average pairwise Manhattan distance of log-transformed expression values for each cluster, using the spatial distance module of SciPy (version 1.5.4).
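The intra-cluster variability metric described above can be sketched as follows. This is a minimal illustration rather than the script actually used: the function name `mean_pairwise_manhattan`, the log2 transform, the pseudocount of 1 and the toy expression values are all assumptions, and the code relies only on the Python standard library. With SciPy installed, `scipy.spatial.distance.pdist(matrix, metric="cityblock")` yields the same pairwise distances.

```python
import math
from itertools import combinations


def mean_pairwise_manhattan(expr_by_gene, pseudocount=1.0):
    """Average pairwise Manhattan (cityblock) distance between the
    log-transformed expression profiles of the genes in one cluster.

    expr_by_gene maps a gene name to a list of expression values,
    one value per tissue, in the same tissue order for every gene.
    The log2(x + pseudocount) transform is an assumption.
    """
    logged = [
        [math.log2(value + pseudocount) for value in profile]
        for profile in expr_by_gene.values()
    ]
    pairs = list(combinations(logged, 2))
    if not pairs:
        return 0.0  # a single-gene cluster has no pairwise distance
    total = sum(
        sum(abs(a - b) for a, b in zip(first, second))
        for first, second in pairs
    )
    return total / len(pairs)


# Hypothetical three-gene cluster measured in three tissues.
cluster = {
    "geneA": [7.0, 3.0, 0.0],
    "geneB": [7.0, 1.0, 0.0],
    "geneC": [3.0, 3.0, 1.0],
}
result = mean_pairwise_manhattan(cluster)
```

A lower value indicates that the genes in a cluster have more similar expression profiles across tissues.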
- Example 1 Identification of NLRs.
- NAM maize nested association mapping
- The majority of NLRs (57 %) had canonical structures, with a coiled-coil region, followed by an NB-ARC domain, terminating in a series of LRRs. Some alternative structures were abundant, including proteins containing only a coiled-coil and an NB-ARC domain (14.6 %), proteins containing only an NB-ARC domain and LRRs (11.4 %) and proteins with an NB-ARC and no other canonical NLR domains (6.1 %). Interestingly, several genes were identified that may be the result of a two-NLR fusion.
- 25 of the NAM founder lines contained an NLR on chromosome 6 which had a coiled-coil-NB-ARC-LRR-NB-ARC-LRR structure, with a C-terminal integrated no apical meristem associated (NAM-associated) domain (Cheng et al., 2012).
- Example 2 Integrated domains in NLRs of NAM founder lines.
- HMMer was used to search for atypical integrated domains within the NAM NLR repertoire, i.e., the NAM’s NLRome. After identifying all domains via HMMer, custom Python scripts were employed to filter out all hits that overlapped canonical NLR domains. The resulting set of potential atypical domains was then filtered loosely (e-value < 0.11) and strictly (e-value < 0.01 and at least 40% of the domain covered). The loosely filtered set contained a number of domains of unknown function and canonical domains with very poor coverage. Although the majority of these hits are likely false positives, some may represent true IDs that have undergone significant divergence after their neofunctionalization. After filtering for high confidence domain calls and collapsing redundant domains, a total of 19 strictly filtered unique integrated domains were found across all NAM NLRs (Fig. 2).
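The two-tier filtering of domain hits can be sketched as below. The custom scripts themselves are not disclosed, so everything here is hypothetical: the `DomainHit` record, its field names, the `filter_hits` helper and the example hits are illustrative, and the sketch assumes that both thresholds are upper bounds on the e-value and that "domain covered" means the fraction of the HMM model length spanned by the hit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    gene: str
    domain: str
    evalue: float
    ali_start: int   # hit start on the protein (1-based, inclusive)
    ali_end: int     # hit end on the protein (inclusive)
    hmm_start: int   # first matched position in the HMM model
    hmm_end: int     # last matched position in the HMM model
    hmm_length: int  # total length of the HMM model

    @property
    def hmm_coverage(self):
        return (self.hmm_end - self.hmm_start + 1) / self.hmm_length


def overlaps(a, b):
    """True when two hits on the same gene share protein positions."""
    return (a.gene == b.gene
            and a.ali_start <= b.ali_end
            and b.ali_start <= a.ali_end)


def filter_hits(hits, canonical_names, max_evalue, min_coverage=0.0):
    """Keep non-canonical hits that do not overlap a canonical NLR
    domain and that pass the e-value and coverage thresholds."""
    canonical = [h for h in hits if h.domain in canonical_names]
    kept = []
    for hit in hits:
        if hit.domain in canonical_names:
            continue
        if any(overlaps(hit, c) for c in canonical):
            continue
        if hit.evalue < max_evalue and hit.hmm_coverage >= min_coverage:
            kept.append(hit)
    return kept


# Hypothetical hits on one gene.
hits = [
    DomainHit("g1", "NB-ARC", 1e-50, 150, 400, 1, 250, 250),
    DomainHit("g1", "Pkinase", 1e-20, 600, 850, 10, 260, 264),
    DomainHit("g1", "DUF_x", 0.05, 900, 930, 1, 20, 100),
    DomainHit("g1", "WRKY", 1e-5, 380, 420, 1, 40, 60),
]
loose = filter_hits(hits, {"NB-ARC"}, max_evalue=0.11)
strict = filter_hits(hits, {"NB-ARC"}, max_evalue=0.01, min_coverage=0.40)
```

In this toy input the kinase hit passes both filters, the poorly covered hit passes only the loose filter, and the hit overlapping the NB-ARC domain is discarded before either threshold is applied.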
- the most frequent integrated domain was a kinase, which appeared in two to three NLRs in each NAM founder line.
- PAH amphipathic helix
- NAM-associated no apical meristem-associated domain
- while the unfiltered set included many low-quality domain hits found in only a single gene, the more strictly filtered set only included two domains that appeared uniquely in a single gene in only one NAM founder line (zf-RVT and UvsW).
- the Mo18W genome contains nucleotides that potentially encode an NB-ARC domain and which cluster very closely with the B73 gene NB-ARC domain (98.3% sequence identity), but no actual gene was found to be produced at this locus. Similar genomic/genic NB-ARC clusters were found throughout the genome, including Chr1 (Mo18W Zm000034a005849), Chr2 (Mo18W Zm00034a016521), Chr4 (Mo18W Zm000034a031848), Chr6 (B73 Zm00001e031193), Chr7 (Mo18W Zm00034a051957) and Chr10 (B73 Zm00001e039226).
- NLRs were found to be distributed as singletons and small groups throughout the genome, but many existed in a few large clusters of variable size in which many NLRs were concentrated in a small genomic space. For the purpose of this analysis, physically clustered genes were those considered to reside within 1 Mb of another NLR.
- This cluster also contained a large number of genomic NB-ARCs without definitive gene models, with the most extreme example being M37W, which had 17 NLR genes and 18 genomic regions with potential to encode NB-ARCs, but no gene model derived from RNA-seq data. Unsurprisingly, this cluster also had a high degree of PAV and allelic diversity. Sequence-based clustering revealed that this cluster is actually comprised of two groups which are distinct at the sequence level but in very close proximity physically.
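The physical clustering rule stated above (an NLR belongs to a cluster when it lies within 1 Mb of another NLR) can be illustrated with a simple single-linkage pass over sorted positions. This is a sketch under stated assumptions: the function name `physical_clusters`, the use of a single coordinate per gene, the inclusive distance cutoff and the example genes are all hypothetical.

```python
from collections import defaultdict


def physical_clusters(genes, max_gap=1_000_000):
    """Group genes on the same chromosome into physical clusters.

    genes is an iterable of (name, chromosome, position) tuples, where
    position is a single coordinate such as the gene start (an
    assumption). Consecutive genes no more than max_gap bases apart are
    chained into one cluster; singletons are returned as clusters of one.
    """
    by_chrom = defaultdict(list)
    for name, chrom, pos in genes:
        by_chrom[chrom].append((pos, name))

    clusters = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])
        current = [members[0]]
        for pos, name in members[1:]:
            if pos - current[-1][0] <= max_gap:
                current.append((pos, name))
            else:
                clusters.append((chrom, [n for _, n in current]))
                current = [(pos, name)]
        clusters.append((chrom, [n for _, n in current]))
    return clusters


# Hypothetical NLR coordinates.
nlrs = [
    ("nlr1", "chr10", 100_000),
    ("nlr2", "chr10", 900_000),
    ("nlr3", "chr10", 1_800_000),
    ("nlr4", "chr10", 5_000_000),
    ("nlr5", "chr2", 42_000),
]
groups = physical_clusters(nlrs)
```

Note that chaining lets a cluster span more than 1 Mb in total, as long as each member lies within 1 Mb of its nearest neighbor, which matches the wording "within 1 Mb of another NLR".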
- Despite the distance and potential intervening gene, these NLRs appeared to be highly co-regulated, averaging an R² of 0.97 across different tissue types.
- Clustering of the protein sequences of all NAM NLRs to determine their relationships was done using the OrthAgogue software application (Ekseth et al. 2014 Bioinformatics 30(5): 734-736). A total of 158 clusters were identified, of which 20 were classified as “core” NLR clusters, with all NAM founder lines containing at least one member. A total of 15 clusters were present in all but one NAM founder line and 11 were missing from only two NAM founder lines. On average, clusters contained at least one member in 16 out of the 26 NAM founder lines, indicating that PAV was the norm for most NLRs across the lines.
- NLRs are known to be a very diverse group of genes, with high presence-absence variation, high Ka/Ks ratios and frequent intergenic crossovers in other species.
- OrthAgogue clusters were examined for outliers on different chromosomes or significantly different positions relative to other members. Although the vast majority of NLRs (98.7 %) resided in groups that contained similar positions on the same chromosome, several outliers were also identified. The most extreme outliers were found in Oh7B, which contained 11 NLRs on chromosome 9 that clustered with chromosome 10 NLRs from all other NAM founder lines.
- Besides transposition, an alternative or additional explanation may be that the rapidly evolving nature of NLRs caused two separate clusters to undergo convergent evolution.
- Subsequent expression analysis revealed relatively low Manhattan distances for pairwise comparisons within these clusters, providing further evidence for their relatedness (see “NLR gene expression”).
- RNA-seq data was originally intended for transcriptome annotation and most tissues only contained two biological replicates, reducing the statistical power of differential expression testing. Therefore, the data was used only to assess broad expression differences across tissues, and we have noted all cases where the two biological replicates are substantially divergent (> 2-fold difference) and a third biological replicate would be required to get a more accurate expression estimate.
- the public data was also supplemented with additional RNA-seq libraries that contained four biological replicates from each NAM founder line constructed from R1 leaves, a developmental stage at which plants often encounter pathogen challenge in the field.
- NLRs were found to be expressed at a significant level across all tissues surveyed (average fragments per kilobase of exon per million mapped fragments or FPKM of 6.75), with the highest average expression found in vegetative tissue. Endosperm had the lowest median NLR expression, followed by embryo, anther, ear inflorescence and tassel. All vegetative tissues had similar levels of average NLR expression, with shoot having the lowest average NLR expression (4.52 FPKM) and leaf base having the highest (7.85 FPKM).
- NLRs which lacked LRR domains were expressed at a slightly lower level than those containing the canonical coiled-coil, NB-ARC and LRR domains (average FPKM of 4.26 compared to 6.94).
- RPW8 NLRs were found in the NAM founder lines, and both possessed above average expression levels (average 18.98 FPKM).
- the rare tissue-specific expression patterns may have bearing on resistance gene selection for diseases which are known to invade specific tissues.
- the Chr10->Chr2 translocation which resulted in a sequence-based cluster containing a mixture of genes from different chromosomes also possessed a Manhattan distance which was similar to clusters containing non-mixed genes (23.2, compared to an average of 25.4).
- Example 7 Diversity within clusters at the whole gene and domain-level.
- Entropy variation across the different regions of the NLR proteins within each cluster was assessed. After Shannon entropy was calculated at each position within each cluster, these values were binned into the following protein regions: coiled-coil, NB-ARC domain, spacer (region between NB-ARC and start of LRRs), LRRs, LRR spacers (regions in between LRRs), C-terminal and integrated domains. Coiled-coil regions, which have been proposed to play a role in inter- and intra-protein interaction, tended to have higher entropy than the whole protein (0.31).
- NB-ARC domains tend to have higher conservation than average within NLRs, and this was broadly consistent across the clusters from the NAM founder lines (average Shannon entropy of 0.10).
- Spacer sequences between the NB-ARC domain and LRR region also had low entropy on average (0.18).
- LRRs have been noted to have higher than average diversity, and we also found that they had high average Shannon entropy within clusters (0.38).
- the spacer regions between different LRRs on average had a similar level of entropy (0.38), but on a per-cluster basis, diversity of LRRs was often uncorrelated with diversity of LRR spacer regions.
- the Sec66 domain, which has been proposed to be involved in protein translocation, had extremely low entropy (0.04) within its 24-member cluster, despite this cluster having very high entropy at the whole protein level (0.69).
- the majority of clusters with IDs tended to have high entropy either within the ID, or at the whole protein level, which may be reflective of their proposed role in direct effector binding.
- a “composite” NLR was constructed by averaging the entropy patterns of the most common domains. Average Shannon entropy of each position within each domain/protein region was calculated for all clusters (Fig. 3). For regions of variable size (spacers and C-terminal), the positions of entropy values were placed into 100 bins, with each bin representing 1% of the domain’s total size in a given cluster, before averaging. Only the four common LRR HMM models were included in the resulting composite NLR (LRR 1, LRR 4, LRR 6 and LRR 8). These LRR domains showed the second highest level of entropy, with only the C-terminal domains having higher average values. The resulting composite NLR shows the clear variability of NLR entropy throughout the different canonical domains (Fig. 3).
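The per-position Shannon entropy and the binning of variable-size regions can be sketched as follows. This is an illustration under stated assumptions, not the disclosed analysis: the function names, the use of log base 2, the treatment of every character (including gaps) as a symbol, the rule assigning position i of a region of length L to bin floor(i × n_bins / L), and the toy alignment are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropy(aligned_seqs):
    """Entropy at each position of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]


def bin_means(values, n_bins=100):
    """Average per-position values into n_bins equal-width bins so that
    regions of different lengths can be compared position by position.
    Bins that receive no positions (region shorter than n_bins) are None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    length = len(values)
    for i, value in enumerate(values):
        index = i * n_bins // length
        sums[index] += value
        counts[index] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical four-member cluster, already aligned.
cluster_alignment = ["MKLA", "MKIA", "MRLA", "MKLG"]
entropy = alignment_entropy(cluster_alignment)
binned = bin_means([0.0, 1.0, 2.0, 3.0], n_bins=2)
```

A fully conserved column gives an entropy of zero, and the value rises as more distinct residues share a position more evenly.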
- NLRs were found to have very high levels of PAV and allelic diversity and were distributed unevenly across maize genomes, with a single cluster on chromosome 10 representing a significant portion of the total complement of almost all lines.
- the physical clustering seen across the maize genome correlates well with sequence-based clustering, enabling physical placement of NLRs based on sequence alone.
- the ability to infer physical location from sequence is beneficial for techniques such as resistance gene enrichment sequencing (RenSeq) (Jupe et al. 2013 Plant J, 76: 530-44).
- NLR expression across a wide array of tissue types indicated that genes in most sequence-based clusters shared tissue-specific expression patterns. The majority of NLRs were expressed ubiquitously, although some clear root-preferential clusters existed. A small number of outliers within sequence-based clusters exhibited different expression patterns compared to the rest of the cluster, including a chromosome 10 NLR which had leaf base-specific expression in most lines, but much broader expression in other lines. Such outliers may be indicative of neofunctionalization, although additional studies are needed to assess this possibility.
- Although PAH domains have not been reported as effector targets, they are known to be integrated into the NLRs of other species and may be targeted by pathogens due to their role in protein-protein interaction of transcription factors (Kroj et al. 2016, supra; Bowen et al. 2010 J Mol Biol, 395: 937-49).
- a novel ID structure was found in a gene that contained both an N-terminal REC 104 domain and a mid-protein kinase domain.
Abstract
Provided are plants, cells, tissues, and germplasm thereof comprising R genes for increased plant disease resistance. Also provided are breeding methods and methods of identifying and selecting plants having the disclosed R genes. Provided are methods to make novel R gene variants and fragments for disease resistance. The disclosed R genes are useful in the production of disease resistant plants through breeding, transgenic modification, or genome editing.
Description
PLANT PATHOGEN RESISTANCE GENES
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
[0001] The official copy of the sequence listing is submitted electronically as an XML-formatted sequence listing file named 9667-US-PSP ST26 created on March 2, 2023, and having a size of 9,668,699 bytes which is filed concurrently with the specification. The sequence listing comprised in this XML-formatted document is part of the specification and is herein incorporated by reference in its entirety.
FIELD OF THE DISCLOSURE
[0002] The disclosure relates to disease resistance genes, plant breeding and methods of identifying and selecting disease resistance genes. Provided are novel genes and methods of identifying and selecting such genes that encode proteins providing plant resistance to various diseases and uses thereof. These disease resistant genes are useful in the production of resistant plants through breeding, transgenic modification, or genome editing.
BACKGROUND
[0003] Plant pathogens cause significant crop loss world-wide, and new resistance genes deployed to combat diseases can be overcome quickly. Plant disease resistance gene complements are the result of millions of years of coevolution with pathogens. Resistance genes encode proteins which form a multi-layer defense mechanism that can detect pathogen-associated molecular patterns (PAMPs) or damage-associated molecular patterns (DAMPs) through extracellular pattern recognition receptors (PRRs), as well as small, secreted pathogen effectors, through intracellular nucleotide-binding leucine-rich repeat receptors (NLRs) (Zipfel 2014 Trends Immunol, 35:345-51; Monteiro and Nishimura 2018 Annu Rev Phytopathol, 56:243-267; Jones and Dangl, 2006 Nature, 444:323-9). PRRs are primarily comprised of trans-membrane domain-containing proteins, in which extracellular domains interact with PAMPs or DAMPs. This interaction can cause a conformational change that initiates a signaling cascade through the action of an intracellular kinase domain (Tang et al. 2017 Plant Cell, 29:618-637). Effectors are excreted by plant pathogens for a variety of purposes, including suppression of plant defense responses that are triggered by PRRs (Irieda et al. 2019 Proc Natl Acad Sci U S A, 116:496-505). These proteins are often small, and typically undergo rapid evolution, meaning that the complement of pathogen effectors which plants interact with is in a constant state of flux (Newman et al. 2013 Front Plant Sci, 4, 139; Sanchez-Vallet et al. 2018 Annu Rev Phytopathol, 56, 21-40). As a result, plant species typically harbor hundreds of different NLRs, which have high sequence diversity and presence-absence variation (PAV) (Van de Weyer et al. 2019 Cell, 178, 1260-1272.e14; Shang et al. 2022 Cell Res, 32, 878-896).
NLRs have been found to underlie dominant resistance phenotypes in many crop species, including rice, soybean, wheat and maize (Liu et al. 2020 Plant Biotechnol J, 18: 1376-1383; Wang et al. 2021 Nat Commun, 12:6263; Saintenac et al. 2013 Science, 341:783-786; Deng et al. 2022 Mol Plant, 15:904-912; Thatcher et al. 2022 Mol Plant Pathol DOI: 10.1111/mpp.13267).
[0004] Maize pathogens cause significant crop loss annually, and thus there is significant interest in identifying new sources of resistance genes (Mueller 2016 Plant Health Progress, 17: 12). Maize is thought to have been domesticated during a single event roughly 9,000 years ago, implying that a significant portion of the resistance gene diversity in maize’s wild ancestors may have been lost in modern-day varieties through the initial domestication event and subsequent breeding (Yang et al. 2019 Proc Natl Acad Sci U S A, 116:5643-5652; Matsuoka et al. 2002 Proc Natl Acad Sci USA, 99, 6080-4).
[0005] There is a desire to identify new disease resistance genes that might be available for use in germplasm of commercial crops such as maize.
SUMMARY OF THE DISCLOSURE
[0006] The disclosed compositions and methods are based on the discovery disclosed herein of a large number of new maize genes, including genes that have the structural features and expression patterns making them suitable for use as disease resistance genes. These disease resistance genes (“R genes”) can provide increased resistance to a disease. The compositions and methods disclosed herein are thus useful in selecting disease resistant plants, breeding for disease resistant plants, creating transgenic disease resistant plants, and/or using genome editing to introduce or improve disease resistance in plants. Also provided herein are plants and methods for making plants having the disclosed markers and/or genes associated with disease resistance that is enhanced as compared to control plants. In some embodiments, the compositions and methods are useful in selecting disease resistant plants, introgressing disease resistance into plants, creating transgenic disease resistant plants, and/or creating disease resistant genome edited plants.
[0007] In an aspect, provided are methods for identifying and/or selecting one or more plant materials having an R gene or marker allele associated with increased plant disease resistance. As used herein, the term “plant materials” refers to one or more plants, plant cells, plant tissues, seeds, or germplasm thereof. In some examples, the methods for identifying and/or selecting comprise detecting or selecting one or more plant materials having a genomic region comprising a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478. The identified or selected plant may possess plant disease resistance that is newly conferred or enhanced relative to a control plant that does not have a genomic region comprising a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
[0008] In one aspect, methods are provided to identify and/or select plant materials with a QTL containing an R gene or marker allele associated with an R gene that can confer increased resistance to plant disease. Generally, such methods can include obtaining a nucleic acid sample from a plant, seed, tissue or germplasm thereof; and screening the sample for the presence of a QTL containing the R gene or a marker allele associated with the R gene, wherein the R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478. For example, the method can include screening the sample for the presence of a marker allele linked to the R gene, e.g., by 10 cM, 9 cM, 8 cM, 7 cM, 6 cM, 5 cM, 4 cM, 3 cM, 2 cM, 1 cM, 0.9 cM, 0.8 cM, 0.7 cM, 0.6 cM, 0.5 cM, 0.4 cM, 0.3 cM, 0.2 cM, 0.1 cM, or less on a single meiosis-based genetic map, and associated. The method can further include detecting one or more R genes or one or more marker alleles linked to R genes, where the one or more R genes (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478, thereby identifying the plant material as comprising a QTL or marker allele associated with increased resistance to plant disease. Additionally, the method can include selecting the plant material identified as comprising one or more R genes or one or more marker alleles linked to R genes.
[0009] In a particular example, the foregoing method of identifying and/or selecting plant materials with a QTL or marker allele associated with increased resistance to plant disease can include obtaining a nucleic acid sample from each of one or more plants, seeds, tissues or germplasm in a population; screening each sample for the presence of one or more R genes, a QTL comprising one or more R genes, or a marker allele associated with the R gene, wherein each R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; and selecting one or more of the plants, seeds, tissues or germplasm having the R gene associated with increased resistance to plant disease.
[0010] In another example, the foregoing methods of identifying and/or selecting plant materials with increased resistance to plant disease can include obtaining a nucleic acid sample from one or more plants, seeds, tissues or germplasm, each sample being representative of a plurality (e.g., a population) of plants, seeds, tissues or germplasm; screening each sample for the presence of one or more of the foregoing R genes, a QTL comprising one or more of the foregoing R genes, or a marker allele associated with the foregoing R genes, wherein each R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; and selecting one or more pluralities of plants, seeds, tissues or germplasm, wherein the representative sample for the selected plurality has the one or more of the foregoing R genes, a QTL comprising one or more of the foregoing R genes, or a marker allele associated with the foregoing R genes.
[0011] The foregoing methods of identifying and/or selecting plants can further include crossing at least one of the selected plants comprising an R gene to a second plant that does not have the R gene, thereby producing a progeny plant whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478. In a further example, the second plant is one of a plant line (a “recurrent parent line”) and the method further includes crossing the progeny plant with another plant of the recurrent parent line to produce a second-generation progeny whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478. Optionally, the second-generation progeny can be crossed with the recurrent parent line to produce a third-generation progeny whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478. This process can be repeated three, four, five, six, seven, or more times, such that each subsequent generation progeny is crossed with the recurrent parent line, thereby introgressing the R gene into the recurrent parent line.
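As a purely illustrative aside on the repeated backcrossing described in the preceding paragraph, the standard expectation from Mendelian segregation is that each cross to the recurrent parent halves the remaining donor genome fraction on average. The sketch below computes that expectation; the function name `expected_recurrent_fraction` is hypothetical, and the calculation ignores marker-assisted selection and linkage drag around the introgressed R gene, so it is an approximation rather than part of the disclosed methods.

```python
def expected_recurrent_fraction(n_backcrosses):
    """Expected average fraction of the genome derived from the
    recurrent parent after the initial cross plus n_backcrosses
    backcrosses, assuming no selection and unlinked loci."""
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


# 0 = progeny of the initial cross, 1 = second-generation progeny, etc.
fractions = [expected_recurrent_fraction(n) for n in range(4)]
```

Under these assumptions the first progeny carry half of the recurrent parent genome on average, and six further backcrosses bring the expectation above 99 percent.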
[0012] In an alternative method, a plant having the one or more R genes is crossed with a second plant to produce progeny plants. The progeny plants are screened for a QTL or marker allele associated with the R gene in accordance with the methods disclosed herein. Generally, such screening includes obtaining a nucleic acid sample from each of the progeny plants and screening the sample for the presence of nucleic acid comprising one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; thereby identifying novel progeny plants comprising one of the R genes disclosed herein.
[0013] In another aspect, methods are provided that include expressing in a plant material a heterologous nucleic acid capable of increasing plant disease resistance. The method can include introducing into the plant material a nucleic acid sequence, e.g., by transgenic modification or genome editing approaches. In some examples, the plant material is susceptible to plant disease prior to introducing the heterologous nucleic acid. For example, the genome of a plant (e.g., a plant that is susceptible to disease) is altered by transgenic modification or gene editing to include one or more R genes that (i) have at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478. In some examples, the transgenic or genome edited plant materials provide increased resistance to a plant disease relative to otherwise isogenic plant materials lacking the R gene introduced by transgenic or gene edited modification.
[0014] Provided herein is an isolated construct or a method of introducing such a construct into a plant material. The construct comprises a nucleic acid that is heterologous to the plant material, and the heterologous nucleic acid comprises an R gene sequence that (i) comprises at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encodes a polypeptide comprising at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478. In some examples the construct is a recombinant construct in which the foregoing R gene sequence is operably linked to at least one heterologous regulatory sequence. Thus provided herein is a method of introducing the foregoing construct comprising an R gene sequence into plant material wherein, for example, the construct comprises the R gene sequence operably linked to its native promoter and the construct is introduced into a heterologous genomic locus that did not comprise the R gene prior to the construct being introduced. In another example, the foregoing construct is a recombinant expression construct that comprises the R gene operably linked to a heterologous promoter; and the method comprises introducing the recombinant expression construct into the plant material. Also provided are plant materials, such as a plant (e.g. a maize plant), plant cell, plant tissue, seed, or germplasm thereof, comprising the isolated construct disclosed herein.
[0015] Methods for introducing constructs and plant promoters are described in more detail herein. For example, the methods embodied by the present disclosure relate to a method for transforming a host cell, which can be a plant cell. The method comprises transforming the host or plant cell with the isolated construct disclosed herein. The method can further include producing a plant by transforming a plant cell with a construct of the present disclosure and regenerating a plant from the transformed plant cell, thereby producing a plant having the R gene disclosed herein. In some examples, the regenerated plant has improved plant disease resistance, as compared to an isogenic plant lacking the R gene.
[0016] In some embodiments, the compositions and methods relate to plant material modified to include an R gene, the plant material having increased resistance to a plant disease, wherein prior to modification the plant material lacked the R gene. The plant material is modified, e.g., by mutagenesis, transgene insertion, or gene editing, to include a nucleotide sequence (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
[0017] In another aspect, provided herein is a method of generating a variant of an R gene, by gene shuffling one or more nucleotide sequences comprising an R gene (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478. Variants are then transiently or stably expressed in plant material and tested for whether they provide increased resistance to plant disease. For example, one or more variants can be incorporated into construct(s), and the construct(s) can be introduced into a regenerable plant cell; and a plant comprising the variant(s) construct can be regenerated from the plant cell. Plants containing the variant(s) can be evaluated for their tolerance/susceptibility to gray leaf spot. Plants having a variant that provides increased resistance to plant disease, relative to an isogenic plant that lacks the variant construct, can be selected. The plant can be maize, or the plant can be Arabidopsis, soybean, sunflower, sorghum, canola, wheat, alfalfa, cotton, rice, barley, millet, sugar cane, or switchgrass.
[0018] Also provided is a method of identifying an allelic variant of an R gene disclosed herein. The method comprises obtaining a plurality of plants each of which exhibits differing levels of plant disease resistance; screening nucleic acid samples from each plant for the presence of allelic variations in the R gene sequence (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95% identity, at least 96% identity, at least 97% identity, at least 98% identity, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478; evaluating the variations for genetic linkage to altered tolerance/susceptibility to the plant disease; and identifying one or more allelic variations associated with increased resistance to plant disease.
[0019] The disclosure also provides plants identified, selected, or created using any of the methods presented herein.
[0020] In each of the foregoing aspects and examples of the disclosure, the R gene can be associated with increased disease resistance to a plant disease. As used herein, a plant disease includes bacterial leaf blight and stalk rot; bacterial leaf spot; bacterial stripe; chocolate spot; Goss's bacterial wilt and blight; holcus spot; purple leaf sheath; seed rot-seedling blight; bacterial wilt; corn stunt; anthracnose leaf blight; gray leaf spot; aspergillus ear and kernel rot; banded leaf and sheath spot; black bundle disease; black kernel rot; borde blanco; brown spot; black spot; stalk rot; cephalosporium kernel rot; charcoal rot; corticium ear rot; curvularia leaf spot; didymella leaf spot; diplodia ear rot and stalk rot; seed rot; corn seedling blight; diplodia leaf spot or leaf streak; downy mildews; brown stripe downy mildew; crazy top downy mildew; green ear downy mildew; graminicola downy mildew; java downy mildew; Philippine downy mildew; sorghum downy mildew; spontaneum downy mildew; sugarcane downy mildew; dry ear rot; ergot; horse's tooth; corn eyespot; fusarium ear and stalk rot; fusarium blight; seedling root rot; gibberella ear and stalk rot; gray ear rot; gray leaf spot; cercospora leaf spot; helminthosporium root rot; hormodendrum ear rot; cladosporium rot; hyalothyridium leaf spot; late wilt; northern leaf blight; white blast; crown stalk rot; corn stripe; northern leaf spot; helminthosporium ear rot; penicillium ear rot; corn blue eye; blue mold; phaeocytostroma stalk rot and root rot; phaeosphaeria leaf spot; physalospora ear rot; botryosphaeria ear rot; pyrenochaeta stalk rot and root rot; pythium root rot; pythium stalk rot; red kernel disease;
rhizoctonia ear rot; sclerotial rot; rhizoctonia root rot and stalk rot; rostratum leaf spot; common corn rust; southern corn rust; tropical corn rust; sclerotium ear rot; southern blight; selenophoma leaf spot; sheath rot; shuck rot; silage mold; common smut; false smut; head smut; southern corn leaf blight and stalk rot; southern leaf spot; tar spot; trichoderma ear rot and root rot; white ear rot, root and stalk rot; yellow leaf blight; zonate leaf spot; american wheat striate (wheat striate mosaic); barley stripe mosaic; barley yellow dwarf; brome mosaic; cereal chlorotic mottle; lethal necrosis (maize lethal necrosis disease); cucumber mosaic; johnsongrass mosaic; maize bushy stunt; maize chlorotic dwarf; maize chlorotic mottle; maize dwarf mosaic; maize leaf fleck; maize pellucid ringspot; maize rayado fino; maize red leaf and red stripe; maize red stripe; maize ring mottle; maize rough dwarf; maize sterile stunt; maize streak; maize stripe; maize tassel abortion; maize vein enation; maize wallaby ear; maize white leaf; maize white line mosaic; millet red leaf; and northern cereal mosaic.
BRIEF DESCRIPTION OF THE FIGURES
[0021] Fig. 1 is a bar chart showing the distribution of different domain architectures involving canonical NLR domains in maize; abundance of indicated domain architectures is shown for the sum of all 26 maize NAM lines. CC: coiled-coil, NB: NB-ARC, LRR: leucine-rich repeat, TIR: Toll/interleukin 1 receptor, RPW8: RPW8-like coiled-coil.
[0022] Fig. 2 is a bar chart showing the distribution of integrated domains; atypical integrated domains identified via HMMer searches of NLR proteins are shown for all 26 NAM genomes.
[0023] Fig. 3 is a graph showing the average Shannon entropy across different NLR features of a composite constructed by averaging the entropy of all 158 NLR clusters.
DETAILED DESCRIPTION
[0024] As used herein the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a cell” includes a plurality of such cells and reference to “the protein” includes reference to one or more proteins and equivalents thereof, and so forth. All technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs unless clearly indicated otherwise.
[0025] A gene or allele is “associated with” a trait when it is part of or linked to a DNA sequence or allele that affects the expression of the trait. The presence of the allele is an indicator of how the trait will be expressed.
[0026] As used herein, “disease resistant”, “increased plant disease resistance”, “increased resistance to plant disease”, “plant disease resistance” and the like refer to a plant showing increased resistance to a disease compared to a control plant, e.g., a control plant can be one that lacks the QTL or R gene that provides disease resistance but is otherwise isogenic to the disease resistant plant. Disease resistance may manifest in fewer and/or smaller lesions, increased plant health, increased yield, increased root mass, increased plant vigor, less or no discoloration, increased growth, reduced necrotic area, or reduced wilting. In some embodiments, an R gene or variant disclosed herein may show resistance to one or more diseases.
[0027] A plant having disease resistance may have 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% increased resistance to a disease compared to a control plant. In some embodiments, a plant may have 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% increased plant health in the presence of a disease compared to a control plant. In some embodiments, a plant comprising
[0028] As used herein, the term “chromosomal interval” designates a contiguous linear span of genomic DNA that resides in planta on a single chromosome. The genetic elements or genes located on a single chromosomal interval are physically linked. The size of a chromosomal interval is not particularly limited. In some aspects, the genetic elements located within a single chromosomal interval are genetically linked, typically with a genetic recombination distance of, for example, less than or equal to 20 cM, or alternatively, less than or equal to 10 cM. That is, two genetic elements within a single chromosomal interval undergo recombination at a frequency of less than or equal to 20% or less than or equal to 10%.
[0029] The term “crossed” or “cross” refers to a sexual cross and involves the fusion of two haploid gametes via pollination to produce diploid progeny (e.g., cells, seeds or plants). The term encompasses both the pollination of one plant by another and selfing (or self-pollination, e.g., when the pollen and ovule are from the same plant).
[0030] An “elite line” is any line that has resulted from breeding and selection for superior agronomic performance.
[0031] A “favorable allele” is the allele at a particular locus (a marker, a QTL, a gene etc.) that confers, or contributes to, an agronomically desirable phenotype, e.g., disease resistance, and that allows the identification of plants with that agronomically desirable phenotype. A favorable allele of a marker is a marker allele that segregates with the favorable phenotype.
[0032] “Genetic markers” are nucleic acids that are polymorphic in a population and the alleles of which can be detected and distinguished by one or more analytic methods, e.g., RFLP, AFLP, isozyme, SNP, SSR, and the like. The term also refers to nucleic acid sequences complementary to the genomic sequences, such as nucleic acids used as probes. Markers corresponding to genetic polymorphisms between members of a population can be detected by
methods well-established in the art. These include, e.g., PCR-based sequence specific amplification methods, detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), detection of amplified variable sequences of the plant genome, detection of self-sustained sequence replication, detection of simple sequence repeats (SSRs), detection of single nucleotide polymorphisms (SNPs), or detection of amplified fragment length polymorphisms (AFLPs). Well established methods are also known for the detection of expressed sequence tags (ESTs) and SSR markers derived from EST sequences and randomly amplified polymorphic DNA (RAPD).
[0033] “Germplasm” refers to genetic material of or from an individual (e.g., a plant), a group of individuals (e.g., a plant line, variety or family), or a clone derived from a line, variety, species, or culture, or more generally, all individuals within a species or for several species (e.g., maize germplasm collection or Andean germplasm collection). The germplasm can be part of an organism, cell, or can be separate from the organism or cell. In general, germplasm provides genetic material with a specific molecular makeup that provides a physical foundation for some or all of the hereditary qualities of an organism or cell culture. As used herein, germplasm includes cells, seed or tissues from which new plants may be grown, or plant parts, such as leaves, stems, pollen, or cells, that can be cultured into a whole plant.
[0034] A “haplotype” is the genotype of an individual at a plurality of genetic loci, i.e. a combination of alleles. Typically, the genetic loci described by a haplotype are physically and genetically linked, i.e., on the same chromosome segment.
[0035] The term “heterogeneity” is used to indicate that individuals within the group differ in genotype at one or more specific loci.
[0036] The heterotic response of material, or “heterosis”, can be defined by performance which exceeds the average of the parents (or high parent) when crossed to other dissimilar or unrelated groups.
[0037] A “heterotic group” comprises a set of genotypes that perform well when crossed with genotypes from a different heterotic group (Hallauer et al. (1998) Corn breeding, p. 463- 564. In G.F. Sprague and J.W. Dudley (ed.) Corn and corn improvement). Inbred lines are classified into heterotic groups, and are further subdivided into families within a heterotic group, based on several criteria such as pedigree, molecular marker-based associations, and performance in hybrid combinations (Smith et al. (1990) Theor. Appl. Gen. 80:833-840). The two most widely used heterotic groups in the United States are referred to as “Iowa Stiff Stalk
Synthetic” (also referred to herein as “stiff stalk”) and “Lancaster” or “Lancaster Sure Crop” (sometimes referred to as NSS, or non-Stiff Stalk).
[0038] Some heterotic groups possess the traits needed to be a female parent, and others, traits for a male parent. For example, in maize, yield results from public inbreds released from a population called BSSS (Iowa Stiff Stalk Synthetic population) have resulted in these inbreds and their derivatives becoming the female pool in the central Corn Belt. BSSS inbreds have been crossed with other inbreds, e.g. SD 105 and Maiz Amargo, and this general group of materials has become known as Stiff Stalk Synthetics (SSS) even though not all of the inbreds are derived from the original BSSS population (Mikel and Dudley 2006 Crop Sci 46:1193-1205). By default, all other inbreds that combine well with the SSS inbreds have been assigned to the male pool, which for lack of a better name has been designated as NSS, i.e. Non-Stiff Stalk. This group includes several major heterotic groups such as Lancaster Surecrop, Iodent, and Leaming Corn.
[0039] The term “homogeneity” indicates that members of a group have the same genotype at one or more specific loci.
[0040] The term “hybrid” refers to the progeny obtained between the crossing of at least two genetically dissimilar parents.
[0041] The term “inbred” refers to a line that has been bred for genetic homogeneity.
[0042] The term “introgression” refers to the transmission of a desired allele of a genetic locus from one genetic background to another. For example, introgression of a desired R gene allele at a specified locus can be transmitted to at least one progeny via a sexual cross between two parents of the same species, where at least one of the parents has the desired allele in its genome. Alternatively, for example, transmission of an allele can occur by recombination between two donor genomes, e.g., in a fused protoplast, where at least one of the donor protoplasts has the desired allele in its genome. The desired allele can be, e.g., detected by a marker that is associated with a phenotype, at a QTL, a transgene, or the like. Offspring comprising the desired allele may be repeatedly backcrossed to a line having a desired genetic background and selected for the desired allele, to result in the allele becoming fixed in a selected genetic background.
[0043] The process of “introgressing” is often referred to as “backcrossing” when the process is repeated two or more times.
[0044] A “line” or “strain” is a group of individuals of identical parentage that are generally inbred to some degree and that are generally homozygous and homogeneous at most loci
(isogenic or near isogenic). A “subline” refers to an inbred subset of descendents that are genetically distinct from other similarly inbred subsets descended from the same progenitor.
[0045] As used herein, the term “linked” or “linkage” is used to describe the degree with which one marker locus is associated with another marker locus or some other locus. The linkage relationship between a molecular marker and a locus affecting a phenotype is given as a “probability” or “adjusted probability”. Linkage can be expressed as a desired limit or range. For example, in some embodiments, any marker is linked (genetically and physically) to any other marker when the markers are separated by less than 50, 40, 30, 25, 20, or 15 map units (or cM) of a single meiosis map (a genetic map based on a population that has undergone one round of meiosis, such as e.g. an F2; the IBM2 maps consist of multiple rounds of meiosis). In some aspects, it is advantageous to define a bracketed range of linkage, for example, between 10 and 20 cM, between 10 and 30 cM, or between 10 and 40 cM. The more closely a marker is linked to a second locus, the better an indicator for the second locus that marker becomes. The phrase “closely linked”, in the present application, means that recombination between two linked loci occurs with a frequency of equal to or less than about 10% (i.e., are separated on a genetic map by not more than 10 cM). Put another way, the closely linked loci co-segregate at least 90% of the time. Marker loci are especially useful with respect to the subject matter of the current disclosure when they demonstrate a significant probability of co-segregation (linkage) with a desired trait (e.g., increased resistance to plant disease). 
Thus, “closely linked” loci such as a marker locus and a second locus can display an inter-locus recombination frequency of 10% or less, preferably about 9% or less, still more preferably about 8% or less, yet more preferably about 7% or less, still more preferably about 6% or less, yet more preferably about 5% or less, still more preferably about 4% or less, yet more preferably about 3% or less, and still more preferably about 2% or less. In highly preferred embodiments, the relevant loci display a recombination frequency of about 1% or less, e.g., about 0.75% or less, more preferably about 0.5% or less, or yet more preferably about 0.25% or less. Two loci that are localized to the same chromosome, and at such a distance that recombination between the two loci occurs at a frequency of less than 10% (e.g., about 9 %, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.75%, 0.5%, 0.25%, or less) are also said to be “in proximity to” each other. Since one cM is the distance between two markers that show a 1% recombination frequency, any marker is closely linked (genetically and physically) to any other marker that is in close proximity, e.g., at or less than 10 cM distant. Two closely linked markers on the same chromosome can be positioned 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.75, 0.5 or 0.25 cM or less from each other. In some cases, two different markers can have the same genetic map coordinates. In that
case, the two markers are in such close proximity to each other that recombination occurs between them with such low frequency that it is undetectable.
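The thresholds in this definition can be expressed as a short sketch. It assumes recombination frequency is estimated simply as recombinant progeny divided by total progeny, uses the 10% cutoff given above for “closely linked”, and uses the conventional 50% bound for linkage in general; the function names are hypothetical.

```python
def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of progeny that are recombinant between two loci."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    return recombinants / total


def classify_linkage(rf: float, close_threshold: float = 0.10) -> str:
    """Classify two loci from their recombination frequency (0 to 0.5).

    close_threshold defaults to 10%, i.e. about 10 cM for short distances,
    where 1 cM corresponds to roughly 1% recombination.
    """
    if rf <= close_threshold:
        return "closely linked"
    if rf < 0.5:
        return "linked"
    return "unlinked"
```

For example, 8 recombinants among 100 progeny gives a recombination frequency of 0.08, which falls in the “closely linked” class; tightening `close_threshold` to 0.01 reproduces the stricter 1% embodiment.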
[0046] The term “linkage disequilibrium” refers to a non-random segregation of genetic loci or traits (or both). In either case, linkage disequilibrium implies that the relevant loci are within sufficient physical proximity along a length of a chromosome so that they segregate together with greater than random (i.e., non-random) frequency. Markers that show linkage disequilibrium are considered linked. Linked loci co-segregate more than 50% of the time, e.g., from about 51% to about 100% of the time. In other words, two markers that co-segregate have a recombination frequency of less than 50% (and by definition, are separated by less than 50 cM on the same linkage group.) As used herein, linkage can be between two markers, or alternatively between a marker and a locus affecting a phenotype. A marker locus can be “associated with” (linked to) a trait. The degree of linkage of a marker locus and a locus affecting a phenotypic trait is measured, e.g., as a statistical probability of co-segregation of that molecular marker with the phenotype (e.g., an F statistic or LOD score).
[0047] Linkage disequilibrium is most commonly assessed using the measure r², which is calculated using the formula described by Hill, W.G. and Robertson, A., Theor. Appl. Genet. 38:226-231 (1968). When r² = 1, complete LD exists between the two marker loci, meaning that the markers have not been separated by recombination and have the same allele frequency. The r² value will be dependent on the population used. Values for r² above 1/3 indicate sufficiently strong LD to be useful for mapping (Ardlie et al. 2002 Nature Reviews Genetics 3:299-309). Hence, alleles are in linkage disequilibrium when r² values between pairwise marker loci are greater than or equal to 0.33, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0.
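For reference, the squared-correlation measure of linkage disequilibrium is usually written as follows for two biallelic loci. The symbols are assumptions introduced here for illustration: p_A and p_B are the frequencies of one allele at each locus, and p_AB is the frequency of the haplotype carrying both.

```latex
D = p_{AB} - p_A\,p_B,
\qquad
r^2 = \frac{D^2}{p_A\,(1 - p_A)\,p_B\,(1 - p_B)}
```

Under this form, r^2 reaches 1 only when two of the four possible haplotypes are absent, which requires p_A = p_B and is consistent with the statement that the markers then have the same allele frequency.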
[0048] As used herein, “linkage equilibrium” describes a situation where two markers independently segregate, i.e., sort among progeny randomly. Markers that show linkage equilibrium are considered unlinked (whether or not they lie on the same chromosome).
[0049] A “locus” is a position on a chromosome, e.g. where a nucleotide, gene, sequence, or marker is located.
[0050] The “logarithm of odds (LOD) value” or “LOD score” (Risch, 1992 Science 255(5046):803-804) is used in genetic interval mapping to describe the degree of linkage between two marker loci. A LOD score of three between two markers indicates that linkage is 1000 times more likely than no linkage, while a LOD score of two indicates that linkage is 100 times more likely than no linkage. LOD scores greater than or equal to two may be used to detect linkage. LOD scores can also be used to show the strength of association between marker loci and quantitative traits in “quantitative trait loci” mapping. In this case, the LOD
score’s size is dependent on the closeness of the marker locus to the locus affecting the quantitative trait, as well as the size of the quantitative trait effect.
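The relationship between a LOD score and the odds of linkage can be sketched in a few lines. The example assumes the simplest two-point case, in which each of `total` informative meioses is scored as recombinant or non-recombinant and the likelihood at a candidate recombination fraction theta is compared with the likelihood at 0.5 (no linkage); the function names are hypothetical, and multipoint or QTL LOD scores are computed differently.

```python
import math


def lod_score(recombinants: int, total: int, theta: float) -> float:
    """Two-point LOD: log10 of L(theta) / L(0.5) for phase-known meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    non_recombinants = total - recombinants
    # Log-likelihood under linkage at recombination fraction theta.
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    # Log-likelihood under free recombination (theta = 0.5).
    log_unlinked = total * math.log10(0.5)
    return log_linked - log_unlinked


def odds_of_linkage(lod: float) -> float:
    """Convert a LOD score to odds in favor of linkage (LOD 3 -> 1000:1)."""
    return 10.0 ** lod
```

With 2 recombinants in 40 meioses evaluated at theta = 0.05, the score is well above the threshold of two mentioned above, whereas evaluating any data at theta = 0.5 gives a LOD of zero by construction.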
[0051] The term “plant material” includes whole plants, plant cells, plant protoplast, plant cell or tissue culture from which plants can be regenerated, plant calli, plant clumps and plant cells that are intact in plants, or parts of plants, such as seeds, flowers, cotyledons, leaves, stems, buds, roots, root tips and the like. As used herein, a “modified plant” means any plant that has a genetic change due to human intervention. A modified plant may have genetic changes introduced through plant transformation, genome editing, mutagenesis, or conventional plant breeding.
[0052] A “marker” is a means of finding a position on a genetic or physical map, or else linkages among markers and trait loci (loci affecting traits). The position that the marker detects may be known via detection of polymorphic alleles and their genetic mapping, or else by hybridization, sequence match or amplification of a sequence that has been physically mapped. A marker can be a DNA marker (detects DNA polymorphisms), a protein (detects variation at an encoded polypeptide), or a simply inherited phenotype (such as the ‘waxy’ phenotype). A DNA marker can be developed from genomic nucleotide sequence or from expressed nucleotide sequences (e.g., from a spliced RNA or a cDNA). Depending on the DNA marker technology, the marker may consist of primers complementary to sequence flanking the locus and/or probes that hybridize to polymorphic alleles at the locus. A DNA marker, or a genetic marker, may also be used to describe the gene, DNA sequence or nucleotide on the chromosome itself (rather than the components used to detect the gene or DNA sequence) and is often used when that DNA marker is associated with a particular trait in human genetics (e.g. a marker for breast cancer). The term marker locus is the locus (gene, sequence or nucleotide) that the marker detects.
[0053] Markers can be defined by the type of polymorphism that they detect and also the marker technology used to detect the polymorphism. Marker types include but are not limited to, e.g., detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLPs), detection of simple sequence repeats (SSRs), detection of amplified variable sequences of the plant genome, detection of self-sustained sequence replication, or detection of single nucleotide polymorphisms (SNPs). SNPs can be detected e.g. via DNA sequencing, PCR-based sequence specific amplification methods, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), dynamic allele-specific hybridization (DASH), molecular beacons, microarray hybridization, oligonucleotide ligase assays, Flap
endonucleases, 5’ endonucleases, primer extension, single strand conformation polymorphism (SSCP) or temperature gradient gel electrophoresis (TGGE). DNA sequencing, such as pyrosequencing technology, has the advantage of being able to detect a series of linked SNP alleles that constitute a haplotype. Haplotypes tend to be more informative (detect a higher level of polymorphism) than SNPs.
[0054] “Marker assisted selection” (or MAS) is a process by which individual plants are selected based on marker genotypes. “Marker assisted counter-selection” is a process by which marker genotypes are used to identify plants that will not be selected, allowing them to be removed from a breeding program or planting. A “marker haplotype” refers to a combination of alleles at a marker locus.
[0055] The term “molecular marker” may be used to refer to a genetic marker, as defined above, or an encoded product thereof (e.g., a protein) used as a point of reference when identifying a linked locus. A molecular marker can be derived from genomic nucleotide sequences or from expressed nucleotide sequences (e.g., from a spliced RNA, a cDNA, etc.), or from an encoded polypeptide. The term also refers to nucleic acid sequences complementary to or flanking the marker sequences, such as nucleic acids used as probes or primer pairs capable of amplifying the marker sequence. A “molecular marker probe” is a nucleic acid sequence or molecule that can be used to identify the presence of a marker locus, e.g., a nucleic acid probe that is complementary to a marker locus sequence. Alternatively, in some aspects, a marker probe refers to a probe of any type that is able to distinguish (i.e., genotype) the particular allele that is present at a marker locus. Nucleic acids are “complementary” when they specifically hybridize in solution. Some of the markers described herein are also referred to as hybridization markers when located on an indel region, such as the non-collinear region described herein. This is because the insertion region is, by definition, a polymorphism vis a vis a plant without the insertion. Thus, the marker need only indicate whether the indel region is present or absent. Any suitable marker detection technology may be used to identify such a hybridization marker, e.g. SNP technology is used in the examples provided herein.
[0056] The term “phenotype”, “phenotypic trait”, or “trait” can refer to the observable expression of a gene or series of genes. The phenotype can be observable to the naked eye, or by any other means of evaluation, e.g., weighing, counting, measuring (length, width, angles, etc.), microscopy, biochemical analysis, or an electromechanical assay. In some cases, a phenotype is directly controlled by a single gene or genetic locus, i.e., a “single gene trait” or a “simply inherited trait”. In the absence of large levels of environmental variation, single gene traits can segregate in a population to give a “qualitative” or “discrete” distribution, i.e. the
phenotype falls into discrete classes. In other cases, a phenotype is the result of several genes and can be considered a “multigenic trait” or a “complex trait”. Multigenic traits segregate in a population to give a “quantitative” or “continuous” distribution, i.e. the phenotype cannot be separated into discrete classes. Both single gene and multigenic traits can be affected by the environment in which they are being expressed, but multigenic traits tend to have a larger environmental component.
[0057] A “polymorphism” is a variation in the DNA between two or more individuals within a population. A polymorphism preferably has a frequency of at least 1% in a population. A useful polymorphism can include a single nucleotide polymorphism (SNP), a simple sequence repeat (SSR), or an insertion/deletion polymorphism, also referred to herein as an “indel”.
[0058] The term “quantitative trait locus” or “QTL” refers to a region of DNA that is associated with the differential expression of a quantitative phenotypic trait in at least one genetic background, e.g., in at least one breeding population. The region of the QTL encompasses or is closely linked to the gene or genes that affect the trait in question.
[0059] A “reference sequence” or a “consensus sequence” is a defined sequence used as a basis for sequence comparison. The reference sequence for a marker is obtained by sequencing a number of lines at the locus, aligning the nucleotide sequences in a sequence alignment program (e.g. Sequencher), and then obtaining the most common nucleotide sequence of the alignment. Polymorphisms found among the individual sequences are annotated within the consensus sequence. A reference sequence is not usually an exact copy of any individual DNA sequence, but represents an amalgam of available sequences and is useful for designing primers and probes to polymorphisms within the sequence.
[0060] An “unfavorable allele” of a marker is a marker allele that segregates with the unfavorable plant phenotype, therefore providing the benefit of identifying plants that can be removed from a breeding program or planting.
[0061] The term “yield” refers to the productivity per unit area of a particular plant product of commercial value. Yield is affected by both genetic and environmental factors. “Agronomics,” “agronomic traits,” and “agronomic performance” refer to the traits (and underlying genetic elements) of a given plant variety that contribute to yield over the course of a growing season. Individual agronomic traits include emergence vigor, vegetative vigor, stress tolerance, disease resistance or tolerance, herbicide resistance, branching, flowering, seed set, seed size, seed density, standability, threshability and the like. Yield is, therefore, the final culmination of all agronomic traits.
[0062] NLR-genes. The NBS-LRR (“NLR”) group of R genes is the largest class of R genes discovered to date. In Arabidopsis thaliana, over 150 are predicted to be present in the genome (Meyers et al. 2003 Plant Cell, 15:809-834; Monosi et al. 2004 Theoretical and Applied Genetics, 109:1434-1447), while in rice, approximately 500 NLR genes have been predicted (Monosi 2004, supra). The NBS-LRR class of R genes is comprised of two subclasses. Class 1 NLR genes contain a TIR (Toll/Interleukin-1-like) domain at their N terminus and to date have only been found in dicots (Meyers 2003, supra; Monosi 2004, supra). Class 2 NBS-LRR genes contain either a coiled-coil domain or an (nt) domain at their N terminus (Bai et al. 2002 Genome Research, 12:1871-1884; Monosi 2004, supra; Pan et al. 2000 Journal of Molecular Evolution, 50:203-213). Class 2 NBS-LRR genes have been found in both dicot and monocot species (Bai 2002, supra; Meyers 2003, supra; Monosi 2004, supra; Pan 2000, supra).
[0063] The NBS domain of the gene appears to have a role in signaling in plant defense mechanisms (van der Biezen et al. 1998, Current Biology: CB, 8:R226-R227). The LRR region appears to be the region that interacts with the pathogen AVR products (Michelmore et al. 1998 Genome Res, 8:1113-1130; Meyers 2003, supra). This LRR region, in comparison with the NB-ARC (NBS) domain, is under a much greater selection pressure to diversify (Michelmore 1998, supra; Meyers 2003, supra; Palomino et al. 2002, Genome Research, 12:1305-1315). LRR domains are found in other contexts as well; these 20-29-residue motifs are present in tandem arrays in a number of proteins with diverse functions, such as hormone-receptor interactions, enzyme inhibition, cell adhesion and cellular trafficking. A number of recent studies revealed the involvement of LRR proteins in early mammalian development, neural development, cell polarization, regulation of gene expression and apoptosis signaling.
[0064] NLRs typically comprise a nucleotide-binding domain, a series of leucine-rich repeats, and an N-terminal region which may include a coiled-coil (CC), Toll/Interleukin-1 (TIR) or resistance to powdery mildew 8 (RPW8) domain (Shao et al. 2016 Plant Physiol, 170:2095-109). The relative abundance of these different N-terminal domains varies significantly across different plant species, with Zea mays possessing coiled-coil N-terminal domains almost exclusively. In addition to these canonical domains, NLRs occasionally harbor atypical integrated domains (IDs) (Sarris et al. 2016, BMC Biol 14:8). IDs may arise via rare recombination events which result in domains with high similarity to effector targets being integrated into NLR genes, which then detect the presence of effectors through direct interaction (Grund et al. 2019 Plant Physiol 179:1227-1235). Depending on their domain complement, NLRs can detect the presence of pathogen effectors through (i) direct interaction of an effector
with canonical NLR domains, (ii) direct interaction of an effector with an integrated domain that mimics the effector’s host target or (iii) interaction with a host gene targeted by an effector (guardee) to detect alteration of its normal state by the pathogen (van der Hoorn and Kamoun, 2008 Plant Cell 20:2009-17; Cesari et al. 2014 Front Plant Sci, 5:606; Cesari 2018 New Phytol 219:17-24). The direct interaction mechanism of ID NLRs makes them particularly amenable to engineering, which has been shown to be able to expand the resistance spectrum of RGA5 in rice (Liu et al. 2021 Front Genet 12:694682). Additionally, “helper” NLRs typically contain all canonical NLR domains but, rather than functioning in direct detection of pathogen effectors, act to transmit the signal of a “sensor” NLR (Wu et al. 2017 Proc Natl Acad Sci USA 114:8113-8118). Helper NLRs have been found in a variety of plant species, and can be specific to a single sensor NLR, or interact with a wide variety of sensors (Saile et al. 2020 PLoS Biol 18:e3000783).
[0065] NLR complements of a variety of plant species have been identified, including a nearly comprehensive set of Arabidopsis and rice NLR complements (Van de Weyer et al. 2019 Cell 178:1260-1272.e14; Shang et al. 2022 Cell Res, 32:878-896). Studies in Arabidopsis and rice found that NLR presence/absence variation (PAV) largely takes place through the expansion and contraction of large physically compact clusters of NLRs in a few locations within the genome (Meyers et al. 2003 Plant Cell, 15:809-34; Jacob et al. 2013 Front Immunol, 4:297). These regions are thought to represent evolutionary hotspots, where different NLRs may rapidly recombine to generate new sequence diversity (van Wersch and Li 2019 Trends Plant Sci, 24:688-699).
[0066] At the protein level, early work in lettuce, as well as the more recent work in Arabidopsis, has shown that different domains of NLRs are subject to differential evolutionary pressures (Meyers et al. 1998 Plant Cell, 10:1833-46). NB-ARC domains, which are functional ATPases that control the activation states of NLRs, typically show high conservation, while LRRs and coiled-coil domains have much higher amino acid diversity (Qi et al. 2012 Plant Physiol, 158:1819-32). Additionally, there is evidence that sensor NLRs, which must keep pace with rapidly evolving pathogen effectors, may have much higher diversity than helper NLRs, which must maintain the ability to interact with multiple different immune system components (Wu et al. 2017 Proc Natl Acad Sci USA, 114:8113-8118). Knowledge about the array of NLRs throughout the maize genome is therefore key to identifying putative sensor NLRs which may be responsible for resistance to a given pathogen.
[0067] Markers and linkage relationships. A common measure of linkage is the frequency with which traits cosegregate. This can be expressed as a percentage of cosegregation (recombination frequency) or in centiMorgans (cM). The cM is a unit of measure of genetic
recombination frequency. One cM is equal to a 1% chance that a trait at one genetic locus will be separated from a trait at another locus due to crossing over in a single generation (meaning the traits segregate together 99% of the time). Because chromosomal distance is approximately proportional to the frequency of crossing over events between traits, there is an approximate physical distance that correlates with recombination frequency.
[0068] Marker loci are themselves traits and can be assessed according to standard linkage analysis by tracking the marker loci during segregation. Thus, one cM is equal to a 1% chance that a marker locus will be separated from another locus, due to crossing over in a single generation.
[0069] The closer a marker is to a gene (e.g., an R gene disclosed herein which (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739, (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478) controlling a trait of interest, the more effective and advantageous that marker is as an indicator for the desired trait. Closely linked loci display an inter-locus cross-over frequency of about 10% or less, preferably about 9% or less, still more preferably about 8% or less, yet more preferably about 7% or less, still more preferably about 6% or less, yet more preferably about 5% or less, still more preferably about 4% or less, yet more preferably about 3% or less, and still more preferably about 2% or less. In highly preferred embodiments, the relevant loci (e.g., a marker locus and a target locus) display a recombination frequency of about 1% or less, e.g., about 0.75% or less, more preferably about 0.5% or less, or yet more preferably about 0.25% or less. Thus, the loci are about 10 cM, 9 cM, 8 cM, 7 cM, 6 cM, 5 cM, 4 cM, 3 cM, 2 cM, 1 cM, 0.75 cM, 0.5 cM or 0.25 cM or less apart. Put another way, two loci that are localized to the same chromosome, and at such a distance that recombination between the two loci occurs at a frequency of less than 10% (e.g., about 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.75%, 0.5%, 0.25%, or less) are said to be “proximal to” each other.
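By way of illustration only, the relationship between map distance, recombination frequency and the “proximal to” threshold described above may be sketched in Python as follows. The function names are hypothetical and form no part of the disclosure; the direct equivalence of 1 cM to 1% recombination follows the definition given above and is assumed to hold only for short map distances.

```python
def recombination_percent(distance_cm: float) -> float:
    """Map distance to recombination frequency, using 1 cM = 1% (short distances only)."""
    return distance_cm * 1.0


def cosegregation_percent(distance_cm: float) -> float:
    """Percentage of meioses in which the two loci are expected to stay together."""
    return 100.0 - recombination_percent(distance_cm)


def is_proximal(recombination_pct: float, threshold_pct: float = 10.0) -> bool:
    """Two loci on the same chromosome are 'proximal to' each other when
    recombination between them occurs at a frequency below the threshold."""
    return recombination_pct < threshold_pct


# A marker 1 cM from the target locus co-segregates with it 99% of the time.
print(cosegregation_percent(1.0), is_proximal(recombination_percent(1.0)))
```

A tighter threshold (for example 1% or 0.25%) can be passed as `threshold_pct` to reflect the more preferred embodiments.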
[0070] Although particular marker alleles can co-segregate with the disease resistance trait, it is important to note that the marker locus is not necessarily responsible for the expression of the disease resistance phenotype. For example, it is not a requirement that the marker polynucleotide sequence be part of a gene that is responsible for the disease resistant phenotype (for example, is part of the gene open reading frame). The association between a specific marker allele and the disease resistance trait is due to the original “coupling” linkage phase between the marker allele and the allele in the ancestral line from which the allele originated. Eventually, with repeated recombination, crossing over events between the marker and genetic locus can change this orientation. For this reason, the favorable marker allele may change
depending on the linkage phase that exists within the parent having resistance to the disease that is used to create segregating populations. This does not change the fact that the marker can be used to monitor segregation of the phenotype. It only changes which marker allele is considered favorable in a given segregating population.
[0071] Marker assisted selection. Molecular markers can be used in a variety of plant breeding applications (e.g., see Staub et al. 1996 Hortscience 31:729-741; Tanksley 1983 Plant Molecular Biology Reporter 1:3-8). One of the main areas of interest is to increase the efficiency of backcrossing and introgressing genes using marker-assisted selection (MAS). A molecular marker that demonstrates linkage with a locus affecting a desired phenotypic trait provides a useful tool for the selection of the trait in a plant population. This is particularly true where the phenotype is hard to assay. Since DNA marker assays are less laborious and take up less physical space than field phenotyping, much larger populations can be assayed, increasing the chances of finding a recombinant with the target segment from the donor line moved to the recipient line. The closer the linkage, the more useful the marker, as recombination is less likely to occur between the marker and the gene causing the trait, which can result in false positives. Having flanking markers decreases the chances that false positive selection will occur as a double recombination event would be needed. In the most preferred case, a marker is located within the gene itself, so that recombination cannot occur between the marker and the gene. In some embodiments, the methods disclosed herein produce a marker in a disease resistance gene, wherein the gene was identified by inferring genomic location from clustering of conserved domains or a clustering analysis.
[0072] When a gene is introgressed by MAS, it is not only the gene that is introduced but also the flanking regions (Gepts 2002 Crop Sci 42:1780-1790). This is referred to as “linkage drag.” In the case where the donor plant is highly unrelated to the recipient plant, these flanking regions carry additional genes that may code for agronomically undesirable traits. Linkage drag may also result in reduced yield or other negative agronomic characteristics even after multiple cycles of backcrossing into the elite line. This is also sometimes referred to as “yield drag.” The size of the flanking region can be decreased by additional backcrossing, although this is not always successful, as breeders do not have control over the size of the region or the recombination breakpoints (Young et al. 1998 Genetics 120:579-585). In classical breeding it is usually only by chance that recombinations are selected that contribute to a reduction in the size of the donor segment (Tanksley et al. 1989 Biotechnology 7:257-264). Even after 20 backcrosses of this type, one may expect to find a sizeable piece of the donor chromosome still linked to the gene being selected. With markers however, it is possible to
select those rare individuals that have experienced recombination near the gene of interest. In 150 backcross plants, there is a 95% chance that at least one plant will have experienced a crossover within 1 cM of the gene, based on a single meiosis map distance. Markers will allow unequivocal identification of those individuals. With one additional backcross of 300 plants, there would be a 95% chance of a crossover within 1 cM single meiosis map distance of the other side of the gene, generating a segment around the target gene of less than 2 cM based on a single meiosis map distance. This can be accomplished in two generations with markers, while it would have required on average 100 generations without markers (See Tanksley et al., supra). When the exact location of a gene is known, flanking markers surrounding the gene can be utilized to select for recombinations in different population sizes. For example, in smaller population sizes, recombinations may be expected further away from the gene, so more distal flanking markers would be required to detect the recombination.
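By way of illustration only, the population sizes quoted above can be checked with a simple binomial calculation. The sketch below is not taken from the cited reference; it assumes that each backcross plant is an independent meiosis, that a 1 cM interval corresponds to a 1% crossover probability, and that the first backcross counts a crossover within 1 cM on either side of the gene (2% per plant) while the second counts only the remaining side (1% per plant). Under those assumptions both figures come out at roughly 95%.

```python
def prob_at_least_one_recombinant(n_plants: int, p_crossover: float) -> float:
    """Probability that at least one of n independent plants carries a
    crossover in an interval whose per-plant crossover probability is p."""
    return 1.0 - (1.0 - p_crossover) ** n_plants


# First backcross: crossover within 1 cM on either side of the gene (assumed 2%).
first_backcross = prob_at_least_one_recombinant(150, 0.02)

# Second backcross: crossover within 1 cM on the other side only (assumed 1%).
second_backcross = prob_at_least_one_recombinant(300, 0.01)

print(f"first backcross:  {first_backcross:.2f}")
print(f"second backcross: {second_backcross:.2f}")
```

The same function can be used to estimate how many plants would need to be genotyped to recover a recombinant in a smaller or larger interval.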
[0073] The key components to the implementation of MAS are: (i) defining the population within which the marker-trait association will be determined, which can be a segregating population, or a random or structured population; (ii) monitoring the segregation or association of polymorphic markers relative to the trait, and determining linkage or association using statistical methods; (iii) defining a set of desirable markers based on the results of the statistical analysis, and (iv) the use and/or extrapolation of this information to the current set of breeding germplasm to enable marker-based selection decisions to be made. The markers described in this disclosure, as well as other marker types such as SSRs and FLPs, can be used in marker assisted selection protocols.
[0074] SSRs can be defined as relatively short runs of tandemly repeated DNA with lengths of 6 bp or less (Tautz 1989 Nucleic Acids Research 17:6463-6471; Wang et al. 1994 Theoretical and Applied Genetics, 88:1-6). Polymorphisms arise due to variation in the number of repeat units, probably caused by slippage during DNA replication (Levinson and Gutman 1987 Mol Biol Evol 4:203-221). The variation in repeat length may be detected by designing PCR primers to the conserved non-repetitive flanking regions (Weber and May 1989 Am J Hum Genet. 44:388-396). SSRs are highly suited to mapping and MAS as they are multi-allelic, codominant, reproducible and amenable to high throughput automation (Rafalski et al. 1996 Generating and using DNA markers in plants. In: Non-mammalian genomic analysis: a practical guide. Academic Press, pp 75-135).
[0075] Various types of SSR markers can be generated, and SSR profiles can be obtained by gel electrophoresis of the amplification products. Scoring of marker genotype is based on the size of the amplified fragment.
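By way of illustration only, tandem repeats of the kind described above (repeat units of 6 bp or less) may be located in a sequence with a regular expression. The sketch below is a hypothetical helper, not a method of the disclosure; the minimum of four repeat units is an arbitrary assumption chosen for the example.

```python
import re


def find_ssrs(sequence: str, min_repeats: int = 4, max_motif_len: int = 6):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    The motif group is matched lazily so that the shortest repeating unit
    is reported (e.g. 'CA' rather than 'CACA').
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# A (CA)5 repeat flanked by non-repetitive sequence.
print(find_ssrs("GGCACACACACATT"))
```

Because alleles differ in the number of repeat units, two alleles amplified with the same flanking primers differ in fragment size by a multiple of the motif length, which is what is scored on a gel.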
[0076] Various types of FLP markers can also be generated. Most commonly, amplification primers are used to generate fragment length polymorphisms. Such FLP markers are in many ways similar to SSR markers, except that the region amplified by the primers is not typically a highly repetitive region. Still, the amplified region, or amplicon, will have sufficient variability among germplasm, often due to insertions or deletions, such that the fragments generated by the amplification primers can be distinguished among polymorphic individuals, and such indels are known to occur frequently in maize (Bhattramakki et al. 2002 Plant Mol Biol 48, 539-547; Rafalski 2002b, supra).
[0077] SNP markers detect single base pair nucleotide substitutions. Of all the molecular marker types, SNPs are the most abundant, thus having the potential to provide the highest genetic map resolution (Bhattramakki et al. 2002 Plant Molecular Biology 48:539-547). SNPs can be assayed at an even higher level of throughput than SSRs, in a so-called 'ultra-high-throughput' fashion, as SNPs do not require large amounts of DNA and automation of the assay may be straightforward. SNPs also have the promise of being relatively low-cost systems. These three factors together make SNPs highly attractive for use in MAS. Several methods are available for SNP genotyping, including but not limited to, hybridization, primer extension, oligonucleotide ligation, nuclease cleavage, minisequencing, and coded spheres. Such methods have been reviewed in: Gut 2001 Hum Mutat 17:475-492; Shi 2001 Clin Chem 47:164-172; Kwok 2000 Pharmacogenomics 1:95-100; and Bhattramakki and Rafalski 2001 Discovery and application of single nucleotide polymorphism markers in plants. In: R. J. Henry, Ed, Plant Genotyping: The DNA Fingerprinting of Plants, CABI Publishing, Wallingford. A wide range of commercially available technologies utilize these and other methods to interrogate SNPs, including Masscode™ (Qiagen), INVADER® (Third Wave Technologies) and Invader PLUS®, SNAPSHOT® (Applied Biosystems), TAQMAN® (Applied Biosystems) and BEADARRAYS® (Illumina).
[0078] A number of SNPs together within a sequence, or across linked sequences, can be used to describe a haplotype for any particular genotype (Ching et al. 2002, BMC Genet. 3:19; Gupta et al. 2001; Rafalski 2002b, Plant Science 162:329-333). Haplotypes can be more informative than single SNPs and can be more descriptive of any particular genotype. For example, a single SNP may be allele 'T' for a specific line or variety with disease resistance, but the allele 'T' might also occur in the breeding population being utilized for recurrent parents. In this case, a haplotype, e.g. a combination of alleles at linked SNP markers, may be more informative. Once a unique haplotype has been assigned to a donor chromosomal region, that haplotype can be used in that population or any subset thereof to determine whether an
individual has a particular gene. Using automated high throughput marker detection platforms makes this process highly efficient and effective.
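By way of illustration only, the advantage of a haplotype over a single SNP can be shown with a small Python sketch. The marker names, line names and allele calls below are invented for the example and do not correspond to any marker of the disclosure.

```python
# Hypothetical allele calls at three linked SNP markers.
donor = {"snp1": "T", "snp2": "G", "snp3": "A"}
recurrent_parents = {
    "RP1": {"snp1": "T", "snp2": "C", "snp3": "A"},
    "RP2": {"snp1": "C", "snp2": "G", "snp3": "A"},
}


def haplotype(genotype: dict, markers: list) -> tuple:
    """Ordered combination of alleles at the chosen linked markers."""
    return tuple(genotype[m] for m in markers)


def is_diagnostic(donor: dict, others: dict, markers: list) -> bool:
    """True when no other line shares the donor's haplotype at these markers."""
    donor_haplotype = haplotype(donor, markers)
    return all(haplotype(g, markers) != donor_haplotype for g in others.values())


# The 'T' allele at snp1 alone also occurs in RP1, so it is not diagnostic,
# whereas the two-marker haplotype (T, G) is unique to the donor.
print(is_diagnostic(donor, recurrent_parents, ["snp1"]))
print(is_diagnostic(donor, recurrent_parents, ["snp1", "snp2"]))
```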
[0079] Many of the markers presented herein can readily be used as single nucleotide polymorphic (SNP) markers to select for one or more R genes disclosed herein. Using PCR, the primers are used to amplify DNA segments from individuals (preferably inbred) that represent the diversity in the population of interest. The PCR products are sequenced directly in one or both directions. The resulting sequences are aligned and polymorphisms are identified. The polymorphisms are not limited to single nucleotide polymorphisms (SNPs), but also include indels, CAPS, SSRs, and VNTRs (variable number of tandem repeats). Specifically, with respect to the fine map information described herein, one can readily use the information provided herein to obtain additional polymorphic SNPs (and other markers) within the region amplified by the primers disclosed herein. Markers within the described map region can be hybridized to BACs or other genomic libraries, or electronically aligned with genome sequences, to find new sequences in the same approximate location as the described markers.
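By way of illustration only, the step of identifying polymorphisms in aligned amplicon sequences may be sketched as follows. The sketch assumes that the sequences have already been aligned to equal length by an external alignment tool, with “-” marking gaps; the line names and sequences are invented for the example.

```python
def find_polymorphisms(aligned: dict) -> list:
    """Return (position, kind, alleles) for every variable alignment column.

    `aligned` maps a line name to its gapped, aligned sequence.
    """
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("aligned sequences must all have the same length")

    sites = []
    for position, column in enumerate(zip(*aligned.values())):
        alleles = set(column)
        if len(alleles) > 1:
            kind = "indel" if "-" in alleles else "SNP"
            sites.append((position, kind, "".join(sorted(alleles))))
    return sites


# Hypothetical alignment of amplicons from three inbred lines.
alignment = {
    "line_A": "ACGTAC",
    "line_B": "ACGTAC",
    "line_C": "ACATA-",
}
print(find_polymorphisms(alignment))
```

Positions are reported as 0-based alignment columns; each variable column is a candidate for conversion into a SNP or indel marker.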
[0080] In addition to SSRs, FLPs and SNPs, as described above, other types of molecular markers are also widely used, including but not limited to expressed sequence tags (ESTs), SSR markers derived from EST sequences, randomly amplified polymorphic DNA (RAPD), and other nucleic acid based markers.
[0081] Isozyme profiles and linked morphological characteristics can, in some cases, also be indirectly used as markers. Even though they do not directly detect DNA differences, they are often influenced by specific genetic differences. However, markers that detect DNA variation are far more numerous and polymorphic than isozyme or morphological markers (Tanksley 1983 Plant Molecular Biology Reporter 1:3-8).
[0082] Sequence alignments or contigs may also be used to find sequences upstream or downstream of the specific markers listed herein. These new sequences, close to the markers described herein, are then used to discover and develop functionally equivalent markers. For example, different physical and/or genetic maps are aligned to locate equivalent markers not described within this disclosure but that are within similar regions. These maps may be within the species, or even across other species that have been genetically or physically aligned.
[0083] In general, MAS uses polymorphic markers that have been identified as having a significant likelihood of co-segregation with a trait conferred by the R gene disclosed herein (e.g., a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739, (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478). Such markers are presumed to map near a gene
or genes that give the plant its disease resistant phenotype, and are considered indicators for the desired trait, or markers. Plants are tested for the presence of a desired allele in the marker, and plants containing a desired genotype at one or more loci are expected to transfer the desired genotype, along with a desired phenotype, to their progeny. Thus, plants with one or more disease resistance R-genes disclosed herein may be selected for by detecting one or more marker alleles, and in addition, progeny plants derived from those plants can also be selected. Hence, a plant containing a desired genotype in a given chromosomal region (i.e. a genotype associated with disease resistance) is obtained and then crossed to another plant. The progeny of such a cross would then be evaluated genotypically using one or more markers and the progeny plants with the same genotype in a given chromosomal region would then be selected as having disease resistance.
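By way of illustration only, the selection step described above (genotyping progeny at one or more markers and retaining those with the desired genotype) may be sketched in Python. The marker names, favorable alleles and diploid genotype calls are invented for the example; requiring the favorable allele at both flanking markers is an assumption reflecting the use of flanking markers discussed earlier.

```python
# Hypothetical favorable alleles at two markers flanking the R gene.
favorable_alleles = {"marker_left": "G", "marker_right": "T"}

# Hypothetical diploid genotype calls for four progeny plants.
progeny = {
    "plant_1": {"marker_left": "GG", "marker_right": "TT"},
    "plant_2": {"marker_left": "AA", "marker_right": "TT"},
    "plant_3": {"marker_left": "GA", "marker_right": "TC"},
    "plant_4": {"marker_left": "GA", "marker_right": "CC"},
}


def carries_favorable_alleles(genotype: dict, favorable: dict) -> bool:
    """True when the plant carries the favorable allele at every marker."""
    return all(allele in genotype[marker] for marker, allele in favorable.items())


selected = [
    name
    for name, genotype in progeny.items()
    if carries_favorable_alleles(genotype, favorable_alleles)
]
print(selected)
```

Plants 2 and 4 carry the favorable allele at only one flanking marker, which is consistent with a recombination between the markers, and are therefore not selected in this sketch.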
[0084] The SNPs could be used alone or in combination (i.e. a SNP haplotype) to select for a favorable R gene allele associated with disease resistance. For example, a SNP haplotype can include a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 of the markers for the R gene disclosed herein. A SNP haplotype can also include a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 of such markers for one or more R genes disclosed herein.
[0085] The skilled artisan would expect that there might be additional polymorphic sites at marker loci in and around a chromosome marker identified by the methods disclosed herein, wherein one or more polymorphic sites is in linkage disequilibrium (LD) with an allele at one or more of the polymorphic sites in the haplotype and thus could be used in a marker assisted selection program to introgress a gene allele or genomic fragment of interest. Two particular alleles at different polymorphic sites are said to be in LD if the presence of the allele at one of the sites tends to predict the presence of the allele at the other site on the same chromosome (Stevens 1999 Mol. Diag. 4:309-17). The marker loci can be located within 5 cM, 2 cM, or 1 cM (on a single meiosis based genetic map) of the disease resistance trait QTL comprising an R-gene disclosed herein.
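By way of illustration only, the degree to which one allele predicts another can be quantified with the standard population-genetic LD statistics D and r². These statistics are not recited in the disclosure; the sketch below uses their conventional definitions, and the frequencies are invented for the example.

```python
def ld_statistics(p_ab: float, p_a: float, p_b: float):
    """Return (D, r_squared) for alleles A and B at two polymorphic sites.

    p_ab: frequency of chromosomes carrying both A and B
    p_a:  frequency of allele A at the first site
    p_b:  frequency of allele B at the second site
    """
    d = p_ab - p_a * p_b
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    r_squared = (d * d) / denominator if denominator > 0 else float("nan")
    return d, r_squared


# Complete LD: allele A is always found on the same chromosome as allele B.
print(ld_statistics(p_ab=0.5, p_a=0.5, p_b=0.5))

# No LD: the alleles occur together no more often than expected by chance.
print(ld_statistics(p_ab=0.25, p_a=0.5, p_b=0.5))
```

An r² near 1 indicates that the marker allele is a near-perfect proxy for the allele at the other site, whereas an r² near 0 indicates that it carries little predictive value.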
[0086] Allelic frequency (and hence, haplotype frequency) can differ from one germplasm pool to another. Germplasm pools vary due to maturity differences, heterotic groupings, geographical distribution, etc. As a result, SNPs and other polymorphisms may not be informative in some germplasm pools.
[0087] Proteins and Variants and Fragments Thereof. A “recombinant protein” is used herein to refer to a protein that is no longer in its natural environment, for example in vitro or in a recombinant bacterial or plant host cell; a protein that is expressed from a polynucleotide
that has been edited from its native version; or a protein that is expressed from a polynucleotide in a different genomic position relative to the native sequence. Provided herein are R-gene encoded polypeptides, including polypeptides having an amino acid sequence that has at least about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to one of SEQ ID NOs: 1740-3478. In some embodiments the sequence identity is against the full-length sequence of a polypeptide. The term “about” when used herein in context with percent sequence identity means +/- 1.0 percentage point, relative to the recited percentage.
[0088] “Substantially free of cellular material” as used herein refers to a polypeptide including preparations of protein having less than about 30%, 20%, 10% or 5% (by dry weight) of non-target protein (also referred to herein as a “contaminating protein”).
[0089] “Fragments” or “biologically active portions” include polypeptide or polynucleotide fragments comprising sequences sufficiently identical to an R gene or R gene encoded polypeptide disclosed herein, respectively, and that exhibit disease resistance when expressed in a plant.
[0090] “Variants” as used herein refers to proteins or polypeptides having an amino acid sequence that is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identical to the parental amino acid sequence, e.g., one of SEQ ID NOs: 1740-3478.
[0091] Methods for such manipulations are generally known in the art. For example, amino acid sequence variants of a polypeptide may be prepared by mutations in the DNA. This may also be accomplished by one of several forms of mutagenesis, such as for example site-specific double strand break technology, and/or directed evolution. In some aspects, the changes encoded in the amino acid sequence will not substantially affect the function of the protein. Such variants will possess the desired activity. However, it is understood that the ability of an R gene-encoded polypeptide to confer disease resistance may, in some cases, be improved by the use of such techniques upon the compositions of this disclosure.
[0092] Nucleic Acid Molecules and Variants and Fragments Thereof. Isolated or recombinant nucleic acid molecules comprising an R gene disclosed herein as well as active portions thereof, as well as nucleic acid molecules sufficient for use as hybridization probes to identify R genes by sequence homology are provided. As used herein, the term “nucleic acid molecule” refers to DNA molecules (e.g., recombinant DNA, cDNA, genomic DNA, plastid DNA, mitochondrial DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or
RNA generated using nucleotide analogs. In some examples, the nucleic acid molecule can be single-stranded. In some examples, the nucleic acid molecule can be double-stranded.
[0093] An “isolated” nucleic acid molecule (e.g., RNA or DNA) is used herein to refer to a nucleic acid sequence (e.g., RNA or DNA) that is no longer in its natural environment, for example in vitro. A “recombinant” nucleic acid molecule (e.g., RNA or DNA) is used herein to refer to a nucleic acid sequence (e.g., RNA or DNA) that is in a recombinant bacterial or plant host cell; has been edited from its native sequence; or is located in a different location than the native sequence. In some embodiments, an “isolated” or “recombinant” nucleic acid is free of sequences (preferably protein encoding sequences) that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For purposes of the disclosure, “isolated” or “recombinant” when used to refer to nucleic acid molecules excludes isolated chromosomes. For example, in various embodiments, the recombinant nucleic acid molecules can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleic acid sequences that naturally flank the R gene nucleic acid molecule in genomic DNA of the cell from which the R gene is derived.
[0094] In some embodiments, an isolated nucleic acid molecule comprising an R gene has one or more change in the nucleic acid sequence compared to the native or genomic nucleic acid sequence. In some embodiments, the change in the native or genomic nucleic acid sequence includes but is not limited to: changes in the nucleic acid sequence due to the degeneracy of the genetic code; changes in the nucleic acid sequence due to the amino acid substitution, insertion, deletion and/or addition compared to the native or genomic sequence; removal of one or more intron; deletion of one or more upstream or downstream regulatory regions; and deletion of the 5’ and/or 3’ untranslated region associated with the genomic nucleic acid sequence. In some embodiments, the nucleic acid molecule encoding one of SEQ ID NOs: 1740-3478 is a non-genomic sequence.
[0095] A variety of polynucleotides comprising an R gene disclosed herein are contemplated. Such polynucleotides are useful for production of encoded polypeptides in host cells when operably linked to a suitable promoter, transcription termination and/or polyadenylation sequences. Such polynucleotides are also useful as probes for isolating homologous or substantially homologous polynucleotides that are R genes or related to R genes disclosed herein.
[0096] Provided herein are nucleic acid molecules encoding one of SEQ ID NOs: 1740-3478, and variants, fragments and complements thereof. “Complement” is used herein to refer
to a nucleic acid sequence that is sufficiently complementary to a given nucleic acid sequence such that it can hybridize to the given nucleic acid sequence to thereby form a stable duplex. A reverse complement is a complement formed by exchanging each A with T, T with A, C with G, and G with C in a sequence and then reversing the 5’ to 3’ order of the exchanged sequence, such that the reverse complement of 5’-ACCTGAG-3’ is 5’-CTCAGGT-3’. “Polynucleotide sequence variants” is used herein to refer to a nucleic acid sequence that except for the degeneracy of the genetic code encodes the same polypeptide.
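By way of illustration only, the reverse complement operation defined above (exchange A with T, T with A, C with G and G with C, then reverse the order) may be sketched in Python. The helper name is hypothetical, and the sketch handles only the four unambiguous DNA bases.

```python
# Translation table exchanging each base for its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse the 5' to 3' order."""
    return sequence.translate(_COMPLEMENT)[::-1]


# The example given above: 5'-ACCTGAG-3' gives 5'-CTCAGGT-3'.
print(reverse_complement("ACCTGAG"))
```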
[0097] In some examples, the nucleic acid molecule comprising an R gene is a non-genomic nucleic acid sequence. As used herein a “non-genomic nucleic acid sequence” or “non-genomic nucleic acid molecule” or “non-genomic polynucleotide” refers to a nucleic acid molecule that has one or more change in the nucleic acid sequence compared to a native or genomic nucleic acid sequence. In some examples, the change to a native or genomic nucleic acid molecule includes but is not limited to: changes in the nucleic acid sequence due to the degeneracy of the genetic code; optimization of the nucleic acid sequence for expression in plants; changes in the nucleic acid sequence to introduce at least one amino acid substitution, insertion, deletion and/or addition compared to the native or genomic sequence; removal of one or more intron associated with the genomic nucleic acid sequence; insertion of one or more heterologous introns; deletion of one or more upstream or downstream regulatory regions associated with the genomic nucleic acid sequence; insertion of one or more heterologous upstream or downstream regulatory regions; deletion of the 5’ and/or 3’ untranslated region associated with the genomic nucleic acid sequence; insertion of a heterologous 5’ and/or 3’ untranslated region; and modification of a polyadenylation site. In some examples, the non-genomic nucleic acid molecule is a synthetic nucleic acid sequence.
[0098] In some examples, the nucleic acid molecule comprising an R gene disclosed herein is a non-genomic polynucleotide having a nucleotide sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identity, to a nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478, wherein the R gene can confer disease resistance activity when expressed in a plant.
[0099] In some examples, the nucleic acid molecule encodes a polypeptide variant comprising one or more amino acid substitutions relative to the amino acid sequence of one of SEQ ID NOs: 1740-3478.
[0100] Nucleic acid molecules that are fragments of these R gene nucleic acid sequences are also encompassed by the disclosure. “Fragment” as used herein refers to a portion of the nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478. A fragment of a nucleic acid sequence may encode a biologically active portion of one of SEQ ID NOs: 1740-3478 or it may be a fragment that can be used as a hybridization probe or PCR primer using methods disclosed below. Nucleic acid molecules that are fragments of a nucleic acid sequence encoding a polypeptide comprise at least about 150, 180, 210, 240, 270, 300, 330, 360, 400, 450, or 500 contiguous nucleotides or up to the number of nucleotides present in a full-length nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478 (e.g., one of SEQ ID NOs: 1-1739, respectively), depending upon the intended use. “Contiguous nucleotides” is used herein to refer to nucleotide residues that are immediately adjacent to one another. Fragments of the nucleic acid sequences will encode protein fragments that retain the biological activity of the R gene-encoded polypeptide and, hence, retain disease resistance. “Retains disease resistance” is used herein to refer to a polypeptide having at least about 10%, at least about 30%, at least about 50%, at least about 70%, 80%, 90%, 95% or higher of the disease resistance of the full-length R gene disclosed herein.
[0101] “Percent (%) sequence identity” with respect to a reference sequence (subject) is determined as the percentage of amino acid residues or nucleotides in a candidate sequence (query) that are identical with the respective amino acid residues or nucleotides in the reference sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any amino acid conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways, for instance, using publicly available computer software such as BLAST or BLAST-2. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (e.g., percent identity of query sequence = (number of identical positions between query and subject sequences / total number of positions of query sequence) × 100).
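By way of illustration only, the percent identity formula above may be sketched in Python for two sequences that have already been aligned by an external tool. The sketch assumes that “-” marks alignment gaps and that “total number of positions of query sequence” means the ungapped length of the query; conservative substitutions are not counted as identical, consistent with the definition above.

```python
def percent_identity(query_aligned: str, subject_aligned: str, gap: str = "-") -> float:
    """Identical aligned positions divided by query length, times 100."""
    if len(query_aligned) != len(subject_aligned):
        raise ValueError("aligned sequences must have the same length")

    identical = sum(
        1
        for q, s in zip(query_aligned, subject_aligned)
        if q == s and q != gap
    )
    query_length = sum(1 for q in query_aligned if q != gap)
    if query_length == 0:
        raise ValueError("query contains no residues")
    return 100.0 * identical / query_length


# Four identical positions over a five-residue query.
print(percent_identity("ACGT-A", "ACCTGA"))
```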
[0102] In some examples, a polynucleotide disclosed herein encodes a polypeptide comprising an amino acid sequence having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identity across the entire length of the amino acid sequence of one of SEQ ID NOs: 1740-3478.
In some examples, such a polynucleotide comprises genomic sequence, including introns, regulatory elements, and untranslated regions.
[0103] The disclosure also provides nucleic acid molecules encoding variants of the R gene-encoded polypeptide disclosed herein. “Variants” of an R gene include sequences that encode the R polypeptides disclosed herein (such as one of SEQ ID NOs: 1740-3478) or a fragment or variant thereof, but that differ conservatively because of the degeneracy of the genetic code as well as those that are sufficiently identical as discussed above. Naturally occurring allelic variants can be identified with the use of well-known molecular biology techniques, such as polymerase chain reaction (PCR) and hybridization techniques as outlined below. Variant nucleic acid sequences also include synthetically derived nucleic acid sequences that have been generated, for example, by using site-directed mutagenesis but which still encode the R-gene polypeptides disclosed herein.
[0104] The skilled artisan will further appreciate that changes can be introduced by mutation of the nucleic acid sequences thereby leading to changes in the amino acid sequence of the R-gene encoded polypeptide (e.g., one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof), without altering the biological activity of the proteins. Thus, variant nucleic acid molecules can be created by introducing one or more nucleotide substitutions, additions and/or deletions into the corresponding nucleic acid sequence disclosed herein, such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein. Mutations can be introduced by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Such variant nucleic acid sequences are also encompassed by the present disclosure.
[0105] Alternatively, variant nucleic acid sequences can be made by introducing mutations randomly along all or part of the coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for ability to confer activity to identify mutants that retain activity. Following mutagenesis, the encoded protein can be expressed recombinantly, and the activity of the protein can be determined using standard assay techniques.
[0106] The polynucleotides of the disclosure and fragments thereof are optionally used as substrates for a variety of recombination and recursive recombination reactions, in addition to standard cloning methods as set forth in, e.g., Ausubel, Berger and Sambrook, i.e., to produce additional polypeptide homologues and fragments thereof with desired properties. A variety of such reactions are known. Methods for producing a variant of any nucleic acid listed herein comprising recursively recombining such polynucleotide with a second (or more) polynucleotide, thus forming a library of variant polynucleotides, are also examples of the disclosure, as are the libraries produced, the cells comprising the libraries and any recombinant polynucleotide produced by such methods. Additionally, such methods optionally comprise selecting a variant polynucleotide from such libraries based on activity, wherein such recursive recombination is done in vitro or in vivo.
[0107] A variety of diversity generating protocols, including nucleic acid recursive recombination protocols are available. The procedures can be used separately, and/or in combination to produce one or more variants of a nucleic acid or set of nucleic acids, as well as variants of encoded proteins. Individually and collectively, these procedures provide robust, widely applicable ways of generating diversified nucleic acids and sets of nucleic acids (including, e.g., nucleic acid libraries) useful, e.g., for the engineering or rapid evolution of nucleic acids, proteins, pathways, cells and/or organisms with new and/or improved characteristics.
[0108] While distinctions and classifications are made in the course of the ensuing discussion for clarity, it will be appreciated that the techniques are often not mutually exclusive. Indeed, the various methods can be used singly or in combination, in parallel or in series, to access diverse sequence variants.
[0109] The result of any of the diversity generating procedures described herein can be the generation of one or more nucleic acids, which can be selected or screened for nucleic acids with or which confer desirable properties or that encode proteins with or which confer desirable properties. Following diversification by one or more of the methods herein or otherwise available to one of skill, any nucleic acids that are produced can be selected for a desired activity or property, e.g. such activity at a desired pH, etc. This can include identifying any activity that can be detected, for example, in an automated or automatable format, by any of the assays in the art. A variety of related (or even unrelated) properties can be evaluated, in serial or in parallel, at the discretion of the practitioner.
[0110] The nucleotide sequences disclosed herein can also be used to isolate corresponding sequences from a different source. In this manner, methods such as PCR, hybridization, and the like can be used to identify such sequences based on their sequence homology to the sequences identified by the methods disclosed herein. Sequences that are selected based on their sequence identity to the entire sequences set forth herein or to fragments thereof are encompassed by the disclosure. Such sequences include sequences that are orthologs of the sequences. The term “orthologs” refers to genes derived from a common ancestral gene and which are found in different species as a result of speciation. Genes found in different species are considered orthologs when their nucleotide sequences and/or their encoded protein sequences share substantial identity as defined elsewhere herein.
[0111] In a PCR approach, oligonucleotide primers can be designed for use in PCR reactions to amplify corresponding DNA sequences from cDNA or genomic DNA extracted from any organism of interest. Methods for designing PCR primers and PCR cloning are disclosed in Sambrook et al. 1989 Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York), hereinafter “Sambrook”. See also, Innis et al., eds. 1990 PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. 1995 PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. 1999 PCR Methods Manual (Academic Press, New York). Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially-mismatched primers, and the like.
[0112] In hybridization methods, all or part of the nucleic acid sequence can be used to screen cDNA or genomic libraries. Methods for construction of such cDNA and genomic libraries are disclosed in Sambrook and Russell 2001, supra. The so-called hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments or other oligonucleotides and may be labeled with a detectable group such as 32P or any other detectable marker, such as other radioisotopes, a fluorescent compound, an enzyme or an enzyme cofactor. Probes for hybridization can be made by labeling synthetic oligonucleotides based on the known polypeptide-encoding nucleic acid sequences disclosed herein. Degenerate primers designed on the basis of conserved nucleotides or amino acid residues in the nucleic acid sequence or encoded amino acid sequence can additionally be used. The probe typically comprises a region of nucleic acid sequence that hybridizes under stringent conditions to at least about 12, at least about 25, at least about 50, 75, 100, 125, 150, 175 or 200 consecutive nucleotides of nucleic acid sequences encoding polypeptides or a fragment or variant thereof. Methods for the preparation of probes for hybridization and stringency conditions are disclosed in Sambrook and Russell 2001, supra.
[0113] Nucleotide Constructs, Expression Cassettes and Vectors. The use of the term “construct” in connection with isolated and/or heterologous polynucleotides herein is not intended to limit the disclosure to constructs comprising DNA. Polynucleotide constructs, particularly polynucleotides and oligonucleotides composed of ribonucleotides and combinations of ribonucleotides and deoxyribonucleotides, may also be employed in the methods disclosed herein. The isolated polynucleotide constructs, nucleic acids, and nucleotide sequences disclosed herein additionally encompass all complementary forms (e.g., the reverse complement) of each sequence disclosed for such a construct. Further, polynucleotide constructs and nucleotide sequences disclosed herein can encompass any such constructs, molecules, and sequences suitable for use in a method for transforming plant material disclosed herein. Such constructs can include naturally occurring molecules and/or synthetic analogues. The disclosed nucleotide constructs, nucleic acids, and nucleotide sequences also encompass all forms of nucleotide constructs including, but not limited to, single-stranded forms, double-stranded forms, hairpins, stem-and-loop structures and the like.
[0114] Transformed organisms disclosed herein include plant cells, bacteria, yeast, baculovirus, protozoa, nematodes and algae. The transformed organism comprises a disclosed sequence (e.g., as part of a construct, expression cassette, or vector comprising the nucleotide sequence disclosed herein) which is associated with increased disease resistance.
[0115] The disclosed sequences can be used in constructs for expression in the organism of interest. Constructs can include 5’ and 3’ regulatory sequences operably linked to an R gene sequence, variant or fragment disclosed herein. The term “operably linked” as used herein refers to a functional linkage between a promoter and/or a regulatory sequence and a second sequence, wherein the promoter and/or regulatory sequence initiates, mediates, and/or affects transcription of the DNA sequence corresponding to the second sequence. Generally, operably linked means that the nucleic acid sequences being linked are contiguous and, where necessary to join two protein coding regions, in the same reading frame. The construct may additionally contain at least one additional gene to be cotransformed into the organism. Alternatively, the additional gene(s) can be provided on multiple DNA constructs.
[0116] Such a DNA construct is provided with a plurality of restriction sites for insertion of the polypeptide gene sequence of the disclosure to be under the transcriptional regulation of the regulatory regions. The DNA construct may additionally contain selectable marker genes.
[0117] The DNA construct will generally include in the 5' to 3' direction of transcription: a transcriptional and translational initiation region (e.g., a promoter), a DNA sequence of the embodiments, and a transcriptional and translational termination region (e.g., termination region) functional in the organism serving as a host. The transcriptional initiation region (e.g., the promoter) may be native, analogous, foreign or heterologous to the host organism and/or to the sequence of the embodiments. Additionally, the promoter or regulatory sequence may be the natural sequence or alternatively a synthetic sequence. The term “foreign” as used herein indicates that the promoter is not found in the native organism into which the promoter is introduced. As used herein, the term “heterologous” in reference to a sequence means a sequence that originates from a foreign species or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. As used herein, a chimeric gene comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence. Where the promoter is a native or natural sequence, the expression of the operably linked sequence is altered from the wild-type expression, which results in an alteration in phenotype.
[0118] In some embodiments the DNA construct comprises a polynucleotide encoding one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof. In some embodiments the DNA construct comprises a polynucleotide encoding a fusion protein that includes one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof.
[0119] A DNA construct may also include a transcriptional enhancer sequence. An “enhancer” refers to a DNA sequence which can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. Various enhancers may be used, including, for example, introns with gene expression enhancing properties in plants (US Patent Application Publication Number 2009/0144863), the ubiquitin intron (i.e., the maize ubiquitin intron 1 (see, for example, NCBI sequence S94464)), the omega enhancer or the omega prime enhancer (Gallie et al. 1989 Molecular Biology of RNA ed. Cech (Liss, New York) 237-256 and Gallie et al. 1987 Gene 60: 217-25), the CaMV 35S enhancer (see, e.g., Benfey et al. 1990 EMBO J. 9: 1685-96) and the enhancers of US Patent Number 7,803,992. The above list of transcriptional enhancers is not meant to be limiting. Any appropriate transcriptional enhancer can be used in the embodiments.
[0120] The termination region may be native with the transcriptional initiation region, may be native with the operably linked DNA sequence of interest, may be native with the plant host or may be derived from another source (i.e., foreign or heterologous to the promoter, the sequence of interest, the plant host or any combination thereof).
[0121] Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also, Guerineau et al. 1991 Mol. Gen. Genet. 262:141-144; Proudfoot 1991 Cell 64:671-674; Sanfacon et al. 1991 Genes Dev. 5:141-149; Mogen et al. 1990 Plant Cell 2:1261-1272; Munroe et al. 1990 Gene 91:151-158; Ballas et al. 1989 Nucleic Acids Res. 17:7891-7903 and Joshi et al. 1987 Nucleic Acids Res. 15:9627-9639.
[0122] Where appropriate, a nucleic acid may be optimized for increased expression in the host organism. Thus, where the host organism is a plant, the synthetic nucleic acids can be synthesized using plant-preferred codons for improved expression. See, for example, Campbell and Gowri 1990 Plant Physiol. 92: 1-11 for a discussion of host-preferred codon usage. For example, although nucleic acid sequences of the embodiments may be expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons as these preferences have been shown to differ (Murray et al. 1989 Nucleic Acids Res. 17:477-498). Thus, the plant-preferred codon for a particular amino acid may be derived from known gene sequences from plants.
[0123] Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other well-characterized sequences that may be deleterious to gene expression. The GC content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell. The term “host cell” as used herein refers to a cell which contains a vector and supports the replication and/or expression of the expression vector. Host cells may be prokaryotic cells such as E. coli or eukaryotic cells such as yeast, insect, amphibian or mammalian cells or monocotyledonous or dicotyledonous plant cells. An example of a monocotyledonous host cell is a maize host cell. When possible, the sequence is modified to avoid predicted hairpin secondary mRNA structures.
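By way of illustration only, comparing the GC content of a candidate sequence with the average GC content of known genes expressed in a host can be sketched in Python as follows. The function names and the short example sequences are hypothetical; in practice the reference set would be a collection of coding sequences known to be expressed in the host cell.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc_offset_from_host(candidate: str, host_genes: list[str]) -> float:
    """Candidate GC content minus the mean GC content of the host reference genes.

    A positive value means the candidate is more GC-rich than the host average.
    """
    if not host_genes:
        raise ValueError("at least one host reference gene is required")
    host_mean = sum(gc_content(g) for g in host_genes) / len(host_genes)
    return gc_content(candidate) - host_mean


if __name__ == "__main__":
    # Hypothetical toy sequences (illustrative only).
    host_reference = ["GGCCAATT", "GCAT"]
    print(gc_offset_from_host("GGCC", host_reference))
```

The sign and size of the offset indicate only the direction and extent of a possible adjustment; which codons to change is a separate design decision that is not addressed by this sketch.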
[0124] In preparing the expression cassette, the various DNA fragments may be manipulated so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
[0125] A number of promoters can be used in the practice of the embodiments. The promoters can be selected based on the desired outcome. The nucleic acids can be combined with constitutive, tissue-preferred, inducible
[0126] Plant Transformation. The methods of the embodiments involve introducing a polypeptide or polynucleotide into a plant. “Introducing” as used herein means presenting to the plant the polynucleotide or polypeptide in such a manner that the sequence gains access to the interior of a cell of the plant. The methods of the embodiments do not depend on a particular method for introducing a polynucleotide or polypeptide into a plant, only that the polynucleotide(s) or polypeptide(s) gains access to the interior of at least one cell of the plant. Methods for introducing polynucleotide(s) or polypeptide(s) into plants include, but are not limited to, stable transformation methods, transient transformation methods, and virus-mediated methods.
[0127] “Stable transformation” as used herein means that the nucleotide construct introduced into a plant integrates into the genome of the plant and is capable of being inherited by the progeny thereof. “Transient transformation” as used herein means that a polynucleotide is introduced into the plant and does not integrate into the genome of the plant or a polypeptide is introduced into a plant. “Plant” as used herein refers to whole plants, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, propagules, embryos and progeny of the same. Plant cells can be differentiated or undifferentiated (e.g. callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells and pollen).
[0128] Transformation protocols as well as protocols for introducing nucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing nucleotide sequences into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al. (1986) Biotechniques 4:320-334), electroporation (Riggs et al. 1986 Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (US Patent Numbers 5,563,055 and 5,981,840), direct gene transfer (Paszkowski et al. 1984 EMBO J. 3:2717-2722) and ballistic particle acceleration (see, for example, US Patent Numbers 4,945,050; 5,879,918; 5,886,244 and 5,932,782; Tomes et al. 1995 in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin) and McCabe et al. 1988 Biotechnology 6:923-926) and Lec1 transformation (WO 00/28058). For potato transformation see, Tu et al. 1998 Plant Molecular Biology 37:829-838 and Chong et al. 2000 Transgenic Research 9:71-78. Additional transformation procedures can be found in Weissinger et al. 1988 Ann. Rev. Genet. 22:421-477; Sanford et al. 1987 Particulate Science and Technology 5:27-37 (onion); Christou et al. 1988 Plant Physiol. 87:671-674 (soybean); McCabe et al. 1988 Bio/Technology 6:923-926 (soybean); Finer and McMullen 1991 In Vitro Cell Dev. Biol. 27P:175-182 (soybean); Singh et al. 1998 Theor. Appl. Genet. 96:319-324 (soybean); Datta et al. 1990 Biotechnology 8:736-740 (rice); Klein et al. 1988 Proc. Natl. Acad. Sci. USA 85:4305-4309 (maize); Klein et al. 1988 Biotechnology 6:559-563 (maize); US Patent Numbers 5,240,855; 5,322,783 and 5,324,646; Klein et al. (1988) Plant Physiol. 91:440-444 (maize); Fromm et al. 1990 Biotechnology 8:833-839 (maize); Hooykaas-Van Slogteren et al. 1984 Nature (London) 311:763-764; US Patent Number 5,736,369 (cereals); Bytebier et al. 1987 Proc. Natl. Acad. Sci. USA 84:5345-5349 (Liliaceae); De Wet et al. 1985 in The Experimental Manipulation of Ovule Tissues, ed. Chapman et al. (Longman, New York), pp. 197-209 (pollen); Kaeppler et al. 1990 Plant Cell Reports 9:415-418 and Kaeppler et al. 1992 Theor. Appl. Genet. 84:560-566 (whisker-mediated transformation); D'Halluin et al. 1992 Plant Cell 4:1495-1505 (electroporation); Li et al. (1993) Plant Cell Reports 12:250-255 and Christou and Ford 1995 Annals of Botany 75:407-413 (rice); Osjoda et al. 1996 Nature Biotechnology 14:745-750 (maize via Agrobacterium tumefaciens).
[0129] Methods to Introduce Genome Editing Technologies into Plants. In some embodiments, R genes (e.g., encoding one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof) can be introduced into the genome of a plant using genome editing technologies. For example, the identified polynucleotides can be introduced into a desired location in the genome of a plant through the use of double-stranded break technologies such as TALENs, meganucleases, zinc finger nucleases, CRISPR-Cas, and the like. For example, the R gene can be introduced into a desired location in a genome using a CRISPR-Cas system, for the purpose of site-specific insertion. The desired location in a plant genome can be any desired target site for insertion, such as a genomic region amenable for breeding or may be a target site located in a genomic window with an existing trait of interest. Existing traits of interest could be either an endogenous trait or a previously introduced trait. Thus, for example, an R gene can be altered through gene editing in its native site to encode an R polypeptide having the amino acid sequence set forth in one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof. Alternatively or additionally, an R gene can be introduced by genome editing at a different genomic location. For example, a nucleotide construct comprising an R gene sequence encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478 (or a fragment or variant thereof) can be inserted at a genomic locus other than the R gene’s native genomic locus.
[0130] In some embodiments, where an unfavorable R gene allele has been identified in a genome, genome editing technologies may be used to alter or modify the polynucleotide sequence to make it a favorable R gene allele. Site specific modifications can be introduced into the desired R gene allele using any method for introducing site specific modification, including, but not limited to, through the use of gene repair oligonucleotides (e.g. US Publication 2013/0019349), or through the use of double-stranded break technologies such as TALENs, meganucleases, zinc finger nucleases, CRISPR-Cas, and the like. Such technologies can be used to modify the previously introduced polynucleotide through the insertion, deletion or substitution of nucleotides within the introduced polynucleotide. Alternatively, double-stranded break technologies can be used to add additional nucleotide sequences to the introduced polynucleotide. Additional sequences that may be added include additional expression elements, such as enhancer and promoter sequences. In another embodiment, genome editing technologies may be used to position additional disease resistant proteins in close proximity to the R gene sequence within the genome of a plant, in order to generate molecular stacks of disease resistant proteins.
[0131] The terms “altered target site,” “altered target sequence,” “modified target site,” and “modified target sequence” are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to a non-altered target sequence. Such “alterations” include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, or (iv) any combination of (i) - (iii).
[0132] The following are examples of specific embodiments of some aspects of the invention. The examples are offered for illustrative purposes only and are not intended to limit the scope of the invention in any way.
EXAMPLES
[0133] NLR annotation and integrated domain analysis used in Examples below. Maize NAM genomes, gene annotations and predicted proteins were obtained from MaizeGDB (available at maizegdb.org web site). HMMer was utilized to identify conserved domains in the predicted protein sequences, with default settings. For canonical NLR domain annotations, the following hidden Markov models were obtained from Pfam (at maizegdb.org/ web site): NB-ARC, RPW8, TIR_2, TIR_1, TIR-like, Rx_N (coiled-coil), LRR_1, LRR_2, LRR_3, LRR_4, LRR_5, LRR_6, LRR_8 and LRR_9. For integrated domain (ID) analysis, all Pfam hidden Markov models were utilized. Hits which overlapped canonical domains were removed, and similar domains were collapsed through custom Python scripts and manual curation. The resulting set was filtered both loosely (E-value<0.1) and strictly (E-value<0.01, at least 40% coverage of the domain). For identification of NLR pseudogenes, each genome was translated into all 6 possible reading frames. HMMer was then used to search for NB-ARC domains and resulting partial hits within 10,000 bp were stitched together via custom Python scripts to identify genomic regions with the potential to code for NLRs. Genomic hits which overlapped annotated genes were then removed by combining genomic hit information with GFF file information from each NAM founder line.
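By way of illustration only, the stitching of partial NB-ARC hits lying within 10,000 bp of one another can be sketched in Python as follows. The custom scripts themselves are not reproduced in this disclosure; the record layout, the function name, the choice to stitch only hits on the same contig and strand, and the example coordinates are all assumptions made for the purpose of the sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    contig: str   # chromosome or scaffold name
    strand: str   # "+" or "-" (assumption: hits are stitched per strand)
    start: int    # genomic start coordinate of the hit
    end: int      # genomic end coordinate of the hit


def stitch_hits(hits: list[Hit], max_gap: int = 10_000) -> list[Hit]:
    """Merge hits on the same contig and strand whose gap is at most max_gap bp."""
    grouped: dict[tuple[str, str], list[Hit]] = defaultdict(list)
    for hit in hits:
        grouped[(hit.contig, hit.strand)].append(hit)

    regions: list[Hit] = []
    for (contig, strand), group in sorted(grouped.items()):
        group.sort(key=lambda h: h.start)
        cur_start, cur_end = group[0].start, group[0].end
        for hit in group[1:]:
            if hit.start - cur_end <= max_gap:
                cur_end = max(cur_end, hit.end)   # extend the current region
            else:
                regions.append(Hit(contig, strand, cur_start, cur_end))
                cur_start, cur_end = hit.start, hit.end
        regions.append(Hit(contig, strand, cur_start, cur_end))
    return regions


if __name__ == "__main__":
    # Hypothetical partial hits (illustrative coordinates only).
    example = [
        Hit("chr1", "+", 100, 400),
        Hit("chr1", "+", 5_000, 5_300),
        Hit("chr1", "+", 30_000, 30_200),
        Hit("chr2", "-", 10, 50),
    ]
    for region in stitch_hits(example):
        print(region)
```

Regions produced in this way could then be compared against annotated gene coordinates (e.g., from a GFF file) to discard those overlapping annotated genes, a step not shown here.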
[0134] NLR clustering used in Examples below. All predicted proteins from the NAM founder lines were clustered to determine their relationships. Diamond (version 0.9.31.132) was first utilized to perform an all-by-all BLAST with default settings, followed by clustering with OrthAgogue (version 1.0.2) with strict orthologues (Buchfink et al. 2015 Nat Methods, 12: 59-60). Clusters were named using the convention “chromosome_position-in-MB_cluster-size”. Clusters were then manually examined for outliers where genes from different chromosomes were clustered together. NB-ARC domain clustering was carried out by pairwise alignment using MUSCLE, followed by construction of a maximum likelihood phylogenetic tree using 50 bootstraps, through MEGA software (version 10.0.5) (Kumar et al. 2018 Mol Biol Evol 35: 1547-1549).
[0135] Plant growth conditions used in Examples below. During seedling stage, all plants were grown in a greenhouse under 16h/8h light/dark conditions with 25 µmol/m²/day of total photosynthetic light, a vapor pressure deficit of 12 mbar and day/night temperatures of 25/22 °C (week 1), 22/19 °C (week 2), 19/17 °C (week 3) and 17/14 °C (week 4). Because of the photoperiod sensitivity of some NAM founder lines, from week 5 until maturity, plants were grown under 12h/12h day/night conditions, at 26/20 °C day/night temperatures to reduce delayed flowering and maturity.
[0136] RNA-seq library construction used in Examples below. Leaves from plants grown under the conditions listed above were sampled at flowering stage (50 to 70 days, depending on the line), and their total RNA was isolated from ground frozen tissue with RNeasy (Qiagen Inc., Valencia, CA), according to manufacturer’s protocol. Total RNA was then analyzed for quality and quantity with the Agilent Bioanalyzer RNA Nano kit (Agilent Technologies, Santa Clara, CA) and normalized to 1 µg input per sample. Sequencing libraries were prepared according to Illumina Inc. (San Diego, CA) TruSeq mRNA-Seq protocols. Messenger RNAs were isolated via attachment to oligo (dT) beads, fragmented and reverse transcribed into cDNA by random hexamer primers with Superscript II reverse transcriptase (Life Technologies, Carlsbad, CA). The resulting cDNAs were end repaired, 3 prime A-tailed and ligated with Illumina indexed TruSeq adapters. Ligated cDNA fragments were PCR amplified with Illumina TruSeq primers, purified with AmpureXP Beads (Beckman Coulter Genomics, Danvers, MA) and checked for quality and quantity with the Agilent TapeStation 4200 system with D1000 ScreenTape. Libraries were combined into one sequencing pool, which was normalized to 2nM. The pool was denatured according to Illumina sequencing protocols, hybridized and clustered on two flow cell lanes of a NovaSP flow cell using the NovaSeq 6000. Single-end fifty base sequences and eight base dual-index sequences were generated on the NovaSeq 6000 according to Illumina protocols. Data was trimmed for quality with a minimum threshold of Q13 and the resulting sequences were split by index identifier. Sequencing data is available at the Sequence Read Archive (SRA) database, accession GSE206952.
[0137] Expression analysis used in Examples below. Publicly available RNA-seq expression data was obtained from a short-read repository (SRA, https://www.ncbi.nlm.nih.gov/sra, accessions ERX3793507-ERX3793986). RNA-seq reads obtained from SRA, as well as those generated through our own library construction and sequencing, were then quantified by running Salmon (version 1.1.0) against the transcriptome of each NAM founder line, with GC bias correction (Patro et al., 2017). Transcript expression per library was then converted to gene expression per tissue using the DESeq2 package in R (version 1.31.6) (Love et al., 2014). Since much of the SRA data only contained two replicates, quantifications which were significant (>0.3 FPKM) but differed by more than 2-fold between replicates, or were calculated from only one library, were noted as potentially unreliable in the supplemental data. Intra-cluster expression variability was assessed by calculating the average pairwise Manhattan distance of log-transformed expression values for each cluster, using the spatial distance module of SciPy (version 1.5.4).
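By way of illustration only, the intra-cluster expression variability measure (average pairwise Manhattan distance of log-transformed expression values) can be sketched in Python as follows. The sketch uses only the standard library rather than SciPy so that it is self-contained; the choice of a log2(x + 1) transform, the function names, and the example gene identifiers and values are assumptions, since the base of the logarithm and any pseudocount are not specified above.

```python
import math
from itertools import combinations


def log_transform(values: list[float], pseudocount: float = 1.0) -> list[float]:
    """log2(x + pseudocount) for each value (transform is an assumption of this sketch)."""
    return [math.log2(v + pseudocount) for v in values]


def manhattan(a: list[float], b: list[float]) -> float:
    """Sum of absolute differences between two equal-length expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same tissues")
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_pairwise_manhattan(cluster: dict[str, list[float]]) -> float:
    """Average Manhattan distance over all gene pairs in one cluster."""
    logged = [log_transform(profile) for profile in cluster.values()]
    pairs = list(combinations(logged, 2))
    if not pairs:
        return 0.0  # a single-gene cluster has no pairwise variability
    return sum(manhattan(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical cluster: gene name -> expression per tissue (illustrative only).
    cluster = {
        "geneA": [0.0, 1.0, 3.0],
        "geneB": [0.0, 3.0, 3.0],
        "geneC": [0.0, 1.0, 3.0],
    }
    print(mean_pairwise_manhattan(cluster))
```

The same pairwise distances can be obtained with a library routine for city-block distance; the explicit loop is shown here only to make each step visible.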
[0138] Shannon entropy assessment used in Examples below. Shannon entropy was calculated by first employing MUSCLE (version 3.8.31) to align all proteins within each cluster, followed by construction of a consensus sequence based on these alignments. The entropy of each position was then assessed using the following formula:
[0139] $X = -\sum_{i} P(x_i)\,\log P(x_i)$
[0140] Where x equals each position within the consensus protein and X represents the sum of the variability at each position. For the diversity calculation, gaps in the alignment were treated as the 21st amino acid. Consensus sequences were then assessed via HMMer to identify conserved domains, followed by calculation of the average Shannon entropy within each domain for each gene cluster.
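By way of illustration only, a per-position Shannon entropy calculation over an aligned protein cluster, with gaps treated as a 21st symbol, can be sketched in Python as follows. The sketch assumes the proteins have already been aligned (e.g., by MUSCLE) into equal-length strings with "-" for gaps, and assumes a base-2 logarithm because the base is not specified above; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy of one alignment column; '-' is counted as its own symbol."""
    counts = Counter(column.upper())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropy(aligned: list[str]) -> list[float]:
    """Entropy at every column of a multiple sequence alignment."""
    if not aligned:
        raise ValueError("alignment is empty")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in aligned)) for i in range(length)]


def mean_entropy(entropies: list[float], start: int, end: int) -> float:
    """Average entropy over columns start..end-1, e.g. the span of one domain."""
    window = entropies[start:end]
    if not window:
        raise ValueError("empty domain window")
    return sum(window) / len(window)


if __name__ == "__main__":
    # Hypothetical four-protein alignment (illustrative only).
    toy_alignment = ["MKV-", "MKI-", "MRIA", "MRLA"]
    per_column = alignment_entropy(toy_alignment)
    print(per_column)
    print(mean_entropy(per_column, 1, 4))
```

Domain boundaries passed to `mean_entropy` would come from a separate domain search on the consensus sequence, which is outside the scope of this sketch.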
[0141] Example 1 : Identification of NLRs.
[0142] A maize nested association mapping (NAM) population was created from 26 maize founder lines to identify the genetic basis of complex traits (Yu et al., 2008). The HMMER software package was used to search for NLR domains in the 26 NAM founder lines (Wheeler et al. 2013, Bioinformatics 29(19): 2487-2489). NLR domains included NB-ARC, coiled-coil, RPW8, TIR and LRR (Eddy, 1995). A total of 2,658 genes containing an NB-ARC domain were found across the NAM founder lines, with an average number of 102 per line, in addition to an average of 47 genomic regions which did not contain annotated genes but which, according to genomic scans with HMMer, had the potential to encode NLRs. These results are largely in agreement with the recent findings of Hufford et al. and highlight the relatively low NLR content of maize compared to other crop species, such as rice (Shang et al., 2022). Most of these genes also included LRRs (77.0 %) and N-terminal coiled-coils (77.9 %). Unlike in dicots, TIR and RPW8 domains were extremely rare, only appearing in 1.3% and 2.6% of NB-ARC containing genes, respectively. TIR domains were never found in combination with LRRs, while RPW8 NLRs almost always contained LRRs. A total of 14 potential domain architectures were found to occur in at least five different genes across the NAM founder lines (Fig. 1).
[0143] The majority of NLRs (57 %) had canonical structures, with a coiled-coil region, followed by an NB-ARC domain, terminating in a series of LRRs. Some alternative structures were abundant, including proteins containing only a coiled-coil and an NB-ARC domain (14.6 %), proteins containing only an NB-ARC domain and LRRs (11.4 %) and proteins with an NB-ARC and no other canonical NLR domains (6.1 %). Interestingly, several genes were identified that may be the result of a two-NLR fusion. For example, 25 of the NAM founder lines contained an NLR on chromosome 6 which had a coiled-coil-NB-ARC-LRR-NB-ARC-LRR structure, with a C-terminal integrated no apical meristem associated (NAM-associated) domain (Cheng et al., 2012).
[0144] Example 2: Integrated domains in NLRs of NAM founder lines.
[0145] HMMer was used to search for atypical integrated domains within the NAM NLR repertoire, i.e., the NAM’s NLRome. After identifying all domains via HMMer, custom python scripts were employed to filter out all hits that overlapped canonical NLR domains. The resulting set of potential atypical domains was then filtered loosely (e-value < 0.11) and strictly (e-value < 0.01 and at least 40% of the domain covered). The loosely filtered set contained a number of domains of unknown function and canonical domains with very poor coverage. Although the majority of these hits are likely false positives, some may represent true IDs that have undergone significant divergence after their neofunctionalization. After filtering for high confidence domain calls and collapsing redundant domains, a total of 19 strictly filtered unique integrated domains were found across all NAM NLRs (Fig. 2).
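The two filtering tiers described above may be illustrated, without limitation, by the following sketch. It is not the custom script referred to above; the `DomainHit` record, its field names and the `classify_hit` function are hypothetical, and coverage is assumed to mean the fraction of the domain model spanned by the hit. In this sketch each hit is assigned to the most stringent tier it satisfies, so every "strict" hit would also pass the loose filter.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    domain: str
    evalue: float
    hmm_from: int   # first matched position of the domain model (1-based)
    hmm_to: int     # last matched position of the domain model
    hmm_len: int    # total length of the domain model
    ali_from: int   # first matched position on the protein
    ali_to: int     # last matched position on the protein


def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def classify_hit(hit, canonical_spans):
    """Return 'canonical_overlap', 'strict', 'loose' or 'rejected'."""
    # Hits overlapping canonical NLR domains are not candidate integrated domains.
    if any(overlaps(hit.ali_from, hit.ali_to, s, e) for s, e in canonical_spans):
        return "canonical_overlap"
    coverage = (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_len
    if hit.evalue < 0.01 and coverage >= 0.40:
        return "strict"
    if hit.evalue < 0.11:
        return "loose"
    return "rejected"


# Hypothetical protein with canonical NLR domains at positions 150-400 and 450-800.
canonical = [(150, 400), (450, 800)]
kinase = DomainHit("Pkinase", 1e-5, 1, 50, 100, 820, 900)
print(classify_hit(kinase, canonical))
```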
[0146] The most frequent integrated domain was a kinase, which appeared in two to three NLRs in each NAM founder line. The next most common was a paired amphipathic helix (PAH) domain, followed by a no apical meristem-associated domain (NAM-associated), which occurred in the unique two-NLR fusion gene described earlier. Although the unfiltered set included many low-quality domain hits found in only a single gene, the more strictly filtered set only included two domains that appeared uniquely in a single gene in only one NAM founder line (zf-RVT and UvsW). A single NLR, occurring in 14 of the NAM founder lines, possessed two integrated domains, which were found in the middle of the protein between the NB-ARC and LRRs (Meiotic recombination factor, REC104) and at the end of the protein (tyrosine kinase). Even when employing a loose filter, very little overlap was found with the Arabidopsis pan-NLRome, with only kinase and AAA-ATPase domains appearing in both sets (Van de Weyer et al., 2019). Surprisingly, a similarly low level of integrated domain overlap was found with the recently published rice NLRome, with only integrated NAM domains found in both sets (Shang et al., 2022).
[0147] Example 3: Distribution of NLRs in the NAM genomes reveals high PAV.
[0148] The distribution of all identified maize NLRs throughout the NAM genomes was analyzed to determine whether any genomic regions contain a pseudogene in one line and a fully functional gene in other lines. HMMer was used to scan all maize genomes for regions that potentially encode NB-ARC and LRR domains, and regions which overlapped with annotated genes were removed. This analysis revealed potential pseudogenes present in all 26 lines, ranging from 35 in CML247 to 68 in CML103. The number of pseudogenes present in a given line was uncorrelated (r² = 0.035) with the number of annotated genes, making it unlikely that variable annotation quality was the reason that some lines had higher pseudogene counts.
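For illustration only, a squared correlation coefficient of the kind reported above may be computed as in the following sketch, which assumes the reported value is the square of the Pearson correlation between per-line pseudogene counts and per-line annotated gene counts. The function name and the example counts are hypothetical and are not data from the NAM founder lines.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient between two sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (cov * cov) / (var_x * var_y)


# Hypothetical per-line counts (not taken from the NAM data).
pseudogene_counts = [35, 47, 52, 68]
annotated_gene_counts = [101, 99, 104, 100]
print(r_squared(pseudogene_counts, annotated_gene_counts))
```

A value near zero, as reported above, indicates that lines with more annotated genes do not systematically have more or fewer pseudogenes.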
[0149] To assess whether pseudogenes in one line were closely related to functional genes in other lines, we extracted all nucleotides encoding the NB-ARC domains of all genes and pseudogenes. These sequences were aligned in a pairwise manner, and alignments were used to generate phylogenetic trees via the maximum likelihood method. All pairwise comparisons examined contained several pseudogenes from one line which clustered closely with genes from another line. Such clusters could be found across many different chromosomal locations. The B73-Mo18W pairwise comparison revealed a number of these instances, including a chromosome 3 gene in B73 (Zm00001e018195) which is located at 129 MB. The Mo18W genome contains nucleotides that potentially encode an NB-ARC domain and which cluster very closely with the B73 gene NB-ARC domain (98.3% sequence identity), but no actual gene was found to be produced at this locus. Similar genomic/genic NB-ARC clusters were found throughout the genome, including Chr1 (Mo18W Zm000034a005849), Chr2 (Mo18W Zm00034a016521), Chr4 (Mo18W Zm000034a031848), Chr6 (B73 Zm00001e031193), Chr7 (Mo18W Zm00034a051957) and Chr10 (B73 Zm00001e039226). Although mis-annotation could account for some cases of potential pseudogenization, it is unlikely to explain most of them because all genomes were annotated in an identical manner, utilizing RNA-seq from the same set of tissues as part of the NAM sequencing project (Hufford et al., 2021).
[0150] To visualize the physical distances between and overall distribution of NLRs in the maize genome, we plotted the physical locations of all NLR genes and pseudogenes in the 26 NAM founder lines, with B73 shown as a representative line. The overall distribution of NLRs was mostly consistent across the different NAM founder lines. NLRs were found to be distributed as singletons and small groups throughout the genome, but many existed in a few large clusters of variable size in which many NLRs were concentrated in a small genomic space. For the purpose of this analysis, physically clustered genes were those considered to reside within 1 MB of another NLR. Expanding this distance to 2 MB did not dramatically change the number of clustered NLRs, with an average of 54% of NLRs clustered using a 1 MB cutoff and an average of 59% clustered using a 2 MB cutoff. This rate was very similar to the one found in the recently published rice NLRome (60.0%), despite the significantly different total NLR counts of the two species.
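The physical clustering criterion described above may be illustrated, without limitation, by the following sketch, in which an NLR is counted as clustered when another NLR on the same chromosome lies within the cutoff distance. The function name, the representation of each NLR by a single coordinate and the example positions are assumptions for illustration; the text does not state how the distance between two genes was measured.

```python
def clustered_fraction(nlr_positions, cutoff_bp=1_000_000):
    """Fraction of NLRs lying within cutoff_bp of another NLR on the same chromosome.

    nlr_positions is a list of (chromosome, position_bp) tuples.
    """
    by_chrom = {}
    for chrom, pos in nlr_positions:
        by_chrom.setdefault(chrom, []).append(pos)

    clustered = 0
    for positions in by_chrom.values():
        positions.sort()
        for i, pos in enumerate(positions):
            # After sorting, only the immediate neighbours need to be checked.
            near_prev = i > 0 and pos - positions[i - 1] <= cutoff_bp
            near_next = i + 1 < len(positions) and positions[i + 1] - pos <= cutoff_bp
            if near_prev or near_next:
                clustered += 1

    total = len(nlr_positions)
    return clustered / total if total else 0.0


# Hypothetical coordinates (not taken from the NAM genomes).
positions = [
    ("chr1", 100_000), ("chr1", 900_000), ("chr1", 5_000_000),
    ("chr2", 1_000_000), ("chr2", 2_500_000),
]
print(clustered_fraction(positions))             # 1 MB cutoff
print(clustered_fraction(positions, 2_000_000))  # 2 MB cutoff
```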
[0151] The largest physical cluster exists on chromosome 10, which contains from 6 (CML52) to 20 (CML333 and CML69) NLR genes and represents an average of 14 % of the total NLR content in maize genomes. This cluster also contained a large number of genomic NB-ARCs without definitive gene models, with the most extreme example being M37W, which had 17 NLR genes and 18 genomic regions with potential to encode NB-ARCs, but no gene model derived from RNA-seq data. Unsurprisingly, this cluster also had a high degree of PAV and allelic diversity. Sequence-based clustering revealed that this cluster is actually composed of two groups which are distinct at the sequence level but in very close proximity physically. Furthermore, the percentage of genes containing NB-ARCs that existed within clusters was positively correlated (R² = 0.50) with the total number of NB-ARC containing genes in a given genome, indicating that much of the difference in NLR counts between NAM genomes arose from expansion or contraction of clustered genes. Pairwise comparisons of NAM lines showed that on average, there was a difference of 8.2 clustered NB-ARC containing genes but only 3.3 singleton NB-ARC containing genes. The most extreme case can be seen when comparing Oh7B (91 NB-ARC genes) and Tzi8 (113 NB-ARC genes), where variation in clusters accounts for 100% of the 22 gene difference.
[0152] Although paired NLRs in a head-to-head configuration have been reported to exist frequently in other species (Grund et al., 2019, Lee et al., 2021, Stein et al., 2018), this analysis found only one potential NLR pair in the NAM founder lines. This pair was located at approximately 235 MB on chromosome 2 in ten of the NAM founder lines (B73: Zm00001e011300 and Zm00001e011302) and was separated by around 8 kb in all lines. This distance is greater than that typically associated with paired NLRs (van Wersch and Li, 2019), and a small ribosomal protein was also annotated between the NLRs, although it had no detectable expression in any of the lines. Despite the distance and potential intervening gene, these NLRs appeared to be highly co-regulated, averaging an R² of 0.97 across different tissue types. A survey of the teosinte genome also revealed only a single head-to-head NLR pair, which was separated by only 595 bp. Both genes in the pair lacked any LRR domains and did not share any significant sequence homology with the maize pair.
[0153] Clustering of the protein sequences of all NAM NLRs to determine their relationships was done using the orthAgogue software application (Ekseth et al. 2014 Bioinformatics 30(5): 734-736). A total of 158 clusters were identified, of which 20 were classified as “core” NLR clusters, with all NAM founder lines containing at least one member. A total of 15 clusters were present in all but one NAM founder line and 11 were missing from only two NAM founder lines. On average, clusters contained at least one member in 16 out of the 26 NAM founder lines, indicating that PAV was the norm for most NLRs across the lines.
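By way of non-limiting example, the presence-absence summary described above (core clusters, clusters missing from one or two lines, and the average number of lines represented per cluster) may be tallied as sketched below. The function name, the input layout (cluster identifier mapped to the set of lines contributing at least one member) and the example data are hypothetical.

```python
def summarize_presence(cluster_to_lines, all_lines):
    """Tally clusters by how many founder lines lack a member."""
    all_lines = set(all_lines)
    summary = {"core": 0, "missing_one": 0, "missing_two": 0, "other": 0}
    lines_present_total = 0

    for lines in cluster_to_lines.values():
        present = len(set(lines) & all_lines)
        lines_present_total += present
        missing = len(all_lines) - present
        if missing == 0:
            summary["core"] += 1          # every line has at least one member
        elif missing == 1:
            summary["missing_one"] += 1
        elif missing == 2:
            summary["missing_two"] += 1
        else:
            summary["other"] += 1

    n_clusters = len(cluster_to_lines)
    mean_lines = lines_present_total / n_clusters if n_clusters else 0.0
    return summary, mean_lines


# Hypothetical example with four lines and four clusters.
lines = ["B73", "Oh7B", "Tzi8", "CML52"]
clusters = {
    "c1": {"B73", "Oh7B", "Tzi8", "CML52"},
    "c2": {"B73", "Oh7B", "Tzi8"},
    "c3": {"B73", "Tzi8"},
    "c4": {"CML52"},
}
print(summarize_presence(clusters, lines))
```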
[0154] Example 4: Chromosomal translocation of NLR.
[0155] NLRs are known to be a very diverse group of genes, with high presence-absence variation, high Ka/Ks ratios and frequent intergenic crossovers in other species. In order to assess NLR mobility in maize, the previously generated orthAgogue clusters were examined for outliers on different chromosomes or at significantly different positions relative to other members. Although the vast majority of NLRs (98.7 %) resided in groups that contained similar positions on the same chromosome, several outliers were also identified. The most extreme outliers were found in Oh7B, which contained 11 NLRs on chromosome 9 that clustered with chromosome 10 NLRs from all other NAM founder lines. This apparent NLR translocation spanned genes that normally range from approximately 1.5 MB to 28.4 MB on chromosome 10. Earlier research using chromosomal probes identified this non-reciprocal translocation several years ago (Albert et al., 2010), but the fact that it resulted in the movement of the largest NLR cluster in maize was not previously reported. Interestingly, the translocated NLRs had an average of 99.23 % protein identity to their closest Chr10 orthologs in the other NAM founder lines.
[0156] Several putative smaller translocations were also identified from our clustering approach, which we examined in more detail. Initial clustering was carried out at the protein level, but this approach requires genes to be expressed or predicted correctly, making it possible for a gene which was only annotated correctly in one line to be incorrectly clustered because it was not assembled or predicted correctly in other lines. To address this possible false clustering, BLAST was used to search the genomic regions encoding NLRs which clustered with genes on other chromosomes against the genomes of all NAM founder lines.
[0157] Several rare translocations that were predicted from protein clustering were also reproduced at the nucleotide level, including a putative Chr2 to Chr10 translocation. One mixed cluster contains eight Chr10 genes and ten Chr2 genes, while the other contains three Chr10 and ten Chr2 genes. These proteins were reclustered using MUSCLE, followed by both neighbor-joining and maximum likelihood models, which both gave the same result: two distinct clusters, which are formed from a mixture of Chr2 and Chr10 genes. See Edgar 2004 Nucleic Acids Res. 32(5): 1792-97 and Saitou and Nei, 1987 Mol. Biol. Evol. 4: 406-425. Within the two major clusters, sub-clusters did break out by chromosomal location, but these distances were relatively minor.
[0158] Besides transposition, an alternative or additional explanation may be that the rapidly evolving nature of NLRs caused two separate clusters to undergo convergent evolution. To assess the likelihood of this, we performed a separate analysis clustering only the NB-ARC domains, which are typically under selection pressure to remain unchanged. This analysis revealed the same clustering pattern as the full protein sequences, indicating that convergent evolution of the rapidly evolving portions of the proteins does not explain the similarity of these genes (Figure 4b). Subsequent expression analysis revealed relatively low Manhattan distances for pairwise comparisons within these clusters, providing further evidence for their relatedness (see “NLR gene expression”). Thus, while transposition remains the most likely explanation, the most closely related inter-chromosome homologs in the mixed clusters (Mo18W Chr10 Zm00034a060252 and Tzi8 Chr2 Zm00042a013478) were only 89.6% identical and 92.3% similar.
[0159] Example 5: NLR Gene Expression.
[0160] Expression of NLR genes was tested in a variety of tissue types present as part of the recent maize NAM sequencing effort (Hufford et al. 2021 Science 373: 655-662). This analysis contained a total of 11 tissue types, including leaf, leaf base, leaf tip, shoot, root, plant embryo, endosperm, anther, tassel inflorescence and ear inflorescence. The RNA-seq data was originally intended for transcriptome annotation and most tissues only contained two biological replicates, reducing the statistical power of differential expression testing. Therefore, the data was used only to assess broad expression differences across tissues, and all cases were noted where the two biological replicates were substantially divergent (> 2-fold difference), such that a third biological replicate would be required to get a more accurate expression estimate. The public data was also supplemented with additional RNA-seq libraries that contained four biological replicates from each NAM founder line constructed from R1 leaves, a developmental stage at which plants often encounter pathogen challenge in the field.
[0161] NLRs were found to be expressed at a significant level across all tissues surveyed (average fragments per kilobase of exon per million mapped fragments or FPKM of 6.75), with the highest average expression found in vegetative tissue. Endosperm had the lowest median NLR expression, followed by embryo, anther, ear inflorescence and tassel. All vegetative tissues had similar levels of average NLR expression, with shoot having the lowest average NLR expression (4.52 FPKM) and leaf base having the highest (7.85 FPKM). Surprisingly, although anther had a low median NLR expression (2.20 FPKM), one NLR present on chromosome 5 (B73: Zm00001e013342) was found in all lines and had the highest average expression of any NLR in any tissue, averaging 632.71 FPKM across the 26 lines. This gene had very low diversity across all lines, contained all canonical NLR domains and was also expressed well above the average NLR expression in all vegetative tissues (average 33.56 FPKM). This gene’s extremely high expression and low diversity indicate that it may not play a canonical NLR role, although its exact function is currently unknown. We also sought to determine if the various domain architectures of NLRs noted earlier possessed different average expression levels across tissues. In general, NLRs which lacked LRR domains were expressed at a slightly lower level than those containing the canonical coiled-coil, NB-ARC and LRR domains (average FPKM of 4.26 compared to 6.94). Interestingly, although only two distinct RPW8 NLRs were found in the NAM founder lines, they both possessed above average expression levels (average 18.98 FPKM).
[0162] To examine whether the clusters formed based on sequence homology also had similar expression patterns, average Manhattan distances (which represent average differences in expression between each gene in each tissue for each pairwise comparison) were calculated for log-transformed expression values within each group, and the resulting average cluster distances were then compared to each other to identify clusters with very low and very high expression diversity. Analysis of the resulting distances revealed that most sequence-based clusters shared similar expression patterns across the 11 different tissue types, although some clear outliers could be found. A clear cluster-wide tissue preference could be seen for some groups of NLRs, including a chromosome 2 root-preferred cluster (B97: Zm00018a019897) and a chromosome 4 endosperm-specific cluster (CML228: Zm00022a035566), but the vast majority of the sequence-based clusters (154/159) showed no strong tissue-specific expression. The rare tissue-specific expression patterns may have bearing on resistance gene selection for diseases which are known to invade specific tissues. Interestingly, the Chr10->Chr2 translocation which resulted in a sequence-based cluster containing a mixture of genes from different chromosomes also possessed a Manhattan distance which was similar to clusters containing non-mixed genes (23.2, compared to an average of 25.4).
[0163] Although most of the sequence-based clusters contained genes with similar expression patterns, a small number of clear outliers could be seen. An NLR found at approximately 24 MB on chromosome 10 of every maize line (B73: Zm00001e039226) was predominantly expressed in leaf base tissue (average 6.80 FPKM), with very little expression in other tissues of most maize lines (average 0.49 FPKM). However, four NAM lines (CML103, M37W, NC358 and Tx303) had high average expression of this gene in several other tissue types, including R1 leaf (34.72 FPKM), leaf tip (47.66 FPKM), shoot (19.82 FPKM), anther (6.74 FPKM) and tassel (3.34 FPKM). Clustering using the protein sequences of this gene from each NAM line indicated high sequence conservation, with the expression outliers not clustering apart from other members at the sequence level. Expression of some other NLRs was lost in specific NAM lines, including an NLR located at approximately 31 MB on chromosome 2 (B73: Zm00001e009019), which had near ubiquitous expression across all tissues in all lines (average 7.74 FPKM), but had very little expression in CML52 (average FPKM 0.37). This gene was expressed in R1 leaf (3.32 FPKM), indicating that it was not a pseudogene.
[0164] Example 7: Diversity within clusters at the whole gene and domain-level.
[0165] To investigate the degree of diversity within the previously generated sequence clusters, Shannon entropy was calculated at both the whole protein and protein-region level. Average Shannon entropy across whole proteins ranged from 0 to 1.27, with an average of 0.28. Although the cluster sizes examined ranged from 2 members up to 36, there was no significant correlation (r² = 0.04) between average Shannon entropy and group size. In general, groups with integrated domains had slightly higher protein-wide entropy than those without (0.39 versus 0.25), which fits with their proposed role in direct interactions with rapidly changing pathogen effectors.
[0166] Entropy variation across the different regions of the NLR proteins within each cluster was assessed. After Shannon entropy was calculated at each position within each cluster, these values were binned into the following protein regions: coiled-coil, NB-ARC domain, spacer (region between NB-ARC and start of LRRs), LRRs, LRR spacers (regions in between LRRs), C-terminal and integrated domains. Coiled-coil regions, which have been proposed to play a role in inter- and intra-protein interaction, tended to have higher entropy than the whole protein (0.31). It has previously been reported that NB-ARC domains tend to have higher conservation than average within NLRs, and this was broadly consistent across the clusters from the NAM founder lines (average Shannon entropy of 0.10). Spacer sequences between the NB-ARC domain and LRR region also had low entropy on average (0.18). LRRs have been noted to have higher than average diversity, and we also found that they had high average Shannon entropy within clusters (0.38). Interestingly, the spacer regions between different LRRs on average had a similar level of entropy (0.38), but on a per-cluster basis, diversity of LRRs was often uncorrelated with diversity of LRR spacer regions. For example, many clusters had high LRR spacer entropy and low LRR entropy, while others showed the opposite pattern. The C-terminal regions of NLRs, which harbor no annotated domains, tended to have the highest level of entropy (average 0.53).
[0167] Integrated domains on average had a higher level of entropy than whole proteins (0.45). Despite typically being highly conserved in other genes, integrated kinase domains in NLRs had very high entropy (0.68), possibly due to their transition away from catalytic activity and towards effector binding (Lai et al., 2016). The highest level of entropy was found in the integrated NAM domains (0.78), including those found in the novel Chr6 fused NLR, which also possessed well above average whole protein entropy (0.68). Still, some integrated domains were highly conserved, and these conserved IDs could be found in clusters with both high and low diversity at the whole-protein level. For example, the Sec66 domain, which has been proposed to be involved in protein translocation, had extremely low entropy (0.04) within its 24-member cluster, despite this cluster having very high entropy at the whole protein level (0.69). Overall, the majority of clusters with IDs tended to have high entropy either within the ID, or at the whole protein level, which may be reflective of their proposed role in direct effector binding.
[0168] Finally, to get an overall picture of the average structure and entropy within maize NLRs, a “composite” NLR was constructed by averaging the entropy patterns of the most common domains. Average Shannon entropy of each position within each domain/protein region was calculated for all clusters (Fig. 3). For regions of variable size (spacers and C-terminal), the positions of entropy values were placed into 100 bins, with each bin representing 1% of the domain’s total size in a given cluster, before averaging. Only the four common LRR HMM models were included in the resulting composite NLR (LRR_1, LRR_4, LRR_6 and LRR_8). These LRR domains showed the second highest level of entropy, with only the C-terminal domains having higher average values. The resulting composite NLR shows the clear variability of NLR entropy throughout the different canonical domains (Fig. 3).
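The binning of variable-size regions described above may be illustrated, without limitation, by the following sketch, in which the per-position entropy values of a region are mapped onto a fixed number of bins of equal relative width and the binned profiles of several clusters are then averaged bin by bin. The function names and the handling of regions shorter than the number of bins (empty bins are left undefined and skipped when averaging) are assumptions for illustration.

```python
def bin_region(values, n_bins=100):
    """Average per-position values into n_bins bins of equal relative width.

    Bins that receive no position (regions shorter than n_bins) are None.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    length = len(values)
    for i, value in enumerate(values):
        b = i * n_bins // length  # relative position mapped to a bin index
        sums[b] += value
        counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def composite_profile(binned_regions):
    """Bin-wise average across clusters, ignoring undefined bins."""
    n_bins = len(binned_regions[0])
    profile = []
    for b in range(n_bins):
        present = [region[b] for region in binned_regions if region[b] is not None]
        profile.append(sum(present) / len(present) if present else None)
    return profile


# Hypothetical entropy values for the same region in two clusters,
# reduced to four bins so the result is easy to inspect.
long_region = bin_region([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], n_bins=4)
short_region = bin_region([1.0, 3.0], n_bins=4)
print(composite_profile([long_region, short_region]))
```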
[0169] The foregoing Examples identified the total NLR complement of the 26 maize NAM founder lines. NLRs were found to have very high levels of PAV and allelic diversity and were distributed unevenly across maize genomes, with a single cluster on chromosome 10 representing a significant portion of the total complement of almost all lines. The physical clustering seen across the maize genome correlates well with sequence-based clustering, enabling physical placement of NLRs based on sequence alone. The ability to infer physical location from sequence is beneficial for techniques such as resistance gene enrichment sequencing (RenSeq) (Jupe et al. 2013 Plant J, 76: 530-44). Despite a high level of variability in total NLR count, maize contained significantly fewer total NLRs than most other plant species, such as Arabidopsis (201), rice (453) and barley (468) (Van de Weyer et al. 2019 Cell, 178: 1260-1272.e14; Shang et al. 2022 Cell Res, 32: 878-896; Li et al. 2021 Front Genet, 12: 694682). The majority clustered closely with NAM line NLRs and the increased count was partially driven by high heterozygosity.
[0170] Analysis of NLR expression across a wide array of tissue types indicated that genes in most sequence-based clusters shared tissue-specific expression patterns. The majority of NLRs were expressed ubiquitously, although some clear root-preferential clusters existed. A small number of outliers within sequence-based clusters exhibited different expression patterns compared to the rest of the cluster, including a chromosome 10 NLR which had leaf base-specific expression in most lines, but much broader expression in other lines. Such outliers may be indicative of neofunctionalization, although additional studies are needed to assess this possibility.
[0171] Evidence was found for mobility of NLRs within maize genomes, including several MB of chromosome 10 which were found to have translocated into chromosome 9 in Oh7B, taking with them the largest NLR cluster in maize. NLRs in this translocated cluster were found to have high similarity to NLRs from other lines lacking the translocation. Another translocation was also identified between chromosome 10 and chromosome 2, which was evidenced by similarity at the whole protein and NB-ARC domain levels, genomic sequence similarity and expression pattern similarity. These results are in keeping with findings in other species, where the uneven distribution of NLRs throughout the genome is thought to be the result of translocations (Borrelli et al., 2018). The results also indicate that translocated maize NLRs maintained high sequence similarity and similar expression patterns when compared to their non-translocated counterparts.
[0172] Genome editing represents a powerful new technology for crop improvement, enabling both precise editing of gene sequences, as well as the insertion of whole genes via template-based approaches (Svitashev et al., 2016). Here, we report that R gene translocations have previously occurred in maize, indicating that template-based genome editing mimics a process by which R genes have naturally moved among maize genomes. Furthermore, some of the translocated genes identified in this work were shown to have maintained similar expression profiles to their non-translocated counterparts, indicating that genes moved via technologies such as CRISPR are likely to retain their native expression patterns and functions. Taken together, these results strengthen the case for editing cloned resistance genes into susceptible maize lines in order to improve their disease tolerance (Yin and Qiu, 2019 Philos Trans R Soc Lond B Biol Sci 374: 20180322).
[0173] Analysis of the entropy of NLRs at the protein level revealed several interesting findings. Different clusters were found to have significantly different levels of entropy, which may affect their likelihood of containing sensor NLRs that are responsible for resistance phenotypes. Previous work has shown that NLRs that directly interact with pathogen effectors have higher levels of diversity, likely to keep pace with the rapid nature of pathogen effector evolution (Prigozhin and Krasileva, 2021 Plant Cell, 33: 998-1015; Sanchez-Vallet et al. 2018 Annu Rev Phytopathol, 56: 21-40). Additionally, entropy was found to vary substantially across different protein regions, with NB-ARC domains showing very low entropy, while LRRs, coiled-coil domains and C-terminal domains possessed very high entropy. These findings are largely in keeping with the proposed roles of the canonical NLR domains. Interestingly, genes containing integrated domains were found to possess above average entropy levels both within and outside of those domains, which lends further support to the direct effector interaction model which has been proposed for this class of NLRs (Kroj et al., 2016 New Phytol, 210: 618-26).
[0174] The survey of maize integrated domains identified several domains not previously known to be associated with NLRs. The integrated domains present in the NAM founder lines varied substantially, with only integrated kinases being present in all lines. The prevalence of this particular domain in NLRs is likely a response to targeting of kinase pathways by pathogen effectors, which has been reported in several other species (Zhang et al. 2010 Cell Host Microbe, 7: 290-301; Shan et al. 2008 Cell Host Microbe, 4: 17-27). Although PAH domains have not been reported as effector targets, they are known to be integrated into the NLRs of other species and may be targeted by pathogens due to their role in protein-protein interaction of transcription factors (Kroj et al. 2016, supra; Bowen et al. 2010 J Mol Biol, 395: 937-49). A novel ID structure was found in a gene that contained both an N-terminal REC104 domain and a mid-protein kinase domain. Despite the lack of overlap of actual domains with the recently published Arabidopsis pan-NLRome, roughly similar rates of integrated domains were found in the two species, with 5.0% of the Arabidopsis pan-NLRome reported to contain integrated domains and 6.7% of the NAM founder lines possessing at least one. Rice, which had an integrated domain frequency of 8.9%, only shared one integrated domain in common with maize (NAM), while rice and Arabidopsis shared no common integrated domains. This lack of conservation in integrated domain repertoire indicates that NLR IDs may undergo rapid rates of gene birth and death. Taken together, the findings disclosed herein substantially expand on the existing knowledge of maize NLR diversity and mobility.
Claims
1. A method of identifying a plant comprising one or more R genes associated with increased resistance to plant disease, said method comprising: a. obtaining a nucleic acid sample from a plant, seed, tissue or germplasm thereof; b. screening the sample for the presence of an R gene that comprises a polynucleotide that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478, wherein the R gene is associated with increased resistance to plant disease.
2. The method of claim 1, further comprising a. obtaining a nucleic acid sample from one or more plants, seeds, tissues or germplasm in a population; b. screening each sample in accordance with claim 1; and c. selecting one or more of the plants, seeds, tissues or germplasm having the R gene associated with increased resistance to plant disease.
3. The method of claim 1, further comprising a. obtaining a nucleic acid sample from one or more plants, seeds, tissues or germplasm, each sample being representative of a plurality of plants, seeds, tissues or germplasm; b. screening each sample in accordance with claim 1; and c. selecting one or more plurality of plants, seeds, tissues or germplasm, wherein the representative sample for each selected plurality has the R gene associated with increased resistance to plant disease.
4. A method of identifying a plant comprising a qualitative trait locus (QTL) associated with increased resistance to plant disease, said method comprising: a. obtaining a sample comprising nucleic acid from a plant; b. screening the sample for any of the following: i. a QTL comprising a gene that comprises a polynucleotide that (A) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (B) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478;
ii. one or more marker alleles within 5 cM of the polynucleotide of (i) and linked to and associated with (i); and c. detecting (i) or (ii) in the sample and thereby identifying the plant as having the QTL associated with increased resistance to plant disease.
5. A method of increasing resistance to plant disease in plant material comprising introducing into the genome of the plant material a heterologous nucleic acid sequence or expressing the heterologous nucleic acid sequence in the plant material, wherein the heterologous nucleic acid sequence (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; wherein the plant expressing the heterologous polynucleotide has increased resistance to plant disease, as compared to a control plant that does not express the heterologous polynucleotide.
6. The method of claim 5, wherein the heterologous polynucleotide further comprises a heterologous promoter.
7. The method of claim 5, further comprising obtaining a progeny plant derived from the plant material expressing the heterologous polynucleotide, wherein said progeny plant comprises in its genome the heterologous polynucleotide and exhibits increased resistance to plant disease as compared to a control plant that does not express the heterologous polynucleotide.
8. The method of claim 5, wherein the plant material's genome is altered by gene editing or by transgenic modification to include the heterologous nucleic acid sequence.
9. The method of claim 5, wherein said plant is a monocot.
10. A method of generating a variant of an R gene, the method comprising the steps of: a. gene shuffling one or more nucleotide sequences of an R gene that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; and b. testing the variants for increased resistance to plant disease.
11. The method of claim 10, wherein the method further comprises the steps of: a. introducing into a regenerable plant cell a recombinant DNA construct comprising a variant of the R gene generated by the method of claim 10; b. regenerating a transgenic plant from the regenerable plant cell after step (a), wherein the transgenic plant comprises in its genome the recombinant DNA construct; and
c. selecting a transgenic plant of (b), wherein the transgenic plant comprises the recombinant DNA construct and exhibits increased resistance to plant disease, as compared to a control plant that does not comprise the recombinant DNA construct.
12. The method of claim 10, wherein said plant is selected from the group consisting of: Arabidopsis, maize, soybean, sunflower, sorghum, canola, wheat, alfalfa, cotton, rice, barley, millet, sugar cane, and switchgrass.
13. The method of claim 10, wherein said plant is a monocot.
14. The method of claim 13, wherein said monocot is maize.
15. A method of identifying an allelic variant of an R gene wherein said allelic variant is associated with increased resistance to plant disease, the method comprising the steps of: a. obtaining a population of plants, wherein said plants exhibit differing levels of resistance to plant disease; b. evaluating allelic variations with respect to a polynucleotide sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739, (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478, or (iii) regulates the expression of the coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or the amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; c. associating allelic variations with variations in resistance to plant disease; and d. identifying allelic variations associated with increased resistance to plant disease.
16. The method of claim 15, further comprising detecting said allelic variant associated with increased resistance to plant disease and selecting a plant if said allelic variant is detected.
17. The method of any one of claims 1-16, wherein the plant disease is bacterial leaf blight and stalk rot; bacterial leaf spot; bacterial stripe; chocolate spot; Goss's bacterial wilt and blight; holcus spot; purple leaf sheath; seed rot-seedling blight; bacterial wilt; corn stunt; anthracnose leaf blight; gray leaf spot; aspergillus ear and kernel rot; banded leaf and sheath spot; black bundle disease; black kernel rot; borde blanco; brown spot; black spot; stalk rot; cephalosporium kernel rot; charcoal rot; corticium ear rot; curvularia leaf spot; didymella leaf spot; diplodia ear rot and stalk rot; seed rot; corn seedling blight; diplodia leaf spot or leaf streak; downy mildews; brown stripe downy mildew; crazy top downy mildew; green ear downy mildew; graminicola downy mildew; Java downy mildew; Philippine downy mildew;
sorghum downy mildew; spontaneum downy mildew; sugarcane downy mildew; dry ear rot; ergot; horse's tooth; corn eyespot; fusarium ear and stalk rot; fusarium blight; seedling root rot; gibberella ear and stalk rot; gray ear rot; gray leaf spot; cercospora leaf spot; helminthosporium root rot; hormodendrum ear rot; cladosporium rot; hyalothyridium leaf spot; late wilt; northern leaf blight; white blast; crown stalk rot; corn stripe; northern leaf spot; helminthosporium ear rot; penicillium ear rot; corn blue eye; blue mold; phaeocytostroma stalk rot and root rot; phaeosphaeria leaf spot; physalospora ear rot; botryosphaeria ear rot; pyrenochaeta stalk rot and root rot; pythium root rot; pythium stalk rot; red kernel disease; rhizoctonia ear rot; sclerotial rot; rhizoctonia root rot and stalk rot; rostratum leaf spot; common corn rust; southern corn rust; tropical corn rust; sclerotium ear rot; southern blight; selenophoma leaf spot; sheath rot; shuck rot; silage mold; common smut; false smut; head smut; southern corn leaf blight and stalk rot; southern leaf spot; tar spot; trichoderma ear rot and root rot; white ear rot, root and stalk rot; yellow leaf blight; zonate leaf spot; American wheat striate (wheat striate mosaic); barley stripe mosaic; barley yellow dwarf; brome mosaic; cereal chlorotic mottle; lethal necrosis (maize lethal necrosis disease); cucumber mosaic; johnsongrass mosaic; maize bushy stunt; maize chlorotic dwarf; maize chlorotic mottle; maize dwarf mosaic; maize leaf fleck; maize pellucid ringspot; maize rayado fino; maize red leaf and red stripe; maize red stripe; maize ring mottle; maize rough dwarf; maize sterile stunt; maize streak; maize stripe; maize tassel abortion; maize vein enation; maize wallaby ear; maize white leaf; maize white line mosaic; millet red leaf; or northern cereal mosaic.
18. A recombinant DNA construct comprising a polynucleotide operably linked to at least one heterologous regulatory sequence wherein said polynucleotide comprises a nucleic acid sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739, or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
19. The recombinant DNA construct of claim 18, wherein said at least one heterologous regulatory sequence comprises a promoter functional in a plant cell.
20. A transgenic plant or transgenic plant cell comprising the recombinant DNA construct of claim 18 or 19.
21. The transgenic plant or transgenic plant cell of claim 20, wherein said plant is selected from the group consisting of: Arabidopsis, maize, soybean, sunflower, sorghum, canola, wheat, alfalfa, cotton, rice, barley, millet, sugar cane, and switchgrass.
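As an illustration only, the marker criterion of claim 4(b)(ii), marker alleles within 5 cM of the polynucleotide, can be sketched as a simple distance filter on a genetic map. The `Marker` type, the marker names, the chromosome labels, and all map positions below are hypothetical and do not come from the application. The filter covers map distance only; it does not establish that a marker is linked to or associated with the resistance gene, which the claim also requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    """Hypothetical marker record; names and positions are illustrative."""
    name: str
    chromosome: str
    position_cm: float  # genetic map position in centimorgans


def markers_within_window(markers, gene_chromosome, gene_position_cm, window_cm=5.0):
    """Return markers on the gene's chromosome within window_cm of the gene."""
    return [
        m for m in markers
        if m.chromosome == gene_chromosome
        and abs(m.position_cm - gene_position_cm) <= window_cm
    ]


# Made-up example data: a gene assumed to sit at 45.0 cM on "chr10".
markers = [
    Marker("M1", "chr10", 42.0),
    Marker("M2", "chr10", 47.5),
    Marker("M3", "chr10", 55.0),
    Marker("M4", "chr3", 44.0),
]
nearby = markers_within_window(markers, "chr10", 45.0)
print([m.name for m in nearby])  # prints ['M1', 'M2']
```

In this made-up data set, M3 is excluded because it lies 10 cM from the gene, and M4 because it is on a different chromosome.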
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363488568P | 2023-03-06 | 2023-03-06 | |
US63/488,568 | 2023-03-06 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2024186806A2 (en) | 2024-09-12 |
WO2024186806A3 (en) | 2024-10-24 |
Family
ID=92675664
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2024/018501 | Plant pathogen resistance genes | 2023-03-06 | 2024-03-05 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024186806A2 (en) |
Also Published As
Publication number | Publication date |
---|---|
WO2024186806A3 (en) | 2024-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20240093223A1 (en) | Methods of identifying, selecting, and producing southern corn rust resistant crops | |
US20210040569A1 (en) | Methods of identifying, selecting, and producing disease resistant crops | |
US11473101B2 (en) | Methods of identifying, selecting, and producing southern corn rust resistant crops | |
US20040025202A1 (en) | Nucleic acid molecules associated with oil in plants | |
WO2021143587A1 (en) | Methods of identifying, selecting, and producing disease resistant crops | |
WO2019203942A1 (en) | Methods of identifying, selecting, and producing bacterial leaf blight resistant rice | |
US20240191249A1 (en) | Plant pathogen effector and disease resistance gene identification, compositions, and methods of use | |
US12091673B2 (en) | Methods of identifying, selecting, and producing southern corn rust resistant crops | |
US20230151382A1 (en) | Plant pathogen effector and disease resistance gene identification, compositions, and methods of use | |
WO2023023499A1 (en) | Compositions and methods for gray leaf spot resistance | |
US20220282338A1 (en) | Methods of identifying, selecting, and producing anthracnose stalk rot resistant crops | |
WO2024186806A2 (en) | Plant pathogen resistance genes | |
US11661609B2 (en) | Methods of identifying, selecting, and producing disease resistant crops | |
BR112020025031A2 (en) | molecule that has pesticidal utility, compositions and processes related to it |