WO2016025516A1 - Methods and systems for selective quantitation and detection of allergens

Info

Publication number
WO2016025516A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
gly
nos
sequence
signature peptides
Prior art date
Application number
PCT/US2015/044710
Other languages
French (fr)
Inventor
Trent James OMAN
Barry William SCHAFER
Ryan Christopher HILL
Guomin Shan
Original Assignee
Dow Agrosciences Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dow Agrosciences Llc
Priority to CN201580054874.8A (patent CN106796242A)
Priority to BR112017002622A (patent BR112017002622A2)
Priority to EP15831719.8A (patent EP3180622A4)
Priority to AU2015301806A (patent AU2015301806A1)
Priority to CA2958063A (patent CA2958063A1)
Publication of WO2016025516A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803: General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848: Methods of protein analysis involving mass spectrometry
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6893: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids related to diseases not provided for elsewhere
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K5/00: Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof
    • C07K5/04: Peptides containing up to four amino acids in a fully defined sequence; Derivatives thereof containing only normal peptide links
    • C07K5/10: Tetrapeptides
    • C07K5/1021: Tetrapeptides with the first amino acid being acidic
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00: Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02: Column chromatography
    • G01N30/88: Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • G01N2030/8809: Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample
    • G01N2030/8813: Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials
    • G01N2030/8831: Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials involving peptides or proteins
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00: Detection or diagnosis of diseases
    • G01N2800/24: Immunology or allergic disorders
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00: Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02: Column chromatography
    • G01N30/62: Detectors specially adapted therefor
    • G01N30/72: Mass spectrometers

Definitions

  • Patent Applications Serial No. 62/035,744, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 1"; Serial No. 62/035,731, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 3"; Serial No. 62/035,768, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 4"; Serial No.
  • the current methods for analysis of gene expression in plants that are preferred in the art include DNA-based techniques (for example PCR and/or RT-PCR); the use of reporter genes; Southern blotting; and immunochemistry. All of these methodologies suffer from various shortcomings. Detection of known and potential allergens in plants, plant parts, and/or food products is an important subject for public safety.
  • the invention relates to methods and systems taking advantage of bioinformatic investigations to identify candidate signature peptides for quantitative multiplex analysis of complex protein samples from plants, plant parts, and/or food products using mass spectrometry.
  • Provided are use and methods for selecting candidate signature peptides for quantitation using a bioinformatic approach.
  • systems comprising a chromatography and mass spectrometry for using selected signature peptides.
  • a method of selecting candidate signature peptide for quantitation of known allergen and potential allergens from a plant-based sample comprises:
  • step (b) performing sequence alignment of the at least one known allergen and potential allergens identified in step (a);
  • Step (d) determining a plurality of candidate signature peptides based on conservative regions or domains from the sequence alignment and in silico digestion data of the consensus sequence or representative sequence selected in Step (c);
  • the quantitating step uses a column chromatography and mass spectrometry. In another embodiment, the quantitating step comprises measuring the plurality of candidate signature peptides using high resolution accurate mass spectrometry (HRAM MS). In another embodiment, the quantitating step comprises calculating corresponding peak heights or peak areas of the candidate signature peptides from mass spectrometry. In another embodiment, the quantitating step comprises comparing data from high fragmentation mode and low fragmentation mode from mass spectrometry.
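The in silico digestion referred to in step (d) can be illustrated with a minimal sketch. It assumes the commonly used trypsin rule (cleavage C-terminal to K or R, except when the next residue is P) and no missed cleavages; the source names trypsin in its figure descriptions but does not state a cleavage rule here. The function names, the minimum-length cutoff, and the two toy sequences are hypothetical and are not taken from the sequence listing.

```python
import re


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    Assumption: simple trypsin rule, no missed cleavages.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def shared_peptides(sequences, min_length=5):
    """Return peptides (>= min_length residues) produced by every sequence.

    Peptides common to all aligned homologs are one simple stand-in for
    candidates drawn from conserved regions.
    """
    peptide_sets = [set(tryptic_digest(s)) for s in sequences]
    common = set.intersection(*peptide_sets)
    return sorted(p for p in common if len(p) >= min_length)


# Hypothetical toy homologs, for illustration only.
homolog_a = "MAEKALGILNLNRTTPK"
homolog_b = "MSEKALGILNLNRSTPK"
print(tryptic_digest(homolog_a))
print(shared_peptides([homolog_a, homolog_b]))
```

A real selection would work from the alignment and the consensus or representative sequence of step (c) and would typically also consider missed cleavages and modifications, none of which this sketch models.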
  • HRAM MS high resolution accurate mass spectrometry
  • the at least one known allergen comprises at least one allergen selected from the group consisting of Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6 (Glycinin) G1, Gly m 6 (Glycinin) G2, Gly m 6 (Glycinin) G3, Gly m 6 (Glycinin) G4, Gly m 6 (Glycinin) precursor, Kunitz trypsin inhibitor 1, Kunitz trypsin inhibitor 3, Gly m Bd 28 K, Gly m Bd 30 K, Gly m 8 (2S albumin), Lectin, and lipoxygenase.
  • the at least one known allergen comprises Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6 (Glycinin) G1, Gly m 6 (Glycinin) G2, Gly m 6 (Glycinin) G3, Gly m 6 (Glycinin) G4, or Gly m 6 (Glycinin) precursor.
  • the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 12, 28, 29, 30, 31, 32, 33, 34, or 35 for Gly m 1.
  • the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 12, 28, or 29 for Gly m 1.
  • the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 13, 38, 39, or 40 for Gly m 3.
  • the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 13 or 38 for Gly m 3.
  • the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 14, 42, 43, 44, 45, 46, 47, 48, 49, 50, or 51 for Gly m 4.
  • the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 14 or 42 for Gly m 4. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 15, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, or 74 for Gly m 5. In a further embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 15 or 61 for Gly m 5. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 16 or 95 for Gly m 6 G1.
  • the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 17 or 107 for Gly m 6 G2. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 18 for Gly m 6 G3. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 19, 130, 131, 132, 133, or 134 for Gly m 6 G4. In a further embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 19 or 130 for Gly m 6 G4. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 20 for Gly m 6 precursor.
  • the potential allergens comprise at least one sequence selected from SEQ ID NOs: 12 and 30-35 for Gly m 1. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 13 and 39-40 for Gly m 3. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 14 and 43-51 for Gly m 4. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 15 and 62-74 for Gly m 5. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 16 and 95 for Gly m 6 G1.
  • the potential allergens comprise at least one sequence selected from SEQ ID NOs: 17 and 107 for Gly m 6 G2. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 19 and 131-134 for Gly m 6 G4.
  • the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 1, 36, and 37 for Gly m 1. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 36 and 37 for Gly m 1. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 2 and 41 for Gly m 3. In another embodiment, the candidate signature peptides comprise SEQ ID NO: 41 for Gly m 3. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 52-57 for Gly m 4. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 52-57 for Gly m 4.
  • the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 3 and 75-94 for Gly m 5. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 75-94 for Gly m 5. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 96-106 for Gly m 6 G1. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 96-106 for Gly m 6 G1. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 4 and 108-120 for Gly m 6 G2.
  • the candidate signature peptides comprise SEQ ID NOs: 108-120 for Gly m 6 G2. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 5 and 121-129 for Gly m 6 G3. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 121-129 for Gly m 6 G3. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 135-156 and 58 for Gly m 6 G4. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 135-156 and 58 for Gly m 6 G4.
  • the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 6, 59, and 60 for Gly m 6 precursor. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 59 and 60 for Gly m 6 precursor. In another embodiment, the plant-based sample comprises a soybean seed or part of a soybean seed.
  • a system for quantitating one or more protein of interest with known amino acid sequence in a plant-based sample comprises:
  • a selection module for selecting a plurality of signature peptides for at least one known allergen and potential allergens
  • the separation module comprises a column chromatography.
  • the column chromatography comprises a liquid column chromatography.
  • the mass spectrometry comprises a high resolution accurate mass spectrometry (HRAM MS).
  • the selection module uses a method provided herein.
  • the one or more protein of interest with known amino acid sequence in a plant-based sample comprises potential allergens.
  • the potential allergens comprise at least one sequence selected from SEQ ID NOs: 12 and 30-35 for Gly m 1.
  • the potential allergens comprise at least one sequence selected from SEQ ID NOs: 13 and 39-40 for Gly m 3.
  • the potential allergens comprise at least one sequence selected from SEQ ID NOs: 14 and 43-51 for Gly m 4.
  • the potential allergens comprise at least one sequence selected from SEQ ID NOs: 15 and 62-74 for Gly m 5.
  • the potential allergens comprise at least one sequence selected from SEQ ID NOs: 16 and 95 for Gly m 6 G1. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 17 and 107 for Gly m 6 G2. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 19 and 131-134 for Gly m 6 G4.
  • the signature peptides comprise at least one sequence selected from SEQ ID NOs: 1, 36, and 37 for Gly m 1. In another embodiment, the signature peptides comprise SEQ ID NOs: 36 and 37 for Gly m 1. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 2 and 41 for Gly m 3. In another embodiment, the signature peptides comprise SEQ ID NO: 41 for Gly m 3. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 52-57 for Gly m 4. In another embodiment, the signature peptides comprise SEQ ID NOs: 52-57 for Gly m 4.
  • the signature peptides comprise at least one sequence selected from SEQ ID NOs: 3 and 75-94 for Gly m 5. In another embodiment, the signature peptides comprise SEQ ID NOs: 75-94 for Gly m 5. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 96-106 for Gly m 6 G1. In another embodiment, the signature peptides comprise SEQ ID NOs: 96-106 for Gly m 6 G1. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 4 and 108-120 for Gly m 6 G2.
  • the signature peptides comprise SEQ ID NOs: 108-120 for Gly m 6 G2. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 5 and 121-129 for Gly m 6 G3. In another embodiment, the signature peptides comprise SEQ ID NOs: 121-129 for Gly m 6 G3. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 135-156 and 58 for Gly m 6 G4. In another embodiment, the signature peptides comprise SEQ ID NOs: 135-156 and 58 for Gly m 6 G4.
  • the signature peptides comprise at least one sequence selected from SEQ ID NOs: 6, 59, and 60 for Gly m 6 precursor. In another embodiment, the signature peptides comprise SEQ ID NOs: 59 and 60 for Gly m 6 precursor. In another embodiment, the plant-based sample comprises a soybean seed or part of a soybean seed.
  • a high-throughput method of quantitating at least one allergen with known amino acid sequence and homologous potential allergens in a plant-based sample comprises using the system provided herein.
  • FIG. 1 shows a representative analysis work flow for the methods and systems disclosed herein.
  • FIGs. 2A - 2C show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 1 SYPSNATCPR; SEQ ID NO: 36 ALGILNLNR; and SEQ ID NO: 37 NLQLILNSCGR from trypsin digested soybean sample chromatogram for Gly m 1.
  • FIG. 2D shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 1 SYPSNATCPR natural abundance peptide and heavy isotope labeled peptide transitions.
  • FIG. 2E shows sequence alignments among Gly m 1 and potential homologs of Gly m 1.
  • FIG. 3A shows representative SRM LC-MS/MS for selected signature peptide SEQ ID NO: 2 YMVIQGEPGAVIR from trypsin digested soybean sample chromatogram for Gly m 3.
  • FIG. 3B shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 2 YMVIQGEPGAVIR natural abundance peptide and heavy isotope labeled peptide transitions.
  • FIG. 3C shows sequences alignments among Gly m 3 and potential homologs of Gly m 3.
  • FIGs. 4A - 4F show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 52 MGVFTFEDEINSPVAPATLYK; SEQ ID NO: 53 ALDSFK; SEQ ID NO: 54 SVENVEGNGGPGTIK; SEQ ID NO: 55 ITFLEDGETK; SEQ ID NO: 56 FVLHK; and SEQ ID NO: 57 AIEAYLLAHPDYN from trypsin digested soybean sample chromatogram for Gly m 4.
  • FIG. 4G shows sequences alignments among Gly m 4 and potential homologs of Gly m 4.
  • FIGs. 5A - 5U show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 3 NILEASYDTK; SEQ ID NO: 75 CNLLK; SEQ ID NO: 76 EEDEDEQPRPIPFPRPQPR; SEQ ID NO: 77 EEQEWPR; SEQ ID NO: 78 QFPFPRPPHQK; SEQ ID NO: 79 ESEESEDSELR; SEQ ID NO: 80 NPFLFGSNR; SEQ ID NO: 81 FETLFK; SEQ ID NO: 82 SPQLQNLR; SEQ ID NO: 83 LQSGDALR; SEQ ID NO: 84 VPSGTTYYVVNPDNNENLR; SEQ ID NO: 85 FESFFLSSTEAQQSYLQGFSR; SEQ ID NO: 86 FEEINK; SEQ ID NO: 87 VLFSR; SEQ ID NO: 88 TISSEDKPFNLR; SEQ ID NO: 89 DPIYSNK; S
  • FIG. 5V shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 3 NILEASYDTK natural abundance peptide and heavy isotope labeled peptide transitions.
  • FIG. 5W show sequences alignments among Gly m 5 and potential homologs of Gly m 5.
  • FIGs. 6A - 6K show representative SRM LC-MS/MS for selected signature peptides
  • FIG. 6L shows a sequence alignment between Gly m 6 G1 and a potential homolog of Gly m 6 G1.
  • FIGs. 7A - 7N show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 4 VTAPAMR; SEQ ID NO: 108 LVLSLCFLLFSGCFALR; SEQ ID NO: 109 EQAQQNECQIQK; SEQ ID NO: 110 RPSYTNGPQEIYIQQGNGIFGMIFPGCPSTYQEPQESQQR; SEQ ID NO: 111 SQRPQDR; SEQ ID NO: 112 QQEEENEGSNILSGFAPEFLK; SEQ ID NO: 113 EAFGVNMQIVR; SEQ ID NO: 114 KPQQEEDDDDEEEQPQCVETDK; SEQ ID NO: 115 LSAQYGSLR; SEQ ID NO: 116 NAMFVPHYTLNANSIIYALNGR; SEQ ID NO: 117 ALVQVVNCNGER; SEQ ID NO: 118 VFDGELQEGGVLIVPQNFAVAAK; SEQ ID NO:
  • FIG. 7O shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 4 VTAPAMR natural abundance peptide and heavy isotope labeled peptide transitions.
  • FIG. 7P shows a sequence alignment between Gly m 6 G2 and a potential homolog of Gly m 6 G2.
  • FIGs. 8A - 8J show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 5 NNNPFSFLVPPK; SEQ ID NO: 121 LVLSLCFLLFSGCCFAFSFR; SEQ ID NO: 122 EQPQQNECQIQR; SEQ ID NO: 123 QQEEENEGGSILSGFAPEFLEHAFVVDR; SEQ ID NO: 124 LQGENEEEEK; SEQ ID NO: 125 GGLSVISPPTEEQQQRPEEEEKPDCDEK; SEQ ID NO: 126 HCQSQSR; SEQ ID NO: 127 LSAQFGSLR; SEQ ID NO: 128 VFDGELQEGQVLIVPQNFAVAAR; and SEQ ID NO: 129 TNDRPSIGNLAGANSLLNALPEEVIQQTFNLR from trypsin digested soybean sample chromatogram for Gly m 6 G3.
  • FIG. 8K shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 5 NNNPFSFLVPPK natural abundance peptide and heavy isotope labeled peptide transitions.
  • FIGs. 9A - 9X show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 6 ADFYNPK; SEQ ID NO: 135 LNECQLNNLNALEPDHR; SEQ ID NO: 136 CAGVTVSK; SEQ ID NO: 137 LTLNR; SEQ ID NO: 138 MIIIAQGK; SEQ ID NO: 139 GALGVAIPGCPETFEEPQEQSNR; SEQ ID NO: 140 QQLQDSHQK; SEQ ID NO: 141 VFYLAGNPDIEYPETMQQQQQK; SEQ ID NO: 142 QGQHQQEEEEEGGSVLSGFSK; SEQ ID NO: 143 HFLAQSFNTNEDIAEK; SEQ ID NO: 144 QIVTVEGGLSVISPK; SEQ ID NO: 145 WQEQQDEDEDEDEDDEDEQIPSHPPR; SEQ ID NO: 146 RPSHGK; SEQ ID NO: 147 EQDEDEDEDKPRPSRPSQGK; SEQ ID NO: 148 QEEPR; SEQ ID NO: 149 NGVEENICTLK; SEQ ID NO: 150 LHENIARPSR; SEQ ID NO: 151 ISTLNSLTLPALR; SEQ ID NO: 152 QFQLSAQYVVLYK; SEQ ID NO: 153 NGIYSPHWNLNANSVIYVTR; SEQ ID NO: 154 DVFR; SEQ ID NO: 155 AIPSEVLAHSYNLR; SEQ ID NO: 156 QSQVSELK; and SEQ ID NO: 58 YEGNWGPLVNPESQQGSPR from trypsin digested soybean sample chromatogram for Gly m 6 G4.
  • FIG. 9Y shows sequence alignments among Gly m 6 G4 and potential homologs of
  • FIGs. 10A, 10B, and 10C show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 6 ADFYNPK, SEQ ID NO: 59 GALQCKPGCPETFEEPQEQSNR, and SEQ ID NO: 60 LQSPDDER from trypsin digested soybean sample chromatogram for Gly m 6 precursor.
  • MODE(S) FOR CARRYING OUT THE INVENTION: It is of significance to enable a sensitive multiplex assay that is capable of selectively detecting and measuring levels of proteins of interest.
  • relevant technologies for protein expression detection rely heavily on traditional immunochemistry technologies, which struggle to accommodate the volume of data that must be generated per sample.
  • Soybean is a multi-billion dollar commodity due to its balanced composition of
  • the allergens can be quantified using a multiplexing format, and samples can be harvested from the field, processed, and analyzed/quantitated, for example, within a one-day (twenty-four hour) window (from field to measured numerical value).
  • sample preparations of the methods and systems provided can be fully scalable for high-throughput, thus enabling hundreds of samples to be analyzed in a single batch.
  • soybean allergens include, for example, Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6 (Glycinin) G1, Gly m 6 (Glycinin) G2, Gly m 6 (Glycinin) G3, Gly m 6 (Glycinin) G4, Gly m 6 (Glycinin) precursor, Gly m 6 (Glycinin) G4 precursor, Kunitz trypsin inhibitor 1, Kunitz trypsin inhibitor 3, Gly m Bd 28 K, Gly m Bd 30 K, Gly m 8 (2S albumin), Lectin, and lipoxygenase.
  • Representative wheat allergens include, for example, profilin (Tri a12), wheat lipid transfer protein 1 (Tri a14), agglutinin isolectin 1 (Tri a18), omega-5 gliadin - seed storage protein (Tri a19), gliadin (Tri a20; NCBI Accession Nos. M10092, M11073, M11074, M11075, M11076, K03074, and K03075), thioredoxin (Tri a25), high molecular weight glutenin (Tri a26), low molecular weight glutenin (Tri a36), and alpha purothionin (Tri a37).
  • profilin Tri a12
  • Tri a14 wheat lipid transfer protein 1
  • Tri a18 agglutinin isolectin 1
  • Tri a19 omega-5 gliadin - seed storage protein
  • Tri a19 gliadin
  • Tri a20 NCBI Accession Nos. M10092, M11073, M11074
  • corn allergens include, for example, maize lipid transfer protein (LTP) (Zea m14) and thioredoxin (Zea m25).
  • LTP maize lipid transfer protein
  • thioredoxin Zea m25
  • rice allergens include, for example, rice profilin A (Ory s12).
  • the methods and systems provided use liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) to detect protein expression levels of sixteen different allergens from soybean.
  • the methods and systems enable analysis of each allergen by itself or combined with additional proteins for a multiplexing assay for qualitative and quantitative analysis in plant matrices.
  • the mass spectrometry detection for quantitative studies may be accomplished using selected reaction monitoring, performed on a triple quadrupole mass spectrometer. With this type of instrumentation, the ion (peptide) of interest formed in the source is first mass-selected, this precursor ion is then dissociated in the collision region of the MS, and a specific product (daughter) ion is then mass-selected and counted.
  • the mass spectrometry detection for quantitative studies may be accomplished using selected reaction monitoring (SRM).
  • counts per unit time may provide an integratable peak area from which amounts or concentration of analytes can be determined.
  • the use of high resolution accurate mass (HRAM) monitoring for quantitation, performed on a HRAM capable mass spectrometer may include, but is not limited to, hybrid quadrupole-time-of-flight, quadrupole-orbitrap, ion trap-orbitrap, or quadrupole-ion-trap-orbitrap (tribrid) mass spectrometers.
  • HRAM high resolution accurate mass
  • peptides are not subject to fragmentation conditions, but rather are measured as intact peptides using full scan or targeted scan modes (for example selective ion monitoring mode or SIM). Integratable peak area can be determined by generating an extracted ion chromatogram for each specific analyte and amounts or concentration of analytes can be calculated.
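The extracted ion chromatogram and peak-area steps described above can be sketched as follows. This is a minimal illustration, not the instrument vendor's or the applicants' processing method: the scan data layout, the 10 ppm tolerance, the trapezoidal integration, and all function names are assumptions chosen for clarity.

```python
def extracted_ion_chromatogram(scans, target_mz, tol_ppm=10.0):
    """Return (retention_time, summed_intensity) pairs for one analyte.

    scans: iterable of (retention_time, [(mz, intensity), ...]) tuples.
    Signals within tol_ppm of target_mz are summed per scan
    (hypothetical tolerance; real methods set this per instrument).
    """
    tol = target_mz * tol_ppm / 1e6
    xic = []
    for rt, peaks in scans:
        total = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, total))
    return xic


def peak_area(xic):
    """Integrate intensity over retention time with the trapezoidal rule."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(xic, xic[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


# Hypothetical full-scan data: three scans, one analyte near m/z 500.
scans = [
    (0.0, [(500.000, 0.0), (600.000, 5.0)]),
    (1.0, [(500.001, 10.0)]),
    (2.0, [(500.000, 0.0)]),
]
xic = extracted_ion_chromatogram(scans, target_mz=500.0)
print(peak_area(xic))
```

Converting the area to an amount or concentration would additionally require a calibration, for example against a standard of known concentration, which is not shown here.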
  • the high resolution and accurate mass nature of the data enable highly specific and sensitive ion signals for the analyte (protein and/or peptide) of interest.
  • bioconfinement refers to restriction of the movement of genetically modified plants or their genetic material to designated areas.
  • the term includes physical, physicochemical, biological confinement, as well as other forms of confinement that prevent the survival, spread or reproduction of genetically modified plants in the natural environment or in artificial growth conditions.
  • complex protein sample is used to distinguish a sample from a purified protein sample.
  • a complex protein sample contains multiple proteins, and may additionally contain other contaminants.
  • mass spectrometry refers to any suitable mass spectrometry method, device or configuration including, e.g., electrospray ionization (ESI), matrix-assisted laser desorption/ionization (MALDI) MS, MALDI-time of flight (TOF) MS, atmospheric pressure (AP) MALDI MS, vacuum MALDI MS, or combinations thereof.
  • ESI electrospray ionization
  • MALDI matrix-assisted laser desorption/ionization
  • TOF MALDI-time of flight
  • AP atmospheric pressure
  • mass spectrometry devices measure the molecular mass of a molecule (as a function of the molecule's mass-to-charge ratio) by measuring the molecule's flight path through a set of magnetic and electric fields.
  • the mass-to-charge ratio is a physical quantity that is widely used in the electrodynamics of charged particles.
  • the mass-to-charge ratio of a particular peptide can be calculated, a priori, by one of skill in the art. Two particles with different mass-to-charge ratio will not move in the same path in a vacuum when subjected to the same electric and magnetic fields.
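As an illustration of calculating a peptide's mass-to-charge ratio a priori, the sketch below sums standard monoisotopic residue masses, adds one water for the termini, and adds one proton per charge. It assumes an unmodified peptide (no alkylated cysteine, no oxidation, no isotope label); the function name and the default charge state are hypothetical choices.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of one proton


def peptide_mz(sequence, charge=2):
    """Monoisotopic m/z of an unmodified peptide at the given positive charge."""
    neutral = sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


# SEQ ID NO: 4 from the figure descriptions, as a doubly charged ion.
print(round(peptide_mz("VTAPAMR", charge=2), 4))
```

A heavy isotope labeled version of the same peptide would have a larger m/z by the mass of the label divided by the charge; the label mass depends on the labeling scheme, which the text does not specify.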
  • Mass spectrometry instruments consist of three modules: an ion source, which converts the sample molecules into ions; a mass analyzer, which sorts the ions by their masses by applying electromagnetic fields; and a detector, which measures the value of an indicator quantity and thus provides data for calculating the abundances of each ion present.
  • the technique has both qualitative and quantitative applications. These include identifying unknown compounds, determining the isotopic composition of elements in a molecule, determining the structure of a compound by observing its fragmentation, and quantifying the amount of a compound in a sample.
  • proteins and/or peptides are "multiplexed" when two or more proteins and/or peptides of interest are present in the same sample.
  • a "plant trait" may refer to any single feature or quantifiable measurement of a plant.
  • protein or proteins may refer to organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues.
  • sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code.
  • the genetic code specifies 20 standard amino acids; however, in certain organisms the genetic code can include selenocysteine and, in certain archaea, pyrrolysine.
  • the residues in a protein are often observed to be chemically modified by post-translational modification, which can happen either before the protein is used in the cell, or as part of control mechanisms. Protein residues may also be modified by design, according to techniques familiar to those of skill in the art.
  • the term “protein” encompasses linear chains comprising naturally occurring amino acids, synthetic amino acids, modified amino acids, or combinations of any or all of the above.
  • single injection refers to the initial step in the operation of a MS or LC-MS device.
  • when a protein sample is introduced into the device in a single injection, the entire sample is introduced in a single step.
  • signature peptide refers to an identifier (short peptide) sequence of a specific protein. Any protein may contain an average of between 10 and 100 signature peptides. Typically signature peptides have at least one of the following criteria: easily detected by mass spectrometry, predictably and stably eluted from a liquid chromatography (LC) column, enriched by reversed phase high performance liquid chromatography (RP-HPLC), good ionization, good fragmentation, or combinations thereof.
  • LC liquid chromatography
  • RP-HPLC reversed phase high performance liquid chromatography
  • the hydrophobicity index can be calculated according to Krokhin, Molecular and Cellular Proteomics 3 (2004) 908, which is incorporated by reference. It is known that a peptide having a hydrophobicity index less than 10 or greater than 40 may not be reproducibly resolved or eluted by an RP-HPLC column.
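The hydrophobicity window stated above can be applied as a simple filter once an index has been computed for each candidate. The sketch below does not implement the Krokhin calculation; it assumes the index values are supplied by some other tool, and the peptide names and index values shown are hypothetical placeholders.

```python
def filter_by_hydrophobicity(indexed_peptides, low=10.0, high=40.0):
    """Keep peptides whose hydrophobicity index lies within [low, high].

    indexed_peptides: iterable of (peptide, hydrophobicity_index) pairs.
    The index is assumed to be precomputed elsewhere; this function
    only applies the 10 to 40 window given in the text.
    """
    return [pep for pep, index in indexed_peptides if low <= index <= high]


# Hypothetical peptides and index values, for illustration only.
candidates = [
    ("PEPTIDEA", 5.2),
    ("PEPTIDEB", 22.7),
    ("PEPTIDEC", 41.3),
    ("PEPTIDED", 10.0),
]
print(filter_by_hydrophobicity(candidates))
```

Whether the boundary values 10 and 40 themselves should be kept is a judgment call; the text only says that values below 10 or above 40 may not elute reproducibly, so the sketch keeps the boundaries.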
  • stacked refers to the presence of multiple heterologous polynucleotides incorporated in the genome of a plant.
  • Tandem mass spectrometry: In tandem mass spectrometry, a parent ion generated from a molecule of interest may be filtered in a mass spectrometry instrument, and the parent ion subsequently fragmented to yield one or more daughter ions that are then analyzed (detected and/or quantified) in a second mass spectrometry procedure. In some embodiments, the use of tandem mass spectrometry is excluded. In these embodiments, tandem mass spectrometry is not used in the methods and systems provided. Thus, neither parent ions nor daughter ions are generated in these embodiments.
  • transgenic plant includes reference to a plant which comprises within its genome a heterologous polynucleotide.
  • the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations.
  • the heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant expression cassette.
  • Transgenic is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid including those transgenic plants initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic plant.
  • any plants that provide useful plant parts may be treated in the practice of the present invention.
  • Examples include plants that provide flowers, fruits, vegetables, and grains.
  • the term “plant” includes dicotyledonous plants and monocotyledonous plants.
  • dicotyledonous plants include tobacco, Arabidopsis, soybean, tomato, papaya, canola, sunflower, cotton, alfalfa, potato, grapevine, pigeon pea, pea, Brassica, chickpea, sugar beet, rapeseed, watermelon, melon, pepper, peanut, pumpkin, radish, spinach, squash, broccoli, cabbage, carrot, cauliflower, celery, Chinese cabbage, cucumber, eggplant, and lettuce.
  • Examples of monocotyledonous plants include corn, rice, wheat, sugarcane, barley, rye, sorghum, orchids, bamboo, banana, cattails, lilies, oat, onion, millet, and triticale.
  • Examples of fruit include banana, pineapple, oranges, grapes, grapefruit, watermelon, melon, apples, peaches, pears, kiwifruit, mango, nectarines, guava, persimmon, avocado, lemon, fig, and berries.
  • flowers include baby's breath, carnation, dahlia, daffodil, geranium, gerbera, lily, orchid, peony, Queen Anne's lace, rose, snapdragon, or other cut flowers or ornamental flowers, potted flowers, and flower bulbs.
  • a proteolytic fragment or set of proteolytic fragments that uniquely identifies a protein(s) of interest is used to detect the protein(s) of interest in a complex protein sample.
  • disclosed methods enable the quantification or determination of ratios of multiple proteins in a complex protein sample by a single mass spectrometry analysis, as opposed to measuring each protein of interest individually multiple times and compiling the individual results into one sample result.
  • the present disclosure also provides methods useful for the development and use of transgenic plant technology. Specifically, disclosed methods may be used to maintain the genotype of transgenic plants through successive generations. Also, some embodiments of the methods disclosed herein may be used to provide high-throughput analysis of non-transgenic plants that are at risk of being contaminated with transgenes from neighboring plants, for example, by cross-pollination. By these embodiments, bioconfinement of transgenes may be facilitated and/or accomplished. In other embodiments, methods disclosed herein may be used to screen the results of a plant transformation procedure in a high-throughput manner to identify transformants that exhibit desirable expression characteristics.
  • the mass-to-charge ratio may be determined using a quadrupole analyzer.
  • For example, in a “quadrupole” or “quadrupole ion trap” instrument, ions in an oscillating radio frequency field experience a force proportional to the DC potential applied between electrodes, the amplitude of the RF signal, and m/z. The voltage and amplitude can be selected so that only ions having a particular m/z travel the length of the quadrupole, while all other ions are deflected.
  • quadrupole instruments can act as a “mass filter” and “mass detector” for the ions injected into the instrument.
  • CID Collision-induced dissociation
  • parent ions gain energy through collisions with an inert gas, such as argon, and are subsequently fragmented by a process referred to as “unimolecular decomposition.” Sufficient energy must be deposited in the parent ion so that certain bonds within the ion can be broken due to increased energy.
  • the mass spectrometer typically provides the user with an ion scan; that is, the relative abundance of each m/z over a given range (for example 10 to 1200 amu).
  • the results of an analyte assay can be related to the amount of the analyte in the original sample by numerous methods known in the art. For example, given that sampling and analysis parameters are carefully controlled, the relative abundance of a given ion can be compared to a table that converts that relative abundance to an absolute amount of the original molecule.
  • alternatively, using molecular standards (e.g., internal standards and external standards), the relative abundance of a given ion can be converted into an absolute amount of the original molecule.
  • Numerous other methods for relating the presence or amount of an ion to the presence or amount of the original molecule are well known to those of ordinary skill in the art.
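One common way to relate ion abundance to the amount of the original molecule is a calibration curve built from standards of known amount. The following is a minimal sketch of that idea, assuming a linear detector response over the calibration range; the function names and the numbers in the usage lines are hypothetical and are not taken from the disclosure.

```python
def fit_standard_curve(amounts, responses):
    """Ordinary least-squares fit of response = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_response(response, slope, intercept):
    """Invert the fitted line to estimate the amount behind a measured response."""
    return (response - intercept) / slope


# Hypothetical standards: known amounts and their measured ion abundances.
slope, intercept = fit_standard_curve([1.0, 2.0, 4.0, 8.0], [3.0, 5.0, 9.0, 17.0])
unknown_amount = amount_from_response(11.0, slope, intercept)
```

A measured response is only meaningful if it falls inside the range spanned by the standards, which is why samples are typically diluted into the calibration range before analysis.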
  • Ions can be produced using a variety of methods including, but not limited to, electron ionization, chemical ionization, fast atom bombardment, field desorption, matrix-assisted laser desorption ionization (MALDI), surface enhanced laser desorption ionization (SELDI), desorption electrospray ionization (DESI), photon ionization, electrospray ionization, and inductively coupled plasma.
  • MALDI matrix-assisted laser desorption ionization
  • SELDI surface enhanced laser desorption ionization
  • DESI desorption electrospray ionization
  • Electrospray ionization refers to methods in which a solution is passed along a short length of capillary tube, to the end of which is applied a high positive or negative electric potential. Solution reaching the end of the tube is vaporized (nebulized) into a jet or spray of very small droplets of solution in solvent vapor. This mist of droplets flows through an evaporation chamber which is heated to prevent condensation and to evaporate solvent. As the droplets get smaller, the electrical surface charge density increases until the natural repulsion between like charges causes ions as well as neutral molecules to be released.
  • the effluent of an LC may be injected directly and automatically (i.e., "in-line") into the electrospray device.
  • proteins contained in an LC effluent are first ionized by electrospray into a parent ion.
  • mass analyzers can be used in liquid chromatography - mass spectrometry combination (LC-MS).
  • exemplary mass analyzers include, but are not limited to, single quadrupole, triple quadrupole, ion trap, TOF (time of flight), and quadrupole-time of flight (Q-TOF).
  • the quadrupole mass analyzer may consist of 4 circular rods, set parallel to each other.
  • the quadrupole is the component of the instrument responsible for filtering sample ions, based on their mass-to-charge ratio (m/z). Ions are separated in a quadrupole based on the stability of their trajectories in the oscillating electric fields that are applied to the rods.
  • An ion trap is a combination of electric or magnetic fields that captures ions in a region of a vacuum system or tube. Ion traps can be used in mass spectrometry while the ion's quantum state is manipulated.
  • Time-of-flight mass spectrometry is a method of mass spectrometry in which an ion's mass-to-charge ratio is determined via a time measurement. Ions are accelerated by an electric field of known strength. This acceleration results in an ion having the same kinetic energy as any other ion that has the same charge. The velocity of the ion depends on the mass-to-charge ratio. The time that it subsequently takes for the particle to reach a detector at a known distance is measured. This time will depend on the mass-to-charge ratio of the particle (heavier particles reach lower speeds). From this time and the known experimental parameters one can find the mass-to-charge ratio of the ion.
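The time-of-flight relationship above can be written out numerically. The sketch below assumes an idealized linear instrument: every ion starts at rest, is accelerated through a single potential difference, and then crosses a field-free drift region of known length. The function names and the example voltage and length are hypothetical.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time(mz, accel_voltage, drift_length):
    """Time in seconds for an ion of the given m/z (Da per elementary charge)
    to cross a field-free drift region (metres) after acceleration (volts)."""
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE  # kg per coulomb
    # Energy balance q*U = (1/2)*m*v^2 gives v = sqrt(2*U / (m/q)).
    velocity = (2.0 * accel_voltage / mass_per_charge) ** 0.5
    return drift_length / velocity


def mz_from_flight_time(time_s, accel_voltage, drift_length):
    """Invert the relation: m/z = 2*e*U*t^2 / (u * d^2)."""
    return (2.0 * ELEMENTARY_CHARGE * accel_voltage * time_s ** 2
            / (DALTON * drift_length ** 2))


# Hypothetical settings: 20 kV acceleration, 1 m drift tube.
t_light = flight_time(500.0, 20000.0, 1.0)
t_heavy = flight_time(1000.0, 20000.0, 1.0)
```

Because flight time scales with the square root of m/z, doubling m/z lengthens the flight time by a factor of about 1.41 rather than 2.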
  • the particular instrument used by the methods and/or systems provided may comprise a high fragmentation mode and a low fragmentation mode (or alternatively a non-fragmentation mode).
  • Such different modes may include alternating scan high and low energy acquisition methodology to generate high resolution mass data.
  • the high resolution mass data may comprise a product data set (for example data derived from product ion (fragmented ions) under the high fragmentation mode) and a precursor data set (for example data derived from precursor ions (unfragmented ions) under the low fragmentation or non-fragmentation mode).
  • the methods and/or systems provided use a mass spectrometer comprising a filtering device that may be used in the selection step, a fragmentation device that may be used in the fragmentation step, and/or one or more mass analyzers that may be used in the acquisition and/or mass spectrum creation step or steps.
  • the filtering device and/or mass analyzer may comprise a quadrupole.
  • the selection step and/or acquisition step and/or mass spectrum creation step or steps may involve the use of a resolving quadrupole.
  • the filtering device may comprise a two-dimensional or three-dimensional ion trap or time-of-flight (ToF) mass analyzer.
  • the mass analyzer or mass analyzers may comprise or further comprise one or more of a time-of-flight mass analyzer and/or an ion cyclotron resonance mass analyzer and/or an orbitrap mass analyzer and/or a two-dimensional or three-dimensional ion trap.
  • Filtering by means of selection based upon mass-to-charge ratio (m/z) can be achieved by using a mass analyzer which can select ions based upon m/z, for example a quadrupole; or to transmit a wide m/z range, separate ions according to their m/z, and then select the ions of interest by means of their m/z value.
  • An example of the latter would be a time-of-flight mass analyzer combined with a timed ion selector(s).
  • the methods and/or systems provided may comprise isolating and/or separating the one or more proteins of interest, for example from two or more of a plurality of proteins, using a chromatographic technique for example liquid chromatography (LC).
  • the method may further comprise measuring an elution time for the protein of interest and/or comparing the measured elution time with an expected elution time.
  • the proteins of interest may be separated using an ion mobility technique, which may be carried out using an ion mobility cell. Additionally, the proteins of interest may be selected by order or time of ion mobility drift. The method may further comprise measuring a drift time for the proteins of interest and/or comparing the measured drift time with an expected drift time.
  • the methods and/or systems provided are label-free, where quantitation can be achieved by comparison of the peak intensity, or area under the mass spectral peak for the precursor or product m/z values of interest between injections and across samples.
  • internal standard normalization may be used to account for any known associated analytical error.
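As an illustration of label-free quantitation by peak area with internal standard normalization, the sketch below integrates a chromatographic peak with the trapezoidal rule and divides by the area of an internal standard peak. The function names and data points are hypothetical, and baseline subtraction and peak detection are deliberately left out.

```python
def peak_area(times, intensities):
    """Trapezoidal area under an intensity-versus-time trace."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def normalized_response(analyte_area, internal_standard_area):
    """Ratio of analyte area to internal standard area for one injection."""
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    return analyte_area / internal_standard_area


# Hypothetical traces for one injection.
analyte = peak_area([0.0, 1.0, 2.0], [0.0, 10.0, 0.0])
standard = peak_area([0.0, 1.0, 2.0], [0.0, 5.0, 0.0])
ratio = normalized_response(analyte, standard)
```

Comparing the normalized ratio, rather than the raw area, between injections reduces the effect of run-to-run variation in injection volume and ionization efficiency.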
  • Another label-free method of quantification, spectral counting involves summing the number of fragment ion spectra, or scans, that are acquired for each given peptide, in a non-redundant or redundant fashion. The associated peptide mass spectra for each protein are then summed, providing a measure of the number of scans per protein with this being proportional to its abundance. Comparison can then be made between samples/injections.
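Spectral counting as described above reduces to tallying identified fragment spectra per protein. The sketch below assumes each identified spectrum has already been assigned to a (protein, peptide) pair; the function name is hypothetical and the example identifications are invented for illustration, although the peptide-to-protein pairings follow the signature peptides listed in this document.

```python
from collections import Counter


def spectral_counts(identifications, redundant=True):
    """Count identified spectra per protein.

    identifications: iterable of (protein, peptide) pairs, one per spectrum.
    redundant=True counts every spectrum; redundant=False counts each
    distinct peptide only once per protein.
    """
    if not redundant:
        identifications = set(identifications)
    return Counter(protein for protein, _peptide in identifications)


# Invented identifications for a single injection.
ids = [
    ("Gly m 5", "NILEASYDTK"),
    ("Gly m 5", "NILEASYDTK"),
    ("Gly m 5", "CNLLK"),
    ("Gly m 3", "YMVIQGEPGAVIR"),
]
counts_all = spectral_counts(ids)
counts_unique = spectral_counts(ids, redundant=False)
```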
  • the ion source is selected from the group consisting of: (1) an electrospray ionization (“ESI”) ion source; (2) an atmospheric pressure photo ionization (“APPI”) ion source; (3) an atmospheric pressure chemical ionization (“APCI”) ion source; (4) a matrix assisted laser desorption ionization (“MALDI”) ion source; (5) a laser desorption ionization (“LDI”) ion source; (6) an atmospheric pressure ionization (“API”) ion source; (7) a desorption ionization on silicon (“DIOS”) ion source; (8) an electron impact (“EI”) ion source; (9) a chemical ionization (“CI”) ion source; (10) a field ionization (“FI”) ion source; (11) a field desorption (“FD”) ion source; (12) an inductively coupled plasma (“ICP”) ion source; (13) a fast atom bombardment (“FAB”) ion
  • the methods and/or systems provided comprise an apparatus and/or control system configured to execute a computer program element comprising computer readable program code means for causing a processor to execute a procedure to implement the methods.
  • the methods and/or systems provided use an alternating low and elevated energy scan function in combination with liquid chromatography separation of a plant extract.
  • a list of information for proteins of interest can be provided including, but not limited to, m/z of precursor ion, m/z of product ions, retention time, ion mobility drift time and rate of change of mobility.
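Such a target list can be represented as a small record per signature peptide, with the precursor m/z derived from the peptide sequence. The sketch below is one possible layout, not the format used in the disclosure: the class and field names are hypothetical, residues are assumed to be unmodified, and the masses are standard monoisotopic values rounded to five decimals.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Monoisotopic residue masses in daltons (unmodified residues assumed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (free N- and C-termini)
PROTON = 1.00728


def monoisotopic_mz(peptide, charge):
    """m/z of the protonated peptide ion [M + zH]^z+."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


@dataclass
class TargetEntry:
    """One row of a hypothetical target list."""
    protein: str
    peptide: str
    charge: int = 2
    product_mzs: List[float] = field(default_factory=list)
    retention_time_min: Optional[float] = None
    drift_time_ms: Optional[float] = None

    @property
    def precursor_mz(self):
        return monoisotopic_mz(self.peptide, self.charge)


entry = TargetEntry(protein="Gly m 5", peptide="NILEASYDTK")
mz = entry.precursor_mz
```

Retention time, drift time and product ion values are left empty here because they are instrument- and method-dependent and would have to be measured.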
  • the mass analyzer of the methods and/or systems provided may select a narrow m/z range (of a variable and changeable width) to pass ions through to the gas cell. Accordingly, the signal to noise ratio can be enhanced significantly for quantification of proteins of interest.
  • the mass analyzer of the methods and/or systems provided can select a narrow m/z range (of a variable and changeable width) according to the targeted precursor ion. These selected ions are then transferred to an instrument stage capable of dissociating the ions by means of alternate and repeated switches between a high fragmentation mode where the sample precursor ions are substantially fragmented into product ions and a low fragmentation mode (or non- fragmentation mode) where the sample precursor ions are not substantially fragmented.
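The effect of selecting a narrow m/z range around a targeted precursor can be sketched as a filter over a list of ions. This is a software analogy of the selection step, not a model of the instrument; the function name, the window width and the ion list are hypothetical.

```python
def select_window(ions, target_mz, width):
    """Keep (m/z, intensity) pairs whose m/z lies within a window of the
    given total width centred on target_mz."""
    half_width = width / 2.0
    return [
        (mz, intensity)
        for mz, intensity in ions
        if abs(mz - target_mz) <= half_width
    ]


# Hypothetical ion list: only one ion falls inside a 1.0-wide window at 500.0.
ions = [(499.0, 10.0), (500.2, 50.0), (500.9, 5.0), (650.3, 80.0)]
selected = select_window(ions, target_mz=500.0, width=1.0)
```

Ions outside the window, including the intense one near 650, never reach the fragmentation stage, which is how narrowing the window can improve signal to noise for the targeted precursor.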
  • the methods and systems provided are used for determination of endogenous soybean allergen proteins in soybean seed including Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6, Kunitz trypsin inhibitor 1, Kunitz trypsin inhibitor 3, Gly m Bd 28 K, Gly m Bd 30 K, and Gly m 8 (2S albumin).
  • a 100 ± 0.5 mg ground soybean seed sample is defatted twice with hexanes and dried before extracting with extraction buffer containing 5 M urea, 2 M thiourea, 50 mM Tris pH 8.0 and 65 mM DTT. The sample is sonicated in a water bath for thirty minutes, vortexed for one minute, sonicated for another thirty minutes and centrifuged at > 3,000 rpm for ten minutes at 4°C.
  • the aqueous supernatant is collected and diluted to bring the endogenous soybean allergen protein concentration into the calibration standard range with extraction buffer.
  • the diluted extract is denatured at 95°C for twenty minutes with additional 1 M Tris pH 8.0, 0.5 M DTT and deionized water, followed by refrigeration at 4°C for ten minutes.
  • the denatured extract is incubated overnight (~15 hours) at 37°C with 0.5 mg/mL trypsin enzyme.
  • the digestion reaction is quenched with formic acid/water (50/50 v/v) and centrifuged at > 3,000 rpm for ten minutes at 4°C.
  • LOD limits of detection
  • LOQ limits of quantitation
  • Gly m 5 NILEASYDTK (SEQ ID NO: 3) 1.22 2.44
  • Glycinin G2 VTAPAMR (SEQ ID NO: 4) 1.46 2.92
  • Glycinin G3 NNNPFSFLVPPK (SEQ ID NO: 5) 1.58 3.16
  • Glycinin precursor ADFYNPK (SEQ ID NO: 6) - -
  • Concentrations of allergens are calculated from quantitation of signature peptides (for example, using Analyst Bioanalytical software for LC-MS/MS), and validated by other methods including enzyme-linked immunosorbent assays (ELISA). Calculated concentrations of allergens from different samples are compared using statistical analysis, and results show good consistency among samples.
  • ELISA enzyme-linked immunosorbent assays
  • homologous protein sequences for Gly m 1 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 12 and 29-35) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 2E). Specifically this involved the use of Vector NTI Align X alignment tool which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 1 itself, but also measure potential allergens which are highly homologous to Gly m 1.
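Selection by degree of conservation can be approximated by checking how many of the homologous sequences contain each candidate peptide. The sketch below is a simplification under stated assumptions: it uses exact substring matching rather than an alignment, the function names are hypothetical, and the three short sequences are invented for illustration and are not real Gly m 1 homologs.

```python
def conservation(peptide, homolog_sequences):
    """Fraction of homologous sequences that contain the peptide exactly."""
    hits = sum(1 for sequence in homolog_sequences if peptide in sequence)
    return hits / len(homolog_sequences)


def rank_by_conservation(candidates, homolog_sequences):
    """Order candidate peptides from most to least conserved."""
    return sorted(
        candidates,
        key=lambda peptide: conservation(peptide, homolog_sequences),
        reverse=True,
    )


# Invented toy sequences; the third differs by one residue (N -> D).
homologs = ["MASYPSNATCPRLK", "MTSYPSNATCPRAK", "MASYPSDATCPRLK"]
ranked = rank_by_conservation(["SYPSNATCPR", "ATCPR"], homologs)
```

A peptide found in every homolog can report on all isoforms at once, whereas a peptide missing from some homologs would undercount them.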
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 1, either by itself or in combination with additional proteins in a multiplexing assay format.
  • three signature peptides are selected from all peptide possibilities (SEQ ID NO: 1 SYPSNATCPR; SEQ ID NO: 36 ALGILNLNR; and SEQ ID NO: 37 NLQLILNSCGR), and representative quantitation of these signature peptides is shown in FIGs. 2A - 2C.
  • a peptide standard is synthesized for SEQ ID NO: 1 SYPSNATCPR for quantitative and qualitative analyses (see FIG. 2D). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
  • homologous protein sequences for Gly m 3 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 13, 39, and 40) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 3C). Specifically this involved the use of Vector NTI Align X alignment tool which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 3 itself, but also measure potential allergens which are highly homologous to Gly m 3.
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 3, either by itself or in combination with additional proteins in a multiplexing assay format.
  • two signature peptides are selected from all peptide possibilities (SEQ ID NO: 2 YMVIQGEPGAVIR; and SEQ ID NO: 41 GPGGVTVK), and representative quantitation of SEQ ID NO: 2 YMVIQGEPGAVIR is shown in FIG. 3A.
  • a peptide standard is synthesized for SEQ ID NO: 2 YMVIQGEPGAVIR for quantitative and qualitative analyses (see FIG. 3B). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
  • homologous protein sequences for Gly m 4 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 14 and 43-51) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 4G). Specifically this involved the use of Vector NTI Align X alignment tool which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 4 itself, but also measure potential allergens which are highly homologous to Gly m 4.
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 4, either by itself or in combination with additional proteins in a multiplexing assay format.
  • several signature peptides are selected from all peptide possibilities (SEQ ID NO: 52 MGVFTFEDEINSPVAPATLYK; SEQ ID NO: 53 ALDSFK; SEQ ID NO: 54 SVENVEGNGGPGTIK; SEQ ID NO: 55 ITFLEDGETK; SEQ ID NO: 56 FVLHK; and SEQ ID NO: 57 AIEAYLLAHPDYN), and representative quantitation of these signature peptides is shown in FIGs. 4A - 4F. Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
  • homologous protein sequences for Gly m 5 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 15 and 62-74) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 5W). Specifically this involved the use of Vector NTI Align X alignment tool which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 5 itself, but also measure potential allergens which are highly homologous to Gly m 5.
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 5, either by itself or in combination with additional proteins in a multiplexing assay format.
  • several signature peptides are selected from all peptide possibilities (SEQ ID NO: 3 NILEASYDTK; SEQ ID NO: 75 CNLLK; SEQ ID NO: 76 EEDEDEQPRPIPFPRPQPR; SEQ ID NO: 77 EEQEWPR; SEQ ID NO: 78 QFPFPRPPHQK; SEQ ID NO: 79 ESEESEDSELR; SEQ ID NO: 80 NPFLFGSNR; SEQ ID NO: 81 FETLFK; SEQ ID NO: 82 SPQLQNLR; SEQ ID NO: 83 LQSGDALR; SEQ ID NO: 84 VPSGTTYYVVNPDNNENLR; SEQ ID NO: 85 FESFFLSSTEAQQSYLQGFSR; SEQ ID NO: 86 FEEINK; SEQ ID NO: 87 V
  • FIGs. 5A - 5U. A peptide standard is synthesized for SEQ ID NO: 3 NILEASYDTK for quantitative and qualitative analyses (see FIG. 5V). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
  • At least two homologous protein sequences for Gly m 6 G1 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 16 and 95) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 6L). Specifically this involved the use of Vector NTI Align X alignment tool which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 6 G1 itself, but also measure potential allergens which are highly homologous to Gly m 6 G1.
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 G1, either by itself or in combination with additional proteins in a multiplexing assay format.
  • several signature peptides are selected from all peptide possibilities (SEQ ID NO: 96 LVFSLCFLLFSGCCFAFSSR; SEQ ID NO: 97 EQPQQNECQIQK; SEQ ID NO: 98 RPSYTNGPQEIYIQQGK; SEQ ID NO: 99 HQQEEENEGGSILSGFTLEFLEHAFSVDK; SEQ ID NO: 100 HCQRPR; SEQ ID NO: 101 HNIGQTSSPDIYNPQAGSVTTATSLDFPALSWLR; SEQ ID NO: 102 ALIQVVNCNGER; SEQ ID NO: 103 VFDGELQEGR; SEQ ID NO: 104 TNDTPMIGTLAGANSLLNALPEEVIQHTFNLK; SEQ ID NO: 105 NN PFK; and SEQ
  • At least two homologous protein sequences for Gly m 6 G2 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 17 and 107) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 7P). Specifically this involved the use of Vector NTI Align X alignment tool which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 6 G2 itself, but also measure potential allergens which are highly homologous to Gly m 6 G2.
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 G2, either by itself or in combination with additional proteins in a multiplexing assay format.
  • several signature peptides are selected from all peptide possibilities (SEQ ID NO: 4 VTAPAMR; SEQ ID NO: 108 LVLSLCFLLFSGCFALR; SEQ ID NO: 109 EQAQQNECQIQK; SEQ ID NO: 110 RPSYTNGPQEIYIQQGNGIFGMIFPGCPSTYQEPQESQQR; SEQ ID NO: 111 SQRPQDR; SEQ ID NO: 112 QQEEENEGSNILSGFAPEFLK; SEQ ID NO: 113 EAFGVNMQIVR; SEQ ID NO: 114 PQQEEDDDDEEEQPQCVETDK; SEQ ID NO: 115 LSAQYGSLR; SEQ ID NO: 116 NAMFVPHYTLNANSIIYALNGR; SEQ ID NO: 117 A
  • the Gly m 6 G3 sequence identified from databases is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS.
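An in silico tryptic digest can be sketched with the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P). The sketch below assumes complete digestion with no missed cleavages; the function name is hypothetical, and the two-residue tail "AK" in the usage line is appended only to make the example self-contained.

```python
def tryptic_digest(sequence, min_length=1):
    """Split a protein sequence into tryptic peptides.

    Cleaves after K or R, except when the following residue is P.
    Peptides shorter than min_length are dropped.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the C-terminal peptide.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [peptide for peptide in peptides if len(peptide) >= min_length]


# SEQ ID NO: 76 keeps its internal R-P bonds intact under this rule.
fragments = tryptic_digest("EEDEDEQPRPIPFPRPQPR" + "AK")
```

The proline exception is consistent with signature peptides in this document such as SEQ ID NO: 76, which contain arginine followed by proline inside the peptide.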
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. Quantitation of selected signature peptides can not only measure Gly m 6 G3 itself, but also measure potential allergens which are highly homologous to Gly m 6 G3.
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 G3, either by itself or in combination with additional proteins in a multiplexing assay format.
  • several signature peptides are selected from all peptide possibilities (SEQ ID NO: 5 NNNPFSFLVPPK; SEQ ID NO: 121 LVLSLCFLLFSGCCFAFSFR; SEQ ID NO: 122 EQPQQNECQIQR; SEQ ID NO: 123 QQEEENEGGSILSGFAPEFLEHAFVVDR; SEQ ID NO: 124 LQGENEEEEK; SEQ ID NO: 125 GGLSVISPPTEEQQQRPEEEEKPDCDEK; SEQ ID NO: 126 HCQSQSR; SEQ ID NO: 127 LSAQFGSLR; SEQ ID NO: 128 VFDGELQEGQVLIVPQNFAVAAR; and SEQ ID NO: 129 TNDRPSIGNLAGANSLLNALPEEVIQQTFNLR), and
  • a peptide standard is synthesized for SEQ ID NO: 5 NNNPFSFLVPPK for quantitative and qualitative analyses (see FIG. 8K). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
  • homologous protein sequences for Gly m 6 G4 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 19 and 131-134) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 9Y). Specifically this involved the use of Vector NTI Align X alignment tool which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 6 G4 itself, but also measure potential allergens which are highly homologous to Gly m 6 G4.
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 G4, either by itself or in combination with additional proteins in a multiplexing assay format.
  • several signature peptides are selected from all peptide possibilities (SEQ ID NO: 6 ADFYNPK; SEQ ID NO: 135 LNECQLNNLNALEPDHR; SEQ ID NO: 136 CAGVTVSK; SEQ ID NO: 137 LTLNR; SEQ ID NO: 138 MIIIAQGK; SEQ ID NO: 139 GALGVAIPGCPETFEEPQEQSNR; SEQ ID NO: 140 QQLQDSHQK; SEQ ID NO: 141 VFYLAGNPDIEYPETMQQQQQK; SEQ ID NO: 142 QGQHQQEEEEEGGSVLSGFSK; SEQ ID NO: 143 HFLAQSFNTNEDIAEK; SEQ ID NO: 144 QIVTVEGGLSVISPK; SEQ ID NO: 145 WQEQ
  • the Gly m 6 precursor is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS.
  • signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases.
  • quantitation of selected signature peptides can not only measure Gly m 6 precursor itself, but also measure potential allergens which are highly homologous to Gly m 6 precursor.
  • Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT).
  • the samples are sonicated in buffer to extract proteins.
  • the extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours.
  • the selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 precursor, either by itself or in combination with additional proteins in a multiplexing assay format.
  • three signature peptides are selected from all peptide possibilities (SEQ ID NO: 6 ADFYNPK, SEQ ID NO: 59 GALQCKPGCPETFEEPQEQSNR and SEQ ID NO: 60 LQSPDDER), and representative quantitation of these signature peptides is shown in FIGs. 10B and 10C.
  • Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
  • SEQ ID NO: 1 Exemplary signature peptide for Gly m 1
  • SEQ ID NO: 2 Exemplary signature peptide for Gly m 3
  • SEQ ID NO: 3 Exemplary signature peptide for Gly m 5 (beta-conglycinin) NILEASYDTK
  • SEQ ID NO: 4 Exemplary signature peptide for Gly m 6 (Glycinin) G2 VTAPAMR
  • SEQ ID NO: 5 Exemplary signature peptide for Gly m 6 (Glycinin) G3
  • SEQ ID NO: 8 Exemplary signature peptide for Kunitz trypsin inhibitor 3
  • SEQ ID NO: 9 Exemplary signature peptide for Gly m Bd 28 K NKPQFLAGAASLLR
  • SEQ ID NO: 10 Exemplary signature peptide for Gly m Bd 30 K
  • SEQ ID NO: 11 Exemplary signature peptide for Gly m 8 (2S albumin)
  • SEQ ID NO: 28 Gly m 1 consensus sequence
  • SEQ ID NO: 38 Gly m 3 consensus sequence

Abstract

The invention relates to methods and systems taking advantage of bioinformatic investigations to identify candidate signature peptides for quantitative multiplex analysis of complex protein samples from plants, plant parts, and/or food products using mass spectrometry. Provided are uses and methods for selecting candidate signature peptides for quantitation using a bioinformatic approach. Also provided are systems comprising chromatography and mass spectrometry for using selected signature peptides.

Description

METHODS AND SYSTEMS FOR SELECTIVE QUANTITATION
AND DETECTION OF ALLERGENS
PRIORITY CLAIM
This application claims the benefit of the filing date of United States Provisional Patent Applications: Serial No. 62/035,744, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 1"; Serial No. 62/035,731, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 3"; Serial No. 62/035,768, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 4"; Serial No. 62/035,800, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 5"; Serial No. 62/035,858, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 6 G1 SUBUNIT"; Serial No. 62/035,876, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 6 G2 SUBUNIT"; Serial No. 62/035,920, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 6 G3 SUBUNIT"; Serial No. 62/036,926, filed August 13, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 6 G4 SUBUNIT"; and Serial No. 62/035,944, filed August 11, 2014, for "Methods and Systems for Selective Quantitation and Detection of Allergens Including GLY M 6 Precursor," the disclosure of each of which is hereby incorporated herein in its entirety by this reference.
BACKGROUND
The current methods for analysis of gene expression in plants that are preferred in the art include DNA-based techniques (for example PCR and/or RT-PCR); the use of reporter genes; Southern blotting; and immunochemistry. All of these methodologies suffer from various shortcomings. Detection of known and potential allergens in plants, plant parts, and/or food products is an important subject for public safety.
Although mass spectrometry has been disclosed previously, existing approaches are limited without selected and sensitive quantitation. There remains a need for a high-throughput method for selected and sensitive quantitation of known and/or potential allergens in plant, plant parts, and/or food products.
DISCLOSURE
The invention relates to methods and systems taking advantage of bioinformatic investigations to identify candidate signature peptides for quantitative multiplex analysis of complex protein samples from plants, plant parts, and/or food products using mass spectrometry. Provided are uses and methods for selecting candidate signature peptides for quantitation using a bioinformatic approach. Also provided are systems comprising chromatography and mass spectrometry for using selected signature peptides.
In one aspect, provided is a method of selecting candidate signature peptides for quantitation of a known allergen and potential allergens from a plant-based sample. The method comprises:
(a) identifying potential allergens based on homology to at least one known allergen protein sequence;
(b) performing sequence alignment of the at least one known allergen and potential allergens identified in step (a);
(c) selecting a consensus sequence or representative sequence based on the sequence alignment;
(d) determining a plurality of candidate signature peptides based on conserved regions or domains from the sequence alignment and in silico digestion data of the consensus sequence or representative sequence selected in step (c); and
(e) quantitating the amount of the at least one known allergen and potential allergens in the plant-based sample based on measurements of the signature peptides.
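Steps (b) through (d) above can be illustrated with a minimal sketch. It assumes the usual trypsin rule (cleavage after K or R unless the next residue is P) and uses exact peptide matching across homologs as a simple stand-in for alignment-based conservation; the function names and the toy sequences are hypothetical and are not the sequences of any SEQ ID NO. It relies on `re.split` accepting zero-width matches (Python 3.7 or later).

```python
import re


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when the next residue is P."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if p]  # drop the empty trailing piece, if any


def conserved_peptides(sequences, min_len=6):
    """Return tryptic peptides that occur in the digest of every input sequence."""
    digests = [set(tryptic_digest(seq)) for seq in sequences.values()]
    shared = set.intersection(*digests)
    return sorted(p for p in shared if len(p) >= min_len)


# Hypothetical toy homologs for illustration only.
toy_homologs = {
    "homolog_a": "MAKADFYNPKLQSPDDERGG",
    "homolog_b": "MGKADFYNPKLQSPDEERGG",
    "homolog_c": "MAKADFYNPKLQSPDDERGA",
}
candidates = conserved_peptides(toy_homologs)
print(candidates)
```

In this toy input only one peptide is shared by all three homologs, so it alone would be carried forward as a candidate able to report on every isoform; a practical selection would also weigh alignment data and LC-MS detectability.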
In one embodiment, the quantitating step uses column chromatography and mass spectrometry. In another embodiment, the quantitating step comprises measuring the plurality of candidate signature peptides using high resolution accurate mass spectrometry (HRAM MS). In another embodiment, the quantitating step comprises calculating corresponding peak heights or peak areas of the candidate signature peptides from mass spectrometry. In another embodiment, the quantitating step comprises comparing data from high fragmentation mode and low fragmentation mode from mass spectrometry. In one embodiment, the at least one known allergen comprises at least one allergen selected from the group consisting of Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6 (Glycinin) G1, Gly m 6 (Glycinin) G2, Gly m 6 (Glycinin) G3, Gly m 6 (Glycinin) G4, Gly m 6 (Glycinin) precursor, Kunitz trypsin inhibitor 1, Kunitz trypsin inhibitor 3, Gly m Bd 28 K, Gly m Bd 30 K, Gly m 8 (2S albumin), Lectin, and lipoxygenase. In another embodiment, the at least one known allergen comprises Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6 (Glycinin) G1, Gly m 6 (Glycinin) G2, Gly m 6 (Glycinin) G3, Gly m 6 (Glycinin) G4, or Gly m 6 (Glycinin) precursor.
In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 12, 28, 29, 30, 31, 32, 33, 34, or 35 for Gly m 1. In a further embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 12, 28, or 29 for Gly m 1. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 13, 38, 39, or 40 for Gly m 3. In a further embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 13 or 38 for Gly m 3. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 14, 42, 43, 44, 45, 46, 47, 48, 49, 50, or 51 for Gly m 4. In a further embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 14 or 42 for Gly m 4. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 15, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, or 74 for Gly m 5. In a further embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 15 or 61 for Gly m 5. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 16 or 95 for Gly m 6 G1. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 17 or 107 for Gly m 6 G2. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 18 for Gly m 6 G3. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 19, 130, 131, 132, 133, or 134 for Gly m 6 G4. In a further embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 19 or 130 for Gly m 6 G4. In another embodiment, the consensus sequence or representative sequence of step (c) comprises SEQ ID NO: 20 for Gly m 6 precursor.
In one embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 12 and 30-35 for Gly m 1. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 13 and 39-40 for Gly m 3. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 14 and 43-51 for Gly m 4. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 15 and 62-74 for Gly m 5. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 16 and 95 for Gly m 6 G1. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 17 and 107 for Gly m 6 G2. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 19 and 131-134 for Gly m 6 G4.
In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 1, 36, and 37 for Gly m 1. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 36 and 37 for Gly m 1. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 2 and 41 for Gly m 3. In another embodiment, the candidate signature peptides comprise SEQ ID NO: 41 for Gly m 3. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 52-57 for Gly m 4. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 52-57 for Gly m 4. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 3 and 75-94 for Gly m 5. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 75-94 for Gly m 5. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 96-106 for Gly m 6 G1. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 96-106 for Gly m 6 G1. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 4 and 108-120 for Gly m 6 G2. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 108-120 for Gly m 6 G2. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 5 and 121-129 for Gly m 6 G3. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 121-129 for Gly m 6 G3. In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 135-156 and 58 for Gly m 6 G4. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 135-156 and 58 for Gly m 6 G4.
In another embodiment, the candidate signature peptides comprise at least one sequence selected from SEQ ID NOs: 6, 59, and 60 for Gly m 6 precursor. In another embodiment, the candidate signature peptides comprise SEQ ID NOs: 59 and 60 for Gly m 6 precursor. In another embodiment, the plant-based sample comprises a soybean seed or part of a soybean seed.
In another aspect, provided is a system for quantitating one or more proteins of interest with known amino acid sequence in a plant-based sample. The system comprises:
(a) a high-throughput means for extracting proteins from a plant-based sample;
(b) a process module for digesting extracted proteins with at least one protease;
(c) a separation module for separating peptides in a single step;
(d) a selection module for selecting a plurality of signature peptides for at least one known allergen and potential allergens; and
(e) a mass spectrometry for measuring the plurality of signature peptides.
In one embodiment, the separation module comprises a column chromatography. In a further embodiment, the column chromatography comprises a liquid column chromatography. In another embodiment, the mass spectrometry comprises a high resolution accurate mass spectrometry (HRAM MS). In another embodiment, the selection module uses a method provided herein.
In one embodiment, the one or more protein of interest with known amino acid sequence in a plant-based sample comprises potential allergens. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 12 and 30-35 for Gly m 1. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 13 and 39-40 for Gly m 3. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 14 and 43-51 for Gly m 4. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 15 and 62-74 for Gly m 5. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 16 and 95 for Gly m 6 G1. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 17 and 107 for Gly m 6 G2. In another embodiment, the potential allergens comprise at least one sequence selected from SEQ ID NOs: 19 and 131-134 for Gly m 6 G4.
In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 1, 36, and 37 for Gly m 1. In another embodiment, the signature peptides comprise SEQ ID NOs: 36 and 37 for Gly m 1. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 2 and 41 for Gly m 3. In another embodiment, the signature peptides comprise SEQ ID NO: 41 for Gly m 3. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 52-57 for Gly m 4. In another embodiment, the signature peptides comprise SEQ ID NOs: 52-57 for Gly m 4. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 3 and 75-94 for Gly m 5. In another embodiment, the signature peptides comprise SEQ ID NOs: 75-94 for Gly m 5. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 96-106 for Gly m 6 G1. In another embodiment, the signature peptides comprise SEQ ID NOs: 96-106 for Gly m 6 G1. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 4 and 108-120 for Gly m 6 G2. In another embodiment, the signature peptides comprise SEQ ID NOs: 108-120 for Gly m 6 G2. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 5 and 121-129 for Gly m 6 G3. In another embodiment, the signature peptides comprise SEQ ID NOs: 121-129 for Gly m 6 G3. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 135-156 and 58 for Gly m 6 G4. In another embodiment, the signature peptides comprise SEQ ID NOs: 135-156 and 58 for Gly m 6 G4. In another embodiment, the signature peptides comprise at least one sequence selected from SEQ ID NOs: 6, 59, and 60 for Gly m 6 precursor. In another embodiment, the signature peptides comprise SEQ ID NOs: 59 and 60 for Gly m 6 precursor.
In another embodiment, the plant-based sample comprises a soybean seed or part of a soybean seed.
In another aspect, provided is a high-throughput method of quantitating at least one allergen with known amino acid sequence and homologous potential allergens in a plant-based sample. The method comprises using the system provided herein.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 shows a representative analysis work flow for the methods and systems disclosed herein.
FIGs. 2A - 2C show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 1 SYPSNATCPR; SEQ ID NO: 36 ALGILNLNR; and SEQ ID NO: 37 NLQLILNSCGR from trypsin digested soybean sample chromatogram for Gly m 1.
FIG. 2D shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 1 SYPSNATCPR natural abundance peptide and heavy isotope labeled peptide transitions.
FIG. 2E shows sequence alignments among Gly m 1 and potential homologs of Gly m 1.
FIG. 3A shows representative SRM LC-MS/MS for selected signature peptide SEQ ID NO: 2 YMVIQGEPGAVIR from trypsin digested soybean sample chromatogram for Gly m 3.
FIG. 3B shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 2 YMVIQGEPGAVIR natural abundance peptide and heavy isotope labeled peptide transitions.
FIG. 3C shows sequence alignments among Gly m 3 and potential homologs of Gly m 3.
FIGs. 4A - 4F show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 52 MGVFTFEDEINSPVAPATLYK; SEQ ID NO: 53 ALDSFK; SEQ ID NO: 54 SVENVEGNGGPGTIK; SEQ ID NO: 55 ITFLEDGETK; SEQ ID NO: 56 FVLHK; and SEQ ID NO: 57 AIEAYLLAHPDYN from trypsin digested soybean sample chromatogram for Gly m 4.
FIG. 4G shows sequence alignments among Gly m 4 and potential homologs of Gly m 4.
FIGs. 5A - 5U show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 3 NILEASYDTK; SEQ ID NO: 75 CNLLK; SEQ ID NO: 76 EEDEDEQPRPIPFPRPQPR; SEQ ID NO: 77 EEQEWPR; SEQ ID NO: 78 QFPFPRPPHQK; SEQ ID NO: 79 ESEESEDSELR; SEQ ID NO: 80 NPFLFGSNR; SEQ ID NO: 81 FETLFK; SEQ ID NO: 82 SPQLQNLR; SEQ ID NO: 83 LQSGDALR; SEQ ID NO: 84 VPSGTTYYVVNPDNNENLR; SEQ ID NO: 85 FESFFLSSTEAQQSYLQGFSR; SEQ ID NO: 86 FEEINK; SEQ ID NO: 87 VLFSR; SEQ ID NO: 88 TISSEDKPFNLR; SEQ ID NO: 89 DPIYSNK; SEQ ID NO: 90 FFEITPEK; SEQ ID NO: 91 AIVILVINEGDANIELVGLK; SEQ ID NO: 92 EQQQEQQQEEQPLEVR; SEQ ID NO: 93 NFLAGSQDNVISQIPSQVQELAFPGSAQAVEK; and SEQ ID NO: 94 ESYFVDAQPK from trypsin digested soybean sample chromatogram for Gly m 5.
FIG. 5V shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 3 NILEASYDTK natural abundance peptide and heavy isotope labeled peptide transitions.
FIG. 5W shows sequence alignments among Gly m 5 and potential homologs of Gly m 5.
FIGs. 6A - 6K show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 96 LVFSLCFLLFSGCCFAFSSR; SEQ ID NO: 97 EQPQQNECQIQK; SEQ ID NO: 98 RPSYTNGPQEIYIQQGK; SEQ ID NO: 99 HQQEEENEGGSILSGFTLEFLEHAFSVDK; SEQ ID NO: 100 HCQRPR; SEQ ID NO: 101 HNIGQTSSPDIYNPQAGSVTTATSLDFPALSWLR; SEQ ID NO: 102 ALIQVVNCNGER; SEQ ID NO: 103 VFDGELQEGR; SEQ ID NO: 104 TNDTPMIGTLAGANSLLNALPEEVIQHTFNLK; SEQ ID NO: 105 NNNPFK; and SEQ ID NO: 106 FLVPPQESQK from trypsin digested soybean sample chromatogram for Gly m 6 G1.
FIG. 6L shows a sequence alignment between Gly m 6 G1 and a potential homolog of Gly m 6 G1.
FIGs. 7A - 7N show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 4 VTAPAMR; SEQ ID NO: 108 LVLSLCFLLFSGCFALR; SEQ ID NO: 109 EQAQQNECQIQK; SEQ ID NO: 110 RPSYTNGPQEIYIQQGNGIFGMIFPGCPSTYQEPQESQQR; SEQ ID NO: 111 SQRPQDR; SEQ ID NO: 112 QQEEENEGSNILSGFAPEFLK; SEQ ID NO: 113 EAFGVNMQIVR; SEQ ID NO: 114 KPQQEEDDDDEEEQPQCVETDK; SEQ ID NO: 115 LSAQYGSLR; SEQ ID NO: 116 NAMFVPHYTLNANSIIYALNGR; SEQ ID NO: 117 ALVQVVNCNGER; SEQ ID NO: 118 VFDGELQEGGVLIVPQNFAVAAK; SEQ ID NO: 119 TNDRPSIGNLAGANSLLNALPEEVIQHTFNLK; and SEQ ID NO: 120 NNNPFSFLVPPQESQR from trypsin digested soybean sample chromatogram for Gly m 6 G2.
FIG. 7O shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 4 VTAPAMR natural abundance peptide and heavy isotope labeled peptide transitions.
FIG. 7P shows a sequence alignment between Gly m 6 G2 and a potential homolog of Gly m 6 G2.
FIGs. 8A - 8J show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 5 NNNPFSFLVPPK; SEQ ID NO: 121 LVLSLCFLLFSGCCFAFSFR; SEQ ID NO: 122 EQPQQNECQIQR; SEQ ID NO: 123 QQEEENEGGSILSGFAPEFLEHAFVVDR; SEQ ID NO: 124 LQGENEEEEK; SEQ ID NO: 125 GGLSVISPPTEEQQQRPEEEEKPDCDEK; SEQ ID NO: 126 HCQSQSR; SEQ ID NO: 127 LSAQFGSLR; SEQ ID NO: 128 VFDGELQEGQVLIVPQNFAVAAR; and SEQ ID NO: 129 TNDRPSIGNLAGANSLLNALPEEVIQQTFNLR from trypsin digested soybean sample chromatogram for Gly m 6 G3.
FIG. 8K shows representative SRM LC-MS/MS Standard Chromatogram - 500 ng/mL Synthetic Peptide SEQ ID NO: 5 NNNPFSFLVPPK natural abundance peptide and heavy isotope labeled peptide transitions.
FIGs. 9A - 9X show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 6 ADFYNPK; SEQ ID NO: 135 LNECQLNNLNALEPDHR; SEQ ID NO: 136 CAGVTVSK; SEQ ID NO: 137 LTLNR; SEQ ID NO: 138 MIIIAQGK; SEQ ID NO: 139 GALGVAIPGCPETFEEPQEQSNR; SEQ ID NO: 140 QQLQDSHQK; SEQ ID NO: 141 VFYLAGNPDIEYPETMQQQQQQK; SEQ ID NO: 142 QGQHQQEEEEEGGSVLSGFSK; SEQ ID NO: 143 HFLAQSFNTNEDIAEK; SEQ ID NO: 144 QIVTVEGGLSVISPK; SEQ ID NO: 145 WQEQQDEDEDEDEDDEDEQIPSHPPR; SEQ ID NO: 146 RPSHGK; SEQ ID NO: 147 EQDEDEDEDEDKPRPSRPSQGK; SEQ ID NO: 148 QEEPR; SEQ ID NO: 149 NGVEENICTLK; SEQ ID NO: 150 LHENIARPSR; SEQ ID NO: 151 ISTLNSLTLPALR; SEQ ID NO: 152 QFQLSAQYVVLYK; SEQ ID NO: 153 NGIYSPHWNLNANSVIYVTR; SEQ ID NO: 154 DVFR; SEQ ID NO: 155 AIPSEVLAHSYNLR; SEQ ID NO: 156 QSQVSELK; and SEQ ID NO: 58 YEGNWGPLVNPESQQGSPR from trypsin digested soybean sample chromatogram for Gly m 6 G4.
FIG. 9Y shows sequence alignments among Gly m 6 G4 and potential homologs of Gly m 6 G4.
FIGs. 10A, 10B, and 10C show representative SRM LC-MS/MS for selected signature peptides SEQ ID NO: 6 ADFYNPK, SEQ ID NO: 59 GALQCKPGCPETFEEPQEQSNR, and SEQ ID NO: 60 LQSPDDER from trypsin digested soybean sample chromatogram for Gly m 6 precursor.
MODE(S) FOR CARRYING OUT THE INVENTION
It is of significance to enable a sensitive multiplex assay that is capable of selectively detecting and measuring levels of proteins of interest. Currently, relevant technologies for protein expression detection rely heavily on traditional immunochemistry technologies, which struggle to accommodate the volume of data that must be generated per sample.
Soybean is a multi-billion dollar commodity due to its balanced composition of 2:2:1 protein, starch, and oil by weight. Many seeds, including soybeans, contain proteins that are allergens and anti-nutritional factors. As such, there are concerns regarding the potential of altering allergen levels in genetically-modified soybean varieties when compared to varieties developed through traditional breeding. The measurement of allergen levels in crops has been achieved almost exclusively by immunoassays, such as enzyme-linked immunosorbent assays (ELISA) or IgE-immunoblotting; however, these methods suffer from limited sensitivity and specificity and high variability.
There has been recent interest in developing LC-MS/MS based methods to quantify several plant-expressed proteins in a single analysis. Analysis using these "signature peptides" involves tracking protein expression levels by quantifying several highly specific digest fragments of the proteins of interest. This can typically be accomplished using liquid chromatography coupled with selected reaction monitoring (SRM) tandem mass spectrometry. Improved multiplexed LC-MS/MS methods and systems are provided herein to enable simultaneous quantitation(s) of several allergen proteins in transgenic and non-transgenic soybean. Methods and systems provided herein are validated for analytical figures of merit including accuracy, precision, linearity, limits of detection and quantitation; and for other considerations including sample throughput, transferability, and ease of use. The allergens can be quantified using a multiplexing format, and samples can be harvested from the field, processed, and analyzed/quantitated, for example, within a one-day (twenty-four hour) window (from field to measured numerical value). In addition, sample preparations of the methods and systems provided can be fully scalable for high-throughput, thus enabling hundreds of samples to be analyzed in a single batch.
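As an illustration of how a synthetic peptide standard can support quantitation and a linearity check, the following is a minimal sketch of an external calibration curve. It assumes, as an illustrative choice not stated in the text, that the response is the peak-area ratio of the natural abundance peptide to a heavy isotope labeled peptide; the concentrations and ratios are hypothetical values, and the function names are invented for this example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def back_calculate(response, slope, intercept):
    """Convert a measured response into a concentration using the fitted line."""
    return (response - intercept) / slope


# Hypothetical calibration standards (ng/mL) and light/heavy peak-area ratios.
standard_conc = [1.0, 5.0, 10.0, 50.0, 100.0, 500.0]
area_ratio = [0.0021, 0.0098, 0.0205, 0.101, 0.199, 1.002]

slope, intercept = fit_line(standard_conc, area_ratio)
unknown_conc = back_calculate(0.150, slope, intercept)
print(f"slope={slope:.6f} intercept={intercept:.6f} unknown={unknown_conc:.1f} ng/mL")
```

A weighted fit (for example 1/x) is often preferred when standards span several orders of magnitude; the unweighted fit is used here only to keep the sketch short.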
Representative soybean allergens include, for example, Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6 (Glycinin) G1, Gly m 6 (Glycinin) G2, Gly m 6 (Glycinin) G3, Gly m 6 (Glycinin) G4, Gly m 6 (Glycinin) precursor, Gly m 6 (Glycinin) G4 precursor, Kunitz trypsin inhibitor 1, Kunitz trypsin inhibitor 3, Gly m Bd 28 K, Gly m Bd 30 K, Gly m 8 (2S albumin), Lectin, and lipoxygenase.
Representative wheat allergens include, for example, profilin (Tri a 12), wheat lipid transfer protein 1 (Tri a 14), agglutinin isolectin 1 (Tri a 18), omega-5 gliadin - seed storage protein (Tri a 19), gliadin (Tri a 20; NCBI Accession Nos. M10092, M11073, M11074, M11075, M11076, K03074, and K03075), thioredoxin (Tri a 25), high molecular weight glutenin (Tri a 26), low molecular weight glutenin (Tri a 36), and alpha purothionin (Tri a 37).
Representative corn allergens include, for example, maize lipid transfer protein (LTP) (Zea m 14) and thioredoxin (Zea m 25).
Representative rice allergens include, for example, rice profilin A (Ory s 12).
In some embodiments, the methods and systems provided use liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) to detect protein expression levels of sixteen different allergens from soybean. In some embodiments, the methods and systems enable analysis of each allergen by itself or combined with additional proteins for a multiplexing assay for qualitative and quantitative analysis in plant matrices.
In some embodiments, the mass spectrometry detection for quantitative studies may be accomplished using selected reaction monitoring, performed on a triple quadrupole mass spectrometer. Using this type of instrumentation, the ion (peptide) of interest formed in the source is first mass-selected, this precursor ion is then dissociated in the collision region of the MS, and a specific product (daughter) ion is mass-selected and counted. In some embodiments, the mass spectrometry detection for quantitative studies may be accomplished using selected reaction monitoring (SRM). Using a particular type of instrumentation, the ion of interest formed in the source is first mass-selected, this precursor (protein) ion is then dissociated in the collision region of the mass spectrometer (MS), and a specific product (peptide) ion is mass-selected and counted. In some embodiments, counts per unit time may provide an integratable peak area from which amounts or concentrations of analytes can be determined. In some embodiments, high resolution accurate mass (HRAM) monitoring is used for quantitation, performed on an HRAM-capable mass spectrometer, which may include, but is not limited to, hybrid quadrupole-time-of-flight, quadrupole-orbitrap, ion trap-orbitrap, or quadrupole-ion-trap-orbitrap (tribrid) mass spectrometers. Using this type of instrumentation, peptides are not subject to fragmentation conditions, but rather are measured as intact peptides using full scan or targeted scan modes (for example selective ion monitoring mode or SIM). An integratable peak area can be determined by generating an extracted ion chromatogram for each specific analyte, and amounts or concentrations of analytes can be calculated. The high resolution and accurate mass nature of the data enables highly specific and sensitive ion signals for the analyte (protein and/or peptide) of interest.
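The extracted ion chromatogram and peak-area steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: scans are represented as plain (retention time, peak list) tuples, the mass window is a symmetric tolerance in ppm, and the area is taken by the trapezoidal rule over the whole trace. The data layout, function names, and values are hypothetical and do not correspond to any instrument vendor's format.

```python
def extracted_ion_chromatogram(scans, target_mz, tol_ppm=10.0):
    """Sum the intensity within +/- tol_ppm of target_mz for each scan."""
    half_width = target_mz * tol_ppm / 1e6
    xic = []
    for retention_time, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= half_width)
        xic.append((retention_time, intensity))
    return xic


def peak_area(xic):
    """Integrate intensity over retention time with the trapezoidal rule."""
    area = 0.0
    for (t0, i0), (t1, i1) in zip(xic, xic[1:]):
        area += 0.5 * (i0 + i1) * (t1 - t0)
    return area


# Hypothetical full-scan data: (retention time in minutes, [(m/z, intensity), ...]).
toy_scans = [
    (0.0, [(500.0, 0.0)]),
    (1.0, [(500.0, 10.0), (600.0, 99.0)]),  # the 600.0 ion falls outside the window
    (2.0, [(500.001, 20.0)]),               # within 10 ppm of 500.0
    (3.0, [(500.0, 0.0)]),
]
toy_xic = extracted_ion_chromatogram(toy_scans, target_mz=500.0)
toy_area = peak_area(toy_xic)
print(toy_xic, toy_area)
```

The narrow ppm window is what makes the high resolution measurement selective: ions of a different mass eluting at the same time contribute nothing to the integrated area.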
Unless otherwise stated, the following terms used in this application, including the specification and claims, have the definitions given below. It must be noted that, as used in the specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise.
As used herein, the term "bioconfinement" refers to restriction of the movement of genetically modified plants or their genetic material to designated areas. The term includes physical, physicochemical, and biological confinement, as well as other forms of confinement that prevent the survival, spread, or reproduction of genetically modified plants in the natural environment or in artificial growth conditions.
As used herein, the term "complex protein sample" is used to distinguish a sample from a purified protein sample. A complex protein sample contains multiple proteins, and may additionally contain other contaminants.
As used herein, the general term "mass spectrometry" or "MS" refers to any suitable mass spectrometry method, device or configuration including, e.g., electrospray ionization (ESI), matrix-assisted laser desorption/ionization (MALDI) MS, MALDI-time of flight (TOF) MS, atmospheric pressure (AP) MALDI MS, vacuum MALDI MS, or combinations thereof. Mass spectrometry devices measure the molecular mass of a molecule (as a function of the molecule's mass-to-charge ratio) by measuring the molecule's flight path through a set of magnetic and electric fields. The mass-to-charge ratio is a physical quantity that is widely used in the electrodynamics of charged particles. The mass-to-charge ratio of a particular peptide can be calculated, a priori, by one of skill in the art. Two particles with different mass-to-charge ratios will not move along the same path in a vacuum when subjected to the same electric and magnetic fields.
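As a hedged illustration of such an a priori calculation, the sketch below sums standard monoisotopic residue masses, adds one water, and adds one proton per charge. The function name is hypothetical, and the sketch assumes unmodified residues (for example, cysteine is not alkylated) and protonated positive ions.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one H2O per peptide (free N- and C-termini)
PROTON = 1.00728   # mass added per positive charge


def peptide_mz(sequence, charge):
    """Monoisotopic m/z of an unmodified peptide carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral_mass = sum(MONO[residue] for residue in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Doubly protonated form of a signature peptide named in this disclosure.
mz_doubly_charged = peptide_mz("SYPSNATCPR", 2)
```

In practice, any fixed or variable modifications introduced during sample preparation would need to be added to the residue masses.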
Mass spectrometry instruments consist of three modules: an ion source, which converts the sample molecules into ions; a mass analyzer, which sorts the ions by their masses by applying electromagnetic fields; and a detector, which measures the value of an indicator quantity and thus provides data for calculating the abundance of each ion present. The technique has both qualitative and quantitative applications. These include identifying unknown compounds, determining the isotopic composition of elements in a molecule, determining the structure of a compound by observing its fragmentation, and quantifying the amount of a compound in a sample. A detailed overview of mass spectrometry methodologies and devices can be found in the following references, which are hereby incorporated by reference: Carr and Annan (1997) Overview of peptide and protein analysis by mass spectrometry. In: Current Protocols in Molecular Biology, edited by Ausubel, et al. New York: Wiley, p. 10.21.1-10.21.27; Patterson and Aebersold (1995) Electrophoresis 16:1791-1814; Patterson (1998) Protein identification and characterization by mass spectrometry. In: Current Protocols in Molecular Biology, edited by Ausubel, et al. New York: Wiley, p. 10.22.1-10.22.24; and Domon and Aebersold (2006) Science 312(5771):212-17.
As the term is used herein, proteins and/or peptides are "multiplexed" when two or more proteins and/or peptides of interest are present in the same sample.
As used herein, a "plant trait" may refer to any single feature or quantifiable measurement of a plant.
As used herein, the term "peptide" or "peptides" may refer to short polymers formed from the linking, in a defined order, of α-amino acids. Peptides may also be generated by the digestion of polypeptides, for example proteins, with a protease.
As used herein, the term "protein" or "proteins" may refer to organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. The sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids; however, in certain organisms the genetic code can include selenocysteine and, in certain archaea, pyrrolysine. The residues in a protein are often observed to be chemically modified by post-translational modification, which can happen either before the protein is used in the cell, or as part of control mechanisms. Protein residues may also be modified by design, according to techniques familiar to those of skill in the art. As used herein, the term "protein" encompasses linear chains comprising naturally occurring amino acids, synthetic amino acids, modified amino acids, or combinations of any or all of the above.
As used herein, the term "single injection" refers to the initial step in the operation of an MS or LC-MS device. When a protein sample is introduced into the device in a single injection, the entire sample is introduced in a single step.
As used herein, the phrase "signature peptide" refers to an identifier (short peptide) sequence of a specific protein. Any protein may contain an average of between 10 and 100 signature peptides. Typically, signature peptides meet at least one of the following criteria: easily detected by mass spectrometry, predictably and stably eluted from a liquid chromatography (LC) column, enriched by reversed phase high performance liquid chromatography (RP-HPLC), good ionization, good fragmentation, or combinations thereof. A peptide that is readily quantified by mass spectrometry typically meets at least one of the following criteria: readily synthesized, able to be highly purified (>97%), soluble in ≤ 20% acetonitrile, low non-specific binding, oxidation resistant, post-synthesis modification resistant, and a hydrophobicity or hydrophobicity index ≥ 10 and ≤ 40. The hydrophobicity index can be calculated according to Krokhin, Molecular and Cellular Proteomics 3 (2004) 908, which is incorporated by reference. It is known that a peptide having a hydrophobicity index less than 10 or greater than 40 may not be reproducibly resolved or eluted by an RP-HPLC column.
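The hydrophobicity window described above can be applied as a simple filter, sketched below under stated assumptions. The peptide labels and index values are placeholders and are not computed by the Krokhin method; the function name is hypothetical, and in practice the index values would be supplied by a separate retention-prediction calculation.

```python
def within_hydrophobicity_window(index_by_peptide, low=10.0, high=40.0):
    """Return the peptides whose hydrophobicity index lies in [low, high]."""
    return [
        peptide
        for peptide, index_value in index_by_peptide.items()
        if low <= index_value <= high
    ]


# Placeholder labels and index values (assumptions, not measured data).
candidates = {
    "peptide_a": 5.2,
    "peptide_b": 22.0,
    "peptide_c": 40.0,
    "peptide_d": 47.5,
}
kept = within_hydrophobicity_window(candidates)
```

Peptides outside the window are the ones that, per the passage above, may not be reproducibly resolved or eluted by an RP-HPLC column.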
As used herein, the term "stacked" refers to the presence of multiple heterologous polynucleotides incorporated in the genome of a plant.
Tandem mass spectrometry: In tandem mass spectrometry, a parent ion generated from a molecule of interest may be filtered in a mass spectrometry instrument, and the parent ion subsequently fragmented to yield one or more daughter ions that are then analyzed (detected and/or quantified) in a second mass spectrometry procedure. In some embodiments, the use of tandem mass spectrometry is excluded. In these embodiments, tandem mass spectrometry is not used in the methods and systems provided. Thus, neither parent ions nor daughter ions are generated in these embodiments.
As used herein, the term "transgenic plant" includes reference to a plant which comprises within its genome a heterologous polynucleotide. Generally, the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations. The heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant expression cassette. "Transgenic" is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid including those transgenic plants initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic plant.
Any plants that provide useful plant parts may be treated in the practice of the present invention. Examples include plants that provide flowers, fruits, vegetables, and grains. As used herein, the term "plant" includes dicotyledonous plants and monocotyledonous plants. Examples of dicotyledonous plants include tobacco, Arabidopsis, soybean, tomato, papaya, canola, sunflower, cotton, alfalfa, potato, grapevine, pigeon pea, pea, Brassica, chickpea, sugar beet, rapeseed, watermelon, melon, pepper, peanut, pumpkin, radish, spinach, squash, broccoli, cabbage, carrot, cauliflower, celery, Chinese cabbage, cucumber, eggplant, and lettuce. Examples of monocotyledonous plants include corn, rice, wheat, sugarcane, barley, rye, sorghum, orchids, bamboo, banana, cattails, lilies, oat, onion, millet, and triticale. Examples of fruit include banana, pineapple, oranges, grapes, grapefruit, watermelon, melon, apples, peaches, pears, kiwifruit, mango, nectarines, guava, persimmon, avocado, lemon, fig, and berries. Examples of flowers include baby's breath, carnation, dahlia, daffodil, geranium, gerbera, lily, orchid, peony, Queen Anne's lace, rose, snapdragon, or other cut flowers or ornamental flowers, potted flowers, and flower bulbs.
The specificity allowed in a mass spectrometry approach for identifying a single protein from a complex sample is unique in that only the sequence of the protein of interest is required in order to identify the protein of interest. Compared to other formats of multiplexing, mass spectrometry is unique in being able to exploit the full length of a protein's primary amino acid sequence to target unique identifier-type portions of a protein's primary amino acid sequence to virtually eliminate non-specific detection. In some embodiments of the present invention, a proteolytic fragment or set of proteolytic fragments that uniquely identifies a protein(s) of interest is used to detect the protein(s) of interest in a complex protein sample.
In some embodiments, disclosed methods enable the quantification or determination of ratios of multiple proteins in a complex protein sample by a single mass spectrometry analysis, as opposed to measuring each protein of interest individually multiple times and compiling the individual results into one sample result.
In some embodiments, the present disclosure also provides methods useful for the development and use of transgenic plant technology. Specifically, disclosed methods may be used to maintain the genotype of transgenic plants through successive generations. Also, some embodiments of the methods disclosed herein may be used to provide high-throughput analysis of non-transgenic plants that are at risk of being contaminated with transgenes from neighboring plants, for example, by cross-pollination. By these embodiments, bioconfinement of transgenes may be facilitated and/or accomplished. In other embodiments, methods disclosed herein may be used to screen the results of a plant transformation procedure in a high-throughput manner to identify transformants that exhibit desirable expression characteristics.
The mass-to-charge ratio may be determined using a quadrupole analyzer. For example, in a "quadrupole" or "quadrupole ion trap" instrument, ions in an oscillating radio frequency field experience a force proportional to the DC potential applied between electrodes, the amplitude of the RF signal, and m/z. The voltage and amplitude can be selected so that only ions having a particular m/z travel the length of the quadrupole, while all other ions are deflected. Thus, quadrupole instruments can act as a "mass filter" and "mass detector" for the ions injected into the instrument.
Collision-induced dissociation ("CID") is often used to generate the daughter ions for further detection. In CID, parent ions gain energy through collisions with an inert gas, such as argon, and are subsequently fragmented by a process referred to as "unimolecular decomposition." Sufficient energy must be deposited in the parent ion so that certain bonds within the ion can be broken.
The mass spectrometer typically provides the user with an ion scan; that is, the relative abundance of each m/z over a given range (for example 10 to 1200 amu). The results of an analyte assay, that is, a mass spectrum, can be related to the amount of the analyte in the original sample by numerous methods known in the art. For example, given that sampling and analysis parameters are carefully controlled, the relative abundance of a given ion can be compared to a table that converts that relative abundance to an absolute amount of the original molecule. Alternatively, molecular standards (e.g., internal standards and external standards) can be run with the samples and a standard curve constructed based on ions generated from those standards. Using such a standard curve, the relative abundance of a given ion can be converted into an absolute amount of the original molecule. Numerous other methods for relating the presence or amount of an ion to the presence or amount of the original molecule are well known to those of ordinary skill in the art.
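One way such a standard curve might be constructed and used is sketched below, assuming a linear response and an ordinary least-squares fit. The function names, the calibration concentrations, and the responses are hypothetical; the disclosure does not prescribe a particular regression model or weighting.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_response(response, slope, intercept):
    """Invert the standard curve to convert an ion response to an amount."""
    return (response - intercept) / slope


# Made-up calibration standards: amount (arbitrary units) vs. peak area.
standard_amounts = [1.0, 2.0, 5.0, 10.0]
standard_responses = [210.0, 410.0, 1010.0, 2010.0]
slope, intercept = fit_line(standard_amounts, standard_responses)

# Convert the response of an unknown sample into an amount.
unknown_amount = amount_from_response(810.0, slope, intercept)
```

When an internal standard is run with the samples, the response would typically be the ratio of analyte signal to internal standard signal rather than the raw peak area.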
The choice of ionization method can be determined based on the analyte to be measured, the type of sample, the type of detector, the choice of positive versus negative mode, etc. Ions can be produced using a variety of methods including, but not limited to, electron ionization, chemical ionization, fast atom bombardment, field desorption, matrix-assisted laser desorption ionization (MALDI), surface enhanced laser desorption ionization (SELDI), desorption electrospray ionization (DESI), photon ionization, electrospray ionization, and inductively coupled plasma. Electrospray ionization refers to methods in which a solution is passed along a short length of capillary tube, to the end of which is applied a high positive or negative electric potential. Solution reaching the end of the tube is vaporized (nebulized) into a jet or spray of very small droplets of solution in solvent vapor. This mist of droplets flows through an evaporation chamber which is heated to prevent condensation and to evaporate solvent. As the droplets get smaller, the electrical surface charge density increases until the natural repulsion between like charges causes ions as well as neutral molecules to be released.
The effluent of an LC may be injected directly and automatically (i.e., "in-line") into the electrospray device. In some embodiments, proteins contained in an LC effluent are first ionized by electrospray into a parent ion.
Various mass analyzers can be used in a liquid chromatography-mass spectrometry combination (LC-MS). Exemplary mass analyzers include, but are not limited to, single quadrupole, triple quadrupole, ion trap, TOF (time of flight), and quadrupole-time of flight (Q-TOF).
The quadrupole mass analyzer may consist of 4 circular rods, set parallel to each other. In a quadrupole mass spectrometer (QMS), the quadrupole is the component of the instrument responsible for filtering sample ions, based on their mass-to-charge ratio (m/z). Ions are separated in a quadrupole based on the stability of their trajectories in the oscillating electric fields that are applied to the rods.
An ion trap is a combination of electric or magnetic fields that captures ions in a region of a vacuum system or tube. Ion traps can be used in mass spectrometry while the ion's quantum state is manipulated.
Time-of-flight mass spectrometry (TOFMS) is a method of mass spectrometry in which an ion's mass-to-charge ratio is determined via a time measurement. Ions are accelerated by an electric field of known strength. This acceleration results in an ion having the same kinetic energy as any other ion that has the same charge. The velocity of the ion depends on the mass-to-charge ratio. The time that it subsequently takes for the particle to reach a detector at a known distance is measured. This time will depend on the mass-to-charge ratio of the particle (heavier particles reach lower speeds). From this time and the known experimental parameters one can find the mass-to-charge ratio of the ion. In some embodiments, the particular instrument used by the methods and/or systems provided may comprise a high fragmentation mode and a low fragmentation mode (or alternatively a non-fragmentation mode). Such different modes may include an alternating-scan high and low energy acquisition methodology to generate high resolution mass data. In some embodiments, the high resolution mass data may comprise a product data set (for example data derived from product ions (fragmented ions) under the high fragmentation mode) and a precursor data set (for example data derived from precursor ions (unfragmented ions) under the low fragmentation or non-fragmentation mode).
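The time-of-flight relationship described above can be sketched as follows. This is an idealized model under stated assumptions: the ion starts at rest, gains kinetic energy z·e·U from the accelerating potential U, and then travels a field-free drift length d, so that t = d·sqrt(m / (2·z·e·U)). Time spent in the acceleration region is ignored, and the function names and numerical settings are hypothetical.

```python
import math

DALTON_KG = 1.66053907e-27           # kilograms per dalton
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per elementary charge


def flight_time(mz, accel_voltage_v, drift_length_m):
    """Drift time (s) for an ion of the given m/z (Da per elementary charge)."""
    mass_per_charge = mz * DALTON_KG / ELEMENTARY_CHARGE_C  # kg per coulomb
    velocity = math.sqrt(2.0 * accel_voltage_v / mass_per_charge)
    return drift_length_m / velocity


def mz_from_flight_time(time_s, accel_voltage_v, drift_length_m):
    """Invert the relationship: recover m/z from a measured drift time."""
    mass_per_charge = 2.0 * accel_voltage_v * (time_s / drift_length_m) ** 2
    return mass_per_charge * ELEMENTARY_CHARGE_C / DALTON_KG


# Hypothetical instrument settings: 20 kV acceleration, 1 m drift tube.
t_light = flight_time(500.0, 20000.0, 1.0)
t_heavy = flight_time(1000.0, 20000.0, 1.0)
```

Under this model the ion of larger m/z arrives later, and the arrival time scales with the square root of m/z.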
In some embodiments, the methods and/or systems provided use a mass spectrometer comprising a filtering device that may be used in the selection step, a fragmentation device that may be used in the fragmentation step, and/or one or more mass analyzers that may be used in the acquisition and/or mass spectrum creation step or steps.
The filtering device and/or mass analyzer may comprise a quadrupole. The selection step and/or acquisition step and/or mass spectrum creation step or steps may involve the use of a resolving quadrupole. Additionally or alternatively, the filtering device may comprise a two-dimensional or three-dimensional ion trap or time-of-flight (ToF) mass analyzer. The mass analyzer or mass analyzers may comprise or further comprise one or more of a time-of-flight mass analyzer and/or an ion cyclotron resonance mass analyzer and/or an orbitrap mass analyzer and/or a two-dimensional or three-dimensional ion trap.
Filtering by means of selection based upon mass-to-charge ratio (m/z) can be achieved by using a mass analyzer which can select ions based upon m/z, for example a quadrupole; or by transmitting a wide m/z range, separating ions according to their m/z, and then selecting the ions of interest by means of their m/z value. An example of the latter would be a time-of-flight mass analyzer combined with one or more timed ion selectors. The methods and/or systems provided may comprise isolating and/or separating the one or more proteins of interest, for example from two or more of a plurality of proteins, using a chromatographic technique, for example liquid chromatography (LC). The method may further comprise measuring an elution time for the protein of interest and/or comparing the measured elution time with an expected elution time.
Additionally or alternatively, the proteins of interest may be separated using an ion mobility technique, which may be carried out using an ion mobility cell. Additionally, the proteins of interest may be selected by order or time of ion mobility drift. The method may further comprise measuring a drift time for the proteins of interest and/or comparing the measured drift time with an expected drift time.
In some embodiments, the methods and/or systems provided are label-free, where quantitation can be achieved by comparison of the peak intensity, or area under the mass spectral peak for the precursor or product m/z values of interest between injections and across samples. In some embodiments, internal standard normalization may be used to account for any known associated analytical error. Another label-free method of quantification, spectral counting, involves summing the number of fragment ion spectra, or scans, that are acquired for each given peptide, in a non-redundant or redundant fashion. The associated peptide mass spectra for each protein are then summed, providing a measure of the number of scans per protein with this being proportional to its abundance. Comparison can then be made between samples/injections.
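A minimal sketch of spectral counting is given below. It assumes that each identified fragment ion spectrum has already been assigned to a (protein, peptide) pair, and it interprets "non-redundant" counting as counting each distinct peptide once per protein; both the function name and that interpretation are assumptions, and the protein and peptide labels are placeholders.

```python
from collections import Counter


def spectral_counts(identified_scans, redundant=True):
    """Count identified spectra per protein.

    identified_scans: iterable of (protein, peptide) pairs, one pair per
    identified fragment ion spectrum. With redundant=False, repeated
    spectra of the same peptide are collapsed to a single count.
    """
    scans = list(identified_scans)
    if not redundant:
        scans = set(scans)
    return Counter(protein for protein, _peptide in scans)


# Placeholder identifications (assumptions, not measured data).
scans = [
    ("protein_A", "pep1"),
    ("protein_A", "pep1"),
    ("protein_A", "pep2"),
    ("protein_B", "pep3"),
]
redundant_counts = spectral_counts(scans)
non_redundant_counts = spectral_counts(scans, redundant=False)
```

The per-protein counts from different samples or injections can then be compared as a relative measure of abundance.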
In some embodiments, the ion source is selected from the group consisting of: (1) an electrospray ionization ("ESI") ion source; (2) an atmospheric pressure photo ionization ("APPI") ion source; (3) an atmospheric pressure chemical ionization ("APCI") ion source; (4) a matrix assisted laser desorption ionization ("MALDI") ion source; (5) a laser desorption ionization ("LDI") ion source; (6) an atmospheric pressure ionization ("API") ion source; (7) a desorption ionization on silicon ("DIOS") ion source; (8) an electron impact ("EI") ion source; (9) a chemical ionization ("CI") ion source; (10) a field ionization ("FI") ion source; (11) a field desorption ("FD") ion source; (12) an inductively coupled plasma ("ICP") ion source; (13) a fast atom bombardment ("FAB") ion source; (14) a liquid secondary ion mass spectrometry ("LSIMS") ion source; (15) a desorption electrospray ionization ("DESI") ion source; (16) a nickel-63 radioactive ion source; (17) an atmospheric pressure matrix assisted laser desorption ionization ion source; and (18) a thermospray ion source.
In some embodiments, the methods and/or systems provided comprise an apparatus and/or control system configured to execute a computer program element comprising computer readable program code means for causing a processor to execute a procedure to implement the methods.
In some embodiments, the methods and/or systems provided use an alternating low and elevated energy scan function in combination with liquid chromatography separation of a plant extract. A list of information for proteins of interest can be provided including, but not limited to, m/z of the precursor ion, m/z of the product ions, retention time, ion mobility drift time, and rate of change of mobility. During the course of the LC separation and as the target ions elute into the mass spectrometer (and as either low energy precursor ions or elevated energy product ions are detected, or the retention time window is activated), the mass analyzer of the methods and/or systems provided may select a narrow m/z range (of a variable and changeable width) to pass ions through to the gas cell. Accordingly, the signal-to-noise ratio can be enhanced significantly for quantification of proteins of interest.
In some embodiments, at a chromatographic retention time when a targeted protein of interest is about to elute into the mass spectrometer ion source, the mass analyzer of the methods and/or systems provided can select a narrow m/z range (of a variable and changeable width) according to the targeted precursor ion. These selected ions are then transferred to an instrument stage capable of dissociating the ions by means of alternate and repeated switches between a high fragmentation mode, where the sample precursor ions are substantially fragmented into product ions, and a low fragmentation mode (or non-fragmentation mode), where the sample precursor ions are not substantially fragmented. Typically, high resolution, accurate mass spectra are acquired in both modes, and at the end of the experiment associated precursor and product ions are recognized by the closeness in fit of their chromatographic elution times and optionally other physicochemical properties. The signal intensity of either the precursor ion or the product ion associated with targeted proteins of interest can be used to determine the quantity of the proteins in the plant extract.
Those skilled in the art would understand that certain variations can exist based on the disclosure provided. Thus, the following examples are given for the purpose of illustrating the invention and shall not be construed as being a limitation on the scope of the invention or claims.
EXAMPLES
Example 1
The methods and systems provided are used for determination of endogenous soybean allergen proteins in soybean seed including Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6, Kunitz trypsin inhibitor 1, Kunitz trypsin inhibitor 3, Gly m Bd 28 K, Gly m Bd 30 K, and Gly m 8 (2S albumin). A 100 ± 0.5 mg ground soybean seed sample is defatted twice with hexanes and dried before extracting with extraction buffer containing 5 M urea, 2 M thiourea, 50 mM Tris pH 8.0, and 65 mM DTT. The sample is sonicated in a water bath for thirty minutes, vortexed for one minute, sonicated for another thirty minutes, and centrifuged at > 3,000 rpm for ten minutes at 4°C.
The aqueous supernatant is collected and diluted with extraction buffer to bring the endogenous soybean allergen protein concentration into the calibration standard range. The diluted extract is denatured at 95°C for twenty minutes with additional 1 M Tris pH 8.0, 0.5 M DTT, and deionized water, followed by refrigeration at 4°C for ten minutes. The denatured extract is incubated overnight (~15 hours) at 37°C with 0.5 mg/mL trypsin enzyme. The digestion reaction is quenched with formic acid/water (50/50 v/v) and centrifuged at > 3,000 rpm for ten minutes at 4°C. An aliquot of digested extract is transferred to an autosampler vial and analyzed along with calibration standards by liquid chromatography with positive-ion electrospray (ESI) tandem mass spectrometry (LC-MS/MS). Calibration standards of signature peptides are prepared as listed in Table 1.
The limits of detection (LOD) and limits of quantitation (LOQ) for endogenous soybean allergens in this example are set forth in Table 2, where LOD and LOQ represent protein concentration (ng/mg).
Table 2. Limits of detection (LOD) and limits of quantitation (LOQ) for endogenous soybean allergens in Example 1 (LOD and LOQ represent protein concentration)
Allergen                      Signature peptide                  LOD (ng/mg)   LOQ (ng/mg)
Gly m 1                       SYPSNATCPR (SEQ ID NO: 1)          0.23          0.46
Gly m 3                       YMVIQGEPGAVIR (SEQ ID NO: 2)       0.20          0.39
Gly m 5                       NILEASYDTK (SEQ ID NO: 3)          1.22          2.44
Glycinin G2                   VTAPAMR (SEQ ID NO: 4)             1.46          2.92
Glycinin G3                   NNNPFSFLVPPK (SEQ ID NO: 5)        1.58          3.16
Glycinin precursor            ADFYNPK (SEQ ID NO: 6)             -             -
Kunitz trypsin inhibitor 1    GGGIEVDSTGK (SEQ ID NO: 7)         -             -
Kunitz trypsin inhibitor 3    GIGTLLSSPYR (SEQ ID NO: 8)         -             -
Gly m Bd 28 K                 NKPQFLAGAASLLR (SEQ ID NO: 9)      5.70          11.40
Gly m Bd 30 K                 GVITQVK (SEQ ID NO: 10)            1.15          2.30
Gly m 8                       IMENQSEELEEK (SEQ ID NO: 11)       0.25          0.50
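As an illustration of how limits such as those in Table 2 might be applied to a measured result, the sketch below classifies a concentration against an LOD and an LOQ. The function name, the category labels, and the treatment of values exactly at a limit are assumptions; the LOD and LOQ used are the Table 2 values for Gly m 1, and the measured concentrations are made up.

```python
def classify_result(concentration, lod, loq):
    """Classify a measured concentration (ng/mg) against LOD and LOQ."""
    if lod > loq:
        raise ValueError("LOD is expected to be at or below LOQ")
    if concentration < lod:
        return "not detected"
    if concentration < loq:
        return "detected, below LOQ"
    return "quantifiable"


# Table 2 limits for Gly m 1 (ng/mg).
GLY_M_1_LOD = 0.23
GLY_M_1_LOQ = 0.46

# Made-up measured concentrations (assumptions).
results = [classify_result(c, GLY_M_1_LOD, GLY_M_1_LOQ) for c in (0.10, 0.30, 0.46)]
```

Only results classified as quantifiable would ordinarily be reported as numerical concentrations.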
Concentrations of allergens are calculated from quantitation of signature peptides (for example, using Analyst Bioanalytical software for LC-MS/MS) and validated by other methods, including enzyme-linked immunosorbent assays (ELISA). Calculated concentrations of allergens from different samples are compared using statistical analysis, and the results show good consistency among samples.
Example 2
Several homologous protein sequences for Gly m 1 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 12 and 29-35) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 2E). Specifically, this involves the use of the Vector NTI Align X alignment tool, which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
Once the consensus sequence and/or representative sequence is chosen or determined, it is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 1 itself, but also measure potential allergens which are highly homologous to Gly m 1.
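The in silico digestion and conservation-based selection described above can be sketched as follows. The sketch assumes the commonly used trypsin rule (cleavage C-terminal to K or R, except when the next residue is P), which the disclosure does not itself specify, and it treats a peptide as conserved only if it is produced from every input sequence. The function names are hypothetical and the two input sequences are arbitrary toy strings, not real isoform sequences.

```python
def tryptic_digest(sequence, min_length=5):
    """Cleave after K or R unless followed by P; keep peptides of at
    least min_length residues. No missed cleavages are modeled."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (residue in "KR" and sequence[i + 1] != "P"):
            peptide = sequence[start:i + 1]
            if len(peptide) >= min_length:
                peptides.append(peptide)
            start = i + 1
    return peptides


def conserved_peptides(sequences, min_length=5):
    """Return the tryptic peptides shared by all input sequences."""
    peptide_sets = [set(tryptic_digest(s, min_length)) for s in sequences]
    shared = set.intersection(*peptide_sets) if peptide_sets else set()
    return sorted(shared)


# Arbitrary toy sequences standing in for homologous isoforms (assumption).
isoform_a = "AAAKGGSYTLRPEPTIDEKMMR"
isoform_b = "AVAKGGSYTLRPEPTIDEKMLR"
candidates = conserved_peptides([isoform_a, isoform_b])
```

Candidate peptides that survive this step would still need to be screened against the detectability and chromatographic criteria discussed for signature peptides.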
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 1, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, three signature peptides are selected from all peptide possibilities (SEQ ID NO: 1 SYPSNATCPR; SEQ ID NO: 36 ALGILNLNR; and SEQ ID NO: 37 NLQLILNSCGR), and representative quantitation of these signature peptides is shown in FIGs. 2A-2C. A peptide standard is synthesized for SEQ ID NO: 1 SYPSNATCPR for quantitative and qualitative analyses (see FIG. 2D). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
Example 3
Several homologous protein sequences for Gly m 3 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 13, 39, and 40) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 3C). Specifically, this involves the use of the Vector NTI Align X alignment tool, which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
Once the consensus sequence and/or representative sequence is chosen or determined, it is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 3 itself, but also measure potential allergens which are highly homologous to Gly m 3.
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 3, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, two signature peptides are selected from all peptide possibilities (SEQ ID NO: 2 YMVIQGEPGAVIR; and SEQ ID NO: 41 GPGGVTVK), and representative quantitation of SEQ ID NO: 2 YMVIQGEPGAVIR is shown in FIG. 3A. A peptide standard is synthesized for SEQ ID NO: 2 YMVIQGEPGAVIR for quantitative and qualitative analyses (see FIG. 3B). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
Example 4
Several homologous protein sequences for Gly m 4 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 14 and 43-51) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 4G). Specifically, this involves the use of the Vector NTI Align X alignment tool, which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
Once the consensus sequence and/or representative sequence is chosen or determined, it is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 4 itself, but also measure potential allergens which are highly homologous to Gly m 4.
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 4, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, several signature peptides are selected from all peptide possibilities (SEQ ID NO: 52 MGVFTFEDEINSPVAPATLYK; SEQ ID NO: 53 ALDSFK; SEQ ID NO: 54 SVENVEGNGGPGTIK; SEQ ID NO: 55 ITFLEDGETK; SEQ ID NO: 56 FVLHK; and SEQ ID NO: 57 AIEAYLLAHPDYN), and representative quantitation of these signature peptides is shown in FIGs. 4A-4F. Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
Example 5
Several homologous protein sequences for Gly m 5 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 15 and 62-74) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 5W). Specifically, this involves the use of the Vector NTI Align X alignment tool, which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
Once the consensus sequence and/or representative sequence is chosen or determined, it is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 5 itself, but also measure potential allergens which are highly homologous to Gly m 5.
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 5, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, several signature peptides are selected from all peptide possibilities (SEQ ID NO: 3 NILEASYDTK; SEQ ID NO: 75 CNLLK; SEQ ID NO: 76 EEDEDEQPRPIPFPRPQPR; SEQ ID NO: 77 EEQEWPR; SEQ ID NO: 78 QFPFPRPPHQK; SEQ ID NO: 79 ESEESEDSELR; SEQ ID NO: 80 NPFLFGSNR; SEQ ID NO: 81 FETLFK; SEQ ID NO: 82 SPQLQNLR; SEQ ID NO: 83 LQSGDALR; SEQ ID NO: 84 VPSGTTYYVVNPDNNENLR; SEQ ID NO: 85 FESFFLSSTEAQQSYLQGFSR; SEQ ID NO: 86 FEEINK; SEQ ID NO: 87 VLFSR; SEQ ID NO: 88 TISSEDKPFNLR; SEQ ID NO: 89 DPIYSNK; SEQ ID NO: 90 FFEITPEK; SEQ ID NO: 91 AIVILVINEGDANIELVGLK; SEQ ID NO: 92 EQQQEQQQEEQPLEVR; SEQ ID NO: 93 NFLAGSQDNVISQIPSQVQELAFPGSAQAVEK; and SEQ ID NO: 94 ESYFVDAQPK), and representative quantitation of these signature peptides is shown in FIGs. 5A - 5U. A peptide standard is synthesized for SEQ ID NO: 3 NILEASYDTK for quantitative and qualitative analyses (see FIG. 5V). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
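The use of a synthetic peptide as an analytical reference standard can be illustrated with an external calibration curve. The sketch below is an assumption-laden example rather than the procedure of this disclosure: the calibration amounts and peak areas are invented for illustration, the helper names are hypothetical, a simple unweighted linear fit is used, and one mole of signature peptide is assumed to be released per mole of protein. The protein molecular weight is the value given for SEQ ID NO: 15 in the Sequence Listing.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def peptide_fmol(peak_area, slope, intercept):
    """Back-calculate the peptide amount from a measured peak area."""
    return (peak_area - intercept) / slope


def protein_ng(fmol, protein_mw_da, peptides_per_protein=1):
    """Convert peptide amount (fmol) to protein mass (ng), assuming complete digestion."""
    mol_protein = fmol * 1e-15 / peptides_per_protein
    return mol_protein * protein_mw_da * 1e9


# Hypothetical calibration of the NILEASYDTK standard (fmol on column vs. peak area).
standard_fmol = [10, 50, 100, 500, 1000]
standard_area = [21000, 101000, 201000, 1001000, 2001000]
slope, intercept = fit_line(standard_fmol, standard_area)

GLY_M_5_MW_DA = 70293.13  # MW listed for SEQ ID NO: 15
sample_fmol = peptide_fmol(141000, slope, intercept)  # hypothetical sample peak area
sample_ng = protein_ng(sample_fmol, GLY_M_5_MW_DA)
print(sample_fmol, sample_ng)
```

Weighted regression (for example 1/x) or stable-isotope-labeled internal standards may be preferable in practice; those refinements are omitted here.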
Example 6
At least two homologous protein sequences for Gly m 6 G1 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 16 and 95) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 6L). Specifically, this involves the use of the Vector NTI Align X alignment tool, which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
Once the consensus sequence and/or representative sequence is chosen or determined, it is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 6 G1 itself, but also measure potential allergens which are highly homologous to Gly m 6 G1.
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 G1, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, several signature peptides are selected from all peptide possibilities (SEQ ID NO: 96 LVFSLCFLLFSGCCFAFSSR; SEQ ID NO: 97 EQPQQNECQIQK; SEQ ID NO: 98 RPSYTNGPQEIYIQQGK; SEQ ID NO: 99 HQQEEENEGGSILSGFTLEFLEHAFSVDK; SEQ ID NO: 100 HCQRPR; SEQ ID NO: 101 HNIGQTSSPDIYNPQAGSVTTATSLDFPALSWLR; SEQ ID NO: 102 ALIQVVNCNGER; SEQ ID NO: 103 VFDGELQEGR; SEQ ID NO: 104 TNDTPMIGTLAGANSLLNALPEEVIQHTFNLK; SEQ ID NO: 105 NNNPFK; and SEQ ID NO: 106 FLVPPQESQK), and representative quantitation of these signature peptides is shown in FIGs. 6A - 6K. Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
Example 7
At least two homologous protein sequences for Gly m 6 G2 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 17 and 107) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 7P). Specifically, this involves the use of the Vector NTI Align X alignment tool, which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
Once the consensus sequence and/or representative sequence is chosen or determined, it is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 6 G2 itself, but also measure potential allergens which are highly homologous to Gly m 6 G2.
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 G2, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, several signature peptides are selected from all peptide possibilities (SEQ ID NO: 4 VTAPAMR; SEQ ID NO: 108 LVLSLCFLLFSGCFALR; SEQ ID NO: 109 EQAQQNECQIQK; SEQ ID NO: 110 RPSYTNGPQEIYIQQGNGIFGMIFPGCPSTYQEPQESQQR; SEQ ID NO: 111 SQRPQDR; SEQ ID NO: 112 QQEEENEGSNILSGFAPEFLK; SEQ ID NO: 113 EAFGVNMQIVR; SEQ ID NO: 114 PQQEEDDDDEEEQPQCVETDK; SEQ ID NO: 115 LSAQYGSLR; SEQ ID NO: 116 NAMFVPHYTLNANSIIYALNGR; SEQ ID NO: 117 ALVQVVNCNGER; SEQ ID NO: 118 VFDGELQEGGVLIVPQNFAVAAK; SEQ ID NO: 119 TNDRPSIGNLAGANSLLNALPEEVIQHTFNLK; and SEQ ID NO: 120 NNNPFSFLVPPQESQR), and representative quantitation of these signature peptides is shown in FIGs. 7A - 7N. A peptide standard is synthesized for SEQ ID NO: 4 VTAPAMR for quantitative and qualitative analyses (see FIG. 7O). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
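For LC-MS/MS detection of a signature peptide or its synthetic standard, the expected precursor mass-to-charge ratio can be estimated from the sequence. The following is a minimal sketch assuming an unmodified peptide (no methionine oxidation or cysteine alkylation) and standard monoisotopic residue masses; the function names `monoisotopic_mass` and `precursor_mz` are hypothetical, and the instrument settings actually used are not specified here.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # added once per peptide for the free N- and C-termini
PROTON = 1.007276   # mass of one charge-carrying proton


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def precursor_mz(peptide, charge):
    """m/z of the [M + zH]z+ ion."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


peptide = "VTAPAMR"  # SEQ ID NO: 4
for z in (1, 2, 3):
    print(z, round(precursor_mz(peptide, z), 4))
```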
Example 8
The Gly m 6 G3 sequence identified from databases is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. Quantitation of selected signature peptides can not only measure Gly m 6 G3 itself, but also measure potential allergens which are highly homologous to Gly m 6 G3.
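Because a signature peptide may occur in more than one homologous protein, it can be useful to map each candidate peptide to every sequence that yields it before assigning it to a multiplexed assay. The sketch below assumes the same simplified trypsin rule (cleavage after K or R unless followed by P) and uses short C-terminal-region fragments of SEQ ID NOs: 16, 17, and 18 rather than full-length proteins; the helper names are hypothetical.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (assumed rule, no missed cleavages)."""
    cleaned = re.sub(r"\s+", "", sequence).upper()
    return [p for p in re.split(r"(?<=[KR])(?!P)", cleaned) if p]


def peptide_to_proteins(proteins):
    """Map each tryptic peptide to the sorted list of proteins that release it."""
    mapping = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            mapping[peptide].add(name)
    return {peptide: sorted(names) for peptide, names in mapping.items()}


# Fragments taken from the glycinin sequences in the Sequence Listing.
fragments = {
    "Gly m 6 G1": "ALIQVVNCNGERVFDGELQEGRVLIVPQNFVVAAR",  # from SEQ ID NO: 16
    "Gly m 6 G2": "ALVQVVNCNGERVFDGELQEGGVLIVPQNFAVAAK",  # from SEQ ID NO: 17
    "Gly m 6 G3": "ALVQVVNCNGERVFDGELQEGQVLIVPQNFAVAAR",  # from SEQ ID NO: 18
}
mapping = peptide_to_proteins(fragments)
for peptide, names in mapping.items():
    label = "shared" if len(names) > 1 else "specific"
    print(f"{peptide}: {label} ({', '.join(names)})")
```

In this fragment-level illustration, ALVQVVNCNGER is released from both the G2 and G3 fragments, so a signal for that peptide would reflect both subunits, whereas ALIQVVNCNGER is released only from the G1 fragment.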
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 G3, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, several signature peptides are selected from all peptide possibilities (SEQ ID NO: 5 NNNPFSFLVPPK; SEQ ID NO: 121 LVLSLCFLLFSGCCFAFSFR; SEQ ID NO: 122 EQPQQNECQIQR; SEQ ID NO: 123 QQEEENEGGSILSGFAPEFLEHAFVVDR; SEQ ID NO: 124 LQGENEEEEK; SEQ ID NO: 125 GGLSVISPPTEEQQQRPEEEEKPDCDEK; SEQ ID NO: 126 HCQSQSR; SEQ ID NO: 127 LSAQFGSLR; SEQ ID NO: 128 VFDGELQEGQVLIVPQNFAVAAR; and SEQ ID NO: 129 TNDRPSIGNLAGANSLLNALPEEVIQQTFNLR), and representative quantitation of these signature peptides is shown in FIGs. 8A - 8J. A peptide standard is synthesized for SEQ ID NO: 5 NNNPFSFLVPPK for quantitative and qualitative analyses (see FIG. 8K). Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
Example 9
Several homologous protein sequences for Gly m 6 G4 are identified from public databases including NCBI, Phytozome, and UniProt. Identified sequences (SEQ ID NOs: 19 and 131-134) are analyzed using bioinformatics tools to identify sequence homology and shared sequence composition among the available protein sequences (see FIG. 9Y). Specifically, this involves the use of the Vector NTI Align X alignment tool, which performs a CLUSTAL W type alignment. From this analysis, a consensus sequence and/or representative sequence can be determined.
Once the consensus sequence and/or representative sequence is chosen or determined, it is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 6 G4 itself, but also measure potential allergens which are highly homologous to Gly m 6 G4.
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 G4, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, several signature peptides are selected from all peptide possibilities (SEQ ID NO: 6 ADFYNPK; SEQ ID NO: 135 LNECQLNNLNALEPDHR; SEQ ID NO: 136 CAGVTVSK; SEQ ID NO: 137 LTLNR; SEQ ID NO: 138 MIIIAQGK; SEQ ID NO: 139 GALGVAIPGCPETFEEPQEQSNR; SEQ ID NO: 140 QQLQDSHQK; SEQ ID NO: 141 VFYLAGNPDIEYPETMQQQQQQK; SEQ ID NO: 142 QGQHQQEEEEEGGSVLSGFSK; SEQ ID NO: 143 HFLAQSFNTNEDIAEK; SEQ ID NO: 144 QIVTVEGGLSVISPK; SEQ ID NO: 145 WQEQQDEDEDEDEDDEDEQIPSHPPR; SEQ ID NO: 146 RPSHGK; SEQ ID NO: 147 EQDEDEDEDEDKPRPSRPSQGK; SEQ ID NO: 148 QEEPR; SEQ ID NO: 149 NGVEENICTLK; SEQ ID NO: 150 LHENIARPSR; SEQ ID NO: 151 ISTLNSLTLPALR; SEQ ID NO: 152 QFQLSAQYVVLYK; SEQ ID NO: 153 NGIYSPHWNLNANSVIYVTR; SEQ ID NO: 154 DVFR; SEQ ID NO: 155 AIPSEVLAHSYNLR; SEQ ID NO: 156 QSQVSELK; and SEQ ID NO: 58 YEGNWGPLVNPESQQGSPR), and representative quantitation of these signature peptides is shown in FIGs. 9A - 9X. Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
Example 10
The Gly m 6 precursor is digested in silico to generate candidate signature peptide fragments to be detected and measured by LC-MS. According to the unique approaches provided herein, signature peptides are selected based on the degree of conservation among the available protein sequences, such that the selected signature peptide can be used to quantify all or as many protein isoforms as possible among the identified protein sequences found in the public sequence databases. As a result, quantitation of selected signature peptides can not only measure Gly m 6 precursor itself, but also measure potential allergens which are highly homologous to Gly m 6 precursor.
Soybean seed samples are ground to a fine powder, defatted twice with hexane, and extracted with suitable assay buffer (for example 5 M urea, 2 M thiourea, 50 mM Tris (pH 8.0), 65 mM DTT). The samples are sonicated in buffer to extract proteins. The extracted proteins are diluted, denatured, and then proteolytically digested by adding trypsin protease and incubating at 37°C for 15-20 hours. The digestion reactions are acidified with formic acid (pH = 1-2) and are analyzed using LC-MS/MS.
The selected signature peptides can be used for both qualitative and quantitative analysis of Gly m 6 precursor, either by itself or in combination with additional proteins in a multiplexing assay format. In this example, three signature peptides are selected from all peptide possibilities (SEQ ID NO: 6 ADFYNPK, SEQ ID NO: 59 GALQCKPGCPETFEEPQEQSNR and SEQ ID NO: 60 LQSPDDER), and representative quantitation of these signature peptides is shown in FIGs. 10B and 10C. Synthetic peptides can directly serve as an analytical reference standard for protein quantitation.
Sequence Listing
SEQ ID NO: 1 Exemplary signature peptide for Gly m 1
SYPSNATCPR
SEQ ID NO: 2 Exemplary signature peptide for Gly m 3
YMVIQGEPGAVIR
SEQ ID NO: 3 Exemplary signature peptide for Gly m 5 (beta-conglycinin)
NILEASYDTK
SEQ ID NO: 4 Exemplary signature peptide for Gly m 6 (Glycinin) G2
VTAPAMR
SEQ ID NO: 5 Exemplary signature peptide for Gly m 6 (Glycinin) G3
NNNPFSFLVPPK
SEQ ID NO: 6 Exemplary signature peptide for Gly m 6 (Glycinin) precursor
ADFYNPK
SEQ ID NO: 7 Exemplary signature peptide for Kunitz trypsin inhibitor 1
SEQ ID NO: 8 Exemplary signature peptide for Kunitz trypsin inhibitor 3
GIGTIISSPYR
SEQ ID NO: 9 Exemplary signature peptide for Gly m Bd 28 K
NKPQFLAGAASLLR
SEQ ID NO: 10 Exemplary signature peptide for Gly m Bd 30 K
GVITQVK
SEQ ID NO: 11 Exemplary signature peptide for Gly m 8 (2S albumin)
IMENQSEELEEK
SEQ ID NO: 12 Gly m 1 ABA54898.1 [MW= 12482.64 Da]
MGSKVVASVALLLSINILFISMVSSSSHYDPQPQPSHVTALITRPSCPDLSICLNILGG SLGTVDDCCALIGGLGDIEAIVCLCIQLRALGILNLNRNLQLILNSCGRSYPSNATCP RT
SEQ ID NO: 13 Gly m 3 CAA11755.1 [MW= 14100.07 Da]
MSWQAYVDDHLLCGIEGNHLTHAAIIGQDGSVWLQSTDFPQFKPEEITAIMNDFNE
PGSLAPTGLYLGGTKYMVIQGEPGAVIRGKKGPGGVTVKKTGAALIIGIYDEPMTP
GQCN VVERLGDYLIDQGY
SEQ ID NO: 14 Gly m 4 P26987 [MW= 16771.81 Da]
MGVFTFEDEINSPVAPATLYKALVTDADNVIPKALDSFKSVENVEGNGGPGTIKKI TFLEDGETKFVLHKIESIDEANLGYSYSVVGGAALPDTAEKITFDSKLVAGPNGGS AGKLTVKYETKGDAEPNQDELKTGKAKADALFKAIEAYLLAHPDY
SEQ ID NO: 15 Gly m 5 (beta-conglycinin) 121281 [MW= 70293.13 Da]
MMRARFPLLLLGLVFLASVSVSFGIAYWEKENPKHNKCLQSCNSERDSYRNQACH ARCNLLKVEKEECEEGEIPRPRPRPQHPEREPQQPGEKEEDEDEQPRPIPFPRPQPRQ EEEHEQREEQEWPRKEEKRGEKGSEEEDEDEDEEQDERQFPFPRPPHQKEERNEEE DEDEEQQRESEESEDSELRRHKNKNPFLFGSNRFETLFKNQYGRIRVLQRFNQRSP QLQNLRDYRILEFNSKPNTLLLPNHADADYLIVILNGTAILSLVNNDDRDSYRLQSG DALRVPSGTTYYVVNPDN ENLRLITLAIPVNKPGRFESFFLSSTEAQQSYLQGFSR NILEASYDTKFEEINKVLFSREEGQQQGEQRLQESVIVEISKEQIRALSKRAKSSSRK TISSEDKPFNLRSRDPIYSNKLGKFFEITPEKNPQLRDLDIFLSIVDMNEGALLLPHFN SKAIVILVINEGDANIELVGLKEQQQEQQQEEQPLEVRKYRAELSEQDIFVIPAGYP VVVNATSNLNFFAIG1NAEN QRNFLAGSQDNVISQIPSQVQELAFPGSAQAVEKL LKNQRESYFVDAQPKKKEEGNKGRKGPLSSILRAFY
SEQ ID NO: 16 Gly m 6 Glycinin G1 121276 [MW= 55706.34 Da]
MAKLVFSLCFLLFSGCCFAFSSREQPQQNECQIQKLNALKPDNRIESEGGLIETWNP NNKPFQCAGVALSRCTLNRNALRRPSYTNGPQEIYIQQGKGIFGMIYPGCPSTFEEP QQPQQRGQSSRPQDRHQKIYNFREGDLIAVPTGVAWWMYNNEDTPVVAVSIIDTN SLENQLDQMPRRFYLAGNQEQEFLKYQQEQGGHQSQKGKHQQEEENEGGSILSGF TLEFLEHAFSVDKQIAKNLQGENEGEDKGAIVTVKGGLSVIKPPTDEQQQRPQEEE EEEEDEKPQCKGKDKHCQRPRGSQSKSRRNGIDETICTMRLRHNIGQTSSPDIYNPQ AGSVTTATSLDFPALSWLRLSAEFGSLRKNAMFVPHYNLNANSIIYALNGRALIQV VNCNGERVFDGELQEGRVLIVPQNFVVAARSQSDNFEYVSFKTNDTPMIGTLAGA NSLLNALPEEVIQHTFNLKSQQARQIKNNNPFKFLVPPQESQKRAVA
SEQ ID NO: 17 Gly m 6 Glycinin G2 121277 [MW= 54390.76 Da]
MAKLVLSLCFLLFSGCFALREQAQQNECQIQKLNALKPDNRIESEGGFIETWNPNN KPFQCAGVALSRCTLNRNALRRPSYTNGPQEIYIQQGNGIFGMIFPGCPSTYQEPQE SQQRGRSQRPQDRHQKVHRFREGDLIAVPTGVAWWMYNNEDTPVVAVSIIDTNS LENQLDQMPRRFYLAGNQEQEFLKYQQQQQGGSQSQKGKQQEEENEGSNILSGF APEFLKEAFGVNMQIVRNLQGENEEEDSGAIVTVKGGLRVTAPAMRKPQQEEDDD DEEEQPQCVETDKGCQRQSKRSRNGIDETICTMRLRQNIGQNSSPDIYNPQAGSITT ATSLDFPALWLLKLSAQYGSLRKNAMFVPHYTLNANSIIYALNGRALVQVVNCNG ERVFDGELQEGGVLIVPQNFAVAAKSQSDNFEYVSFKTNDRPSIGNLAGANSLLNA LPEEVIQHTFNLKSQQARQVKNNNPFSFLVPPQESQRRAVA
SEQ ID NO: 18 Gly m 6 Glycinin G3 121278 [MW= 54241.73 Da]
MAKLVLSLCFLLFSGCCFAFSFREQPQQNECQIQRLNALKPDNRIESEGGFIETWNP NNKPFQCAGVALSRCTLNRNALRRPSYTNAPQEIYIQQGSGIFGMIFPGCPSTFEEP QQKGQSSRPQDRHQKIYHFREGDLIAVPTGFAYWMYNNEDTPVVAVSLIDTNSFQ NQLDQMPRRFYLAGNQEQEFLQYQPQKQQGGTQSQ GKRQQEEENEGGSILSGF APEFLEHAFVVDRQIVRKLQGENEEEEKGAIVTVKGGLSVISPPTEEQQQRPEEEEK PDCDEKDKHCQSQSRNGIDETICTMRLRHNIGQTSSPDIFNPQAGSITTATSLDFPAL SWL LSAQFGSLRKNAMFVPHYNLNANSIIYALNGRALVQVVNCNGERVFDGEL QEGQVLIVPQNFAVAARSQSDNFEYVSFKTNDRPSIGNLAGANSLLNALPEEVIQQ TFNLRRQQARQV NNNPFSFLVPPKESQRRVVA
SEQ ID NO: 19 Gly m 6 Glycinin G4 121279 [MW= 63587.16 Da]
MGKPFTL SLS SLCLLLLS S ACF AI S S SKLNEC QLN LN ALEPDHRVE SEGGLIQTWN SQHPELKCAGVTVSKLTLNRNGLHSPSYSPYPRMIIIAQGKGALGVAIPGCPETFEE PQEQSNRRGSRSQKQQLQDSHQKIRHFNEGDVLVIPPSVPYWTY TGDEPVVAISL LDTSNFNNQLDQTPRVFYLAGNPDIEYPETMQQQQQQKSHGGRKQGQHQQEEEE EGGSVLSGF SKHFLAQSFNTNEDI AEKLE SPDDERKQI VTVEGGLS VI SPK WQEQQ DEDEDEDEDDEDEQIPSHPPRRPSHGKREQDEDEDEDEDKPRPSRPSQGKRNKTGQ DEDEDEDEDQPRKSREWRSKKTQPRRPRQEEPRERGCETRNGVEENICTLKLHENI ARPSRADFYNPKAGRISTLNSLTLPALRQFQLSAQYVVLYK GIYSPHWNLNANSV
IYVTRGQGKVRVVNCQGNAVFDGELRRGQLLVVPQNFVVAEQAGEQGFEYIVFK THHNAVTSYLKDVFRAIPSEVLAHSY LRQSQVSELKYEGNWGPLV PESQQGSP RVKVA
SEQ ID NO: 20 Gly m 6 Glycinin precursor 75221455 [MW= 63876.47 Da]
MGKPFTLSLS SLCLLLLS S ACF AI S SSKLNECQLNNLNALEPDHRVEFEGGLIQTWN SQHPELKCAGVTVSKLTLNRNGLHLPSYSPYPRMIIIAQGKGALQCKPGCPETFEEP QEQ SNRRG SRS QKQQLQDSHQKIRHFNEGDVL VIPPG VP Y WTYNTGDEP V V AI SLL DTSNFNNQLDQTPRVFYLAGNPDIEYPETMQQQQQQKSHGGRKQGQHQQEEEEE GGSVLSGFSKHFLAQSFNTNEDIAEKLQSPDDERKQIVTVEGGLSVISPKWQEQQD EDEDEDEDDEDEQIPSHPPRRPSHGKREQDEDEDEDEDKPRPSRPSQGKREQDQDQ DEDEDEDEDQPRKSREWRSKKTQPRRPRQEEPRERGCETRNGVEENICTLKLHENI ARPSRADFYNPKAGRISTLNSLTLPALRQFQLSAQYVVLYKNGIYSPHWNLNANSV IYVTRGQGKVRVVNCQGNAVFDGELRRGQLLVVPQNFVVAEQAGEQGFEYIVFK THHNAVTSYLKDVFRAIPSEVLAHSYNLRQSQVSELKYEGNWGPLVNPESQQGSP RVKVA
SEQ ID NO: 21 Gly m Bd 28K 12697782 [MW= 52944.36 Da]
MGNKTTLLLLLFVLCHGVATTTMAFHDDEGGDKKSPKSLFLMSNSTRVFKTDAG EMRVLKSHGGRIFYRHMHIGFISMEPKSLFVPQYLDSNLIIF1RRGEAKLGFIYDDEL AERRLKTGDLYMIPSGSAFYLVNIGEGQRLHVICSIDPSTSLGLETFQSFYIGGGANS HSVLSGFEPAILETAFNESRTVVEEIFSKELDGPIMFVDDSHAPSLWTKFLQLKKDD KEQQLKKMMQDQEEDEEEKQTSRSWRKLLETVFGKVNEKIENKDTAGSPASYNL YDDKKADFKNAYGWSKALHGGEYPPLSEPDIGVLLVKLSAGSMLAPHVNPISDEY TIVLSGYGELHIGYPNGSRAMKTKIKQGDVFVVPRYFPFCQVASRDGPLEFFGFSTS ARKNKPQFLAGAASLLRTLMGPELSAAFGVSEDTLRRAVDAQHEAVILPSAWAAP PENAGKLKMEEEPNAIRSFANDVVMDVF
SEQ ID NO: 22 Gly m Bd 30K 84371705 [MW= 42757.81 Da]
MGFLVLLLFSLLGLSSSSSISTHRS1LDLDLTKFTTQKQVSSLFQLWKSEHGRVYHN HEEEAKRLEIFKNNLNYIRDMNANRKSPHSHRLGLNKFAD1TPQEFSKKYLQAPKD VSQQIKMANKKMKKEQYSCDHPPASWDWRKKGVITQVKYQGGCGSGWAFSATG AIEAAHAIATGDLVSLSEQELVDCVEESEGCYNGWHYQSFEWVLEHGGIATDDDY PYRAKEGRCKANKIQDKVTIDGYETLIMSDESTESETEQAFLSAILEQP1SVSIDAKD FHLYTGGIYDGENCTSPYGINHFVLLVGYGSADGVDYWIAKNSWGEDWGEDGYI WIQRNTGNLLGVCGMNYFASYPTKEESETLVSARVKGHRRVDHSPL
SEQ ID NO: 23 KTI 1 125722 [MW= 22545.94 Da]
MKSTIFFALFLVCAFTISYLPSATAQFVLDTDDDPLQNGGTYYMLPVMRGKGGGIE VDSTGKEICPLTVVQSPNELDKGIGLVFTSPLHALFIAERYPLSIKFGSFAVITLCAG MPTEWAIVEREGLQAVKLAARDTVDGWFNIERVSREYNDYKLVFCPQQAEDNKC EDIGIQIDDDGIRRLVLSKNKPLVVQFQKFRSSTA
SEQ ID NO: 24 KTI 3 125020 [MW= 24005.29 Da]
MKSTIFFLFLFCAFTTSYLPSAIADFVLDNEGNPLENGGTYYILSDITAFGGIRAAPT GNERCPLTVVQSRNELDKGIGTIISSPYRIRFIAEGHPLSLKFDSFAVIMLCVGIPTEW SVVEDLPEGPAVKIGENKDAMDGWFRLERVSDDEFNNYKLVFCPQQAEDDKCGD IGISIDHDDGTRRLVVSKNKPLVVQFQKLDKESLAKKNHGLSRSE
SEQ ID NO: 25 Gly m 8 (2S albumin) NP_001238443 [MW= 18459.97 Da]
MTKFTILLISLLFCIAHTCSASKWQHQQDSCRKQLQGVNLTPCEKHIMEKIQGRGD DDDDDDDDNHILRTMRGRINYIRRNEGKDEDEEEEGHMQKCCTEMSELRSPKCQC KALQKIMENQSEELEEKQKKKMEKELINLATMCRFGPMIQCDLS SDD
SEQ ID NO: 26 Lectin ADC94422 [MW= 30186.22 Da]
MATSNFSIVLSLSLAFFLVLLTKANSTNTVSFTVSKFSPRQQNLIFQGDAAISPSGVL RLTKVDSIDVPTTGSLGRALYATPIQIWDSETGKVASWATSFKFKVFSPNKTADGL AFFLAPVGSKPQSKGGFLGLFNSDSKNKSVQTVAVEFDTYY AKWDPANRHIGID VNSIKSVKTASWGLANGQIAQILITYDADTSLLVASLIHPSRKTSYILSETVSLKSNL PEWVNIGFSATTGLNKGFVETHDVFSWSFASKLSDGSTSDTLDLPSFLLNEAI
SEQ ID NO: 27 Lipoxygenase CAA39604 [MW= 96817.14 Da]
MFGIFDKGQKIKGTVVLMPKNVLDFNAITSIGKGGVIDTATGILGQGVSLVGGVID TATSFLGRNISMQLISATQTDGSGNGKVGKEVYLEKHLPTLPTLGARQDAFSIFFE WDASFGIPGAFYIKNFMTDEFFLVSVKLEDIPNHGTIEFVCNSWVYNFRSYKKNRIF FVNDTYLPSATPAPLLKYRKEELEVLRGDGTGKRKDFDRIYDYDVYNDLGNPDGG DPRPILGGSSIYPYPRRVRTGRERTRTDPNSEKPGEVYVPRDENFGHLKSSDFLTYG IKSLSHDVIPLFKSAIFQLRVTSSEFESFEDVRSLYEGGIKLPTDILSQISPLPALKEIFR TDGENVLQFPPPHVAKVSKSGWMTDEEFAREVIAGVNPNVIRRLQEFPPKSTLDPT LYGDQTSTITKEQLEINMGGVTVEEALSTQRLFILDYQDAFIPYLTRINSLPTAKAY ATRTILFLKDDGTLKPLAIELSKPHPDGDNLGPESIVVLPATEGVDSTIWLLAKAHV IVNDSGYHQLVSHWLNTHAVMEPFAIAT RHLSVLHPIYKLLYPHYRDTININGLA RQSLINADGIIEKSFLPGKYSIEMSSSVYKNWVFTDQALPADLVKRGLAIEDPSAPH GLRLVIEDYPYAVDGLEIWDAIKTWVHEYVSLYYPTDAAVQQDTELQAWWKEA VEKGHGDLKEKPWWPKMQTTEDLIQSCSIIVWTASALHAAVNFGQYPYGGLILNR PTLARRFIPAEGTPEYDEMVKNPQKAYLRTITPKFETLIDLSVIEILSRHASDEIYLGE RETPNWTTD KALEAF RFGSKLTGIEGK1NARNSDPSLRNRTGPVQLPYTLLHRS SEEG LTFKGIPN SI SI
SEQ ID NO: 28 Gly m 1 consensus sequence
MGSKVVASVALLLSINILFISMVSSSSHYDPPQPSYVTALITRPSCPDLSICLNILGGS LGTVDDCCALIGGLGDIEAIVCLCIQLRALGILNLNRNLQLILNACGRSYPSNATCP RT
SEQ ID NO: 29 Gly m 1 consensus sequence #2
MGSKVVASVALLLSINILFISMVSSSSHYDPQPQPSHVTALITRPSCPDLSICLNILGG SLGTVDDCCALIGGLGDIEAIVCLCIQLRALGILNLNRNLQLILNACGRSYPSNATCP RT
SEQ ID NO: 30 Gly m 1 Ping AAB34755.1 42aa
ALITRPSXPDLSIXLNILGGSLGTVDDXXALIGGLXDXXAIV
SEQ ID NO: 31 Gly m 1 Ping ABA54897.1 134aa
MGSKVVASVALLLSINILFISMVSSSSHYDPPPPPCYVPAPLTPPPSLSPPPSLSPPPPS GPSCPDLSVCLNILDGSPADDCC ALIADLVDLEASVCLCIQLRVLGIVNLDLNLQLI LNACGPSYPSNATCPRT
SEQ ID NO: 32 Gly m 1 Eric ABA54899 118aa
GSKVVASVALLLSINILFISMVSSSSHYDPQPQPSHVTALITRPSCPDLSICLNILGGS LGTVDDCCALIGGLGDIEAIVCLCIQLRALGILNLNRNLQLILNSCGRSYPSNATCPR T
SEQ ID NO: 33 Gly m 1 Glyma15g13740.1 BLAST 120aa
MGSKVVAYVALLLSINILFISMVSSSSHYDPQPQPSYVTALITRPSCPDLSVCLNILG GYLGTVDDCCALIGGLGDIEATVCLCIQLRALGILNLNRNLQLILNACGPSYPSNAT CPRT
SEQ ID NO: 34 Gly m 1 Glyma15g13770.1 BLAST 130aa
MGSKVVASVALLLSINILFISMVSSSSHYDPPPPPCYVPAPFTPPPPSLSPPPPSGPSCP DLSVCLNILDGSPADDCCALIADLVDLEASVCLCIQLRVLGIVNLDLNLQLILNACG PSYPSNATCPRT
SEQ ID NO: 35 Gly m 1 Glyma15g13750.1 BLAST 120aa
MGSKVVASVALLLSINILFISMVSSSSHYDPPPQPSYVTALITRPSCPDLSICLNILGG SLGTVDDCCALIGGLGDIEAIVCLCIQLRALGILNLNRNLQLILNSCGRSYPSNATCP RT
SEQ ID NO: 36 ALGILNLNR
SEQ ID NO: 37 NLQLILNSCGR
SEQ ID NO: 38 Gly m 3 consensus sequence
MSWQAYVDDHLLCGIEGNHLTHAAIIGQDGSVWAQSTDFPQFKPEEITAIMNDFN EPGSLAPTGLYLGGTKYMVIQGEPGAVIRGKKGPGGVTVKKTGAALIIGIYDEPMT PGQCNMVVERLGDYLIDQGY
SEQ ID NO: 39 Gly m 3 Ping ABU97472.1 131aa
MSWQAYVDDHLLCEIEGNHLTHAAIIGQDGSVWAQSTNFPQFKPEEITAINNDFNE PGSLAPTGLYIGGTKYMVIQGEPGAVIRGKKGPGGVTVKKTGAALIIGIYDEPMTP GQCNMVVERLGDYLIDQGL
SEQ ID NO: 40 Gly m 3 Ping 065809.1 131aa
MSWQAYVDDHLLCDIEGNHLTHAAIIGQDGSVWAQSTDFPQFKPEEITAIMNDFN
EPGSLAPTGLYLGGTKYMVIQGEPGAVIRGKKGPGGVTVKKTGAALIIGIYDEPMT
PGQCNMVVERPGDYLIDQGY
SEQ ID NO: 41 GPGGVTVK
SEQ ID NO: 42 Gly m 4 consensus sequence
MGVFTFEDETTSPVAPATLYKALVTDADNVIPKAVDAFKSVENVEGNGGPGTIKKI TFVEDGETKFVLHKIEAIDEANLGYSYSVVGGAGLPDTVEKITFEAKLAAGANGGS AGKLTVKYQTKGDAQPNQDELKSGKAKADALFKAVEAYLLANPDYN
SEQ ID NO: 43 Gly m 4 Glyma07g37240.1 BLAST 165aa
MGVFTFEDEINSPVAPATLYKALVTDADNVIPKALDSFKSVENVEGNGGPGTIKKI TFLEADVNEWIDGETKFVLHKIESIDEANLGYSYSVVGGAALPDTAEKITFDSKLV AGPNGGSAGKLTVKYETKGDAEPNQDELKTGKAKADALFKAIEAYLLAHPDYN
SEQ ID NO: 44 Gly m 4 Glyma17g03365.1 BLAST 159aa
MGIFTFEDEITSPVAPATLYKALVTDADNIIPKALDSFKSVENVEGNGGPGTIKKITF VEDGETKFVLHKIEAVDEANLGYSYSVVGGAALPDTAEKITFHSKLAAGPNGGSA GKLTVEYQTKGDAQPNQDQLKTGKAKADALFKAIEAYLLANPDYN
SEQ ID NO: 45 Gly m 4 Glyma07g37240.3 BLAST 147aa
MGVFTFEDEINSPVAPATLYKALDSFKSVENVEGNGGPGTIKKITFLEDGETKFVLH KIESIDEANLGYSYSVVGGAALPDTAEKITFDSKLVAGPNGGSAGKLTVKYETKGD AEPNQDELKTGKAKADALFKAIEAYLLAHPDYN
SEQ ID NO: 46 Gly m 4 Glyma07g37270.2 BLAST 159aa
MGVFTFEDEINSPVAPATLYKALVTDADNVIPKALDSFKSVENVEGNGGPGTIKKI TFLEDGETKFVLHKIEAIDEANLGYSYSVVGGDGLPDTVEKITFECKLAAGANGGS AGKLTVKYQTKGDAQPNQDDLKIGKAKSDALFKAVEAYLLAHPDYN
SEQ ID NO: 47 Gly m 4 Glyma07g37270.1 BLAST 159aa
MGVFTFEDETTSPVAPATLYKALVTDADNVIPKAVDAFRSVENVEGNGGPGTIKKI TFLEDGETKFVLHKIEAIDEANLGYSYSVVGGDGLPDTVEKITFECKLAAGANGGS AGKLTVKYQTKGDAQPNQDDLKIGKAKSDALFKAVEAYLLAHPDYN
SEQ ID NO: 48 Gly m 4 Glyma17g03360.1 BLAST 159aa
MGVFTFEDETTSPVAPATLYKALVTDADNVIPKAVDAFRSVENLEGNGGPGTIKKI TFVEDGESKFVLHKIESVDEANLGYSYSVVGGVGLPDTVEKITFECKLAAGANGGS AGKLTVKYQTKGDAQPNPDDLKIGKVKSDALF AVEAYLLANPHYN
SEQ ID NO: 49 Gly m 4 Glyma17g03350.1 BLAST 159aa
MGIFTFEDETTSPVAPATLYKALVTDADNVIPKAVEAFRSVENLEGNGGPGTIKKIT FVEDGES FVLHKIESVDEANLGYSYSVVGGVGLPDTVEKITFECKLAAGANGGS AGKLTVKYQTKGDAQPNPDDLKIGKVKSDALFKAVEAYLLANPHYN
SEQ ID NO: 50 Gly m 4 Glyma09g04510.1 BLAST 159aa
MGVFTFEDETTSTVAPARLYKALVKDADNLVPKAVEAIKSVEIVEGNGGPGTIKKL
TFVEDGQTKYVLHKVEAIDEANWGYNYSVVGGVGLPDTVEKISFEAKLVADPNG
GSIAKITVKYQTKGDANPSEEELKSGKAKGDALFKALEGYVLANPDYN
SEQ ID NO: 51 Gly m 4 Glyma15g15590.1 BLAST 159aa
MGVFTFEDETTSTVAPARLYKALVKDADNLVPKAVEAIKSVEIVEGNGGPGTIKKL
TFVEDGQTKYVLHKVEAIDEANWGYNYSVVGGVGLPDTVEKISFEAKLVEGASG
GSIAKITVKYQTKGDVNPSEEELKSGKAKGDALFKALEGYVLANPDYN
SEQ ID NO: 52 MGVFTFEDEINSPVAPATLYK
SEQ ID NO: 53 ALDSFK
SEQ ID NO: 54 SVENVEGNGGPGTIK
SEQ ID NO: 55 ITFLEDGETK
SEQ ID NO: 56 FVLHK
SEQ ID NO: 57 AIEAYLLAHPDYN
SEQ ID NO: 61 Gly m 5 consensus sequence
MMRARFPLLLLGVVFLASVSVSFGIAYWEKQNPSHNKCLQSCNSEKDSYRNQACH ARCNLLKVEEEEECEEGQIPRPRPQHPERERQQHGEKEEDEGEQPRPFPFPRPRQPH QEEEHEQKEEHEWHRKEEKHGGKGSEEEQDEREHPRPHQPHQKEEEKHEWQHKQ EKHQGKESEEEEEDQDEDEEQDKESQESEGSESQREPRRHKNKNPFHFNSKRFQTL FKNQYGHVRVLQRFNKRSQQLQNLRDYRILEFNSKPNTLLLPHHADADYLIVILNG TAILTLVNNDDRDSYNLQSGDALRVPAGTTYYVVNPDNDENLRMITLAIPVNKPG RFESFFLSSTQAQQSYLQGFSKNILEASYDTKFEEINKVLFGREEGQQQGEERLQES VIVEISKKQIRELSKRAKSSSRKTISSEDKPFNLRSRDPIYSNKLGKLFEITPEKNPQL RDLDVFLSVVDMNEGALLLPHFNSKAIVVLVINEGDANIELVGIKEQQQRQQQEEQ PLEVRKYRAELSEQDIFVIPAGYPVVVNATSNLNFFAFGINAENNQRNFLAGSKDN VISQIPSQVQELAFPGSAKDIENLIKSQSESYFVDAQPQQKEEGNKGRKGPLSSILRA FY
SEQ ID NO: 62 Gly m 5 ABH09130 BLAST 600aa
MFGIVYWEKQNPSHNKCLRSCNSEKDSYRNQACHARCNLLKVEEEEECEEGQIPR
PRPQHPERERQQHGEKEEDEGEQPRPFPFPRPRQPHQEEEHEQKEEHEWHRKEEKH
GGKGSEEEQDEREHPRPHQPHQKEEEKHEWQHKQEKHQGKESEEEEEDQDEDEG QDKESQESEGSESQREPRRHKNKNPFHFNSKRFQTLFKNQYGHVRVLQRFNKRSQ QLQNLRDYRILEFNSKPNTLLSPNHADADYLIVILNGTAILTLVNNDDRDSYNLQS GDALRVPAGTTYYVVNPDNDENLRMITLAIPVNKPGRFESFFLSSTQAQQSYLQGF SKNILEASYDTKFEEINKVLFGREEGQQQGEERLQESVIVEISKKQIRELSKHAKPSS R TISSEDKPFNLRSRDPIYSNKLGKLFEITPEKNPQLRDLDVFLSVVDMNEGALFL PHFNSKAIVVLV1NEGEANIELVGIKEQQQRQQQEEQPLEVRKYRAELSEQDIFVIP AGYPVVVNATSDLNFFAFGINAENNQRNFLAGSKDNVISQIPSQVQELAFPGSAKD IENLIKSQSESYFVDAQPQQKEEGNKGRKGPLSSILRAFY
SEQ ID NO: 63 Gly m 5 BAA 74452 BLAST 559aa VEEEEECEEGQIPRPRPQHPERERQQHGEKEEDEGEQPRPFPFPRPRQPHQEEEHEQ KEEHEWHRKEEKHGGKGSEEEQDEREHPRPHQPHQKEEEKHEWQHKQEKHQGK ESEEEEEDQDEDEEQDKESQESEGSESQREPRRHKNKNPFHFNSKRFQTLFKNQYG HVRVLQRFNKRSQQLQNLRDYR1LEFNSKPNTLLLPHHADADYLIVILNGTAILTLV NNDDRDSY LQSGDALRVPAGTTYYVV PDNDENLRMITLAIPVNKPGRFESFFLS STQ AQQ S YLQGF SKNILE A S YDTKFEEI KVLFGREEGQQQGEERLQES VI VEI SKK QIRELSKHAKSSSRKTISSEDKPFNLRSRDPIYSNKLGKLFEITPEKNPQLRDLDVFLS VVDMNEGALFLPHFNSKAIVVLVINEGEANIELVGIKEQQQRQQQEEQPLEVRKYR AELSEQDIFVIPAGYPVVVNATSDLNFFAFGINAEN QRNFLAGSKDNVISQIPSQV QELAFPGSAKDIENLIKSQSESYFVDAQPQQKEEGNKGRKGPLSSILRAFY
SEQ ID NO: 64 Gly m 5 BAC78524 BLAST 621aa
MMRARFPLLLLGVVFLASVSVSFGIAYWEKQNPSHNKCLRSCNSEKDSYRNQACH ARCNLLKVEEEEECEEGQIPRPRPQFIPERERQQHGEKEEDEGEQPRPFPFPRPRQPR QEEEHEQKEEHEWHRKEEKHGGKGSEEEQDEREHPRPHQPHQKEEEKHEWQHKQ EKHQGKESEEEEEDQDEDEEQDKESQESEGSESQREPRRFIK KNPFFIFNSKRFQTL FKNQYGHVRVLQRFNKRSQQLQNLRDYRILEFNSKPNTLLLPHHADADYLIVILNG TAILTLVNNDDRDSYNLQSGDALRVPAGTTYYVVNPDNDENLRM1TLAIPVNKPG RFESFFLSSTQAQQSYLQGFSKNILEASYDTKFEEINKVLFGREEGQQQGEERLQES VIVEISKKQIRELSKHAKSSSRKTISSEDKPFNLRSRDPIYSNKLGKLFEITPEKNPQL RDLDVFLSVVDMNEGALFLPHFNSKAIVVLVINEGEANIELVGIKEQQQRQQQEEQ PLEVRKYRAELSEQDIFVIPAGYPVVVNATSDLNFFAFGINAENNQRNFLAGSKDN VISQIPSQVQELAFPGSAKDIENLIKSQSESYFVDAQPQQKEEGNKGRKGPLSSILRA FY
SEQ ID NO: 65 Gly m 5 Glyma20g28460.1 BLAST 440aa
MMRVRFPLLVLLGTVFLASVCVSLKVREDENNPFYLRSSNSFQTLFENQNGRIRLL QRFNKRSPQLENLRDYRIVQFQSKPNTILLPHHADADFLLFVLSGRAILTLVNNDDR DSYNLHPGDAQRIPAGTTYYLVNPHDHQNLKIIKLAIPVNKPGRYDDFFLSSTQAQ QSYLQGFSHNILETSFHSEFEEINRVLFGEEEEQRQQEGVIVELSKEQIRQLSRRAKS SSRKTISSEDEPFNLRSRNPIYSNNFGKFFEITPEKNPQLRDLDIFLSSVDINEGALLLP HFNSKAIVILVINEGDANIELVGIKEQQQKQKQEEEPLEVQRYRAELSEDDVFVIPA AYPFVVNATSNLNFLAFGINAENNQRNFLAGEKDNVVRQIERQVQELAFPGSAQD VERLLKKQRESYFVDAQPQQKEEGSKGRKGPFPSILGALY
SEQ ID NO: 66 Gly m 5 Ping AAB23463 439aa
MMRVRFPLLVLLGTVFLASVCVSLKVREDENNPFYFRSSNSFQTLFENQNVR1RLL QRFNKRSPQLENLRDYRIVQFQSKPNTILLPHHADADFLLFVLSGRAILTLVNNDDR DSYNLHPGDAQRIPAGTTYYLVNPHDHQNLKIIKLAIPVNKPGRYDDFFLSSTQAQ QSYLQGFSHNILETSFHSEFEEINRVLFGEEEEQRQQEGVIVELSKEQIRQLSRRAKS SSRKTISSEDEPFNLRSRNPtYSNNFGKFFEITPEKNPQLRDLDIFLSSVDINEGALLLP HFNSKAIVILVINEGDANIELVGIKEQQQKQKQEEEPLEVQRYRAELSEDDVFVIPA AYPFVVNATSNLNFLAFGINAENNQRNFLAGEKDNVVRQIERQVQELAFPGSAQD VERLLKKQRESYFVDAQPQQKEEGSKGRKGPFPS1LGALY
SEQ ID NO: 67 Gly m 5 Glyma20g28650.1 BLAST 606aa
MMRARFPLLLLGLVFLASVSVSFGIAYWEKENP HNKCLQSCNSERDSYRNQACH ARCNLLKVEKEECEEGEIPRPRPRPQHPEREPQQPGEKEEDEDEQPRP1PFPRPQPRQ EEEHEQREEQEWPRKEEKRGEKGSEEEDEDEDEEQDERQFPFPRPPHQ EERKQEE DEDEEQQRESEESEDSELRRHKNKNPFLFGSNRFETLFKNQYGRIRVLQRFNQRSP QLQNLRDYRJLEFNSKPNTLLLPNHADADYL1VILNGTAILSLVNNDDRDSYRLQSG DALRVPSGTTYYVVNPDN ENLRLITLAIPV KPGRFESFFLSSTEAQQSYLQGFSR
NILEASYDTKFEEINKVLFSREEGQQQGEQRLQESVIVEISKEQIRALSKRAKSSSRK TISSEDKPFNLRSRDPIYSNKLGKFFEITPEKNPQLRDLDIFLSIVDMNEGALLLPHFN SKAIVILVINEGDANIELVGLKEQQQEQQQEEQPLEVRKYRAELSEQDIFVIPAGYP VVV ATSNLNFFAIGINAEN QR FLAGSQDNVISQIPSQVQELAFPGSAQAVEKL LKNQRESYFVDAQPKKKEEGNKGRKGPLSSILRAFY
SEQ ID NO: 68 Gly m 5 Glyma20g28660.1 BLAST 606aa
MMRARFPLLLLGLVFLASVSVSFGIAYWEKENPKHNKCLQSCNSERDSYRNQACH ARCNLLKVEKEECEEGEIPRPRPRPQHPEREPQQPGEKEEDEDEQPRPIPFPRPQPRQ EEEHEQREEQEWPRKEEKRGEKGSEEEDEDEDEEQDERQFPFPRPPHQKEERKQEE DEDEEQQRESEESEDSELRRHKNKNPFLFGSNRFETLFKNQYGRIRVLQRFNQRSP QLQNLRDYRILEFNSKPNTLLLPNHADADYLIVILNGTAILSLVNNDDRDSYRLQSG DALRVPSGTTYYVVNPDN ENLRLITLAIPVNKPGRFESFFLSSTEAQQSYLQGFSR NILEA S YDTKFEEINKVLF SREEGQQQGEQRLQES VI VEI SKEQIRAL SKRAKS S SRK TISSEDKPFNLRSRDPIYSNKLGKFFEITPEKNPQLRDLDIFLSIVDMNEGALLLPHFN SKAIVILVINEGDANIELVGLKEQQQEQQQEEQPLEVRKYRAELSEQDIFVIPAGYP VVVNATSNLNFFAIGINAENNQRNFLAGSQDNVISQIPSQVQELAFPGSAQAVEKL LKNQRESYFVDAQPKKKEEGNKGRKGPLSSILRAFY
SEQ ID NO: 69 Gly m 5 Glyma20g28650.2 BLAST 543aa
MMRARFPLLLLGLVFLASVSVSFGIAYWEKENPKHNKCLQSCNSERDSYRNQACH ARCNLLKVEKEECEEGEIPRPRPRPQHPEREPQQPGEKEEDEDEQPRPIPFPRPQPRQ EEEHEQREEQEWPRKEEKRGEKGSEEEDEDEDEEQDERQFPFPRPPHQKEERKQEE DEDEEQQRESEESEDSELRRHKNKNPFLFGSNRFETLFKNQYGRIRVLQRFNQRSP QLQNLRDYRILEFNSKPNTLLLPNHADADYLIVILNGTAILSLVNNDDRDSYRLQSG DALRVPSGTTYYVVNPDNNENLRLITLAIPVNKPGRFESFFLSSTEAQQSYLQGFSR NILEASYDTKFEEINKVLFSREEGQQQGEQRLQESVIVEISKEQIRALSKRAKSSSRK TISSEDKPFNLRSRDPIYSNKLGKFFEITPEKNPQLRDLDIFLSIVDMNEGALLLPHFN SKAIVILVINEGDANIELVGLKEQQQEQQQEEQPLEVRKYRAELSEQDIFVIPAGYP VVVNATSNLNFFAIGINAENNQRNFLAGI
SEQ ID NO: 70 Gly m 5 Ping AAA33947 218aa
SKRAKSSSRKTISSEDKPFNLGSRDPIYSKKLGKFFEITPEKNPQLRDLDIFLSIVDMN EGALLLPHFNSKAIVILVINEGDANIELVGLKEQQQEQQQEEQPLEVRKYRAELSEQ DIFVIPAGYPVVVNATSNLNFFAIGINAENNQRNFLAGSQDNVISQIPSQVQELAFPG SAQAVEKLLKNQRESYFVDAQPNEKEEGNKGRKGPLSSILRAFY
SEQ ID NO: 71 Gly m 5 NP_001237316 BLAST 621aa
MMRARFPLLLLGVVFLASVSVSFGIAYWEKQNPSHNKCLRSCNSEKDSYRNQACH ARCNLLKVEEEEECEEGQIPRPRPQHPERERQQHGEKEEDEGEQPRPFPFPRPRQPR QEGEHEQKEEHEWHRKEE HGGKGSEEEQDGREHPRPHQPHQKEEEKHEWQHK QEKHQGKESEEEEEDQDEDEEQDKESQESEGSESQREPRRH NKNPFHFNSKRFQT LFKNQYGHVRVLQRFN RSQQLQNLRDYRILEFNSKPNTLLLPHHADADYLIVILN GTAILTLVNNDDRDSYNLQSGDALRVPAGTTYYVVNPDNDENLRMITLAIPVNKP GRFESFFLSSTQAQQSYLQGFSKNILEASYDTKFEEINKVLFGREEGQQQGEERLQE SVIVEISKKQIRELSKRAKSSSRKTISSEDKPFNLRSRDPIYSNKLG LFEITPEKNPQ LRDLDVFLSVVDMNEGALFLPHFNSKAIVVLVINEGEANIELVGIKEQQQRQQQEE QPLEVRKYRAELSEQDIFVIPAGYPVVVNATSDLNFFAFGINAENNQR FLAGSKD NVISQIPSQVQELAFLGSAKDIENLIKSQSESYFVDAQPQQKEEGNKGRKGPLSSILR AFY
SEQ ID NO: 72 Gly m 5 BAE02726 BLAST 621aa
MMRARFPLLLLGVVFLASVSVSFGIAYWEKQNPSHNKCLRSCNSEKDSYRNQACH
ARCNLLKVEEEEECEEGQIPRPRPQHPERERQQHGEKEEDEGEQPRPFPFPRPRQPH
QEEEHEQKEEHEWHRKEEKHGGKGSEEEQDEREHPRPHQPHQKEEEKHEWQHKQ
EKHQGKESEEEEEDQDEDEEQDKESQESEGSESQREPRRHKNKNPFHFNSKRFQTL
FKNQYGHVRVLQRFNKRSQQLQNLRDYRILEFNSKPNTLLLPHHADADYLIVILNG
TAILTLVNNDDRDSYNLQSGDALRVPAGTTYYVVNPDNDENLRMITLAIPVNKPG
RFESFFLSSTQAQQSYLQGFSKNILEASYDTKFEEINKVLFGREEGQQQGEERLQES
VIVEISKKQIRELSKHAKSSSRKTISSEDKPFNLRSRDPIYSNKLGKLFEITPEKNPQL
RDLDVFLSVVDMNEGALFLPHFNSKAIVVLVINEGEANIELVGIKEQQQRQQQEEQ
PLEVRKYRAELSEQDIFVIPAGYPVVVNATSDLNFFAFGINAENNQRNFLAGSKDN
VISQIPSQVQELAFPGSAKDIENLIKSQSESYFVDAQPQQKEEGNKGRKGPLSSILRA
FY
SEQ ID NO: 73 Gly m 5 Ping AAB01374 639aa
MMRARFPLLLLGVVFLASVSVSFGIAYWEKQNPSHNKCLRSCNSEKDSYRNQACH
ARCNLLKVEEEEECEEGQIPRPRPQHPERERQQHGEKEEDEGEQPRPFPFPRPRQPH
QEEEHEQKEEHEWHRKEEKHGGKGSEEEQDEREHPRPHQPHQKEEEKHEWQHKQ
EKHQGKESEEEEEDQDEDEEQDKESQESEGSESQREPRRHKNKNPFHFNSKRFQTL
FKNQYGFTVRVLQRFNKRSQQLQNLRDYRILEFNSKPNTLLLPHFLADADYLIVILNG
TAILTLVNNDDRDSYNLQSGDALRVPAGTTFYWNPDNDENLRMIAGTTFYVVNP
DNDENLRMITLAIPVNKPGRFESFFLSSTQAQQSYLQGFSKNILEASYDTKFEEINK
VLFGREEGQQQGEERLQESVIVEISKKQIRELSKHAKSSSRKTISSEDKPFNLGSRDP
IYSNKLGKLFEITQRNPQLRDLDVFLSVVDMNEGALFLPHFNSKAIVVLVINEGEA
NIELVGIKEQQQRQQQEEQPLEVRKYRAELSEQDIFVIPAGYPVMVNATSDLNFFA
FGINAEN QRNFLAGSKDNVISQIPSQVQELAFPRSAKDIENLIKSQSESYFVDAQP
QQKEEGNKGRKGPLSSILRAFY
SEQ ID NO: 74 Gly m 5 Glyma10g39150.1 BLAST 622aa
MMRARFPLLLLGVVFLASVSVSFGIAYWEKQNPSHNKCLRSCNSEKDSYRNQACH
ARCNLLKVEEEEECEEGQIPRPRPQHPERERQQHGEKEEDEGEQPRPFPFPRPRQPH
QEEEHEQKEEHEWHRKEEKHGGKGSEEEQDEREHPRPHQPHQKEEEKHEWQHKQ
EKHQGKESEEEEEDQDEDEEQDKESQESEGSESQREPRRHKNKNPFHFNSKRFQTL
FKNQYGHVRVLQRFNKRSQQLQNLRDYRILEFNSKPNTLLLPHHADADYLIVILNG
TAILTLVNNDDRDSYNLQSGDALRVPAGTTYYVVNPDNDENLRMITLAIPVNKPG
RFESFFLSSTQAQQSYLQGFSKNILEASYDTKFEEINKVLFGREEGQQQGEERLQES
VIVEISKKQIRELSKHAKSSSRKTISSEDKPFNLRSRDPIYSNKLGKLFEITPEKNPQL
RDLDVFLSVVDMNEGALFLPHFNSKAIVVLVINEGEANIELVGIKEQQQRQQQEEQ
PLEVRKYRAELSEQDIFVIPAGYPVVVNATSDLNFFAFGINAENNQRNFLAGSKDN
VISQIPSQVQELAFPGSAKDIENLIKSQSESYFVDAQPQQKEEGNKGRKGPLSSILRA
FY
SEQ ID NO: 75 CNLLK
SEQ ID NO: 76 EEDEDEQPRPIPFPRPQPR
SEQ ID NO: 77 EEQEWPR
SEQ ID NO: 78 QFPFPRPPHQK
SEQ ID NO: 79 ESEESEDSELR
SEQ ID NO: 80 NPFLFGSNR
SEQ ID NO: 81 FETLFK
SEQ ID NO: 82 SPQLQNLR
SEQ ID NO: 83 LQSGDALR
SEQ ID NO: 84 VPSGTTYYVVNPDNNENLR
SEQ ID NO: 85 FESFFLSSTEAQQSYLQGFSR
SEQ ID NO: 86 FEEINK
SEQ ID NO: 87 VLFSR
SEQ ID NO: 88 TISSEDKPFNLR
SEQ ID NO: 89 DPIYSNK
SEQ ID NO: 90 FFEITPEK
SEQ ID NO: 91 AIVILVINEGDANIELVGLK
SEQ ID NO: 92 EQQQEQQQEEQPLEVR
SEQ ID NO: 93 NFLAGSQDNVISQIPSQVQELAFPGSAQAVEK
SEQ ID NO: 94 ESYFVDAQPK
SEQ ID NO: 95 Gly m 6 G1 Ping CAA26723.1 495aa
MAKLVFSLCFLLFSGCCFAFSSREQPQQNECQIQKLNALKPGNRIESEGGLIETWNP NNKPFQCAGVALSRCTLNRNALRRPSYTNGPQEIYIQQGKGIFGMIYPGCSSTFEEP QQPQQRGQSSRPQDRHQ IYNSREGDLIAVPTGVAWW YNNEDTPVVAVSIIDTN SLENQLDQMPRRFYLAGNQEQEFLKYQQEQGGHQSQKGKHQQEEENEGGSILSGF TLEFLEHAFSVDKQIAKNLQGENEGEDKGAIVTVKGGLSVIKPPTDEQQQRPQEEE EEEEDEKPQCKGKDKHCQRPRGSQSKSRRNGIDETICTMRLRHNIGQTSSPDIYNPQ AGSVTTATSLDFPALSWLRLSAGFGSLR NAMFVPHYNLNANSIIYALNGRALIQV VNCNGERVFDGELQEGRVLIVPQNFVVAARSQSDNFEYVSFKTNDTPMIGTLAGA NSLLNALPEEVIQHTFNLKSQQARQIKNN PF FLVPPQESQKRAVA
SEQ ID NO: 96 LVFSLCFLLFSGCCFAFSSR
SEQ ID NO: 97 EQPQQNECQIQK
SEQ ID NO: 98 RPSYTNGPQEIYIQQGK
SEQ ID NO: 99 HQQEEENEGGSILSGFTLEFLEHAFSVDK
SEQ ID NO: 100 HCQRPR
SEQ ID NO: 101 HNIGQTSSPDIYNPQAGSVTTATSLDFPALSWLR
SEQ ID NO: 102 ALIQVVNCNGER
SEQ ID NO: 103 VFDGELQEGR
SEQ ID NO: 104 TNDTPMIGTLAGANSLLNALPEEVIQHTFNLK
SEQ ID NO: 105 NNNPFK
SEQ ID NO: 106 FLVPPQESQK
SEQ ID NO: 107 Gly m 6 G2 Ping CAA26575.1 485aa
MAKLVLSLCFLLFSGCFALREQAQQNECQIQKLNALKPGNRIESEGGFIETWNPNN KPFQCAGVALSRCTLNRNALRRPSYTNGPQEIYIQQGNGIFGMIFPGCPSTYQEPQE SQQRGRSQRPQDRHQKVHRFREGDLIAVPTGVAWWMYNNEDTPVVAVSIIDTNS LENQLDQMPRRFYLAGNQEQEFLKYQQQQQGGSQSQKGKQQEEENEGSNILSGF APEFLKEAFGVNMQIVRNLQGENEEEDSGAIVTVKGGLRVTAPAMRKPQQEEDDD DEEEQPQCVETDKGCQRQSKRSRNGIDETICTMRLRQNIGQNSSPDIYNPQAGSITT ATSLDFPALWLLKLSAQYGSLRKNAMFVPHYTLNANSIIYALNGRALVQVVNCNG ERVFDGELQEGGVLIVPQNFAVAAKSQSDNFEYVSFKTNDRPSIGNLAGANSLLNA LPEEVIQHTFNLKSQQARQVKNNNPFSFLVPPQESQRRAVA
SEQ ID NO: 108 LVLSLCFLLFSGCFALR
SEQ ID NO: 109 EQAQQNECQIQK
SEQ ID NO: 110 RPSYTNGPQEIYIQQGNGIFGMIFPGCPSTYQEPQESQQR
SEQ ID NO: 111 SQRPQDR
SEQ ID NO: 112 QQEEENEGSNILSGFAPEFLK
SEQ ID NO: 113 EAFGVNMQIVR
SEQ ID NO: 114 KPQQEEDDDDEEEQPQCVETDK
SEQ ID NO: 115 LSAQYGSLR
SEQ ID NO: 116 NAMFVPHYTLNANSIIYALNGR
SEQ ID NO: 117 ALVQVVNCNGER
SEQ ID NO: 118 VFDGELQEGGVLIVPQNFAVAAK
SEQ ID NO: 119 TNDRPSIGNLAGANSLLNALPEEVIQHTFNLK
SEQ ID NO: 120 NNNPFSFLVPPQESQR
Signature Peptides for Gly m 6 G3
SEQ ID NO: 121 LVLSLCFLLFSGCCFAFSFR
SEQ ID NO: 122 EQPQQNECQIQR
SEQ ID NO: 123 QQEEENEGGSILSGFAPEFLEHAFVVDR
SEQ ID NO: 124 LQGENEEEEK
SEQ ID NO: 125 GGLSVISPPTEEQQQRPEEEEKPDCDEK
SEQ ID NO: 126 HCQSQSR
SEQ ID NO: 127 LSAQFGSLR
SEQ ID NO: 128 VFDGELQEGQVLIVPQNFAVAAR
SEQ ID NO: 129 TNDRPSIGNLAGANSLLNALPEEVIQQTFNLR
SEQ ID NO: 130 Gly m 6 G4 consensus sequence
MGKPFTLSLSSLCLLLLSSACFAISSSKLNECQLNNLNALEPDHRVESEGGLIQTWN SQHPELKCAGVTVSKLTLNRNGLHLPSYSPYPRMIIIAQGKGALGVAIPGCPETFEE PQEQSNRRGSRSQKQQLQDSHQKIRHFNEGDVLVIPPGVPYWTYNTGDEPVVAISL LDTSNFNNQLDQTPRVFYLAGNPDIEYPETMQQQQQQKSHGGRKQGQHQQEEEE EGGSVLSGFSKHFLAQSFNTNEDIAEKLQSPDDERKQIVTVEGGLSVISPKWQEQQ DEDEDEDEDDEDEQIPSHP
SEQ ID NO: 131 Gly m 6 G4 Ping CAA60533.1 563aa
MGKPFTLSLSSLCLLLLSSACFAISSSKLNECQLNNLNALEPDHRVESEGGLIQTWN SQHPELKCAGVTVSKLTLNRNGLHLPSYSPYPRMIIIAQGKGALGVAIPGCPETFEE PQEQSNRRGSRSQKQQLQDSHQKIRHFNEGDVLVIPPGVPYWTYNTGDEPVVAISL LDTSNFNNQLDQTPRVFYLAGNPDIEYPETMQQQQQQKSHGGRKQGQHQQEEEE EGGSVLSGFSKHFLAQSFNTNEDIAEKLQSPDDERKQIVTVEGGLSVISPKWQEQQ DEDEDEDEDDEDEQIPSHPPRRPSHGKREQDEDEDEDEDKPRPSRPSHGKREQDQD QDEDEDEDEDQPRKSREWRSKKTQPRRPRQEEPRERGCETRNGVEENICTLKLHE NIARPSRADFYNP AGRISTLNSLTLPALRQFQLSAQYVVLYKNGIYSPHWNLNAN SVIYVTRGQGKVRVVNCQGNAVFDGELRRGQLLVVPQNFVVAEQAGEQGFEYIV FKTHHNAVTSYLKDVFRAIPSEVLAHSYNLRQSQVSELKYEGNWGPLVNPESQQG SPRVKVA
SEQ ID NO: 132 Gly m 6 G4 UniProt Q9SB11 G4 563aa
MGKPFTLSLSSLCLLLLSSACFAISSSKLNECQLNNLNALEPDHRVESEGGLIQTWN SQHPELKCAGVTVSKLTLNRNGLHLPSYSPYPRMIIIAQGKGALGVAIPGCPETFEE PQEQSNRRGSRSQKQQLQDSHQKIRHFNEGDVLVIPPGVPYWTYNTGDEPVVAISL LDTSNFNNQLDQTPRVFYLAGNPDIEYPETMQQQQQQKSHGGRKQGQHQQEEEE EGGSVLSGFSKHFLAQSFNTNEDIAEKLQSPDDERKQIVTVEGGLSVISPKWQEQQ DEDEDEDEDDEDEQIPSHPPRRPSHGKREQDEDEDEDEDKPRPSRPSQGKREQDQD QDEDEDEDEDQPRKSREWRSKKTQPRRPRQEEPRERGCETRNGVEENICTLKLHE NIARPSRADFYNPKAGRISTLNSLTLPALRQFQLSAQYVVLYKNGIYSPHWNLNAN SVIYVTRGQGKVRVVNCQGNAVFDGELRRGQLLVVPQNFVVAEQAGEQGFEYIV FKTHFINAVTSYLKDVFRAIPSEVLAHSYNLRQSQVSELKYEGNWGPLV PESQQG SPRVKVA
SEQ ID NO: 133 Gly m 6 G4 Ping CAA55977.1 517aa
MGKPFFTLSLSSLCLLLLSSACFAITSSKFNECQLNNLNALEPDHRVESEGGLIETW NSQHPELQCAGVTVSKRTLNRNGLHLPSYSPYPQMIIVVQGKGAIGF AFPGCPETFE KPQQQSSRRGSRSQQQLQDSHQKIRHFNEGDVLVIPPGVPYWTYNTGDEPVVAISL LDTSNF QLDQNPRVFYLAGNPDIEHPETMQQQQQQKSHGGRKQGQHQQQEEE GGSVLSGFSKHFLAQSFNTNEDTAEKLRSPDDERKQIVTVEGGLSVISPKWQEQED EDEDEDEEYEQTPSYPPRRPSHGKHEDDEDEDEEEDQPRPDHPPQRPSRPEQQEPR GRGCQTRNGVEENICTMKLHENIARPSRADFYNPKAGRISTLNSLTLPALRQFGLS AQYVVLYRNGIYSPHWNLNANSVIYVTRGKGRVRVV CQGNAVFDGELRRGQLL VVPQNFVVAEQGGEQGLEYVVFKTHHNAVSSYIKDVFRAIPSEVLSNSYNLGQSQ VRQLKYQGNSGPLVNP
SEQ ID NO: 134 Gly m 6 G4 Ping AAA33964.1 516aa
MGKPFFTLSLSSLCLLLLSSACFAITSSKFNECQLNNLNALEPDHRVESEGGLIETW NSQHPELQC AG VTV SKRTLNRNG SHLP S YLPYPQMII V VQGKG AIGF AFPGCPETFE KPQQQS SRRG SRSQQQLQDSHQKIRHFNEGDVLVIPLG VPY WTYN TGDEPVVAI SP LDTSNFNNQLDQNPRVFYLAGNPDIEHPETMQQQQQQKSHGGRKQGQHRQQEEE GGSVLSGFSKHFLAQSFNTNEDTAEKLRSPDDERKQIVTVEGGLSVISPKWQEQED EDEDEDEE YGRTPS YPPRRPSHGKHEDDEDEDEEEDQPRPDHPPQRPSRPEQQEPR GRGCQTRNGVEENICTMKLHENIARPSRADFYNPKAGRISTLNSLTLPALRQFGLS AQYVVLYRNGIYSPDWNLNANSVTMTRGKGRVRVVNCQGNAVFDGELRRGQLL VVPQNPAVAEQGGEQGLEYVVFKTHHNAVSSY1KDVFRVIPSEVLSNSYNLGQSQ VRQLKYQGN SGPLVNP
SEQ ID NO: 135 LNECQLNNLNALEPDHR
SEQ ID NO: 136 CAGVTVSK
SEQ ID NO: 137 LTLNR
SEQ ID NO: 138 MIIIAQGK
SEQ ID NO: 139 GALGVAIPGCPETFEEPQEQSNR
SEQ ID NO: 140 QQLQDSHQK
SEQ ID NO: 141 VFYLAGNPDIEYPETMQQQQQQK
SEQ ID NO: 142 QGQHQQEEEEEGGSVLSGFSK
SEQ ID NO: 143 HFLAQSFNTNEDIAEK
SEQ ID NO: 144 QIVTVEGGLSVISPK
SEQ ID NO: 145 WQEQQDEDEDEDEDDEDEQIPSHPPR
SEQ ID NO: 146 RPSHGK
SEQ ID NO: 147 EQDEDEDEDEDKPRPSRPSQGK
SEQ ID NO: 148 QEEPR
SEQ ID NO: 149 NGVEENICTLK
SEQ ID NO: 150 LHENIARPSR
SEQ ID NO: 151 ISTLNSLTLPALR
SEQ ID NO: 152 QFQLSAQYVVLYK
SEQ ID NO: 153 NGIYSPHWNLNANSVIYVTR
SEQ ID NO: 154 DVFR
SEQ ID NO: 155 AIPSEVLAHSYNLR
SEQ ID NO: 156 QSQVSELK
SEQ ID NO: 58 YEGNWGPLVNPESQQGSPR
Signature Peptides for Gly m 6 precursor
SEQ ID NO: 59 GALQCKPGCPETFEEPQEQSNR
SEQ ID NO: 60 LQSPDDER
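
The signature peptides listed above are the species measured by mass spectrometry in the claims. As a rough sketch only (not the applicants' procedure), the theoretical m/z of an unmodified peptide can be estimated from standard monoisotopic residue masses. The function names `peptide_mass` and `peptide_mz` and the chosen charge states are illustrative assumptions, and modifications such as cysteine alkylation are deliberately ignored.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one H2O is added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge carrier in positive-ion mode


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide (hypothetical helper)."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


def peptide_mz(sequence, charge=2):
    """Theoretical m/z of the [M + zH]z+ ion (hypothetical helper)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    # Two signature peptides taken from the listing above.
    for label, seq in [("SEQ ID NO: 80", "NPFLFGSNR"),
                       ("SEQ ID NO: 103", "VFDGELQEGR")]:
        print(label, seq, round(peptide_mz(seq, charge=2), 4))
```

Because leucine and isoleucine share the same residue mass, peptides that differ only by an L/I swap cannot be told apart by precursor mass alone.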

Claims

CLAIMS We claim:
1. A method of selecting candidate signature peptide for quantitation of known allergen and potential allergens from a plant-based sample, comprising:
(a) identifying potential allergens based on homology to at least one known allergen protein sequence;
(b) performing sequence alignment of the at least one known allergen and potential allergens identified in step (a);
(c) selecting a consensus sequence or representative sequence based on the sequence alignment;
(d) determining a plural of candidate signature peptides based on conservative regions or domains from the sequence alignment and in silico digestion data of the consensus sequence or representative sequence selected in Step (c); and
(e) quantitating the amount of the at least one known allergen and potential allergens in the plant-based sample based on measurements of the signature peptides.
2. The method of claim 1, wherein the quantitating step uses a column chromatography and mass spectrometry.
3. The method of claim 1, wherein the quantitating step comprises measuring the plural of candidate signature peptides using high resolution accurate mass spectrometry (HRAM MS).
4. The method of claim 1, wherein the quantitating step comprises calculating corresponding peak heights or peak areas of the candidate signature peptides from mass spectrometry.
5. The method of claim 1, wherein the quantitating step comprises comparing data from high fragmentation mode and low fragmentation mode from mass spectrometry.
6. The method of claim 1, wherein the at least one known allergen comprises Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6 (Glycinin) G1, Gly m 6 (Glycinin) G2, Gly m 6 (Glycinin) G3, Gly m 6 (Glycinin) G4, or Gly m 6 (Glycinin) precursor.
7. The method of claim 1, wherein the potential allergens comprise at least one sequence selected from the group consisting of:
(a) SEQ ID NOs 12 and 30-35 for Gly m 1;
(b) SEQ ID NOs 13 and 39-40 for Gly m 3;
(c) SEQ ID NOs 14 and 43-51 for Gly m 4;
(d) SEQ ID NOs 15 and 62-74 for Gly m 5;
(e) SEQ ID NOs 16 and 95 for Gly m 6 G1;
(f) SEQ ID NOs 17 and 107 for Gly m 6 G2;
(g) SEQ ID NO: 18 for Gly m 6 G3;
(h) SEQ ID NOs 19 and 131-134 for Gly m 6 G4; and
(i) SEQ ID NO: 20 for Gly m 6 precursor.
8. The method of claim 1, wherein the candidate signature peptides comprise at least one sequence selected from the group consisting of:
(a) SEQ ID NOs: 1, 36, and 37 for Gly m 1;
(b) SEQ ID NOs: 2 and 41 for Gly m 3;
(c) SEQ ID NOs: 52-57 for Gly m 4;
(d) SEQ ID NOs: 3 and 75-94 for Gly m 5;
(e) SEQ ID NOs: 96-106 for Gly m 6 G1;
(f) SEQ ID NOs: 4 and 108-120 for Gly m 6 G2;
(g) SEQ ID NOs: 5 and 121-129 for Gly m 6 G3;
(h) SEQ ID NOs: 135-156 and 58 for Gly m 6 G4; and
(i) SEQ ID NOs: 6, 59, and 60 for Gly m 6 precursor.
9. The method of claim 1, wherein the plant-based sample comprises a soybean seed or part of a soybean seed.
10. A system for quantitating one or more protein of interest with known amino acid sequence in a plant-based sample, the system comprising:
(a) a high-throughput means for extracting proteins from a plant-based sample;
(b) a process module for digesting extracted proteins with at least one protease;
(c) a separation module for separating peptides in a single step;
(d) a selection module for selecting a plural of signature peptides for at least one known allergen and potential allergens; and
(e) a mass spectrometry for measuring the plural of signature peptides.
11. The system of claim 10, wherein the separation module comprises a column chromatography.
12. The system of claim 11, wherein the column chromatography comprises a liquid column chromatography.
13. The system of claim 10, wherein the mass spectrometry comprises a high resolution accurate mass spectrometry (HRAM MS).
14. The system of claim 10, wherein the selection module uses a method according to claim 1.
15. The system of claim 10, wherein the at least one known allergen comprises Gly m 1, Gly m 3, Gly m 4, Gly m 5 (beta-conglycinin), Gly m 6 (Glycinin) G1, Gly m 6 (Glycinin) G2, Gly m 6 (Glycinin) G3, Gly m 6 (Glycinin) G4, or Gly m 6 (Glycinin) precursor.
16. The system of claim 10, wherein the potential allergens comprise at least one sequence selected from the group consisting of:
(a) SEQ ID NOs 12 and 30-35 for Gly m 1;
(b) SEQ ID NOs 13 and 39-40 for Gly m 3;
(c) SEQ ID NOs 14 and 43-51 for Gly m 4;
(d) SEQ ID NOs 15 and 62-74 for Gly m 5;
(e) SEQ ID NOs 16 and 95 for Gly m 6 G1;
(f) SEQ ID NOs: 17 and 107 for Gly m 6 G2;
(g) SEQ ID NO: 18 for Gly m 6 G3;
(h) SEQ ID NOs: 19 and 131-134 for Gly m 6 G4; and
(i) SEQ ID NO: 20 for Gly m 6 precursor.
17. The system of claim 10, wherein the signature peptides comprise at least one sequence selected from the group consisting of:
(a) SEQ ID NOs: 1, 36, and 37 for Gly m 1;
(b) SEQ ID NOs: 2 and 41 for Gly m 3;
(c) SEQ ID NOs: 52-57 for Gly m 4;
(d) SEQ ID NOs: 3 and 75-94 for Gly m 5;
(e) SEQ ID NOs: 96-106 for Gly m 6 G1;
(f) SEQ ID NOs: 4 and 108-120 for Gly m 6 G2;
(g) SEQ ID NOs: 5 and 121-129 for Gly m 6 G3;
(h) SEQ ID NOs: 135-156 and 58 for Gly m 6 G4; and
(i) SEQ ID NOs: 6, 59, and 60 for Gly m 6 precursor.
18. The system of claim 10, wherein the plant-based sample comprises a soybean seed or part of a soybean seed.
19. A high-throughput method of quantitating at least one allergen with known amino acid sequence and homologous potential allergens in a plant-based sample, comprising using the system of claim 10.
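
Steps (c) and (d) of claim 1 combine sequence alignment with in silico digestion. The following is a minimal sketch under stated assumptions, not the applicants' implementation. It assumes trypsin as the protease (cleavage after K or R, but not when the next residue is P), a rule that is consistent with the signature peptides in the listing, and it approximates "conserved regions" by keeping only peptides that occur in the digest of every homolog rather than by performing a real alignment. The function names and the `min_length` cutoff are hypothetical; the two fragments are excerpts of SEQ ID NO: 68 and SEQ ID NO: 74.

```python
def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if residue in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def conserved_peptides(homologs, min_length=6):
    """Peptides present in the digest of every homolog (hypothetical helper).

    homologs: dict mapping a label to an amino acid sequence.
    min_length: assumed cutoff that drops very short, non-specific peptides.
    """
    digests = [set(tryptic_digest(seq)) for seq in homologs.values()]
    if not digests:
        return []
    shared = set.intersection(*digests)
    return sorted(p for p in shared if len(p) >= min_length)


if __name__ == "__main__":
    fragments = {
        # Excerpt of SEQ ID NO: 68 (Glyma20g28660.1)
        "SEQ ID NO: 68": "TISSEDKPFNLRSRDPIYSNKLGKFFEITPEKNPQLR",
        # Excerpt of SEQ ID NO: 74 (Glyma10g39150.1)
        "SEQ ID NO: 74": "TISSEDKPFNLRSRDPIYSNKLGKLFEITPEKNPQLR",
    }
    print(conserved_peptides(fragments))
    # → ['DPIYSNK', 'TISSEDKPFNLR']
```

In this toy comparison the two shared peptides correspond to SEQ ID NO: 89 and SEQ ID NO: 88, while FFEITPEK (SEQ ID NO: 90) is dropped because the second fragment carries LFEITPEK instead.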
PCT/US2015/044710 2014-08-11 2015-08-11 Methods and systems for selective quantitation and detection of allergens WO2016025516A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201580054874.8A CN106796242A (en) 2014-08-11 2015-08-11 For the quantitative method and system with detection of selectivity of allergen
BR112017002622A BR112017002622A2 (en) 2014-08-11 2015-08-11 methods and systems for selective quantification and allergen detection
EP15831719.8A EP3180622A4 (en) 2014-08-11 2015-08-11 Methods and systems for selective quantitation and detection of allergens
AU2015301806A AU2015301806A1 (en) 2014-08-11 2015-08-11 Methods and systems for selective quantitation and detection of allergens
CA2958063A CA2958063A1 (en) 2014-08-11 2015-08-11 Methods and systems for selective quantitation and detection of allergens

Applications Claiming Priority (18)

Application Number Priority Date Filing Date Title
US201462035876P 2014-08-11 2014-08-11
US201462035944P 2014-08-11 2014-08-11
US201462035744P 2014-08-11 2014-08-11
US201462035920P 2014-08-11 2014-08-11
US201462035768P 2014-08-11 2014-08-11
US201462035731P 2014-08-11 2014-08-11
US201462035858P 2014-08-11 2014-08-11
US201462035800P 2014-08-11 2014-08-11
US62/035,731 2014-08-11
US62/035,920 2014-08-11
US62/035,768 2014-08-11
US62/035,800 2014-08-11
US62/035,858 2014-08-11
US62/035,876 2014-08-11
US62/035,944 2014-08-11
US62/035,744 2014-08-11
US201462036926P 2014-08-13 2014-08-13
US62/036,926 2014-08-13

Publications (1)

Publication Number Publication Date
WO2016025516A1 true WO2016025516A1 (en) 2016-02-18

Family

ID=55304559

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/044710 WO2016025516A1 (en) 2014-08-11 2015-08-11 Methods and systems for selective quantitation and detection of allergens

Country Status (6)

Country Link
EP (1) EP3180622A4 (en)
CN (1) CN106796242A (en)
AU (1) AU2015301806A1 (en)
BR (1) BR112017002622A2 (en)
CA (1) CA2958063A1 (en)
WO (1) WO2016025516A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018140370A1 (en) * 2017-01-25 2018-08-02 Dow Agrosciences Llc Methods and systems for selective quantitation and detection of allergens including gly m 7

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110862996B (en) * 2019-12-23 2020-12-25 华中农业大学 Application of isolated soybean gene in improving soybean cyst nematode resistance
CN111187343B (en) * 2020-03-16 2022-02-25 西北大学 Peony 2S albumin and extraction method and application thereof
CN113429474B (en) * 2021-07-07 2022-08-05 天津中医药大学 Method for identifying adulteration of vegetable protein meat sample based on characteristic peptide fragment label
CN114990181B (en) * 2022-05-13 2023-08-22 中食都庆(山东)生物技术有限公司 Anti-aging soybean peptide and preparation method and application thereof
CN116491631B (en) * 2023-05-26 2024-06-25 青岛农业大学 Braised pork with prefabricated dish and preparation method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110294700A1 (en) * 2010-06-01 2011-12-01 Thelen Jay J High-throughput quantitation of crop seed proteins
US20140206027A1 (en) * 2005-09-15 2014-07-24 Alk-Abelló A/S Method for quantification of allergens

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2347277T3 (en) * 2005-09-15 2010-10-27 Alk-Abello A/S PROCEDURE FOR QUANTIFICATION OF ALLERGENS.
AU2015275086B2 (en) * 2014-06-10 2018-03-15 Dow Agrosciences Llc Quantitative analysis of transgenic proteins

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HOUSTON ET AL.: "Quantitation of soybean allergens using tandem mass spectrometry", JOURNAL OF PROTEOME RESEARCH, vol. 10, no. 2, 2010, pages 763 - 773, XP009153905 *
JULKA ET AL.: "Quantification of Gly m 4 protein, a major soybean allergen, by two-dimensional liquid chromatography with ultraviolet and mass spectrometry detection", ANALYTICAL CHEMISTRY, vol. 84, no. 22, 2012, pages 10019 - 10030, XP055399229 *
See also references of EP3180622A4 *
SEPPALA ET AL.: "Absolute quantification of allergens from complex mixtures : a new sensitive tool for standardization of allergen extracts for specific immunotherapy", JOURNAL OF PROTEOME RESEARCH, vol. 10, no. 4, 2011, pages 2113 - 2122, XP055204232, DOI: doi:10.1021/pr101150z *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018140370A1 (en) * 2017-01-25 2018-08-02 Dow Agrosciences Llc Methods and systems for selective quantitation and detection of allergens including gly m 7
US11808771B2 (en) 2017-01-25 2023-11-07 Corteva Agriscience Llc Methods and systems for selective quantitation and detection of allergens including Gly m 7

Also Published As

Publication number Publication date
CA2958063A1 (en) 2016-02-18
EP3180622A4 (en) 2018-02-28
CN106796242A (en) 2017-05-31
AU2015301806A1 (en) 2017-03-02
BR112017002622A2 (en) 2018-02-20
EP3180622A1 (en) 2017-06-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15831719; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2958063; Country of ref document: CA)
NENP Non-entry into the national phase (Ref country code: DE)
REG Reference to national code (Ref country code: BR; Ref legal event code: B01A; Ref document number: 112017002622; Country of ref document: BR)
ENP Entry into the national phase (Ref document number: 2015301806; Country of ref document: AU; Date of ref document: 20150811; Kind code of ref document: A)
REEP Request for entry into the european phase (Ref document number: 2015831719; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2015831719; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 112017002622; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20170209)