WO2018174863A1 - Methods and composition for detecting early stage colon cancer with rna-seq expression profiling - Google Patents
- Publication number
- WO2018174863A1 (international application PCT/US2017/023478)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- colon cancer
- reagents
- sample
- target analytes
- biomarker
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57407—Specifically defined cancers
- G01N33/57419—Specifically defined cancers of colon
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Definitions
- the present invention relates to expression profiling to differentiate early stage colon cancer patients from normal subjects.
- Colon cancer is the third most common cancer worldwide.
- One of the most important prognostic factors of colon cancer is the stage at diagnosis, with a 5-year relative survival rate greater than 90% for patients diagnosed at early stages.
- colon cancer often develops through a step-wise adenoma-carcinoma sequence; thus most patients could be cured if the disease were detected and resected at a precancerous or early stage. Therefore, early detection of colon cancer and of precancerous lesions is one of the main prerequisites for successful treatment and reduction of mortality from this disease.
- Serum RNAs and proteins found to correlate with tumor status and/or patient survival are increasingly being applied as diagnostic and prognostic indicators in various carcinomas.
- RNA-seq technology provides a revolutionary tool for transcriptome analysis. Compared with microarray platforms, RNA-seq has less of the background noise that microarrays incur from image analysis, and it is more sensitive in detecting low-abundance transcripts and transcripts with higher fold changes in expression. In this invention, we use RNA-seq to find biomarkers for the early detection of colon cancer.
- methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all of the target molecules selected from Table 3, or any sub-combinations thereof, in a sample from a subject.
- methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all early stage colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention.
- biomarkers are selected from Table 3, or any sub-combinations thereof.
- a method comprises detecting the level of one or more biomarkers in a sample from a subject.
- a method of monitoring colon cancer (e.g., response to treatment, likelihood of mortality, etc.) in a subject comprises forming a biomarker panel having 50 biomarker proteins from colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NR
- N is 1 to 50. In some embodiments, N is 2 to 50. In some embodiments, methods comprise panels of any combination of the colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1, or any sub-combinations thereof), in addition to any other colon cancer biomarkers.
- methods comprise comparing the level of the biomarker(s) to a reference value/range or a threshold. In some embodiments, deviation of the biomarker level(s) from the reference value/range, or exceeding or failing to meet the threshold, is indicative of a diagnosis, prognosis, etc. for the subject.
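A minimal sketch of the comparison step, assuming each biomarker has a single closed reference interval in the same units as the measurement. The class and function names, the example intervals, and the measured values are hypothetical; the text does not say how reference values or ranges are derived.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReferenceRange:
    """Closed reference interval [low, high] for one biomarker (hypothetical values)."""
    low: float
    high: float


def classify_level(level: float, ref: ReferenceRange) -> str:
    """Return 'below', 'within' or 'above' relative to the reference interval."""
    if level < ref.low:
        return "below"
    if level > ref.high:
        return "above"
    return "within"


def deviating_biomarkers(levels: Dict[str, float],
                         ranges: Dict[str, ReferenceRange]) -> Dict[str, str]:
    """Map each biomarker whose level lies outside its interval to the direction of deviation."""
    calls = {name: classify_level(value, ranges[name]) for name, value in levels.items()}
    return {name: call for name, call in calls.items() if call != "within"}


# Hypothetical reference intervals and measurements, for illustration only.
ranges = {"MMP7": ReferenceRange(0.0, 5.0), "CDH3": ReferenceRange(0.0, 3.0),
          "SFRP1": ReferenceRange(2.0, 10.0)}
levels = {"MMP7": 8.2, "CDH3": 1.4, "SFRP1": 0.9}
print(deviating_biomarkers(levels, ranges))  # {'MMP7': 'above', 'SFRP1': 'below'}
```

Whether "above" or "below" is clinically meaningful depends on the biomarker: some markers are elevated and others reduced in colon cancer, so the direction is reported rather than a bare pass/fail.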
- each biomarker may be a protein biomarker.
- the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected.
- each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected.
- each biomarker capture reagent may be an antibody or an aptamer.
- a biomarker is a RNA transcript.
- the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected.
- each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected.
- each biomarker capture reagent may be a nucleic acid probe.
- the sample may be a biological sample (e.g., tissue, fluid (e.g., blood, urine, saliva, etc.), etc.).
- the sample is filtered, concentrated (e.g., 2-fold, 5-fold, 10-fold, 20-fold, 50-fold, 100-fold, or more), diluted, or un-manipulated.
- methods further comprise treating the subject for colon cancer.
- treating the subject for colon cancer comprises a treatment regimen of one or more of chemotherapy, radiation, surgery, etc.
- biomarkers described herein are monitored before, during, and/or after treatment.
- methods comprise providing palliative treatment (e.g., symptom relief) to a subject suffering from colon cancer, but not providing interventional treatment of the colon cancer.
- palliative care is pursued in place of treatment for colon cancer.
- palliative care is provided in addition to treatment for colon cancer.
- a method comprises detecting the level of one or more colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTO
- the method further comprises measuring the level of one or more of the biomarkers at a second time point.
- colon cancer severity is improving (e.g., declining) if the levels of said biomarkers have improved at the second time point relative to the first time point.
- biomarkers or panels thereof provide a prognosis regarding the future course of a colon cancer in a subject (e.g., likelihood of survival, likelihood of mortality, likelihood of response to therapy, etc.).
- treatment decisions (e.g., whether to treat, surgery, radiation, chemotherapy, etc.).
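"Improved" depends on the direction of each marker: a fall is an improvement for a marker elevated in colon cancer, and a rise is an improvement for a marker reduced in colon cancer. A minimal sketch of the two-time-point comparison under that reading; the tolerance parameter, function names, and the per-marker directions in the example are hypothetical assumptions, not taken from the source.

```python
from typing import Dict


def trend(first: float, second: float, elevated_in_cancer: bool, tol: float = 0.0) -> str:
    """Classify the change in one biomarker between two time points.

    For a marker elevated in cancer a fall counts as improvement; for a marker
    reduced in cancer a rise counts as improvement. Changes no larger than
    `tol` are treated as stable.
    """
    delta = second - first
    if abs(delta) <= tol:
        return "stable"
    toward_normal = delta < 0 if elevated_in_cancer else delta > 0
    return "improving" if toward_normal else "worsening"


def panel_trend(first: Dict[str, float], second: Dict[str, float],
                direction: Dict[str, bool], tol: float = 0.0) -> Dict[str, str]:
    """Apply `trend` to every biomarker measured at both time points (sorted by name)."""
    return {name: trend(first[name], second[name], direction[name], tol)
            for name in sorted(first.keys() & second.keys())}


first = {"MMP7": 9.0, "SFRP1": 1.0}
second = {"MMP7": 4.0, "SFRP1": 3.5}
direction = {"MMP7": True, "SFRP1": False}  # hypothetical directions of change in cancer
print(panel_trend(first, second, direction))  # {'MMP7': 'improving', 'SFRP1': 'improving'}
```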
- experiments conducted during development of embodiments of the present invention (e.g., comprising MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1, or any sub-combinations thereof).
- kits are provided.
- a kit comprises at least one, at least two, at least three, at least four, or at least five capture/detection reagents (e.g., antibody, probe, etc.), wherein each capture/detection reagent specifically binds to a different biomarker (e.g., protein or nucleic acid) selected from the colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, M
- N is 4 to 50. In some embodiments, N is 5 to 50. In some embodiments, at least one of the 51 biomarker proteins is selected from the colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B,
- compositions comprising proteins of a sample from a subject and at least one, at least two, at least three, at least four, or at least five capture/detection reagents that each specifically bind to a different biomarker selected from the colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4,
- FIG. 1: The analysis procedure for the RNA sequencing data. Each step, and the packages used in alignment, quantification, and differential expression (DE) analysis, are described in this figure.
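The packages used in FIG. 1 are not named in this text. As a hedged illustration of what follows quantification, the sketch below normalizes per-gene read counts to counts per million (CPM) and computes a log2 fold change between cases and controls. It is a descriptive summary only, not a statistical DE test; dedicated tools such as DESeq2 or edgeR model count dispersion and are the usual choice. Function names, the pseudocount, and the toy counts are assumptions.

```python
import math
from typing import Dict, List

Counts = Dict[str, int]  # gene -> raw read count for one sample


def cpm(counts: Counts) -> Dict[str, float]:
    """Counts per million: scale each gene's read count by the sample's library size."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(cases: List[Counts], controls: List[Counts],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """Per gene: mean log2(CPM + pseudocount) in cases minus the same in controls."""
    case_cpm = [cpm(s) for s in cases]
    ctrl_cpm = [cpm(s) for s in controls]

    def mean_log(samples: List[Dict[str, float]], gene: str) -> float:
        return sum(math.log2(s[gene] + pseudocount) for s in samples) / len(samples)

    return {gene: mean_log(case_cpm, gene) - mean_log(ctrl_cpm, gene)
            for gene in cases[0]}


# Toy counts (hypothetical): gene "a" is higher in the case sample, gene "b" in the control.
print(log2_fold_change([{"a": 900, "b": 100}], [{"a": 100, "b": 900}]))
```

The pseudocount keeps genes with zero counts from producing log2(0); its value is a convention, not something the source specifies.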
- FIG. 2: Scatterplot of the calculated probabilities of colon cancer using the selected 50-gene panel.
- the model was trained with the Random Forest algorithm; 241 cases and 707 controls (out of 268 cases and 786 controls in total) were randomly selected to train the model.
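The stated split draws about 90% of each group for training (241/268 ≈ 0.899 of cases, 707/786 ≈ 0.899 of controls), leaving 27 cases and 79 controls held out. A minimal sketch of such a per-group random draw using only the standard library; the sample identifiers and seed are hypothetical, and the source does not say how the draw was performed.

```python
import random
from typing import List, Tuple

Labeled = List[Tuple[str, int]]  # (sample id, label) with 1 = case, 0 = control


def split_case_control(cases: List[str], controls: List[str],
                       n_train_cases: int, n_train_controls: int,
                       seed: int = 0) -> Tuple[Labeled, Labeled]:
    """Randomly draw training cases and controls separately; all others are held out."""
    rng = random.Random(seed)
    train_cases = set(rng.sample(cases, n_train_cases))
    train_controls = set(rng.sample(controls, n_train_controls))
    labeled = [(s, 1) for s in cases] + [(s, 0) for s in controls]
    train = [(s, y) for s, y in labeled if s in train_cases or s in train_controls]
    test = [(s, y) for s, y in labeled if s not in train_cases and s not in train_controls]
    return train, test


# Hypothetical sample identifiers matching the cohort sizes stated in the text.
cases = [f"case{i}" for i in range(268)]
controls = [f"ctrl{i}" for i in range(786)]
train, test = split_case_control(cases, controls, 241, 707)
print(len(train), len(test))  # 948 106
```

Fitting the classifier itself is not shown. With scikit-learn, for example, one would pass the 50-gene expression matrix for the training samples to `RandomForestClassifier().fit(X, y)` and obtain per-sample probabilities like those plotted in FIG. 2 from `predict_proba`; the hyperparameters used in the source are not given here.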
- colon cancer biomarkers are provided.
- by a “biomarker” or “marker” it is meant a molecular entity whose representation in a sample is associated with a disease phenotype.
- by colon cancer it is meant any cancerous growth arising from the colon, for example, adenocarcinomas, carcinoid tumors, gastrointestinal stromal tumors, lymphomas, sarcomas, and the like, as known in the art or as described herein.
- by a colon cancer marker it is meant a molecular entity whose representation in a sample is associated with a colon cancer phenotype, e.g., the presence of colon cancer, the stage of colon cancer, a prognosis associated with the colon cancer, the predictability of the colon cancer being responsive to a therapy, etc.
- the marker may be said to be differentially represented in a sample having a colon cancer phenotype.
- Colon cancer biomarkers include proteins that are differentially represented in a colon cancer phenotype and their corresponding genetic sequences, i.e., mRNA, DNA, etc.
- by a “gene” or “recombinant gene” it is meant a nucleic acid comprising an open reading frame that encodes the protein. The boundaries of a coding sequence are determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus.
- a transcription termination sequence may be located 3' to the coding sequence. In addition, a gene may optionally include its natural promoter (i.e., the promoter with which the exons and introns of the gene are operably linked in a non-recombinant cell, i.e., a naturally occurring cell) and associated regulatory sequences, and may or may not have sequences upstream of the AUG start site, and may or may not include untranslated leader sequences, signal sequences, downstream untranslated sequences, transcriptional start and stop sequences, polyadenylation signals, translational start and stop sequences, ribosome binding sites, and the like.
- the terms “gene product” or “expression product” are used herein to refer to the RNA transcription products (transcripts) of the gene, including mRNA, and the polypeptide translation products of such RNA transcripts, i.e., the amino acid product encoded by a gene.
- a gene product can be, for example, a RNA transcript of the gene, e.g. an unspliced RNA, a mRNA, a splice variant mRNA, a microRNA, a fragmented RNA, etc.; or an amino acid product encoded by the gene, including, for example, full length polypeptide, splice variants of the full length polypeptide, post-translationally modified polypeptide, and fragments of the gene product, e.g. peptides, etc.
- an elevated level of marker or marker activity may be associated with the colon cancer phenotype.
- a reduced level of marker or marker activity may be associated with the colon cancer phenotype.
- T is used to categorize the pathology of the tumor: TX: Primary tumor cannot be assessed. T0: No evidence of primary tumor.
- Tis: Carcinoma in situ (intraepithelial or invasion of lamina propria).
- T1: Tumor invades submucosa.
- T2: Tumor invades muscularis propria.
- T3: Tumor invades through the muscularis propria into the pericolorectal tissues.
- T4a: Tumor penetrates to the surface of the visceral peritoneum.
- T4b: Tumor directly invades or is adherent to other organs or structures.
- N describes the pathology of the regional lymph nodes: NX: The regional lymph nodes cannot be evaluated. N0: The cancer has not spread to the regional lymph nodes. N1: Metastasis in 1 to 3 regional lymph nodes. N1c: Tumor deposit(s) in the subserosa, mesentery, or nonperitonealized pericolic or perirectal tissues without regional nodal metastasis. N2: Metastasis in four or more regional lymph nodes. N2a: Metastasis in 4 to 6 regional lymph nodes. N2b: Metastasis in seven or more regional lymph nodes.
- M describes the extent, if any, of metastasis: M0: The disease has not metastasized.
- M1: Cancer has spread to distant organs.
- M1a: Metastasis confined to one organ or site (e.g., lung, liver, ovary, nonregional node).
- M1b: Metastases in more than one organ/site or the peritoneum.
- stage 0, stage I, and stage II.
- Table 1: The TNM classification for staging of colon cancer.
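The T, N and M categories combine into stage groups. A hedged sketch, assuming the AJCC 7th-edition grouping for colon cancer (stage 0 = Tis N0 M0; stage I = T1 or T2, N0, M0; stage II = T3, T4a or T4b, N0, M0; stage III = any T with N1 or N2, M0; stage IV = any M1), with sub-stages collapsed. The function names are hypothetical, and the grouping rule is supplied here rather than taken from the source; "early stage" follows the stage 0, I and II listed above.

```python
def stage_group(t: str, n: str, m: str) -> str:
    """Coarse AJCC (7th edition) stage group for colon cancer from T, N and M categories.

    Sub-stages (IIA-IIC, IIIA-IIIC, IVA-IVB) are collapsed to their main group.
    Unassessable categories (TX, NX) and T0 raise ValueError.
    """
    if m.startswith("M1"):          # M1, M1a, M1b: distant metastasis
        return "IV"
    if m != "M0":
        raise ValueError(f"cannot stage with M category {m!r}")
    if n.startswith(("N1", "N2")):  # regional nodal involvement or tumor deposits (N1c)
        return "III"
    if n != "N0":
        raise ValueError(f"cannot stage with N category {n!r}")
    if t == "Tis":
        return "0"
    if t in ("T1", "T2"):
        return "I"
    if t in ("T3", "T4a", "T4b"):
        return "II"
    raise ValueError(f"cannot stage with T category {t!r}")


def is_early_stage(t: str, n: str, m: str) -> bool:
    """Early stage taken as stage 0, I or II."""
    return stage_group(t, n, m) in {"0", "I", "II"}


print(stage_group("T4a", "N0", "M0"))  # II
```

Note that the categories use the digit zero (N0, M0), not the letter O.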
- a biomarker level is detected using a capture reagent.
- the capture reagent contains a feature that is reactive with a secondary feature on a solid support. In these embodiments, the capture reagent is exposed to the biomarker in solution, and then the feature on the capture reagent is used in conjunction with the secondary feature on the solid support to immobilize the biomarker on the solid support.
- The capture reagent is selected based on the type of analysis to be conducted.
- Capture reagents include but are not limited to aptamers, antibodies, other antibody mimetics and other protein scaffolds, autoantibodies, chimeras, small molecules, F(ab')2 fragments, single chain antibody fragments, Fv fragments, single chain Fv fragments, nucleic acids, lectins, ligand-binding receptors, affibodies, nanobodies, imprinted polymers, avimers, peptidomimetics, hormone receptors, cytokine receptors, and synthetic receptors, and modifications and fragments of these.
- biomarker presence or level is detected using a
- the biomarker presence or level is derived from the biomarker/capture reagent complex and is detected indirectly, such as, for example, as a result of a reaction that is subsequent to the biomarker/capture reagent interaction, but is dependent on the formation of the biomarker/capture reagent complex.
- biomarker presence or level is detected directly from the biomarker in a biological sample.
- biomarkers are detected using a multiplexed format that allows for the simultaneous detection of two or more biomarkers in a biological sample.
- capture reagents are immobilized, directly or indirectly, covalently or non-covalently, in discrete locations on a solid support.
- a multiplexed format uses discrete solid supports where each solid support has a unique capture reagent associated with that solid support, such as, for example, quantum dots.
- an individual device is used for the detection of each one of multiple biomarkers to be detected in a biological sample.
- Individual devices are configured to permit each biomarker in the biological sample to be processed simultaneously.
- a microtiter plate can be used such that each well in the plate is used to analyze one or more of multiple biomarkers to be detected in a biological sample.
- the fluorescent label is a fluorescent dye molecule.
- the fluorescent dye molecule includes at least one substituted indolium ring system in which the substituent on the 3-carbon of the indolium ring contains a chemically reactive group or a conjugated substance.
- the dye molecule includes an Alexa Fluor molecule, such as, for example, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor 647, Alexa Fluor 680, or Alexa Fluor 700.
- the dye molecule includes a first type and a second type of dye molecule, such as, e.g., two different Alexa Fluor molecules.
- the dye molecule includes a first type and a second type of dye molecule, and the two dye molecules have different emission spectra.
- Fluorescence can be measured with a variety of instrumentation compatible with a wide range of assay formats.
- For example, spectrofluorimeters have been designed to analyze microtiter plates, microscope slides, printed arrays, cuvettes, etc. See Principles of
- a chemiluminescence tag is optionally used to label a component of the biomarker/capture complex to enable the detection of a biomarker level.
- Suitable chemiluminescent materials include any of oxalyl chloride, Rhodamine 6G, Ru(bipy)₃²⁺, TMAE (tetrakis(dimethylamino)ethylene), pyrogallol (1,2,3-trihydroxybenzene), lucigenin, peroxyoxalates, aryl oxalates, acridinium esters, dioxetanes, and others.
- the detection method includes an enzyme/substrate combination that generates a detectable signal that corresponds to the biomarker level (e.g., using the techniques of ELISA, Western blotting, isoelectric focusing).
- the enzyme catalyzes a chemical alteration of the chromogenic substrate which can be measured using various techniques, including spectrophotometry, fluorescence, and chemiluminescence.
- Suitable enzymes include, for example, luciferases, luciferin, malate dehydrogenase, urease, horseradish peroxidase (HRPO), alkaline phosphatase, beta-galactosidase, glucoamylase, lysozyme, glucose oxidase, galactose oxidase, glucose-6-phosphate dehydrogenase, uricase, xanthine oxidase, lactoperoxidase, microperoxidase, and the like.
- the detection method is a combination of fluorescence, chemiluminescence, radionuclide or enzyme/substrate combinations that generate a
- multimodal signaling has unique and advantageous characteristics in biomarker assay formats.
- the biomarker levels for the biomarkers described herein are detected using any analytical method, including singleplex aptamer assays, multiplexed aptamer assays, singleplex or multiplexed immunoassays, mRNA expression profiling, histological/cytological methods, etc., as discussed below.
- Measuring mRNA in a biological sample may, in some embodiments, be used as a surrogate for detection of the level of a corresponding protein in the biological sample.
- a biomarker or biomarker panel described herein can be detected by detecting the appropriate RNA.
- mRNA expression levels are measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR).
- RT-PCR: reverse transcription polymerase chain reaction
- qPCR: quantitative polymerase chain reaction
- qPCR monitors fluorescence as the DNA amplification process progresses.
- qPCR can produce an absolute measurement, such as the number of copies of mRNA per cell.
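Besides absolute copy numbers, qPCR data are often reported as relative expression. As a named substitute technique (not necessarily what the source used), the sketch below applies the comparative Ct (2^-ΔΔCt) method, which assumes roughly 100% amplification efficiency for both the target and a reference gene. The Ct values and the choice of reference gene are hypothetical.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target gene by the comparative Ct (2^-ddCt) method.

    Assumes ~100% PCR efficiency, so each cycle doubles the product.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize target to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control           # sample relative to control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in the tumor sample
# while the reference gene is unchanged, i.e. about 2**3 = 8-fold higher expression.
print(fold_change_ddct(24.0, 18.0, 27.0, 18.0))  # 8.0
```

When amplification efficiencies differ noticeably from 100%, an efficiency-corrected model or an absolute standard curve is more appropriate than this shortcut.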
- Northern blots, microarrays, RNA-seq, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure expression levels of mRNA in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004; herein incorporated by reference in its entirety.
- Immunoassay methods are based on the reaction of an antibody to its corresponding target or analyte and can detect the analyte in a sample depending on the specific assay format.
- monoclonal antibodies and fragments are often used because of their specific epitope recognition.
- Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
- Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
- Quantitative results are generated through the use of a standard curve created with known concentrations of the specific analyte to be detected.
- the response or signal from an unknown sample is plotted onto the standard curve, and a quantity or level corresponding to the target in the unknown sample is established.
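A minimal sketch of reading an unknown off a standard curve, assuming the assay response is linear over the range of the standards so that an ordinary least-squares line suffices. Many immunoassays are better fit with a four-parameter logistic curve; the linear model, the function names, and the standard concentrations and signals below are assumptions for illustration.

```python
from typing import List, Tuple


def fit_standard_curve(conc: List[float], signal: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line: signal = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_signal(sample_signal: float, conc: List[float],
                              signal: List[float]) -> float:
    """Interpolate an unknown's concentration; refuse to extrapolate beyond the standards."""
    if not min(signal) <= sample_signal <= max(signal):
        raise ValueError("signal outside the range of the standards")
    slope, intercept = fit_standard_curve(conc, signal)
    return (sample_signal - intercept) / slope


# Hypothetical standards (e.g., ng/mL) and their measured signals.
standards_conc = [0.0, 10.0, 20.0, 40.0]
standards_signal = [0.05, 0.25, 0.45, 0.85]
print(round(concentration_from_signal(0.65, standards_conc, standards_signal), 3))  # 30.0
```

Refusing to extrapolate reflects the fact that a standard curve is only trustworthy between its lowest and highest standards.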
- ELISA or EIA can be quantitative for the detection of an analyte. This method relies on attachment of a label to either the analyte or the antibody and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (e.g., ¹²⁵I) or fluorescence.
- Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition; herein incorporated by reference in its entirety).
- Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescent, chemiluminescence, and fluorescence resonance energy transfer (FRET) or time resolved-FRET (TR-FRET) immunoassays.
- ELISA enzyme-linked immunosorbent assay
- FRET fluorescence resonance energy transfer
- TR-FRET time resolved-FRET
- methods for detecting biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary electrophoresis, planar electrochromatography, and the like.
- Methods of detecting and/or for quantifying a detectable label or signal generating material depend on the nature of the label.
- the products of reactions catalyzed by appropriate enzymes can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light.
- detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers. Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions.
- This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray.
- Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.
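Robotic pipetting needs a plate map saying which well holds which assay. A minimal sketch, assuming one (sample, biomarker) assay per well, row-major filling, and the standard 96-well (8 × 12) and 384-well (16 × 24) layouts; the text allows a well to analyze more than one biomarker, so one assay per well is a simplifying assumption, and the function names are hypothetical.

```python
import string
from typing import Dict, List, Tuple

# Standard plate geometries: plate size -> (rows, columns).
PLATE_SHAPES = {96: (8, 12), 384: (16, 24)}


def well_ids(plate_size: int = 96) -> List[str]:
    """Row-major well labels, e.g. A1, A2, ..., H12 for a 96-well plate."""
    rows, cols = PLATE_SHAPES[plate_size]
    return [f"{string.ascii_uppercase[r]}{c + 1}" for r in range(rows) for c in range(cols)]


def plate_map(samples: List[str], biomarkers: List[str],
              plate_size: int = 96) -> Dict[str, Tuple[str, str]]:
    """Assign one (sample, biomarker) assay per well, filling the plate row by row."""
    wells = well_ids(plate_size)
    assays = [(s, b) for s in samples for b in biomarkers]
    if len(assays) > len(wells):
        raise ValueError(f"{len(assays)} assays do not fit on a {plate_size}-well plate")
    return dict(zip(wells, assays))


print(plate_map(["s1", "s2"], ["MMP7", "CDH3"]))
# {'A1': ('s1', 'MMP7'), 'A2': ('s1', 'CDH3'), 'A3': ('s2', 'MMP7'), 'A4': ('s2', 'CDH3')}
```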
- the biomarkers described herein may be detected in a variety of tissue samples using histological or cytological methods.
- one or more capture reagent/s specific to the corresponding biomarkers are used in a cytological evaluation of a sample and may include one or more of the following: collecting a cell sample, fixing the cell sample, dehydrating, clearing, immobilizing the cell sample on a microscope slide,
- the cell sample is produced from a cell block.
- one or more capture reagents specific to the corresponding biomarkers are used in a histological evaluation of a tissue sample and may include one or more of the following: collecting a tissue specimen, fixing the tissue sample, dehydrating, clearing, immobilizing the tissue sample on a microscope slide, permeabilizing the tissue sample, treating for analyte retrieval, staining, destaining, washing, blocking, rehydrating, and reacting with capture reagents in a buffered solution.
- fixing and dehydrating are replaced with freezing.
- results are analyzed and/or reported (e.g., to a patient, clinician, researcher, investigator, etc.).
- Results, analyses, and/or data (e.g., signature, disease score, diagnosis, recommended course, etc.) are identified and/or reported as an
- a result may be produced by receiving or generating data
- results determined by methods described herein can be independently verified by further or repeat testing.
- analysis results are reported (e.g., to a health care professional (e.g., laboratory technician or manager, physician, nurse, or assistant, etc.), patient, researcher, investigator, etc.).
- a result is provided on a peripheral, device, or component of an apparatus. For example, sometimes an outcome is provided by a printer or display.
- an outcome is reported in the form of a report. Generally, an outcome can be displayed in a suitable format that facilitates downstream use of the reported information.
- Non-limiting examples of formats suitable for use for reporting and/or displaying data, characteristics, etc. include text, outline, digital data, a graph, graphs, a picture, a pictograph, a chart, a bar graph, a pie-graph, a diagram, a flow chart, a scatter plot, a map, a histogram, a density chart, a function graph, a circuit diagram, a block diagram, a bubble map, a constellation diagram, a contour diagram, a cartogram, spider chart, Venn diagram, nomogram, and the like, and combinations of the foregoing.
- Generating and reporting results from the methods described herein comprises transformation of biological data (e.g., presence or level of biomarkers) into a representation of the characteristics of a subject (e.g., likelihood of mortality, likelihood of responding to treatment, etc.).
- a representation reflects information not determinable in the absence of the method steps described herein. Converting biological data into understandable characteristics of a subject allows actions to be taken in response to such information.
- a downstream individual upon receiving or reviewing a report comprising one or more results determined from the analyses provided herein, will take specific steps or actions in response. For example, a decision about whether or not to treat the subject, and/or how to treat the subject is made.
- receiving a report refers to obtaining, by a communication means, a written and/or graphical representation comprising results or outcomes of analysis.
- the report may be generated by a computer or by human data entry, and can be communicated using electronic means (e.g., over the internet, via computer, via fax, from one network location to another location at the same or different physical sites), or by another method of sending or receiving data (e.g., mail service, courier service and the like).
- the outcome is transmitted in a suitable medium, including, without limitation, in verbal, document, or file form.
- the file may be, for example, but not limited to, an auditory file, a computer readable file, a paper file, a laboratory file or a medical record file.
- a report may be encrypted to prevent unauthorized viewing.
- systems and methods described herein transform data from one form into another form (e.g., from biomarker levels to a diagnostic/prognostic determination, etc.).
- the terms “transformed”, “transformation”, and grammatical derivations or equivalents thereof refer to an alteration of data from a physical starting material (e.g., biological sample, etc.) into a digital representation of the physical starting material (e.g., biomarker levels), a condensation/representation of that starting material (e.g., risk level), or a recommended action (e.g., treatment, no treatment, etc.).
- any combination of the biomarkers described herein can be detected using a suitable kit, such as for use in performing the methods disclosed herein.
- the biomarkers described herein may be combined in any suitable combination, or may be combined with other markers not described herein.
- any kit can contain one or more detectable labels as described herein, such as a fluorescent moiety, etc.
- a kit includes (a) one or more capture reagents for detecting one or more biomarkers in a biological sample, and optionally (b) one or more software or computer program products for providing a diagnosis/prognosis for the individual from whom the biological sample was obtained.
- one or more instructions for manually performing the above steps by a human can be provided.
- a kit comprises a solid support, a capture reagent, and a signal generating material.
- the kit can also include instructions for using the devices and reagents, handling the sample, and analyzing the data. Further the kit may be used with a computer system or software to analyze and report the result of the analysis of the biological sample.
- kits can also contain one or more reagents (e.g., solubilization buffers, detergents, washes, or buffers) for processing a biological sample.
- Any of the kits described herein can also include, e.g., buffers, blocking agents, mass spectrometry matrix materials, serum/plasma separators, antibody capture agents, positive control samples, negative control samples, software and information such as protocols, guidance and reference data.
- kits are provided for the analysis of colon cancer, wherein the kits comprise PCR primers for one or more biomarkers described herein.
- a kit may further include instructions for use and correlation of the biomarkers.
- kits may include a DNA array containing the complement of one or more of the biomarkers described herein, reagents, and/or enzymes for amplifying or isolating sample DNA.
- the kits may include reagents for real-time PCR, for example, TaqMan probes and/or primers, and enzymes.
- a kit can comprise (a) reagents comprising at least one capture reagent for determining the level of one or more biomarkers in a test sample, and optionally (b) one or more algorithms or computer programs for performing the steps of comparing the amount of each biomarker quantified in the test sample to one or more predetermined cutoffs. In some embodiments, an algorithm or computer program assigns a score for each biomarker quantified based on said comparison and, in some embodiments, combines the assigned scores for each biomarker quantified to obtain a total score. Further, in some embodiments, an algorithm or computer program compares the total score with a predetermined score and uses the comparison to determine a diagnosis/prognosis. Alternatively, rather than one or more algorithms or computer programs, one or more instructions for manually performing the above steps by a human can be provided.
- following a determination that a subject suffers from colon cancer, the subject is appropriately treated.
- therapy is administered to treat colon cancer.
- therapy (e.g., surgery, radiation, chemotherapy) is administered to treat complications of colon cancer.
- treatment comprises palliative care.
- methods of monitoring treatment of colon cancer are provided.
- the present methods of detecting biomarkers are carried out at a time 0.
- the method is carried out again at a time 1, and optionally, a time 2, and optionally, a time 3, etc., in order to monitor the progression of colon cancer or to monitor the effectiveness of one or more treatments of colon cancer.
- Time points for detection may be separated by, for example at least 4 hours, at least 8 hours, at least 12 hours, at least 1 day, at least 2 days, at least 4 days, at least 1 week, at least 2 weeks, at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 6 months, or by 1 year or more.
- a treatment regimen is altered based upon the results of monitoring (e.g., upon determining that a first treatment is ineffective).
- the level of intervention may be altered.
- Reagents, cloning vectors, and kits for genetic manipulation referred to in this disclosure are available from commercial vendors such as BioRad, Stratagene, Invitrogen, Sigma-Aldrich, and ClonTech.
- the raw count RNA sequencing data for colon cancer patients were downloaded from GDC data portal.
- the patient clinical data, including specific tumor stage and grade, were downloaded from the GDC data portal.
- the SRA RNA sequencing data for normal colon tissue were downloaded from GTEx data portal through dbGaP (Table 2).
- the two data sets were then manually curated based on the available stage and grade information from patient clinical data.
- 268 early stage colon cancer samples and 786 normal colon samples were selected as our dataset for early stage colon cancer biomarker detection.
- Genomic sequencing pipeline for RNA sequencing data.
- the entire RNA-seq pipeline was divided into two parts for GTEx data: alignment and quantification ( Figure 1).
- the alignment step consists of: SRA to bam conversion using SRA Toolkits (SRA Toolkit development team), bam to fastq conversion using Biobambam (Tischler G et al., 2014), and fastq to aligned bam conversion using STAR (Alex D et al., 2016).
- the quantification step consists of: quality improvement filtering using Fixmate
- the quantification step results in gene raw counts for the GTEx data, which are combined with the GDC gene profile for further downstream analysis.
- the gene expression profile is then pre-filtered based on the mean expression per gene.
- the filtered profile is then quantile-normalized and converted to log2 scale.
- ComBat (from the R/Bioconductor package sva, http://www.r-project.org) is then used to perform further batch correction between GDC case, GDC control, and GTEx control samples to minimize the difference between normal controls from the two databases (Figure 1).
- the normalized gene profile is then analyzed by linear model using R package 'limma' (http://www.r-project.org/).
- the 50 genes with relatively low p-values and relatively large absolute value of log2 fold change were selected as our panel.
- RNA sequencing data for early stage colon cancer tissue and normal colon tissue were downloaded from GDC and GTEx data portal.
- the normal colon tissue data from GTEx were processed using developed RNA-seq pipeline.
- 50 genes are used to differentiate early stage colon cancer from normal colon tissue.
- the Random Forest-based risk model stratified all subjects in the training and testing cohorts into two levels of risk for progression as discussed above (normal or early stage). The normalized profiles of the 50 selected genes were used as the model input. The risk scores of colon cancer were calculated by the model (Figure 2). We used 0.5 as the cutoff threshold.
- Unsupervised hierarchical clustering analysis was applied to the selected gene profiles to visually depict the association of disease status with the abundance patterns of these genes (Figure 4). This analysis demonstrated two major clusters reflecting normal samples and early stage colon cancer samples. The error rate of the unsupervised clustering is less than 0.001, which reinforces the effectiveness of the selected gene profiles for colon cancer assessment.
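The pre-filtering, quantile normalization, log2 conversion, and gene-selection steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the mean-count threshold, the pseudocount of 1, and the use of a Welch t statistic in place of limma's moderated statistics (and of p-values) are assumptions, and the ComBat batch-correction step is omitted. All function names are hypothetical.

```python
import math
import statistics


def filter_by_mean(counts, min_mean):
    """Pre-filter: keep genes whose mean raw count across samples is >= min_mean."""
    return {gene: v for gene, v in counts.items() if statistics.fmean(v) >= min_mean}


def quantile_normalize(matrix):
    """Quantile-normalize a {gene: [value per sample]} matrix.

    Every sample (column) is forced onto the same distribution: the mean of the
    r-th smallest values across samples replaces each sample's r-th smallest value.
    Ties are ranked by input order rather than averaged in this sketch.
    """
    genes = list(matrix)
    n_samples = len(matrix[genes[0]])
    columns = [[matrix[g][j] for g in genes] for j in range(n_samples)]
    # For each sample, gene indices ordered from lowest to highest value.
    orders = [sorted(range(len(genes)), key=col.__getitem__) for col in columns]
    rank_means = [
        statistics.fmean(columns[j][orders[j][r]] for j in range(n_samples))
        for r in range(len(genes))
    ]
    normalized = {g: [0.0] * n_samples for g in genes}
    for j in range(n_samples):
        for r, i in enumerate(orders[j]):
            normalized[genes[i]][j] = rank_means[r]
    return normalized


def log2_transform(matrix, pseudocount=1.0):
    """Convert to log2 scale; the (assumed) pseudocount avoids log2(0)."""
    return {g: [math.log2(x + pseudocount) for x in v] for g, v in matrix.items()}


def rank_genes(log_matrix, is_case, min_abs_lfc, top_n):
    """Rank genes by |Welch t| after requiring |log2 fold change| >= min_abs_lfc.

    Welch's t statistic stands in for limma's moderated t here. Returns a list
    of (gene, log2 fold change) pairs, strongest first.
    """
    scored = []
    for gene, values in log_matrix.items():
        case = [x for x, c in zip(values, is_case) if c]
        ctrl = [x for x, c in zip(values, is_case) if not c]
        # On log2-scale data the log2 fold change is a difference of means.
        lfc = statistics.fmean(case) - statistics.fmean(ctrl)
        if abs(lfc) < min_abs_lfc:
            continue
        se = math.sqrt(statistics.variance(case) / len(case)
                       + statistics.variance(ctrl) / len(ctrl))
        abs_t = abs(lfc) / se if se > 0 else (math.inf if lfc else 0.0)
        scored.append((abs_t, gene, lfc))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(gene, lfc) for _, gene, lfc in scored[:top_n]]
```

In the workflow described above, these functions would be applied to the combined GDC/GTEx count matrix (after batch correction), keeping the top 50 genes as the panel.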
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Immunology (AREA)
- Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Pathology (AREA)
- Analytical Chemistry (AREA)
- Biotechnology (AREA)
- Biomedical Technology (AREA)
- Urology & Nephrology (AREA)
- Hematology (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Physics & Mathematics (AREA)
- Hospice & Palliative Care (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Oncology (AREA)
- Microbiology (AREA)
- Genetics & Genomics (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Cell Biology (AREA)
- Medicinal Chemistry (AREA)
- Food Science & Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Colon cancer markers, colon cancer marker panels, and methods for obtaining a colon cancer marker level representation for a sample are provided, based upon RNA-seq expression profiling. These compositions and methods find use in a number of applications, including, for example, diagnosing colon cancer, prognosing colon cancer, monitoring a subject with colon cancer, and determining a treatment for colon cancer. In addition, systems, devices, and kits thereof that find use in practicing the subject methods are provided.
Description
FIELD OF THE INVENTION
The present invention relates to expression profiling to differentiate early stage colon cancer patients from normal subjects.
BACKGROUND OF THE INVENTION
Colon cancer is the third most common cancer worldwide. One of the most important prognostic factors of colon cancer is the stage at diagnosis, with a 5-year relative survival rate greater than 90% for patients diagnosed at early stages. Importantly, colon cancer often develops through a step-wise adenoma-carcinoma sequence; thus most patients could be cured if the disease was detected and resected at a precancerous or early stage. Therefore, early detection of colon cancer and also precancerous lesions is one of the main prerequisites for successful treatment and reduction of mortality from this disease.
Currently, there are no blood-based biomarkers suitable for population screening or early diagnosis of colon cancer, as the majority of potential biomarkers fail the initial phases of the evaluation process and never make it to the clinic. The best currently available blood test, carcinoembryonic antigen (CEA), exhibits low sensitivity and specificity, especially in early stages of the disease. Novel biomarkers are urgently needed to detect early stage colon cancer.
Serum RNAs and proteins found to correlate with tumor status and/or patient survival are increasingly being applied as diagnostic and prognostic indicators in various carcinomas. RNA-seq technology provides a revolutionary tool for transcriptome analysis. Compared with microarray platforms, RNA-seq has less background noise, since it does not depend on image analysis, and is more sensitive in detecting low-abundance transcripts and transcripts with large fold changes in expression. In this invention, we use RNA-seq to find biomarkers for the early detection of colon cancer.
SUMMARY OF THE INVENTION
In some embodiments, methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all of the target molecules selected from Table 3, or any sub-combinations thereof, in a sample from a subject.
In some embodiments, methods are provided for detecting the level of at least one, at least two, at least three, at least four, or all early stage colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention. In some embodiments, biomarkers are selected from Table 3, or any sub-combinations thereof. In some embodiments, a method comprises detecting the level of one or more biomarkers in a sample from a subject.
In some embodiments, a method of monitoring colon cancer (e.g., response to treatment, likelihood of mortality, etc.) in a subject comprises forming a biomarker panel having N biomarker proteins from colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1, or any sub-combinations thereof), and detecting the level of each of the N biomarker proteins of the panel in a sample from the subject. In some embodiments, N is 1 to 50. In some embodiments, N is 2 to 50. In some embodiments, methods comprise panels of any combination of the colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1, or any sub-combinations thereof), in addition to any other colon cancer biomarkers.
In some embodiments, methods comprise comparing biomarker(s) level to a reference value/range or a threshold. In some embodiments, deviation of the biomarker(s) level from the reference value/range, or exceeding or failing to meet the threshold, is indicative of a diagnosis, prognosis, etc. for the subject.
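The kit software described earlier (compare each biomarker level with a predetermined cutoff, assign a per-biomarker score, sum the scores into a total score, and compare the total with a predetermined score) is one way to implement this threshold comparison. The sketch below assumes a score of 1 for each biomarker on the abnormal side of its cutoff and 0 otherwise; the marker names, cutoff values, directions, and predetermined score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoff:
    """Hypothetical predetermined cutoff for one biomarker."""
    value: float
    higher_in_cancer: bool  # True if levels above the cutoff count as abnormal


def marker_score(level, cutoff):
    """Score 1 if the level falls on the abnormal side of its cutoff, else 0."""
    if cutoff.higher_in_cancer:
        return 1 if level > cutoff.value else 0
    return 1 if level < cutoff.value else 0


def total_score(levels, cutoffs):
    """Sum the per-biomarker scores over every biomarker that has a cutoff."""
    return sum(marker_score(levels[name], c) for name, c in cutoffs.items())


def classify(levels, cutoffs, predetermined_score):
    """Compare the total score with a predetermined score."""
    score = total_score(levels, cutoffs)
    call = "positive" if score >= predetermined_score else "negative"
    return score, call


# Placeholder markers and values, for illustration only.
cutoffs = {
    "MARKER_A": Cutoff(8.0, higher_in_cancer=True),
    "MARKER_B": Cutoff(5.0, higher_in_cancer=False),
    "MARKER_C": Cutoff(6.5, higher_in_cancer=True),
}
print(classify({"MARKER_A": 9.1, "MARKER_B": 4.2, "MARKER_C": 6.0}, cutoffs, 2))
# (2, 'positive')
```

Equal weighting of markers is the simplest choice; a weighted sum would follow the same structure with a per-marker weight in `Cutoff`.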
In any of the embodiments described herein, each biomarker may be a protein biomarker. In any of the embodiments described herein, the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected. In some embodiments, each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected. In any of the embodiments described herein, each biomarker capture reagent may be an antibody or an aptamer.
In some embodiments, a biomarker is an RNA transcript. In any of the embodiments described herein, the method may comprise contacting biomarkers of the sample from the subject with a set of biomarker capture reagents, wherein each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a biomarker being detected. In some embodiments, each biomarker capture reagent of the set of biomarker capture reagents specifically binds to a different biomarker being detected. In any of the embodiments described herein, each biomarker capture reagent may be a nucleic acid probe.
In any of the embodiments described herein, the sample may be a biological sample (e.g., tissue, fluid (e.g., blood, urine, saliva, etc.), etc.). In some embodiments, the sample is filtered, concentrated (e.g., 2-fold, 5-fold, 10-fold, 20-fold, 50-fold, 100-fold, or more), diluted, or un-manipulated.
In any of the embodiments described herein, the methods further comprise treating the subject for colon cancer. In some embodiments, treating the subject for colon cancer comprises a treatment regimen of chemotherapy, radiation, surgery, etc. In some embodiments, biomarkers described herein are monitored before, during, and/or after treatment.
In some embodiments, methods comprise providing palliative treatment (e.g., symptom relief) to a subject suffering from colon cancer, but not providing interventional treatment of the colon cancer. In some embodiments, when embodiments herein indicate a low likelihood of success in treating colon cancer, palliative care is pursued in place of treatment of the colon cancer. In some embodiments, palliative care is provided in addition to treatment for colon cancer.
In some embodiments, methods of monitoring progression or severity of colon cancer and/or monitoring effectiveness of treatment in a subject are provided. In some embodiments, a method comprises detecting the level of one or more colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1, or any sub-combinations thereof) in a sample from the subject at a first time point. In some embodiments, the method further comprises measuring the level of one or more of the biomarkers at a second time point. In some embodiments, colon cancer severity is improving (e.g., declining) if the levels of said biomarkers at the second time point have improved relative to the first time point.
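The text leaves "improved" open. A minimal sketch, assuming each marker has a known direction of change in cancer (the directions, marker names, and minimum change used here are hypothetical) and that "improved" means moving back toward normal by more than that minimum change between two time points:

```python
def marker_improved(first, second, higher_in_cancer, min_change=0.0):
    """True if the level moved toward normal by more than min_change."""
    delta = second - first
    if higher_in_cancer:
        return delta < -min_change  # an elevated marker should fall
    return delta > min_change       # a depressed marker should rise


def compare_time_points(levels_t1, levels_t2, directions, min_change=0.0):
    """Per-marker improvement flags and the fraction of markers that improved."""
    improved = {
        name: marker_improved(levels_t1[name], levels_t2[name], up, min_change)
        for name, up in directions.items()
    }
    return improved, sum(improved.values()) / len(improved)


# Placeholder values: MARKER_A is assumed elevated in cancer, MARKER_B reduced.
flags, fraction = compare_time_points(
    {"MARKER_A": 9.0, "MARKER_B": 3.0},
    {"MARKER_A": 7.5, "MARKER_B": 2.5},
    {"MARKER_A": True, "MARKER_B": False},
    min_change=0.5,
)
print(flags, fraction)
```

How the per-marker flags are combined into an overall call (for example, a majority of markers improving) would need to be set clinically; the fraction is returned so that rule can be chosen separately.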
In some embodiments, biomarkers or panels thereof provide a prognosis regarding the future course of colon cancer in a subject (e.g., likelihood of survival, likelihood of mortality, likelihood of response to therapy, etc.). In some embodiments, treatment decisions (e.g., whether to treat, surgery, radiation, chemotherapy, etc.) are made based on the detection and/or quantification of one or more (e.g., 1, 2, 3, 4, 5) of the biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., comprising MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1, or any sub-combinations thereof).
In some embodiments, kits are provided. In some embodiments, a kit comprises at least one, at least two, at least three, at least four, or at least five capture/detection reagents (e.g., antibody, probe, etc.), wherein each capture/detection reagent specifically binds to a different biomarker (e.g., protein or nucleic acid) selected from the colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1). In some embodiments, a kit comprises N capture/detection reagents. In some embodiments, N is 1 to 50. In some embodiments, N is 2 to 50. In some embodiments, N is 3 to 50. In some embodiments, N is 4 to 50. In some embodiments, N is 5 to 50. In some embodiments, at least one of the N biomarker proteins is selected from the colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1). In some embodiments, compositions are provided comprising proteins of a sample from a subject and at least one, at least two, at least three, at least four, or at least five capture/detection reagents that each specifically bind to a different biomarker selected from the colon cancer biomarkers identified in experiments conducted during development of embodiments of the present invention (e.g., MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1).
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be best understood from the following detailed description when read in conjunction with the accompanying drawings. The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. It is emphasized that, according to common practice, the various features of the drawings are not to scale. On the contrary, the dimensions of the various features are arbitrarily expanded or reduced for clarity. Included in the drawings are the following figures.
Figure 1. The analysis procedure of RNA sequencing data. Each step and packages used in alignment, quantification, and DE analysis are described in this figure.
Figure 2. Scatterplot of calculated probabilities of colon cancer with the selected 50-gene panel. The model was trained with the Random Forest algorithm; 241 cases and 707 controls (of 268 and 786 in total) were randomly selected to train the model.
Figure 3. ROC curves for models of colon cancer assessment with the selected biomarker profile, evaluated on early stage patients versus normal subjects. The average true positive rate was calculated over 500 10-fold cross-validation fits of the model.
Figure 4. Unsupervised hierarchical cluster analysis with heat map showing the abundance pattern of selected biomarkers in early stage colon cancer patients versus normal subjects.
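Figures 2 and 3 report Random Forest probabilities with a 0.5 cutoff and cross-validated ROC curves. Assuming predicted probabilities are already available from such a model, the sketch below computes sensitivity and specificity at the cutoff and the area under the ROC curve through its Mann-Whitney interpretation (the probability that a randomly chosen case outscores a randomly chosen control). It does not reproduce the Random Forest or the 500 cross-validation fits, and calling a probability exactly equal to 0.5 as cancer is an assumption.

```python
def confusion_at_cutoff(probs, labels, cutoff=0.5):
    """Counts (tp, fp, tn, fn); a probability >= cutoff is called cancer (assumed tie rule)."""
    tp = fp = tn = fn = 0
    for p, is_case in zip(probs, labels):
        called_case = p >= cutoff
        if is_case and called_case:
            tp += 1
        elif is_case:
            fn += 1
        elif called_case:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(probs, labels, cutoff=0.5):
    """True positive rate among cases and true negative rate among controls."""
    tp, fp, tn, fn = confusion_at_cutoff(probs, labels, cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(probs, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 1/2)."""
    cases = [p for p, y in zip(probs, labels) if y]
    controls = [p for p, y in zip(probs, labels) if not y]
    wins = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                wins += 1.0
            elif pc == pn:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

The pairwise AUC is quadratic in sample size, which is still fast for a cohort of roughly one thousand samples such as the 268 cases and 786 controls described here.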
DETAILED DESCRIPTION OF THE INVENTION
Colon cancer markers and panels
In some aspects of the invention, colon cancer biomarkers are provided. By a "biomarker" or "marker" it is meant a molecular entity whose representation in a sample is associated with a disease phenotype. By "colon cancer" it is meant any cancerous growth arising from the colon, for example, adenocarcinomas, carcinoid tumors, gastrointestinal stromal tumors, lymphomas, sarcomas, and the like, as known in the art or as described herein. Thus, by a colon cancer "biomarker" or "colon cancer marker" it is meant a molecular entity whose representation in a sample is associated with a colon cancer phenotype, e.g., the presence of colon cancer, the stage of colon cancer, a prognosis associated with the colon cancer, the predictability of the colon cancer being responsive to a therapy, etc. In other words, the marker may be said to be differentially represented in a sample having a colon cancer phenotype.
Colon cancer biomarkers include proteins that are differentially represented in a colon cancer phenotype and their corresponding genetic sequences, i.e., mRNA, DNA, etc. By a "gene" or "recombinant gene" it is meant a nucleic acid comprising an open reading frame that encodes the protein. The boundaries of a coding sequence are determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxy) terminus. A transcription termination sequence may be located 3' to the coding sequence. In addition, a gene may optionally include its natural promoter (i.e., the promoter with which the exons and introns of the gene are operably linked in a non-recombinant cell, i.e., a naturally occurring cell) and associated regulatory sequences, may or may not have sequences upstream of the AUG start site, and may or may not include untranslated leader sequences, signal sequences, downstream untranslated sequences, transcriptional start and stop sequences, polyadenylation signals, translational start and stop sequences, ribosome binding sites, and the like. The terms "gene product" and "expression product" are used herein to refer to the RNA transcription products (transcripts) of the gene, including mRNA, and the polypeptide translation products of such RNA transcripts, i.e., the amino acid product encoded by a gene. A gene product can be, for example, an RNA transcript of the gene, e.g., an unspliced RNA, an mRNA, a splice variant mRNA, a microRNA, a fragmented RNA, etc.; or an amino acid product encoded by the gene, including, for example, a full-length polypeptide, splice variants of the full-length polypeptide, a post-translationally modified polypeptide, and fragments of the gene product, e.g., peptides, etc. In some instances, an elevated level of a marker or marker activity may be associated with the colon cancer phenotype. In other instances, a reduced level of a marker or marker activity may be associated with the colon cancer phenotype.
Colon cancer stage
We summarized colon cancer staging information (Table 1) based on the National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines: Colon Cancer, Version 1.2017.
T is used to categorize the pathology of the primary tumor (TX: Primary tumor cannot be assessed. T0: No evidence of primary tumor. Tis: Carcinoma in situ (intraepithelial or invasion of lamina propria). T1: Tumor invades submucosa. T2: Tumor invades muscularis propria. T3: Tumor invades through the muscularis propria into the pericolorectal tissues. T4a: Tumor penetrates to the surface of the visceral peritoneum. T4b: Tumor directly invades or is adherent to other organs or structures);
N describes the pathology of the regional lymph nodes (NX: The regional lymph nodes cannot be evaluated. N0: The cancer has not spread to the regional lymph nodes. N1: Cancer has spread to 1 to 3 regional lymph nodes. N1c: Tumor deposit(s) in subserosa, mesentery, or nonperitonealized pericolic or perirectal tissues without regional nodal metastasis. N2: Metastasis in four or more regional lymph nodes. N2a: Metastasis in 4 to 6 regional lymph nodes. N2b: Metastasis in seven or more regional lymph nodes);
M describes the extent, if any, of metastasis (M0: The disease has not metastasized. M1: Cancer has spread to distant organs. M1a: Metastasis confined to one organ or site (e.g., lungs, liver, ovary, nonregional node). M1b: Metastases in more than one organ/site or the peritoneum).
By early stage colon cancer, it is meant stage 0, stage I, and stage II.
Table 1. The TNM classification for staging of colon cancer.
Stage | T | N | M
Stage 0 | Tis | N0 | M0
Stage I | T1 or T2 | N0 | M0
Stage IIA | T3 | N0 | M0
Stage IIB | T4a | N0 | M0
Stage IIC | T4b | N0 | M0
Stage IIIA | T1 or T2 | N1/N1c | M0
Stage IIIA | T1 | N2a | M0
Stage IIIB | T3 or T4a | N1/N1c | M0
Stage IIIB | T2 or T3 | N2a | M0
Stage IIIB | T1 or T2 | N2b | M0
Stage IIIC | T4a | N2a | M0
Stage IIIC | T3 or T4a | N2b | M0
Stage IIIC | T4b | N1 or N2 | M0
Stage IV | Any T | Any N | M1a
Stage IV | Any T | Any N | M1b
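Table 1 and the definition of early stage (stage 0, I, and II) can be encoded as a lookup, for example to flag early stage cases when curating clinical records by T, N, and M category. This is a sketch that follows Table 1 only; treating N1a/N1b as N1 and an unsubdivided "M1" as stage IV are assumptions, and combinations absent from the table raise an error rather than being guessed.

```python
# Stage groups from Table 1 for M0 disease, keyed by N category, then T category.
N0_STAGES = {"Tis": "0", "T1": "I", "T2": "I", "T3": "IIA", "T4a": "IIB", "T4b": "IIC"}
N1_STAGES = {"T1": "IIIA", "T2": "IIIA", "T3": "IIIB", "T4a": "IIIB", "T4b": "IIIC"}
N2A_STAGES = {"T1": "IIIA", "T2": "IIIB", "T3": "IIIB", "T4a": "IIIC", "T4b": "IIIC"}
N2B_STAGES = {"T1": "IIIB", "T2": "IIIB", "T3": "IIIC", "T4a": "IIIC", "T4b": "IIIC"}

EARLY_STAGES = {"0", "I", "IIA", "IIB", "IIC"}  # stage 0, stage I, and stage II


def stage_group(t, n, m):
    """Return the Table 1 stage group for a T, N, M combination."""
    if m in ("M1", "M1a", "M1b"):
        return "IV"  # any T, any N
    if m != "M0":
        raise ValueError(f"unsupported M category: {m}")
    if n == "N0":
        table = N0_STAGES
    elif n in ("N1", "N1a", "N1b", "N1c"):  # N1a/N1b treated as N1 (assumption)
        table = N1_STAGES
    elif n == "N2" and t == "T4b":
        return "IIIC"  # Table 1 lists T4b with "N1 or N2" without subdividing N2
    elif n == "N2a":
        table = N2A_STAGES
    elif n == "N2b":
        table = N2B_STAGES
    else:
        raise ValueError(f"unsupported N category: {n}")
    if t not in table:
        raise ValueError(f"{t} {n} {m} is not covered by Table 1")
    return table[t]


def is_early_stage(t, n, m):
    """Early stage as defined above: stage 0, stage I, or stage II."""
    return stage_group(t, n, m) in EARLY_STAGES
```

For example, `is_early_stage("T4b", "N0", "M0")` is true (stage IIC), while any node-positive M0 case falls in stage III and is excluded.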
Detection of Biomarkers and Determination of Biomarker Levels
The presence of a biomarker or a biomarker level for the biomarkers described herein can be detected using any of a variety of analytical methods. In one embodiment, a biomarker level is detected using a capture reagent. In various embodiments, the capture reagent is exposed to the biomarker in solution or is exposed to the biomarker while the capture reagent is immobilized on a solid support. In other embodiments, the capture reagent contains a feature that is reactive with a secondary feature on a solid support. In these embodiments, the capture reagent is exposed to the biomarker in solution, and then the feature on the capture reagent is used in conjunction with the secondary feature on the solid support to immobilize the biomarker on the solid support. The capture reagent is selected based on the type of analysis to be conducted. Capture reagents include but are not limited to aptamers, antibodies, other antibody mimetics and other protein scaffolds, autoantibodies, chimeras, small molecules, F(ab')2 fragments, single chain antibody fragments, Fv fragments, single chain Fv fragments, nucleic acids, lectins, ligand-binding receptors, affibodies, nanobodies, imprinted polymers, avimers, peptidomimetics, hormone receptors, cytokine receptors, and synthetic receptors, and modifications and fragments of these.
In some embodiments, biomarker presence or level is detected using a biomarker/capture reagent complex. In some embodiments, the biomarker presence or level is derived from the biomarker/capture reagent complex and is detected indirectly, such as, for example, as a result of a reaction that is subsequent to the biomarker/capture reagent interaction but is dependent on the formation of the biomarker/capture reagent complex.
In some embodiments, biomarker presence or level is detected directly from the biomarker in a biological sample.
In some embodiments, biomarkers are detected using a multiplexed format that allows for the simultaneous detection of two or more biomarkers in a biological sample. In some embodiments of the multiplexed format, capture reagents are immobilized, directly or indirectly, covalently or non-covalently, in discrete locations on a solid support. In some embodiments, a multiplexed format uses discrete solid supports where each solid support has a unique capture reagent associated with that solid support, such as, for example, quantum dots. In some embodiments, an individual device is used for the detection of each one of multiple biomarkers to be detected in a biological sample. Individual devices are configured to permit each biomarker in the biological sample to be processed simultaneously. For example, a microtiter plate can be used such that each well in the plate is used to analyze one or more of multiple biomarkers to be detected in a biological sample.
In some embodiments, the fluorescent label is a fluorescent dye molecule. In some embodiments, the fluorescent dye molecule includes at least one substituted indolium ring system in which the substituent on the 3-carbon of the indolium ring contains a chemically reactive group or a conjugated substance. In some embodiments, the dye molecule includes an Alexa Fluor molecule, such as, for example, Alexa Fluor 488, Alexa Fluor 532, Alexa Fluor 647, Alexa Fluor 680, or Alexa Fluor 700. In some embodiments, the dye molecule includes a first type and a second type of dye molecule, such as, e.g., two different Alexa Fluor molecules. In some embodiments, the dye molecule includes a first type and a second type of dye molecule, and the two dye molecules have different emission spectra.
Fluorescence can be measured with a variety of instrumentation compatible with a wide range of assay formats. For example, spectrofluorimeters have been designed to analyze microtiter plates, microscope slides, printed arrays, cuvettes, etc. See Principles of Fluorescence Spectroscopy, by J. R. Lakowicz, Springer Science+Business Media, Inc., 2004. See also Bioluminescence & Chemiluminescence: Progress & Current Applications, Philip E. Stanley and Larry J. Kricka, editors, World Scientific Publishing Company, January 2002.
In one or more embodiments, a chemiluminescence tag is optionally used to label a component of the biomarker/capture complex to enable the detection of a biomarker level. Suitable chemiluminescent materials include any of oxalyl chloride, rhodamine 6G, Ru(bipy)₃²⁺, TMAE (tetrakis(dimethylamino)ethylene), pyrogallol (1,2,3-trihydroxybenzene), lucigenin, peroxyoxalates, aryl oxalates, acridinium esters, dioxetanes, and others.
In some embodiments, the detection method includes an enzyme/substrate combination that generates a detectable signal that corresponds to the biomarker level (e.g., using the techniques of ELISA, Western blotting, isoelectric focusing). Generally, the enzyme catalyzes a chemical alteration of the chromogenic substrate, which can be measured using various techniques, including spectrophotometry, fluorescence, and chemiluminescence. Suitable enzymes include, for example, luciferases, luciferin, malate dehydrogenase, urease, horseradish peroxidase (HRPO), alkaline phosphatase, beta-galactosidase, glucoamylase, lysozyme, glucose oxidase, galactose oxidase, glucose-6-phosphate dehydrogenase, uricase, xanthine oxidase, lactoperoxidase, microperoxidase, and the like.
In some embodiments, the detection method is a combination of fluorescence, chemiluminescence, radionuclide, or enzyme/substrate combinations that generate a measurable signal. In some embodiments, multimodal signaling has unique and advantageous characteristics in biomarker assay formats.
In some embodiments, the biomarker levels for the biomarkers described herein are detected using any analytical method, including singleplex aptamer assays, multiplexed aptamer assays, singleplex or multiplexed immunoassays, mRNA expression profiling, histological/cytological methods, etc., as discussed below.
Determination of Biomarker Levels Using Gene Expression Profiling
Measuring mRNA in a biological sample may, in some embodiments, be used as a surrogate for detection of the level of a corresponding protein in the biological sample.
Thus, in some embodiments, a biomarker or biomarker panel described herein can be detected by detecting the appropriate RNA.
In some embodiments, mRNA expression levels are measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). Reverse transcription is used to create cDNA from the mRNA. The cDNA may then be used in a qPCR assay that produces fluorescence as DNA amplification progresses. By comparison to a standard curve, qPCR can produce an absolute measurement, such as the number of copies of mRNA per cell. Northern blots, microarrays, RNA-seq, Invader assays, and RT-PCR combined with capillary electrophoresis have all been used to measure mRNA expression levels in a sample. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004; herein incorporated by reference in its entirety.
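The standard-curve quantification described above can be sketched as follows, assuming a least-squares fit of the threshold cycle (Ct) against log10 of known copy numbers from a dilution series, and the commonly used amplification-efficiency relation E = 10^(-1/slope) - 1. The function names, dilution series, and Ct values are hypothetical and are not taken from this disclosure.

```python
import math
from typing import List, Tuple


def fit_standard_curve(copies: List[float], ct_values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the curve: copies = 10 ** ((Ct - intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """E = 10 ** (-1 / slope) - 1; a slope near -3.32 corresponds to about 100%."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold dilution series (copies per reaction) and measured Ct values.
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
cts = [33.1, 29.8, 26.4, 23.1, 19.8]

slope, intercept = fit_standard_curve(standards, cts)
unknown_copies = copies_from_ct(25.0, slope, intercept)
print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.1%}, "
      f"copies={unknown_copies:.0f}")
```

Fitting in log space reflects the roughly one-cycle shift per doubling of template; an unknown whose Ct falls outside the range of the standards should not be extrapolated.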
Determination of Biomarker Levels Using Immunoassays
Immunoassay methods are based on the reaction of an antibody with its corresponding target or analyte and can detect the analyte in a sample, depending on the specific assay format.
To improve specificity and sensitivity of an assay method based on immuno-reactivity, monoclonal antibodies and fragments are often used because of their specific epitope recognition. Polyclonal antibodies have also been successfully used in various immunoassays because of their increased affinity for the target as compared to monoclonal antibodies.
Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
Quantitative results are generated through the use of a standard curve created with known concentrations of the specific analyte to be detected. The response or signal from an unknown sample is plotted onto the standard curve, and a quantity or level corresponding to the target in the unknown sample is established.
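Reading an unknown sample off the standard curve can be sketched as below, assuming point-to-point interpolation (linear in signal, logarithmic in concentration) rather than the four- or five-parameter logistic fits often used for immunoassays. The function name, the standard concentrations, and the absorbance readings are hypothetical.

```python
import math
from typing import Optional, Sequence


def interpolate_concentration(std_conc: Sequence[float],
                              std_signal: Sequence[float],
                              signal: float) -> Optional[float]:
    """Estimate an unknown's concentration from a standard curve.

    Interpolates linearly in signal and logarithmically in concentration
    between the two standards that bracket the signal. Works for curves that
    rise (e.g. sandwich ELISA) or fall (e.g. competitive ELISA) with
    concentration, provided the signal is monotonic. Returns None when the
    signal lies outside the calibrated range (no extrapolation).
    """
    pairs = sorted(zip(std_conc, std_signal))  # order standards by concentration
    for (c_lo, s_lo), (c_hi, s_hi) in zip(pairs, pairs[1:]):
        if s_hi != s_lo and min(s_lo, s_hi) <= signal <= max(s_lo, s_hi):
            frac = (signal - s_lo) / (s_hi - s_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical standards (pg/mL) and optical densities for a sandwich ELISA.
conc = [10, 30, 100, 300, 1000]
od = [0.08, 0.21, 0.62, 1.45, 2.30]
for reading in (0.62, 1.00, 3.00):
    print(reading, interpolate_concentration(conc, od, reading))
```

Returning None for out-of-range signals mirrors the usual practice of diluting and re-assaying such samples rather than extrapolating beyond the standards.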
Numerous immunoassay formats have been designed. ELISA or EIA can be quantitative for the detection of an analyte. This method relies on attachment of a label to either the analyte or the antibody, and the label component includes, either directly or indirectly, an enzyme. ELISA tests may be formatted for direct, indirect, competitive, or sandwich detection of the analyte. Other methods rely on labels such as, for example, radioisotopes (e.g., iodine-125) or fluorescence.
Additional techniques include, for example, agglutination, nephelometry, turbidimetry, Western blot, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, Luminex assay, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition; herein incorporated by reference in its entirety).
Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, and fluorescent, chemiluminescent, fluorescence resonance energy transfer (FRET), or time-resolved FRET (TR-FRET) immunoassays. Examples of procedures for detecting biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow discrimination by size and peptide level, such as gel electrophoresis, capillary electrophoresis, planar electrochromatography, and the like.
Methods of detecting and/or for quantifying a detectable label or signal generating material depend on the nature of the label. The products of reactions catalyzed by appropriate enzymes (where the detectable label is an enzyme; see above) can be, without limitation, fluorescent, luminescent, or radioactive or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, without limitation, x-ray film, radioactivity counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
Any of the methods for detection can be performed in any format that allows for any suitable preparation, processing, and analysis of the reactions. This can be, for example, in multi-well assay plates (e.g., 96 wells or 384 wells) or using any suitable array or microarray. Stock solutions for various agents can be made manually or robotically, and all subsequent pipetting, diluting, mixing, distribution, washing, incubating, sample readout, data collection and analysis can be done robotically using commercially available analysis software, robotics, and detection instrumentation capable of detecting a detectable label.
Determination of Biomarkers Using Histology/Cytology Methods
In some embodiments, the biomarkers described herein may be detected in a variety of tissue samples using histological or cytological methods. In some embodiments, one or more capture reagent/s specific to the corresponding biomarkers are used in a cytological evaluation of a sample and may include one or more of the following: collecting a cell sample, fixing the cell sample, dehydrating, clearing, immobilizing the cell sample on a microscope slide,
permeabilizing the cell sample, treating for analyte retrieval, staining, destaining, washing, blocking, and reacting with one or more capture reagent/s in a buffered solution. In another embodiment, the cell sample is produced from a cell block.
In some embodiments, one or more capture reagent/s specific to the corresponding biomarkers are used in a histological evaluation of a tissue sample and may include one or more of the following: collecting a tissue specimen, fixing the tissue sample, dehydrating, clearing, immobilizing the tissue sample on a microscope slide, permeabilizing the tissue sample, treating for analyte retrieval, staining, destaining, washing, blocking, rehydrating, and reacting with capture reagent/s in a buffered solution. In another embodiment, fixing and dehydrating are replaced with freezing.
Data Analysis and Reporting
In some embodiments, the results are analyzed and/or reported (e.g., to a patient, clinician, researcher, investigator, etc.). Results, analyses, and/or data (e.g., signature, disease score, diagnosis, recommended course, etc.) are identified and/or reported as an
outcome/result of an analysis. A result may be produced by receiving or generating data
(e.g., test results) and transforming the data to provide an outcome or result. An outcome or result may be determinative of an action to be taken. In some embodiments, results determined by methods described herein can be independently verified by further or repeat testing.
In some embodiments, analysis results are reported (e.g., to a health care professional such as a laboratory technician or manager, physician, nurse, or assistant; a patient; a researcher; an investigator; etc.). In some embodiments, a result is provided on a peripheral, device, or component of an apparatus. For example, an outcome is sometimes provided by a printer or display. In some embodiments, an outcome is reported in the form of a report. Generally, an outcome can be displayed in a suitable format that facilitates downstream use of the reported information. Non-limiting examples of formats suitable for reporting and/or displaying data, characteristics, etc. include text, outline, digital data, a graph or graphs, a picture, a pictograph, a chart, a bar graph, a pie graph, a diagram, a flow chart, a scatter plot, a map, a histogram, a density chart, a function graph, a circuit diagram, a block diagram, a bubble map, a constellation diagram, a contour diagram, a cartogram, a spider chart, a Venn diagram, a nomogram, and the like, and combinations of the foregoing. Generating and reporting results from the methods described herein comprises transformation of biological data (e.g., presence or level of biomarkers) into a representation of the characteristics of a subject (e.g., likelihood of mortality, likelihood of response to treatment, etc.). Such a representation reflects information not determinable in the absence of the method steps described herein. Converting biological data into understandable characteristics of a subject allows actions to be taken in response to such information.
In some embodiments, a downstream individual (e.g., clinician, patient, etc.), upon receiving or reviewing a report comprising one or more results determined from the analyses provided herein, will take specific steps or actions in response. For example, a decision about whether or not to treat the subject, and/or how to treat the subject is made.
The term "receiving a report" as used herein refers to obtaining, by a communication means, a written and/or graphical representation comprising results or outcomes of analysis. The report may be generated by a computer or by human data entry, and can be communicated using electronic means (e.g., over the internet, via computer, via fax, from one network location to another location at the same or different physical sites), or by another method of sending or receiving data (e.g., mail service, courier service, and the like). In some embodiments, the outcome is transmitted in a suitable medium, including, without limitation, in verbal, document, or file form. The file may be, for example, but not limited to, an auditory file, a computer readable file, a paper file, a laboratory file, or a medical record file. A report may be encrypted to prevent unauthorized viewing.
As noted above, in some embodiments, the systems and methods described herein transform data from one form into another (e.g., from biomarker levels to a diagnostic/prognostic determination). In some embodiments, the terms "transformed", "transformation", and grammatical derivations or equivalents thereof refer to an alteration of data from a physical starting material (e.g., biological sample, etc.) into a digital representation of the physical starting material (e.g., biomarker levels), a condensation/representation of that starting material (e.g., risk level), or a recommended action (e.g., treatment, no treatment, etc.).
Kits
Any combination of the biomarkers described herein can be detected using a suitable kit, such as for use in performing the methods disclosed herein. The biomarkers described herein may be combined in any suitable combination, or may be combined with other markers not described herein. Furthermore, any kit can contain one or more detectable labels as described herein, such as a fluorescent moiety, etc.
In some embodiments, a kit includes (a) one or more capture reagents for detecting one or more biomarkers in a biological sample, and optionally (b) one or more software or computer program products for providing a diagnosis/prognosis for the individual from whom the biological sample was obtained. Alternatively, rather than one or more computer program products, one or more instructions for manually performing the above steps by a human can be provided.
In some embodiments, a kit comprises a solid support, a capture reagent, and a signal generating material. The kit can also include instructions for using the devices and reagents, handling the sample, and analyzing the data. Further, the kit may be used with a computer system or software to analyze and report the results of the analysis of the biological sample.
The kits can also contain one or more reagents (e.g., solubilization buffers, detergents, washes, or buffers) for processing a biological sample. Any of the kits described herein can also include, e.g., buffers, blocking agents, mass spectrometry matrix materials, serum/plasma separators, antibody capture agents, positive control samples, negative control samples, software and information such as protocols, guidance and reference data.
In some embodiments, kits are provided for the analysis of colon cancer, wherein the kits comprise PCR primers for one or more biomarkers described herein. In some embodiments, a kit may further include instructions for use and correlation of the biomarkers. In some embodiments, a kit may include a DNA array containing the complement of one or more of the biomarkers described herein, reagents, and/or enzymes for amplifying or isolating sample DNA. The kits may include reagents for real-time PCR, for example, TaqMan probes and/or primers, and enzymes.
For example, a kit can comprise (a) reagents comprising at least one capture reagent for determining the level of one or more biomarkers in a test sample, and optionally (b) one or more algorithms or computer programs for comparing the amount of each biomarker quantified in the test sample to one or more predetermined cutoffs. In some embodiments, an algorithm or computer program assigns a score to each quantified biomarker based on that comparison and, in some embodiments, combines the assigned scores to obtain a total score. Further, in some embodiments, an algorithm or computer program compares the total score with a predetermined score and uses the comparison to determine a diagnosis/prognosis. Alternatively, rather than one or more algorithms or computer programs, one or more instructions for manually performing the above steps by a human can be provided.
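The scoring steps described above (comparison of each marker with a cutoff, a per-marker score, a total score, and comparison of the total with a predetermined score) can be sketched as follows. The one-point-per-marker scheme, the cutoffs, the sample levels, and the names MarkerRule, total_score, and call_sample are hypothetical; marker directions follow the sign of LogFC for those genes in Table 3.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MarkerRule:
    cutoff: float         # hypothetical predetermined cutoff for this marker
    up_in_disease: bool   # True if the marker is higher in cancer (positive LogFC)


def total_score(levels: Dict[str, float], rules: Dict[str, MarkerRule]) -> int:
    """Assign 1 point to each marker whose level lies on the disease side of its cutoff."""
    score = 0
    for gene, rule in rules.items():
        level = levels[gene]
        if rule.up_in_disease and level > rule.cutoff:
            score += 1
        elif not rule.up_in_disease and level < rule.cutoff:
            score += 1
    return score


def call_sample(levels: Dict[str, float], rules: Dict[str, MarkerRule],
                predetermined_score: int) -> Tuple[int, str]:
    """Compare the total score with a predetermined score to make the call."""
    score = total_score(levels, rules)
    call = "positive" if score >= predetermined_score else "negative"
    return score, call


# Directions follow the LogFC sign in Table 3; cutoffs (z-score scale) and
# sample levels are hypothetical.
rules = {
    "MMP7": MarkerRule(cutoff=0.0, up_in_disease=True),
    "CDH3": MarkerRule(cutoff=0.0, up_in_disease=True),
    "ADH1B": MarkerRule(cutoff=0.0, up_in_disease=False),
    "BMP3": MarkerRule(cutoff=0.0, up_in_disease=False),
}
sample = {"MMP7": 1.8, "CDH3": 0.9, "ADH1B": -1.2, "BMP3": 0.3}
print(call_sample(sample, rules, predetermined_score=3))
```

A weighted sum (for example, weights derived from a fitted model) would fit the same structure by replacing the fixed one-point increment.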
Methods of Treatment
In some embodiments, following a determination that a subject suffers from colon cancer, the subject is appropriately treated. In some embodiments, therapy (e.g., surgery, radiation, chemotherapy) is administered to treat colon cancer. In some embodiments, therapy is administered to treat complications of colon cancer. In some embodiments, treatment comprises palliative care.
In some embodiments, methods of monitoring treatment of colon cancer are provided. In some embodiments, the present methods of detecting biomarkers are carried out at time 0. In some embodiments, the method is carried out again at time 1, and optionally at time 2, time 3, etc., to monitor the progression of colon cancer or the effectiveness of one or more treatments of colon cancer. Time points for detection may be separated by, for example, at least 4 hours, at least 8 hours, at least 12 hours, at least 1 day, at least 2 days, at least 4 days, at least 1 week, at least 2 weeks, at least 1 month, at least 2 months, at least 3 months, at least 4 months, at least 6 months, or 1 year or more. In some embodiments, a treatment regimen is altered based on the results of monitoring (e.g., upon determining that a first treatment is ineffective). In some embodiments, the level of intervention may be altered.
EXAMPLES
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention, nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.), but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.
General methods in molecular and cellular biochemistry can be found in such standard textbooks as Molecular Cloning: A Laboratory Manual, 3rd Ed. (Sambrook et al., Cold Spring Harbor Laboratory Press 2001); Short Protocols in Molecular Biology, 4th Ed. (Ausubel et al. eds., John Wiley & Sons 1999); Protein Methods (Bollag et al., John Wiley & Sons 1996); Nonviral Vectors for Gene Therapy (Wagner et al. eds., Academic Press 1999); Viral Vectors (Kaplitt & Loewy eds., Academic Press 1995); Immunology Methods Manual (I. Lefkovits ed., Academic Press 1997); and Cell and Tissue Culture: Laboratory Procedures in Biotechnology (Doyle & Griffiths, John Wiley & Sons 1998), the disclosures of which are incorporated herein by reference.
Reagents, cloning vectors, and kits for genetic manipulation referred to in this disclosure are available from commercial vendors such as BioRad, Stratagene, Invitrogen, Sigma-Aldrich, and ClonTech.
EXAMPLE 1
Materials and methods
Data collection and pre-processing.
The raw-count RNA sequencing data for colon cancer patients were downloaded from the GDC data portal, as were the patient clinical data, including specific tumor stage and grade. The SRA RNA sequencing data for normal colon tissue were downloaded from the GTEx data portal through dbGaP. The two data sets were then manually curated and categorized based on the stage and grade information available for each sample in the patient clinical data (Table 2). In this patent, we used 268 early stage colon cancer samples and 786 normal colon samples as our data set for early stage colon cancer biomarker detection.
Table 2. Data sets used for RNA sequencing.
Genomic sequencing pipeline for RNA sequencing data.
The RNA-seq pipeline for the GTEx data was divided into two parts: alignment and quantification (Figure 1). The alignment step consists of SRA-to-BAM conversion using the SRA Toolkit (SRA Toolkit development team), BAM-to-FASTQ conversion using Biobambam (Tischler G et al., 2014), and FASTQ-to-aligned-BAM conversion using STAR (Alex D et al., 2016). The quantification step consists of quality-improvement filtering using Fixmate (http://broadinstitute.github.io/picard/), sorting and quality filtering using samtools (Li H et al., 2009), and sequence counting using HTSeq (Simon A et al., 2014). The quantification step outputs raw gene counts for the GTEx data, which are combined with the GDC gene profile for further downstream analysis.
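The final step, combining GTEx counts with the GDC gene profile, can be sketched as below. This assumes htseq-count's two-column text output (gene ID, tab, count, with summary rows prefixed by "__") and a merge restricted to genes present in every sample. Stripping Ensembl version suffixes and the sample identifiers are assumptions for illustration; the upstream alignment and filtering commands are not shown.

```python
from typing import Dict, List, Tuple


def parse_htseq_counts(lines: List[str]) -> Dict[str, int]:
    """Parse htseq-count output lines of the form 'gene_id<TAB>count'.

    Summary rows written by htseq-count start with '__' (for example
    '__no_feature') and are skipped. Ensembl version suffixes such as the
    '.14' in 'ENSG00000000003.14' are stripped (an assumption, so that IDs
    from different annotation releases can be matched); counts that collapse
    onto the same unversioned ID are summed.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        gene_id, value = line.split("\t")
        if gene_id.startswith("__"):
            continue
        if gene_id.startswith("ENS"):
            gene_id = gene_id.split(".")[0]
        counts[gene_id] = counts.get(gene_id, 0) + int(value)
    return counts


def combine_profiles(profiles: Dict[str, Dict[str, int]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Merge per-sample count dictionaries into a gene x sample matrix.

    Only genes present in every sample are kept; columns follow sorted sample IDs.
    """
    samples = sorted(profiles)
    shared = set.intersection(*(set(p) for p in profiles.values()))
    matrix = {gene: [profiles[s][gene] for s in samples] for gene in sorted(shared)}
    return samples, matrix


# Hypothetical inputs: one GTEx sample from the pipeline, one GDC sample.
gtex_sample = parse_htseq_counts([
    "ENSG00000000003.14\t120",
    "ENSG00000000005.5\t0",
    "__no_feature\t999",
])
gdc_sample = {"ENSG00000000003": 300, "ENSG00000000419": 50}
samples, matrix = combine_profiles({"GTEX-A": gtex_sample, "TCGA-B": gdc_sample})
print(samples, matrix)
```

Restricting to the intersection avoids treating a gene missing from one source as zero expression, which would otherwise look like a large fold change.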
Normalizations of RNA sequencing data.
The gene expression profile is then pre-filtered based on the mean expression per gene. The filtered profile is then quantile-normalized and converted to log2 scale. ComBat (from the R/Bioconductor package sva, http://www.r-project.org) is then used for further normalization across the GDC case, GDC control, and GTEx control samples to minimize the differences between normal controls from the two databases (Figure 1).
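A minimal sketch of the pre-filtering, quantile normalization, and log2 steps follows. The mean-expression threshold, the pseudocount of 1 added before taking log2, and the simple tie handling are assumptions, since the text does not specify them; the ComBat batch-correction step is not reproduced here.

```python
import math
from typing import Dict, List


def filter_by_mean(matrix: Dict[str, List[float]], min_mean: float) -> Dict[str, List[float]]:
    """Keep genes whose mean expression across samples is at least min_mean."""
    return {g: v for g, v in matrix.items() if sum(v) / len(v) >= min_mean}


def quantile_normalize(matrix: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Give every sample (column) the same distribution of values.

    Each column is sorted, the mean across columns is taken at each rank, and
    every value is replaced by the mean for its rank. Ties are broken by gene
    order here for simplicity; some implementations average tied ranks instead.
    """
    genes = list(matrix)
    n_samples = len(matrix[genes[0]])
    columns = [[matrix[g][j] for g in genes] for j in range(n_samples)]
    orders = [sorted(range(len(genes)), key=lambda i: col[i]) for col in columns]
    rank_means = [
        sum(columns[j][orders[j][r]] for j in range(n_samples)) / n_samples
        for r in range(len(genes))
    ]
    out = {g: [0.0] * n_samples for g in genes}
    for j in range(n_samples):
        for r, i in enumerate(orders[j]):
            out[genes[i]][j] = rank_means[r]
    return out


def log2_transform(matrix: Dict[str, List[float]], pseudocount: float = 1.0) -> Dict[str, List[float]]:
    """log2(x + pseudocount); the pseudocount keeps zero counts finite."""
    return {g: [math.log2(x + pseudocount) for x in v] for g, v in matrix.items()}


# Hypothetical counts: 5 genes x 3 samples.
counts = {"g1": [5, 4, 3], "g2": [2, 1, 4], "g3": [3, 4, 6], "g4": [4, 2, 8], "g5": [0, 0, 1]}
filtered = filter_by_mean(counts, min_mean=1.0)
qn = quantile_normalize(filtered)
expr = log2_transform(qn)
print(expr)
```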
Differentially expressed gene selection.
The normalized gene profile is then analyzed with a linear model using the R package 'limma' (http://www.r-project.org/). The 50 genes with relatively low p-values and relatively large absolute log2 fold changes were selected as our panel.
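Table 3 reports false discovery rates; limma's topTable reports Benjamini-Hochberg adjusted p-values by default. The sketch below, assuming that adjustment, computes adjusted p-values and then picks a panel by FDR and absolute log2 fold change. The thresholds and the ranking rule (largest |log2FC| first, FDR as tie-breaker) are hypothetical, and limma's moderated t-statistics themselves are not reproduced. GENE_X and GENE_Y are hypothetical genes; the other LogFC values are rounded from Table 3.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_panel(stats: Dict[str, Tuple[float, float]], n_genes: int = 50,
                 max_fdr: float = 0.05, min_abs_logfc: float = 1.0) -> List[str]:
    """Pick genes passing FDR and |log2FC| filters, ranked by |log2FC|.

    stats maps gene -> (log2 fold change, FDR). Thresholds are hypothetical.
    """
    passing = [g for g, (lfc, fdr) in stats.items()
               if fdr <= max_fdr and abs(lfc) >= min_abs_logfc]
    passing.sort(key=lambda g: (-abs(stats[g][0]), stats[g][1]))
    return passing[:n_genes]


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
stats = {
    "MMP7": (6.64, 1e-300),
    "CDH3": (6.27, 0.0),
    "ADH1B": (-6.37, 0.0),
    "GENE_X": (0.4, 1e-10),   # fails the fold-change filter
    "GENE_Y": (-5.0, 0.2),    # fails the FDR filter
}
print(select_panel(stats, n_genes=2))
```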
Random forest analysis.
The selected gene expression profile was first converted to z-scores across all samples. The z-scored gene expression profiles of the samples randomized to the training cohort (n=948) were then analyzed by random forest using the R package 'randomForest' (http://www.r-project.org/). All subjects in the training cohort were assigned to one of two subgroups (normal and early stage). By applying the trained model to both the training cohort and the testing cohort (n=106), the probability of each sample belonging to each subgroup was calculated (Figure 2). Receiver operating characteristic (ROC) analysis was conducted (Figure 3) to evaluate the ability of the selected gene expression profile to distinguish early stage colon cancer patients from normal samples in the testing cohort. This process was repeated 500 times using bootstrap resampling to obtain a more robust evaluation of the model.
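The per-gene z-scoring and the training/testing split can be sketched as below. The use of the sample standard deviation (n - 1, as R's scale() does), the unstratified random split, and the seeds are assumptions, and the random forest itself is not reproduced. The text calls the 500 repetitions bootstrapping without specifying the resampling scheme; this sketch uses repeated random splits, whereas a with-replacement bootstrap would draw indices with random.Random.choices instead.

```python
import random
import statistics
from typing import Dict, List, Tuple


def zscore_genes(matrix: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each gene across all samples: (x - mean) / sample SD (n - 1)."""
    out = {}
    for gene, values in matrix.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        out[gene] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return out


def split_indices(n_samples: int, n_test: int, seed: int) -> Tuple[List[int], List[int]]:
    """Randomly assign sample indices to a training and a testing cohort."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


# 268 early stage + 786 normal = 1,054 samples -> 948 training / 106 testing.
splits = [split_indices(1054, 106, seed) for seed in range(500)]
train, test = splits[0]
print(len(train), len(test))
print(zscore_genes({"gene": [1.0, 2.0, 3.0]}))
```

A stratified split (shuffling early stage and normal indices separately) would keep the class ratio constant across repetitions; the text does not say which was used.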
Heat map.
Unsupervised hierarchical clustering analysis was performed (Figure 4) to visually depict the association between disease status and the abundance pattern of the selected gene profile. This analysis was used to demonstrate the effectiveness of the selected gene panel in differentiating early stage colon cancer from normal subjects.
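The text does not state the distance metric or linkage used for the heat map. The sketch below assumes Euclidean distance and average linkage, clusters samples (each a vector of z-scored gene levels) into two groups, and computes the error rate against known disease labels under the best cluster-to-class matching. It is a naive illustration with hypothetical data, not the procedure used to produce Figure 4.

```python
import math
from itertools import permutations
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(points: List[Sequence[float]], k: int) -> List[int]:
    """Naive agglomerative clustering (average linkage) cut at k clusters.

    Returns a cluster label per point. O(n^3), so suitable for illustration only.
    """
    clusters = [[i] for i in range(len(points))]
    dist = [[euclidean(p, q) for q in points] for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair (b > a)
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def clustering_error_rate(labels: List[int], truth: List[int]) -> float:
    """Fraction misassigned under the best cluster-to-class matching.

    Assumes there are no more clusters than classes.
    """
    classes = sorted(set(truth))
    ids = sorted(set(labels))
    best = len(labels)
    for perm in permutations(classes, len(ids)):
        mapping = dict(zip(ids, perm))
        best = min(best, sum(mapping[l] != t for l, t in zip(labels, truth)))
    return best / len(labels)


# Hypothetical samples: 3 normal (0) and 3 early stage (1), two genes each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
truth = [0, 0, 0, 1, 1, 1]
labels = average_linkage_clusters(points, k=2)
print(labels, clustering_error_rate(labels, truth))
```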
EXAMPLE 2
Results
Data collection, pre-processing.
The RNA sequencing data for early stage colon cancer tissue and normal colon tissue were downloaded from the GDC and GTEx data portals, respectively. The patient clinical data, including specific tumor stage and grade, were downloaded from the GDC data portal. The normal colon tissue data from GTEx were processed using the RNA-seq pipeline described in Example 1.
Statistical results for the fifty selected genes.
A linear model from the R package 'limma' was applied to the gene profiles of early stage colon cancer and normal samples. The log2 fold change and false discovery rate (FDR) for each selected gene are shown in Table 3.
Table 3. The 50 genes used to differentiate early stage colon cancer patients from normal subjects
Gene symbol | LogFC (log2 fold change, early stage vs. normal) | FDR |
---|---|---|
MYOC | -5.57595709 | 3.65E-280 |
COL11A1 | 5.435744072 | 2.77E-318 |
CDH3 | 6.271810973 | 0 |
WISP2 | -4.826206612 | 2.24E-281 |
SCGN | -5.789059518 | 0 |
TRIB3 | 3.582078564 | 1.03E-299 |
CEMIP | 4.974198438 | 0 |
SFRP1 | -5.667318057 | 3.34E-316 |
STMN2 | -4.637314358 | 9.00E-279 |
GRIN2D | 4.594700353 | 0 |
WNT2 | 5.22888592 | 0 |
LGI1 | -4.57258113 | 0 |
RERGL | -4.998344693 | 6.32E-285 |
OTX1 | 4.422895527 | 0 |
CNTFR | -5.624666851 | 0 |
COL10A1 | 5.795318412 | 0 |
PLP1 | -6.044439433 | 0 |
PCSK2 | -5.393391537 | 0 |
LYVE1 | -4.617715492 | 1.08E-295 |
SCN7A | -5.650340297 | 5.47E-287 |
MMP7 | 6.644433067 | 0 |
ABCA8 | -5.631614728 | 0 |
BMP3 | -6.529378003 | 0 |
SST | -5.786260348 | 6.78E-292 |
SIM2 | 4.054762318 | 0 |
CADM3 | -5.716292578 | 0 |
CLDN1 | 4.640378303 | 0 |
CLEC3B | -4.546615716 | 0 |
ESM1 | 5.115811863 | 0 |
FOXQ1 | 5.636436405 | 0 |
MAMDC2 | -5.59405991 | 0 |
KLK6 | 6.188458755 | 0 |
KRT80 | 6.371232468 | 0 |
SCARA5 | -5.786701892 | 0 |
CST1 | 6.165506511 | 0 |
SEMA3E | -4.672860025 | 1.86E-276 |
VSTM2A | -5.411131783 | 1.01E-291 |
EPHX4 | 4.170873994 | 6.28E-320 |
PRIMA1 | -5.570661293 | 0 |
ETV4 | 5.204183777 | 0 |
CPNE7 | 4.855819509 | 0 |
NRXN1 | -5.177794993 | 1.54E-281 |
OTOP3 | -4.838017009 | 3.29E-278 |
ADH1B | -6.373823613 | 0 |
RP11-474D1.3 | 6.069026607 | 0 |
RP5-884M6.1 | 3.922322745 | 0 |
ELFN1-AS1 | 4.235409907 | 3.33E-296 |
CRNDE | 3.990470664 | 0 |
C17orf96 | 3.382370399 | 5.00E-305 |
BLACAT1 | 4.387181703 | 2.16E-313 |
FDR values listed as 0 fell below the smallest value representable in double precision.
Performance of transcriptomics profile-based prognostic algorithm
The random forest-based risk model stratified all subjects in the training and testing cohorts into two levels of risk, as discussed above (normal or early stage). The 50 selected (normalized) gene profiles were used as the model input. Colon cancer risk scores were calculated by the model (Figure 2), and 0.5 was used as the cutoff threshold.
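Applying the 0.5 cutoff to the model's predicted probabilities can be sketched as follows. Whether a probability of exactly 0.5 counts as early stage is not stated, so the >= comparison is an assumption, as are the example probabilities and labels (1 = early stage, 0 = normal).

```python
from typing import List, Tuple


def classify_at_cutoff(probabilities: List[float], cutoff: float = 0.5) -> List[int]:
    """Label a sample early stage (1) when its predicted probability is >= cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def sensitivity_specificity(predicted: List[int], truth: List[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(predicted, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(predicted, truth))
    tn = sum(p == 0 and t == 0 for p, t in zip(predicted, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(predicted, truth))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities of early stage and true labels.
probs = [0.9, 0.6, 0.4, 0.2, 0.55]
truth = [1, 1, 1, 0, 0]
predicted = classify_at_cutoff(probs)
print(predicted, sensitivity_specificity(predicted, truth))
```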
The c statistic (area under the ROC curve) of the model measured on the testing cohort was 1 (Figure 3).
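The c statistic equals the probability that a randomly chosen early stage sample scores higher than a randomly chosen normal sample, which is the area under the ROC curve. A minimal sketch using pairwise comparisons (ties counted as one half) follows; the scores and labels in the usage are hypothetical.

```python
from typing import List


def c_statistic(scores: List[float], truth: List[int]) -> float:
    """Concordance (c) statistic, equal to the area under the ROC curve.

    Fraction of (case, control) pairs in which the case (truth == 1) has the
    higher score; tied pairs count one half. A value of 1 means every early
    stage sample scored above every normal sample.
    """
    cases = [s for s, t in zip(scores, truth) if t == 1]
    controls = [s for s, t in zip(scores, truth) if t == 0]
    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfect separation
print(c_statistic([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))  # one discordant pair
```

The pairwise form is quadratic in cohort size, which is negligible for a testing cohort of about one hundred samples.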
Unsupervised hierarchical clustering with transcriptomics profiles
Unsupervised hierarchical clustering analysis was applied to the selected gene profiles to visually depict the association of disease status with the abundance patterns of these genes (Figure 4). This analysis revealed two major clusters corresponding to normal samples and early stage colon cancer samples. The error rate of the unsupervised clustering was less than 0.001, which reinforces the effectiveness of the selected gene profiles for colon cancer assessment.
Claims
1. A method, comprising detecting the level of one or more target analytes, but fewer than 50 target analytes, in a sample from a subject to be tested for colon cancer, one or more of the target analytes being selected from the group consisting of MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1, wherein a change in the expression of these genes is associated with early stage colon cancer.
2. The method of claim 1, further comprising detecting one or more additional target analytes.
3. The method of claim 2, comprising detecting three or more target analytes being selected from the group consisting of MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1.
4. The method of claim 2, comprising detecting ten or more target analytes.
5. The method of claim 1, wherein the sample is a blood product selected from whole blood; plasma; serum; and filtered, concentrated, fractionated or diluted samples of the preceding.
6. The method of claim 1, wherein the sample is a biopsy tissue.
7. The method of claim 1, wherein the method comprises contacting the sample with a set of capture reagents, wherein each capture reagent specifically binds to a different target analyte being detected.
8. The method of claim 7, wherein each capture reagent is an antibody.
9. The method of claim 7, wherein each capture reagent is a nucleic acid probe.
10. Reagents comprising capture reagents for the detection of two or more target analytes, but fewer than 50 target analytes, two or more of the target analytes being selected from the group consisting of MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1.
11. The reagents of claim 10, wherein said capture reagents are antibodies.
12. The reagents of claim 10, wherein said capture reagents are nucleic acid probes.
13. A kit comprising the reagents of claim 10 and one or more additional reagents for carrying out an assay in a sample from a subject.
14. The reagents of claim 10, comprising capture reagents for detecting three or more target analytes selected from the group consisting of MYOC, COL11A1, CDH3, WISP2, SCGN, TRIB3, CEMIP, SFRP1, STMN2, GRIN2D, WNT2, LGI1, RERGL, OTX1, CNTFR, COL10A1, PLP1, PCSK2, LYVE1, SCN7A, MMP7, ABCA8, BMP3, SST, SIM2, CADM3, CLDN1, CLEC3B, ESM1, FOXQ1, MAMDC2, KLK6, KRT80, SCARA5, CST1, SEMA3E, VSTM2A, EPHX4, PRIMA1, ETV4, CPNE7, NRXN1, OTOP3, ADH1B, RP11-474D1.3, RP5-884M6.1, ELFN1-AS1, CRNDE, C17orf96, BLACAT1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2017/023478 WO2018174863A1 (en) | 2017-03-21 | 2017-03-21 | Methods and composition for detecting early stage colon cancer with rna-seq expression profiling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2017/023478 WO2018174863A1 (en) | 2017-03-21 | 2017-03-21 | Methods and composition for detecting early stage colon cancer with rna-seq expression profiling |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018174863A1 (en) | 2018-09-27 |
Family
ID=63585612
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2017/023478 WO2018174863A1 (en) | 2017-03-21 | 2017-03-21 | Methods and composition for detecting early stage colon cancer with rna-seq expression profiling |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2018174863A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113293210A (en) * | 2021-04-21 | 2021-08-24 | 山东大学第二医院 | Application of ELFN1-AS1 in colorectal cancer diagnosis biomarkers and therapeutic targets |
CN115216542A (en) * | 2021-04-15 | 2022-10-21 | 复旦大学附属华山医院 | Marker for screening and identifying tumors and application thereof |
CN117219158A (en) * | 2022-12-02 | 2023-12-12 | 上海爱谱蒂康生物科技有限公司 | Individualized treatment decision-making method and system for intestinal cancer and storage medium containing individualized treatment decision-making method and system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120053080A1 (en) * | 2009-03-09 | 2012-03-01 | Juan Cui | Protein markers identification for gastric cancer diagnosis |
2017-03-21 WO PCT/US2017/023478 (WO2018174863A1): active application filing
Non-Patent Citations (1)
Title |
---|
SANZ-PAMPLONA ET AL.: "Aberrant gene expression in mucosa adjacent to tumor reveals a molecular crosstalk in colon cancer", MOLECULAR CANCER, vol. 13, no. 36, 2014, pages 1 - 19, XP021186511 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7803552B2 (en) | Biomarkers for predicting prostate cancer progression | |
CA2809282C (en) | Mesothelioma biomarkers and uses thereof | |
US20120143805A1 (en) | Cancer Biomarkers and Uses Thereof | |
WO2018174861A1 (en) | Methods and compositions for detecting early stage breast cancer with rna-seq expression profiling | |
JP7136697B2 (en) | A biomarker for the detection of breast cancer in women with dense breasts | |
EP3414575A1 (en) | Nonalcoholic fatty liver disease (nafld) and nonalcoholic steatohepatitis (nash) biomarkers and uses thereof | |
US20150160225A1 (en) | Renal Cell Carcinoma Biomarkers and Uses Thereof | |
CN113234830B (en) | Product for lung cancer diagnosis and application | |
WO2015164616A1 (en) | Biomarkers for detection of tuberculosis | |
WO2018174863A1 (en) | Methods and composition for detecting early stage colon cancer with rna-seq expression profiling | |
US20160138110A1 (en) | Glioma biomarkers | |
WO2018140049A1 (en) | Methods and compositions for detecting early stage ovarian cancer with rnaseq expression profiling | |
CN113502326B (en) | Biomarker-based pulmonary arterial hypertension diagnosis product and application thereof | |
Park et al. | Forensic body fluid identification by analysis of multiple RNA markers using NanoString technology | |
WO2018174862A1 (en) | Methods and compositions for detecting early stage bladder cancer with rna-seq expression profiling | |
WO2018174860A1 (en) | Methods and compositions for detection early stage lung adenocarcinoma with rnaseq expression profiling | |
US20210072245A1 (en) | Biomarkers for detection of breast cancer | |
WO2016123058A1 (en) | Biomarkers for detection of tuberculosis risk | |
CN113444796B (en) | Biomarkers associated with lung cancer and their use in diagnosing cancer | |
WO2018174859A1 (en) | Methods and compositions for detection of early stage lung squamous cell carcinoma with rnaseq expression profiling | |
US20180356419A1 (en) | Biomarkers for detection of tuberculosis risk | |
EP3736345A1 (en) | Genomic predictors of aggressive micropapillary bladder cancer | |
US20170121774A1 (en) | Methods and compositions for assessing predicting responsiveness to a tnf inhibitor | |
JP2024540836A (en) | Lung cancer prediction and its applications | |
EP4413372A1 (en) | Lung cancer prediction and uses thereof |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17902239; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 17902239; Country of ref document: EP; Kind code of ref document: A1 |