DNA METHYLATION PATTERNS
Field of the Invention
The present invention relates to genomics and in particular to a method for the rapid assessment of the genome wide distribution of DNA methylation using restriction endonucleases which are sensitive to methylation within their recognition site in combination with size fractionation of the restricted DNA and hybridization to a DNA chip.
Background
DNA methylation is a ubiquitous biological process that occurs in diverse organisms ranging from bacteria to humans. During this process, DNA methyltransferases catalyze primarily the post-replicative addition of a methyl group to the N6 position of adenine or the C5 position of cytosine. In higher eukaryotes, DNA methylation plays a central role in epigenetic regulation of gene expression and in particular in transcriptional gene silencing, genomic imprinting and embryonic development. Aberrations in DNA methylation have been implicated in aging and various diseases including cancer.
A method for determining the methylation state within a genomic sequence context, based on the use of restriction endonucleases which require the recognition sequence to be unmethylated to allow cleavage at the site, has been described by Bird et al., J. Mol. Biol. 118, 27-47, 1978. The fragments resulting from cleavage of unmethylated recognition sites are separated by gel electrophoresis, transferred to a membrane and hybridized to a labeled probe corresponding to the DNA fragment to be examined. The resulting hybridization pattern reflects the methylation pattern of the DNA. The sensitivity of this method was increased in a variant combined with PCR (Shemer, R. et al., Proc. Natl. Acad. Sci. USA, 93, 6371-6376, 1996). Amplification by two primers located on both sides of the recognition sequence only occurs after cleavage if the recognition sequence is in the methylated form. Both variants examine methylation only at individual positions.
A new technology, called DNA microarray technology, is attracting increasing interest among biologists. This technology provides means to monitor almost whole genomes on a single chip so that researchers can analyze thousands of genes simultaneously. Terms that have been used in the literature to describe this technology include, but are not limited to: biochip, DNA chip, DNA microarray, probe array and gene array. Affymetrix, Inc. owns a registered trademark, GeneChip®, which refers to its high density, oligonucleotide-based DNA arrays. In scientific literature, however, the term "gene chip(s)" is frequently used as a general term for DNA microarray technology.
An array is an orderly arrangement of DNA samples. It provides a medium for matching known and unknown DNA samples based on base-pairing rules and automating the process of identifying the unknowns. An array experiment can make use of common assay systems such as microplates or standard blotting membranes, and can be created by hand or make use of robotics to deposit the sample. In general, arrays are described as macroarrays or microarrays, the difference being the size of the sample spots. Macroarrays contain sample spot sizes of about 300 microns or larger and can be easily imaged by existing gel and blot scanners. The sample spot sizes in microarrays are typically less than 200 microns in diameter and these arrays usually contain thousands of spots. DNA microarrays, or DNA chips, are fabricated by robotics, generally on glass but sometimes on nylon substrates, and carry probes of known identity that are used to determine complementary binding, thus allowing massively parallel gene expression and gene discovery studies. An experiment with a single DNA chip can provide researchers information on thousands of genes simultaneously. With regard to the terminology of the hybridization partners it should be noted that in connection with microarrays a "probe" is frequently considered to be a tethered nucleic acid with known sequence, whereas a "target" is a free nucleic acid sample whose identity or abundance is analyzed.
There are two major applications of the microarray technology: (1) identification of a nucleotide sequence such as a gene or gene mutation and (2) determination of the expression level or abundance of a nucleotide sequence. There are also two variants of the DNA microarray technology, which differ in the nature of the arrayed DNA sequences of known identity. In variant 1, probe DNAs of 500-5,000 bases length such as cDNA are immobilized to a solid surface such as glass using robotic spotting and exposed to a set of targets either separately or in a mixture. In variant 2, an array of oligonucleotide (20-80-mer oligos) or peptide nucleic acid (PNA) probes is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array is exposed to labeled sample DNA, hybridized, and the identity or abundance of complementary sequences is determined. This method, "historically" called DNA chips, is sold under the GeneChip® trademark. In contrast to cDNA spotting methods in which a single clone is used for the analysis, GeneChip® arrays use multiple probes (e.g. 16 probe pairs for one gene in the case of the Arabidopsis chip) to interrogate a chromosomal region. This probe pairing strategy helps to identify and minimize the effects of non-specific hybridization and background signals. A GeneChip® expression array can contain probes corresponding to a number of reference and control genes. Using reference standards, it is possible to normalize data from different experiments and compare multiple experiments on a quantitative level.
Some commercially available GeneChip probe arrays are manufactured using technology that combines photolithographic methods and combinatorial chemistry. Tens to hundreds of thousands of different oligonucleotide probes are synthesized on each array. Each probe type is located in a specific area on the probe array called a probe cell. Each probe cell contains millions of copies of a given probe. Probe arrays are manufactured in a series of cycles. A glass substrate is coated with linkers containing photolabile protecting groups. Then, a mask is applied that exposes selected portions of the probe array to ultraviolet light. Illumination removes the photolabile protecting groups, enabling selective nucleoside phosphoramidite addition only at the previously exposed sites. Next, a different mask is applied and the cycle of illumination and chemical coupling is performed again. By repeating this cycle, a specific set of oligonucleotide probes is synthesized, with each probe type in a known location. The completed probe arrays are packaged into cartridges. Many companies are manufacturing oligonucleotide-based chips using alternative in situ synthesis or deposition technologies.
The present invention combines the use of methylation-sensitive restriction enzymes, DNA size fractionation and DNA microarray technology in a way that makes it feasible to examine the level and the distribution of DNA methylation on a genome-wide scale. It also allows resolution down to a gene, gene fragment or any chosen DNA sequence, and the results can be quantified. Whereas all types of microarrays can be used in the context of the present invention, it is an important aspect that the probe arrays are hybridized with a selected size fraction of labeled genomic target DNA that has been restricted with a methylation-sensitive endonuclease. Hybridization intensities of different sources of genomic target DNA are then compared to each other and optionally to a control digestion with a methylation-insensitive endonuclease to identify sequences showing different levels of methylation. The intensities are indicative of the composition of the particular size fraction and therefore of the methylation status of the genomic targets. Thus, a comparatively high intensity of hybridization of a probe with a low molecular weight target fraction resulting from cleavage with an endonuclease inhibited by the presence of 5-methylcytosine or N6-methyladenine in the recognition sequence reflects hypomethylation in the target fraction relative to other samples or controls. Similarly, a comparatively high intensity of hybridization of a probe with a low molecular weight target fraction resulting from digestion with an endonuclease requiring the presence of 5-methylcytosine or N6-methyladenine in the recognition sequence reflects hypermethylation in the target fraction relative to other samples or controls.
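The interpretation rule of the preceding paragraph can be illustrated by the following minimal Python sketch. The function name, the log2 ratio and the threshold value are hypothetical choices made for illustration only and are not part of the described method. The calls made for a comparatively low intensity are a symmetric extension of the rule stated above and are likewise an assumption.

```python
import math


def classify_probe(sample, reference, enzyme_mode="inhibited", threshold=1.0):
    """Classify one probe from low molecular weight fraction intensities.

    sample, reference: positive hybridization intensities of the same probe
        for the sample of interest and for the sample it is compared to.
    enzyme_mode: "inhibited" if methylation blocks cleavage,
        "requires" if the endonuclease needs methylation to cleave.
    threshold: hypothetical cut-off on |log2 ratio| (an assumption).
    """
    if enzyme_mode not in ("inhibited", "requires"):
        raise ValueError("enzyme_mode must be 'inhibited' or 'requires'")
    log_ratio = math.log2(sample / reference)
    if abs(log_ratio) < threshold:
        return "no change"
    # A higher intensity in the low molecular weight fraction means
    # that more cleavage took place in the sample than in the reference.
    more_cleavage = log_ratio > 0
    if enzyme_mode == "inhibited":
        return "hypomethylated" if more_cleavage else "hypermethylated"
    return "hypermethylated" if more_cleavage else "hypomethylated"


print(classify_probe(800, 100))              # methylation-inhibited enzyme
print(classify_probe(800, 100, "requires"))  # methylation-dependent enzyme
```

In practice the threshold would have to be chosen from replicate hybridizations; the value used here is a placeholder.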
Summary
The present invention teaches a method to detect differences of genomic methylation comprising
(a) separately cleaving different samples of genomic DNA with a sequence specific endonuclease whose cleavage activity is inhibited by or requires the presence of 5-methylcytosine or N6-methyladenine in the recognition sequence;
(b) labelling a defined size fraction of the resulting DNA fragments;
(c) separately hybridizing the labelled DNA fractions of step (b) to an array of DNA molecules representing a plurality of genomic DNA targets; and
(d) quantifying the differences of the hybridization intensity patterns obtained in step (c).
Additionally, as a hybridization control, the method may optionally comprise
(i) cleaving identical samples of genomic DNA with a sequence specific endonuclease whose cleavage activity is indifferent to the presence of 5-methylcytosine or N6-methyladenine in the recognition sequence and which is preferably an isoschizomer cleaving at the same recognition site as the methylation-sensitive enzyme;
(ii) labelling the same size fraction of the resulting DNA fragments as in step (b); and
(iii) separately hybridizing the labelled DNA fractions of step (ii) to an identical array of DNA molecules representing a plurality of genomic DNA targets.
Detailed Description
The method is particularly useful to distinguish different cell types on the basis of their methylation pattern, which can be extremely useful in the context of cancer diagnosis and treatment. The method, however, is not restricted to the analysis of mammalian genome methylation, but can also be used for methylation pattern analysis in plants, animals, insects, fungi or microbes. Thus, it can be used in plants to compare methylation patterns in the context of heterosis. It is preferred that the DNA samples to be compared are of isogenic origin such as healthy versus tumour tissue of the same organism, parental DNA versus progeny or sibling DNA, or DNA of isogenic organisms only differing by one or more specific mutations. In general, with increasing genetic distance of the samples to be compared, interpretation of the results becomes more difficult.
A number of different endonucleases inhibited by the presence of 5-methylcytosine or N6-methyladenine can be used in the context of the present invention. Their respective recognition sequences are usually defined by 4 to 8 base pairs, shorter recognition sequences of 4 to 6 base pairs being preferred. A number of particularly useful endonucleases are listed in Table 1. Endonucleases which are useful at sites of overlapping CG are shown in Table 2. Particularly preferred examples of recognition sequences are ACGT, GCGC, CCGG, TCGA or CGCG.
Table 1:
Enzyme    Site        Enzyme    Site        Enzyme    Site
AatII     GACGTC      BstUI     CGCG        NarI      GGCGCC
AciI      CCGC        ClaI      ATCGAT      NciI      CCSGG
AclI      AACGTT      EagI      CGGCCG      NgoMI     GCCGGC
AgeI      ACCGGT      Fnu4HI    GCNGC       NotI      GCGGCCGC
AscI      GGCGCGCC    FseI      GGCCGGCC    NruI      TCGCGA
AvaI      CYCGRG      FspI      TGCGCA      PmlI      CACGTG
BsaAI     YACGTR      HaeII     RGCGCY      PvuI      CGATCG
BsaHI     GRCGYC      HgaI      GACGC       RsrII     CGGWCCG
BsiEI     CGRYCG      HhaI      GCGC        SacII     CCGCGG
BsiWI     CGTACG      HinP1I    GCGC        SalI      GTCGAC
BspDI     ATCGAT      HpaII     CCGG        SmaI      CCCGGG
BsrFI     RCCGGY      KasI      GGCGCC      SnaBI     TACGTA
BssHII    GCGCGC      MluI      ACGCGT      XhoI      CTCGAG
BstBI     TTCGAA      NaeI      GCCGGC
Table 2:
Enzyme    Site        Enzyme    Site         Enzyme    Site
AccI      GTMKAC      BsmAI     GTCTC        NheI      GCTAGC
Acc65I    GGTACC      BspEI     TCCGGA       RsaI      GTAC
ApaI      GGGCCC      BsrBI     GAGCGG       PaeR7I    CTCGAG
ApaLI     GTGCAC      DraIII    CACNNNGTG    PshAI     GACNNNNGTC
AvaII     GGWCC       DrdI      GACN6GTC     SalI      GTCGAC
BanI      GGYRCC      EaeI      YGGCCR       Sau3AI    GATC
BsaI      GGTCTC      EarI      CTCTTC       Sau96I    GGNCC
BsaBI     GATN4ATC    HincII    GTYRAC       ScrFI     CCNGG
BsgI      GTGCAG      HinfI     GANTC        SfiI      GGCC(N5)GGCC
BslI      CCN7GG      HpaI      GTTAAC
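Several recognition sequences in Tables 1 and 2 use degenerate IUPAC nucleotide codes (R, Y, S, W, K, M, N). The following Python sketch, given for illustration only, shows one possible way to locate such sites in a DNA sequence. The function names are hypothetical. Shorthand such as N6 or (N5) is assumed to have been expanded to the corresponding number of N characters beforehand, and only the given strand is scanned, which is sufficient for palindromic sites but not for asymmetric ones such as GACGC.

```python
import re

# IUPAC nucleotide codes used in Tables 1 and 2, as regular expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def site_to_regex(site):
    """Translate a recognition sequence into a regular expression string."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(sequence, site):
    """Return the 0-based start positions of all, also overlapping, matches."""
    # The lookahead makes the regex engine report overlapping sites as well.
    pattern = re.compile("(?=(" + site_to_regex(site) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


print(find_sites("AACCGGTTCCGG", "CCGG"))    # HpaII / MspI site
print(find_sites("CTCGAGCCCGGG", "CYCGRG"))  # AvaI site with degenerate bases
```

The positions returned are those of the recognition sequence, not of the cut within it; the cleavage offset differs between enzymes and is not modelled here.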
The type of DNA label is only important in the sense that it needs to be compatible with the hybridization technology required for the probes immobilized on the solid array supports. In general, any method which can label DNA quantitatively can be used for this application, including but not limited to random oligonucleotide priming, nick translation, chemical labelling of DNA such as labelling with biotin, and light activated chemical labelling of DNA such as labelling with psoralen-biotin and activation by UV light. Means to perform these methods are commercially available. It is also possible to label the target DNA on the chip by primer extension using the chip oligonucleotides as the primers. A preferred support is the Affymetrix GeneChip, which can be hybridized to the target DNA according to the manufacturer's instructions.
The preferred size fraction to be labelled primarily depends on the length of the recognition sequence cleaved by the endonuclease. Thus, for a recognition sequence of 4 base pairs the preferred fragment size is up to 3000 base pairs and preferentially between 100 and 2000 base pairs. For a 6 base pair recognition sequence a fragment size of between 100 and 6000 base pairs is suitable.
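One way to see why the suitable size window grows with the length of the recognition sequence is a simple estimate which is not part of the original description: in random sequence with equal base frequencies, a fully specified site of n base pairs occurs on average once every 4^n base pairs. Real genomes deviate from this assumption, for example through depletion of CG dinucleotides, so the figures below are rough orientation values only. The function names and the example fragment lengths are hypothetical.

```python
def expected_fragment_length(site_length):
    """Mean site spacing in base pairs, assuming random sequence
    with equal base frequencies and complete cleavage (an assumption)."""
    return 4 ** site_length


def select_size_fraction(fragment_lengths, min_bp, max_bp):
    """Keep the fragment lengths that fall inside the chosen size window."""
    return [n for n in fragment_lengths if min_bp <= n <= max_bp]


print(expected_fragment_length(4))  # 4 base pair site
print(expected_fragment_length(6))  # 6 base pair site

# Hypothetical fragment lengths, filtered with the window preferred
# for a 4 base pair recognition sequence (100 to 2000 base pairs).
print(select_size_fraction([50, 150, 2500, 7000], 100, 2000))
```

Under these assumptions the mean spacings are 256 and 4096 base pairs, which lie inside the respective windows given above.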
The specific probes of the DNA array may represent any type of genomic DNA, that is, coding sequences such as cDNA or non-coding sequences such as promoters, enhancers, terminators, introns, transposons etc. In the context of methylation studies it is preferred that the specific probe arrays represent non-coding sequences such as regulatory sequences of the genome of an organism.
Experimental data resulting from the hybridization can be analyzed using computer software. Affymetrix, for example, offers a program called GeneChip Microarray Suite. For a comparison between two chips, this program measures the intensity values of the 'baseline chip' and normalizes them to the average signal intensity. Intensity values of the 'experimental chip' are then compared to the baseline chip and a 'difference change' is calculated. The output of the software provides a qualitative call ('increase', 'marginal increase', 'no change', 'decrease' or 'marginal decrease') as well as the discrete numerical metrics used to make the call. Expression elements may be ranked based on absolute level of expression or on relative change in expression in a two-chip comparison. All numerical data may be exported to a Microsoft Excel spreadsheet for further analysis. Expression elements are identified by a GenBank accession number and the GeneChip analysis software allows for immediate hyperlink to the GenBank entry for full sequence annotation.
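The general idea of such a two-chip comparison can be sketched in Python as follows. This is not the algorithm of the GeneChip Microarray Suite, whose internals are not described here; it is a simplified illustration that scales the experimental chip to the mean intensity of the baseline chip and makes a call from a fold change. The function names, the probe identifiers and the fold cut-off are hypothetical, and the marginal calls are omitted.

```python
def scale_to_mean(intensities, target_mean):
    """Scale all intensities of one chip so that their mean equals target_mean."""
    current_mean = sum(intensities.values()) / len(intensities)
    factor = target_mean / current_mean
    return {probe: value * factor for probe, value in intensities.items()}


def compare_chips(baseline, experiment, fold=2.0):
    """Return {probe: (ratio, call)} for experiment versus baseline.

    fold is a hypothetical cut-off; a ratio of at least fold gives
    'increase', a ratio of at most 1/fold gives 'decrease'.
    """
    baseline_mean = sum(baseline.values()) / len(baseline)
    scaled = scale_to_mean(experiment, baseline_mean)
    calls = {}
    for probe, base_value in baseline.items():
        ratio = scaled[probe] / base_value
        if ratio >= fold:
            call = "increase"
        elif ratio <= 1.0 / fold:
            call = "decrease"
        else:
            call = "no change"
        calls[probe] = (ratio, call)
    return calls


# Hypothetical intensities; the probe names are placeholders.
baseline = {"p1": 100.0, "p2": 200.0, "p3": 300.0}
experiment = {"p1": 50.0, "p2": 200.0, "p3": 350.0}
for probe, (ratio, call) in compare_chips(baseline, experiment).items():
    print(probe, round(ratio, 2), call)
```

Scaling to a common mean assumes that most probes do not change between the two samples; where that assumption fails, normalization to the reference and control genes present on the array may be more appropriate.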
Examples
Example 1 - Preparation, labelling and hybridization of genomic target DNA
Genomic DNA of Arabidopsis thaliana mutant som8 plants (O. Mittelsten Scheid et al. (1998), Proc. Natl. Acad. Sci. USA 95, 632-637) and corresponding wild type plants is isolated by standard biochemical procedures ensuring high molecular weight and sufficient purity for enzymatic modifications. The DNA is subjected to endonucleolytic cleavage with the sequence specific endonuclease HpaII, the activity of which is blocked by the presence of 5-methylcytosine in the recognition sequence. As a control, the same DNA sample is digested with the endonuclease MspI. After endonucleolytic cleavage and agarose gel electrophoresis, various size fractions are eluted from the gel and labelled using the Life Technologies DNA Labelling system with some modifications and scaled up to 200 μl:
1. 10 μl of a DNA size fraction eluted from the gel, generally containing between 0.5 and 1.5 μg of DNA and preferably 0.8 μg of DNA, are mixed with 10 μl H2O and 80 μl of a 2.5x Random Primer Solution and placed on ice;
2. The mixture is then boiled for 5 minutes and placed on ice again immediately;
3. Thereafter 20 μl of a 10x dNTP mixture including Biotin-14-dCTP, 72 μl H2O and 8 μl Klenow fragment are added, mixed and briefly (about 10 seconds) centrifuged before incubation at 37°C for 4-6 hours;
4. The reaction is terminated by the addition of 20 μl stop buffer;
5. DNA is precipitated by the addition of 22 μl 3M NaAc and 440 μl ethanol (95-99%) and leaving the DNA on dry ice for 20 minutes or at -20°C overnight;
6. The DNA is then collected by centrifugation at 4°C for 10 minutes at maximum speed of a table top centrifuge and discarding the supernatant;
7. The pellet is then redissolved in 200 μl H2O and reprecipitated with ethanol (steps 5-6);
8. Finally the DNA pellet is dissolved in 100 μl of H2O for use in the hybridization procedure.
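The volumes of the labelling protocol can be checked with a few lines of Python. The sketch takes the DNA and water volumes of step 1 as 10 μl each and the dNTP stock of step 3 as 10x; with these readings the components add up to the stated reaction volume of 200 μl. The variable and function names are chosen for illustration only.

```python
# Volumes in microlitres, as read from steps 1 and 3 of the protocol.
reaction_components = {
    "DNA size fraction": 10,
    "H2O (step 1)": 10,
    "2.5x random primer solution": 80,
    "10x dNTP mixture with biotin-14-dCTP": 20,
    "H2O (step 3)": 72,
    "Klenow fragment": 8,
}
reaction_volume = sum(reaction_components.values())


def final_concentration(stock_x, stock_volume, total_volume):
    """Final concentration (in 'x' units) of a stock diluted into total_volume."""
    return stock_x * stock_volume / total_volume


primer_x = final_concentration(2.5, 80, reaction_volume)
dntp_x = final_concentration(10, 20, reaction_volume)

# Step 4 adds 20 microlitres of stop buffer before the precipitation of step 5.
volume_before_precipitation = reaction_volume + 20
naac_ratio = 22 / volume_before_precipitation      # 3M NaAc relative to sample
ethanol_ratio = 440 / volume_before_precipitation  # ethanol relative to sample

print(reaction_volume, primer_x, dntp_x)
print(round(naac_ratio, 2), round(ethanol_ratio, 2))
```

With these readings both the random primer solution and the dNTP mixture end up at 1x, and the precipitation uses one tenth volume of 3M NaAc and two volumes of ethanol relative to the stopped reaction.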
The commercially available GeneChip® Arabidopsis genome array used in this example contains probe sets interrogating more than 8,200 genes and more than 100 EST clusters for Arabidopsis thaliana. Eighty percent of the genes represented on the array are predicted coding sequences from genomic BAC entries. Twenty percent are from high quality cDNA sequences. The array also contains more than 100 EST clusters, sharing homology with the predicted coding sequences from BAC clones.
The labelled target DNA of example 1 is hybridized to the Arabidopsis Genome Array and further analyzed as described in example 1 of EP-A-999 285 and corresponding US Patent No. 6,203,989.
Example 2 - Methylation patterns in som8 plants as compared to wild type plants
Using gene-by-gene analysis it has been shown previously that the DNA fraction which is preferentially demethylated in som8 Arabidopsis mutants is composed of remnants of transposons and of repetitive DNA. The experimental data derived from the hybridization studies described in example 1, wherein the methylation status of more than 8000 genes is studied in a single experiment, are in agreement with the previous results and additionally provide a direct, unbiased and broader picture of genome-wide DNA methylation changes in som8 plants as compared to wild type plants. Of the 8000 genes studied, 124 can be characterized as being related to transposable elements, and the experimental data confirm that transposable elements are preferentially demethylated in som8 plants as compared to the control wild type plants. The decrease in methylation level of the transposons correlates well with their transcriptional reactivation in many independent examples. However, a subset of transposons, although demethylated, remains transcriptionally inactive. In addition, it is found that selected genes and members of multigene families are also subject to demethylation and transcriptional reactivation in som8 plants, similar to the reactivated transposons. Among this group of genes, those encoding pathogen resistance determinants are the most prominent examples.