OLIGONUCLEOTIDES USEFUL FOR DETECTING AND ANALYZING NUCLEIC ACIDS OF INTEREST
Field of the Invention
The invention relates to nucleic acids and methods for expression profiling of mRNAs, identifying and profiling of particular mRNA splice variants, and detecting mutations, deletions, or duplications of particular exons, e.g., alterations associated with a disease such as cancer, in a nucleic acid sample, e.g., a patient sample. The invention furthermore relates to methods for detecting nucleic acids by fluorescence in situ hybridization.
Background of the Invention
The field of the invention is oligonucleotides (e.g., oligonucleotide arrays) that are useful for detecting nucleic acids of interest and for detecting differences between nucleic acid samples (e.g., samples from a cancer patient and a healthy patient). DNA chip technology utilizes miniaturized arrays of DNA molecules immobilized on solid surfaces for biochemical analyses. The power of DNA microarrays as experimental tools relies on the specific molecular recognition via complementary base-pairing, which makes them highly useful for massive parallel analyses. In the post-genomic era, microarray technology has thus become the method of choice for many hybridization-based assays, such as expression profiling, SNP detection, DNA re-sequencing, and genotyping on a genomic scale.
Expression microarrays are capable of profiling gene expression patterns of tens of thousands of genes in a single experiment. Hence, this technology provides a powerful tool for deciphering complex biological systems, and thereby greatly facilitates research in basic biology and living processes, as well as disease diagnostics, theranostics, and drug development. In a typical cell, the mRNAs are distributed in three frequency classes: (i) superprevalent (10-20% of the total mRNA mass); (ii) intermediate (40-45%); and (iii) low-abundant mRNAs (40-45%). It is therefore of utmost importance that the dynamic range and sensitivity of the expression arrays are optimal, especially when analyzing expression levels of messages or mRNA splice variants belonging to the low-abundant class.
The recent explosion of interest in DNA microarray technology has been sparked by two key innovations. The first was the use of non-porous solid support, such as glass or polymer as opposed to nylon or nitrocellulose filters, which has facilitated miniaturization and fluorescence-based detection. Roughly 20,000 cDNAs can be robotically spotted onto a microscope slide and hybridized with a double-labeled probe. The second was the development of methods for high-density spatial synthesis of oligonucleotides. The two key array technologies are outlined in the following.
Oligonucleotide arrays
An efficient strategy for oligonucleotide microarray manufacturing involves DNA synthesis on solid surfaces using combinatorial chemistry. Most of the current technology has been developed by Affymetrix and Rosetta Inpharmatics. Glass is currently preferred as the synthesis support because of its inert chemical properties and low level of intrinsic fluorescence as well as the ability to chemically derivatize the surface. Of the three approaches currently used to manufacture oligonucleotide arrays, the light-directed deprotection method is the most effective one in generating high density microchips. A single round of synthesis involves light-directed deprotection, followed by nucleotide coupling. Photolithographic masking is used to control the regions of the chip designated for illumination. Affymetrix uses a combination of photolithography and combinatorial chemistry to manufacture its GeneChip Arrays. Using technologies adapted from the semiconductor industry, GeneChip manufacturing begins with a 5-inch square quartz wafer. Initially the quartz is washed to ensure uniform hydroxylation across its surface. The wafer is placed in a bath of silane, which reacts with the hydroxyl groups of the quartz and forms a matrix of covalently linked molecules. The distance between these silane molecules determines the probes' packing density, allowing arrays to hold over 500,000 features within 1.28 square centimeters. The principal disadvantage of this method is that a significant amount of chip design work and cost is associated with the mask design. Once a set of masks has been made, a large number of chips can be produced at a reasonable cost. Oligonucleotide arrays available from Affymetrix are currently in the range of 5-10 fold more expensive than cDNA microarrays.
DNA-DNA hybridization using oligonucleotide chips is clearly different from that of cDNA microarrays. Hybridizations involving oligos are much more sensitive to the GC content of individual heteroduplexes. In addition, single base mismatches have a pronounced effect on the hybridization reassociation of short oligos, and point mutations can thus be readily detected using oligo chips.
cDNA microarrays
cDNA microarrays containing large DNA segments such as cDNAs are generated by physically depositing small amounts of each DNA of interest onto known locations on glass surfaces. Two technologies for printing microarrays are (1) mechanical microspotting, and (2) ink-jetting. Mechanical microspotting has been extensively used at, e.g., Stanford University, and it utilizes pins or capillaries to deposit small quantities of DNA onto known addresses using motion control systems. Recent advances in microspotting technology using modern arraying robots allow for the preparation of 100 microarrays containing over 10,000 features in less than 12 hours. A DNA arrayer is relatively easy to set up, and the cost is usually low compared to on-chip oligoarrayers. cDNA microarrays are capable of profiling gene expression patterns of tens of thousands of genes in a single experiment. To compare the relative abundance of the arrayed gene sequences in two DNA or RNA samples, e.g., the total mRNA isolated from two different cell populations, the two samples are first labeled using two different fluorescent dyes such as Cy-3 and Cy-5. The labeled samples are mixed and hybridized to the clones on the array slide. After the hybridization, laser excitation of the incorporated, fluorescent target molecules yields an emission with a characteristic spectrum, which is measured with a confocal laser scanner. The monochrome images from the scanner are imported to the software in which the images are pseudo-colored and merged. Data from a single hybridization are viewed as a normalized ratio in which significant deviations from a ratio of 1 are indicative of either increased or decreased expression levels relative to the reference sample. Data from multiple experiments can be examined using any number of data mining tools.
Current status of array technology
It has now become clear that cDNA microarrays, originally developed by Pat Brown and co-workers at Stanford University, are sensitive, but may not be sufficiently specific with respect to, e.g., discrimination of homologous transcripts in gene families and alternatively spliced isoforms. On the other hand, the Affymetrix GeneChip system is specific, but may not be sensitive enough. This lack of sensitivity may explain why Affymetrix uses 16 x 25-mer perfect match capture probes together with 16 x 25-mer mismatch probes per transcript in its expression profiling chips, resulting in enormous data sets in genome-wide arrays. Therefore, the functional genomics field is in the process of switching, as they run out of samples, from existing PCR-amplified cDNA fragment libraries for microarraying to custom longmer oligonucleotide arrays comprising transcript-specific oligonucleotide capture probes typically in the range of 30-mers to 80-mers, thus addressing both specificity and sensitivity.
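The two-color comparison described earlier for cDNA microarrays (Cy-3 and Cy-5 labeled samples hybridized to the same slide) can be illustrated with a short sketch. The text states only that the data are viewed as a normalized ratio; the median-centering of log2 ratios used here is an assumed normalization chosen for illustration, and the function name and intensity values are hypothetical.

```python
import math
from statistics import median

def normalized_log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per feature, centred on the array median.

    Median-centring is an assumed normalization (not specified in the text):
    it places a typical, unchanged feature at 0, so that positive values
    suggest increased and negative values decreased signal relative to the
    Cy3-labelled reference sample.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must have the same number of features")
    raw = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    centre = median(raw)
    return [value - centre for value in raw]

# Hypothetical background-corrected intensities for three features.
ratios = normalized_log_ratios([200.0, 100.0, 400.0], [100.0, 100.0, 100.0])
print(ratios)
```

Working in log2 space makes a two-fold increase and a two-fold decrease symmetric around zero, which is why the ratio is usually log-transformed before thresholds are applied.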
Alternative splicing
As the field of genomics research is shifting from the acquisition of genome sequences to high-throughput functional genomics, there is an increasing need to understand the dynamics within the genetic regulation as well as RNA and protein sequences in order to elucidate gene expression in all its complexity. A common feature for eukaryotic genes is that they are composed of protein-encoding exons and introns. Introns (intra-genic-regions) are non-coding DNA which interrupt the exons. Introns are characterized by being excised from the pre-mRNA molecule in RNA splicing, as the sequences on each side of the intron are spliced together. RNA splicing not only provides functional mRNA, but is also responsible for generating additional diversity. This phenomenon is called alternative splicing, which results in the production of different mRNAs from the same gene. The mRNAs that represent isoforms arising from a single gene can differ by the use of alternative exons or retention of an intron that disrupts two exons. This process often leads to different protein products that may have related or drastically different, even antagonistic, cellular functions. There is increasing evidence indicating that alternative splicing is very widespread (Croft et al. Nature Genetics, 2000). Recent studies have revealed that at least 60% of the roughly 35,000 genes in the human genome are alternatively spliced. Clearly, by combining different types of modifications and thus generating different possible combinations of transcripts of different genes, alternative splicing is a potent mechanism for generating protein diversity. Analysis of the spliceome, in turn, represents a novel approach to both functional genomics and pharmacogenomics.
Antisense transcription in eukaryotes
RNA-mediated gene regulation is widespread in higher eukaryotes and complex genetic phenomena like RNA interference, co-suppression, transgene silencing, imprinting, methylation, and possibly position-effect variegation and transvection, all involve intersecting pathways based on or connected to RNA signalling (Mattick 2001; EMBO reports 2, 11: 986-991). Recent studies indicate that antisense transcription is a very common phenomenon in the mouse and human genomes (Okazaki et al. 2002; Nature 420: 563-573; Yelin et al. 2003, Nature Biotechnol.). Thus, antisense modulation of gene expression in, e.g., human cells may be a common regulatory mechanism. In light of this, the present invention provides novel tools, in which non-naturally occurring nucleic acids, such as LNA oligonucleotides, can be designed to silence or modulate the regulation of a given mRNA by non-coding antisense RNA, by designing a complementary sense LNA oligonucleotide for the regulatory antisense RNA. This has a high potential in target identification, target validation and therapeutic use of LNA oligonucleotides as modulating and silencing sense nucleic acid agents.
Misplaced control of alternative splicing can cause disease
The detection of the detailed structure of all transcripts is an important goal for molecular characterization of a cell or tissue. Without the ability to detect and quantify the splice variants present in one tissue, the transcript content or the protein content cannot be described accurately. Molecular medical research shows that many cancers result in altered levels of splice variants, so an accurate method to detect and quantify these transcripts is required. Mutations that produce an aberrant splice form can also be the primary cause of severe diseases such as spinal muscular dystrophy and cystic fibrosis.
Much of the study of human disease, indeed much of genetics, is based upon the study of a few model organisms. The evolutionary stability of alternative splicing patterns and the degree to which splicing changes according to mutations and environmental and cellular conditions influence the relevance of these model systems. At present, there is little understanding of the rates at which alternative splicing patterns change, and the factors influencing these rates. Table 1 shows a set of genes that are known to be alternatively spliced and that are orthologs of known human disease genes.
Table 1. C. elegans disease orthologs that are known to be differentially spliced in C. elegans.
Disease                            C. elegans gene     BLAST E value
ABL1                               M79.1A              1.00E-162
X-Linked Lymphoprol.-SH2D1A        M79.1A              2.00E-58
Cyclin Dep. Kinase 4-CDK4          F18H3.5A            1.00E-124
HNPCC*-PMS2                        H12C20.2A           1.00E-123
Neurofibromatosis 2-NF2            C01G8.5A            5.00E-163
Duchenne MD+-DMD                   F32B4.3A            0.00E+00
Coffin-Lowry-RPS6KA3               T01H8.1A            2.00E-13
Septooptic Dysplasia-HESX1         Y113G7A.6A          1.00E-152
Non-Insulin Dep. Diabet.-PCSK1     F11A6.1A            1.00E-166
Bartter's-SLC12A1                  Y37A1C.1A           1.00E-167
Gitelmans-SLC12A3                  Y37A1C.1A           0.00E+00
Hered. Spherocytosis-ANK1          B0350.2A            1.00E-09
Darier-White-SERCA                 K11D9.2A            0.00E+00
Spondyloepip. Dysp.-COL2A1         F01G12.5A/let-2     9.00E-20
Previously, other microarray analyses have been performed with the aim of detecting either splicing of RNA transcripts per se in yeast, or of detecting putative exon skipping splicing events in rat tissues, but neither of these approaches had sufficient resolution to estimate quantities of splice variants, a factor that could be essential to an understanding of the changes in cell life cycle and disease.
Thus, improved methods are needed for nucleic acid amplification, hybridization, and classification. Desirable methods can distinguish between mRNA splice variants and quantitate the amount of each variant in a sample. Other desirable methods can detect differences in expression patterns between patient nucleic acid samples and nucleic acid standards.
Summary of the Invention
The present invention demonstrates the usefulness of LNA-modified oligonucleotides in the construction of highly specific and sensitive microarrays for expression profiling (e.g., mRNA splice variant detection) and comparative genomic hybridization. The invention provides novel technology platforms based on nucleic acids with LNA or other high affinity nucleotides for sensitive and specific assessment of alternative splicing using microarray technology. As opposed to high-density cDNA or DNA oligonucleotide microarrays, LNA microarrays are able to discriminate between highly homologous as well as differentially spliced transcripts. The invention furthermore provides methods for highly sensitive and specific nucleic acid detection by fluorescence in situ hybridization using LNA-modified oligonucleotides. The present methods greatly facilitate the analysis of gene expression patterns from a particular species, tissue, or cell type. The analysis of the human spliceome provides important information for pharmacogenetics. Thus, the present methods are highly valuable in medical research and diagnostics as well as in drug development and toxicological studies.
In general, the invention features populations of high affinity nucleic acids that have duplex stabilizing properties and thus are useful for a variety of nucleic acid detection, amplification, and hybridization methods (e.g., expression or mRNA splice variant profiling). Some of these oligonucleotides contain novel nucleotides created by combining specialized synthetic nucleobases with an LNA backbone, thus creating high affinity oligonucleotides with specialized properties such as reduced sequence discrimination for the complementary strand or reduced ability to form intramolecular double stranded structures. The invention also provides improved methods for identifying nucleic acids in a sample and for classifying a nucleic acid sample by comparing its pattern of hybridization to an array to the corresponding pattern of hybridization of one or more standards to the array (e.g., comparative genomic hybridization).
Other desirable modified bases have decreased ability to self-anneal or to form duplexes with oligonucleotides containing one or more modified bases. The invention also provides arrays of nucleic acids containing these modified bases that have a decreased variance in melting temperature and/or an increased capture efficiency compared to naturally-occurring nucleic acids. These arrays as well as the oligonucleotides in solution can be used in a variety of applications for the detection, characterization, identification, and/or amplification of one or more target nucleic acids. These oligonucleotides and oligonucleotides of the invention in general can also be used for solution assays, such as homogeneous assays.
Merged Probes
In one aspect, the invention features a non-naturally-occurring nucleic acid with a melting temperature that is at least 3, 5, 8, 10, 12, 15, 20, 25, 30, 35, or 40°C higher than that of the corresponding control nucleic acid with 2'-deoxynucleotides. The nucleic acid is capable of hybridizing to a first region within a first exon of a target nucleic acid and to a second region within a second exon of the target nucleic acid that is adjacent to the first exon. In a related aspect, the invention provides a non-naturally-occurring nucleic acid with a melting temperature that is at least 3, 5, 8, 10, 12, 15, 20, 25, 30, 35, or 40°C higher than that of the corresponding control nucleic acid with 2'-deoxynucleotides. The nucleic acid hybridizes to a first region within an exon of a target nucleic acid and to a second region within an intron of the target nucleic acid that is adjacent to the exon.
In another aspect, the invention features a non-naturally-occurring nucleic acid with a melting temperature that is at least 3, 5, 8, 10, 12, 15, 20, 25, 30, 35, or 40°C higher than that of the corresponding control nucleic acid with 2'-deoxynucleotides. The nucleic acid hybridizes to a first region within a first intron of a target nucleic acid and to a second region within a second intron of the target nucleic acid that is adjacent to the first intron.
In yet another aspect, the invention provides a nucleic acid that is a non-naturally-occurring nucleic acid with a capture efficiency that is at least 10, 25, 50, 100, 150, 200, 500, 800, 1000, or 1200% greater than that of a corresponding control nucleic acid with 2'-deoxynucleotides at the temperature equal to the melting temperature of the nucleic acid. The nucleic acid hybridizes to a first region within a first exon of a target nucleic acid and to a second region within a second exon of the target nucleic acid that is adjacent to the first exon.
In a related aspect, the invention features a nucleic acid that is a non-naturally-occurring nucleic acid with a capture efficiency that is at least 10, 25, 50, 100, 150, 200, 500, 800, 1000, or 1200% greater than that of a corresponding control nucleic acid with 2'-deoxynucleotides at the temperature equal to the melting temperature of the nucleic acid. The nucleic acid hybridizes to a first region within an exon of a target nucleic acid and to a second region within an intron of the target nucleic acid that is adjacent to the exon.
In yet another aspect, the invention provides a nucleic acid that is a non-naturally-occurring nucleic acid with a capture efficiency that is at least 10, 25, 50, 100, 150, 200, 500, 800, 1000, or 1200% greater than that of a corresponding control nucleic acid with 2'-deoxynucleotides at the temperature equal to the melting temperature of the nucleic acid. The nucleic acid hybridizes to a first region within a first intron of a target nucleic acid and to a second region within a second intron of the target nucleic acid that is adjacent to the first intron.
In desirable embodiments, the nucleic acids of the invention featuring a non-naturally occurring nucleic acid exhibit increased duplex stability due to slower rates of dissociation of the nucleic acid complexes (the off-rate) (Christensen et al. 2001, Biochem. J. 354: 481-484). In one aspect of the invention, the structures of desirable adenosine, thymine, guanine and cytosine analogs are those disclosed in PCT Publication No. WO 97/12896, Formula 5, 6, 7, 8, 9, 10, 11, 12 and 13. These modified bases may be incorporated as part of an LNA, DNA, or RNA unit and used in any of the oligomers of the invention.
In still another aspect, the invention features a nucleic acid that is an LNA (i.e., a nucleic acid with one or more LNA units) and that hybridizes to a first region within a first exon of a target nucleic acid and to a second region within a second exon of the target nucleic acid that is adjacent to the first exon.
In another aspect, the invention features a nucleic acid that is an LNA and that hybridizes to a first region within an exon of a target nucleic acid and to a second region within an intron of the target nucleic acid that is adjacent to the exon.
In one aspect, the invention provides a nucleic acid that is an LNA and that hybridizes to a first region within a first intron of a target nucleic acid and to a second region within a second intron of the target nucleic acid that is adjacent to the first intron.
In desirable embodiments of any of the above aspects, the length of the segment of the nucleic acid hybridizing to the first region and the length of the segment of the nucleic acid hybridizing to the second region are between 3 and 50 nucleotides, 10 and 40 nucleotides, or 20 and 30 nucleotides, inclusive. The length of the segment of the nucleic acid hybridizing to the first region and the length of the segment of the nucleic acid hybridizing to the second region may be the same length or different lengths. Desirably, the nucleic acid containing LNA units are symmetrically spaced on both sides of a junction between either two exons, an exon and an intron, or two introns, or alternatively, the nucleic acid containing LNA units are spaced on both sides of a junction based on equalized duplex melting temperatures of the segments. Desirably, the nucleic acid has one or more LNA units within 5, 4, 3, 2, or 1 nucleotides of a junction between either two exons, an exon and an intron, or two introns.
In another aspect, the invention features a population of nucleic acids that includes one or more nucleic acids of any one of the above aspects.
Internal Probes
In another aspect, the invention features a non-naturally-occurring nucleic acid with a melting temperature that is at least 3, 5, 8, 10, 12, 15, 20, 25, 30, 35, or 40°C higher than that of the corresponding control nucleic acid with 2'-deoxynucleotides. The nucleic acid hybridizes to only one exon or to only one intron of a target nucleic acid.
In a related aspect, the invention features a non-naturally-occurring nucleic acid with a capture efficiency that is at least 10, 25, 50, 100, 150, 200, 500, 800, 1000, or 1200% greater than that of a corresponding control nucleic acid with 2'-deoxynucleotides at the temperature equal to the melting temperature of the nucleic acid. The nucleic acid hybridizes to only one exon or to only one intron of a target nucleic acid.
In another aspect, the invention features a nucleic acid that is an LNA and that hybridizes to only one exon or to only one intron of a target nucleic acid.
In desirable embodiments of the above aspects for nucleic acids that hybridize to only one exon or only one intron, the nucleic acid does not hybridize to both an exon and an intron.
In another aspect, the invention features a population of nucleic acids that includes one or more nucleic acids of any one of the above aspects.
Pharmaceutical Compositions and Nucleic Acid Populations
In another aspect, the invention features a pharmaceutical composition that includes one or more of the nucleic acids of the invention and a pharmaceutically acceptable carrier, such as one of the carriers described herein.
In another aspect, the invention features a population of two or more nucleic acids of the invention. The populations of nucleic acids of the invention may contain any number of unique molecules. For example, the population may contain as few as 10, 10^2, 10^4, or 10^5 unique molecules or as many as 10 , 10 , 10 or more unique molecules. In desirable embodiments, at least 1, 5, 10, 50, 100 or more of the polynucleotide sequences are a non-naturally-occurring sequence. Desirably, at least 20, 40, or 60% of the unique polynucleotide sequences are non-naturally-occurring sequences. Desirably, the nucleic acids are all the same length; however, some of the molecules may differ in length.
Desirable Embodiments of Any of the Above Aspects
In desirable embodiments of any of the above aspects, the length of one or more nucleic acids (e.g., nucleic acids in a nucleic acid population of the invention) is between 15 and 150 nucleotides, 5 and 100 nucleotides, 20 and 80 nucleotides, or 30 and 60 nucleotides in length, inclusive. In particular embodiments, the nucleic acid is 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50 nucleotides or at least 60, 70, 80, 90, 100, 120, or 130 nucleotides in length. In additional embodiments, the nucleic acid is between 8 and 40 nucleotides, such as between 9 and 30, or 12 and 25, or 15 and 20 nucleotides. Desirably, at least 5, 10, 15, 20, 30, 40, 50, 60, or 70% of the nucleotides in the nucleic acid are LNA units. In desirable embodiments, every second nucleotide, every third nucleotide, every fourth nucleotide, every fifth nucleotide, or every sixth nucleotide in the nucleic acid is an LNA unit. In various embodiments, (i) every second and every third nucleotide, (ii) every second and every fourth nucleotide, (iii) every second and every fifth nucleotide, (iv) every second and every sixth nucleotide, (v) every third and every fourth nucleotide, (vi) every third and every fifth nucleotide, (vii) every third and every sixth nucleotide, (viii) every fourth and every fifth nucleotide, (ix) every fourth and every sixth nucleotide, and/or (x) every fifth and every sixth nucleotide in the nucleic acid is an LNA unit. Desirably, every second, every third, and every fourth nucleotide in the nucleic acid is an LNA unit. In desirable embodiments, the nucleic acids of the invention have one or more of the following substitution patterns, which are repeated throughout the nucleic acids: XxXx, xXxX, XxxXxx, xXxxXx, xxXxxX, XxxxXxxx, xXxxxXxx, xxXxxxXx, or xxxXxxxX in which "X" denotes an LNA unit and "x" denotes a DNA or RNA unit. In some embodiments, the nucleotides that are not LNA units are naturally-occurring DNA or RNA nucleotides. In various embodiments, the nucleic acid comprises two or more contiguous LNA units. Desirably, the nucleic acid comprises at least 2, 3, 4, 5, 6, 7, or 8 contiguous LNA units. In desirable embodiments, the number of contiguous LNA units is between 5 and 20% or 10 and 15% of the total length of the nucleic acid. In a particular embodiment, 5 contiguous nucleotides of a 50-mer merged probe are LNA units. In one embodiment, the nucleic acid does not have greatly extended stretches of modified DNA or RNA residues, e.g., greater than about 4, 5, 6, 7, or 8 consecutive modified DNA or RNA residues. According to this embodiment, one or more non-modified DNA or RNA units are present after a consecutive stretch of about 3, 4, 5, 6, 7, or 8 modified nucleic acids.
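The repeated substitution patterns listed above can be applied to a sequence programmatically. The sketch below is illustrative only: it reuses the "X"/"x" notation of the text, writes LNA units in upper case and DNA or RNA units in lower case (an assumed display convention), and uses hypothetical function names and an arbitrary example sequence. It also reports the fraction of LNA units and the longest contiguous LNA stretch, the two quantities for which the text gives desirable ranges.

```python
from itertools import cycle

def apply_pattern(sequence, pattern):
    """Repeat an LNA substitution pattern along a sequence.

    pattern uses "X" for an LNA unit and "x" for a DNA or RNA unit, e.g.
    "Xxx" marks every third nucleotide starting at the first position.
    LNA positions are returned in upper case, all others in lower case.
    """
    if not pattern or set(pattern) - {"X", "x"}:
        raise ValueError("pattern may contain only 'X' and 'x'")
    return "".join(
        base.upper() if mark == "X" else base.lower()
        for base, mark in zip(sequence, cycle(pattern))
    )

def lna_fraction(marked):
    """Fraction of positions that are LNA units (upper case)."""
    return sum(c.isupper() for c in marked) / len(marked)

def longest_lna_run(marked):
    """Length of the longest stretch of contiguous LNA units."""
    best = run = 0
    for c in marked:
        run = run + 1 if c.isupper() else 0
        best = max(best, run)
    return best

probe = apply_pattern("acgtacgtac", "Xxx")
print(probe, lna_fraction(probe), longest_lna_run(probe))
```

Checking longest_lna_run against a chosen limit is one way to screen out candidates with extended stretches of modified residues.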
Other desirable nucleic acids have an LNA substitution pattern that results in the formation of negligible secondary structure by the nucleic acid with itself. In one such embodiment, the nucleic acids do not form hairpins or do not form other secondary structures that would otherwise inhibit or prevent their binding to a target nucleic acid. Desirably, opposing nucleotides in a palindrome pair or opposing nucleotides in inverted repeats or in reverse complements are not both LNA units. In various embodiments, the nucleic acids in the first population form less than 3, 2, or 1 intramolecular base-pairs or base-pairs between two identical molecules. In desirable embodiments, the nucleic acid does not have LNA-5-nitroindole:LNA-5-nitroindole intramolecular base-pairs.
In other desirable embodiments, at least one LNA unit (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 LNA units) in the nucleic acid hybridizes to a first region within a first exon of a target nucleic acid and at least one LNA unit (e.g., at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 LNA units) in the nucleic acid hybridizes to a second region within a second exon of the target nucleic acid that is adjacent to the first exon. The number of LNA units that bind to each region can be the same or different. In some embodiments, the 5' terminal nucleotide of the nucleic acid is or is not an LNA unit. Desirably, the 3' terminal nucleotide of the nucleic acid is not an LNA unit (e.g., the nucleic acid may contain a 3' terminal naturally-occurring nucleotide).
Desirably, the nucleic acid can distinguish between different nucleic acids (e.g., mRNA splice variants) that cannot be distinguished using a naturally-occurring control nucleic acid (e.g., a control nucleic acid that consists of only 2'-deoxynucleotides such as a control nucleic acid of the same length as the nucleic acid of the invention). Desirably, the hybridization intensity of the nucleic acid for an exon of interest is at least 2, 3, 4, 5, 6, or 10 fold greater than the hybridization intensity of the nucleic acid for another exon in the same target nucleic acid (e.g., mRNA) or in another nucleic acid. Desirably, the hybridization intensity of the nucleic acid for a target nucleic acid is at least 2, 3, 4, 5, 6, or 10 fold greater than the hybridization intensity for a non-target nucleic acid with less than 99, 95, 90, 80, 70, or 60% sequence identity to the target nucleic acid.
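The discrimination criteria above combine two quantities: a fold difference in hybridization intensity and a percent sequence identity between target and non-target. A minimal sketch of both calculations follows. The function names and example values are hypothetical, and percent identity is computed here over an ungapped, equal-length comparison, which is a simplifying assumption rather than the alignment procedure of the invention.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)

def fold_discrimination(target_intensity, non_target_intensity):
    """Ratio of hybridization intensity for target over non-target."""
    if non_target_intensity <= 0:
        raise ValueError("non-target intensity must be positive")
    return target_intensity / non_target_intensity

# Hypothetical values: a non-target differing at one position in ten.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
fold = fold_discrimination(1200.0, 300.0)
print(identity, fold)
```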
Desirably, all of the nucleic acids of the population or all of the nucleic acids of a subpopulation of the population are the same length. In some embodiments, the population includes one or more nucleic acids of a different length. In some embodiments, longer nucleic acids contain one or more nucleotides with universal bases. For example, nucleotides with universal bases can be used to increase the thermal stability of nucleic acids that would otherwise have a thermal stability lower than some or all of the nucleic acids in the population. In some embodiments, one or more nucleic acids have a universal base located at the 5' or 3' terminus of the nucleic acid. In desirable embodiments, one or more (e.g., 2, 3, 4, 5, 6, or more) universal bases are located at the 5' and 3' termini of the nucleic acid. Desirably, all of the nucleic acids in the population have the same number of universal bases. Desirable universal bases include inosine, pyrene, 3-nitropyrrole, and 5-nitroindole. In desirable embodiments, the nucleic acid has at least one LNA A or LNA T. In some embodiments, each nucleic acid has at least one LNA A or LNA T. Desirably, all of the adenine and thymine-containing nucleotides in the LNA are LNA A and LNA T, respectively. In some embodiments, a nucleic acid with an increased capture efficiency or melting temperature compared to a control nucleic acid has at least one LNA T or LNA C. In some embodiments, all of the thymine and cytosine-containing nucleotides in the LNA are LNA T and LNA C, respectively. In some embodiments, a nucleic acid with an increased specificity or decreased self-complementarity compared to a control nucleic acid has at least one LNA A or LNA C. In some embodiments, all of the adenine and cytosine-containing nucleotides in the LNA are LNA A and LNA C, respectively. Desirably, at least 10, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100% of the nucleic acids in the population have one or more LNA units.
In desirable embodiments, the LNA has at least one 2,6-diaminopurine, 2-aminopurine, 2-thio-thymine, 2-thio-uracil, inosine, or hypoxanthine base. Desirably, the LNA has a nucleotide with a 2'-O,4'-C-methylene linkage between the 2' and 4' position of a sugar moiety. In some embodiments, one or more nucleic acids in the first population are LNA/DNA, LNA/RNA, or LNA/DNA/RNA chimeras.
In desirable embodiments of any of the above aspects, the variance in the melting temperature of the population is at least 10, 20, 30, 40, 50, 60, or 70% less than the variance in the melting temperature of the corresponding control population of nucleic acids of the same length with 2'-deoxynucleotides (e.g., DNA nucleotides) instead of LNA units or other modified or non-naturally-occurring units. In desirable embodiments, the standard deviation in melting temperature is less than 10, 9.5, 9, 8.5, 8, 7.5, 7, 6.5, or 6. In certain embodiments, the range in melting temperatures for nucleic acids in the population is less than 70, 60, 50, 40, 30, or 20°C. Desirably, the variance in the melting temperature of the population is less than 59, 50, 40, 30, 25, 20, 15, 10, or 5.
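The melting temperature statistics named above (variance, standard deviation, range, and percent reduction in variance relative to a control population) can be computed as in the following sketch. The text does not say whether a population or sample variance is intended; the population form (division by N) is assumed here, and the function names and Tm values are hypothetical.

```python
from statistics import pstdev, pvariance

def tm_spread(tms):
    """Summarise the spread of melting temperatures (in °C) of a population.

    Population variance and standard deviation (dividing by N) are an
    assumption; the sample forms would divide by N - 1 instead.
    """
    return {
        "variance": pvariance(tms),
        "stdev": pstdev(tms),
        "range": max(tms) - min(tms),
    }

def variance_reduction_percent(modified_tms, control_tms):
    """Percent by which the Tm variance of the modified population is lower
    than that of the corresponding control population."""
    return 100.0 * (1.0 - pvariance(modified_tms) / pvariance(control_tms))

# Hypothetical Tm values for an LNA-containing population and a DNA control.
lna_tms = [68.0, 70.0, 72.0]
dna_tms = [60.0, 70.0, 80.0]
print(tm_spread(lna_tms))
print(variance_reduction_percent(lna_tms, dna_tms))
```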
In still other embodiments, the nucleic acids are covalently bonded to a solid support. Desirably, the nucleic acids are in a predefined arrangement. In various embodiments, the first population has at least 10; 100; 1,000; 5,000; 10,000; 100,000; or 1,000,000 different nucleic acids. Desirably, the nucleic acids in the population together hybridize to at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the exons of a target nucleic acid. In desirable embodiments, the population includes nucleic acids that together hybridize to at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the nucleic acids expressed by a particular cell or tissue. In some embodiments, the population includes nucleic acids that together hybridize to at least one exon from at least 1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100% of the nucleic acid sequences expressed by a particular cell or tissue at a given point in time (e.g., an expression array with sequences corresponding to the sequences of mRNA molecules expressed by a particular cell type or a cell under a particular set of conditions). In some embodiments, the plurality of nucleic acids are used as PCR primers or FISH probes.
Desirable modified bases of the present invention when incorporated into the central position of a 9-mer oligonucleotide (all other eight residues or units being natural DNA or RNA units with natural bases) exhibit a Tm difference equal to or less than about 15, 12, 10,
9, 8, 7, 6, 5, 4, 3 or 2°C upon hybridizing to the four complementary oligonucleotide variants that are identical except for the unit corresponding to the LNA unit, where each variant has one of the natural bases uracil, cytosine, thymine, adenine or guanine. That is, the highest and the lowest Tm (referred to herein as the Tm differential) obtained with such four complementary sequences is 15, 12, 10, 9, 8, 7, 6, 5, 4, 3 or 2°C or less.
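As a non-limiting illustration, the Tm differential defined above (the difference between the highest and the lowest Tm obtained with the complementary variants) may be computed as in the following Python sketch. The function names, the threshold parameter, and the example Tm values are hypothetical and are not measurements from this disclosure.

```python
def tm_differential(tm_by_base):
    """Return the highest Tm minus the lowest Tm (degrees C) across the
    complementary variants, keyed by the base opposite the modified unit."""
    values = list(tm_by_base.values())
    if not values:
        raise ValueError("at least one Tm value is required")
    return max(values) - min(values)


def meets_differential(tm_by_base, threshold_c):
    """True if the Tm differential is equal to or less than the threshold."""
    return tm_differential(tm_by_base) <= threshold_c


# Hypothetical Tm values for a 9-mer with a modified base at the
# central position, hybridized to four complementary variants.
example_tms = {"A": 30.0, "C": 28.5, "G": 31.0, "T": 29.0}

differential = tm_differential(example_tms)
```

For these made-up values the differential is 2.5°C, which would satisfy a 3°C threshold but not a 2°C threshold.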
Modified nucleic acid oligomers of the invention desirably contain at least one LNA unit, such as an LNA unit with a modified nucleobase. Modified nucleobases or nucleosidic bases desirably base-pair with adenine, guanine, cytosine, uracil, or thymine. Exemplary oligomers contain 2 to 100, 5 to 100, 4 to 50, 5 to 50, 5 to 30, or 8 to 15 nucleic acid units. In some embodiments, one or more LNA units with natural nucleobases are incorporated into the oligonucleotide at a distance from the LNA unit having a modified base of 1 to 6 (e.g., 1 to 4) bases. In certain embodiments, at least two LNA units with natural nucleobases are flanking an LNA unit having a modified base. Desirably, at least two LNA units independently are positioned at a distance from the LNA unit having the modified base of 1 to 6 (e.g., 1 to 4 bases).
Desirable modified nucleobases or nucleosidic bases for use in nucleic acid compositions of the invention include optionally substituted carbon alicyclic or carbocyclic aryl groups (i.e., only carbon ring members), particularly multi-ring carbocyclic aryl groups such as groups having 2, 3, 4, 5, 6, 7, or 8 linked, particularly fused carbocyclic aryl moieties. Optionally substituted pyrene is also desirable. Such nucleobases or nucleosidic bases can provide significant performance results, as demonstrated in the examples which follow. Heteroalicyclic and heteroaromatic nucleobases or nucleosidic bases also are suitable. In some embodiments, the carbocyclic moiety is linked to the 1'-position of the LNA unit through a linker (e.g., a branched or straight alkylene or alkenylene). Desirable LNA units have a carbon or hetero alicyclic ring with four to six ring members, e.g. a furanose ring, or other alicyclic ring structures such as a cyclopentyl, cycloheptyl, tetrahydropyranyl, oxepanyl, tetrahydrothiophenyl, pyrrolidinyl, thianyl, thiepanyl, piperidinyl, and the like. In one aspect, at least one ring atom of the carbon or hetero alicyclic group is taken to form a further cyclic linkage to thereby provide a multicyclic group. The cyclic linkage may include one or more, typically two atoms, of the carbon or hetero alicyclic group. The cyclic linkage also may include one or more atoms that are substituents, but not ring members, of the carbon or hetero alicyclic group. Other desirable LNA units are compounds having a substituent on the 2'-position of the central sugar moiety
(e.g., ribose or xylose), or derivatives thereof, which favors the C3'-endo conformation, commonly referred to as the North (or simply N for short) conformation. These LNA units include ENA (2'-O,4'-C-ethylene-bridged nucleic acids such as those disclosed in WO 00/47599) units as well as non-bridged riboses such as 2'-F or 2'-O-methyl.
For any of the above aspects, an exemplary control nucleic acid has β-D-2-deoxyribose instead of one or more bicyclic or sugar groups of an LNA unit or other modified or non-naturally-occurring units in a nucleic acid of the first population. In some embodiments, the nucleic acid or population of the invention and the control nucleic acid or population only have naturally-occurring nucleobases. If a nucleic acid in the nucleic acid or population of the invention has one or more non-naturally-occurring nucleobases, the capture efficiency of the corresponding control nucleic acid is calculated as the average capture efficiency for all of the nucleic acids that have either A, T, C, G or mC (methylcytosine) in each position corresponding to a non-naturally-occurring nucleobase in the nucleic acid in the first population.
Complex of Target Nucleic Acids and Nucleic Acid Probes
In one aspect, the invention features a complex of one or more target nucleic acids and nucleic acids of the invention (e.g., nucleic acid probes) in which one or more target nucleic acids are hybridized to a plurality of nucleic acids of the invention. Desirably, at least 2, 3, 4, 5, 6, 7, 10, 15, 20, 30, or 40 different target nucleic acids are hybridized. In some embodiments, the target nucleic acids are cDNA molecules reverse transcribed from a patient sample or cRNA molecules amplified from a patient sample using a T7 RNA polymerase-based linear amplification system or the like. The target nucleic acids are labeled prior to hybridization to the nucleic acids of the invention.
Methods for Detecting or Amplifying Target Nucleic Acids
In one aspect, the invention features a method for detecting the presence of one or more target nucleic acids in a sample. This method involves incubating a nucleic acid sample with one or more nucleic acids of the invention under conditions that allow at least one target nucleic acid to hybridize to at least one of the nucleic acids of the invention. Desirably, hybridization is detected for at least 2, 3, 4, 5, 6, 8, 10, or 12 target nucleic acids. In some embodiments, the method further includes contacting the target nucleic acid with a second nucleic acid or a population of second nucleic acids that binds to a different region of the target molecule than the first nucleic acid. Desirably, the method further involves identifying one or more hybridized target nucleic acids and/or determining the amount of one or more
hybridized target nucleic acids. In desirable embodiments, the method further includes determining the presence or absence of an mRNA splice variant of interest in the sample and/or determining the presence or absence of a mutation, deletion, and/or duplication of an exon of interest. In some embodiments, the mutation, deletion, and/or duplication is indicative of a disease, disorder, or condition, such as cancer.
In desirable embodiments of any of the above detection methods, at least 5, 10, 15, 20, 30, 40, 50, 80, 100, 150, 200, or more target nucleic acids hybridize to the nucleic acids of the invention. Desirably, the method is repeated under one or more different incubation conditions. In particular embodiments, the method is repeated at 1, 3, 5, 8, 10, 15, 20, 30, 40 or more different temperatures, cation concentrations (e.g., concentrations of monovalent cations such as Na+ and K+ or divalent cations such as Mg2+ and Ca2+), or denaturants (e.g., hydrogen bond donors or acceptors that interfere with the hydrogen bonds keeping the base-pairs together such as formamide or urea). Desirably, the method also includes identifying the target nucleic acid hybridized to the nucleic acids of the invention and/or determining the amount of the target nucleic acid hybridized to the nucleic acids of the invention. In particular embodiments, the target nucleic acids are labeled with a fluorescent group. In certain embodiments, the labeling is repeated using different fluorescent groups (e.g., labeling for so-called dye-swap labeling experiments). In desirable embodiments, the determination of the amount of bound target nucleic acid involves one or more of the following: (i) adjusting for the varying intensity of the excitation light source used for detection of the hybridization, (ii) adjusting for photobleaching of the fluorescent group, and/or (iii) comparing the fluorescent intensity of the target nucleic acid(s) hybridized to the nucleic acids of the invention to the fluorescent intensity of a different sample of nucleic acids hybridized to the nucleic acids of the invention (e.g., a different sample hybridized to the same population of nucleic acids of the invention on the same or a different solid support such as the same chip or a different chip).
Desirably, this comparison in fluorescent intensity involves adjusting for a difference in the amount of the nucleic acids of the invention used for hybridization to each sample and/or adjusting for a difference in the buffer (e.g., a difference in Mg2+ concentration) used for hybridization to each sample or scaling for different labeling efficiencies with different fluorochromes.
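As a hedged illustration of the adjustments described above, the following Python sketch scales a raw fluorescence reading for the amount of probe used, the labeling efficiency of the fluorochrome, and photobleaching, and then compares two samples as a ratio. The simple proportional model, the function names, and all numerical values are assumptions made for illustration; the text does not prescribe a particular correction formula.

```python
def adjusted_intensity(raw, probe_amount=1.0, labeling_efficiency=1.0,
                       bleach_fraction=0.0):
    """Scale a raw fluorescence reading under an assumed proportional model.

    probe_amount        -- relative amount of immobilized probe (assumed > 0)
    labeling_efficiency -- relative labeling efficiency of the dye (assumed > 0)
    bleach_fraction     -- fraction of signal lost to photobleaching (0 <= f < 1)
    """
    if probe_amount <= 0 or labeling_efficiency <= 0:
        raise ValueError("probe_amount and labeling_efficiency must be positive")
    if not 0.0 <= bleach_fraction < 1.0:
        raise ValueError("bleach_fraction must be in [0, 1)")
    return raw / (probe_amount * labeling_efficiency * (1.0 - bleach_fraction))


def intensity_ratio(test_signal, reference_signal):
    """Ratio of adjusted test signal to adjusted reference signal."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return test_signal / reference_signal


# Hypothetical readings from two samples hybridized to the same probe set.
test_signal = adjusted_intensity(2000.0, probe_amount=1.0, labeling_efficiency=0.8)
reference_signal = adjusted_intensity(1000.0, probe_amount=1.0, labeling_efficiency=0.8)
ratio = intensity_ratio(test_signal, reference_signal)
```

In a dye-swap experiment the same calculation could be repeated with the fluorochromes exchanged between the two samples, so that dye-specific bias can be assessed.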
Desirably, the target nucleic acids are cDNA molecules reverse transcribed from a patient sample or cRNA molecules amplified using a T7 RNA polymerase-based linear amplification system or the like from a patient sample. In particular embodiments, the
sample has nucleic acids that are amplified using one or more primers specific for an exon of a target nucleic acid, and the method involves determining the presence or absence of an mRNA splice variant with the exon in the sample. Desirably, one or more of the primers are specific for an exon or exon-exon junction of a pathogen of interest, and the method involves determining the presence or absence of a nucleic acid with the exon in the sample.
In a desirable embodiment, the nucleic acids of the invention are covalently bonded to a solid support by reaction of a nucleoside phosphoramidite with an activated solid support, and subsequent reaction of a nucleoside phosphoramidite with an activated nucleotide or nucleic acid bound to the solid support. In some embodiments, the solid support or the growing nucleic acid bound to the solid support is activated by illumination, a photogenerated acid, or electric current.
In another aspect, the invention features a method for amplifying a target nucleic acid molecule. The method involves (a) incubating a first nucleic acid of the invention with a target nucleic acid under conditions that allow the first nucleic acid to bind the target nucleic acid; and (b) extending the first nucleic acid with the target nucleic acid as a template. Desirably, the method further involves contacting the target nucleic acid with a second nucleic acid (e.g., a second nucleic acid of the invention) that binds to a different region of the target nucleic acid than the first nucleic acid. In various embodiments, the sequence of the target nucleic acid is known or unknown.
In one aspect, the invention features a method of detecting a nucleic acid of a pathogen (e.g., a nucleic acid in a sample such as a blood or urine sample from a mammal). This method involves contacting a nucleic acid probe of the invention (e.g., a probe specific for an exon or an mRNA from a particular pathogen or family of pathogens) with a nucleic acid sample under conditions that allow the probe to hybridize to at least one nucleic acid in the sample. The probe is desirably at least 60, 70, 80, 90, 95, or 100% complementary to a nucleic acid of a pathogen (e.g., a bacterium, virus, or yeast such as any of the pathogens described herein). Hybridization between the probe and a nucleic acid in the sample is detected, indicating that the sample contains the corresponding nucleic acid from a pathogen. In some embodiments, the method is used to determine what strain of a pathogen has infected a mammal (e.g., a human) by determining whether a particular nucleic acid is present in the sample. In other embodiments, the probe has a universal base in a position corresponding to a nucleotide that varies among different strains of a pathogen, and thus the probe detects the presence of a nucleic acid from any of multiple pathogenic strains.
Methods for Classifying Nucleic Acid Samples
In one aspect, the invention features a method for classifying a test nucleic acid sample including target nucleic acids. This method involves (a) incubating a test nucleic acid sample with one or more nucleic acids of the invention under conditions that allow at least one of the nucleic acids in the test sample to hybridize to at least one nucleic acid of the invention, (b) detecting a hybridization pattern of the test nucleic acid sample, and (c) comparing the hybridization pattern to a hybridization pattern of a first nucleic acid standard, whereby the comparison indicates whether or not the test sample has the same classification as the first standard. Desirably, the method also includes comparing a hybridization pattern of the test nucleic acid sample to a hybridization pattern of a second standard. In various embodiments, a hybridization pattern of the test nucleic acid sample is compared to at least 3, 4, 5, 8, 10, 15, 20, 30, 40, or more standards.
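Purely as an illustrative sketch of comparison step (c), a hybridization pattern may be represented as a list of signal intensities, one per probe, and compared to the pattern of each standard. In the Python example below, the use of the Pearson correlation coefficient as the similarity measure, the 0.9 cut-off, the function names, and the example patterns are all assumptions; the text does not specify how patterns are to be compared.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length patterns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("patterns must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant pattern has no defined correlation")
    return cov / (sd_x * sd_y)


def classify(test_pattern, standards, min_correlation=0.9):
    """Return (label, correlation) for the best-matching standard.

    The label is None when no standard reaches min_correlation.
    """
    best_label, best_r = None, float("-inf")
    for label, pattern in standards.items():
        r = pearson(test_pattern, pattern)
        if r > best_r:
            best_label, best_r = label, r
    if best_r < min_correlation:
        return None, best_r
    return best_label, best_r


# Hypothetical per-probe intensities for two standards and one test sample.
standards = {
    "standard_1": [1.0, 2.0, 3.0, 4.0],
    "standard_2": [4.0, 3.0, 2.0, 1.0],
}
label, r = classify([1.1, 2.0, 3.2, 3.9], standards)
```

Here the test sample would be assigned the same classification as the first standard, because its pattern correlates strongly with that standard and negatively with the second.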
Desirably, the method also includes identifying the hybridized target nucleic acid and/or determining the amount of hybridized target nucleic acid. In particular embodiments, the target nucleic acids are labeled with a fluorescent group. Desirably, the first nucleic acid standard is labeled with a different fluorescent group. The fluorescence of the target nucleic acids and the first nucleic acid standard can be detected simultaneously or sequentially. In desirable embodiments, the method further includes determining the presence or absence of an mRNA splice variant of interest in the sample and/or determining the presence or absence of a mutation, deletion, and/or duplication of an exon of interest. In some embodiments, the mutation, deletion, and/or duplication is indicative of a disease, disorder, or condition, such as cancer.
In desirable embodiments, the determination of the amount of bound target nucleic acid involves one or more of the following: (i) adjusting for the varying intensity of the excitation light source used for detection of the hybridization, (ii) adjusting for photobleaching of the fluorescent group, and/or (iii) comparing the fluorescent intensity of the target nucleic acid(s) hybridized to the nucleic acids of the invention to the fluorescent intensity of a different sample of nucleic acids hybridized to the nucleic acids of the invention (e.g., a different sample hybridized to the same set of nucleic acids of the invention on the same or a different solid support such as the same chip or a different chip). Desirably, this comparison in fluorescent intensity involves adjusting for a difference in the amount of the plurality used for hybridization to each sample and/or adjusting for a difference in the buffer (e.g., a difference in Mg2+ concentration) used for hybridization to each sample.
Desirably, the nucleic acids in the population together hybridize to at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the exons of a target nucleic acid. In desirable embodiments, the population includes nucleic acids that together hybridize to at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, or 100% of the nucleic acids expressed by a particular cell or tissue. In some embodiments, the population includes nucleic acids that together hybridize to at least one exon from at least 1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 80, 90, or 100% of the nucleic acid sequences expressed by a particular cell or tissue at a given point in time (e.g., an expression array with sequences corresponding to the sequences of mRNA molecules expressed by a particular cell type or a cell under a particular set of conditions). Desirably, the method further includes using a nucleic acid or a region of a nucleic acid that is present in a first test sample but not present in a first standard or not present in a second test sample as a probe or primer for the detection, amplification, or characterization of the nucleic acid.
In desirable embodiments of any of the above methods, at least 5, 10, 15, 20, 30, 40, 50, 80, 100, 150, 200, or more target nucleic acids hybridize to the nucleic acids of the invention. Desirably, the method is repeated under one or more different incubation or hybridization conditions. In particular embodiments, the method is repeated at 1, 3, 5, 8, 10, 15, 20, 30, 40 or more different temperatures, cation concentrations (e.g., concentrations of monovalent cations such as Na+ and K+ or divalent cations such as Mg2+ and Ca2+), or denaturants (e.g., hydrogen bond donors or acceptors that interfere with the hydrogen bonds keeping the base-pairs together such as formamide or urea).
In particular embodiments, the sample has nucleic acids that are amplified using one or more primers specific for an exon of a target nucleic acid, and the method involves determining the presence or absence of an mRNA splice variant with the exon in the sample. Desirably, one or more of the primers are specific for an exon or exon-exon junction of a pathogen of interest, and the method involves determining the presence or absence of a nucleic acid with the exon in the sample.
Desirably, the comparison of the hybridization pattern of a patient nucleic acid sample to that of one or more standards is used to determine whether or not a patient has a particular disease, disorder, condition, or infection or an increased risk for a particular disease, disorder, condition, or infection. In some embodiments, the comparison is used to determine what pathogen has infected a patient and to select a therapeutic for the treatment of the patient. Desirably, the comparison is used to select a therapeutic for the treatment or prevention of a disease or disorder in the patient. In yet other embodiments, the comparison
is used to include or exclude the patient from a group in a clinical trial. Desirably, the comparison is used to compare the expression of nucleic acids (e.g., mRNA splice forms associated with toxicity) in the presence and absence of a candidate compound (e.g., a lead compound for drug development). In other embodiments, the comparison is used to determine differences in expression of nucleic acids (e.g., mRNA splice variants) under particular conditions (e.g., under different environmental stress conditions) or at different developmental time points. In particular embodiments, the expression of one or more members from a particular enzyme class (e.g., protein kinase splice variants) is measured.
In a desirable embodiment, the nucleic acids of the invention are covalently bonded to a solid support by reaction of a nucleoside phosphoramidite with an activated solid support, and subsequent reaction of a nucleoside phosphoramidite with an activated nucleotide or nucleic acid bound to the solid support. In some embodiments, the solid support or the growing nucleic acid bound to the solid support is activated by illumination, a photogenerated acid, or electric current. The use of a variety of different monomers in the nucleic acids of the invention offers a means to "fine tune" the chemical, physical, biological, pharmacokinetic, and pharmacological properties of the nucleic acids thereby facilitating improvement in their safety and efficacy profiles when used as a therapeutic drug.
Applications for the Nucleic Acids of the Invention
In another aspect, the invention features the use of one or more nucleic acids of the invention for the detection, amplification, or classification of a nucleic acid of interest or a population of nucleic acids of interest.
In another aspect, the invention features the use of one or more nucleic acids of the invention for alternative mRNA splice variant detection, expression profiling, comparative genomic hybridization, or real-time PCR. In exemplary real-time PCR applications, the nucleic acids are used to determine the amount of one or more target nucleic acids (e.g., mRNA splice variants) in a sample. In particular embodiments, fluorescently labeled RT-PCR products from the amplification of a test nucleic acid sample are hybridized to a population of nucleic acids of the invention. Desirably, the amount of one or more RT-PCR products is measured to determine the amount of the corresponding nucleic acid in the initial sample.
In yet another aspect, the invention features the use of a nucleic acid of the invention as a PCR primer or FISH probe.
Methods for Selecting a Population of Nucleic Acids
In one aspect, the invention features a method of selecting a nucleic acid for a population of nucleic acids. This method involves (a) determining the melting temperature of a nucleic acid of the invention, determining the ability of the nucleic acid to self-anneal, determining the ability of the nucleic acid to hybridize to one or more exons or introns of a target nucleic acid, and/or determining the ability of the nucleic acid to hybridize to a non-target nucleic acid, and (b) selecting the nucleic acid for inclusion or exclusion from the population based on the determination in step (a). In desirable embodiments, step (a) is performed for at least 2, 3, 4, 5, 6, 10, 20, 50, 100, 200, 500, 1,000, 5,000 or more nucleic acids, and a subset of the nucleic acids are selected for inclusion in the population based on the determination in step (a). Desirably, the nucleic acids with the highest melting temperatures and/or ability to hybridize to one or more exons or introns of a target nucleic acid are selected. Desirably, the nucleic acids with the lowest ability to self-anneal and/or hybridize to a non-target nucleic acid are selected.
Databases with Hybridization Patterns of Nucleic Acid Samples and/or Standards
The invention also features a variety of databases. These databases are useful for storing the information obtained in any of the methods of the invention. These databases may also be used in the diagnosis of disease or an increased risk for a disease or in the selection of a desirable therapeutic for a particular patient or class of patients. Accordingly, in one such aspect, the invention provides an electronic database including at least 1, 10, 10^2, 10^3, 5 x 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, or 10^9 records of a nucleic acid of interest or a population of nucleic acids of interest (e.g., one or more nucleic acids in a standard or in a test nucleic acid sample) correlated to records of its hybridization pattern to a plurality of nucleic acids of the invention under one or more incubation conditions (e.g., one or more temperatures, denaturant concentrations, or salt concentrations).
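For illustration only, one way such records could be organized is sketched below in Python: each record correlates a nucleic acid of interest and a set of incubation conditions with the hybridization pattern observed under those conditions. The class names, field names, units, and example values are hypothetical, and an in-memory dictionary is used here merely as a stand-in for an electronic database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncubationCondition:
    """One set of incubation conditions (field names and units are assumptions)."""
    temperature_c: float
    salt_molar: float
    denaturant_percent: float = 0.0


class HybridizationDatabase:
    """Minimal in-memory store of hybridization pattern records."""

    def __init__(self):
        self._records = {}

    def add(self, nucleic_acid_id, condition, pattern):
        """Store the per-probe pattern for a nucleic acid under one condition."""
        self._records[(nucleic_acid_id, condition)] = tuple(pattern)

    def pattern_for(self, nucleic_acid_id, condition):
        """Return the stored pattern, or raise KeyError if there is no record."""
        return self._records[(nucleic_acid_id, condition)]

    def conditions_for(self, nucleic_acid_id):
        """List every condition under which a nucleic acid has a record."""
        return [c for (n, c) in self._records if n == nucleic_acid_id]

    def __len__(self):
        return len(self._records)


# Hypothetical records for one sample under two temperatures.
db = HybridizationDatabase()
cond_55 = IncubationCondition(temperature_c=55.0, salt_molar=0.3)
cond_65 = IncubationCondition(temperature_c=65.0, salt_molar=0.3)
db.add("sample_A", cond_55, [0.9, 0.1, 0.7])
db.add("sample_A", cond_65, [0.8, 0.0, 0.5])
```

Because the condition object is immutable and hashable, the same nucleic acid can be stored once per incubation condition and retrieved by the pair of identifier and condition.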
In another aspect, the invention features a computer including the database of the above aspect and a user interface (i) capable of displaying a hybridization pattern for a nucleic acid of interest or a population of nucleic acids of interest whose record is stored in the computer or (ii) capable of displaying a nucleic acid of interest (e.g., displaying the polynucleotide sequence or another identifying characteristic of the nucleic acid of interest) or a population of nucleic acids of interest that produces a hybridization pattern whose record is stored in the computer.
Methods for Silencing a Target Nucleic Acid in a Cell or Animal
One method for inhibiting specific gene expression involves the use of antisense or double stranded oligonucleotides, which are complementary to a specific target messenger RNA (mRNA) sequence, such as a specific mRNA splice variant. Of special interest are oligonucleotides with a modified backbone (such as LNA or phosphorothioate) that are not readily degraded by endonucleases in the target cells.
In one aspect, the invention features the use of a nucleic acid of the invention for the manufacture of a pharmaceutical composition for treatment of a disease curable by an antisense or RNAi technology.
In one aspect, the invention provides a method for inhibiting the expression of a target nucleic acid in a cell. The method involves introducing into the cell a nucleic acid of the invention in an amount sufficient to specifically attenuate expression of the target nucleic acid. The introduced nucleic acid has a nucleotide sequence that is essentially complementary to a region of desirably at least 20 nucleotides of the target nucleic acid. Desirably, the cell is in a mammal. In a related aspect, the invention provides a method for preventing, stabilizing, or treating a disease, disorder, or condition associated with a target nucleic acid in a mammal. This method involves introducing into the mammal a nucleic acid of the invention in an amount sufficient to specifically attenuate expression of the target nucleic acid, wherein the introduced nucleic acid has a nucleotide sequence that is essentially complementary to a region of desirably at least 20 nucleotides of the target nucleic acid.
In another aspect, the invention provides a method for preventing, stabilizing, or treating a pathogenic infection in a mammal by introducing into the mammal a nucleic acid of the invention in an amount sufficient to specifically attenuate expression of a target nucleic acid of a pathogen. The introduced nucleic acid has a nucleotide sequence that is essentially complementary to a region of desirably at least 20 nucleotides of the target nucleic acid. In desirable embodiments of the therapeutic methods of the above aspects, the mammal is a human. In some embodiments, the introduced nucleic acid is single stranded or double stranded.
With respect to the therapeutic methods of the invention, it is not intended that the administration of nucleic acids to a mammal be limited to a particular mode of administration, dosage, or frequency of dosing; the present invention contemplates all modes of administration, including oral, intraperitoneal, intramuscular, intravenous, intraarticular, intralesional, subcutaneous, or any other route sufficient to provide a dose adequate to
prevent or treat a disease (e.g., a disease associated with the expression of a target nucleic acid that is silenced with a nucleic acid of the invention). One or more nucleic acids may be administered to the mammal in a single dose or multiple doses. When multiple doses are administered, the doses may be separated from one another by, for example, one week, one month, one year, or ten years. It is to be understood that, for any particular subject, specific dosage regimes should be adjusted over time according to the individual need and the professional judgment of the person administering or supervising the administration of the compositions.
Exemplary mammals that can be treated using the methods of the invention include humans, primates such as monkeys, animals of veterinary interest (e.g., cows, sheep, goats, buffalos, and horses), and domestic pets (e.g., dogs and cats). Exemplary cells in which one or more target genes can be silenced using the methods of the invention include invertebrate, plant, bacteria, yeast, and vertebrate (e.g., mammalian or human) cells.
Optimum dosages for gene silencing applications may vary depending on the relative potency of individual oligonucleotides, and can generally be estimated based on EC50 values found to be effective in in vitro and in vivo animal models. In general, dosage is from 0.001 µg to 100 g per kg of body weight (e.g., 0.001 µg/kg to 1 g/kg), and may be given once or more daily, weekly, monthly or yearly, or even once every 2 to 20 years (U.S.P.N. 6,440,739). Persons of ordinary skill in the art can easily estimate repetition rates for dosing based on measured residence times and concentrations of the drug in bodily fluids or tissues. Following successful treatment, it may be desirable to have the patient undergo maintenance therapy to prevent the recurrence of the disease state, wherein the oligonucleotide is administered in maintenance doses, ranging from 0.001 µg to 100 g per kg of body weight (e.g., 0.001 µg/kg to 1 g/kg), once or more daily, to once every 20 years. If desired, conventional treatments may be used in combination with the nucleic acids of the present invention.
Suitable carriers include, but are not limited to, saline, buffered saline, dextrose, water, glycerol, ethanol, and combinations thereof. The composition can be adapted for the mode of administration and can be in the form of, for example, a pill, tablet, capsule, spray, powder, or liquid. In some embodiments, the pharmaceutical composition contains one or more pharmaceutically acceptable additives suitable for the selected route and mode of administration. These compositions may be administered by, without limitation, any parenteral route including intravenous, intra-arterial, intramuscular, subcutaneous,
intradermal, intraperitoneal, intrathecal, as well as topically, orally, and by mucosal routes of delivery such as intranasal, inhalation, rectal, vaginal, buccal, and sublingual. In some embodiments, the pharmaceutical compositions of the invention are prepared for administration to vertebrate (e.g., mammalian) subjects in the form of liquids, including sterile, non-pyrogenic liquids for injection, emulsions, powders, aerosols, tablets, capsules, enteric coated tablets, or suppositories.
Exemplary Oligomers of the Invention and Methods for Synthesizing Them
In desirable embodiments, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a 2-thio-uridine nucleoside or nucleotide of formula IV using a compound of formula VIII, IX, X, XI, or XII as shown in Figure 6. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention.
In a particular embodiment, nucleobase thiolation is performed on the O2 position of compound XI to form compound IV. In another embodiment, sulphurization on both O2 and O4 in compound VIII generates a 2,4-dithio-uridine nucleoside or nucleotide of formula X which is converted into compound IV. In yet another embodiment, a cyclic ether of formula XI is transferred into compound IV or a 2-O-alkyl-uridine nucleoside or nucleotide of formula XII through reaction with the 5' position. In other embodiments, a 2-O-alkyl-uridine nucleoside or nucleotide of formula XII is generated by direct alkylation of a uridine nucleoside or nucleotide of formula VIII.
In desirable embodiments, R4 and R2 are each independently alkyl (e.g., methyl or ethyl), acyl (e.g., acetyl or benzoyl), or any appropriate protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl). R5 is any appropriate protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, trityl(triphenylmethyl), acetyl, benzoyl, or benzyl. In desirable embodiments, R5 is hydrogen, alkyl (e.g., methyl or ethyl), 1-propynyl, thiazol-2-yl, pyridine-2-yl, thien-2-yl, imidazol-2-yl, (4/5-methyl)-thiazol-2-yl, 3-(iodoacetamido)propyl, 4-[N,N-bis(3-aminopropyl)amino]butyl, or halo (e.g., chloro, bromo, iodo, fluoro). The group -OR3' in the formulas IV, VIII, IX, X, XI, and XII is any of the groups listed for R3 or R3' in formula Ia or formula Ib or listed for R3 or R3* in formula IIa, Scheme A, or Scheme B, or the group -OR3' or R3' in the formulas IV, VIII, IX, X, XI, and XII is selected from the group consisting of H, -OH, P(O(CH2)2CN)N(iPr)2, P(O(CH2)2CN)N(iPr)2,
phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl, (e.g., phenyl or benzyl), alkyl (e.g, methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g. acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfmyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio,amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4,-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
The group -OR5' in the formulas IV, VIII, IX, X, and XII is any of the groups listed for R5 or R5' in formula Ia or formula Ib or listed for R5 or R5* in formula IIa, Scheme A, or Scheme B, or the group -OR5' or R5' in the formulas IV, VIII, IX, X, and XII is selected from the group consisting of H, -OH, P(O(CH2)2CN)N(iPr)2, P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
In yet another aspect, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a 2-thiopyrimidine nucleoside or nucleotide of formula IV using a compound of formula III or compounds of the formula I, II, and III as shown in Figure 7. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention.
In some embodiments, Lewis acid-catalyzed condensation of a substituted sugar of formula I and a substituted 2-thio-uracil of formula II results in a substituted 2-thio-uridine nucleoside or nucleotide of the formula III. In some embodiments, a compound of formula III is converted into an LNA 2-thiouridine nucleoside or nucleotide of formula IV. In desirable embodiments, R4 and R5 are, e.g., methanesulfonyloxy, p-toluenesulfonyloxy, or any appropriate protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, trityl(triphenylmethyl), acetyl, benzoyl, or benzyl; R1 is, e.g., acetyl, benzoyl, or alkoxy (e.g., methoxy); R2 is, e.g., acetyl or benzoyl; and R3 is any appropriate protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, trityl(triphenylmethyl), acetyl, or benzoyl. In desirable embodiments, R5 is hydrogen, alkyl (e.g., methyl or ethyl), 1-propynyl, thiazol-2-yl, pyridine-2-yl, thien-2-yl, imidazol-2-yl, (4/5-methyl)-thiazol-2-yl, 3-(iodoacetamido)propyl, 4-[N,N-bis(3-aminopropyl)amino]butyl, or halo (e.g., chloro, bromo, iodo, fluoro).
The group -OR3' in the formulas I, III, and IV is any of the groups listed for R3 or R3' in formula Ia or formula Ib or listed for R3 or R3* in formula IIa, Scheme A, or Scheme B, or the group -OR3' or R3' in the formulas I, III, and IV is selected from the group consisting of H, -OH, P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
The group R5' in the formulas I, III, and IV is any of the groups listed for R5 or R5' in formula Ia or formula Ib or listed for R5 or R5* in formula IIa, Scheme A, or Scheme B, or R5' in the formulas I, III, and IV is selected from the group consisting of H, -OH, P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
In still another aspect, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a 2-thiopyrimidine nucleoside or nucleotide of formula IV using a compound of formula VII, compounds of the formula V, VI, and VII, or compounds of the formula I, V, VI, and VII as shown in Figure 8. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention. In some embodiments, a 2-thio-uridine nucleoside or nucleotide of the formula IV is synthesized through ring-synthesis of the nucleobase by reaction of an amino sugar of the formula V and a substituted isothiocyanate of the formula VI.
In desirable embodiments, R4 and R5 are each independently, e.g., methanesulfonyloxy, p-toluenesulfonyloxy, or any appropriate protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, trityl(triphenylmethyl), acetyl, benzoyl, or benzyl. R1 is, e.g., acetyl, benzoyl, or alkoxy (e.g., methoxy); R2 is, e.g., acetyl or benzoyl; and R3 is any appropriate protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, trityl(triphenylmethyl), acetyl, or benzoyl. R5 and R6 are each independently, e.g., hydrogen or alkyl (e.g., methyl or ethyl). R6 can also be, e.g., an appropriate protecting group such as silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl). In desirable embodiments, R5 is hydrogen or methyl, and R6 is methyl or ethyl.
The group -OR3' in the formulas I, V, VII, and IV is any of the groups listed for R3 or R3' in formula Ia or formula Ib or listed for R3 or R3* in formula IIa, Scheme A, or Scheme B, or the group -OR3' or R3' in the formulas I, V, VII, and IV is selected from the group consisting of H, -OH, P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
R5' in the formulas I, V, VII, and IV is any of the groups listed for R5 or R5' in formula Ia or formula Ib or listed for R5 or R5* in formula IIa, Scheme A, or Scheme B, or R5' in the formulas I, V, VII, and IV is selected from the group consisting of H, -OH, P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
In another aspect, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a 2-thiopyrimidine nucleoside as shown in Figure 5. In desirable embodiments, the method further comprises reacting one or both compounds of the formula 4 with a phosphodiamidite (e.g., 2-cyanoethyl tetraisopropylphosphorodiamidite) to produce the corresponding nucleoside phosphoramidite. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention.
In some embodiments, a glycosyl donor is coupled to a nucleobase as shown in pathway A. In other embodiments, ring synthesis of the nucleobase is performed as shown in pathway B. In still other embodiments, LNA-T diol is modified as shown in pathway C.
In desirable embodiments, R is hydrogen, methyl, 1-propynyl, thiazol-2-yl, pyridine-2-yl, thien-2-yl, imidazol-2-yl, (4/5-methyl)-thiazol-2-yl, 3-(iodoacetamido)propyl, 4-[N,N-bis(3-aminopropyl)amino]butyl, or halo (e.g., chloro, bromo, iodo, fluoro). Desirably, R1, R2, and R3 are each any appropriate protecting group such as acetyl, benzyl, silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl).
In another aspect, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a 2-thiopyrimidine nucleoside or nucleotide of formula 4 using a compound of formula 3, compounds of the formula 2 and 3, or compounds of the formula 1, 2, 3, and 4 as shown in Figure 28. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention. This method can also be performed using any other appropriate protecting groups instead of Bn (benzyl), Ac (acetyl), or Ms (methanesulfonyl).
In desirable embodiments, the method further comprises reacting one or both compounds of the formula 4 with a phosphodiamidite (e.g., 2-cyanoethyl tetraisopropylphosphorodiamidite) to produce the corresponding nucleoside phosphoramidite.
In another aspect, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a nucleoside or nucleotide of formula 10 or 11 using a compound of any one of the formulas 6-9, compounds of the formula 5 and any one of the formulas 6-9, or compounds of the formula 4, 5, and any one of the formulas 6-9 as shown in Figure 48. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention. This method can also be performed using any other appropriate protecting groups instead of DMT, Bn, Ac, or Ms.
In some embodiments, a compound of formula 4 is used as a glycosyl donor in a coupling reaction with silylated hypoxanthine to form a compound of the formula 5. In certain embodiments, a compound of the formula 5 is used in a ring-closing reaction to form a compound of the formula 6. Desirably, deprotection of the 5'-hydroxy group of compound 6 is performed by displacing the 5'-O-mesyl group with sodium benzoate to produce a compound of the formula 7 that is converted into a compound of the formula 8 after saponification of the 5'-benzoate. In some embodiments, compound 8 is converted to a DMT-protected compound 9 prior to debenzylation of the 3'-O-hydroxy group. In desirable embodiments, a phosphoramidite of the formula 11 is generated by phosphitylation of a nucleoside of the formula 10.
In desirable embodiments, R is H or P(O(CH2)2CN)N(iPr)2. In other embodiments, the group R1 or -OR1 is any of the groups listed for R3 or R3' in formula Ia or formula Ib or listed for R3 or R3* in formula IIa, Scheme A, or Scheme B, or the group -OR1 or R1 is selected from the group consisting of -OH, P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
In another aspect, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a nucleoside or nucleotide of formula 20 or 21 as shown in Figure 3, in which compound 4 is the same sugar shown in the above aspect. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention. This method can also be performed using any other appropriate protecting groups instead of DMT, Bn, Bz (benzoyl), Ac, or Ms. Additionally, the method can be performed with any other halogen (e.g., fluoro or bromo) instead of chloro.
In desirable embodiments, to promote the ring-closing reaction, a solution of compound 14 in aqueous 1,4-dioxane is treated with sodium hydroxide to give a bicyclic compound 15. In some embodiments, sodium benzoate is used for displacement of the 5'-mesylate of compound 15 to give compound 16. In some embodiments, compound 17 is formed by reaction of compound 16 with sodium azide. In some embodiments, compound 18 is produced by saponification of the 5'-benzoate of compound 17. In certain embodiments, hydrogenation of compound 18 produces compound 19. In certain embodiments, the peracylation method is used to benzoylate the 2- and 6-amino groups of compound 19, yielding compound 20, which is desirably converted into the phosphoramidite compound 21.
In a related aspect, the invention features a derivative of a compound of the formula 20 or 21 as described in the above aspect in which the 3'-OH or -OP(O(CH2)2CN)N(iPr)2 group is replaced by any other group selected from the group consisting of phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
In yet another aspect, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a nucleoside or nucleotide of formula 20 or 21 as shown in Figure 10. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention. This method can also be performed using any other appropriate protecting groups instead of DMT.
In some embodiments, compound 17 is formed by reaction of compound 7 with 1,3-dichloro-1,1,3,3-tetraisopropyldisiloxane. Desirably, compound 18 is formed by reaction of compound 17 with phenoxyacetic anhydride. In some embodiments, compound 19 is generated by reaction of compound 18 with acid. Desirably, compound 20 is produced by reacting compound 19 with DMT-Cl. In desirable embodiments, compound 20 is reacted with 2-cyanoethyl tetraisopropylphosphorodiamidite to give the phosphoramidite 21.
In desirable embodiments, R is H or P(O(CH2)2CN)N(iPr)2. In other embodiments, R or -OR is any of the groups listed for R3 or R3' in formula Ia or formula Ib or listed for R3 or R3* in formula IIa, Scheme A, or Scheme B, or the group -OR or R is selected from the group consisting of -OH, P(O(CH2)2CN)N(iPr)2, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
In yet another aspect, the invention features a method of synthesizing a nucleic acid. This method involves synthesizing a nucleoside or nucleotide of formula 24 or 25 as shown in Figure 54. The nucleoside, nucleoside phosphoramidite, or nucleotide is incorporated into a nucleic acid of the invention. This method can also be performed using any other appropriate protecting groups instead of Bz, Bn, and DMT. Additionally, the method can be performed with any other halogen (e.g., fluoro or bromo) instead of chloro.
In some embodiments, compound 16 is formed from compounds 4, 14, and 15 as illustrated in an aspect above. Desirably, the 5'-O-benzoyl group of compound 16 is hydrolyzed by aqueous sodium hydroxide to give compound 22. Compound 23 is desirably produced by incubation of compound 22 in the presence of palladium hydroxide and ammonium formate. Desirably, the 2-amine of compound 23 is selectively protected with an amidine group after treatment with N,N-dimethylformamide dimethyl acetal to yield compound 24. In some embodiments, the diol 24 is 5'-O-DMT protected and 3'-O-phosphitylated to produce the phosphoramidite LNA-2AP compound 25. In some embodiments, compound 25 has one of the following groups instead of the P(O(CH2)2CN)N(iPr)2 group: any of the groups listed for R3 or R3' in formula Ia or formula Ib or listed for R3 or R3* in formula IIa, Scheme A, or Scheme B, or a group selected from the group consisting of -OH, phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, methyl phosphonate, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl(triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
In another aspect, the invention features a nucleic acid of the invention that includes a compound of the formula 6pC or the product of a compound of the formula 6pC treated with ammonia as described herein. In a related aspect, the invention features a method of synthesizing a nucleic acid that involves performing one or more of the steps described herein for the synthesis of a compound of the formula 6pC or the product of a compound of the formula 6pC treated with ammonia.
In yet another aspect, the invention features a method of synthesizing a nucleic acid. This method involves reacting one or more of any of the nucleosides or nucleotides of the invention with (i) any other nucleoside or nucleotide of the invention, (ii) any other nucleoside or nucleotide of formula Ia, formula Ib, formula IIa, Scheme A, or Scheme B, and/or (iii) any naturally-occurring nucleoside or nucleotide. Desirably, the method involves reacting one or more nucleoside phosphoramidites of any of the above aspects with a nucleotide or nucleic acid.
Methods for Synthesis of Nucleic Acids on a Solid Support
In another aspect, the invention provides a method for the synthesis of a population of nucleic acids (e.g., a population of nucleic acids of the invention) on a solid support. This method involves the reaction of a plurality of nucleoside phosphoramidites with an activated solid support (e.g., a solid support with an activated linker) and the subsequent reaction of a plurality of nucleoside phosphoramidites with activated nucleotides or nucleic acids bound to the solid support.
In some embodiments of any of the above aspects, the solid support or the growing nucleic acid bound to the solid support is activated by illumination, a photogenerated acid, or electric current. In desirable embodiments, one or more spots or regions (e.g., a region with an area of less than 1 cm2, 0.1 cm2, 0.01 cm2, 1 mm2, or 0.1 mm2 that desirably contains one particular nucleic acid monomer or oligomer) on the solid support are irradiated to produce a photogenerated acid that removes the 5'-OH protecting group of one or more nucleic acid monomers or oligomers to which a nucleotide is subsequently added. In other embodiments, an electric current is applied to one or more spots or regions (e.g., a region with an area of less than 1 cm2, 0.1 cm2, 0.01 cm2, 1 mm2, or 0.1 mm2 that desirably contains one particular nucleic acid monomer or oligomer) on the solid support to remove an electrochemically sensitive protecting group of one or more nucleic acid monomers or oligomers to which a nucleotide is subsequently added. In still other embodiments, one or more spots or regions (e.g., a region with an area of less than 1 cm2, 0.1 cm2, 0.01 cm2, 1 mm2, or 0.1 mm2 that desirably contains one particular nucleic acid monomer or oligomer) on the solid support are irradiated to remove a photosensitive protecting group of one or more nucleic acid monomers or oligomers to which a nucleotide is subsequently added. In various embodiments, the solid support (e.g., chip, coverslip, microscope glass slide, quartz, or silicon) is less than 1, 0.5, 0.1, or 0.05 mm thick.
Methods for the Synthesis of Nucleic Acids
In another aspect, the invention features a method of reacting a population of nucleic acids of the invention with one or more nucleic acids. This method involves incubating an immobilized population of nucleic acids of the invention with a solution that includes one or more probes (e.g., at least 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, or 150 different nucleic acids) and one or more target nucleic acids (e.g., at least 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 60, 80, 100, or 150 different target nucleic acids). The incubation is performed in the presence of a ligase under conditions that allow the ligase to covalently react one or more immobilized nucleic acids with one or more nucleic acid probes in solution that hybridize to the same target nucleic acid. Desirably, at least 2, 5, 10, 15, 20, 30, 40, 50, 80, or 100 pairs of immobilized nucleic acids and nucleic acid probes are ligated. In various embodiments, the incubation occurs between 15 and 45°C, such as between 20 and 40°C or between 25 and 35°C.
Desirable Embodiments of Any of the Aspects of the Invention
In other embodiments of any of various aspects of the invention, a nucleic acid probe or primer specifically hybridizes to a target nucleic acid but does not substantially hybridize to non-target molecules, which include other nucleic acids in a cell or biological sample having a sequence that is less than 99, 95, 90, 80, or 70% identical or complementary to that of the target nucleic acid. Desirably, the amount of these non-target molecules hybridized to, or associated with, the nucleic acid probe or primer, as measured using standard assays, is 2-fold, desirably 5-fold, more desirably 10-fold, and most desirably 50-fold lower than the amount of the target nucleic acid hybridized to, or associated with, the nucleic acid probe or primer. In other embodiments, the amount of a target nucleic acid hybridized to, or associated with, the nucleic acid probe or primer, as measured using standard assays, is 2-fold, desirably 5-fold, more desirably 10-fold, and most desirably 50-fold greater than the amount of a control nucleic acid hybridized to, or associated with, the nucleic acid probe or primer. In certain embodiments, the nucleic acid probe or primer is substantially complementary (e.g., at least 80, 90, 95, 98, or 100% complementary) to a target nucleic acid or a group of target nucleic acids from a cell. In other embodiments, the probe or primer is homologous to multiple RNA or DNA molecules, such as RNA or DNA molecules from the same gene family. In other embodiments, the probe or primer is homologous to a large number of RNA or DNA molecules. In desirable embodiments, the probe or primer binds to nucleic acids which have polynucleotide sequences that differ in sequence at a position that corresponds to the position of a universal base in the probe or primer. Examples of control nucleic acids include nucleic acids with a random sequence or nucleic acids known to have little, if any, affinity for the nucleic acid probe or primer. In some embodiments, the target nucleic acid is an RNA, DNA, or cDNA molecule.
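The percent-identity, percent-complementarity, and fold-discrimination criteria above can be illustrated with a minimal Python sketch. The function names (percent_identity, reverse_complement, percent_complementarity, fold_discrimination) are hypothetical and are introduced only for illustration; the sketch assumes ungapped DNA sequences of equal length written 5' to 3', and it takes the hybridization signals as values already measured by a standard assay.

```python
# Illustrative sketch only; names and simplifications are assumptions,
# not part of the specification.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def percent_complementarity(probe: str, target: str) -> float:
    """Percentage of probe positions that can Watson-Crick pair with the target."""
    return percent_identity(reverse_complement(probe), target)


def fold_discrimination(target_signal: float, other_signal: float) -> float:
    """Ratio of target signal to non-target (or control) signal."""
    if other_signal <= 0:
        raise ValueError("comparison signal must be positive")
    return target_signal / other_signal


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
    print(percent_complementarity("AAGC", "GCTT"))
    print(fold_discrimination(100.0, 10.0) >= 10)
```

Under these assumptions, a probe would meet the 10-fold criterion when fold_discrimination returns at least 10 for the target relative to the non-target or control nucleic acid.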
Desirably, the association constant (Ka) of the nucleic acid toward a complementary target molecule is higher than the association constant of the complementary strands of the double stranded target molecule. In some desirable embodiments, the melting temperature of a duplex between the nucleic acid and a complementary target molecule is higher than the melting temperature of the complementary strands of the double stranded target molecule.
In some embodiments, the LNA-pyrene is in a position corresponding to the position of a non-base (e.g., a unit without a base) in another nucleic acid, such as a target nucleic acid. Incorporation of pyrene in a DNA strand that is hybridized against the four natural bases decreases the Tm by 4.5°C to 6.8°C; however, incorporation of pyrene in a DNA strand in a position opposite a non-base only decreases the Tm by 2.3°C to 4.6°C, most likely due to the better accommodation of the pyrene in the B-type duplex (Matray and Kool, J. Am. Chem. Soc. 120, 6191, 1998). Thus, incorporation of LNA-pyrene into a nucleic acid in a position opposite a non-base (e.g., a unit without a base or a unit with a small group such as a noncyclic group instead of a base) in a target nucleic acid may also minimize any potential decrease in Tm due to the pyrene substitution.
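As a worked illustration of the Tm shifts quoted above, the following Python sketch applies the two reported ranges to a hypothetical unmodified duplex. The function name tm_range_after_pyrene and the 60°C starting value are assumptions made for illustration; the ranges themselves are those cited for pyrene in a DNA strand, and whether LNA-pyrene follows the same ranges is not established by the text.

```python
from typing import Tuple

# Reported Tm decreases (in degrees C) for a pyrene unit in a DNA strand.
DROP_OPPOSITE_NATURAL_BASE = (4.5, 6.8)
DROP_OPPOSITE_NON_BASE = (2.3, 4.6)


def tm_range_after_pyrene(tm_unmodified: float,
                          opposite_non_base: bool) -> Tuple[float, float]:
    """Return the (lowest, highest) expected Tm after one pyrene substitution."""
    if opposite_non_base:
        low_drop, high_drop = DROP_OPPOSITE_NON_BASE
    else:
        low_drop, high_drop = DROP_OPPOSITE_NATURAL_BASE
    # The largest drop gives the lowest Tm, the smallest drop the highest.
    return (tm_unmodified - high_drop, tm_unmodified - low_drop)


if __name__ == "__main__":
    # Hypothetical duplex with an unmodified Tm of 60 degrees C.
    print(tm_range_after_pyrene(60.0, opposite_non_base=False))
    print(tm_range_after_pyrene(60.0, opposite_non_base=True))
```

In this example the range opposite a non-base sits about 2.2°C higher than the range opposite a natural base, which is the difference the paragraph relies on.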
In various embodiments, the number of molecules in the plurality of nucleic acids of the invention is at least 2, 4, 5, 6, 7, 8, or 10-fold greater than the number of molecules in the test nucleic acid sample. In some embodiments, an LNA is a triplex-forming oligonucleotide. In desirable embodiments of any of the aspects of the invention, the target nucleic acids (e.g., cDNA molecules reverse transcribed from a patient sample or cRNA molecules amplified from a patient sample using a T7 RNA polymerase-based amplification system or the like) are fragmented using an enzyme such as a uracil-DNA glycosylase (e.g., E. coli uracil-DNA glycosylase) or using chemical hydrolysis such as alkaline hydrolysis. In various embodiments, the average size of the fragmented nucleic acids is between 50 and 300 nucleotides, such as approximately 300, 200, 100, or 50 nucleotides.
Advantages
The present invention has a variety of advantages related to nucleic acid analysis methods. The ability to equalize melting temperatures of a series of nucleic acids is generally applicable and desirable in all situations where more than one sequence is used simultaneously (e.g., DNA arrays with more than one capture probe, PCR and especially multiplex PCR, and homogeneous assays such as TaqMan and molecular beacon assays). Sample preparation of specific sequences (e.g., DNA or RNA extraction using capture probes on filters or magnetic beads) is another area where melting temperature equalization of specific probe sequences is useful.
For example, the invention provides high affinity nucleotides (e.g., LNA and other high affinity nucleotides with a modified base and/or backbone) that can be used, e.g., in arrays of the invention. In particular, the nucleic acids of the invention containing LNA units exhibited a surprising ability to discriminate between different mRNA splice variants compared to naturally-occurring nucleic acids. If desired, universal bases can be added as part of flanking regions in capture probes (e.g., probes of an array) to stabilize hybridization with high affinity nucleotides in the capture probes. Replacement of one or more DNA-t nucleotides with LNA-T and/or replacement of one or more DNA-a nucleotides with LNA-A reduces the variability of melting temperatures for capture probes of similar length but different GC and AT content by desirably at least 10, 20, 30, 40, or 50%. Additionally, replacement of one or more DNA-t nucleotides with LNA-T and/or replacement of one or more DNA-c nucleotides with LNA-C increases the stability of a large number of capture probes, while desirably avoiding self-complementary sequences with LNA:LNA base-pairs within a capture probe that would otherwise reduce or eliminate the binding of target molecules to the probe. Although a general T and C substitution may not reduce the variability of melting temperatures of the probes, this substitution increases the melting temperature and binding efficiency of many capture probes that contain these two nucleotides.
The invention also provides a general substitution algorithm for enhancement of the hybridization signal of a test nucleic acid sample by inclusion of high affinity monomers (e.g., LNA and other high affinity nucleotides with a modified base and/or backbone) in the array. This method increases the stability and binding affinity of capture probes while avoiding substitutions in positions that may form self-complementary base-pairs which may otherwise inhibit binding to a target molecule. The substitution algorithm is broadly useful for specialized arrays, as well as for PCR primers and FISH probes.
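The substitution idea described above can be illustrated with a minimal Python sketch. It is not the algorithm of the invention, whose details are not given in this passage. It assumes a simple rule: DNA-t and DNA-c positions are replaced by LNA-T and LNA-C unless they lie in a stretch whose reverse complement also occurs in the probe, since such stretches could form LNA:LNA base-pairs. The function names and the `min_stem` parameter are hypothetical. Output follows the lowercase DNA / uppercase LNA convention used in this description.

```python
# Illustrative sketch only; the stem-length rule (min_stem) is an assumption.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, T, C, G only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def self_complementary_positions(seq, min_stem=4):
    """Indices of bases in any window of min_stem bases whose reverse
    complement also occurs somewhere in the same probe."""
    flagged = set()
    for start in range(len(seq) - min_stem + 1):
        window = seq[start:start + min_stem]
        if reverse_complement(window) in seq:
            flagged.update(range(start, start + min_stem))
    return flagged


def substitute_lna(seq, targets=("T", "C"), min_stem=4):
    """Write substituted (LNA) positions in upper case and DNA in lower case,
    skipping positions flagged as potentially self-complementary."""
    seq = seq.upper()
    blocked = self_complementary_positions(seq, min_stem)
    return "".join(
        base if (base in targets and i not in blocked) else base.lower()
        for i, base in enumerate(seq)
    )


print(substitute_lna("TTCCAATC"))    # no self-complementary stretch
print(substitute_lna("GGATCCTTTT"))  # GGATCC region is left as DNA
```

In this sketch a palindromic stretch such as GGATCC is left unsubstituted, while the remaining T and C positions carry LNA. A real design would also weigh melting temperature and target specificity.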
Other features and advantages of the invention will be apparent from the following detailed description.
Definitions
When used herein, the term "LNA" (Locked Nucleoside Analogues) refers to nucleoside analogues (e.g., bicyclic nucleoside analogues, e.g., as disclosed in WO 99/14226) either incorporated in an oligonucleotide or as a discrete chemical species (e.g., LNA nucleoside and LNA nucleotide). Furthermore, the term "LNA" includes the compounds as described in the present specification, including the compounds described in Example 17. The term "monomeric LNA" may, e.g., refer to the monomers LNA A, LNA T, LNA C, or any other LNA monomers.
By "LNA unit" is meant an individual LNA monomer (e.g., an LNA nucleoside or LNA nucleotide) or an oligomer (e.g., an oligonucleotide or nucleic acid) that includes at least one LNA monomer. LNA units as disclosed in WO 99/14226 are in general particularly desirable modified nucleic acids for incorporation into an oligonucleotide of the invention. Additionally, the nucleic acids may be modified at the 3' and/or 5' end by any type of modification known in the art. For example, either or both ends may be capped with a protecting group, attached to a flexible linking group, attached to a reactive group to aid in attachment to the substrate surface, etc. Desirable LNA units and their method of synthesis also are disclosed in WO 00/56746, WO 00/56748, WO 00/66604, Morita et al., Bioorg. Med. Chem. Lett. 12(1):73-76, 2002; Hakansson et al., Bioorg. Med. Chem. Lett. 11(7):935-938, 2001; Koshkin et al., J. Org. Chem. 66(25):8504-8512, 2001; Kvaerno et al., J. Org. Chem. 66(16):5498-5503, 2001; Hakansson et al., J. Org. Chem. 65(17):5161-5166, 2000; Kvaerno et al., J. Org. Chem. 65(17):5167-5176, 2000; Pfundheller et al., Nucleosides Nucleotides 18(9):2017-2030, 1999; and Kumar et al., Bioorg. Med. Chem. Lett. 8(16):2219-2222, 1998.
By "LNA modified oligonucleotide" is meant an oligonucleotide comprising at least one LNA monomeric unit of the general scheme A, described infra, having the below described illustrative examples of modifications:
wherein X is selected from -O-, -S-, -N(RN*)-, -C(R6R6*)-, -O-C(R7R7*)-, -C(R6R6*)-O-, -S-C(R7R7*)-, -C(R6R6*)-S-, -N(RN*)-C(R7R7*)-, -C(R6R6*)-N(RN*)-, and -C(R6R6*)-C(R7R7*)-. B is selected from a modified base as discussed above, e.g., an optionally substituted carbocyclic aryl such as optionally substituted pyrene or optionally substituted pyrenylmethylglycerol, or an optionally substituted heteroalicyclic or optionally substituted heteroaromatic such as optionally substituted pyridyloxazole, optionally substituted pyrrole, optionally substituted diazole or optionally substituted triazole moieties; hydrogen, hydroxy, optionally substituted C1-4-alkoxy, optionally substituted C1-4-alkyl, optionally substituted C1-4-acyloxy, nucleobases, DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands.
P designates the radical position for an internucleoside linkage to a succeeding monomer, or a 5'-terminal group, such internucleoside linkage or 5'-terminal group optionally including the substituent R5. One of the substituents R2, R2*, R3, and R3* is a group P* which designates an internucleoside linkage to a preceding monomer, or a 2'/3'-terminal group. The substituents of R1*, R4*, R5, R5*, R6, R6*, R7, R7*, RN*, and the ones of R2, R2*, R3, and R3* not designating P* each designates a biradical comprising about 1-8 groups/atoms selected from -C(RaRb)-, -C(Ra)=C(Ra)-, -C(Ra)=N-, -C(Ra)-O-, -O-, -Si(Ra)2-, -C(Ra)-S-, -S-, -SO2-, -C(Ra)-N(Rb)-, -N(Ra)-, and >C=Q, wherein Q is selected from -O-, -S-, and -N(Ra)-, and Ra and Rb each is independently selected from hydrogen, optionally substituted C1-12-alkyl, optionally substituted C2-12-alkenyl, optionally substituted C2-12-alkynyl, hydroxy, C1-12-alkoxy, C2-12-alkenyloxy, carboxy, C1-12-alkoxycarbonyl, C1-12-alkylcarbonyl, formyl, aryl, aryloxy-carbonyl, aryloxy, arylcarbonyl, heteroaryl, heteroaryloxy-carbonyl, heteroaryloxy, heteroarylcarbonyl, amino, mono- and di(C1-6-alkyl)amino, carbamoyl, mono- and di(C1-6-alkyl)-amino-carbonyl, amino-C1-6-alkyl-aminocarbonyl, mono- and di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl, C1-6-alkyl-carbonylamino, carbamido, C1-6-alkanoyloxy, sulphono, C1-6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1-6-alkylthio, halogen, DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands, where aryl and heteroaryl may be optionally substituted, and where two geminal substituents Ra and Rb together may designate optionally substituted methylene (=CH2), and wherein two non-geminal or geminal substituents selected from Ra, Rb, and any of the substituents R1*, R2, R2*, R3, R3*, R4*, R5, R5*, R6 and R6*, R7, and R7* which are present and not involved in P, P* or the biradical(s) together may form an associated biradical selected from biradicals of the same kind as defined before; the pair(s) of non-geminal substituents thereby forming a mono- or bicyclic entity together with (i) the atoms to which said non-geminal substituents are bound and (ii) any intervening atoms.
Each of the substituents R1*, R2, R2*, R3, R4*, R5, R5*, R6 and R6*, R7, and R7* which are present and not involved in P, P* or the biradical(s), is independently selected from hydrogen, optionally substituted C1-12-alkyl, optionally substituted C2-12-alkenyl, optionally substituted C2-12-alkynyl, hydroxy, C1-12-alkoxy, C2-12-alkenyloxy, carboxy, C1-12-alkoxycarbonyl, C1-12-alkylcarbonyl, formyl, aryl, aryloxy-carbonyl, aryloxy, arylcarbonyl, heteroaryl, heteroaryloxy-carbonyl, heteroaryloxy, heteroarylcarbonyl, amino, mono- and di(C1-6-alkyl)amino, carbamoyl, mono- and di(C1-6-alkyl)-amino-carbonyl, amino-C1-6-alkyl-aminocarbonyl, mono- and di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl, C1-6-alkyl-carbonylamino, carbamido, C1-6-alkanoyloxy, sulphono, C1-6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1-6-alkylthio, halogen, DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands, where aryl and heteroaryl may be optionally substituted, and where two geminal substituents together may designate oxo, thioxo, imino, or optionally substituted methylene, or together may form a spiro biradical consisting of a 1-5 carbon atom(s) alkylene chain which is optionally interrupted and/or terminated by one or more heteroatoms/groups selected from -O-, -S-, and -(NRN)- where RN is selected from hydrogen and C1-4-alkyl, and where two adjacent (non-geminal) substituents may designate an additional bond resulting in a double bond; and RN*, when present and not involved in a biradical, is selected from hydrogen and C1-4-alkyl; and basic salts and acid addition salts thereof.
Exemplary 5', 3', and/or 2' terminal groups include -H, -OH, halo (e.g., chloro, fluoro, iodo, or bromo), optionally substituted aryl (e.g., phenyl or benzyl), alkyl (e.g., methyl or ethyl), alkoxy (e.g., methoxy), acyl (e.g., acetyl or benzoyl), aroyl, aralkyl, hydroxy, hydroxyalkyl, alkoxy, aryloxy, aralkoxy, nitro, cyano, carboxy, alkoxycarbonyl, aryloxycarbonyl, aralkoxycarbonyl, acylamino, aroylamine, alkylsulfonyl, arylsulfonyl, heteroarylsulfonyl, alkylsulfinyl, arylsulfinyl, heteroarylsulfinyl, alkylthio, arylthio, heteroarylthio, aralkylthio, heteroaralkylthio, amidino, amino, carbamoyl, sulfamoyl, alkene, alkyne, protecting groups (e.g., silyl, 4,4'-dimethoxytrityl, monomethoxytrityl, or trityl (triphenylmethyl)), linkers (e.g., a linker containing an amine, ethylene glycol, or a quinone such as anthraquinone), detectable labels (e.g., radiolabels or fluorescent labels), and biotin.
It is understood that references herein to a nucleic acid unit, nucleic acid residue, LNA unit, or similar term are inclusive of both individual nucleoside units and nucleotide units and nucleoside units and nucleotide units within an oligonucleotide.
A "modified base" or other similar term refers to a composition (e.g., a non-naturally occurring nucleobase or nucleosidic base) which can pair with a natural base (e.g., adenine, guanine, cytosine, uracil, and/or thymine) and/or can pair with a non-naturally occurring nucleobase or nucleosidic base. Desirably, the modified base provides a Tm differential of 15, 12, 10, 8, 6, 4, or 2°C or less as described herein. Exemplary modified bases are described in EP 1 072 679 and WO 97/12896.
By "nucleobase" is meant the naturally occurring nucleobases adenine (A), guanine
(G), cytosine (C), thymine (T) and uracil (U) as well as non-naturally occurring nucleobases such as xanthine, diaminopurine, 8-oxo-N6-methyladenine, 7-deazaxanthine, 7-deazaguanine, N4,N4-ethanocytosine, N6,N6-ethano-2,6-diaminopurine, 5-methylcytosine (mC), 5-(C3-C6)-alkynyl-cytosine, 5-fluorouracil, 5-bromouracil, pseudoisocytosine, 2-hydroxy-5-methyl-4-triazolopyridine, isocytosine, isoguanine, inosine and the "non-naturally occurring" nucleobases described in Benner et al., U.S. Pat. No. 5,432,272 and Susan M. Freier and Karl-Heinz Altmann, Nucleic Acids Research, 1997, vol. 25, pp 4429-4443. The term "nucleobase" thus includes not only the known purine and pyrimidine heterocycles, but also heterocyclic analogues and tautomers thereof. Further naturally and non-naturally occurring nucleobases include those disclosed in U.S. Pat. No. 3,687,808 (Merigan, et al.), in Chapter 15 by Sanghvi, in Antisense Research and Application, Ed. S. T. Crooke and B. Lebleu, CRC Press, 1993, in Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613-722 (see especially pages 622 and 623), in the Concise Encyclopedia of Polymer Science and Engineering, J. I. Kroschwitz Ed., John Wiley & Sons, 1990, pages 858-859, and in Cook, Anti-Cancer Drug Design 1991, 6, 585-607, each of which is hereby incorporated by reference in its entirety. The term "nucleosidic base" or "base unit" is further intended to include compounds such as heterocyclic compounds that can serve like nucleobases including certain "universal bases" that are not nucleosidic bases in the most classical sense but serve as
nucleosidic bases. Especially mentioned as universal bases are 3-nitropyrrole, optionally substituted indoles (e.g., 5-nitroindole), and optionally substituted hypoxanthine. Other desirable universal bases include pyrrole, diazole, or triazole derivatives, including those universal bases known in the art.
As described herein, various groups of an LNA unit may be optionally substituted. A "substituted" group such as a nucleobase or nucleosidic base and the like may be substituted by other than hydrogen at one or more available positions, typically 1 to 3 or 4 positions, by one or more suitable groups such as those disclosed herein. Suitable groups that may be present on a "substituted" group include e.g. halogen such as fluoro, chloro, bromo and iodo; cyano; hydroxyl; nitro; azido; alkanoyl such as a C1-6 alkanoyl group such as acyl and the like; carboxamido; alkyl groups including those groups having 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5, or 6 carbon atoms; alkenyl and alkynyl groups including groups having one or more unsaturated linkages and from 2 to 12 carbon, or 2, 3, 4, 5 or 6 carbon atoms; alkoxy groups including those having one or more oxygen linkages and from 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5 or 6 carbon atoms; aryloxy such as phenoxy; alkylthio groups including those moieties having one or more thioether linkages and from 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5 or 6 carbon atoms; alkylsulfinyl groups including those moieties having one or more sulfinyl linkages and from 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5, or 6 carbon atoms; alkylsulfonyl groups including those moieties having one or more sulfonyl linkages and from 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5, or 6 carbon atoms; aminoalkyl groups such as groups having one or more N atoms and from 1 to about 12 carbon atoms, or 1, 2, 3, 4, 5 or 6 carbon atoms; carbocyclic aryl having 6 or more carbons; aralkyl having 1 to 3 separate or fused rings and from 6 to about 18 carbon ring atoms, with benzyl being a desirable group; aralkoxy having 1 to 3 separate or fused rings and from 6 to about 18 carbon ring atoms, with O-benzyl being a desirable group; or a heteroaromatic or heteroalicyclic group having 1 to 3 separate or fused rings with 3 to about 8 members per ring and one or more N, O or S atoms, e.g.
coumarinyl, quinolinyl, pyridyl, pyrazinyl, pyrimidyl, furyl, pyrrolyl, thienyl, thiazolyl, oxazolyl, imidazolyl, indolyl, benzofuranyl, benzothiazolyl, tetrahydrofuranyl, tetrahydropyranyl, piperidinyl, morpholino and pyrrolidinyl. By "oxy-LNA monomer or unit" is meant any nucleoside or nucleotide which contains an oxygen atom in a 2'-4' linkage.
A "non-oxy-LNA" monomer or unit is broadly defined as any nucleoside or nucleotide which does not contain an oxygen atom in a 2'-4' linkage. Examples of non-oxy-LNA monomers include 2'-deoxynucleotides (DNA) or nucleotides (RNA) or any analogues of these monomers which are not oxy-LNA, such as for example the thio-LNA and amino-LNA described herein with respect to formula Ia and in Singh et al., J. Org. Chem. 1998, 63, 6078-9, and the derivatives described in Susan M. Freier and Karl-Heinz Altmann, Nucleic Acids Research, 1997, vol. 25, pp 4429-4443.
By "universal base" is meant a naturally-occurring or desirably a non-naturally occurring compound or moiety that can pair with a natural base (e.g., adenine, guanine, cytosine, uracil, and/or thymine), and that has a Tm differential of 15, 12, 10, 8, 6, 4, or 2°C or less as described herein.
By "oligonucleotide," "oligomer," or "oligo" is meant a successive chain of monomers (e.g., glycosides of heterocyclic bases) connected via internucleoside linkages. The linkage between two successive monomers in the oligo consists of 2 to 4, desirably 3, groups/atoms selected from -CH2-, -O-, -S-, -NRH-, >C=O, >C=NRH, >C=S, -Si(R")2-, -SO-, -S(O)2-, -P(O)2-, -PO(BH3)-, -P(O,S)-, -P(S)2-, -PO(R")-, -PO(OCH3)-, and -PO(NHRH)-, where RH is selected from hydrogen and C1-4-alkyl, and R" is selected from C1-6-alkyl and phenyl. Illustrative examples of such linkages are -CH2-CH2-CH2-, -CH2-CO-CH2-, -CH2-CHOH-CH2-, -O-CH2-O-, -O-CH2-CH2-, -O-CH2-CH= (including R5 when used as a linkage to a succeeding monomer), -CH2-CH2-O-, -NRH-CH2-CH2-, -CH2-CH2-NRH-, -CH2-NRH-CH2-, -O-CH2-CH2-NRH-, -NRH-CO-O-, -NRH-CO-NRH-, -NRH-CS-NRH-, -NRH-C(=NRH)-NRH-, -NRH-CO-CH2-NRH-, -O-CO-O-, -O-CO-CH2-O-, -O-CH2-CO-O-, -CH2-CO-NRH-, -O-CO-NRH-, -NRH-CO-CH2-, -O-CH2-CO-NRH-, -O-CH2-CH2-NRH-, -CH=N-O-, -CH2-NRH-O-, -CH2-O-N= (including R5 when used as a linkage to a succeeding monomer), -CH2-O-NRH-, -CO-NRH-CH2-, -CH2-NRH-O-, -CH2-NRH-CO-, -O-NRH-CH2-, -O-NRH-, -O-CH2-S-, -S-CH2-O-, -CH2-CH2-S-, -O-CH2-CH2-S-, -S-CH2-CH= (including R5 when used as a linkage to a succeeding monomer), -S-CH2-CH2-, -S-CH2-CH2-O-, -S-CH2-CH2-S-, -CH2-S-CH2-, -CH2-SO-CH2-, -CH2-SO2-CH2-, -O-SO-O-, -O-S(O)2-O-, -O-S(O)2-CH2-, -O-S(O)2-NRH-, -NRH-S(O)2-CH2-, -O-S(O)2-CH2-, -O-P(O)2-O-, -O-P(O,S)-O-, -O-P(S)2-O-, -S-P(O)2-O-, -S-P(O,S)-O-, -S-P(S)2-O-, -O-P(O)2-S-, -O-P(O,S)-S-, -O-P(S)2-S-, -S-P(O)2-S-, -S-P(O,S)-S-, -S-P(S)2-S-, -O-PO(R")-O-, -O-PO(OCH3)-O-, -O-PO(OCH2CH3)-O-, -O-PO(OCH2CH2S-R)-O-, -O-PO(BH3)-O-, -O-PO(NHRN)-O-, -O-P(O)2-NRH-, -NRH-P(O)2-O-, -O-P(O,NRH)-O-, -CH2-P(O)2-O-, -O-P(O)2-CH2-, and -O-Si(R")2-O-; among which -CH2-CO-NRH-, -CH2-NRH-O-, -S-CH2-O-, -O-P(O)2-O-, -O-P(O,S)-O-, -O-P(S)2-O-, -NRH-P(O)2-O-, -O-P(O,NRH)-O-, -O-PO(R")-O-, -O-PO(CH3)-O-, and -O-PO(NHRN)-O-, where RH is selected from hydrogen and C1-4-alkyl, and R" is selected from C1-6-alkyl and phenyl, are especially desirable. Further illustrative examples are given in Mesmaeker et al., Current Opinion in Structural Biology 1995, 5, 343-355 and Susan M. Freier and Karl-Heinz Altmann, Nucleic Acids Research, 1997, vol. 25, pp 4429-4443. The left-hand side of the internucleoside linkage is bound to the 5-membered ring as substituent P at the 3'-position, whereas the right-hand side is bound to the 5'-position of a preceding monomer.
By "succeeding monomer" is meant the neighboring monomer in the 5'-terminal direction, and by "preceding monomer" is meant the neighboring monomer in the 3'-terminal direction.
By "LNA spiked oligo" is meant an oligonucleotide, such as a DNA oligonucleotide, wherein at least one unit (and preferably not all units) has been substituted by the corresponding LNA nucleoside monomer.
The term "Tm" is used in reference to the "melting temperature." The melting temperature is the temperature at which 50% of a population of double-stranded nucleic acid molecules becomes dissociated into single strands. The equation for calculating the Tm of nucleic acids is well-known in the art. The Tm of a hybrid nucleic acid is often estimated using a formula adopted from hybridization assays in 1 M salt, and commonly used for calculating Tm for PCR primers: Tm = [(number of A+T) x 2°C + (number of G+C) x 4°C]. C. R. Newton et al., PCR, 2nd Ed., Springer-Verlag (New York: 1997), p. 24. This formula was found to be inaccurate for primers longer than 20 nucleotides. Id. Other more sophisticated computations exist in the art which take structural as well as sequence characteristics into account for the calculation of Tm. A calculated Tm is merely an estimate; the optimum temperature is commonly determined empirically. A nucleic acid compound that has a Tm differential of a specified amount (e.g., less than 15, 12, 10, 8, 6, 4, 2, or 1°C) means the nucleic acid exhibits that specified Tm differential when incorporated into a specified 9-mer oligonucleotide with respect to the four complementary variants, as defined immediately below.
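As an illustration of the estimate quoted above, Tm = [(number of A+T) x 2°C + (number of G+C) x 4°C], the following minimal Python sketch applies the formula to an unmodified DNA sequence. The function name `wallace_tm` is a label chosen here for illustration, and the restriction to A, T, G, and C is an assumption of the sketch. As noted above, the estimate becomes inaccurate for sequences longer than 20 nucleotides and does not account for LNA or other modified units.

```python
def wallace_tm(seq):
    """Estimate Tm in degrees C as 2 x (A+T) + 4 x (G+C).

    Only unmodified DNA bases (A, T, G, C) are accepted in this sketch.
    """
    seq = seq.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if at_count + gc_count != len(seq):
        raise ValueError("sequence contains characters other than A, T, G, C")
    return 2 * at_count + 4 * gc_count


# Example: a 9-mer with 5 A/T and 4 G/C bases gives 5*2 + 4*4 = 26
print(wallace_tm("GTGAAATGC"))
```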
Unless otherwise indicated, a Tm differential provided by a particular modified base is calculated by the following protocol (steps a) through d)):
a) incorporating the modified base of interest into the following oligonucleotide 5'-d(GTGAMATGC), wherein M is the modified base;
b) mixing 1.5 x 10^-6 M of the oligonucleotide having incorporated therein the modified base with each of 1.5 x 10^-6 M of the four oligonucleotides having the sequence 3'-d(CACTYTACG), wherein Y is A, C, G, T, respectively, in a buffer of 10 mM sodium phosphate, 100 mM sodium chloride, 0.1 mM EDTA, pH 7.0;
c) allowing the oligonucleotides to hybridize; and
d) detecting the Tm for each of the four hybridized oligonucleotides by heating the hybridized oligonucleotides and observing the temperature at which the maximum of the first derivative of the melting curve recorded at a wavelength of 260 nm is obtained.
Unless otherwise indicated, a Tm differential for a particular modified base is determined by subtracting the lowest Tm value determined in steps a) through d) immediately above from the highest Tm value determined by steps a) through d) immediately above.
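The final arithmetic step of this protocol can be sketched in Python as follows. The sketch treats the Tm differential as the spread between the highest and lowest of the four measured Tm values, which gives a non-negative number comparable to thresholds such as "15°C or less". The function name and the Tm values in the example are hypothetical placeholders, not measured data.

```python
def tm_differential(tm_by_opposing_base):
    """Spread (highest minus lowest) of the Tm values measured against the
    four oligonucleotides 3'-d(CACTYTACG) with Y = A, C, G, T."""
    if set(tm_by_opposing_base) != {"A", "C", "G", "T"}:
        raise ValueError("expected one Tm value for each of Y = A, C, G, T")
    values = list(tm_by_opposing_base.values())
    return max(values) - min(values)


# Hypothetical Tm readings in degrees C, for illustration only
example_tms = {"A": 30.0, "C": 28.5, "G": 31.0, "T": 29.0}
differential = tm_differential(example_tms)
print(differential)          # spread between 31.0 and 28.5
print(differential <= 4.0)   # compare with one of the listed thresholds
```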
By "variance in Tm" is meant the variance in the values of the melting temperatures for a population of nucleic acids. The Tm for each nucleic acid is determined by experimentally measuring or computationally predicting the temperature at which 50% of a population of double-stranded molecules with the sequence of the nucleic acid becomes dissociated into single strands. For a nucleic acid with only A, T, C, G, and/or U bases, the Tm is the temperature at which 50% of a population of 100% complementary double-stranded molecules with the sequence of the nucleic acid becomes dissociated into single strands. For determining the Tm variance when a nucleic acid has one or more nucleobases other than A, T, C, G, or U, the Tm of this "modified" nucleic acid is approximated by determining the Tm for each possible double stranded molecule in which one strand is the modified nucleic acid and the other strand has either A, T, C, or G in each position corresponding to a nucleobase other than A, T, C, G, or U in the modified nucleic acid. For example, if the modified nucleic acid has the sequence XMX in which X is 0, 1, or more A, T, C, G, or U bases and M is any other nucleobase or nucleosidic base, the Tm is calculated for each possible double stranded molecule in which one strand is XMX and the other strand is X'YX' in which X' is the base complementary to the corresponding X base and Y is either A, T, C, or G. The average is then calculated for the Tm values for each possible double stranded molecule (i.e., four possible duplexes per modified nucleobase or nucleoside base in the modified nucleic acid) and used as the approximate Tm value for the modified nucleic acid.
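The averaging procedure in this definition can be sketched in Python as shown below. Several choices are assumptions of the sketch rather than part of the definition: any character other than A, T, C, G, or U marks a modified position; all combinations of A, T, C, and G are enumerated when more than one modified position is present; the population variance is used; and `toy_duplex_tm` is a hypothetical stand-in (the 2°C/4°C counting rule applied to the partner strand) for a measured or properly predicted duplex Tm.

```python
from itertools import product
from statistics import mean, pvariance

STANDARD_BASES = set("ATCGU")
COMPLEMENT = {"A": "T", "T": "A", "U": "A", "C": "G", "G": "C"}


def toy_duplex_tm(strand, partner):
    """Hypothetical stand-in for a real Tm measurement or prediction:
    2 degrees per A/T and 4 degrees per G/C in the partner strand."""
    at_count = partner.count("A") + partner.count("T")
    gc_count = partner.count("G") + partner.count("C")
    return 2.0 * at_count + 4.0 * gc_count


def approximate_tm(seq, duplex_tm=toy_duplex_tm):
    """Average Tm over every partner strand that places A, T, C, or G
    opposite each modified position. partner[i] is the base opposite seq[i]."""
    seq = seq.upper()
    modified = [i for i, base in enumerate(seq) if base not in STANDARD_BASES]
    tms = []
    for choice in product("ATCG", repeat=len(modified)):
        partner = [COMPLEMENT.get(base, "N") for base in seq]
        for index, opposing in zip(modified, choice):
            partner[index] = opposing
        tms.append(duplex_tm(seq, "".join(partner)))
    return mean(tms)


def tm_variance(population, duplex_tm=toy_duplex_tm):
    """Population variance of the (approximate) Tm values of the sequences."""
    return pvariance([approximate_tm(seq, duplex_tm) for seq in population])


print(approximate_tm("GTGAMATGC"))               # mean over Y = A, T, C, G
print(tm_variance(["GTGAMATGC", "GTGAAATGC"]))   # variance across two probes
```

Replacing `toy_duplex_tm` with experimentally measured values or a validated prediction method is expected in practice; the enumeration and averaging steps stay the same.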
By "capture efficiency" is meant the amount of target nucleic acid(s) bound to a particular nucleic acid or a population of nucleic acids. Standard methods can be used to
calculate the capture efficiency by measuring the amount of bound target nucleic acid(s) and/or measuring the amount of unbound target nucleic acid(s). The capture efficiency of a nucleic acid or nucleic acid population of the invention is typically compared to the capture efficiency of a control nucleic acid or nucleic acid population under the same incubation conditions (e.g., using same buffer and temperature).
For example, a control nucleic acid may have β-D-2-deoxyribose instead of one or more bicyclic or sugar groups of an LNA unit or other modified or non-naturally-occurring units in a nucleic acid of the invention. In some embodiments, the nucleic acid of the invention and the control nucleic acid only have naturally-occurring nucleobases. If a nucleic acid of the invention has one or more non-naturally-occurring nucleobases, the capture efficiency of the corresponding control nucleic acid is calculated as the average capture efficiency for all of the nucleic acids that have either A, T, C, or G in each position corresponding to a non-naturally-occurring nucleobase in the nucleic acid of the invention.
Monomers are referred to as being "complementary" if they contain nucleobases that can form hydrogen bonds according to Watson-Crick base-pairing rules (e.g., G with C, A with T, or A with U) or other hydrogen bonding motifs such as for example diaminopurine with T, inosine with C, and pseudoisocytosine with G.
By "substantial complementarity" is meant having a sequence that is at least 60, 70, 80, 90, 95, or 100% complementary to that of another sequence. Sequence complementarity is typically measured using sequence analysis software with the default parameters specified therein (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, WI 53705). This software program matches similar sequences by assigning degrees of homology to various substitutions, deletions, and other modifications. The term "homology" refers to a degree of complementarity. There can be partial homology or complete homology (i.e., identity). A partially complementary sequence that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid is referred to using the functional term "substantially homologous."
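For orientation only, the percentage in the complementarity definition above can be illustrated with a simplified Python sketch. It is not the sequence analysis software referred to above: it assumes two DNA sequences of equal length, both written 5' to 3', paired antiparallel without gaps, and it counts only Watson-Crick pairs. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def percent_complementarity(probe, target):
    """Percentage of positions forming Watson-Crick pairs when probe and
    target (both written 5'->3', equal length) are aligned antiparallel."""
    probe, target = probe.upper(), target.upper()
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        1 for p, t in zip(probe, reversed(target)) if COMPLEMENT.get(p) == t
    )
    return 100.0 * paired / len(probe)


def is_substantially_complementary(probe, target, threshold=60.0):
    """True if the ungapped complementarity reaches the chosen threshold."""
    return percent_complementarity(probe, target) >= threshold


print(percent_complementarity("GTGAAATGC", "GCATTTCAC"))  # fully paired
print(percent_complementarity("AAAA", "TTTA"))            # one mismatch in four
```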
When used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, the term "substantially homologous" refers to a probe that can hybridize to a strand of the double-stranded nucleic acid sequence under conditions of low stringency, e.g. using a hybridization buffer comprising 20% formamide in 0.8M saline/0.08M sodium citrate
(SSC) buffer at a temperature of 37°C and remaining bound when subject to washing once with that SSC buffer at 37°C.
When used in reference to a single-stranded nucleic acid sequence, the term "substantially homologous" refers to a probe that can hybridize to (i.e., is the complement of) the single-stranded nucleic acid template sequence under conditions of low stringency, e.g., using a hybridization buffer comprising 20% formamide in 0.8M saline/0.08M sodium citrate (SSC) buffer at a temperature of 37°C and remaining bound when subject to washing once with that SSC buffer at 37°C.
By "internal probe" is meant a nucleic acid (e.g., a probe or primer) that hybridizes to either only one exon or only one intron of a nucleic acid (e.g., mRNA). The internal probe may hybridize to the 5' end of the exon or intron, the 3' end of the exon or intron, or between the 5' end and the 3' end of the exon or intron. Desirably, the internal probe is at least 90, 95, 96, 97, 98, 99, or 100% identical to the corresponding region of a target nucleic acid.
By "merged probe" is meant a nucleic acid (e.g., a probe or primer) that hybridizes to more than one exon and/or intron of a nucleic acid (e.g., mRNA). Desirably, the merged probe hybridizes to two consecutive exons (e.g., exons in a mature mRNA transcript that may or may not be consecutive in the corresponding DNA molecule). In another desirable embodiment, the merged probe hybridizes to an exon and the consecutive intron. In desirable embodiments, the merged probe hybridizes to the same number of nucleotides in each exon or to the same number of nucleotides in the exon and intron. In various embodiments, the length of the region of the merged probe that hybridizes to one exon differs by less than 60, 40, 20, 10, or 5% from the length of the region of the merged probe that hybridizes to the other exon or to the intron. Desirably, the merged probe is at least 90, 95, 96, 97, 98, 99, or 100% identical to the corresponding region of a target nucleic acid.
By "poly-T20 tail" is meant a DNA polymer consisting of 20 DNA-t units added by polymerase chain reaction as a tail to a nucleic acid sequence, which is subsequently cloned in a plasmid vector allowing in vitro synthesis of poly(A)20 polyadenylated RNA.
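The length-balance criterion in the merged probe definition above can be checked with a short Python sketch. The definition does not state which length the percentage is taken relative to; this sketch assumes the longer of the two hybridizing regions. The function names and the `junction_index` parameter (the number of probe nucleotides on the first exon, before the exon-exon or exon-intron junction) are hypothetical.

```python
def length_difference_percent(len_a, len_b):
    """Difference between two region lengths as a percentage of the longer
    region (the choice of denominator is an assumption of this sketch)."""
    if len_a <= 0 or len_b <= 0:
        raise ValueError("region lengths must be positive")
    return 100.0 * abs(len_a - len_b) / max(len_a, len_b)


def is_balanced_merged_probe(junction_index, probe_length, max_percent=20.0):
    """True if the two regions flanking the junction differ in length by
    less than max_percent."""
    left = junction_index
    right = probe_length - junction_index
    return length_difference_percent(left, right) < max_percent


# A 45-mer with 22 nucleotides on one exon and 23 on the next
print(length_difference_percent(22, 23))
print(is_balanced_merged_probe(22, 45))
```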
By "mixmer" or "mixmer probe" is meant a nucleic acid (e.g., a probe or primer) that contains at least one LNA unit and at least one RNA or DNA unit (e.g., a naturally-occurring RNA or DNA unit).
By "corresponding unmodified reference nucleobase" is meant a nucleobase that is not part of an LNA unit and is in the same orientation as the nucleobase in an LNA unit.
By "mutation" is meant an alteration in a naturally-occurring or reference nucleic acid sequence, such as an insertion, deletion, frameshift mutation, silent mutation, nonsense mutation, or missense mutation. Desirably, the amino acid sequence encoded by the nucleic acid sequence has at least one amino acid alteration from a naturally-occurring sequence.
By "selecting" is meant substantially partitioning a molecule from other molecules in a population. Desirably, the partitioning provides at least a 2-fold, desirably, a 30-fold, more desirably, a 100-fold, and most desirably, a 1,000-fold enrichment of a desired molecule relative to undesired molecules in a population following the selection step. The selection step may be repeated a number of times, and different types of selection steps may be combined in a given approach. The population desirably contains at least 10^9 molecules, more desirably at least 10^11, 10^13, or 10^14 molecules and, most desirably, at least 10^15 molecules.
By a "population" is meant more than one nucleic acid. A "population" according to the invention desirably means more than 10^1, 10^2, 10^3, or 10^4 different molecules.
By "photochemically active groups" is meant compounds which are able to undergo chemical reactions upon irradiation with light. Illustrative examples of functional groups are quinones, especially 6-methyl-1,4-naphthoquinone, anthraquinone, naphthoquinone, and 1,4-dimethyl-anthraquinone, diazirines, aromatic azides, benzophenones, psoralens, diazo compounds, and diazirino compounds.
By "thermochemically reactive group" is meant a functional group which is able to undergo thermochemically-induced covalent bond formation with other groups. Illustrative examples of functional parts of thermochemically reactive groups are carboxylic acids, carboxylic acid esters such as activated esters, carboxylic acid halides such as acid fluorides, acid chlorides, acid bromides, and acid iodides, carboxylic acid azides, carboxylic acid hydrazides, sulfonic acids, sulfonic acid esters, sulfonic acid halides, semicarbazides, thiosemicarbazides, aldehydes, ketones, primary alcohols, secondary alcohols, tertiary alcohols, phenols, alkyl halides, thiols, disulphides, primary amines, secondary amines, tertiary amines, hydrazines, epoxides, maleimides, and boronic acid derivatives.
By "chelating group" is meant a molecule that contains more than one binding site and frequently binds to another molecule, atom, or ion through more than one binding site at the same time. Examples of functional parts of chelating groups are iminodiacetic acid, nitrilotriacetic acid, ethylenediamine tetraacetic acid (EDTA), and aminophosphonic acid.
By "reporter group" is meant a group which is detectable either by itself or as a part of a detection series. Examples of functional parts of reporter groups are biotin, digoxigenin, fluorescent groups (e.g., groups which are able to absorb electromagnetic radiation, e.g., light or X-rays, of a certain wavelength, and which subsequently reemit the energy absorbed as radiation of longer wavelength; such as dansyl (5-dimethylamino-1-naphthalenesulfonyl), DOXYL (N-oxyl-4,4-dimethyloxazolidine), PROXYL (N-oxyl-2,2,5,5-tetramethylpyrrolidine), TEMPO (N-oxyl-2,2,6,6-tetramethylpiperidine), dinitrophenyl, acridines, coumarins, Cy3 and Cy5 (trademarks for Biological Detection Systems, Inc.), erythrosine, coumaric acid, umbelliferone, Texas red, rhodamine, tetramethyl rhodamine, Rox, 7-nitrobenzo-2-oxa-1-diazole (NBD), pyrene, fluorescein, Europium, Ruthenium, Samarium, and other rare earth metals), radioisotopic labels, chemiluminescence labels (i.e., labels that are detectable via the emission of light during a chemical reaction), spin labels (a free radical (e.g., substituted organic nitroxides) or other paramagnetic probes (e.g., Cu2+ or Mg2+) bound to a biological molecule being detectable by the use of electron spin resonance spectroscopy), enzymes (such as peroxidases, alkaline phosphatases, β-galactosidases, and glucose oxidases), antigens, antibodies, haptens (e.g., groups which are able to combine with an antibody, but which cannot initiate an immune response by themselves, such as peptides and steroid hormones), carrier systems for cell membrane penetration, fatty acid units, steroid moieties (cholesteryl), vitamin A, vitamin D, vitamin E, folic acid peptides for specific receptors, groups for mediating endocytosis, epidermal growth factor (EGF), bradykinin, and platelet derived growth factor (PDGF). Especially desirable groups are biotin, fluorescein, Texas Red, rhodamine, dinitrophenyl, digoxigenin, Ruthenium, Europium, Cy5, and Cy3.
By "ligand" is meant a compound which binds. Ligands can comprise functional groups such as aromatic groups (such as benzene, pyridine, naphthalene, anthracene, and phenanthrene), heteroaromatic groups (such as thiophene, furan, tetrahydrofuran, pyridine, dioxane, and pyrimidine), carboxylic acids, carboxylic acid esters, carboxylic acid halides, carboxylic acid azides, carboxylic acid hydrazides, sulfonic acids, sulfonic acid esters, sulfonic acid halides, semicarbazides, thiosemicarbazides, aldehydes, ketones, primary alcohols, secondary alcohols, tertiary alcohols, phenols, alkyl halides, thiols, disulphides, primary amines, secondary amines, tertiary amines, hydrazines, epoxides, maleimides, C1-C20 alkyl groups optionally interrupted or terminated with one or more heteroatoms such as oxygen atoms, nitrogen atoms, and/or sulphur atoms, optionally containing aromatic or mono/polyunsaturated hydrocarbons, polyoxyethylene such as polyethylene glycol,
oligo/polyamides such as poly-α-alanine, polyglycine, polylysine, peptides, oligo/polysaccharides, oligo/polyphosphates, toxins, antibiotics, cell poisons, and steroids. "Affinity ligands" include functional groups or biomolecules that have a specific affinity for sites on particular proteins, antibodies, poly- and oligosaccharides, and other biomolecules. It should be understood that the above-mentioned specific examples under DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands correspond to the "active/functional" part of the groups in question. For the person skilled in the art it is furthermore clear that DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands are typically represented in the form M-K-, where M is the "active/functional" part of the group in question and where K is a spacer through which the "active/functional" part is attached to the 5- or 6-membered ring. Thus, it should be understood that the group B, in the case where B is selected from DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands, has the form M-K-, where M is the "active/functional" part of the DNA intercalator, photochemically active group, thermochemically active group, chelating group, reporter group, and ligand, respectively, and where K is an optional spacer comprising 1-50 atoms, desirably 1-30 atoms, in particular 1-15 atoms, between the 5- or 6-membered ring and the "active/functional" part.
By "spacer" is meant a thermochemically and photochemically non-active distance-making group that is used to join two or more different moieties of the types defined above. Spacers are selected on the basis of a variety of characteristics including their hydrophobicity, hydrophilicity, molecular flexibility, and length (e.g., Hermanson et al., "Immobilized Affinity Ligand Techniques," Academic Press, San Diego, California (1992)). Generally, the length of the spacers is less than or about 400 Å, in some applications desirably less than 100 Å. The spacer, thus, comprises a chain of carbon atoms optionally interrupted or terminated with one or more heteroatoms, such as oxygen atoms, nitrogen atoms, and/or sulphur atoms. Thus, the spacer K may comprise one or more amide, ester, amino, ether, and/or thioether functionalities, and optionally aromatic or mono/polyunsaturated hydrocarbons, polyoxyethylene such as polyethylene glycol, oligo/polyamides such as poly-α-alanine, polyglycine, polylysine, peptides, oligosaccharides, or oligo/polyphosphates. Moreover, the spacer may consist of combined units thereof. The length of the spacer may vary, taking into consideration the desired or necessary positioning
and spatial orientation of the "active/functional" part of the group in question in relation to the 5- or 6-membered ring. In particular embodiments, the spacer includes a chemically cleavable group. Examples of such chemically cleavable groups include disulphide groups cleavable under reductive conditions and peptide fragments cleavable by peptidases.
By "target nucleic acid" or "nucleic acid target" is meant a particular nucleic acid sequence of interest. Thus, the "target" can exist in the presence of other nucleic acid molecules or within a larger nucleic acid molecule.
By "solid support" is meant any rigid or semi-rigid material to which a nucleic acid binds or is directly or indirectly attached. The support can be any porous or non-porous water insoluble material, including without limitation, membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, strips, plates, rods, polymers, particles, microparticles, capillaries, and the like. The support can have a variety of surface forms, such as wells, trenches, pins, channels and pores.
By an "array" is meant a fixed pattern of at least two different immobilized nucleic acids on a solid support. Desirably, the array includes at least 10, more desirably, at least 10³, and, most desirably, at least 10⁴ different nucleic acids.
By "antisense nucleic acid" is meant a nucleic acid, regardless of length, that is complementary to a coding strand or mRNA of interest. In some embodiments, the antisense molecule inhibits the expression of only one nucleic acid, and in other embodiments, the antisense molecule inhibits the expression of more than one nucleic acid. Desirably, the antisense nucleic acid decreases the expression or biological activity of a nucleic acid or encoded protein by at least 20, 40, 50, 60, 70, 80, 90, 95, or 100%. An antisense molecule can be introduced, e.g., to an individual cell or to whole animals; for example, it may be introduced systemically via the bloodstream. Desirably, a region of the antisense nucleic acid or the entire antisense nucleic acid is at least 70, 80, 90, 95, 98, or 100% complementary to a coding sequence, regulatory region (5' or 3' untranslated region), or an mRNA of interest. Desirably, the region of complementarity includes at least 5, 10, 20, 30, 50, 75, 100, 200, 500, 1000, 2000, or 5000 nucleotides or includes all of the nucleotides in the antisense nucleic acid. In some embodiments, the antisense molecule is less than 200, 150, 100, 75, 50, or 25 nucleotides in length. In other embodiments, the antisense molecule is less than 50,000; 10,000; 5,000; or 2,000 nucleotides in length. In certain embodiments, the antisense molecule is at least 200, 300, 500, 1000, or 5000 nucleotides in length. In some
embodiments, the number of nucleotides in the antisense molecule is contained in one of the following ranges: 5-15 nucleotides, 16-20 nucleotides, 21-25 nucleotides, 26-35 nucleotides, 36-45 nucleotides, 46-60 nucleotides, 61-80 nucleotides, 81-100 nucleotides, 101-150 nucleotides, or 151-200 nucleotides, inclusive. In addition, the antisense molecule may contain a sequence that is less than a full-length sequence or may contain a full-length sequence.
By "double stranded nucleic acid" is meant a nucleic acid containing a region of two or more nucleotides that are in a double stranded conformation. In various embodiments, the double stranded nucleic acids consist entirely of LNA units or a mixture of LNA units, ribonucleotides, and/or deoxynucleotides. The double stranded nucleic acid may be a single molecule with a region of self-complementarity such that nucleotides in one segment of the molecule base-pair with nucleotides in another segment of the molecule. Alternatively, the double stranded nucleic acid may include two different strands that have a region of complementarity to each other. Desirably, the regions of complementarity are at least 70, 80, 90, 95, 98, or 100% identical. Desirably, the region of the double stranded nucleic acid that is present in a double stranded conformation includes at least 5, 10, 20, 30, 50, 75, 100, 200, 500, 1000, 2000, or 5000 nucleotides or includes all of the nucleotides in the double stranded nucleic acid. Desirable double stranded nucleic acid molecules have a strand or region that is at least 70, 80, 90, 95, 98, or 100% identical to a coding region or a regulatory sequence (e.g., a transcription factor binding site, a promoter, or a 5' or 3' untranslated region) of a nucleic acid of interest. In some embodiments, the double stranded nucleic acid is less than 200, 150, 100, 75, 50, or 25 nucleotides in length. In other embodiments, the double stranded nucleic acid is less than 50,000; 10,000; 5,000; or 2,000 nucleotides in length. In certain embodiments, the double stranded nucleic acid is at least 200, 300, 500, 1000, or 5000 nucleotides in length.
In some embodiments, the number of nucleotides in the double stranded nucleic acid is contained in one of the following ranges: 5-15 nucleotides, 16-20 nucleotides, 21-25 nucleotides, 26-35 nucleotides, 36-45 nucleotides, 46-60 nucleotides, 61-80 nucleotides, 81-100 nucleotides, 101-150 nucleotides, or 151-200 nucleotides, inclusive. In addition, the double stranded nucleic acid may contain a sequence that is less than a full-length sequence or may contain a full-length sequence.
In some embodiments, the double stranded nucleic acid inhibits the expression of only one nucleic acid, and in other embodiments, the double stranded nucleic acid molecule inhibits the expression of more than one nucleic acid. Desirably, the nucleic acid decreases
the expression or biological activity of a nucleic acid of interest or a protein encoded by a nucleic acid of interest by at least 20, 40, 50, 60, 70, 80, 90, 95, or 100%. A double stranded nucleic acid can be introduced, e.g., to an individual cell or to whole animals; for example, it may be introduced systemically via the bloodstream. In various embodiments, the double stranded nucleic acid or antisense molecule includes one or more LNA nucleotides, one or more universal bases, and/or one or more modified nucleotides in which the 2' position in the sugar (e.g., ribose or xylose) contains a halogen (such as a fluorine group) or contains an alkoxy group (such as a methoxy group) which increases the half-life of the double stranded nucleic acid or antisense molecule in vitro or in vivo compared to the corresponding double stranded nucleic acid or antisense molecule in which the corresponding 2' position contains a hydrogen or a hydroxyl group. In yet other embodiments, the double stranded nucleic acid or antisense molecule includes one or more linkages between adjacent nucleotides other than a naturally-occurring phosphodiester linkage. Examples of such linkages include phosphoramide, phosphorothioate, and phosphorodithioate linkages. Desirably, the double stranded or antisense molecule is purified.
By "purified" is meant separated from other components that naturally accompany it. Typically, a factor is substantially pure when it is at least 50%, by weight, free from proteins, antibodies, and naturally-occurring organic molecules with which it is naturally associated. Desirably, the factor is at least 75%, more desirably, at least 90%, and most desirably, at least 99%, by weight, pure. A substantially pure factor may be obtained by chemical synthesis, separation of the factor from natural sources, or production of the factor in a recombinant host cell that does not naturally produce the factor. Nucleic acids and proteins may be purified by one skilled in the art using standard techniques such as those described by Ausubel et al. (Current Protocols in Molecular Biology, John Wiley & Sons, New York, 2000). The factor is desirably at least 2, 5, or 10 times as pure as the starting material, as measured using polyacrylamide gel electrophoresis, column chromatography, optical density, HPLC analysis, or western analysis (Ausubel et al, supra). Desirable methods of purification include immunoprecipitation, column chromatography such as immunoaffinity chromatography, magnetic bead immunoaffinity purification, and panning with a plate-bound antibody.
By "treating, stabilizing, or preventing a disease, disorder, or condition" is meant preventing or delaying an initial or subsequent occurrence of a disease, disorder, or
condition; increasing the disease-free survival time between the disappearance of a condition and its reoccurrence; stabilizing or reducing an adverse symptom associated with a condition; or inhibiting or stabilizing the progression of a condition. Desirably, at least 20, 40, 60, 80, 90, or 95% of the treated subjects have a complete remission in which all evidence of the disease disappears. In another desirable embodiment, the length of time a patient survives after being diagnosed with a condition and treated with a nucleic acid of the invention is at least 20, 40, 60, 80, 100, 200, or even 500% greater than (i) the average amount of time an untreated patient survives or (ii) the average amount of time a patient treated with another therapy survives.
By "treating, stabilizing, or preventing cancer" is meant causing a reduction in the size of a tumor, slowing or preventing an increase in the size of a tumor, increasing the disease-free survival time between the disappearance of a tumor and its reappearance, preventing an initial or subsequent occurrence of a tumor, or reducing an adverse symptom associated with a tumor. In one desirable embodiment, the number of cancerous cells surviving the treatment is at least 20, 40, 60, 80, or 100% lower than the initial number of cancerous cells, as measured using any standard assay. Desirably, the decrease in the number of cancerous cells induced by administration of a nucleic acid of the invention (e.g., a nucleic acid with substantial complementarity to a nucleic acid associated with cancer such as an oncogene) is at least 2, 5, 10, 20, or 50-fold greater than the decrease in the number of non-cancerous cells. In yet another desirable embodiment, the number of cancerous cells present after administration of a nucleic acid of the invention is at least 2, 5, 10, 20, or 50-fold lower than the number of cancerous cells present prior to the administration of the compound or after administration of a buffer control.
Desirably, the methods of the present invention result in a decrease of 20, 40, 60, 80, or 100% in the size of a tumor as determined using standard methods. Desirably, at least 20, 40, 60, 80, 90, or 95% of the treated subjects have a complete remission in which all evidence of the cancer disappears. Desirably, the cancer does not reappear or reappears after at least 5, 10, 15, or 20 years.
Exemplary cancers that can be treated, stabilized, or prevented using the above methods include prostate cancers, breast cancers, ovarian cancers, pancreatic cancers, gastric cancers, bladder cancers, salivary gland carcinomas, gastrointestinal cancers, lung cancers, colon cancers, melanomas, brain tumors, leukemias, lymphomas, and carcinomas. Benign tumors may also be treated or prevented using the methods and nucleic acids of the present invention.
By "infection" is meant the invasion of a host animal by a pathogen (e.g., a bacteria, yeast, or virus). For example, the infection may include the excessive growth of a pathogen that is normally present in or on the body of an animal or growth of a pathogen that is not normally present in or on the animal. More generally, an infection can be any situation in which the presence of a pathogen population(s) is damaging to a host. Thus, an animal is "suffering" from an infection when an excessive amount of a pathogen population is present in or on the animal's body, or when the presence of a pathogen population(s) is damaging the cells or other tissue of the animal. In one embodiment, the number of a particular genus or species of pathogen is at least 2, 4, 6, or 8 times the number normally found in the animal. A bacterial infection may be due to gram positive and/or gram negative bacteria. In desirable embodiments, the bacterial infection is due to one or more of the following bacteria: Chlamydophila pneumoniae, C. psittaci, C. abortus, Chlamydia trachomatis, Simkania negevensis, Parachlamydia acanthamoebae, Pseudomonas aeruginosa, P. alcaligenes, P. chlororaphis, P. fluorescens, P. luteola, P. mendocina, P. monteilii, P. oryzihabitans, P. pertocinogena, P. pseudalcaligenes, P. putida, P. stutzeri, Burkholderia cepacia, Aeromonas hydrophilia, Escherichia coli, Citrobacter freundii, Salmonella typhimurium, S. typhi, S. paratyphi, S. enteritidis, Shigella dysenteriae, S. flexneri, S. sonnei, Enterobacter cloacae, E. aerogenes, Klebsiella pneumoniae, K. oxytoca, Serratia marcescens, Francisella tularensis, Morganella morganii, Proteus mirabilis, Proteus vulgaris, Providencia alcalifaciens, P. rettgeri, P. stuartii, Acinetobacter calcoaceticus, A. haemolyticus, Yersinia enterocolitica, Y. pestis, Y. pseudotuberculosis, Y. intermedia, Bordetella pertussis, B. parapertussis, B. bronchiseptica, Haemophilus influenzae, H. parainfluenzae, H. haemolyticus, H. 
parahaemolyticus, H. ducreyi, Pasteurella multocida, P. haemolytica, Branhamella catarrhalis, Helicobacter pylori, Campylobacter fetus, C. jejuni, C. coli, Borrelia burgdorferi, V. cholerae, V. parahaemolyticus, Legionella pneumophila, Listeria monocytogenes, Neisseria gonorrhoeae, N. meningitidis, Kingella denitrificans, K. kingae, K. oralis, Moraxella catarrhalis, M. atlantae, M. lacunata, M. nonliquefaciens, M. osloensis, M. phenylpyruvica, Gardnerella vaginalis, Bacteroides fragilis, Bacteroides distasonis, Bacteroides 3452A homology group, Bacteroides vulgatus, B. ovatus, B. thetaiotaomicron, B. uniformis, B. eggerthii, B. splanchnicus, Clostridium difficile, Mycobacterium tuberculosis, M. avium, M. intracellulare, M. leprae, C. diphtheriae, C. ulcerans, C. accolens, C. afermentans, C. amycolatum, C. argentorense, C. auris, C. bovis, C. confusum, C. coyleae, C. durum, C. falsenii, C. glucuronolyticum, C. imitans, C. jeikeium,
C. kutscheri, C. kroppenstedtii, C. lipophilum, C. macginleyi, C. matruchoti, C. mucifaciens, C. pilosum, C. propinquum, C. renale, C. riegelii, C. sanguinis, C. singulare, C. striatum, C. sundsvallense, C. thomssenii, C. urealyticum, C. xerosis, Streptococcus pneumoniae, S. agalactiae, S. pyogenes, Enterococcus avium, E. casseliflavus, E. cecorum, E. dispar, E. durans, E. faecalis, E. faecium, E. flavescens, E. gallinarum, E. hirae, E. malodoratus, E. mundtii, E. pseudoavium, E. raffinosus, E. solitarius, Staphylococcus aureus, S. epidermidis, S. saprophyticus, S. intermedius, S. hyicus, S. haemolyticus, S. hominis, and/or S. saccharolyticus. Desirably, a nucleic acid is administered in an amount sufficient to prevent, stabilize, or inhibit the growth of pathogenic bacteria or to kill the bacteria.
In various embodiments, the viral infection relevant to the methods of the invention is an infection by one or more of the following viruses: West Nile virus (e.g., Samuel, "Host genetic variability and West Nile virus susceptibility," Proc. Natl. Acad. Sci. USA August 21, 2002; Beasley, Virology 296:17-23, 2002), Hepatitis, picornavirus, polio, HIV, coxsackie, herpes (e.g., zoster, simplex, EBV, or CMV), adenovirus, retrovirus, flavi, pox, rhabdovirus, picornavirus (e.g., coxsackie, entero, hoof and mouth, polio, or rhinovirus), St. Louis encephalitis, Epstein-Barr, myxovirus, JC, coxsackievirus B, togavirus, measles, paramyxovirus, echovirus, bunyavirus, cytomegalovirus, varicella-zoster, mumps, equine encephalitis, lymphocytic choriomeningitis, rabies, simian virus 40, polyoma virus, parvovirus, papilloma virus, primate adenovirus, and/or BK.
By "mammal in need of treatment" is meant a mammal in which a disease, disorder, or condition is treated, stabilized, or prevented by the administration of a nucleic acid of the invention.
Other aspects and embodiments of the invention are in the detailed description and claims below. Additionally, other nucleic acids and methods described in U.S.S.N. 10/105,639 (Jakobsen et al., "Modified Oligonucleotides and Uses Thereof") or U.S.S.N. 60/410,061 (Ramsing et al., "Populations of Oligonucleotides with Duplex Stabilizing Properties and Uses Thereof"), which are hereby incorporated by reference, can be used in the present invention.
Brief Description Of The Drawings
Figure 1 shows the structures of selected nucleotide monomers: DNA (T), LNA (TL), pyrene DNA (Py), 2'-OMe-RNA [2'-OMe(T)], abasic LNA (AbL), phenyl LNA (17a), and pyrenyl LNA (17d).
Figure 2 illustrates the chemical structures of Selective Binding Complementary (SBC) nucleotides.
Figure 3 illustrates a synthetic route.
Figure 4 shows the base-pairing between modified bases and naturally-occurring nucleotides. These modified bases may be incorporated as part of an LNA, DNA, or RNA unit and used in any of the oligomers of the invention.
Figures 5-8 illustrate various synthetic routes.
Figure 9 is a schematic illustration of the use of an exemplary synthesis for LNA-furanoPyr-SBC-C.
Figures 10 and 11 illustrate various synthetic routes.
Figure 12 shows the sensitivity of 50-mer LNA capture probes compared to 50-mer DNA capture probes at 5, 10, and 20 μM, respectively. SWI5-specific 50-mer DNA oligonucleotides (1st, 3rd, and 5th bars) and 50-mer capture probes with an LNA nucleotide incorporated at every third nucleotide position (2nd, 4th, and 6th bars) were printed at the oligo concentration indicated below. The slides were hybridized at 65°C in 3x SSC (Figure 12 (left)) and at 70°C in 3x SSC (Figure 12 (right)).
Figure 13 shows the specificity of 40-mer LNA capture probes (bars 7-12) compared to DNA capture probes (bars 1-6). The hybridizations were carried out at 65°C in 3x SSC. Bars 1 and 7 represent perfectly matched duplexes; bars 2 and 8, 3 and 9, 4 and 10, 5 and 11, and 6 and 12 represent duplexes with 1, 2, 3, 4, and 5 mismatches, respectively. The in vitro RNA used was SWI5 in Figure 13 (left) and THI4 in Figure 13 (right).
Figure 14 shows the detection principle for alternative exon skipping in the C. elegans let-2 gene using LNA oligonucleotide capture probes and comparative expression profiling.
Figure 15 shows the detection of alternative splicing of C. elegans let-2 exons 9 and 10 using LNA-modified capture probes.
Figure 16 shows the comparison of DNA and LNA-modified oligonucleotide capture probes in the specific capture of the C. elegans T01D3.3 mRNA, exon 4.
Figure 17 illustrates the LNA exon-exon junction (merged) probe concept.
Figure 18 shows the capture probe specificity for the C. elegans T01D3.3 mRNA, exon 4 (Figure 18 (upper)) and exon 5 (Figure 18 (lower)) as validated by short complementary target oligonucleotides.
Figure 19 shows the construction of the recombinant splice variants in the in vitro transcription vector. The small bars show the location of the hybridization for the
oligonucleotide capture probes used in this example. The sequences of the capture probes are described herein.
Figure 20 (upper, LNA probes; lower, DNA control probes) shows the detection of splice variants #1 and #2, respectively, using merged capture probes in a comparative, two-color hybridization.
Figure 21 shows the sensitivity of 50-mer LNA capture probes compared to 50-mer DNA capture probes. SWI5-specific 50-mer DNA oligonucleotides and 50-mer capture probes with an LNA nucleotide incorporated at every second (LNA2) or third (LNA3) nucleotide position were compared. The slides were hybridized at 65°C in 3x SSC.
Figure 22 is a bar graph of the signal intensities of a patient DNA sample hybridized to an array of the invention. The names of the probes in Figure 22 and in the table in Example 7 match, although the numbers used in Figure 22 are abbreviated, e.g., probe No. 10580 Menkes.14 50NH2C6-2.LNA in the aforementioned table corresponds to the second probe counted from the left, "14.2" LNA, in the lower graph of Figure 22.
Figure 23 is a graph comparing the spot intensity for probes of the invention with different LNA substitution patterns.
Figure 24 is a bar graph of the spot intensity for LNA probes for different exons.
Figure 25 illustrates a synthetic route.
Figure 26 is a flow chart of the steps of oligo design software of the invention. The OligoDesign software features LNA-modified oligonucleotide secondary structure prediction, LNA-spiked oligonucleotide melting temperature prediction, genome-wide cross-hybridization prediction, secondary structure prediction of the target, and recognition and filtering of the target in the genome. These features are determined for each possible probe of the query gene and presented to an artificial neural network. The probes are then ranked according to the neural network prediction, and the top-scoring probes are returned.
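The enumerate, score, and rank flow described for Figure 26 can be sketched as follows. This is only an illustration under stated assumptions: the two features (GC fraction and longest homopolymer run) and the logistic `toy_score` function are hypothetical stand-ins for the structure, melting temperature, and cross-hybridization predictions and for the trained neural network, none of which are reproduced here.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class Candidate:
    start: int       # 0-based start of the probe window in the query
    sequence: str    # probe-length window of the query sequence
    features: tuple  # (gc_fraction, homopolymer_fraction), hypothetical features


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def candidate_probes(query: str, length: int):
    """Yield every possible probe window of the query with its features."""
    for start in range(len(query) - length + 1):
        seq = query[start:start + length]
        feats = (gc_fraction(seq), longest_homopolymer(seq) / length)
        yield Candidate(start, seq, feats)


def toy_score(features: tuple) -> float:
    """Hypothetical stand-in for the neural network prediction (0..1)."""
    gc, homopolymer = features
    z = 2.0 - 4.0 * abs(gc - 0.5) - 6.0 * homopolymer
    return 1.0 / (1.0 + exp(-z))


def rank_probes(query: str, length: int, top_n: int):
    """Score all candidate probes and return the top_n as (score, Candidate)."""
    scored = [(toy_score(c.features), c) for c in candidate_probes(query, length)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for score, cand in rank_probes("AAAAAAAAAAACGTGCATGC", length=10, top_n=3):
        print(f"{cand.start:3d} {cand.sequence} {score:.3f}")
```

In a real design run, the feature tuple would carry the predicted quantities named in the figure description, and the scoring function would be the trained network.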
Figure 27 is a schematic illustration of the OligoDesign software of the invention.
Figure 28 illustrates a synthetic route.
Figure 29 illustrates photo-activated immobilization of nucleic acids of the invention, which enables polarized coupling of anthraquinone (AQ)-linked LNA oligonucleotides onto the polymer surface. No pretreatment of the slide is needed. A covalent bond is formed between the oligonucleotide and the polymer using a UV source, e.g., a Stratalinker.
Figure 30 illustrates an injection-molded polymer slide. Finger indents ease slide handling. The slide has a well-defined printing and hybridization window, frosted surface for identification and orientation, and space for barcodes.
Figure 31 illustrates spot quality on different slides that can be used to immobilize nucleic acids of the invention. The hydrophobic slide surface ensures that extremely homogeneous spots are generated when hydrophilic spotting solution is applied to the surface. A high spot quality is obtained on the Immobilizer™ polymer slide compared to a glass slide when using a spot-to-spot distance of 150 μm. The high-quality arrays simplify downstream image analyses.
Figure 32 is a schematic illustration of a method of the invention.
Figure 33 is a table of exemplary target nucleic acids (Holstege et al. (1998) Cell 95, 717-728, and Causton et al. (2001) Mol. Biol. Cell 12, 323-337).
Figure 34 is a graph of on-chip melting profiles of 50-mer oligonucleotide probes of the invention. Yeast actin 1-specific 50-mer capture probes were synthesized as DNA and DNA/LNA mixmer oligonucleotides. LNA-substituted mixmer capture probes contain an LNA at every 4th, 5th, or 6th nucleotide position (LNA_4, LNA_5, LNA_6). On-chip melting profiles demonstrate an 8-10 °C increase in Tm obtained with LNA capture probes.
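The regular substitution patterns named here (an LNA monomer at every 4th, 5th, or 6th position) can be written out programmatically. The sketch below makes two assumptions that are not taken from the specification: substitution is counted from the 5' end so that positions n, 2n, 3n, and so on carry the LNA monomer, and LNA positions are shown in upper case with DNA positions in lower case, which is merely a display convention chosen for this example.

```python
def lna_mixmer(sequence: str, every: int) -> str:
    """Return the sequence with every `every`-th position (1-based) marked as LNA.

    Upper case = LNA monomer, lower case = DNA monomer (illustrative notation).
    """
    if every < 1:
        raise ValueError("every must be a positive integer")
    marked = []
    for position, base in enumerate(sequence.upper(), start=1):
        marked.append(base if position % every == 0 else base.lower())
    return "".join(marked)


def lna_count(pattern: str) -> int:
    """Count LNA (upper-case) positions in a marked mixmer."""
    return sum(ch.isupper() for ch in pattern)


if __name__ == "__main__":
    print(lna_mixmer("ACGTACGT", 3))  # acGtaCgt
    probe = "ACGT" * 12 + "AC"        # arbitrary 50-mer, not a real probe sequence
    for n in (4, 5, 6):
        print(n, lna_count(lna_mixmer(probe, n)))
```

Under these assumptions a 50-mer carries 12, 10, or 8 LNA monomers for the every-4th, every-5th, and every-6th designs, respectively.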
Figure 35 (upper) illustrates the heat-shock response in yeast. The array was hybridized with Cy3-labeled standard and Cy5-labeled heat-shock yeast cDNA. Figure 35 (lower) also illustrates the heat-shock response in yeast. The microarray data were normalized using yeast actin 1. The ssa4 gene encoding heat shock protein HSP70 is upregulated over 2-fold. Expression of the gua1 gene is down-regulated.
Figure 36 (upper) compares expression of wild-type and ssa4 mutant yeast. The array was hybridized with Cy3-labeled wild-type and Cy5-labeled ssa4 mutant yeast cDNA. Figure 36 (lower) also compares wild-type and ssa4 yeast. The hybridization data were normalized using yeast actin 1. ssa4 is detected in the wild-type yeast strain, but not in the ssa4 knock-out strain.
Figure 37 illustrates mRNA splicing.
Figure 38 is a picture showing gel electrophoresis of fragmented cDNA from the yeast wild-type strain. The molecular marker (lanes 1 and 9) is from Life Technologies, USA. Lanes 2-8 represent the UDG-fragmented cDNA 1-7 according to the different dUTP/dTTP ratios in Table 18.
Figure 39 is a graph of the log ratios of the normalized fluorescence intensities from the wild-type yeast strain (signal) and those from the Δssa4 yeast strain (noise) as a function of capture probe position in the 3' region of the SSA4 mRNA.
Figure 40 is a schematic illustration of mRNA splicing.
Figure 41 is a schematic illustration of alternative mRNA splicing.
Figure 42 is a schematic illustration of probes of the invention.
Figure 43 is a schematic illustration of probes of the invention.
Figure 44 illustrates an exemplary computer for use in the methods of the invention.
Figure 45 shows the sensitivity and specificity of LNA oligonucleotide capture probes (black, solid bars) compared to DNA capture probes (white, open bars) on expression microarrays. Fluorescence intensity is shown in arbitrary units (relative measurements). The arrays comprising 50-mer and 40-mer perfect match and 1-5 mismatch capture probes were hybridized at 65°C in 3x SSC with Cy3-labeled cDNA from 10 μg C. elegans total RNA spiked with yeast a) SWI5 RNA and c) THI4 RNA. b) and d) demonstrate the improved mismatch discrimination with the 50-mer LNA probes by increasing the hybridization temperature from 65°C to 70°C, hybridized with Cy3-labeled cDNA from 10 μg C. elegans total RNA spiked with yeast b) SWI5 RNA and d) THI4 RNA.
Figure 46 shows the expected (black, solid bars) and observed (white, open bars) fold-of-change in the expression levels of the Cy3-ULS-labeled yeast HSP78 spike RNA as measured by on-chip capture using three different 25-mer oligonucleotide capture probes (DNA control, LNA_T substituted, and LNA_3 substituted, in which every third nucleotide was substituted with an LNA monomer). In the hybridization experiment, 1 ng or 200 pg of HSP78 in vitro spike RNA was used, respectively. Thus, the fold change of the HSP78 RNA between the two hybridizations in the comparison is 5-fold. Fourteen additional synthetic in vitro mRNA spike controls were included in the hybridization solution as a semi-complex background RNA mixture. Seven of these spikes were used as normalization controls; the remaining seven were used as negative controls. Hybridization was at 65°C for 16 hours, with post-hybridization washes as described. Both LNA_T and LNA_3 substituted 25-mer probes are capable of providing highly accurate measurements for fold-of-changes in gene expression levels, as depicted in Figure 46. Under these conditions the DNA capture probes did not hybridize.
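The expected fold change follows directly from the spike amounts (1 ng versus 200 pg, i.e., 1000 pg / 200 pg = 5). The sketch below shows that arithmetic together with one plausible way of computing an observed fold change from spot intensities normalized to the mean of the normalization spike controls. The normalization scheme and all intensity values are assumptions made for illustration and are not data from the specification.

```python
from math import log2
from statistics import mean


def expected_fold_change(amount_a_pg: float, amount_b_pg: float) -> float:
    """Ratio of spike RNA amounts used in the two hybridizations."""
    return amount_a_pg / amount_b_pg


def observed_fold_change(signal_a, signal_b, controls_a, controls_b) -> float:
    """Ratio of spike signals after scaling each array by its control mean.

    controls_a / controls_b are intensities of the normalization spikes on
    each array (assumed to be present at equal amounts in both mixes).
    """
    normalized_a = signal_a / mean(controls_a)
    normalized_b = signal_b / mean(controls_b)
    return normalized_a / normalized_b


if __name__ == "__main__":
    expected = expected_fold_change(1000.0, 200.0)  # 1 ng vs 200 pg
    # Hypothetical intensities, for illustration only.
    observed = observed_fold_change(
        5200.0, 1000.0,
        controls_a=[1000.0, 1100.0, 900.0],
        controls_b=[950.0, 1050.0, 1000.0],
    )
    print(f"expected {expected:.1f}-fold (log2 = {log2(expected):.2f})")
    print(f"observed {observed:.1f}-fold (log2 = {log2(observed):.2f})")
```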
Figure 47 shows the measured intensity levels by on-chip capture using three different 25-mer oligonucleotide capture probe designs (DNA control, LNA_T substituted, and LNA C and T substituted probes). One (1) ng of biotin-labeled HSP78 target was used in the hybridization experiments, followed by staining with Streptavidin Phycoerythrin. The LNA_T and LNA_TC substituted 25-mer capture probes show a significantly enhanced on-chip capture of the HSP78 RNA target, compared to the DNA 25-mer control probes, under the four different hybridization stringency conditions indicated on the graph.
Figure 48 illustrates a synthetic route.
Figure 49 shows the detection of alternatively spliced mRNAs using LNA-substituted 50-mer oligonucleotide capture probes. Parts per million (ppm) calculations indicate spike transcripts per total transcripts in the hybridization mix. Calculations are based on an average C. elegans RNA being 1000 nucleotides as in Hill et al. (2000) Science 290:809-812. The 50-mer LNA-DNA mixmer capture probes, substituted with an LNA nucleotide at every third nucleotide position, are able to provide highly accurate measurements for fold-changes in the expression of three homologous, alternatively spliced mRNA variants in the concentration range of 1000 ppm to 10 ppm. The quantification of the splice isoforms was carried out using a set of both internal, exon-specific probes and merged, splice junction specific probes, printed onto microarrays and hybridized with complex cDNA target pools spiked with different cloned artificial splice isoforms in which the middle exon was either alternatively skipped or excluded completely, resulting in the three different splice isoforms: 01-INS3-03, 01-INS4-03, and 01-03.
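A ppm value of this kind (spike transcripts per total transcripts) can be estimated from RNA masses and transcript lengths, because the number of molecules is proportional to mass divided by length when the average mass per nucleotide is taken to be the same for all RNAs. The sketch below is a minimal illustration under assumptions that are not stated in the specification: the total is taken to include the spike itself, and the masses and spike length in the example are hypothetical. Only the 1000-nucleotide average background length comes from the text.

```python
def transcripts_ppm(spike_mass_ng: float,
                    spike_length_nt: float,
                    background_mass_ng: float,
                    background_mean_length_nt: float = 1000.0) -> float:
    """Spike transcripts per million total transcripts.

    Molecule counts are taken as proportional to mass / length, so the
    per-nucleotide molar mass cancels out of the ratio.
    """
    spike_molecules = spike_mass_ng / spike_length_nt
    background_molecules = background_mass_ng / background_mean_length_nt
    total_molecules = spike_molecules + background_molecules
    return 1e6 * spike_molecules / total_molecules


if __name__ == "__main__":
    # Hypothetical amounts: a 1000-nt spike in a background of
    # 1000-nt average transcripts.
    for spike_ng in (10.0, 1.0, 0.1):
        ppm = transcripts_ppm(spike_ng, 1000.0, 10000.0 - spike_ng)
        print(f"{spike_ng:5.1f} ng spike -> {ppm:7.1f} ppm")
```

A spike shorter than the average transcript contributes more molecules per nanogram, so the same mass gives a higher ppm value.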
Figure 50 shows the detection of alternatively spliced mRNAs using LNA-substituted 40-mer oligonucleotide capture probes. Parts per million (ppm) calculations indicate spike transcripts per total transcripts in the hybridization mix. Calculations are based on an average C. elegans RNA being 1000 nucleotides as in Hill et al. (2000) Science 290:809-812. The 40-mer LNA-DNA mixmer capture probes, substituted with an LNA nucleotide at every third nucleotide position, are able to provide highly accurate measurements for fold-changes in the expression of three homologous, alternatively spliced mRNA variants in the concentration range of 1000 ppm to 10 ppm. The quantification of the splice isoforms was carried out using a set of both internal, exon-specific probes and merged, splice junction specific probes, printed onto microarrays and hybridized with complex cDNA target pools spiked with different cloned artificial splice isoforms in which the middle exon was either alternatively skipped or excluded completely, resulting in the three different splice isoforms: 01-INS3-03, 01-INS4-03, and 01-03.
Figure 51 shows the comparison of different LNA/DNA mixmer oligonucleotide probes in the detection of human satellite-2 repeats by fluorescence in situ hybridization. Experimental conditions: 6.4 pmol of Cy3-labeled probe was hybridized for 30 minutes at 37°C, after simultaneous denaturation of the target and the probe at 75°C for 5 minutes. A. LNA-2 giving signals on chromosomes 1, 16, 9, and 15; B. LNA-3 giving bright signals on chromosomes 1, 16, and 9; C. Dispersed LNA giving signals on chromosomes 1 and 16 only; D. LNA Block giving smaller signals on chromosome 1; E. DNA control oligonucleotide FISH probe giving no signals on any of the chromosomes.
Figure 52 illustrates the hybridisation of the Cy3-labelled human telomere repeat specific, LNA-2 substituted oligonucleotide probe on human metaphase chromosomes, which resulted in prominent signals on the telomeres.
Figure 53 illustrates examples of LNA units. Figure 54 illustrates a synthetic route.
Detailed Description
Detection and Analysis of mRNA Splice Variants
Alternative splicing is the process by which different mature messenger RNAs are produced from the same pre-mRNA. Because the mRNA composition of a given cell determines the proteins present in that cell, this process is an important aspect of a cell's gene expression profile. Current investigations of transcriptomes (i.e., the total complexity of RNA transcripts produced by an organism) indicate that at least 50-60% of the genes of complex eukaryotes produce more than one splice variant. The present invention provides a novel method for detecting and quantifying the levels of splice variants in complex mRNA pools using LNA discriminating probes and high-throughput LNA oligonucleotide microarray technology. The detection concept, which uses internal LNA exon probes and/or splice-variant specific exon-exon junction or exon-intron or intron-exon (so-called merged) probes, is depicted in Figure 17.
Internal, exon-specific (or intron-specific) LNA oligonucleotide probes are designed and used to detect the relative levels of a given exon (or intron) in complex mRNA pools using oligonucleotide microarray technology or similar techniques. Exon-exon LNA junction probes are designed for multiple or all possible exon-exon combinations or exon-intron combinations. The LNA discriminating probes are highly specific and superior compared to DNA oligonucleotides due to the higher ΔTm of LNA probes. These probes can be used to determine the sequential order of each sub-element (i.e., exon structure or exon-intron structure) in a given alternatively spliced mRNA isoform, thus giving the exact composition of the mRNA. Subsequently, the ratios of each splice variant can be quantified using the combined readouts from both internal and merged LNA probes and control probes. The invention is applicable in both single fluor (single channel) and comparative two-fluor (two channel) microarray hybridizations.
Several "artificial," alternatively spliced mRNA molecules may be constructed in an in vitro transcription vector for the production of clean IVT RNA. Both internal and junction-specific LNA oligonucleotide capture probes are designed, synthesized, and spotted onto, e.g., Exiqon's polymer microarray platform. The resulting splice-specific microarray is used to validate the LNA discriminating probe concept by spiking the in vitro RNAs individually as well as in different ratios into a complex RNA background for fluorochrome-labelling and array hybridization.
The internal and merged probes of the invention can also be used in any standard method for the analysis of mRNA splice variants (see, for example, Yeakley et al., Nature Biotechnology 20:353-358, 2002; Clark et al., Science 296:907-910, 2002; Mutch et al., Genome Biology 2(12):preprint00009.1-0009.31, 2001).
Exemplary Applications of Internal and/or Merged Probes
The internal and/or merged probes of the invention can also be used for gene expression profiling of alternative splice variants, oligonucleotide expression microarrays, real-time PCR, and profiling of alternatively spliced mRNAs using microtiterplate assays or fiber-optic arrays.
Detection and characterization of alternative splicing is particularly useful for the study and treatment of human disease (exonhit website, "Inaugural Splicing 2002 Concludes: Alternative Splicing May Make All the Difference in Discovering the Origin of Disease"). In particular, RNA splicing is now widely recognized as a means to generate protein diversity.
Alternative splicing is a key mechanism for regulating gene expression, and any mutation or defect in its regulation can considerably impact cell functions. Therefore, it is likely to be an important source of novel gene and protein targets implicated in human pathology. Industry has long recognized the need for innovative discovery technologies that focus on the origin of disease for the development of novel diagnostics and therapeutics.
In particular, there are many examples of human pathologies caused by alterations in normal patterns of alternative RNA splicing. Because a large number of human genes undergo alternative splicing, the protein isoforms that result from this process represent a major source of targets for commercial development of therapies and diagnostics. Splicing processes play a significant role in the onset and development of cardiovascular, muscular, and CNS diseases, and of cancer. Early evidence indicates that the origin of many diseases can be identified by examining alternative splicing, which in turn points to where future generations of drugs could intervene. The study of splicing enables the discovery of new mechanisms underlying disease progression.
Comparative Genomic Hybridization
Comparative Genomic Hybridization (CGH) is a powerful technology for detection of unbalanced chromosome rearrangements and holds much promise for screening and identification of interstitial submicroscopic rearrangements that otherwise cannot be detected using classical cytogenetic or FISH technologies. The adaptation of CGH onto an oligo microarray platform allows detection of small single exon deletions or duplications on a genome-wide scale. There is a strong need for developing microarrays that can detect, e.g., single exon aberrations. This detection can be achieved by employing LNA mixmer oligos as capture probes for individual exons in selected genes.
A model system for these methods is the Menkes locus. Menkes disease is a lethal X-linked recessive disorder associated with copper metabolism disturbance leading to death in early childhood. The Menkes locus has been mapped to Xq13. The gene spans a genomic region of about 150 kb, contains 23 exons, and encodes an 8.5 kb gene transcript. The gene for Menkes disease (now designated as ATP7A) encodes a 1500 amino acid membrane-bound Cu-binding P-type ATPase (ATP7A). The 8.5 kb transcript is expressed in all tissues from normal individuals (though only trace amounts are present in liver), but is diminished or absent in Menkes disease patients. Several different kinds of mutations, such as chromosome aberrations, point mutations and partial gene deletions affecting ATP7A, have been identified in Menkes disease (MD) patients. 50-mer capture probes with LNA spiked in every second, third, and fourth position have been designed for every exon (23 exons) of ATP7A, using the OligoDesign software tool described herein. The C6-amino-linked capture probes were spotted onto Immobilizer slides and hybridized with patient samples labelled with Cy3 fluorescent dye and a known reference genomic sample labelled with Cy5. After mixing equal amounts of the labelled DNA, the probe is hybridized to the array. The ratio of Cy5 signal to Cy3 signal for each capture probe indicates differences in chromosome/DNA material. For example, the Cy5 signal is higher than the Cy3 signal if the patient genome has a deletion, and is lower if there is a duplication. In regions that are unchanged, the Cy5:Cy3 ratio is 1:1. These methods can be used to analyze a number of well-characterized Menkes patients with a range of partial deletions of ATP7A.
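The interpretation of the Cy5:Cy3 ratio described above can be illustrated with a minimal Python sketch. The function name, the example signal values, and the log2 threshold of 0.5 are assumptions chosen for illustration only; they are not part of the described method, and a real analysis would require normalization and replicate handling.

```python
import math

def classify_exon(cy5_reference, cy3_patient, log2_threshold=0.5):
    """Classify one exon probe from reference (Cy5) and patient (Cy3) signals.

    A Cy5:Cy3 ratio above 1 suggests a deletion in the patient sample,
    a ratio below 1 suggests a duplication, and a ratio near 1 suggests
    no change. The threshold is a hypothetical cut-off.
    """
    if cy5_reference <= 0 or cy3_patient <= 0:
        raise ValueError("signals must be positive")
    log2_ratio = math.log2(cy5_reference / cy3_patient)
    if log2_ratio > log2_threshold:
        return "deletion"
    if log2_ratio < -log2_threshold:
        return "duplication"
    return "unchanged"

# Hypothetical (Cy5 reference, Cy3 patient) intensities for three exon probes.
signals = {
    "exon 1": (1000.0, 980.0),
    "exon 2": (1000.0, 450.0),
    "exon 3": (1000.0, 2100.0),
}
calls = {exon: classify_exon(cy5, cy3) for exon, (cy5, cy3) in signals.items()}
```

Working on a log2 scale treats a two-fold loss and a two-fold gain symmetrically, which is why the sketch uses a single threshold for both directions.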
LNA oligonucleotide-based CGH makes it possible to assess a large number of the chromosomal aberrations that are being screened for in the cytogenetic clinic. In contrast, standard FISH analysis typically only detects large chromosomal rearrangements. In desirable embodiments, an array that contains a series of overlapping probes is used to detect a chromosomal deletion in a nucleic acid sample, such as a patient sample.
Clinical diagnostics
Clinical diagnosis is a key element in healthcare management and point-of-care. A large number of analyses in the hospitals are based on the use of robust, cost efficient, sensitive and highly specific diagnostic tests. Thus, the diagnosis of various diseases is performed with a high selectivity and reliability, resulting in confirmation of medical diagnosis, choice of therapy and follow-up treatment as well as prevention. In addition to its importance in the quality of healthcare provided to patients, clinical diagnosis also contributes to the control of healthcare costs. The field of clinical diagnostics involves analyzing biological fluid samples (blood, urine, etc.) or biopsies collected from patients in order to establish the diagnosis of diseases, whether of infectious, metabolic, endocrine or cancerous origin. Medical analysis of infectious diseases involves testing and identifying the micro-organisms causing the infection, e.g., testing for and identifying a micro-organism in blood and determining its susceptibility to antibiotics, or detecting an antigen-antibody reaction produced as a response to an attack by a micro-organism in the human body, e.g., testing for antibodies for the diagnosis of hepatitis. The accurate diagnosis of metabolic and endocrine diseases and cancers, resulting in a disease phenotype with a bodily imbalance, involves the measurement of diagnostic substances or elements present in the biological fluids or biopsies. These substances are examined and results are interpreted with reference to known normal values.
Use of diagnostic kits in microbiological control
The pharmaceutical, cosmetics and agri-food industries are being confronted with increasingly strict quality standards. Thus, the purpose of industrial microbiological control testing is to detect and measure the presence of potentially pathogenic microbial contaminants throughout the manufacturing process from raw materials to the finished products, as well as in the production environment. The obtained results are subsequently compared to the current regulatory guidelines and industry standards.
Application of molecular biological techniques to in vitro diagnostics
Recently, several different molecular biological techniques have been used successfully in accurate quantification of RNA levels in clinical diagnosis as well as in microbiological control. The applications are wide-ranging and include methods for quantification of the regulation and expression of drug resistance markers in tumour cells, monitoring of the responses to chemotherapy, measuring the biodistribution and transcription of gene-encoded therapeutics, molecular assessment of the tumor stage in a given cancer, detecting circulating tumor cells in cancer patients and detection of bacterial and viral pathogens. The reverse transcription polymerase chain reaction (RT-PCR) is the most sensitive method for the detection of mRNA, including low abundant mRNAs, often obtained from limited tissue samples in clinical diagnostics. The application of fluorescence techniques to RT-PCR combined with suitable instrumentation has led to development of quantitative RT-PCR methods, combining amplification, detection and quantification in a closed system free from contamination and with minimized hands-on time. The two most commonly used quantitative RT-PCR techniques are the Taqman RT-PCR assay (ABI, Foster City, USA) and the Lightcycler assay (Roche, USA). A third method applied to detection and quantification of RNA levels is real-time nucleic acid sequence based amplification (NASBA) combined with molecular beacon detection molecules. NASBA is a single-step isothermal RNA-specific amplification method that amplifies mRNA in a double stranded DNA environment, and this method has recently proven useful in the detection of various mRNAs and in the detection of both viral and bacterial RNA in clinical samples. Finally, the recent explosion in microarray technology holds the promise of using microarrays in clinical diagnostics. For example, van't Veer et al. (Nature 2002: 415, 31) describe the successful use of microarrays in obtaining digital mRNA signatures from breast tumors and the use of these signatures in the precise prediction of the clinical outcome of breast cancer in patients.
The success of exploiting molecular biological techniques in diagnostics and diagnostic kits depends on continuous optimization of the technologies and the development of new robust and cost-effective technology platforms for producing accurate, reproducible and valid clinical data. Locked nucleic acid (LNA) oligonucleotides constitute a novel class of bicyclic RNA analogs having an exceptionally high affinity and specificity toward their complementary DNA and RNA target molecules. Besides increased thermal stability, LNA-containing oligonucleotides show significantly increased mismatch discrimination, and allow full control of the melting temperature across microarray hybridizations. The LNA chemistry is completely compatible with conventional DNA phosphoramidite chemistry and thus LNA substituted oligonucleotides can be designed to optimize performance. LNA oligonucleotides would be well-suited for large-scale clinical studies providing highly accurate genotyping by direct competitive hybridization of two allele-specific LNA probes to, e.g., microarrays of immobilized patient amplicons. In addition, the use of LNA substituted oligonucleotides would increase both sensitivity and specificity in detection and quantification of mRNA levels in clinical samples, either by quantitative RT-PCR, quantitative NASBA or oligonucleotide microarrays, compared with DNA probes. Application of LNA oligonucleotides into diagnostic kits would thus significantly enhance their performance. Finally, the use of LNA substituted oligonucleotides would increase the sensitivity and specificity in the detection of alternatively spliced mRNA isoforms and non-coding RNAs either by homogeneous assays (Taqman assay, Lightcycler assay, NASBA) or by oligonucleotide microarrays in a massive parallel analysis setup.
Optimized Nucleic Acids of the Invention
Decreasing the variation in melting temperatures (Tm) of a population of nucleic acids allows the nucleic acids to hybridize to target molecules under similar binding conditions, thereby simplifying the simultaneous hybridization of multiple nucleic acids. Similar melting temperatures also allow the same hybridization conditions to be used for multiple experiments, which is particularly useful for assays involving hybridization to nucleic acids of varying "AT" content. For example, current methods often require less stringent conditions for hybridization of nucleic acids with high "AT" content compared to nucleic acids with low "AT" content. Due to this variation in hybridization stringency, current methods may require significant trial and error to optimize the hybridization conditions for each experiment.
To overcome limitations in current nucleic acid hybridization and/or amplification techniques, we have developed populations of nucleic acid probes or primers with minimal variation in melting temperature (U.S.S.N. 60/410,061). For example, the unique properties of LNA nucleotide analogs increase their binding affinity for DNA and RNA. The stability of duplexes can generally be ranked as follows: DNA:DNA < DNA:RNA < RNA:RNA < LNA:DNA < LNA:RNA < LNA:LNA. The DNA:DNA duplex is thus the least stable and the LNA:LNA duplex the most stable. The affinity of the LNA nucleotides A and T corresponds approximately to the affinity of DNA G and C to their complementary bases. General substitution of one or more A and T nucleotides with LNA A and LNA T in DNA oligonucleotides is therefore a simple way of equalizing differences in Tm. Furthermore, the mean melting temperature is increased significantly, which is often important for shorter oligonucleotides. For example, predictions of melting temperature of all possible 9-mer oligonucleotides have shown that the mean temperature increases from 39.7 °C to 59.3 °C by substituting all DNA A and T nucleotides with LNA A and T nucleotides. The variance in Tm of all 9-mers furthermore decreases from 59.6 °C for DNA oligonucleotides to only 4.7 °C for the LNA substituted oligonucleotides. The estimations are based on the latest LNA Tm prediction algorithms such as those disclosed herein, which have a variance of 6-7 °C. If desired, the capture efficiency of one or more nucleic acids can be increased by including any of the high affinity nucleotides (e.g., LNA units) described herein within the nucleic acids. The examples herein also provide algorithms for optimizing the substitution patterns of the nucleic acids to minimize self-complementarity that may otherwise inhibit the binding of the nucleic acids to target molecules.
For various applications of the nucleic acids and arrays of the invention, LNA A and LNA T substitutions are made to equalize the melting temperatures of the nucleic acids. In other embodiments, LNA A and LNA C substitutions are made to minimize self-complementarity and to increase specificity. LNA C and LNA T substitutions also minimize self-complementarity. Additionally, oligonucleotides containing LNA C and LNA T are desirable because these modified nucleotides are easy to synthesize and are especially useful for applications such as antisense technology in which minimizing cost is especially desirable.
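The general substitution of A and T nucleotides with LNA A and LNA T can be illustrated with a minimal Python sketch. It only rewrites the sequence notation, using the convention that uppercase letters denote LNA units and lowercase letters denote DNA units; it does not predict melting temperatures, and the function names are hypothetical.

```python
def substitute_lna_at(dna):
    """Return the oligo with every DNA a/t written as an LNA unit (uppercase)."""
    dna = dna.lower()
    unknown = set(dna) - set("acgt")
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return "".join(base.upper() if base in "at" else base for base in dna)

def at_fraction(dna):
    """Fraction of a/t nucleotides, i.e. the share of positions substituted."""
    dna = dna.lower()
    return sum(base in "at" for base in dna) / len(dna)

# Hypothetical 9-mer used only to show the notation.
example = "acgtacgta"
substituted = substitute_lna_at(example)
```

Because only the weaker A and T positions are replaced, oligonucleotides with high "AT" content receive more LNA units than those with low "AT" content, which is the intuition behind the Tm equalization described above.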
The following non-limiting examples are illustrative of the invention. All documents mentioned herein are incorporated herein by reference in their entirety. In the following Examples, compound reference numbers designate the compound as shown in Schemes 1 and 2 herein.
Example 1: The Use of LNA-modified Oligonucleotides in Microarrays Provides Significantly Improved Sensitivity and Specificity in Expression Profiling
This example demonstrates the advantages of using LNA oligonucleotide microarrays in gene expression profiling experiments. Capture probes for the Saccharomyces cerevisiae genes SWI5 (YDR146C) and THI4 (YGR144W) were designed as 50-mer standard DNA and different LNA/DNA "mixmer" oligonucleotides (i.e., oligonucleotides containing both LNA and DNA nucleotides) respectively, for comparison (Table 2). In addition, 40-mer oligonucleotides were designed as truncated versions of the 50-mer capture probes (Table 2). The specificity of the LNA oligoarrays was addressed by introducing 1-5 consecutive mismatches positioned in the middle of 40-mer LNA/DNA mixmer capture probes with LNA in every fourth position. To assess the sensitivity of DNA versus LNA capture probes in complex hybridization mixtures, in vitro synthesized yeast RNA for either SWI5 or THI4 was spiked into Caenorhabditis elegans total RNA for cDNA target synthesis. These experiments are described further below.
Cultivation of Caenorhabditis elegans worms
Mixed stage C. elegans cultures were grown according to standard methods. Samples were harvested by centrifugation at 3,000xg, suspended in RNAlater storage buffer (Ambion, USA), and immediately frozen in liquid nitrogen.
RNA extraction
RNA was extracted from the worm samples using the FastRNA® Kit, GREEN (Q-BIO) essentially according to the supplier's instructions.
In vitro RNA synthesis
Amplification of the yeast genes was performed using standard PCR with yeast genomic DNA as the template. In the first step, a forward primer containing a restriction enzyme site and a reverse primer containing a universal linker sequence were used. In the second PCR reaction, the reverse primer was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme site. The DNA fragments were ligated into the pTRIamp18 vector (Ambion, USA) using the Quick Ligation Kit (New England Biolabs, USA) according to the supplier's instructions and transformed into E. coli DH-5α by standard methods. The PCR clones were sequenced using M13 forward and M13 reverse primers on an ABI 377 (Applied Biosystems, USA). Synthesis of in vitro RNA was carried out using the MEGAscript™ T7 Kit (Ambion, USA) according to the manufacturer's instructions.
Design and synthesis of the LNA capture probes
To design the capture probes, regions with unique mRNA sequence of the selected target genes were identified. The optimal 50-mer oligonucleotide sequences with respect to Tm, self-complementarity, and secondary structure were selected. LNA modifications were incorporated to increase affinity and specificity.
Printing of the LNA microarrays
The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark) using the Biochip One Arrayer from Packard Biochip technologies (Packard, USA). The arrays were printed with a spot volume of 2 x 300 pl of a 10 μM capture probe solution. Four replicas of the capture probes were printed on each slide.
Synthesis of fluorochrome labelled first strand cDNA from total RNA
Ten ng of S. cerevisiae in vitro synthesized RNA (either SWI5 or THI4) was combined with 10 μg of C. elegans total RNA and 5 μg oligo dT primer (T20VN) in an RNase free, pre-siliconized 1.5 mL tube, and the final volume was adjusted with DEPC-water to 8 μL. The reaction mixture was heated at +70°C for 10 minutes, quenched on ice for 5 minutes, and spun for 20 seconds, followed by addition of 1 μL SUPERase-In™ (20 U/μL, RNAse inhibitor, Ambion, USA), 4 μL 5xRTase buffer (Invitrogen, USA), 2 μL 0.1 M DTT (Invitrogen, USA), 1 μL dNTP (20 mM dATP, dGTP, dTTP; 0.4 mM dCTP in DEPC-water, Amersham Pharmacia Biotech, USA), and 3 μL Cy3™-dCTP or Cy5™-dCTP (Amersham Pharmacia Biotech, USA). First strand cDNA synthesis was carried out by adding 1 μL of Superscript™ II (Invitrogen, 200 U/μL), mixing, and incubating the reaction mixture for one hour at 42°C. An additional 1 μL of Superscript™ II was added, and the cDNA synthesis reaction mixture was incubated for an additional one hour at 42°C; the reaction was stopped by heating at 70°C for 5 minutes, and quenching on ice for 2 minutes. The RNA was hydrolyzed by adding 3 μL of 0.5 M NaOH, and incubating at 70°C for 15 minutes. The samples were neutralized by adding 3 μL of 0.5 M HCl, and purified by adding 450 μL 1xTE buffer, pH 7.5 to the neutralized sample and transferring the samples onto a Microcon-30 concentrator. The samples were centrifuged at 14000xg in a microcentrifuge for ~8 minutes, the flow-through was discarded and the washing step was repeated twice by refilling the filter with 450 μL 1xTE buffer and by spinning for ~12 minutes. Centrifugation was continued until the volume was reduced to 5 μL, and finally the labelled cDNA probe was eluted by inverting the Microcon-30 tube and spinning at 1000xg for 3 minutes.
Hybridization with fluorochrome-labelled cDNA
The arrays were hybridized overnight using the following protocol. The Cy3™ or Cy5™-labelled cDNA samples were combined in one tube followed by addition of 3 μL 20xSSC (3xSSC final), 0.5 μL 1 M HEPES, pH 7.0 (25 mM final), 25 μg yeast tRNA (1.25 μg/μL final), 10 μg PolyA blocker (0.5 μg/μL final), 0.6 μL 10% SDS (0.3% final), and DEPC-treated water to 20 μL final volume. The labelled cDNA target sample was filtered in a Millipore 0.22 micron spin column according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by incubating the reaction at 100°C for 2 minutes. The sample was cooled at 20-25 °C for 5 minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide, and the hybridization mixture was applied to the array from the side. An aliquot of 30 μL of 3xSSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber. The chamber was sealed watertight and incubated at 65°C for 16-18 hours submerged in a water bath. After hybridization, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The LifterSlip coverslip was washed off in 2 x SSC, pH 7.0 containing 0.1% SDS at room temperature for one minute, followed by washing of the microarrays subsequently in 1.0 x SSC, pH 7.0 at room temperature for one minute, and then in 0.2 x SSC, pH 7.0 at room temperature for one minute. Finally, the slides were washed for 5 seconds in 0.05 x SSC, pH 7.0. The slides were then dried by centrifugation in a swinging bucket rotor at approximately 600 rpm for 5 minutes.
Data analysis
Following washing and drying, the slides were scanned using a ScanArray 4000XL scanner (Perkin-Elmer Life Sciences, USA), and the array data were processed using the GenePix™ Pro 4.0 software package (Axon, USA).
Results
Incorporation of LNA nucleotides at every third nucleotide position in standard 50-mer expression array oligonucleotide capture probes resulted in a 3-fold increase in fluorescence intensity levels, when hybridized under standard stringency conditions (Figure 12). When the hybridization temperature is increased from 65 °C to 70 °C, the capture of the SWI5 spike mRNA by LNA 50-mer oligos is increased by 8-fold relative to the DNA controls. Thus, it can clearly be concluded that oligonucleotides containing LNA units are more sensitive in expression profiling compared to oligonucleotides containing only DNA units. The specificity of 40-mer LNA/DNA mixmer capture probes in the discrimination of highly homologous target sequences was addressed by introducing 1-5 consecutive mismatches in the middle of SWI5 and THI4 capture oligos together with the corresponding DNA controls. As demonstrated in Figure 13, the LNA-spiked (LNA modification at every fourth nucleotide position) 40-mer triple mismatch oligos showed a 3-fold signal intensity decrease relative to the perfectly matched duplexes, whereas the corresponding 40-mer standard DNA capture probes did not form duplexes under standard hybridization stringency. Further, the 40-mer perfect match LNA capture probes showed a 5-fold to 14-fold increase in the intensity levels compared to DNA oligonucleotides under standard hybridization conditions. Capture probes of other lengths and/or with other LNA substitution patterns can be used similarly.
Table 2. DNA and LNA-modified SWI5 (YDR146C) and THI4 (YGR144W) oligonucleotide capture probes. LNA modifications are depicted by uppercase letters in the sequence, "mt" denotes the number of mismatches (bolded) in the center of the oligonucleotide with respect to its target cDNA (mRNA), and "mC" denotes LNA methyl cytosine.
Oligo Name Sequence
YDR146C-40 acggggattatggtttcgccaatgaaaactaatcaaaggt
YDR146C-40_mt1 acggggattatggtttcgcctatgaaaactaatcaaaggt
YDR146C-40_mt2 acggggattatggtttcgcgtatgaaaactaatcaaaggt
YDR146C-40_mt3 acggggattatggtttcgggtatgaaaactaatcaaaggt
YDR146C-40_mt4 acggggattatggtttcgggtttgaaaactaatcaaaggt
YDR146C-40_mt5 acggggattatggtttcgggttagaaaactaatcaaaggt
YDR146C-40_LNA4 acGgggAttaTggtTtcgmCcaaTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt1 acGgggAttaTggtTtcgmCctaTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt2 acGgggAttaTggtTtcgmCgtaTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt3 acGgggAttaTggtTtcgGgtaTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt4 acGgggAttaTggtTtcgGgttTgaaAactAatcAaagGt
YDR146C-40_LNA4_mt5 acGgggAttaTggtTtcgGgttAgaaAactAatcAaagGt
YDR146C-50 tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt
YDR146C-50_mt1 tgggaatggaacggggattatggtatcgccaatgaaaactaatcaaaggt
YDR146C-50_mt2 tgggaatggaacggggattatggtaacgccaatgaaaactaatcaaaggt
YDR146C-50_mt3 tgggaatggaacggggattatggtaaggccaatgaaaactaatcaaaggt
YDR146C-50_mt4 tgggaatggaacggggattatggaaaggccaatgaaaactaatcaaaggt
YDR146C-50_mt5 tgggaatggaacggggattatggaaagcccaatgaaaactaatcaaaggt
YDR146C-50_LNA2 TgGgAaTgGaAcGgGgAtTaTgGtTtmCgmCcAaTgAaAamCtAaTcAaAgGt
YDR146C-50_LNA3 TggGaaTggAacGggGatTatGgtTtcGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA4 TgggAatgGaacGgggAttaTggtTtcgmCcaaTgaaAactAatcAaagGt
YDR146C-50_LNA5 TgggaAtggaAcgggGattaTggttTcgccAatgaAaactAatcaAaggt
YDR146C-50_LNA6 TgggaaTggaacGgggatTatggtTtcgccAatgaaAactaaTcaaagGt
YDR146C-50_LNA3_mt1 TggGaaTggAacGggGatTatGgtAtcGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA3_mt2 TggGaaTggAacGggGatTatGgtAacGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA3_mt3 TggGaaTggAacGggGatTatGgtAagGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA3_mt4 TggGaaTggAacGggGatTatGgaAagGccAatGaaAacTaaTcaAagGt
YDR146C-50_LNA3_mt5 TggGaaTggAacGggGatTatGgaAagmCccAatGaaAacTaaTcaAagGt
YGR144W-40 ttgctgaactggatggattaaaccgtatgggtccaacttt
YGR144W-40_mt1 ttgctgaactggatggatttaaccgtatgggtccaacttt
YGR144W-40_mt2 ttgctgaactggatggatataaccgtatgggtccaacttt
YGR144W-40_mt3 ttgctgaactggatggatattaccgtatgggtccaacttt
YGR144W-40_mt4 ttgctgaactggatggatatttccgtatgggtccaacttt
YGR144W-40_mt5 ttgctgaactggatggatatttgcgtatgggtccaacttt
YGR144W-40_LNA4 ttGctgAactGgatGgatTaaamCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt1 ttGctgAactGgatGgatTtaamCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt2 ttGctgAactGgatGgatAtaamCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt3 ttGctgAactGgatGgatAttamCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt4 ttGctgAactGgatGgatAtttmCcgtAtggGtccAactTt
YGR144W-40_LNA4_mt5 ttGctgAactGgatGgatAtttGcgtAtggGtccAactTt
YGR144W-50 ggtatggaagttgctgaactggatggattaaaccgtatgggtccaacttt
YGR144W-50_mt1 ggtatggaagttgctgaactggatcgattaaaccgtatgggtccaacttt
YGR144W-50_mt2 ggtatggaagttgctgaactggatccattaaaccgtatgggtccaacttt
YGR144W-50_mt3 ggtatggaagttgctgaactggaaccattaaaccgtatgggtccaacttt
YGR144W-50_mt4 ggtatggaagttgctgaactggaacctttaaaccgtatgggtccaacttt
YGR144W-50_mt5 ggtatggaagttgctgaactggaacctataaaccgtatgggtccaacttt
YGR144W-50_LNA3 GgtAtgGaaGttGctGaamCtgGatGgaTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt1 GgtAtgGaaGttGctGaamCtgGatmCgaTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt2 GgtAtgGaaGttGctGaamCtgGatmCcaTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt3 GgtAtgGaaGttGctGaamCtgGaamCcaTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt4 GgtAtgGaaGttGctGaamCtgGaamCctTtaAacmCgtAtgGgtmCcaActTt
YGR144W-50_LNA3_mt5 GgtAtgGaaGttGctGaamCtgGaamCctAtaAacmCgtAtgGgtmCcaActTt
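The regular substitution patterns named in Table 2 (e.g., LNA3 and LNA4, with an LNA unit at every third or fourth position) can be reproduced with a short Python sketch. The function name and the `offset` parameter are hypothetical and introduced only for illustration; the notation follows Table 2, with uppercase letters for LNA units and "mC" for LNA methyl cytosine.

```python
def lna_mixmer(dna, spacing, offset=0):
    """Write every `spacing`-th nucleotide, starting at `offset`, as an LNA unit.

    LNA units are shown in uppercase and LNA methyl cytosine as "mC",
    following the notation of Table 2. `offset` is a 0-based start index.
    """
    if spacing < 1 or offset < 0:
        raise ValueError("spacing must be >= 1 and offset must be >= 0")
    units = []
    for index, base in enumerate(dna.lower()):
        if index >= offset and (index - offset) % spacing == 0:
            units.append("mC" if base == "c" else base.upper())
        else:
            units.append(base)
    return "".join(units)

# DNA sequences of the SWI5 capture probes as listed in Table 2.
ydr146c_50 = "tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt"
ydr146c_40 = "acggggattatggtttcgccaatgaaaactaatcaaaggt"

lna3_50 = lna_mixmer(ydr146c_50, 3)
lna4_50 = lna_mixmer(ydr146c_50, 4)
# The 40-mer is a truncation of the 50-mer, so its pattern starts at index 2.
lna4_40 = lna_mixmer(ydr146c_40, 4, offset=2)
```

The offset of 2 for the 40-mer reflects that it keeps the substitution frame of the 50-mer from which it was truncated, rather than restarting the pattern at its own first nucleotide.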
Example 2: Detection of Alternative Splice Isoforms Using Exon-Specific, Internal LNA Capture Probes in the Caenorhabditis elegans Gene let-2
Capture probe design
Finding the regions of interest
From the database "intronerator" [http://www.cse.ucsc.edu/~kent/intronerator/] as well as scientific literature, the C. elegans let-2 gene encoding type IV collagen was found according to the following criteria. The generation of mature mRNA desirably involves either complete exon or intron skipping. ESTs (expressed sequence tags) desirably indicate different isoforms. If ESTs were only different from the gene annotation(s), this could simply mean that the prediction is wrong, and nothing more. Desirably, there are different EST splice indications in different developmental stages. Two gene prediction algorithms (e.g., GeneFinder and Genie) desirably agree upon the number of genes in a coding segment. Exons of interest (e.g., exons being skipped and their flanking exons) in the C. elegans gene T01D3.3 desirably exceed 70 base-pairs. Other genes of interest may be selected using one or more of the above criteria or using other criteria, such as the medical relevance of the gene.
Determining melting temperatures and palindromic properties of the C. elegans let-2 gene/exons 8, 9, 10, and 11-specific capture probes
The script PICK70 (which was kindly provided by Jingchun Zhu from the Joe DeRisi Laboratory and which is publicly available) was used to run a sliding 50 base-pair window across the regions in which an oligonucleotide capture probe should be designed. The output data were saved for later use.
Determining the uniqueness of the regions
All regions were compared using a publicly available BLAST program to the complete set of annotated transcripts from the C. elegans genome downloaded from NCBI.
For each region a list with the location of all BLAST hits was made.
Choice of Desirable Tm for Capture Probes
From the PICK70 output, the distribution of melting temperatures for all possible oligonucleotides was collected. As these centered around approximately 80°C, this temperature was chosen as the desirable temperature.
For each region, an oligonucleotide with a palindromic value below 100 (default value in PICK70, value based on Smith-Waterman algorithm) and with a melting temperature closest to 80°C was picked. The location of the oligonucleotide within the region was then compared to the list made using the above BLAST search. If the oligonucleotide did not coincide with a BLAST hit exceeding around 25 (consecutive) base-pairs, this oligonucleotide sequence was chosen as a 50-mer capture probe. Otherwise, a new oligo sequence was picked from the PICK70 output.
Checking Probe Sequences
The selected 50-mer oligonucleotide sequences were "BLASTed" against the C. elegans transcripts again, as described above.
Accounting for the Introns
The oligonucleotide sequences were "BLASTed" against the complete C. elegans genome. The matches were run against a list made from the GenBank reports of the complete genome, indexing whether positions in the genome were genic or intergenic.
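The genic/intergenic index described above amounts to an interval lookup. The sketch below assumes gene coordinates are available as inclusive (start, end) pairs on a single sequence; the function names and the coordinate convention are hypothetical and are not taken from the GenBank reports.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Sort inclusive (start, end) gene intervals and merge overlapping or adjacent ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_genic(position, merged):
    """Return True if `position` falls inside any merged gene interval."""
    starts = [start for start, _ in merged]
    i = bisect_right(starts, position) - 1
    return i >= 0 and merged[i][0] <= position <= merged[i][1]


def classify_hits(hit_positions, merged):
    """Label each BLAST hit position as 'genic' or 'intergenic'."""
    return {pos: ("genic" if is_genic(pos, merged) else "intergenic") for pos in hit_positions}
```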
It was checked to determine whether new hits to genic regions appeared (compared to the initial BLAST search using the PICK70 output). If this was not the case, the oligonucleotide sequences were selected for capture probe synthesis.
Design of the LNA-modified Capture Probes
For the LNA-modified oligonucleotide capture probes, every fourth DNA nucleotide was substituted with an LNA nucleotide, as shown in Table 3. The oligonucleotides were synthesized with an anthraquinone (AQ) moiety at the 5'-end of each oligonucleotide (e.g., as described in allowed U.S.S.N. 09/611,833), followed by a hexaethyleneglycol tetramer linker region and the LNA/DNA mixmer capture oligonucleotide sequence.
Table 3. C. elegans let-2 gene/exons 8, 9, 10, and 11-specific capture probes
Capture probe Sequence (LNA=uppercase, DNA=lowercase letters)
CE42.08-0HEG4 GgctGgatmCcccAggaAaccmCaggAatcGgaaGcatTggamCcaaAaggAg
CE42.09-0HEG4 mCaccGgatmCcggmCtcaAttgTcggAcctmCgcgGaaamCcctGgagAaaaGg
CE42.10-0HEG4 TccgmCcagGcccAatcGcctmCcacmCatgTccaAgggAaccAttaTcggTc
CE42.11-0HEG4 GagcmCaggAgagGgagGtcaAcgcGgttAcccAggaAatgGaggActcTc
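The substitution pattern in Table 3 (an LNA nucleotide at every fourth position, uppercase for LNA and lowercase for DNA) can be reproduced with a short sketch. The function name is hypothetical, and writing an LNA cytosine as "mC" simply follows the notation used in the table.

```python
def lna_mixmer(dna, step=4):
    """Return the probe string with every `step`-th base (starting at the 5' end) marked as LNA.

    LNA bases are written in uppercase; an LNA cytosine is written 'mC',
    following the notation of the capture probe tables.
    """
    out = []
    for i, base in enumerate(dna.lower()):
        if i % step == 0:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


# First 15 bases of the CE42.08 probe, as an example
print(lna_mixmer("ggctggatccccagg"))
```

The same function with `step=3` or `step=2` gives the denser substitution patterns used in the later examples.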
Strains and Growth Conditions
C. elegans wild-type strain (Bristol-N2) was maintained on nematode growth medium (NG) plates seeded with Escherichia coli strain OP50 at 20°C, and the eggs and L1 larvae were prepared as described in Hope, I. A. (ed.) "C. elegans - A Practical Approach", Oxford University Press 1999. The samples were immediately flash frozen in liquid N2 and stored at -80°C until RNA isolation.
Isolation of Total RNA
A 100 μl aliquot of packed C. elegans worms from an L1 larvae population was homogenized using the FastPrep Bio 101 from Kem-En-Tec for 1 minute at speed 6 followed by isolation of total RNA from the extracts using the FastPrep Bio 101 kit (Kem-En-Tec) according to the manufacturer's instructions. A 50 μl aliquot of packed C. elegans eggs was homogenized in lysis buffer (RNeasy total RNA purification kit, QIAGEN) containing quartz sand for 3 minutes using a Pellet Pestle Motor followed by isolation of total RNA according to the manufacturer's manual.
The eluted total RNA from worms (L1 larvae) as well as eggs was ethanol precipitated for 24 hours at -20°C by addition of 2.5 volumes of 96% EtOH and 0.1 volume of 3M Na-acetate, pH 5.2 (Ambion, USA), followed by centrifugation of the total RNA sample for 30 minutes at 13200 rpm. The total RNA pellet was air-dried and redissolved in 6 μl (worms) or 2.5 μl (eggs) of diethylpyrocarbonate (DEPC)-treated water (Ambion, USA) and stored at -80°C.
Reverse transcription (RT)-PCR
Total RNA (1.5 μg) from eggs or 1 μg total RNA from worms (L1 larvae) was mixed with 5 μg oligo(dT)12-18 primer (Amersham Pharmacia Biotech, USA) and 0.5 μg of random hexamers, pd(N)6 (Amersham Pharmacia Biotech, USA) and DEPC-treated water to a final volume of 7 μl. The mixture was heated at 70°C for 10 minutes, quenched on ice for 5 minutes, followed by addition of 20 units of Superasin RNase inhibitor (Ambion, USA), 4 μl of 5 x Superscript buffer (Life Technologies, USA), 2 μl of 100 mM DTT, 1 μl of dNTP solution (20 mM each dATP, dGTP, dTTP and dCTP, Amersham Pharmacia Biotech, USA), and 3 μl of DEPC-treated water.
The primers were pre-annealed at 37°C for 5 minutes, followed by addition of 400 units of Superscript II reverse transcriptase (Invitrogen, USA). First strand cDNA synthesis was carried out at 37°C for 30 minutes, followed by 2 hours at 42°C, and the reaction was stopped by incubation at 70°C for 5 minutes, followed by incubation on ice for 5 minutes. Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR columns as described below. The column was pre-spun for 1 minute at 735 x g in a 1.5 ml tube, and the column was placed in a new 1.5 ml tube. The cDNA sample was slowly applied to the top center of the resin and spun at 735 x g for 2 minutes. The eluate was collected. The volume of the eluate was adjusted to 50 μl with TE-buffer pH 7.0 before being used as the template for linear PCR. Four μl template (RT from eggs or worms) was combined with 1 μl dNTP solution (10 mM each dATP, dGTP, dTTP and dCTP, Amersham Pharmacia Biotech, USA), 1 μl of each primer (20 μM CE42.07 sense: gatcgaattcctccaggagagaagggagatg, and CE42.12 antisense: 5'gatcaagcttatctcttcctgggtatccagctt), 5 μl 10 x AmpliTaq Gold Polymerase buffer, 5 μl 25 mM MgCl2, 0.5 μl AmpliTaq Gold DNA polymerase (5U/μl, Applied Biosystems), 2 μl Cy3-dCTP (Amersham Pharmacia Biotech, USA) (eggs) or 2 μl Cy5-dCTP (Amersham Pharmacia Biotech, USA) (worms), and 31.5 μl DEPC-treated water to a final volume of 50 μl. The PCR reactions were carried out using the following program: 95°C for 5 minutes followed by 30 cycles of PCR using the following cycling program (denaturation at 95°C for 45 seconds, annealing at 60°C for 30 seconds, and extension at 72°C for 1 minute) followed by a final extension step at 72°C for 10 minutes and incubation on ice for 5 minutes. Purification of the PCR amplicons from eggs as well as worms was performed using a Qiaquick PCR purification kit (QIAGEN) according to the manufacturer's instructions.
Fluorochrome-labeling of the let-2 cDNA Fragments using Primer Extension
Four (4) μl template (RT from eggs or worms) was combined with 1 μl dNTP solution (10 mM each dATP, dGTP, dTTP and dCTP, Amersham Pharmacia Biotech, USA), 1 μl of primer (20 μM CE42.12 antisense 5'gatcaagcttatctcttcctgggtatccagctt), 5 μl 10 x AmpliTaq Gold Polymerase buffer, 5 μl 25 mM MgCl2, 0.5 μl AmpliTaq Gold DNA polymerase (5U/μl, Applied Biosystems), 2 μl Cy3-dCTP (Amersham Pharmacia Biotech, USA) (eggs) or 2 μl Cy5-dCTP (Amersham Pharmacia Biotech, USA) (worms), and 31.5 μl DEPC-treated water to a final volume of 50 μl. The PCR reactions were carried out using the following program: 95°C for 5 minutes followed by 30 cycles of PCR using the following cycling program (denaturation at 95°C for 45 seconds, annealing at 60°C for 30 seconds, and extension at 72°C for 1 minute) followed by a final extension step at 72°C for 10 minutes and incubation on ice for 5 minutes.
Purification of the PCR amplicons from eggs as well as worms was performed using a Qiaquick PCR purification kit (QIAGEN) according to the manufacturer's instructions. Unincorporated dNTP nucleotides were removed by gel filtration using MicroSpin S-400 HR columns as described above before the eluted, fluorochrome-labelled DNA fragments were stored at -20°C in the dark until microarray hybridization.
Printing and Coupling of the C. elegans Let-2 Exon 8-11 Microarrays
The C. elegans gene Let-2/exon 8-11 capture probes were synthesized with a 5' anthraquinone (AQ)-modification, followed by a hexaethyleneglycol-4 (HEG4) linker (Table 3). The capture probes were first diluted to a 10 μM final concentration in 100 mM Na-phosphate buffer pH 7.0 and spotted on Euray COP microarray slides using the Biochip Arrayer One (Packard Biochip Technologies) with a spot volume of 300 pl and 300 μm between the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker for 90 seconds at full power (Stratagene, USA). Non-immobilized capture probe oligonucleotides were removed from the slides by washing the slides for ½ hour in 30% acetone before rinsing in milli-Q H2O. After washing, the slides were centrifuged at 800 rpm for 2 minutes and stored in a slide box until microarray hybridization.
Comparative Hybridization of the C. elegans microarrays and Post-hybridization Washes
The slides were hybridized with 2.5 μl of the Cy3-labelled and 2.5 μl of the Cy5-labelled target preparation from eggs and worms, respectively, as described above (see "Reverse transcription (RT)-PCR" section) in 25 μl of hybridization solution, containing 25 mM HEPES, pH 7.0, 3 x SSC, 0.3% SDS, and 25 μg of yeast tRNA. The target probe was filtered in a Millipore 0.22 micron spin column (Ultrafree-MC, Millipore, USA), denatured by incubation at 100°C for 5 minutes, cooled at room temperature for 5 minutes, and then carefully applied onto the prepared microarray. One-third of a cover slip was laid over the microarray, and the hybridization was performed for 16-18 hours at 65°C in a hybridization chamber (DieTech, model Joe deRisi, USA). Following hybridization, the slides were washed sequentially by plunging gently in 2 x SSC/0.1% SDS at room temperature until the cover slip fell off into the washing solution, then in 1 x SSC pH 7.0 (150 mM NaCl, 15 mM Sodium Citrate) at room temperature for 1 minute, then in 0.2 x SSC, pH 7.0 (30 mM NaCl, 3 mM Sodium Citrate) at room temperature for 1 minute, and finally in 0.05 x SSC (7.5 mM NaCl, 0.75 mM Sodium Citrate) for 5 seconds, followed by drying of the slides by spinning at 500 rpm for 5 minutes. The slides were stored in a slide box in the dark until scanning.
Microarray data analysis
The C. elegans let-2 gene microarray was scanned in an ArrayWoRx Scanner (Applied Precision, USA) using an exposure time of 5 seconds, resolution of 5.0, and high (high level) sensitivity. The hybridization data were analyzed using the ArrayVision image analysis software package 5.1 (IMAGING Research Inc., USA). The detection principle for alternative exon skipping in the C. elegans let-2 gene is shown in Figure 14. As demonstrated in Figure 15, analysis of the comparative hybridization data from the C. elegans Let-2 exon 8-11 array demonstrates detection of alternative exon skipping of the let-2 exon 9 (eggs) and exon 10 (L1 larvae) using LNA-modified 50-mer capture probes. Capture probes of other lengths and/or with other LNA substitution patterns can be used similarly.
Example 3: Improved Sensitivity in the Specific Detection of the C. elegans Gene T01D3.3 Exon 4 Using LNA-Modified Oligonucleotide Capture Probes
Capture Probe Design: The design method for exon-specific capture probes for the C. elegans gene T01D3.3 exon 4 has been described in Example 2.
Design of the LNA-modified Capture Probes: For the LNA-spiked oligonucleotide capture probes, every fourth DNA nucleotide was substituted with an LNA nucleotide, as shown in Table 4.
Table 4. C. elegans gene T01D3.3/exon 4-specific capture probes.
Cultivation of Caenorhabditis elegans Worms
Mixed stage C. elegans cultures were grown according to standard methods. Samples were harvested by centrifugation at 3000xg, suspended in RNA Later (Ambion, USA), and immediately frozen in liquid nitrogen.
mRNA Isolation from C. elegans Mixed Stages Worms
Poly(A)+RNA was isolated from the worm samples using the Pick-Pen (Bio-Nobile, Finland) Starter kit combined with the KingFisher mRNA purification kit (ThermoLabsystems, Finland) according to the manufacturer's instructions. The yield was 1-2 μg poly(A)+RNA from approximately 50 mg of C. elegans worms.
Synthesis of fluorochrome labelled first strand cDNA from C. elegans mRNA
One μg of C. elegans poly(A)+RNA was combined with 2 μg oligo dT primer (T20VN) in an RNase free, pre-siliconized 1.5 mL tube, and the final volume was adjusted with DEPC-water to 8 μL. The reaction mixture was heated at +70°C for 10 minutes, quenched on ice for 5 minutes, spun for 20 seconds, followed by addition of 1 μL SUPERase-In™ (20U/μL, RNAse inhibitor, Ambion, USA), 4 μL 5xRTase buffer (Invitrogen, USA), 2 μL 0.1 M DTT (Invitrogen, USA), 1 μL dNTP (20 mM dATP, dGTP, dTTP; 4 mM dCTP in DEPC-water, Amersham Pharmacia Biotech, USA), and 3 μL Cy3™-dCTP (Amersham Pharmacia Biotech, USA). First strand cDNA synthesis was carried out by adding 1 μL of Superscript™ II (Invitrogen, 200 U/μL), mixing, and incubating the reaction mixture for one hour at 42°C. An additional 1 μL of Superscript™ II was added and the cDNA synthesis reaction mixture was incubated for an additional one hour at 42°C; the reaction was stopped by heating at 70°C for 5 minutes, and quenching on ice for 2 minutes. The RNA was hydrolyzed by adding 3 μL of 0.5 M NaOH and incubating at 70°C for 15 minutes. The samples were neutralized by adding 3 μL of 0.5 M HCl and purified by adding 450 μL 1xTE buffer, pH 7.5 to the neutralized sample and transferring the samples onto a Microcon-30 concentrator. The samples were centrifuged at 14000xg in a microcentrifuge for ~8 minutes, the flow-through was discarded, and the washing step was repeated twice by refilling the filter with 450 μL 1xTE buffer and by spinning for ~12 minutes. Centrifugation was continued until the volume was reduced to 5 μL, and finally the labelled cDNA probe was eluted by inverting the Microcon-30 tube and spinning at 1000xg for 3 minutes.
Printing and Coupling of the C. elegans Microarrays
The C. elegans gene T01D3.3/exon 4 capture probes were synthesized with a 5' anthraquinone (AQ)-modification, followed by either a hexaethyleneglycol-2 or a hexaethyleneglycol-4 (HEG2/HEG4) linker (Table 4). The capture probes were first diluted to a 10 μM final concentration in 100 mM Na-phosphate buffer pH 7.0, followed by a twofold dilution series (10 μM, 5 μM, 2.5 μM, 1.25 μM, 0.625 μM, 0.31 μM, and 0.155 μM) and spotted on Exiqon's polycarbonate microarray slides using the Biochip Arrayer One (Packard Biochip Technologies, USA) with a spot volume of 3x 300 pl and 400 μm between the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker for 90 seconds at full power (Stratagene, USA). Non-immobilized capture probe oligonucleotides were removed from the slides by washing the slides for 24 hours in milli-Q H2O. After washing, the slides were dried in an oven at 37°C for 30 minutes and stored in a slide box until microarray hybridization.
Hybridization with Cy3-labelled cDNA
The arrays were hybridized overnight using the following protocol. The Cy3™-labelled cDNA sample was combined with 3 μL 20xSSC (3xSSC final), 0.5 μL 1 M HEPES, pH 7.0 (25 mM final), 25 μg yeast tRNA (1.25 μg/μL final), 10 μg PolyA blocker (0.5 μg/μL final), 0.6 μL 10% SDS (0.3% final), and DEPC-treated water to 20 μL final volume. The labelled cDNA target sample was filtered in a Millipore 0.22 micron spin column according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by incubating the reaction at 100°C for 2 minutes. The sample was cooled at 20-25°C for 5 minutes by spinning at maximum speed in a microcentrifuge, and then carefully applied on top of the microarray.
A cover slip was laid over the microarray and the hybridization was performed for 16 hours at 63 °C in a hybridization chamber (Corning, USA) submerged in a water bath, with an aliquot of 30 μL of 3xSSC added to both ends of the hybridization chamber to prevent evaporation. After hybridization, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The coverslip was washed off in 2 x SSC, pH 7.0 containing 0.1% SDS at room temperature for one minute, followed by washing of the microarrays subsequently in 1.0 x SSC, pH 7.0 at room temperature for one minute, and then in 0.2 x SSC, pH 7.0 at room temperature for one minute. Finally, the slides were washed for 5 seconds in 0.05 x SSC, pH 7.0. The slides were then dried by centrifugation in a swinging bucket rotor at approximately 600 rpm for 5 minutes.
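The final concentrations quoted for the 20 μL hybridization mix in this example follow from simple dilution arithmetic (stock concentration times volume added, divided by total volume). The sketch below rechecks them; the helper name is hypothetical, and the stock values and volumes are the ones stated in the protocol.

```python
def final_concentration(stock, added_ul, total_ul):
    """Concentration after diluting `added_ul` of `stock` into `total_ul` (same units as stock)."""
    return stock * added_ul / total_ul


TOTAL_UL = 20.0  # final hybridization volume stated in the protocol

ssc_x = final_concentration(20.0, 3.0, TOTAL_UL)        # 20x SSC stock, expressed in "x"
hepes_mm = final_concentration(1000.0, 0.5, TOTAL_UL)   # 1 M HEPES stock, expressed in mM
sds_pct = final_concentration(10.0, 0.6, TOTAL_UL)      # 10% SDS stock, expressed in %
trna_ug_per_ul = 25.0 / TOTAL_UL                        # 25 ug yeast tRNA added as mass
polya_ug_per_ul = 10.0 / TOTAL_UL                       # 10 ug PolyA blocker added as mass

print(ssc_x, hepes_mm, sds_pct, trna_ug_per_ul, polya_ug_per_ul)
```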
Microarray data analysis
The C. elegans gene T01D3.3 exon 4 array was scanned in an ArrayWoRx Scanner (Applied Precision, USA) using an exposure time of 5 seconds, resolution of 5.0, and high (high level) sensitivity. The hybridization data were analyzed using the ArrayVision image analysis software package 5.1 (IMAGING Research Inc., USA). As shown in Figure 16, analysis of the hybridization data from the C. elegans gene 26/T01D3.3 exon 4 array demonstrates that the use of LNA-modified capture probes for the C. elegans T01D3.3 exon results in 5-fold increased sensitivity in exon 4 capture compared to the corresponding DNA oligonucleotide capture probe controls printed on the same microarray. Capture probes of other lengths and/or with other LNA substitution patterns can be used similarly.
Example 4: Assessment of Capture Probe Specificity for the C. elegans Gene T01D3.3 Exons 4 and 5 Using Synthetic Antisense Target Oligos
Capture probe design: Exon-specific capture probes for the C. elegans gene T01D3.3 exons 4 and 5 were designed as described in Example 2.
Design of the LNA-modified capture probes: For the LNA-spiked oligonucleotide capture probes, every fourth DNA nucleotide was substituted with an LNA nucleotide, as shown in Table 5.
Table 5. C. elegans gene T01D3.3/exons 4 and 5-specific capture probes and synthetic target oligonucleotides.
Printing and coupling of the C. elegans gene T01D3.3/exon 4-5 microarrays
The C. elegans gene T01D3.3/exon 4-5 capture probes were synthesized with a 5' anthraquinone (AQ)-modification, followed by either a hexaethyleneglycol-2 or a hexaethyleneglycol-4 (HEG2/HEG4) linker (Table 5). The capture probes were first diluted to a 10 μM final concentration in 100 mM Na-phosphate buffer pH 7.0, followed by a twofold dilution series (10 μM, 5 μM, 2.5 μM, 1.25 μM, 0.625 μM, 0.31 μM, and 0.155 μM) and spotted on Euray polycarbonate microarray slides using the Biochip Arrayer One (Packard Biochip Technologies) with a spot volume of 3x 300 pl and 400 μm between the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker for 90 seconds at full power (Stratagene, USA). Non-immobilized capture probe oligonucleotides were removed from the slides by washing the slides for 24 hours in milli-Q H2O. After washing, the slides were dried in an oven at 37°C for 30 minutes, and stored in a slide box until microarray hybridization.
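The twofold dilution series used for spotting can be generated as below. This is only a sketch with a hypothetical helper name; note that the exact halved values for the last two steps are 0.3125 μM and 0.15625 μM, which the protocol quotes in shortened form as 0.31 μM and 0.155 μM.

```python
def twofold_series(start_um=10.0, steps=7):
    """Return `steps` concentrations (in uM), each half of the previous one."""
    return [start_um / 2 ** i for i in range(steps)]


series = twofold_series()
print(series)
```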
Hybridization of the C. elegans microarrays and post-hybridization washes
The slides were hybridized with a high (saturated) concentration of 1 μM of each gene T01D3.3, exon 4 or 5 target oligo (Table 5) in 50 μl of hybridization solution, containing 25 mM HEPES, pH 7.0, 3 x SSC, 0.22% SDS, and 0.8 μg/μl of poly(A) blocker. The target probes were filtered in a Millipore 0.45 micron spin column (Ultrafree-MC, Millipore, USA), denatured by incubation at 100°C for 2 minutes, cooled at room temperature for 5 minutes, and then carefully applied onto the prepared microarray. One-half of a cover slip was laid over the microarray, and the hybridization was performed for 16-18 hours at 63°C in a hybridization chamber (Corning, USA).
Following hybridization, the slides were washed sequentially by plunging gently in 1 x SSCT (150 mM NaCl, 15 mM Sodium Citrate + Tween 20) at room temperature for one minute, then in 0.2 x SSCT (30 mM NaCl, 3 mM Sodium Citrate + Tween 20) at room temperature for one minute, and finally in Milli Q water, followed by drying of the slides in an oven at 37°C for 30 minutes. The slides were Cy5 labelled using a Cy5-streptavidin target. Thirty μl of Cy5-streptavidin (2 μg/ml in 1 x SSCT) was carefully applied onto the hybridized microarray and incubated for one hour at room temperature before an additional washing step was performed in 1 x SSCT (150 mM NaCl, 15 mM Sodium Citrate + Tween 20) at room temperature for one minute, then in 0.2 x SSCT (30 mM NaCl, 3 mM Sodium Citrate + Tween 20) at room temperature for one minute, and finally in Milli Q water. Following washing, the slides were dried in an oven at 37°C for 30 minutes and stored in a slide box in the dark until scanning.
Microarray data analysis
The C. elegans gene T01D3.3 exon 4-5 microarray was scanned in an ArrayWoRx Scanner (Applied Precision, USA) using an exposure time of 5 seconds, resolution of 5.0, and high (high level) sensitivity. The hybridization data were analyzed using the ArrayVision image analysis software package 5.1 (IMAGING Research Inc., USA). As shown in Figure 18, analysis of the hybridization data from the C. elegans gene 26/T01D3.3 array demonstrates that both the DNA and the LNA capture probes for the C. elegans T01D3.3 exon 4 (Figure 18 (upper)) and exon 5 (Figure 18 (lower)), respectively, are highly specific, with a very low level of cross-hybridization between their respective target oligonucleotides. The exon-specific design of the oligonucleotide capture probes is thus validated. Capture probes of other lengths and/or with other LNA substitution patterns can be used similarly.
Example 5: Detection of Alternatively Spliced Isoforms using Internal Exon-specific, and Exon-Exon Junction-Specific (merged) LNA-modified Capture Probes
Oligonucleotide design for microarrays.
Methods for designing exon-specific internal oligonucleotide capture probes have been described in Example 2.
Design of the LNA-modified capture probes
For the LNA-modified oligonucleotide capture probes, every third DNA nucleotide was substituted with an LNA nucleotide. The probes designed to capture the junction of the recombinant splice variants were designed with LNA modifications in a block of five consecutive LNA nucleotides, two on the 5' side of the splice junction and three on the 3' side of the splice junction. All capture probes are shown in Table 6.
Table 6. Internal, exon-specific and merged, exon-exon junction specific oligonucleotide capture probes.
Capture probes Sequence (LNA=uppercase, DNA=lowercase letters)
gene78.01a cctgaaagtagatttgttatttccgaaacgccttctcccgttcttaagtc
gene78.01b catataccacaaatagtccctcaaaaatcacaagaaaactcacaacactg
gene78.03a gatttgcagcggtggtaaaaagtatgaaaacgtggtaattaaaaggtctc
gene78.03b ccaatgaaaactaatcaaaggtaaacgtggatcccatggcaattcccggg
gene78.m01INS3 caacactgcccagaggttcaatcgatccgatgatcctaatgaaggcgccc
gene78.mINS303 gtccagtatcgtccatcatagtatcgataaatatgtgaaggaaatgcctg
gene78.m01INS4 caacactgcccagaggttcaatcgatgtgtgataggatcagtgttcaggg
gene78.mINS403 gaaggcgaaggagactgctaatatcgataaatatgtgaaggaaatgcctg
gene78.01a_50_LNA3 mCctGaaAgtAgaTttGttAttTccGaaAcgmCctTctmCccGttmCttAagTc
gene78.01b_50_LNA3 mCatAtamCcamCaaAtaGtcmCctmCaaAaaTcamCaaGaaAacTcamCaamCacTg
gene78.03a_50_LNA3 GatTtgmCagmCggTggTaaAaaGtaTgaAaamCgtGgtAatTaaAagGtcTc
gene78.03b_50_LNA3 mCcaAtgAaaActAatmCaaAggTaaAcgTggAtcmCcaTggmCaaTtcmCcgGg
gene78.m01INS3_50_block caacactgcccagaggttcaatcGATmCmCgatgatcctaatgaaggcgccc
gene78.mINS303_50_block gtccagtatcgtccatcatAGTATcgataaatatgtgaaggaaatgcctg
gene78.m01INS4_50_block caacactgcccagaggttcaatcGATGTgtgataggatcagtgttcaggg
gene78.mINS403_50_block gaaggcgaaggagactgctAATATcgataaatatgtgaaggaaatgcctg
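The block design used for the merged (junction) probes in Table 6, five consecutive LNA nucleotides with two on the 5' side and three on the 3' side of the junction, can be sketched as follows. The function name and the zero-based `junction` index are assumptions made for illustration; the index 25 used in the example is simply the position that reproduces the gene78.m01INS3_50_block entry of Table 6.

```python
def lna_block(dna, junction, left=2, right=3):
    """Mark a block of LNA bases around `junction` (index of the first base 3' of the junction).

    `left` bases 5' of the junction and `right` bases 3' of it are written
    in uppercase; an LNA cytosine is written 'mC', as in the probe tables.
    """
    dna = dna.lower()
    start, end = junction - left, junction + right
    block = "".join("mC" if base == "c" else base.upper() for base in dna[start:end])
    return dna[:start] + block + dna[end:]


probe = lna_block("caacactgcccagaggttcaatcgatccgatgatcctaatgaaggcgccc", 25)
print(probe)
```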
Printing and Coupling of the Splice Isoform-Specific Microarrays
The splice variant capture probes were synthesized with a 5' anthraquinone (AQ)-modification, followed by a hexaethyleneglycol-2 (HEG2) linker. The capture probes were first diluted to a 20 μM final concentration in 100 mM Na-phosphate buffer pH 7.0, and spotted on the Immobilizer polymer microarray slides (Exiqon, Denmark) using the Biochip Arrayer One (Packard Biochip Technologies, USA) with a spot volume of 2x 300 pl and 300 μm between the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker with 2300 μjoules (Stratagene, USA). Non-immobilized capture probe oligonucleotides were removed from the slides by washing the slides two times for 15 minutes in 1xSSC. After washing, the slides were dried by centrifugation at 1000 x g for 2 minutes, and stored in a slide box until microarray hybridization.
Construction of Splice Variant Vectors
The recombinant splice variant constructs were cloned into the Triamp18 vector (Ambion, USA). The constructs were sequenced to confirm their construction. The plasmid clones were transformed into E. coli XL10-Gold (Stratagene, USA).
Triamp18/SWI5 vector construct
Genomic DNA was prepared from a wild-type standard laboratory strain of Saccharomyces cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences, USA) according to the supplier's instructions. Amplification of the partial yeast gene was performed using standard PCR with yeast genomic DNA as the template. In the first step of amplification, a forward primer containing a restriction enzyme site and a reverse primer containing a universal linker sequence were used. In this step, 20 base-pairs were added to the 3'-end of the amplicon, next to the stop codon. In the second step of amplification, the reverse primer was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme site. The SWI5 amplicon contains 730 bp of the SWI5 ORF plus a 20 bp universal linker sequence and a poly-A20 tail. The PCR primers used were YDR146C-For-EcoRI (acgtgaattcaaatacagacaatgaaggagatga), YDR146C-Rev-Uni (gatccccgggaattgccatgttacctttgattagttttcattggc), and Uni-polyT-BamHI (acgtggatccttttttttttttttttttttgatccccgggaattgccatg).
The PCR amplicon was cleaved with the restriction enzymes EcoRI and BamHI. The DNA fragment was ligated into the pTRIamp18 vector (Ambion, USA) using the Quick Ligation Kit (New England Biolabs, USA) according to the supplier's instructions and transformed into E. coli DH-5α by standard methods.
Construction of the Recombinant Splice Variant #1 (Triamp18/swi5-rubisco)
The Arabidopsis thaliana Rubisco small subunit ssu2b gene fragment (gil7064721) was amplified from genomic DNA using primers named DJ305 (5'-ACTATGATGGACGATACTGGAC-3') and DJ306 (5'-ATTGGATCGATCCGATGATCCTAATGAAGGC-3'), containing ClaI restriction site linkers. The purified PCR fragment was digested with ClaI and then cloned into the swi5 (gl:7839148) vector at the unique ClaI site (atcgat) giving each insert a flanking sequence from the original yeast SWI5 insert (named exon01 and exon03, Figure 19). The product was inserted in the reverse orientation, so that the insert sequence is as follows: atcgatCCGATGATCCTAATGAAGGCGCCCGGGTACTCCTTCTTGCATTCTTCAACTT CCTTCAACACTTGAGCGGAGTCGGTGCATCCGAACAATGGAAGCTTCCACATTGT CCAGTATCGTCCATCATAGTatcgat
Nucleotide sequence analysis revealed a difference between the sequence of A. thaliana Rubisco expected from the GenBank database and that obtained from all sequenced constructs and PCR products. Position 30 in the Rubisco insert is "C" rather than the expected "A." This SNP was probably created by PCR. None of the oligonucleotide capture probes used in the example cover this region. The Rubisco sequence in GenBank is TCCTAATGAAGGCGCCA, and the sequence obtained from the plasmid construct is TCCTAATGAAGGCGCCC.
Construction of the Recombinant Splice Variant #2 (Triamp18/swi5-lea)
The Arabidopsis thaliana Lea gene (gi 1526423) was amplified from genomic DNA with primers named DJ307 (5'-GGAATTATCGATGTGTGATAGGATCAGTGTTCAG-3') and DJ308 (5'-AATTGGATCGATATTAGCAGTCTCCTTCGCC-3'), including the ClaI linker sites as above. The PCR fragment was digested with ClaI and cloned into the yeast SWI5 IVT construct as above at the unique ClaI site.
The fragment was inserted in the forward orientation, resulting in the following insert sequence: atcgatGTGTGATAGGTTCAGTGTTCAGGGCTGTCCAAGGAACGTATGAGCATGCGA GAGACGCTGTAGTTGGAAAAACCCACGAAGCGGCTGAGTCTACCAAAGAAGGA GCTCAGATAGCTTCAGAGAAAGCGGTTGGAGCAAAGGACGCAACCGTCGAGAA AGCTAAGGAAACCGCTGATTATACTGCGGAGAAGGTGGGTGAGTATAAAGACTA TACGGTTGATAAAGCTAAAGAGGCTAAGGACACAACTGCAGAGAAGGCGAAGG AGACTGCTAATatcgat.
In vitro RNA Preparation from Splice Variant Vectors
In vitro RNA from the splice variants was made using the MEGAscript™ high yield transcription kit according to the manufacturer's instructions (Ambion, USA). The yield of IVT RNA was quantified on a Nanodrop spectrophotometer (Nanodrop Technologies, USA).
Isolation of total RNA from C. elegans
C. elegans wild-type strain (Bristol-N2) was maintained on nematode growth medium (NG) plates seeded with Escherichia coli strain OP50 at 20°C, and the mixed stages of the nematode were prepared as described by Hope (ed.) ("C. elegans - A Practical Approach", Oxford University Press 1999). The samples were immediately flash frozen in liquid N2 and stored at -80°C until RNA isolation.
A 100 μl aliquot of packed C. elegans worms from a mixed stage population was homogenized using the FastPrep Bio 101 from Kem-En-Tec for one minute at speed 6 followed by isolation of total RNA from the extracts using the FastPrep Bio 101 kit (Kem-En-Tec) according to the manufacturer's instructions.
The eluted total RNA was ethanol precipitated for 24 hours at -20°C by addition of 2.5 volumes of 96% EtOH and 0.1 volume of 3M Na-acetate, pH 5.2 (Ambion, USA), followed by centrifugation of the total RNA sample for 30 minutes at 13200 rpm. The total RNA pellet was air-dried and redissolved in 10 μl of diethylpyrocarbonate (DEPC)-treated water (Ambion, USA) and stored at -80°C.
Fluorochrome-labelling of the Target
Ten (10) μg total RNA from C. elegans and 1 ng of in vitro RNA from Splice variant #1 were combined with 5 μg anchored oligo(dT20) primer and DEPC-treated water to a final volume of 8 μl. The mixture was heated at 70°C for 10 minutes, quenched on ice for 5 minutes, followed by addition of 20 units of Superasin RNase inhibitor (Ambion, USA), 1 μl dNTP solution (10 mM each dATP, dGTP, dTTP and 0.4 mM dCTP, and 3 μl Cy5-dCTP, Amersham Biosciences, USA), 4 μl 5 x RTase buffer (Invitrogen), 2 μl 0.1 mM DTT (Invitrogen), 400 units of Superscript II reverse transcriptase (Invitrogen, USA), and DEPC-treated water to 20 μl final volume. A parallel set-up was made with 10 μg total RNA from C. elegans and 1 ng of in vitro RNA from Splice variant #2, labelling with Cy3-dCTP. Both cDNA syntheses were carried out at 42°C for 2 hours, and the reactions were stopped by incubation at 70°C for 5 minutes, followed by incubation on ice for 5 minutes.
Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR columns as described below. The column was pre-spun for one minute at 1500 x g in a 1.5 ml tube, and the column was placed in a new 1.5 ml tube. The cDNA sample was slowly applied to the top center of the resin and spun at 1500 x g for 2 minutes. The eluate was collected. RNA was degraded by adding 3 μl of 0.5 M NaOH. The solution was mixed well and incubated at 70°C for 15 minutes. The solution was neutralized by adding 3 μl of 0.5 M HCl and mixed well. Then, 450 μl 1xTE, pH 7.5 was added to the neutralized sample, and the sample was transferred onto a Microcon-30 concentrator (prior to use, 500 μl 1xTE was spun through the column to remove residual glycerol). The samples were spun at 14000 x g in a microcentrifuge for 12 minutes, and the volume was checked. Spinning was continued until the volume was reduced to 5 μl. The labelled cDNA probe was eluted by inverting the Microcon-30 tube and spinning at 1000 x g for 3 minutes. The Microcon filter was checked for proper elution.
Comparative Hybridization of the Splice Variant Microarrays and Post-hybridization washes
The Cy3 and Cy5-labelled cDNA samples, respectively, were combined in one tube.
The following was added: 3.75 μl 20x SSC (3x SSC final, pass through 0.22 μm filter prior to use to remove particulates), yeast tRNA (1 μg/μl final), 0.625 μl 1 M HEPES, pH 7.0 (25 mM final, pass through 0.22 μm filter prior to use to remove particulates), 0.75 μl 10% SDS (0.3% final), and DEPC-water to 25 μl final volume. The labelled cDNA target sample was filtered in a Millipore 0.22 μm filter spin column (Ultrafree-MC, Millipore, USA) according to the manufacturer's instructions, followed by incubation of the reaction mixture at 100°C for 2-5 minutes. The cDNA probes were cooled at room temperature for 2-5 minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on Immobilizer™ MicroArray Slide, and the hybridization mixture was applied to the array from the side. An aliquot of 30 μL of 3xSSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber (DieTech, USA). The chamber was sealed watertight and incubated at 65°C for 16-18 hours submerged in a water bath. After hybridization, the slide was removed carefully from the hybridization chamber and washed using the following protocol.
The slides were washed sequentially by plunging gently in 2x SSC/0.1% SDS at room temperature until the cover slip fell off into the washing solution, then in 1x SSC, pH 7.0 (150 mM NaCl, 15 mM sodium citrate) at room temperature for one minute, then in 0.2x SSC, pH 7.0 (30 mM NaCl, 3 mM sodium citrate) at room temperature for one minute, and finally in 0.05x SSC (7.5 mM NaCl, 0.75 mM sodium citrate) for 5 seconds, followed by drying of the slides by spinning at 1000 x g for 2 minutes. The slides were stored in a slide box in the dark until scanning.
Microarray Data Analysis
The splice variant microarray was scanned in a ScanArray 4000XL confocal laser scanner (Packard Instruments, USA). The hybridization data were analyzed using the GenePix Pro 4.01 microarray analysis software (Axon, USA).
In the data analysis, the experimental variation in the labelling efficiency between the two fluorescent dyes was normalized (scaled) as follows. The average signal intensities from the "exon1" and "exon3" internal capture probes (Table 6) were used to calculate a normalization factor of 2.75. The signal intensity values from the Cy3 target were multiplied by this factor.
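As an illustration of the scaling step, the following minimal sketch computes a dye normalization factor from control-probe intensities and applies it to the Cy3 channel. It assumes that the factor is the ratio of the mean Cy5 signal to the mean Cy3 signal over the internal control capture probes; the function names and the intensity values are hypothetical and were chosen only so that the factor comes out at 2.75.

```python
from statistics import mean


def normalization_factor(cy3_controls, cy5_controls):
    """Ratio of the mean Cy5 to the mean Cy3 intensity over the control probes.

    Assumption: this is how the scaling factor is derived; the text only
    states that the control-probe averages were used.
    """
    return mean(cy5_controls) / mean(cy3_controls)


def scale_channel(intensities, factor):
    """Multiply every intensity in one channel by the normalization factor."""
    return [value * factor for value in intensities]


# Hypothetical control-spot intensities (not data from the experiment).
cy3_controls = [400.0, 800.0]    # e.g. exon1 and exon3 probes, Cy3 channel
cy5_controls = [1100.0, 2200.0]  # same probes, Cy5 channel

factor = normalization_factor(cy3_controls, cy5_controls)
scaled_cy3 = scale_channel([100.0, 250.0], factor)
print(factor, scaled_cy3)
```

After scaling, the control probes have the same mean intensity in both channels, so remaining Cy5/Cy3 differences on the other spots are less likely to reflect labelling efficiency alone.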
Analysis of the data from the specific detection of the two recombinant splice variants in a complex RNA pool demonstrates that the merged capture probes containing an LNA block give significantly higher signals and a very low level of cross-hybridization compared to the DNA capture probes (Figure 20). In addition, the specific detection of the two artificial splice variants #1 and #2 is validated by the results from the LNA-modified oligonucleotide capture probes. In contrast, the corresponding DNA oligonucleotide capture probes fail to detect splice variant #1 (Figure 20 (lower)). Capture probes of other lengths and/or with other LNA substitution patterns can be used similarly.
Example 6: The Use of LNA-modified Oligonucleotides in Microarrays Provides Significantly Improved Sensitivity in Expression Profiling
This example demonstrates the advantages of using LNA oligonucleotide microarrays in gene expression profiling experiments. Capture probes for the Saccharomyces cerevisiae gene SWI5 (YDR146C) were designed as 50-mer standard DNA and two different LNA-modified oligonucleotides with LNA substitutions at every second or every third nucleotide position, respectively, for comparison (Table 7). To assess the sensitivity of DNA versus LNA capture probes, hybridizations with different amounts of biotin-labelled antisense oligonucleotides in a 10-fold dilution series were performed.
Design and Synthesis of the LNA Capture Probes
To design capture probes, regions with unique mRNA sequence of the selected target genes were identified. 50-mer oligonucleotide sequences optimized with respect to Tm, self-complementarity, and secondary structure were selected. LNA modifications were incorporated to increase affinity and specificity. The biotin-labelled antisense DNA target oligonucleotide corresponds to the reverse complement sequence.
Printing of the LNA Microarrays
The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark) using the Biochip One Arrayer from Packard Biochip Technologies (Packard, USA). The arrays were printed with a spot volume of 2x300 pl of a 10 μM (final concentration) capture probe dilution. Four replicas of the capture probes were printed on each slide.
Hybridization with Biotin-labelled Antisense Oligonucleotide
The arrays were hybridized overnight using the following protocol. The desired amount of biotin-labelled oligonucleotide was combined in one tube followed by addition of 3 μL 20xSSC (3xSSC final), 0.5 μL 1 M HEPES, pH 7.0 (25 mM final), 25 μg yeast tRNA (1.25 μg/μL final), 0.6 μL 10% SDS (0.3% final), and DEPC-treated water to 20 μL final volume. The biotin-labelled target sample was filtered in a Millipore 0.22 micron spin column according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by incubating the reaction at 100 °C for 2 minutes. The sample was cooled at 20-25 °C for 5 minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide, and the hybridization mixture was applied to the array from the side. An aliquot of 30 μL of 3xSSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber. The chamber was sealed watertight and incubated at 65 °C for 16-18 hours submerged in a water bath. After hybridization, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The LifterSlip coverslip was washed off in 2xSSC, pH 7.0 containing 0.1% SDS at room temperature for 1 minute, followed by subsequent washing of the microarrays in 1.0xSSC, pH 7.0 at room temperature for 1 minute, and then in 0.2xSSC, pH 7.0 at room temperature for 1 minute. Finally, the slides were washed for 5 seconds in 0.05xSSC, pH 7.0. The slides were then dried by centrifugation in a swinging bucket rotor at approximately 200 x g for 2 minutes. To visualize the biotin-containing duplexes, an aliquot of 40 μL of 2 μg/ml streptavidin-Cy3 in 1xSSC + 0.05% Tween solution was applied to the slide as described for the hybridization mixture above. The slide was incubated in a humidified chamber for 1 hour at room temperature.
The coverslip was washed off in 1xSSC + 0.05% Tween for 1 minute, followed by a wash in 0.2xSSC + 0.05% Tween for 1 minute and then for 10 seconds in MilliQ water. The slide was dried by centrifugation in a swinging bucket rotor for 2 minutes at 200 x g.
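The final concentrations quoted for the 20 μL hybridization mixture in this example follow from the stock concentrations and the volumes added. A minimal sketch of that arithmetic is given below; the function name is illustrative only and is not part of the protocol.

```python
def final_concentration(stock, stock_volume_ul, final_volume_ul):
    """Concentration after dilution (C1 * V1 = C2 * V2), in the units of `stock`."""
    return stock * stock_volume_ul / final_volume_ul


FINAL_VOLUME_UL = 20.0

ssc_x = final_concentration(20.0, 3.0, FINAL_VOLUME_UL)       # 20x SSC stock
hepes_mm = final_concentration(1000.0, 0.5, FINAL_VOLUME_UL)  # 1 M HEPES, in mM
sds_pct = final_concentration(10.0, 0.6, FINAL_VOLUME_UL)     # 10% SDS stock
trna_ug_per_ul = 25.0 / FINAL_VOLUME_UL                       # 25 ug tRNA added

print(ssc_x, hepes_mm, sds_pct, trna_ug_per_ul)
```

The computed values agree with the stated final concentrations of 3xSSC, 25 mM HEPES, 0.3% SDS, and 1.25 μg/μL yeast tRNA.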
Data Analysis
Following washing and drying, the slides were scanned using a ScanArray 4000XL scanner (Perkin-Elmer Life Sciences, USA), and the array data were processed using the GenePix™ Pro 4.0 software package (Axon, USA).
Results
Incorporation of LNA nucleotides at every second or third nucleotide position in standard 50-mer expression array oligonucleotide capture probes results in a 2-7-fold increase in fluorescence intensity levels using an unsaturated target concentration and hybridizing under standard stringency conditions (Figure 21). Thus, it can clearly be concluded that the LNA oligonucleotides are more sensitive in expression profiling compared to DNA oligonucleotides.
Table 7. DNA and LNA-modified SWI5 (YDR146C) oligonucleotide capture probes. LNA modifications are depicted by uppercase letters in the sequence; "mC" denotes LNA methyl cytosine.
Oligo Name Sequence
YDR146C-50 tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt
YDR146C-50_LNA2 TgGgAaTgGaAcGgGgAtTaTgGtTtmCgmCcAaTgAaAamCtAaTcAaAgGt
YDR146C-50_LNA3 TggGaaTggAacGggGatTatGgtTtcGccAatGaaAacTaaTcaAagGt
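The substitution patterns in Table 7 can be reproduced programmatically. The sketch below is only an illustration and is not the design software used for the probes; `lna_pattern` is a hypothetical helper. It assumes that LNA monomers are placed at every n-th position counted from the 5' end, and that an LNA cytosine is written as "mC" while other LNA bases are written in uppercase.

```python
def lna_pattern(dna, step):
    """Mark every `step`-th nucleotide, starting at the 5' end, as LNA.

    LNA positions are written in uppercase; LNA cytosine is written as "mC"
    (LNA methyl cytosine), following the notation of Table 7.
    """
    out = []
    for index, base in enumerate(dna.lower()):
        if index % step == 0:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


# Unmodified 50-mer capture probe YDR146C-50 from Table 7.
ydr146c_50 = "tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt"

lna2 = lna_pattern(ydr146c_50, 2)  # LNA at every second position
lna3 = lna_pattern(ydr146c_50, 3)  # LNA at every third position
print(lna2)
print(lna3)
```

Applied to the YDR146C-50 sequence, the two calls reproduce the YDR146C-50_LNA2 and YDR146C-50_LNA3 entries of Table 7.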
Example 7: The Use of LNA-modified Oligonucleotides in Microarrays Provides Significantly Improved Sensitivity in Comparative Genome Hybridization (CGH).
This example demonstrates the advantages of using LNA oligonucleotide microarrays in Comparative Genome Hybridization (CGH) experiments. Capture probes for all 23 exons of the Menkes gene (ATP7A) were designed as 50-mer standard DNA and different LNA/DNA mixmer oligonucleotides, respectively, for comparison (see the table below). The C6-amino-linked capture probes were applied to Immobilizer slides and hybridized with patient DNA samples labelled with a Cy3 fluorescent dye.
Design and Synthesis of the LNA Capture Probes
To design the capture probes, regions comprising individual exons of the Menkes gene were identified. The optimal 50-mer oligonucleotide sequences with respect to Tm, self-complementarity, and secondary structure were selected for each exon. LNA modifications were incorporated to increase affinity and specificity. The software tool "OligoDesign", which automatically designs capture probes that are optimized for sequence specificity, Tm, self-complementarity, secondary structure, and LNA modifications, was used for oligonucleotide design.
Results
Fluorescent Cy3-labelled patient genomic DNA was hybridized to microarrays spotted with the CGH capture probes listed in the table below. Compared to DNA capture probes, capture probes with LNA in every second position (LNA-2) had a significantly better capture rate of non-amplified labelled genomic patient DNA, as shown in Figure 22, Figure 23, and Figure 24. Capture probes of other lengths and/or with other LNA substitution patterns can be used similarly.
EQ No Oligo Name Sequence
8253 Menkes.02 50NH2C6-DNA Tctgttgagggtatgacttgcaattcctgtgtttggaccattgagcagca
8254 Menkes.02 50NH2C6-4.LNA TctgTtgaGggtAtgamCttgmCaatTcctGtgtTtggAccaTtgaGcagmCa
8255 Menkes.02 50NH2C6-3.LNA TctGttGagGgtAtgActTgcAatTccTgtGttTggAccAttGagmCagmCa
8256 Menkes.02 50NH2C6-2.LNA TcTgTtGaGgGtAtGamCtTgmCaAtTcmCtGtGtTtGgAcmCaTtGaGcAgmCa
8258 Menkes.04 50NH2C6-DNA agaaaagcaatagaggctgtatcaccggggctatatagagttagtatcac
8259 Menkes.04 50NH2C6-4.LNA AgaaAagcAataGaggmCtgtAtcamCcggGgctAtatAgagTtagTatcAc
8260 Menkes.04 50NH2C6-3.LNA AgaAaaGcaAtaGagGctGtaTcamCcgGggmCtaTatAgaGttAgtAtcAc
8261 Menkes.04 50NH2C6-2.LNA AgAaAaGcAaTaGaGgmCtGtAtmCamCcGgGgmCtAtAtAgAgTtAgTaTcAc
8263 Menkes.06 50NH2C6-DNA gctgttatacaacccccaatgatagcagagttcatccgagaacttggatt
8264 Menkes.06 50NH2C6-4.LNA GctgTtatAcaamCcccmCaatGataGcagAgttmCatcmCgagAactTggaTt
8265 Menkes.06 50NH2C6-3.LNA GctGttAtamCaamCccmCcaAtgAtaGcaGagTtcAtcmCgaGaamCttGgaTt
8266 Menkes.06 50NH2C6-2.LNA GcTgTtAtAcAamCcmCcmCaAtGaTaGcAgAgTtmCaTcmCgAgAamCtTgGaTt
8268 Menkes.08 50NH2C6-DNA tctttggtcaagaaggatcggtcagcaagtcacttagatcataaacgaga
8269 Menkes.08 50NH2C6-4.LNA TcttTggtmCaagAaggAtcgGtcaGcaaGtcamCttaGataAtaaAcgaGa
8270 Menkes.08 50NH2C6-3.LNA TctTtgGtcAagAagGatmCggTcaGcaAgtmCacTtaGatmCatAaamCgaGa
8271 Menkes.08 50NH2C6-2.LNA TcTtTgGtmCaAgAaGgAtmCgGtmCaGcAaGtmCamCtTaGaTcAtAaAcGaGa
8273 Menkes.10 50NH2C6-DNA ttataaagcactgaagcataagacagcaaatatggacgtactgattgtgc
8274 Menkes.10 50NH2C6-4.LNA TtatAaagmCactGaagmCataAgacAgcaAataTggamCgtamCtgaTtgtGc
8275 Menkes.10 50NH2C6-3.LNA TtaTaaAgcActGaaGcaTaaGacAgcAaaTatGgamCgtActGatTgtGc
8276 Menkes.10 50NH2C6-2.LNA TtAtAaAgmCamCtGaAgmCaTaAgAcAgmCaAaTaTgGamCgTamCtGaTtGtGc
8278 Menkes.12 50NH2C6-DNA aacaagtggatgtggaacttgtacaacgtggagatatcattaaagtagtt
8279 Menkes.12 50NH2C6-4.LNA AacaAgtgGatgTggaActtGtacAacgTggaGataTcatTaaaGtagTt
8280 Menkes.12 50NH2C6-3.LNA AacAagTggAtgTggAacTtgTacAacGtgGagAtaTcaTtaAagTagTt
8281 Menkes.12 50NH2C6-2.LNA AamCaAgTgGaTgTgGaAcTtGtAcAamCgTgGaGaTaTcAtTaAaGtAgTt
8283 Menkes.14 50NH2C6-DNA ccattgccaccctcttggtatggattgtaattggatttctgaattttgaa
8284 Menkes.14 50NH2C6-4.LNA mCcatTgccAcccTcttGgtaTggaTtgtAattGgatTtctGaatTttgAa
8285 Menkes.14 50NH2C6-3.LNA mCcaTtgmCcamCccTctTggTatGgaTtgTaaTtgGatTtcTgaAttTtgAa
8286 Menkes.14 50NH2C6-2.LNA mCcAtTgmCcAcmCcTcTtGgTaTgGaTtGtAaTtGgAtTtmCtGaAtTtTgAa
8288 Menkes.16 50NH2C6-DNA ggtatttgataagactggaaccattactcacggaaccccagtggtgaatc
8289 Menkes.16 50NH2C6-4.LNA GgtaTttgAtaaGactGgaamCcatTactmCacgGaacmCccaGtggTgaaTc
8290 Menkes.16 50NH2C6-3.LNA GgtAttTgaTaaGacTggAacmCatTacTcamCggAacmCccAgtGgtGaaTc
8291 Menkes.16 50NH2C6-2.LNA GgTaTtTgAtAaGamCtGgAamCcAtTamCtmCamCgGaAcmCcmCaGtGgTgAaTc
8293 Menkes.18 50NH2C6-DNA attggtaaccgggagtggatgattagaaatggtcttgtcattaataacga
8294 Menkes.18 50NH2C6-4.LNA AttgGtaamCcggGagtGgatGattAgaaAtggTcttGtcaTtaaTaacGa
8295 Menkes.18 50NH2C6-3.LNA AttGgtAacmCggGagTggAtgAttAgaAatGgtmCttGtcAttAatAacGa
8296 Menkes.18 50NH2C6-2.LNA AtTgGtAamCcGgGaGtGgAtGaTtAgAaAtGgTcTtGtmCaTtAaTaAcGa
8298 Menkes.20 50NH2C6-DNA tggcacaggcacagatgtagccattgaagcagctgatgtggttttgataa
8299 Menkes.20 50NH2C6-4.LNA TggcAcagGcacAgatGtagmCcatTgaaGcagmCtgaTgtgGtttTgatAa
8300 Menkes.20 50NH2C6-3.LNA TggmCacAggmCacAgaTgtAgcmCatTgaAgcAgcTgaTgtGgtTttGatAa
8301 Menkes.20 50NH2C6-2.LNA TgGcAcAgGcAcAgAtGtAgmCcAtTgAaGcAgmCtGaTgTgGtTtTgAtAa
EQ No. Oligo Name Sequence
10573 Menkes.01 50NH2C6-2.LNA GtGamCtTcTcmCgAtTgTgTgAgmCtTtGtTgGaGcmCtGcGtAcGtGgAtTt
10574 Menkes.03 50NH2C6-2.LNA TtTtAamCtGamCamCcTtGtTtmCtGamCtGtTamCgGcGtmCamCtGamCtTtGcmCa
10575 Menkes.05 50NH2C6-2.LNA mCaTamCaGgTcAcTgGcAtGamCtTgmCgmCtTcmCtGtGiAgmCaAamCaTtGaAc
10576 Menkes.07 50NH2C6-2.LNA TgAgGgGaAtGamCgTgTgmCcTcmCtGcGtAcAtAaAaTaGaGtmCtAgTcTc
10577 Menkes.09 50NH2C6-2.LNA TgTaTtmCcTgTaAtGgGgmCtGaTgAcAtAtAtGaTgGtTaTgGamCcAcmCa
10578 Menkes.11 50NH2C6-2.LNA AcAtmCaGaGgmCtmCtTgmCaAaGtTaAtTtmCamCtAcAaGcTamCaGaAgmCaAc
10579 Menkes.13 50NH2C6-2.LNA TtmCcAtTaAcmCaGaAcGgGtmCamCtGcTtAtmCtGcGcAamCamCaTgTtGgAg
10580 Menkes.14 50NH2C6-2.LNA mCcAtTgmCcAcmCcTcTtGgTaTgGaTtGtAaTtGgAtTtmCtGaAtTtTgAa
10581 Menkes.15 50NH2C6-2.LNA GaAamCgAtAaTamCgAtTtGcTtTcmCaAgmCcTcTaTcAcAgTtmCtGtTgmCa
10582 Menkes.17 50NH2C6-2.LNA AtGaAcAgTcAtmCaAcTtmCgTcTtmCcAtGaTtAtTgAtGcmCcAgAtmCtmCa
10583 Menkes.19 50NH2C6-2.LNA GtTcTgAtGamCtGgAgAcAamCaGtAaAamCaGcTaGaTcTaTtGcTtmCtmCa
10584 Menkes.21 50NH2C6-2.LNA TgGcAaGtAtTgAcTtAtmCaAgAaAgAcAgTcAaGaGgAtTcGgAtAaAt
10585 Menkes.23 50NH2C6-2.LNA GcmCtmCtAtAaAcTcAcTamCtGtmCtGaTaAamCgmCtmCcmCrAaAcAgTgTtGt
10705 Menkes.22 50NH2C6-2.LNA mCtGgAtGgGaTcTgmCaGcAaTgOcTgmCtTcAtmCtGtTtmCtGtAgTamCtTt
Example 8: Expression Profiling of Stress and Toxicity in Caenorhabditis elegans using LNA Oligonucleotide Microarrays
This example demonstrates the use of the C. elegans LNA tox oligoarray in gene expression profiling experiments in the nematode Caenorhabditis elegans. The C. elegans
tox oligoarray monitors the expression of a selection of 110 genes relevant for general stress response and for the metabolism of toxic compounds. Two different capture probes for each of these target genes were designed and included in the LNA tox array. In addition, the C. elegans LNA tox oligoarray contained capture probes providing control for cDNA synthesis efficiency and the developmental stage of the nematode. Capture probes for constitutively expressed genes for data set normalization were also included on the C. elegans LNA tox oligoarray.
Cultivation of C. elegans Worms
For all cultures, the sample was divided into two; one half was used as the control, and the other half was used as the treated sample. Worm samples were harvested and sucrose cleaned by standard methods. For heat shock treatment, the heat shock sample was added to S-media preheated to 33°C in a 1 L flask suspended in a water bath at 33°C; the other sample was added to a 1 L flask with S-media at 25°C. Both samples were shaken at approximately 100 rpm for an hour. For Lansoprazole treatment, 0.5 mL of 10 mg/mL Lansoprazole (Sigma) in DMSO was added to each 500 mL volume of S-media culture after 28 hours of growth from L1. At the same time, 0.5 mL of DMSO was added to the control. Incubation was for 24 hours. Samples were then harvested by centrifugation at 3000 x g, suspended in RNAlater™ (Ambion), and immediately frozen in liquid nitrogen.
RNA Extraction
RNA was extracted from the worm samples using the FastRNA® Kit, GREEN (Q-BIO) essentially according to the supplier's instructions.
Design and Synthesis of the LNA Capture Probes
To design the capture probes, regions with unique mRNA sequence of the selected target genes were identified. The optimal 50-mer oligonucleotide sequences with respect to
Tm, self-complementarity, and secondary structure were selected. LNA modifications were incorporated to increase affinity and specificity.
Printing of the LNA Microarrays
The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark) using the Biochip One Arrayer from Packard Biochip Technologies (Packard, USA). The arrays were printed with a spot volume of 2x300 pl of a 10 μM capture probe solution. Four replicas of the capture probes were printed on each slide.
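For orientation, the amount of capture probe deposited per spot can be estimated from the printing parameters. The sketch below assumes that the spot volume is two drops of 300 picolitres each and that all of the dispensed probe stays on the spot; the function name is illustrative only.

```python
def moles_per_spot(drops, drop_volume_pl, concentration_um):
    """Amount of probe per spot in moles: volume (L) times concentration (mol/L)."""
    volume_l = drops * drop_volume_pl * 1e-12   # picolitres to litres
    concentration_m = concentration_um * 1e-6   # micromolar to molar
    return volume_l * concentration_m


# 2 x 300 pl of a 10 uM capture probe solution, expressed in femtomoles.
femtomoles = moles_per_spot(2, 300, 10) * 1e15
print(femtomoles)
```

Under these assumptions each spot receives roughly 6 fmol of capture probe; the fraction that is actually immobilized on the slide is not stated in the text.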
Synthesis of Fluorochrome Labelled First Strand cDNA from Total RNA
15 μg of C. elegans total RNA was combined with 5 μg oligo dT primer (T20VN) in an RNase-free, pre-siliconized 1.5 mL tube, and the final volume was adjusted with DEPC-water to 8 μL. The reaction mixture was heated at +70°C for 10 minutes, quenched on ice for 5 minutes, and spun for 20 seconds, followed by addition of 1 μL SUPERase-In™ (20 U/μL, Ambion, USA), 4 μL 5x RTase buffer (Invitrogen, USA), 2 μL 0.1 M DTT (Invitrogen, USA), 1 μL dNTP (20 mM dATP, dGTP, dTTP; 0.4 mM dCTP in DEPC-water, Amersham Pharmacia Biotech, USA), and 3 μL Cy3™-dCTP or Cy5™-dCTP (Amersham Pharmacia Biotech, USA). First strand cDNA synthesis was carried out by adding 1 μL of Superscript™ II (Invitrogen, 200 U/μL), mixing, and incubating the reaction mixture for 1 hour at 42°C. An additional 1 μL of Superscript™ II was added, and the cDNA synthesis reaction mixture was incubated for an additional 1 hour at 42°C; the reaction was stopped by heating at 70°C for 5 minutes and quenching on ice for 2 minutes. The RNA was hydrolyzed by adding 3 μL of 0.5 M NaOH and incubating at 70°C for 15 minutes. The samples were neutralized by adding 3 μL of 0.5 M HCl and purified by adding 450 μL 1x TE buffer, pH 7.5 to the neutralized sample and transferring the samples onto a Microcon-30 concentrator. The samples were centrifuged at 14000 x g in a microcentrifuge for ~8 minutes, the flow-through was discarded, and the washing step was repeated twice by refilling the filter with 450 μL 1x TE buffer and spinning for ~12 minutes. Centrifugation was continued until the volume was reduced to 5 μL, and finally the labelled cDNA probe was eluted by inverting the Microcon-30 tube and spinning at 1000 x g for 3 minutes.
Hybridization with Fluorochrome-labelled cDNA
The arrays were hybridized overnight using the following protocol. The Cy3™ and Cy5™-labelled cDNA samples were combined in one tube followed by addition of 3 μL 20xSSC (3xSSC final), 0.5 μL 1 M HEPES, pH 7.0 (25 mM final), 25 μg yeast tRNA (1.25 μg/μL final), 0.6 μL 10% SDS (0.3% final), and DEPC-treated water to 20 μL final volume. The labelled cDNA target sample was filtered in a Millipore 0.22 micron spin column according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by incubating the reaction at 100°C for 2 minutes. The sample was cooled at 20-25 °C for 5 minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide, and the hybridization mixture was applied to the array from the side. An aliquot of 30 μL of 3xSSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber. The chamber was sealed watertight and incubated at 65 °C for 16-18 hours submerged in a water bath. After hybridization, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The LifterSlip coverslip was washed off in 2xSSC, pH 7.0 containing 0.1% SDS at room temperature for 1 minute, followed by subsequent washing of the microarrays in 1.0xSSC, pH 7.0 at room temperature for 1 minute, and then in 0.2xSSC, pH 7.0 at room temperature for 1 minute. Finally, the slides were washed for 5 seconds in 0.05xSSC, pH 7.0. The slides were then dried by centrifugation in a swinging bucket rotor at approximately 200 x g for 2 minutes. The slide was then ready for scanning.
Data analysis.
Following washing and drying, the slides were scanned using a ScanArray 4000XL scanner (Perkin-Elmer Life Sciences, USA), and the array data were processed using the GenePix™ Pro 4.0 software package (Axon, USA). The data in each image was normalized so that the ratio of means of all of the features is equal to 1.
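One possible reading of this normalization is sketched below: each feature has a Cy5/Cy3 ratio of mean intensities, and all ratios in an image are divided by a common scale so that their average becomes 1, after which log2 values are taken. This is an assumption made for illustration and does not reproduce the internals of the GenePix software; the function names and the ratio values are hypothetical.

```python
import math
from statistics import mean


def normalize_ratios(ratios):
    """Rescale per-feature ratios so that their arithmetic mean equals 1."""
    scale = mean(ratios)
    return [ratio / scale for ratio in ratios]


def log2_values(ratios):
    """Log2-transform normalized ratios (positive = up, negative = down)."""
    return [math.log2(ratio) for ratio in ratios]


# Hypothetical raw Cy5/Cy3 ratios for four features (not experimental data).
raw_ratios = [0.5, 1.0, 1.5, 5.0]

normalized = normalize_ratios(raw_ratios)
log2_ratios = log2_values(normalized)
print(normalized, log2_ratios)
```

On the log2 scale, a value of 1 corresponds to a two-fold increase and a value of -1 to a two-fold decrease relative to the control channel.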
Results
Use of LNA-modified oligonucleotide capture probes in the C. elegans LNA tox oligoarray clearly allows the identification of distinct expression profiles for C. elegans genes relevant for general stress response and for the metabolism of toxic compounds.
Table 12. Expression profiling using LNA oligonucleotide microarrays. Log2-transformed fold changes for five selected genes in the two expression profiling experiments (nd: not determined).
Protein name (clone name) Heat shock Lansoprazole
HSP70 (F44E5.4/5) 4.11 nd
CYP37A (F01D5.9) nd 0.98
Ubiquitin (M7.1) 0.16 -0.12
Histone 1Q (C01B10.5) -1.49 nd
HSP90 (C47E8.5) nd -1.17
Table 13. LNA-modified oligonucleotide capture probes. LNA modifications are depicted by uppercase letters in the sequence; "mC" denotes LNA methyl cytosine.
Oligo Name Sequence
CEABC_C34G6.4_U293_LNA3 TgcmCatTgcAcgGgcActTgtTcgAtcTccTtcTgtTttActTttGgaTg
CEABC_C34G6.4_u375_LNA3 TcaTtcTagGatTgcmCagAtgGttAtgAtamCtcAtgTcgGagAgaAagGa
CEABC_F57C12.4_u15_LNA3 mCcaAtgTtgTttAatTggTtgTaaTgtmCttGatGacmCtgmCatAatmCatAt
CEABC_F57C12.4_u480_LNA3 mCacAagAtcmCtgTgtTgtTctmCcgGaamCaaTgaAaaTgaActTagAtcmCa
CEABC_F57C12.5_u111_LNA3 TacTtgTtcTcgAcaAagGttGtgTagmCcgAgtTtgAcamCtcmCgaAgaAa
CEABC_F57C12.5_u444_LNA3 TgaActTggAtcmCctTctTtgmCatTtaGcgAtgAtcAaaTttGggAagmCg
CEABC_K08E7.9_d8_LNA3 TcaTtaAttTtgTgtAgcTttmCttTctmCgaTttTtgmCacGatmCttTccmCc
CEABC_K08E7.9_u51_LNA3 AggGtgmCctActAcaAacTgamCccAaaAgcAgaTgamCcgAgaAgaAatAa
CEABC_Y39D8C.1_u37_LNA3 AttGaaAgcGacGcgGaaAgtGccAtgTatTtcTaaTttTgtTttmCttTa
CEABC_Y39D8C.1_u422_LNA3 TtgTcaGcaTatmCaaGagTagAtaTggAagTggAtamCacTctGctAatmCc
CEADH_H24K24.3a_d3_LNA3 mCacmCttAttGcgTtcAatTttTgtTtcmCacmCtamCtamCtamCgaAtamCgtTg
CEADH_H24K24.3a_u50_LNA3 TcamCaaGggAgaGagTctGcgGtcGgtGctGgcGttmCgaGaaAatAtaAc
CEAPEX_R09B3.1_u191_LNA3 mCatGcaTccmCgamCgaGaaGaaGtamCtcAttTtgGagTtaTctGgcGaaTt
CEAPEX_R09B3.1_u37_LNA3 GacmCatGctmCcgGtcGtcAtgmCaaAtcGacTtcTaaAttGctTctGatTa
CEAPO_C35D10.9_u15_LNA3 TtgmCatGctGttAaaAccTatmCgtGtamCaaTatTgcmCtgTatAttmCccmCt
CEAPO_C35D10.9_u609_LNA3 TggmCacAgcTtaAtaAcaAatTggAaaGtcGagGatTagTcgGtgTtgAa
CEAPO_C48D1.2_u176_LNA3 GacAcamCgcAaaGgaTatGgaTgtTgtTgaGctGctGacTgaAgtmCaaTa
CEAPO_C48D1.2_u23_LNA3 AgcAcgAaamCtcTgcmCgtmCtaAaaTtcActmCgtGatTcaTtgmCccAatTg
CEAPO_F20C5.1_u453_LNA3 AtgGtcAtamCtcTaaAatGggmCagAacTtcAacmCaaAtcAttmCtcGtcAg
CEAPO_F20C5.1_u96_LNA3 AacmCcgAgcTtgmCcgmCaaAgtGcaAgaAaaTtaTagAacGaaTgaAacAg
CEATPase_B0365.3_u31_LNA3 GgaTggGtcGagmCgtGagAccTacTacTaaAgaAcaGctTgtGaaTctTt
CEATPase_B0365.3_u386_LNA3 mCaamCgtTctmCgaTtcmCtamCggAcaAgaAtgGacmCtaTgcmCaamCagAaaGa
CEATPase_C17H12.14_u356_LNA3 TgcTcgTtaTccAgcTatTttGaaGggActTgtmCatGcaAggActTctTc
CEATPase_C17H12.14_u89_LNA3 mCcgTttAgaGctTatTgcTaamCcaGatTgtmCccAcaAgtmCagAacAgcTc
CEATPase_F55F3.3_u215_LNA3 TgamCggAcgmCtamCtamCccAtaTgtAttTgtTccAtcTtamCcaGcaAccAa
CEATPase_F55F3.3_u275_LNA3 AgcTacTtcAttmCgamCaaGgaAcaTctmCggAaaAgtmCaaGtamCatmCccGg
CEATPase_Y49A3A.2_u103_LNA3 AaaTtcAagGatmCcaGttGccGatGgtGaaGccAagAttmCgcAagGatTa
CEATPase_Y49A3A.2_u272_LNA3 mCgaTcgTttmCtgmCccAttmCtamCaaGacTgtmCggTatGctmCaaGaaTatGa
CECALR_Y38A10A.5_u238_LNA3 TcaGgaAcgAtcTttGacAacAttAtcAtcAccGacTctGttGagGagGc
CECALR_Y38A10A.5_u296_LNA3 TgaActmCtamCtcTtaTgaAagmCtgGggAgcmCatmCggAttmCgaTttGtgGc
CECAT_Y54G11A.5b_u137_LNA3 GaamCttTgcAggGccGctmCggGgaAtgTcaTgaTttmCatTatTaaGggAa
CECAT_Y54G11A.5b_u189_LNA3 GtcAatTctGggAgaAggTgtTggAtamCcgGggmCtcGggAgaGaaTgtGc
CECC_C03D6.3_u275_LNA3 AtgTaaAgaAggAatGctTccmCgaAtgGatTggAtaTttAttTgtmCcaGa
CECC_C03D6.3_u430_LNA3 GgamCcgAaaTttGtgmCagmCatGtcGgamCacGaaAttGatGgtmCtcAttTt
CECC_C07G2.3_d9_LNA3 mCagAcamCgaAggTtamCgaTagAtaAccAtcTctmCaaAgtmCtaTcgAccTc
CECC_C07G2.3_u44_LNA3 mCgamCgaTgtGcgTgtTccTgamCgaTgaAagAatGggAtaTtaAgaAaamCc
CECC_Y46G5A.2_u331_LNA3 TtgTgcTccAtcGctGctmCcgmCttAcaGacTtgAcaAcgmCtcAccTttGc
CECC_Y46G5A.2_u385_LNA3 AatGagmCggTtgTgcmCgtGtgAcgTcamCttmCgtmCacAgtGttGctmCtamCt
CECoA_C29F3.1_u316_LNA3 AaaTtgAcamCcaAtcAaaTctGtcTcaTctmCctGagGacmCgtmCaamCttmCg
CECoA_C29F3.1_u392_LNA3 AatmCttTgtGtamCggAgaTggGgcAaaAggmCagmCaaGaaAgtAaamCcaAg
CECoA_F08A8.4_u1094_LNA3 AggAcaAggGgcActActGgcAcaGgcTttGatTatTgcAgtGagAtaTt
CECoA_F08A8.4_u1260_LNA3 TtaAtgGagGtgAcaAtgGgtTccTtgGatTcgAtaAatTccGagTgcmCc
CECoA_F59F4.1_u109_LNA3 GctmCttmCtcmCagTggGctmCaaAatAgtmCaamCtcAacAgaTcgGaaGttmCt
CECoA_F59F4.1_u424_LNA3 AaaGctTcgAgaTggmCacGttmCgtmCtgTatmCtcGtgAagAacTtaTtgmCa
CECoA_Y25C1A.13_u115_LNA3 GatTcgmCtgAacTttAtcAagAcgTggAatAtgAgcmCagmCtcmCtgTcgAc
CECoA_Y25C1A.13_u451_LNA3 GatmCttAtcAccGcgTgcGatAttmCgaGtaGctTcamCagGatGcgAttTt
CECOL_C27H5.5_u493_LNA3 GgaAagGaaGgaTccAttmCtcAgcTctGcamCttmCcamCcaTcaGagmCcaTg
CECOL_C27H5.5_u680_LNA3 TggAtamCaaGgaGggAtcTggmCagTggTggAtcTggAagTggTggAtaTg
CECOQ_ZC395.2_u199_LNA3 TtgAaaGaamCtcmCttGccGacGatmCctGaaAcamCacAaaGaaTtgmCtgAa
CECOQ_ZC395.2_u400_LNA3 AtgTggGatGagGagAaaGaamCatTtaGatAcaAtgGaaAgaTtaGctGc
CECRYZ_F39B2.3_u171_LNA3 AggmCtgAgcTctTggActTtgGcaTcaAcaTtgTctmCatTctTgaAggAa
CECRYZ_F39B2.3_u222_LNA3 TtaTggTtamCagAagGagmCtgTttAcgGtgTagmCatTggGaaTgtmCttmCc
CECyclin_R02F2.1a_u24_LNA3 mCacTtcAacmCaamCtcmCgtGttAatmCaaGcaAgcmCgcmCacmCatmCtaAtgAg
CECyclin_R02F2.1a_u312_LNA3 TctmCatTgcTcgTcgAggmCtamCcaAcaAacActGgcAatAccmCaaTtaAt
CECyclin_ZC168.4_u203_LNA3 TaaGaaAgtmCatTgaGgaTgcTgtmCgcTttGctmCgcmCgaAgtmCtcGtaTa
CECyclin_ZC168.4_u273_LNA3 AagTtcAtcmCtgTtgAcgGaaTcgAggmCggAgaAtgmCtgTatmCggTcaTt
CECYP_B0213.15_u133_LNA3 AcaGgaAatAtgAttTtgGatTtcGatTttGaaTcgGttGgtGctGccmCc
CECYP_B0213.15_u202_LNA3 GctGagmCtgTatTtgGctAgtGaaAtgTgtGttTttGatActTtaAatGa
CECYP_B0304.3_u38_LNA3 AcgAggTttGgaTcamCaaTcaGaaTtcTgtGaaAtaAgcGttTttTggGa
CECYP_B0304.3_u89_LNA3 AgtTctmCggTctAacAgtGtcTccmCgtTgaAtaTtcTtgTaaAatmCacAc
CECYP_C03G6.14_u706_LNA3 AtgAccActmCaaAatActGctAaaAgaTttGcaGcgGcaGaaGccGttAa
CECYP_C03G6.14_u768_LNA3 TtgAtaTggmCtgTacmCtgTatGgtTttTgaGgamCgtTttTtaGgaGtcGa
CECYP_C03G6.15_d9_LNA3 AttTatTcaTtcAtcmCatGtaAacTgtAtaTttTgaAttTgtGttGtaAa
CECYP_C03G6.15_u148_LNA3 GccAaaGcaGaaTtgTatTtgAtcTtcGgtAacmCttmCtcmCttmCgcTacAa
CECYP_C06B3.3_u102_LNA3 AttTtgAatmCttmCtgGgaAaaTgcmCatmCcamCtcGagAaamCcgTtcmCgtTt
CECYP_C06B3.3_u474_LNA3 mCtaAcgGagGatmCtcGccAatTatmCttTgaGagAcaAaamCtgAaamCtcmCt
CECYP_C12D5.7_u399_LNA3 AtcTagTccmCaaTgaAtcTccmCacAtgmCtgTtamCtcGtgAtgTtcAacTc
CECYP_C12D5.7_u65_LNA3 TttTgcTttmCatmCgcAaaAgcTcaAgaTtamCacAtgTcaGgtmCaaGccAa
CECYP_C45H4.17_u27_LNA3 mCcgmCgamCttTaaAgaGaaGatmCatAaaTttGcaTtgTttTttGttTgtAt
CECYP_C45H4.17_u598_LNA3 mCgaGggTgaTtcGgaGacTttmCagTaaTgtmCcaActTtcAaaTgtTtgmCa
CECYP_C45H4.2_u110_LNA3 TagAtamCaaGatAcaTccmCtcAaaAgaAggmCctAccGtcAatGgcmCaaAg
CECYP_C45H4.2_u429_LNA3 TcaAcgmCgtmCtaTaaAtgAatmCacAacGagGtaTcaAcaTtcTccmCccTg
CECYP_C49C8.4_u363_LNA3 AtgmCtgAtgTtgAaaTtgmCtgGctAccGtaTtcmCaaAagAtamCtgTaaTc
CECYP_C49C8.4_u883_LNA3 AtgAatmCcaTggmCttGgamCatmCtcmCcgTttTtcAagGgaTatAaaAatGt
CECYP_C49G7.8_d6_LNA3 AtgmCaamCgaAttAgtGaaAaaTtcAtcmCtgGaaTaaAaaAtaAttmCtaAa
CECYP_C49G7.8_u795_LNA3 AtcGctAcgAcaAtcTttmCcgAtgmCctTcgAagTttmCgaAagmCttTctmCt
CECYP_F01D5.9_u374_LNA3 GagGtcGgtGgaGgaGgaAgtGgaAatTgamCggmCaaAatmCctGccmCaaGg
CECYP_F01D5.9_U46_LNA3 mCccTctTtgGgaTttmCcamCtcAagTttActGttmCggmCagmCagTgaTatAa
CECYP_F08F3.7_u25_LNA3 GagTtgGttmCcamCagAatGctTagGacGttTaaAttmCgtmCacAaamCttTt
CECYP_F08F3.7_u401_LNA3 mCaaTatGgtTccmCatTttAgcAacTcaTatGaamCacAgaAgaTgtmCctTg
CECYP_F14F7.2_u397_LNA3 GaaAaaGgcGtcGacAttTtaTgtGacAcgTggAcamCttmCacTatGacAa
CECYP_F14F7.2_U68_LNA3 TaaTtgAatTacGggTctTttGtamCatAttAatTttAgtAtamCttTgtGa
CECYP_F42A9.5_u435_LNA3 AtaTcaAtgmCaamCtaTtaAtgAatmCacAacGtcTtgmCcaAtcTtcTccmCg
CECYP_F42A9.5_u55_LNA3 GgaGtgActAtgAaaGcaAagAgtTacmCgaTtgAaamCtgAaaGacAgamCa
CECYP_K07C6.3_u3_LNA3 AatmCttTaaTgaTaaTttAtgGgaTctGtaTttmCtcTttmCtgTcaAtaAa
CECYP_K07C6.3_u354_LNA3 AtgAgcmCcamCaaAtgTaaAagGatAcgAgaTtgAttmCggGaamCagTcaTg
CECYP_K07C6.4_u118_LNA3 AtcmCtgmCgaTatGacAttAagmCcamCatGgtTctGaamCctTcaAcaGaaGa
CECYP_K07C6.4_u87_LNA3 mCtgAacmCttmCaamCagAagAtaAacTtcmCgtAtaGcgmCtgGaaAaamCtcmCt
CECYP_K07C6.5_u7_LNA3 AttTaaAggAatTcamCagmCtcAaaAaaTaaTaamCtamCcgGttmCagAgaTt
CECYP_K07C6.5_u99_LNA3 AatTtgAgcmCacAtgGcaAgtTatmCaamCagAggAgamCaaTgcmCgtAcaGt
CECYP_K09A11.3_u362_LNA3 TgamCatTctActTaaAggGaaGaaAatAccAacTggTacmCctTgtAttTg
CECYP_K09A11.3_u48_LNA3 TcamCcamCaaAgcmCatAcaTatGcgAgcTagTtcmCtcAggmCtgmCttAaamCc
CECYP_K09A11.4_u238_LNA3 TtcGacAaaActAttTtgGaaAgaAcaAtcmCcaTtcAgtGtcGgcAaamCg
CECYP_K09A11.4_u68_LNA3 TctGacAacAaaGccAtamCacGtgmCcgActAatTccAcaAtcAgcTagAa
CECYP_K09D9.2_u151_LNA3 TtgGcaAaaGcaGaaTtgTatTtaAtcTttGgaAacmCtcmCttmCttmCgcTa
CECYP_K09D9.2_u866_LNA3 TgaAtcTttmCaaActTatmCacTccTttTaaTacTacmCgtTccTgtTtgGa
CECYP_T10B9.10_u410_LNA AttGagAttGtaTccAttGgcGtcTctTgtTcamCaaTcgAaaAtgTctmCa
CECYP_T10B9.10_u56_LNA AacTgcTacTatTgcGccAtcAagTgtGctGctmCaaActTaaAtcmCagGt
CECYP_T10B9.7_u102_LNA3 TtgAgamCagGaaAtaAgamCtaGaaTtcmCttTgaAacTggTggGaaGtgmCt
CECYP_T10B9.7_u267_LNA3 AagAtgTcaAagAatTcaAgcmCagAacGatGgtmCcamCcgAcgAgcmCatTa
CECYP_T19B10.1_u100_LNA3 AttGaamCcaActmCtgAaaTatAatGacAcaAaamCcaTgtmCtgGaaGtgGt
CECYP_T19B10.1_u319_LNA3 GgcAatGtgAcaAtaTctmCcaAtgGttmCttmCacAgcAatmCatmCacGtgTt
CECYP_Y49C4A.9_u121_LNA3 mCtaTtcAatmCgaTatTttAtcAcamCcaTccAgtGctGgamCctmCcaTcaTt
CECYP_Y49C4A.9_u413_LNA3 GtcTcaGagAtgTgtAaaTttActTccmCtgmCaaTttGttTcamCgcAacTa
CECYP_ZK177.5_u394_LNA3 TtcmCgaAtgTttmCcaAttGggActGaaGttTcaAgaGtcAccmCagAaaAa
CECYP_ZK177.5_u445_LNA3 GatmCcaGraTctTccAagmCttAcaTtcmCtcmCgtGctTgtAtcAagGaaAc
CEDAO_C47A10.5_d9_LNA3 TttGaaAacmCtgTttTatTatTaaAatAgaTaaTtgAttAgtTctGtamCg
CEDAO_C47A10.5_u269_LNA3 AtamCgtTgcActGcaTccGgcTatGagGgaGccAaaAatmCttAggGgaGt
CEDC_C01A2.3_u373_LNA3 GcamCttmCcaTtcAtcTctGcaGctActAtgGctTtgGtgAcaAaaGttGg
CEDC_C01A2.3_u96_LNA4 mCcgTccAaaAgaAtgmCcaTctmCacAagTctTgaAatmCttAtaAagGtaGt
CEDC_C34F6.1_u301_LNA3 GagGgaTcaAcaGtaAccTcgTgcGgtAttGacAagGgaTgtmCcgGaaGg
CEDC_C34F6.1_u450_LNA3 GatGgtTctTcgAtcGcaAacAaaAcaGatGtgmCtcmCatTtamCatAcgGa
CEDC_F33D11.3_u126_LNA3 AtgGagAaaAtgGatmCtgAtgGagTtgmCagGaaGtgAtgGagmCtcmCagGa
CEDC_F33D11.3_u14_LNA3 TgaAtcTccAtaAatTatTcaAtgTttmCcaAatAttTaaTttAtcAatTg
CEDC_F46E10.2_u392_LNA3 GctmCaamCacGgtAggAtcmCtaTggAacmCgtmCggAggAgcAggmCctmCggAg
CEDC_F46E10.2_u54_LNA3 mCgtGacAacmCtcTtaTttAttTctGtaAaamCtgAttmCgcmCaaActTttGt
CEDC_F56G4.2_u382_LNA3 GaaGctTtcAaamCcaAatGagTtcmCttmCccGgaAtcmCcaAagAatAccAa
CEDC_F56G4.2_u82_LNA3 AcaAtgAaaAgaGagGatGgaAagGaaAtcGaaGtcTctGttmCttGacGa
CEDC_M162.2_u103_LNA3 GatGagGtamCatAacTttGtgTgcAgtTatAggmCcaTctAcaGtamCctGc
CEDC_M162.2_u480_LNA3 TtcmCatmCatmCacTaamCcgAttGtcmCtgAcaTtgAtgGccAaamCcaGggAa
CEDC_R10E4.11_u274_LNA3 TcamCatTatmCgaAcaAgtActAgtAagmCatGctGtgAtgGagTgcmCgcTa
CEDC_R10E4.11_u397_LNA3 mCacGgaGatmCacGacAtcAaaGcgGatTgcTtaGagTgtGgaAacmCgtmCt
CEDC_T04C9.1_u321_LNA3 ActAtcTacGtgGcamCgtTggActmCatmCatmCgaTggGaamCgamCgtAtaAg
CEDC_T04C9.1_u64_LNA3 TctmCtgGccAgtTcamCttTgtGatmCaaTctmCagAttmCgtmCcamCacAagAt
CEDC_W02A2.3_u32_LNA3 mCtamCttmCcgmCaaGaaGgcmCcgTcgTttmCtaAtcGatmCgaAcaTctmCacAc
CEDC_W02A2.3_u374_LNA3 AtgGatGatmCgamCccActTgcmCacTgamCccAcaAtcmCcgmCacTcamCtamCc
CEDC_W05G11.3_u153_LNA3 AagAcgGagAggmCtgGagAgaAcgGtamCcgAtgGagAgcmCagGaamCtgAt
CEDC_W05G11.3_u51_LNA3 mCcamCccAggAggAggGatAcaAgaGaaGaaAgtAcaGatTctmCcaActAa
CEDC_ZK863.5_u256_LNA3 AgtTtcAcamCttmCttTttGccGttTtgGttmCccGttAtcAatmCcaTtgAt
CEDC_ZK863.5_u324_LNA3 mCttTtaTatTctmCatmCaaTttGttTccTacTtgGtcAgcTgaGgaTcgTt
CEEPHX_Y55B1BR.4_u161_LNA3 TtcGgcAcaAatGgaGcaAaaGtaTcgTggTtaTtgTgaTgcGatTatTc
CEEPHX_Y55B1BR.4_u93_LNA3 mCtamCtaTgaAtgAgcTcamCtgGacTcaTttAtcAacTcgAgtmCaaAagmCc
CEER_18S_u388_LNA3 GttGgcGaaTctTcgGgtTcgTatAacTtcTtaGagGgaTaaGcgGtgTt
CEER_18S_u82_LNA3 GaamCtgAttmCgaGaaGagTggGgamCtgTcgmCttmCgaGgtTtaAcgActTc
CEER_26S_u342_LNA3 TgtTatTgcGaaAgtAatmCctGctTagTacGagAggAacAgcGggTtcAa
CEER_26S_u38_LNA3 TgcAtamCgamCttGgtmCtcTtgGtcAagGtgTtgTatTcaGtaGagmCagTc
CEFOXO_R13H8.1b_u331_LNA3 TgtGctmCagAatmCcamCttmCttmCgaAatmCcaAttGtgmCcaAgcActAacTt
CEFOXO_R13H8.1b_u393_LNA3 TtaAgamCggAacmCaaTtgmCtcmCacmCacmCatmCatAccAcgAgtTgaAcaGt
CEGAPDH_K10B3.7_u21_LNA3 AcaTtgmCtamCcaAggmCctAagmCcgmCttmCaaAttmCtcTaaGtcTgaAatGa
CEGAPDH_K10B3.7_u727_LNA3 GttGagTccAccGgaGtcTtcAccAccAtcGagAagGccAatGctmCacTt
CEGBA_F11E6.1a_u232_LNA3 AgtAaaTtcmCttmCcamCgtGgaTctActmCgtGtgTtcAcaAagAtcGagGg
CEGBA_F11E6.1a_u451_LNA3 GgtmCcaAtaAtgGgaGacTggTtcmCgcGcaGaaAgtTatGcaGatGatAt
CEGLU_C02A12.1_u264_LNA3 AgaAaamCttmCgtTggAccmCtgmCtaAggAgaAgtAttTcaAgcTtcTgaGc
CEGLU_C02A12.1_u55_LNA3 GagmCacmCcgAagmCtcAagmCcaTatTtgGaaAcaAgamCcaTacTctTcaAa
CEGLU_C46F11.2_u271_LNA3 GttAccmCtcTacAaaTctmCgcTtcAatmCcaAtgTtgTtcGcaGtcAccAa
CEGLU_C46F11.2_u45_LNA3 mCcgAagAgcTcgTtamCtaTgcGagGagGtgTgaAgcmCggAatAatTttTt
CEGLU_F26E4.12_u109_LNA3 AagTtcTtgGttGgamCgcGatGggAaaAttAtcAagAgaTttGgamCcaAc
CEGLU_F26E4.12_u480_LNA3 AcgAttTcaAcgTcaAaaAtgmCtaAtgGtgAtgAcgTgtmCacTttmCggAt
CEGLU_R07B1.4_u166_LNA3 AccTggGttGatGttTttGcgGctGaaAgtTtcTccAagmCtcAttGatTa
CEGLU_R07B1.4_u38_LNA3 GaaGtamCgtmCtcmCcaAagAaaAgcTacmCccAgcTtaAggmCatTgcAcaAt
CEGLU_T09A12.2_u220_LNA3 GcgmCcaGatAtgTatTcaAagAtcGagGtaAatGgtmCagAacActmCatmCc
CEGLU_T09A12.2_u335_LNA3 AatmCtamCagGgaAaaAggAttTcgAgtTgcmCgcGttTccAtgmCaaTcaAt
CEGLU_T28A11.11_u299_LNA3 AgaTggmCaaAgaAgcAtamCatAacTgaAacTctTccmCggGgaGctActAc
CEGLU_T28A11.11_u54_LNA3 TgaAtaAacGggmCcgAacTaaAtcmCatTcgTcaGtgGaaAtgGgaAacAa
CEGPD_B0035.5_u256_LNA3 GtcmCgtmCttmCctGatGctTatGaamCgcmCtaTttmCtcGaaGtaTtcAtgGg
CEGPD_B0035.5_u478_LNA3 TgtGgaAaaGctmCtcAacGagAagAaaGcaGaaGttmCgtAtamCaaTtcAa
AtaTcgmCcgmCctGctTccTcamCcaAccmCgaAtaAcgmCaamCaaAaamCtt
CEHSP_C09B8.6_d8_LNA3 Ta CEHSP_C09B8.6_u286_LNA3 AagAgcmCcamCtcAtcAagGatGaaAgtGatGgaAagActmCttmCgtmCtcAg CEHSP_C12C8.1_u127_LNA3 mCaaGatAttTtaAcaAaaAtgmCatmCaamCaaGaaGccmCaaTcaGgtTccGg CEHSP_C12C8.1_u1531_LNA3 mCttGggmCatTctGtamCggGatGctGtcAttActGtgmCctGcaTatTttAa CEHSP_C47E8.5_u310J.NA3 AagAagmCatmCtcGaaAtcAacmCcaGacmCacGctAtcAtgAagAcamCttmCg CEHSP_C47E8.5_u361_LNA3 AtgAaaGctmCaaGctmCttmCgtGatTccTctActAtgGgaTacAtgGccGc CEHSP_F26D10.3_u276_LNA3 TtaAgcAgamCcaTtgAggAcgAgaAgcTcaAggAtaAgaTcaGccmCagAa CEHSP_F26D10.3_u397_LNA3 mCgtmCttTccAagGatGacAttGaamCgcAtgGtcAacGaaGctGagAaaTa CEHSP_F43D9.4_u 169_LNA3 GtcGacTtgGctmCacAtcmCacAccGtcAtcAacAagGaaGgamCagAtgAc mCaaTctTgaGggAcamCgtTctmCacmCatTgaGggAcamCcamCgaGgtmCa
CEHSP_F43D9.4_u275_LNA3 aGa
TcamCtaAaaTgcAccAatmCtgGacAatmCttmCtgmCttmCtgmCtgGatGcgmC
CEHSP_F44E5.4/5_u123_LNA3 t CEHSP_F44E5.4/5_u380_LNA3 TcaTgaAgcTaaAcaAttmCgaAaaGgaAgaTggTgaAcaAcgGgaAcgTg CEHSP_F52E1.7_u175_LNA3 AagTatAacmCttmCcaAcaGggGtcmCgtmCcaGaamCaaAtcAagTccGaaTt CEHSP_F52E1.7_u448_LNA3 TttAacmCatGgcmCgcAgaTtcTtcGatGacGtcGacTttGatmCgcmCacAt CEHSP_F54D5.8_u252_LNA3 GcgTcgAaaAgaTctmCccTgaAgtmCtgmCatTgamCtgGccTtgAtaTtaTg CEHSP_F54D5.8_u318_LNA3 AcaTagTctTcgTcaTcaAggAtaAgcmCacAccmCgaAatTcaAgcGagAg CEHUS_H26D21.1_u117J.NA3 TcgmCcaAcamCtcGgamCacGtgmCcaAaaTgaAtaTcaTctmCaaAtcGaaTg CEHUS_H26D21.1_u478_LNA3 GtcGaaGttAgaAatmCcaGaaGccGatAttGttTctmCatmCaaAttmCcaAt CEMRE_ZC302.1_u169_LNA3 ActActmCgtGgaAgaTccAatAaaGttGttTcaAcgmCgamCaaAtcGatTc CEMRE_ZC302.1_u292_LNA3 GgcAgtGaaGatGaaGtgGcaAatTctGatGaaGaaAtgGgaAgcAgtAt CEMTL_T08G5.10_d 127_LNA3 TtgTcaAcgAccAgaAgcAaaAatTatGggAatmCgcGatAaaAttmCaaGg CEMTL_T08G5.10_u45_LN A3 GatGcaAgtGtgmCcaActGcgAatGtgmCtcAggmCtgmCtcAttAatTtgAa CENAP_D2096.8_u356_LNA3 GacGatAtgTtcGatTtcmCcaGgaGagGacGgtGatGatGtgTcaGacTt CENAP_D2096.8_u70_LNA3 GacGatAtgTtcGatTtcmCcaGgaGagGacGgtGatGatGtgTcaGacTt CEPAI_F56D12.5_u241_LNA3 GagGtcGtcGtaAtcmCacAagGctmCcaAgaAagmCaaGtgmCtcGacAttTc CEPAI_F56D12.5_u301_LNA3 GatActTttGgcAagmCtcGttmCcaAtcAagAagGagGtcAtcmCcaGatmCg CEPDI_C07A12.4_u28_LNA3 GatGagGagGgamCacAccGagmCtcTaaAtcmCacAttmCcaAtamCagTtcAa CEPDI_C07A12.4_u433_LNA3 mCttAtgTccGaaGatAtcmCcaGagGatTggGacAagAacmCcaGtcAagAt
TacmCccAgtmCgamCtaTgaTggAgamCagAaamCctmCgaGaaGttmCgaAg
CEPDI_C14B1.1_u119_LNA3 aAt CEPDI_C14B1.1_u358_LNA3 mCtcGtcGccTccAacTtcAacGaaAttGccmCttGatGaaAccAagActGt CEPGK_T03F1.3_d9_LNA3 TtcTatTgtTtaTtcmCttGccmCaaTagTgtAttTgtAttTatTctTtcTc CEPGKJ03F1.3_U424_LNA3 mCaaAtcmCatmCtcmCcaGtgGatTtcGtcAttGctGacAagTtcGccGagGa CEPON_E01A2.7_u223_LNA3 GttTctGatTcgAcamCttTatGgamCcaTctmCaaGttmCtgmCgaGttTctTt CEPON E01A2.7 u79 LNA3 GggAaamCaaAtgAttGttGgtAcaGtaGccmCgcmCctGctAttmCacTgtGa
mCgaGcamCatmCatmCcaAtcGttmCctGttmCaamCaaGgcmCttmCtaAtcGtt
CEPPGB_F13D12.6_u44_LN A3 Ag CEPPGB_F13D12.6_u440_LN A3 TgaTgaGagmCccAgtAacmCaaTtaTttGaamCcgTcaGgaTgtGcgTaaGg mCgtmCtaAtcGaaGaaGggGatmCgtGggmCaaTcaTaamCtaAttAacmCttm
CEPPS_T14G10.1_d2_LNA3 Ca
CEPPS_T14G10.1_u240_LNA3 mCaaTggmCtcmCagGtcTttmCtgmCtcTtcAtaTacTtcmCatTccGagTtgmCt
CEPRDX_R07E5.2_u405_LNA3 GttmCtcTtgGagmCtgAagTtgTcgmCgtGctmCgtGtgAttmCtcActTctmCt
CEPRDX_R07E5.2_u42_LNA3 TcgmCtamCcaGcaAggAatActTcaAcaAggTcaAcaAgtGatmCacAcaGa
CEPYC_D2023.2_u256_LNA3 AagGaaAttGtaActmCgcmCcaAgaGctmCtcmCcaGgtGtcmCgtGgamCatAt
CEPYC_D2023.2_u427_LNA3 TtgActGgaTtgGagAttGcgGaaGaaGttGatGttGaaAtcGagAgtGg
CERAD_F10G7.4_u169_LNA3 GccAagTctmCaaGcaAtaAgtGttGatmCaaTcaGagmCcaTacGgaGagAt
AtaTtgAgamCttmCggGacAagmCggActTctmCatmCtgTcamCagmCaamCtg
CERAD_F10G7.4_u267_LNA3 mCc CERAD_F32A11.2_u250_LNA3 GatmCcgmCagAgaAtcGagTatTtcmCtcTcgAgamCccAtgGatAtcAacTg CERAD_F32A11.2_u380_LNA3 TccGttAagAagmCtcActGgaAaaAcamCacGgcTcgAacGaaAttGgaAt CERAD_T04H1.4_u274_LNA3 AatTtgGatGagAgcAaaGtgGaaGgaAtgGctAtcGttTtgGcaGatAt CERAD_T04H1.4_u375_LNA3 GtgmCtgGtcAaaAaaTgcTtgmCttmCgtTgcTtaTtcGcaTtgmCacTcgmCa CERAD_W06D4.6_u325_LNA3 mCttmCgaGaamCtcTtcAagTtgGaaTcaAcaGtgGcaTcgGatAcamCatGa CERAD_W06D4.6_u34_LNA3 GtgmCctTctGaaGccGaaGaaAacGacGatTagTtaAatGttTccAagTt CERAD_Y116A8C.13_u289_LN A3 GatAaaAtcGatAgcGacGacGatGagGaaGccGatGatGagGagmCtcGa CERAD_Y116A8C.13_u59_LN A3 GcaGgtGgaTacGgaTgtGgaGctGacTttTgcGttTtaTcaAgaAtcTc CERAD_Y39A1A.23_u221_LNA3 TccmCgtAgaAgtAgaAatGctAgaAgaAccTgaAcaAgaAgaTcaAgaAa CERAD_Y39A1 A.23_u276_LN A3 TgcAagAtgTcaGtaTtgAaamCaaTtcmCtgTagAgamCccmCcgAagAaaAt CERAD_Y41 C4A.14_u509_LNA3 AgtmCtcGtaTccGggAatGttTcaGccTgtGaaAatGctTgtTgaAgamCg mCttmCaaAacmCgtmCgcTttTaaGgaTacAggAacGtgGcamCgcTtcmCgaG
CERAD_Y41 C4A.14_u731_LNA3 9 CERAD_Y43C5A.6_u131_LNA3 mCagAttGtamCctTcgAaaAggAaaAggAgaGaaTcgmCgtmCgcAaaAatGg CERAD_Y43C5A.6_u429_LNA3 TgaTggmCttTgaTtaTtcGagmCagGagmCaaTgaTgtmCcgAgaGtcGttAt CERFC_F31 E3.3_u128_LNA3 mCaaTgamCgaGaaTatTggAgrAatGggGaaActGgtTgcGacTtgmCgaAa CERFC_F31 E3.3_u55_LNA3 TtgGaaAacAatmCtcmCtcGacTttmCtgmCtcActmCttmCgtGaaActAtcmCa CERPL_K11 H12.2_d1_LNA3 TctTgtTatTttAttTtgTttTggGctTgtTccGaaAatGaaAtgGttGt mCaaTggAtcAccAagmCcaGttmCacAagmCacmCgtGagmCaaAgaGgamCt
CERPL_K11H12.2_u172_LNA3 cAc CERT_F36A4.7_u1396_LNA3 mCttTgtGatGtgAtgActGcgAagGgamCacTtgAtgGctAttAcgAgamCa CERT_F36A4.7_u2302_LNA3 GagmCcaGctActmCagAtgAcamCtcAacAcgTtcmCatTatGcaGgaGttTc CERT_F36A4.7_u289_LNA3 TacActmCcaTccTcgmCcgAcaTacAatmCcaAcaTctmCcamCgcGgaTtcTc CERT_F36A4.7_u2919_LNA3 AtgGagAagAtgGttTggAtgGaaTgtGggTtgAgaAtcAgaAtaTgcmCg CERT_F36A4.7_U4269_LNA3 AacmCggGarAccGtgTcgAacGtcAcaTgaAagAtgGcgAtaTaaTcgTc CERT_F36A4.7_U5485_LNA3 GagGagAttAaamCgcAtgTcaGtgGctmCatGtcGagTttmCcaGaaGtcTa CESLC_F52F12.1 a_u249_LNA3 AgaTatTgcmCtcTacTtaTcaTggGccTgaTggmCttTgtmCtgmCcgGtaTt
GaaTctmCaamCcamCttmCtgGaamCccmCatAcamCcaAtgGatAgaAgamC
CESLC F52F12.1a u76 LNA3 ggAg
CESLC _K11 G9.5_u400_LNA3 GttGttmCttTttTccGtgAtcTttTcaTgtTtaTgtmCtgAacGtgGcaGg CESLC_K11 G9.5_u462_LNA3 GacTcgTtgGtgTctTgcTagGatGtcTtgGgtTcaTtcmCtcAatmCgtTg CESLC_Y32F6B.1_u179_LNA3 GtamCtgGgcTcgAggGctGaaActAatmCgaAgaAgaAacTccAgaAgaTa CESLC_Y32F6B.1_u280_LNA3 GgaTcaTgcTctGttTacGacActGatGagTtaAgaGtcAgamCtgmCacGt CESLC_Y37A1 C.1 a_u104_LNA3 mCgaTggTtcTtcTcgTctAtcAtaTcgGggTagTtgmCcgAagTgtTgaAa CESLC_Y37A1 C.1 a_u404_LNA3 mCaaAtcGaamCtgGtaTaaAggAggAccGacGgaGacGaaTttGaamCgaGa CESLC_Y70G10A.3_u383_LNA3 AttmCgaTcaAagAacTctGgcTctmCggmCgtTaamCtgGacAttTgtTcgTc CESLC_Y70G10A.3_u46_LNA3 rnCtcmCccGagmCagGcgAttAttmCacGctAgtTatGctmCaaAtgTgaTctGt CESOD_C15F1.7_u435_LNA3 mCcgGtamCtaTctGgaTcamCacAgaAgtmCcgAaaAtgAccAggmCagTtaTt CESOD_C15F1.7_u9_LNA3 mCccAgtGacTacmCtgAatmCgcGtcTctGaaTctmCcamCacAatTccTacTa CESOD_F10D11.1_u326_LNA3 GgaGttGctmCacmCgcAatTaaGagmCgamCttmCggAtcTctGgaTaaTctTc CESOD_F10D11.1_u477_LNA3 AaaTtgAggAaaAgcTtcAcgAggmCggTctmCcaAagGaaAcgTcaAagAa mCaaTcgTacmCatGaaAgaAgtTggAagmCcamCgtGcaAgaGaaGaaAtcmC
CESULT_EEED8.2_u316_LNA3 a
CESULT_EEED8.2_u82_LNA3 AagAagAttmCctGacmCagAgaGacTcamCgtGctTacmCcaAgaAgcAtcTa
CESULT_Y113G7A.11_u252_LNA3 AgcAttGgtGgaAatAcgAaaTggmCatGggAagAgaAacmCccTctmCaaTt
CESULT_Y113G7A.11_u96_LNA3 mCtgGttAcgGtaGtgTatGgtmCccTgtmCctmCtcAgaAtgmCaaAtaTgtmCg
CESULT_Y67A10A.4_u108_LNA3 TctAcgTcgAtgGaaAagmCcgAttTaamCaaTcaAagmCcaAcaAcgmCagTt
CESULT_Y67A10A.4_u327_LNA3 GgaAagGtgmCcaAaaAgtTgamCagmCaaTtgGagGatmCttAttmCatTgcmCa
CETOPO_K12D12.1_u398_LNA3 AgaTgaTgaTgaAgtTccTgcAaaGaaGccTgcTccAgcGaaGaaAgcTg
CETOPO_K12D12.1_u449_LNA3 AaaAccTcgTacTggAaaAggAgcTgcGaaAgcGgaAgtTatmCgaTttGt
CETOPO_M01 E5.5b_u256_LNA3 GagAagGccmCagAagAagTacGacAgamCtgAagGagmCagTtgAaaAagTt
CETOPO_M01 E5.5b_u429_LNA3 TtcTgtmCatAcaAtcGtgmCtaAtcGgcAggTtgmCgaTccTttGtaAccAt
CEUbi_F25B5.4_u186_LNA3 AagmCttmCggAcamCcaTtgAgaAtgTcaAagmCcaAaaTccAggAtaAggAg
CEUbi_F25B5.4_u2_LNA3 AatmCgaAccmCatmCaaTtcActmCgtTatTccTccTcgAtcTccGttmCaaGt
CEUbi_F29B9.6_u145_LNA3 mCtgAacmCatmCcaAatAttGaaGatmCcaGctmCagGctGaaGccTatmCagAt
CEUbi_F29B9.6_u230_LNA3 mCgtGtgmCttAtcTctTctGgaTgaAaamCaaGgaTtgGaaGccGtcAatmCt
CEUbi_M7.1_u239_LNA3 mCggAagmCatmCtgmCctTgamCatTctmCcgTtcGcaGtgGtcGccGgcTctG
CEUbi_M7.1_u53_LNA3 AaaGtamCgcTatGtgAggAggmCtaAcamCcaTtcAtaTaaGaamCgcAgcmCa
CEUGT_F39G3.1_u40_LNA3 TgtTgcmCgtAgaAgaGagActAaaActAagAacGatTgaTtgAagGtcTg
CEUGT_F39G3.1_u466_LNA3 TacAatTctTtgmCagGaaGcaAtaTccGccGgaGtcmCccmCttAtcActAt
CEUGT_M88.1_u480_LNA3 mCtcAcgGagGttAtaAttmCtaTgcAggAggmCaaTttmCtgmCtgGagTtcmCa
CEUGT_M88.1_u72_LNA3 AccGttTcaTgaGagmCtgTaaTcaGgtGttGttTctGtaAaaAgtGtgAa
YAL009W_u145_LNA3 GtgGatGtgAaaTtaGtcmCtcAacmCccAgaGcaTttAgtGcaGagAttAg
YAL009W_u341_LNA3 GcaGttTaaTgtGaaGctAgtTaaAgtAcaGtcTacGtgGgamCgaGaaAt
YAL059W_u262_LNA3 AttGccAagTccAttTctmCgtGccAagTacAttmCaaAatAcaAgaAagGc
YAL059W_u51_LNA3 AgamCtcmCtamCaaAtaGatTcgGtgTccTgcmCagAcgAtgTtgAagAarAg
YER109C_u109_LNA3 TtgAagTttGggAatAttGgtAtgGttGaaGacmCaaGgamCcgGatTacGa
YER109C_u436_LNA3 GagGcgmCaaGtaGgcAatGatTcaAgaAgtAgtAaaGgcAatmCgtAacAc
YHR152W_u128_LNA3 TgaGcamCaaAgtTaaGatGttmCggAaaGaaAaaGaaAgtmCaaTccTatGa mCaaGtgAcc^atmCag CacGcamCggmCttmCcaTccTcaAgamCtgAtaTta
YHR152W u510 LNA3 mCc
YKL130C_u211_LNA3 AttAaaTgcGcaGatGagGacGgaAcgAatAtcGgaGaaActGatAatAt YKL130C_u85_LNA3 GatGgtAagmCtgAgcGccTtgGacGaaGaaTttGatGttGtcGctActAa YKL178C_u199_LNA3 TacGtcAcgmCaaGgamCagAgcTttGacGacGaaAtaTcamCttGgaGgaTt YKL178C_u367_LNA3 TctmCccTgtGtaGgtAcamCcaAtaTcamCaaGcgmCatTtcTatGtcGacTa
TgcTaamCacmCagTttAgamCcaTggAaaTccmCacmCgcAaaTatAagmCaa
YLR443W_u179_LNA3 Tg
GcaGgamCatAagAttmCcgGtcAagmCaamCgamCagTgaAgaAagTatGcaA
YLR443W_u86_LNA3 a YOR092W_u251_LNA3 mCcgTctAgtGaaAgcGggAtgGctAaaTtgGgaAaamCgamCaaGatGttAt YOR092W u82 LNA3 GatGctTcaAtaTccTttGatGgtmCgtTagTttAccAttTttGgtGtcTt
AgtmCatTtgAgtTatGtgAagAccGttGgtGggAaaGaaGagAtcA
YPL263C_u132_LNA3 ggTg YPL263C u257 LNA3 GtcTtgGctAccAcamCccAaaAccGttmCgaAacTttAagAgcAttmCtamCt
Example 9. Performance Analysis of LNA Oligonucleotide Capture Probes Designed to Detect Ratios of Splice Variants in mRNA Pools.
Oligonucleotide Design for Microarrays
The methods for designing exon-specific internal oligonucleotide capture probes are described above.
Design of the LNA-modified Capture Probes
For the internal LNA-modified oligonucleotide capture probes, every third DNA nucleotide was substituted with an LNA nucleotide. The probes designed to capture the junctions of the recombinant splice variants carried a block of five consecutive LNA nucleotides, two on the 5' side of the splice junction and three on the 3' side. All capture probes are shown in Table 14.
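The two substitution patterns described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the design software used in the example. The function names are hypothetical, the notation (uppercase for LNA, lowercase for DNA, LNA cytosine written as mC) follows the probe tables, and the exact split point passed to the junction function is an assumption chosen by the caller.

```python
def lna(base: str) -> str:
    """Return the table notation for an LNA base (LNA cytosine is written 'mC')."""
    b = base.upper()
    return "mC" if b == "C" else b


def lna_every_third(dna: str, offset: int = 0) -> str:
    """Replace every third nucleotide (offset, offset+3, ...) with LNA."""
    return "".join(
        lna(b) if i % 3 == offset else b.lower()
        for i, b in enumerate(dna)
    )


def lna_junction_block(five_prime: str, three_prime: str,
                       n5: int = 2, n3: int = 3) -> str:
    """Place n5 LNAs on the 5' side and n3 LNAs on the 3' side of a junction.

    five_prime and three_prime are the probe halves on either side of the
    junction; where the junction falls is decided by the caller.
    """
    split5 = len(five_prime) - n5
    return (
        five_prime[:split5].lower()
        + "".join(lna(b) for b in five_prime[split5:])
        + "".join(lna(b) for b in three_prime[:n3])
        + three_prime[n3:].lower()
    )


if __name__ == "__main__":
    # Short hypothetical inputs, for illustration only.
    print(lna_every_third("gccccactgg"))
    print(lna_junction_block("ttcaatcga", "tccgatgat"))
```

Keeping the junction split as an explicit argument avoids hard-coding any assumption about where the exon boundary lies within a given probe.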
Table 14. Internal, exon-specific and merged, exon-exon junction specific oligonucleotide capture probes used in this example.
Capture probes Sequence (LNA = uppercase, DNA = lowercase letters)
gene78.01a_50_LNA3 mCrtGaaAgtAgaTttGttAttTccGaaAcgmCctTctmCccGttmCttAagTc
gene78.01b_50_LNA3 mCatAtamCcamCaaAtaGtcmCctmCaaAaaTcamCaaGaaAacTcamCaamCacTg
gene78.03a_50_LNA3 GatTtgmCagmCggTggTaaAaaGtaTgaAaamCgtGgtAatTaaAagGtcTc
gene78.03b_50_LNA3 mCcaAtgAaaActAatmCaaAggTaaAcgTggAtcmCcaTggmCaaTtcmCcgGg
gene78.m01INS3_50_block caacactgcccagaggttcaatcGATmCmCgatgatcctaatgaaggcgccc
gene78.mINS303_50_block gtccagtatcgtccatcatAGTATcgataaatatgtgaaggaaatgcctg
gene78.m01INS4_50_block caacactgcccagaggttcaatcGATGTgtgataggatcagtgttcaggg
gene78.mINS403_50_block gaaggcgaaggagactgctAATATcgataaatatgtgaaggaaatgcctg
Printing and Coupling of the Splice Isoform-specific Microarrays
The splice variant capture probes were synthesized with a 5' anthraquinone (AQ) modification, followed by a hexaethyleneglycol-2 (HEG2) linker. The capture probes were first diluted to a 20 μM final concentration in 100 mM Na-phosphate buffer pH 7.0, and spotted on the Immobilizer polymer microarray slides (Exiqon, Denmark) using the Biochip Arrayer One (Packard Biochip Technologies, USA) with a spot volume of 2 x 300 pl and 300 μm between the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker with 2300 μjoules (Stratagene, USA). Non-immobilized capture probe oligonucleotides were removed from the slides by washing the slides two times for 15 minutes in 1x SSC. After washing, the slides were dried by centrifugation at 1000 xg for 2 minutes, and stored in a slide box until microarray hybridization.
Construction of Splice Variant Vectors
The recombinant splice variant constructs were cloned into the Triamp18 vector (Ambion, USA). The constructs were sequenced to confirm their construction. The plasmid clones were transformed into E. coli XL10-Gold (Stratagene, USA).
Triamp18/SWI5 Vector Construct
Genomic DNA was prepared from a wild type standard laboratory strain of Saccharomyces cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences, USA) according to the supplier's instructions. Amplification of the partial yeast gene was performed by standard PCR using yeast genomic DNA as template. In the first step of amplification, a forward primer containing a restriction enzyme site and a reverse primer containing a universal linker sequence were used. In this step, 20 bp was added to the 3'-end of the amplicon, next to the stop codon. In the second step of amplification, the reverse primer was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme site. The SWI5 amplicon contains 730 bp of the SWI5 ORF plus 20 bp universal linker sequence and a poly-A20 tail. The PCR primers used were:
YDR146C-For-EcoRI: acgtgaattcaaatacagacaatgaaggagatga
YDR146C-Rev-Uni: gatccccgggaattgccatgttacctttgattagttttcattggc
Uni-polyT-BamHI: acgtggatcctttttttttttttttttttttgatccccgggaattgccatg
The PCR amplicon was cut with the restriction enzymes EcoRI and BamHI. The DNA fragment was ligated into the pTRIamp18 vector (Ambion, USA) using the Quick Ligation Kit (New England Biolabs, USA) according to the supplier's instructions and transformed into E. coli DH-5α by standard methods.
Construction of the Recombinant Splice Variant #1 (Triamp18/swi5-rubisco)
The Arabidopsis thaliana Rubisco small subunit ssu2b gene fragment (gil7064721) was amplified from genomic DNA by primers named DJ305 5'-ACTATGATGGACGATACTGGAC-3' and DJ306 5'-ATTGGATCGATCCGATGATCCTAATGAAGGC-3', containing ClaI restriction site linkers. The purified PCR fragment was digested with ClaI and then cloned into the swi5 (gi:7839148) vector at the unique ClaI site (atcgat), giving each insert a flanking sequence from the original yeast SWI5 insert (named exon01 and exon03, Figure 19). The product was inserted in the reverse orientation, so that the insert sequence is:
atcgatCCGATGATCCTAATGAAGGCGCCCGGGTACTCCTTCTTGCATTCTTCAACTT CCTTCAACACTTGAGCGGAGTCGGTGCATCCGAACAATGGAAGCTTCCACATTGT CCAGTATCGTCCATCATAGTatcgat
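As a consistency check on the insert sequence listed above, the primer sequences can be mapped onto it. The sketch below is illustrative only; the helper names are hypothetical. It uses the insert and primer sequences as printed in this example and tests whether each primer occurs in the listed strand directly or as its reverse complement.

```python
# Translation table for complementing upper-case DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orientation(insert: str, primer: str) -> str:
    """Report how a primer maps onto the listed strand of the insert."""
    insert, primer = insert.upper(), primer.upper()
    if primer in insert:
        return "forward"
    if reverse_complement(primer) in insert:
        return "reverse"
    return "absent"


# Insert sequence as listed in the text (line breaks removed).
INSERT = (
    "atcgatCCGATGATCCTAATGAAGGCGCCCGGGTACTCCTTCTTGCATTCTTCAACTT"
    "CCTTCAACACTTGAGCGGAGTCGGTGCATCCGAACAATGGAAGCTTCCACATTGT"
    "CCAGTATCGTCCATCATAGTatcgat"
).upper()

DJ305 = "ACTATGATGGACGATACTGGAC"
DJ306 = "ATTGGATCGATCCGATGATCCTAATGAAGGC"
CLAI_SITE = "ATCGAT"

# Part of DJ306 starting at its ClaI site (the 5' overhang is not in the insert).
DJ306_FROM_SITE = DJ306[DJ306.index(CLAI_SITE):]

if __name__ == "__main__":
    print("DJ305:", orientation(INSERT, DJ305))
    print("DJ306 (from ClaI site):", orientation(INSERT, DJ306_FROM_SITE))
```

In the listed strand, the DJ306-derived sequence sits at the 5' end and the DJ305 sequence appears as its reverse complement at the 3' end, immediately before the flanking ClaI site.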
Nucleotide sequence analysis revealed a difference between the sequence of A. thaliana Rubisco expected from the GenBank database and that obtained from all sequenced constructs and PCR products. Position 30 in the Rubisco insert is "C" rather than the expected "A". This SNP was probably created by PCR. None of the oligonucleotide capture probes used in the example cover this region. The Rubisco sequence in GenBank is TCCTAATGAAGGCGCCA. The sequence obtained from the plasmid construct is TCCTAATGAAGGCGCCC.
Construction of the Recombinant Splice Variant #2 (Triamp18/swi5-lea)
The Arabidopsis thaliana Lea gene (gi 1526423) was amplified from genomic DNA with primers named DJ307 5'-GGAATTATCGATGTGTGATAGGATCAGTGTTCAG-3' and DJ308 5'-AATTGGATCGATATTAGCAGTCTCCTTCGCC-3', including the ClaI linker sites as above. The PCR fragment was digested with ClaI and cloned into the yeast SWI5 INT construct as above at the unique ClaI site. The fragment was inserted in the forward orientation, resulting in the following insert sequence:
atcgatGTGTGATAGGTTCAGTGTTCAGGGCTGTCCAAGGAACGTATGAGCATGCGA GAGACGCTGTAGTTGGAAAAACCCACGAAGCGGCTGAGTCTACCAAAGAAGGA GCTCAGATAGCTTCAGAGAAAGCGGTTGGAGCAAAGGACGCAACCGTCGAGAA AGCTAAGGAAACCGCTGATTATACTGCGGAGAAGGTGGGTGAGTATAAAGACTA TACGGTTGATAAAGCTAAAGAGGCTAAGGACACAACTGCAGAGAAGGCGAAGG AGACTGCTAATatcgat
Preparation of Target
In vitro RNA Preparation from Splice Variant Vectors
In vitro RNA from the splice variants was made using the MEGAscript™ high yield transcription kit according to the manufacturer's instructions (Ambion, USA). The yield of IVT RNA was quantified on a Nanodrop spectrophotometer (Nanodrop Technologies, USA).
Isolation of Total RNA from C. elegans
C. elegans wild-type strain (Bristol-N2) was maintained on nematode growth medium (NG) plates seeded with Escherichia coli strain OP50 at 20 °C, and the mixed stages of the nematode were prepared as described in Hope, I. A. (ed.) "C. elegans - A Practical Approach", Oxford University Press 1999. The samples were immediately flash frozen in liquid N2 and stored at -80 °C until RNA isolation.
A 100 μl aliquot of packed C. elegans worms from a mixed stage population was homogenized using the FastPrep BIO 101 from Kem-En-Tec for 1 minute at speed 6, followed by isolation of total RNA from the extracts using the FastPrep BIO 101 kit (Kem-En-Tec) according to the manufacturer's instructions. The eluted total RNA was ethanol precipitated for 24 hours at -20°C by addition of 2.5 volumes of 96% EtOH and 0.1 volume of 3 M Na-acetate, pH 5.2 (Ambion, USA), followed by centrifugation of the total RNA sample for 30 minutes at 13200 rpm. The total RNA pellet was air-dried, redissolved in 10 μl of diethylpyrocarbonate (DEPC)-treated water (Ambion, USA), and stored at -80°C.
Fluorochrome-labelling of the Target
The following fluorochrome-labelled cDNA targets were synthesized to test the performance of 'merged' probes that span exon borders. Synthetic RNAs corresponding to splice variant #1 (exon01-INS3-exon03; 1-INS3-3) and splice variant #2 (exon01-INS4-exon03; 1-INS4-3) were spiked into 10 μg of C. elegans reference total RNA sample in two different ratios. The first target pool (KU007) contained 10 ng of splice variant #1 (1-INS3-3) transcript and 2 ng of splice variant #2 (1-INS4-3) transcript, a ratio of 5:1. The second target pool (KU008) contained 2 ng of splice variant #1 (1-INS3-3) transcript and 10 ng of splice variant #2 (1-INS4-3) transcript, a ratio of 1:5. Both mRNA pools were combined in separate labeling reactions with 5 μg anchored oligo(dT20) primer and DEPC-treated water to a final volume of 8 μl. The mixture was heated at 70°C for 10 minutes, quenched on ice for 5 minutes, followed by addition of 20 units of Superasin RNase inhibitor (Ambion, USA), 1 μl dNTP solution (10 mM each dATP, dGTP, dTTP and 0.4 mM dCTP), 3 μl Cy5-dCTP (Amersham Biosciences, USA), 4 μl 5x RTase buffer (Invitrogen), 2 μl 0.1 mM DTT (Invitrogen), 400 units of Superscript II reverse transcriptase (Invitrogen, USA) and DEPC-treated water to 20 μl final volume. Background hybridization to merged capture probes was monitored in both hybridizations using the other fluor channel with 10 μg of C. elegans reference RNA alone labeled with Cy3-dCTP, according to the labeling method described above for the splice variant spikes. All four cDNA syntheses were carried out at 42°C for 2 hours, and the reaction was stopped by incubation at 70°C for 5 minutes, followed by incubation on ice for 5 minutes.
Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR columns as described below. The column was pre-spun for 1 minute at 1500 xg in a 1.5 ml tube, and the column was placed in a new 1.5 ml tube. The cDNA sample was slowly applied to the top center of the resin and spun at 1500 xg for 2 minutes, and the eluate was collected. The RNA was hydrolyzed by adding 3 μl of 0.5 M NaOH, mixing, and incubating at 70 °C for 15 minutes. The samples were neutralized by adding 3 μl of 0.5 M HCl and mixing, followed by addition of 450 μl 1x TE, pH 7.5 to the neutralized sample and transfer onto a Microcon-30 concentrator (prior to use, 500 μl 1x TE was spun through the column to remove residual glycerol). The samples were centrifuged at 14000 xg in a microcentrifuge for 12 minutes. Spinning was continued until the volume was reduced to 5 μl. The labelled cDNA probes were eluted by inverting the Microcon-30 tube and spinning at 1000 xg for 3 minutes.
Microarray Hybridization
The fluorochrome-labelled cDNA samples were combined (the two different ratios separately). The following were added: 3.75 μl 20x SSC (3x SSC final, which was passed through a 0.22 μ filter prior to use to remove particulates), yeast tRNA (1 μg/μl final), 0.625 μl 1 M HEPES, pH 7.0 (25 mM final, which was passed through a 0.22 μ filter prior to use to remove particulates), 0.75 μl 10% SDS (0.3% final), and DEPC-water to 25 μl final volume. The labelled cDNA target samples were filtered in a Millipore 0.22 μ filter spin column (Ultrafree-MC, Millipore, USA) according to the manufacturer's instructions, followed by incubation of the reaction mixture at 100 °C for 2-5 minutes. The cDNA probes were cooled at room temperature for 2-5 minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide, and the hybridization mixture was applied to the array from the side. An aliquot of 30 μl of 3x SSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber (DieTech, USA). The chamber was sealed watertight and incubated at 65 °C for 16-18 hours submerged in a water bath. After hybridization, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The slides were washed sequentially by plunging gently in 2x SSC/0.1% SDS at room temperature until the cover slip fell off into the washing solution, then in 1x SSC pH 7.0 (150 mM NaCl, 15 mM Sodium Citrate) at room temperature for 1 minute, then in 0.2x SSC, pH 7.0 (30 mM NaCl, 3 mM Sodium Citrate) at room temperature for 1 minute, and finally in 0.05x SSC (7.5 mM NaCl, 0.75 mM Sodium Citrate) for 5 seconds, followed by drying of the slides by spinning at 1000 xg for 2 minutes. The slides were stored in a slide box in the dark until scanning.
Microarray data analysis
The splice variant microarray was scanned in a ScanArray 4000XL confocal laser scanner (Packard Instruments, USA). The hybridization data were analysed using the GenePix Pro 4.01 microarray analysis software (Axon, USA). Only the Cy5 (650 nm) data were examined, as both hybridizations produced comparable, and acceptably low, signal from the C. elegans reference RNA alone (Cy3 channel).
Normalization
Data were normalized so that they could be compared between hybridizations. Both hybridizations contained the same amount of RNA from synthetic exon 01 and exon 03 (10+2 ng), so the signal from the capture probes designed to internal regions of these exons is expected to be equal. The ratio of raw Cy5 signal between the two different labeled cDNA target pools, designated as the KU007 and KU008 hybridizations, was calculated for each probe corresponding to either of these exons; that is, for each probe we calculated the ratio probe(KU007)/probe(KU008). The average of all of these ratios was used as the normalization ratio.
Expectations of normalized data
To reflect the proportions of RNA spiked into the hybridization, the ratio of signal in hybridization KU007/KU008 should be 5 for probes designed to exon junctions of the INS3 splice variant #1 and 0.2 for probes corresponding to the 1-INS4 splice variant #2. Data were log2 transformed: log2(5)=+2.32, log2(0.2)=-2.32. The merged probe corresponding to the exon 01-exon 03 border desirably produces a consistently low value that is independent of which transcript was more abundant, i.e., log2(ratio)=0.
Array results
Results are summarized in Table 15. 50-mer capture probes containing LNA in a block spanning exon-exon junctions were consistent in producing the expected ratios.
Table 15. LNA 50-mer block probes are most consistent in producing overall data closest to expected ratios.
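The normalization and the expected log2 ratios described in this example can be illustrated with a minimal Python sketch. The signal values and probe labels below are hypothetical and serve only to show the arithmetic. The sketch also assumes that the normalization ratio is applied by dividing each probe's KU007/KU008 ratio by it, which the text does not state explicitly.

```python
import math
from statistics import mean


def normalization_ratio(ku007: dict, ku008: dict) -> float:
    """Average of the per-probe KU007/KU008 ratios over internal exon probes."""
    return mean(ku007[probe] / ku008[probe] for probe in ku007)


def normalized_log2_ratio(sig007: float, sig008: float, norm: float) -> float:
    """log2 of a probe's KU007/KU008 ratio after division by the normalization ratio."""
    return math.log2((sig007 / sig008) / norm)


# Hypothetical raw Cy5 signals for internal probes of exon 01 and exon 03,
# which are present in equal amounts (10+2 ng) in both hybridizations.
internal_ku007 = {"exon01_a": 1200.0, "exon01_b": 900.0,
                  "exon03_a": 1500.0, "exon03_b": 600.0}
internal_ku008 = {"exon01_a": 1000.0, "exon01_b": 750.0,
                  "exon03_a": 1250.0, "exon03_b": 500.0}

norm = normalization_ratio(internal_ku007, internal_ku008)

# Hypothetical junction probe signals for the two spiked splice variants.
ins3_junction = normalized_log2_ratio(6000.0, 1000.0, norm)  # variant #1, spiked 5:1
ins4_junction = normalized_log2_ratio(1200.0, 5000.0, norm)  # variant #2, spiked 1:5

if __name__ == "__main__":
    print(f"normalization ratio: {norm:.2f}")
    print(f"INS3 junction log2 ratio: {ins3_junction:+.2f}")
    print(f"INS4 junction log2 ratio: {ins4_junction:+.2f}")
```

With these invented numbers the junction probes land on the ideal values of log2(5) and log2(0.2); measured data would scatter around them.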
Example 10: Improved Signal-to-noise Ratios using LNA Oligonucleotide Capture Probes Combined with cDNA Target Fragmentation with the E. coli Uracil-DNA Glycosylase.
Capture Probe Design
The capture probes were designed to a 602-nucleotide sequence in the 3'-region of the yeast (S. cerevisiae) 70 kDa heat shock protein (SSA4) gene. The 602-base pair sequence is shown in Table 16. For the LNA-spiked oligonucleotide capture probes, every third DNA nucleotide was substituted with an LNA nucleotide. All capture probes are shown in Table 17.
Table 16. Six hundred and two (602) base pair sequence stretch of the S. cerevisiae ssa4 gene. The underlined segments indicate the position of the capture probes. The first underlined segment corresponds to capture probe YER103W-554, the second to capture probe YER103W-492, and so forth.
ggtgaaaggacaaggacaaaagacaacaatctactgggtaaatttgagttgagcggtattccacccgctccaagaggcgtaccac
aaattgaagttacatttgatatcgatgcaaatggtattctgaacgtatctgccgttgaaaaaggtactggtaaatctaacaagat
tacaattactaacgataagggaagattatcgaaggaagatatcgataaaatggttgctgaggcagaaaagttcaaggccgaagat
gaacaagaagctcaacgtgttcaagctaagaatcagctagaatcgtacgcgtttactttgaaaaattctgtgagcgaaaataact
tcaaggagaaggtgggtgaagaggatgccaggaaattggaagccgccgcccaagatgctataaattggttagatgcttcgcaagc
ggcctccaccgaggaatacaaggaaaggcaaaaggaactagaaggtgttgcaaaccccattatgagtaaattttacggagctgca
ggtggtgccccaggagcaggcccagttccgggtgctggagcaggccccactggagcaccagacaacggcccaacggttgaagagg
ttgattag
Table 17. Capture probes for the SSA4 tile array.
Oligo Name Sequence
YER103W-1-DNA gccccactggagcaccagacaacggcccaacggttgaagaggttgattag
YER103W-38-DNA gccccaggagcaggcccagttccgggtgctggagcaggccccactggagc
YER103W-73-DNA ccattatgagtaaattttacggagctgcaggtggtgccccaggagcaggc
YER103W-92-DNA ctagaaggtgttgcaaaccccattatgagtaaattttacggagctgcagg
YER103W-127-DNA cctccaccgaggaatacaaggaaaggcaaaaggaactagaaggtgttgca
YER103W-200-DNA ggtgaagaggatgccaggaaattggaagccgccgcccaagatgctataaa
YER103W-245-DNA actttgaaaaattctgtgagcgaaaataacttcaaggagaaggtgggtga
YER103W-272-DNA aagaatcagctagaatcgtacgcgtttactttgaaaaattctgtgagcga
YER103W-336-DNA aatggttgctgaggcagaaaagttcaaggccgaagatgaacaagaagctc
YER103W-393-DNA taacaagattacaattactaacgataagggaagattatcgaaggaagata
YER103W-447-DNA cgatgcaaatggtattctgaacgtatctgccgttgaaaaaggtactggta
YER103W-492-DNA acccgctccaagaggcgtaccacaaattgaagttacatttgatatcgatg
YER103W-554-DNA ggtgaaaggacaaggacaaaagacaacaatctactgggtaaatttgagtt
YER103W-1-LNA1 GccmCcamCtgGagmCacmCagAcaAcgGccmCaamCggTtgAagAggTtgAttAg
YER103W-38-LNA1 GccmCcaGgaGcaGgcmCcaGttmCcgGgtGctGgaGcaGgcmCccActGgaGc
YER103W-73-LNA1 mCcaTtaTgaGtaAatTttAcgGagmCtgmCagGtgGtgmCccmCagGagmCagGc
YER103W-92-LNA1 mCtaGaaGgtGttGcaAacmCccAttAtgAgtAaaTttTacGgaGctGcaGg
YER103W-127-LNA1 mCctmCcamCcgAggAatAcaAggAaaGgcAaaAggAacTagAagGtgTtgmCa
YER103W-200-LNA1 GgtGaaGagGatGccAggAaaTtgGaaGccGccGccmCaaGatGctAtaAa
YER103W-245-LNA1 ActTtgAaaAatTctGtgAgcGaaAatAacTtcAagGagAagGtgGgtGa
YER103W-272-LNA1 AagAatmCagmCtaGaaTcgTacGcgTttActTtgAaaAatTctGtgAgcGa
YER103W-336-LNA1 AatGgtTgcTgaGgcAgaAaaGttmCaaGgcmCgaAgaTgaAcaAgaAgcTc
YER103W-393-LNA1 TaamCaaGatTacAatTacTaamCgaTaaGggAagAttAtcGaaGgaAgaTa
YER103W-447-LNA1 mCgaTgcAaaTggTatTctGaamCgtAtcTgcmCgtTgaAaaAggTacTggTa
YER103W-492-LNA1 AccmCgcTccAagAggmCgtAccAcaAatTgaAgtTacAttTgaTatmCgaTg
YER103W-554-LNA1 GgtGaaAggAcaAggAcaAaaGacAacAatmCtamCtgGgtAaaTttGagTt
Control capture probes
YFL039C-50 acaagaatacgacgaaagtggtccatctatcgttcaccacaagtgtttct
YFL039C-50_LNA3 AcaAgaAtamCgamCgaAagTggTccAtcTatmCgtTcamCcamCaaGtgTttmCt
YDR146C-50 tgggaatggaacggggattatggtttcgccaatgaaaactaatcaaaggt
YDR146C-50_LNA3 TggGaaTggAacGggGatTatGgtTtcGccAatGaaAacTaaTcaAagGt
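The tiling of the capture probes along the SSA4 fragment can be checked by locating each probe in the target sequence. The following Python sketch is illustrative only; the function name is hypothetical, and only three of the 3'-terminal probes are included. In the sequences as printed, the numeric suffix of a probe name appears to equal the distance from the probe's 3' end to the end of the fragment plus one; this is an observation from the listed sequences, not a naming rule stated in the text.

```python
# SSA4 fragment as listed in Table 16 (5' to 3'), line breaks removed.
SSA4_FRAGMENT = (
    "ggtgaaaggacaaggacaaaagacaacaatctactgggtaaatttgagttgagcggtattccacccgctccaagaggcgtaccac"
    "aaattgaagttacatttgatatcgatgcaaatggtattctgaacgtatctgccgttgaaaaaggtactggtaaatctaacaagat"
    "tacaattactaacgataagggaagattatcgaaggaagatatcgataaaatggttgctgaggcagaaaagttcaaggccgaagat"
    "gaacaagaagctcaacgtgttcaagctaagaatcagctagaatcgtacgcgtttactttgaaaaattctgtgagcgaaaataact"
    "tcaaggagaaggtgggtgaagaggatgccaggaaattggaagccgccgcccaagatgctataaattggttagatgcttcgcaagc"
    "ggcctccaccgaggaatacaaggaaaggcaaaaggaactagaaggtgttgcaaaccccattatgagtaaattttacggagctgca"
    "ggtggtgccccaggagcaggcccagttccgggtgctggagcaggccccactggagcaccagacaacggcccaacggttgaagagg"
    "ttgattag"
)

# Three DNA capture probes from Table 17.
PROBES = {
    "YER103W-1": "gccccactggagcaccagacaacggcccaacggttgaagaggttgattag",
    "YER103W-38": "gccccaggagcaggcccagttccgggtgctggagcaggccccactggagc",
    "YER103W-73": "ccattatgagtaaattttacggagctgcaggtggtgccccaggagcaggc",
}


def offset_from_3prime(target: str, probe: str) -> int:
    """Number of target nucleotides lying 3' of the probe's last base."""
    start = target.find(probe)
    if start < 0:
        raise ValueError("probe not found in target")
    return len(target) - (start + len(probe))


if __name__ == "__main__":
    for name, seq in PROBES.items():
        print(name, offset_from_3prime(SSA4_FRAGMENT, seq))
```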
Printing and Coupling of the Yeast SSA4 Tile Microarrays
The SSA4 capture probes were synthesized with a 5' anthraquinone (AQ) modification, followed by a hexaethyleneglycol-2 (HEG2) linker. The capture probes (Table 17) were first diluted to a 20 μM final concentration in 100 mM Na-phosphate buffer pH 7.0, and spotted on the Immobilizer microarray slides (Exiqon, Denmark) using the Biochip Arrayer One (Packard Biochip Technologies) with a spot volume of 2 x 300 pl and 400 μm between the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker with 2300 μjoules (Stratagene, USA). Non-immobilized capture probe oligonucleotides were removed from the slides by washing the slides two times for 15 minutes in 1x SSC. After washing, the slides were dried by centrifugation at 1000 xg for 2 minutes, and stored in a slide box until microarray hybridization.
Yeast Cultures
Saccharomyces cerevisiae wild-type (BY4741, MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0) and Δssa4 (MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0; YER103w::kanMX4) mutant strains (EUROSCARF) were grown in YPD at 30°C until the A600 density of the cultures reached 0.8. Half of the cultures were collected by centrifugation and resuspended in one volume of 40°C preheated YPD. Incubation was continued for an additional 30 minutes at 30°C or 40°C for the standard and heat-shocked cultures, respectively. Cells were harvested by centrifugation and stored at -80°C.
RNA Extraction
Total RNA was extracted using the FastRNA Kit-RED (BIO 101) according to the supplier's instructions. The quantity and quality of the RNA preparations were examined by standard spectrophotometry on a NanoDrop ND-1000 (USA) and by gel electrophoresis. Only high quality RNA preparations were used for microarray analyses.
Fluorochrome-labelling of the Target
A total of seven cDNA assay mixtures were produced, each with ten (10) μg total RNA from wild-type combined with 5 μg anchored oligo(dT20) primer and DEPC-treated water to a final volume of 8 μl. The mixtures were heated at 70°C for 10 minutes, quenched on ice for 5 minutes, followed by addition of 20 units of Superasin RNase inhibitor (Ambion, USA), 3 μl Cy3-dCTP (Amersham Biosciences), 10 mM final concentration of dATP and dGTP, 4 μl 5x RTase buffer (Invitrogen), 2 μl 0.1 mM DTT (Invitrogen), 400 units of Superscript II reverse transcriptase (Invitrogen, USA), dUTP and dTTP according to Table 18, and DEPC-treated water to 20 μl final volume. A parallel set-up was made with 10 μg total RNA from Δssa4 for target cDNA labelling with Cy5-dCTP. All cDNA syntheses were carried out at 42°C for 2 hours, and the reaction was stopped by incubation at 70°C for 5 minutes, followed by incubation on ice for 5 minutes. Each cDNA pool (except the unfragmented control pool) was incubated at 37°C for 2 hours with 2 units of Uracil-DNA Glycosylase (UDG, New England Biolabs, USA) and 2.4 μl (1x final concentration in the reaction mixture) of UDG reaction buffer. The enzyme was heat-inactivated at 95 °C for 10 minutes. Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR columns as described in Example 9.
Table 18. dUTP and dTTP ratios in cDNA target labelling.
Gel electrophoresis of the cDNA target pools
0.5 μl of each of the seven fragmented cDNA pools was analysed on a 2% agarose gel. The data show that the cDNA is fragmented linearly with respect to the concentration of dUTP used in the synthesis. Figure 38 shows the gel electrophoresis of fragmented cDNA from the yeast wild-type strain.
Comparative Hybridization of the SSA4 Tile Array with Fluorochrome-labelled wild-type and Δssa4 cDNA Target Pools and Post-hybridization Washes
The fluorochrome-labelled cDNA samples were combined (the different UDG-fragmented samples separately). The following were added: 3.75 μl 20x SSC (3x SSC final, passed through a 0.22 μ filter prior to use to remove particulates), yeast tRNA (1 μg/μl final), 0.625 μl 1 M HEPES, pH 7.0 (25 mM final, passed through a 0.22 μ filter prior to use to remove particulates), 0.75 μl 10% SDS (0.3% final), and DEPC-water to 25 μl final volume. The labelled cDNA target samples were filtered in a Millipore 0.22 μ filter spin column (Ultrafree-MC, Millipore, USA) according to the manufacturer's instructions, followed by incubation of the reaction mixture at 100 °C for 2-5 minutes. The cDNA probes were cooled at room temperature for 2-5 minutes by spinning at maximum speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the SSA4 microarrays spotted on the Immobilizer™ MicroArray Slide, and the hybridization mixture was applied to the array from the side. An aliquot of 30 μl of 3x SSC was added to both ends of the hybridization chamber, and the slide was placed in the hybridization chamber (DieTech, USA). The chamber was sealed watertight and incubated at 65°C for 16-18 hours submerged in a water bath. After hybridization, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The slides were washed sequentially by plunging gently in 2x SSC/0.1% SDS at room temperature until the cover slip fell off into the washing solution, then in 1x SSC pH 7.0 (150 mM NaCl, 15 mM Sodium Citrate) at room temperature for 1 minute, then in 0.2x SSC, pH 7.0 (30 mM NaCl, 3 mM Sodium Citrate) at room temperature for 1 minute, and finally in 0.05x SSC (7.5 mM NaCl, 0.75 mM Sodium Citrate) for 5 seconds, followed by drying of the slides by spinning at 1000 xg for 2 minutes. The slides were stored in a slide box in the dark until scanning.
Microarray Data Analysis
The slides were scanned in a ScanArray 4000XL confocal laser scanner (Packard Instruments, USA). The hybridization data were analysed using the GenePix Pro 4.01 microarray analysis software (Axon, USA).
In the data analysis, differences in labelling efficiency between the two fluorescent dyes were corrected using an internal normalization approach. The average signal intensities from the control capture probes (Table 17) were used to calculate the normalization factor, and the signal intensity values from the Cy3 target were multiplied by this factor. Analysis of the data demonstrates that capture probes with LNA in every third position have up to 5.2-fold higher signal-to-noise ratios than the DNA capture probes (Figure 39).
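By way of illustration only, the internal normalization described above can be sketched as follows. The text does not state how the factor is formed from the control probe averages; the sketch assumes it is the mean Cy5 control signal divided by the mean Cy3 control signal. The function names and all intensity values are hypothetical.

```python
from statistics import mean


def normalization_factor(cy5_controls, cy3_controls):
    """Assumed definition: mean Cy5 control signal / mean Cy3 control signal."""
    return mean(cy5_controls) / mean(cy3_controls)


def normalize_cy3(cy3_signals, factor):
    """Multiply every Cy3 signal by the normalization factor."""
    return [signal * factor for signal in cy3_signals]


# Hypothetical control-probe intensities (arbitrary units).
cy5_controls = [1200.0, 1100.0, 1300.0]
cy3_controls = [600.0, 550.0, 650.0]

factor = normalization_factor(cy5_controls, cy3_controls)
cy3_scaled = normalize_cy3([400.0, 1000.0], factor)
```

After scaling, the Cy3 values are on the same intensity scale as the Cy5 values, so that per-probe Cy5/Cy3 ratios can be compared directly.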
Example 11: Interpretation of Splice Array Data Using LNA Discriminating Probes.
This example illustrates the interpretation of microarray analysis of alternative mRNA splicing. Different LNA capture probe design types are formalized, and the expression constant Θ is introduced as a measure of alternative splicing.
Introduction
Eukaryotic pre-mRNA is the substrate of splicing and alternative splicing. Accordingly, in this example "sequences" refers to RNA sequences, "original sequence" refers to the pre-mRNA, and "splice forms" refers to mRNA sequences. Splicing is carried out by a cellular machinery named the spliceosome. The terms exons and introns can be used to refer to regions of pre-mRNA sequences (or, more specifically, of a single splice form). It is noted that a part of the corresponding DNA/pre-mRNA sequence that is an exon (not excised) in one splice form can potentially be absent in another splice form (e.g., partly absent in exon truncation and completely absent in exon skipping). Thus, the terms "constant regions" and "variable regions" (see below) are useful for characterizing the process of identifying different splice forms.
Splicing can be defined as the production of a new sequence via the excision of part(s) of an original sequence (Figure 40). Alternative splicing can be defined as the production of more than one novel sequence via the excision of different parts of the original sequence. When comparing two different splice forms, they can be divided into a constant region that is shared by both sequences and a variable region by which the two splice forms differ (Figure 41).
Alternative splicing can be categorized in terms of (i) whether the variable region is flanked by a single constant region or surrounded by two constant regions, (ii) the size of the variable region (e.g., exon skipping/intron retention vs. extension and truncation) [(intron/exon) 5' and 3'], and (iii) the number of variable regions (and hence the number of splice forms).
Capture Probe Design
Capture probe designs can be divided into three distinct types according to the position of the probe: Merged Probes (MP) or Junction Probes, Unique Internal Probes (UIP), and Shared Internal Probes (SIP) (Figure 42). Considering the case of a single variable region surrounded by constant regions, there are several different possible capture probe positions for each type (Figure 43).
Data Interpretation
The aim of the analyses can be to determine (i) whether a given original sequence is subject to alternative splicing (i.e., whether there is more than one splice form present), and (ii) whether there is a difference in alternative splicing of the original sequence between two biological samples (i.e., whether the proportions between the two splice forms differ between biological samples). The analysis can also be used for data validation.
Possible biases in the microarray platform include (a) noise in terms of non-specific binding and subsequent false signal, (b) differences in dye labeling efficiency, (c) differences in capture probe affinity, (d) differences in sample conditions (e.g., number of cells, and amount of RNA), and (e) differences in reverse transcriptase efficiency of different splice forms. Biases can be corrected for by various means of normalization and/or standardization.
Data Analysis
In order to analyze the expression of the different splice forms, the expression constant Θ is introduced. Θ denotes the relation between the signal ratios of two capture probes (a and b) across the labeled extracts from two biological samples (labeled with Cy5 and Cy3). That is,
(Cy5a / Cy3a) = (Cy5b / Cy3b) * Θ or,
Θ = (Cy5a / Cy3a) / (Cy5b / Cy3b) = (Cy5a * Cy3b) / (Cy5b * Cy3a) and,
Θ = (Cy5a / Cy5b) / (Cy3a / Cy3b) = (Cy5a * Cy3b) / (Cy3a * Cy5b) = (Cy5a * Cy3b) / (Cy5b * Cy3a) [same as above]
Considering normalization due to different biases and given a sample normalization factor S due to differences between the samples in terms of amounts of RNA, RT-efficiency, dye properties, etc. and a probe normalization factor P due to differences in probes in terms of affinity, position in target sequence, etc., the following equations apply.
For two probes: a and b, a * P = b
For two samples Cy5 & Cy3, Cy5 * S = Cy3,
Thus, considering two probes from two samples the signals are:
- Cy5a * P * S
- Cy5b * S
- Cy3a * P
- Cy3b
With respect to Θ:
Θ = [(Cy5a * P * S) / (Cy3a * P)] / [(Cy5b * S) / (Cy3b)] = (Cy5a * P * S * Cy3b) / (Cy3a * P * Cy5b * S) = (Cy5a * Cy3b * P * S) / (Cy3a * Cy5b * P * S) = (Cy5a * Cy3b) / (Cy3a * Cy5b) * (P * S) / (P * S) = (Cy5a * Cy3b) / (Cy3a * Cy5b) * 1
= (Cy5a * Cy3b) / (Cy3a * Cy5b) [same as without normalization]
Note that the calculation of Θ is not affected by the normalization factors S and P; hence it is not necessary to normalize the array data when interpreting alternative splice arrays with the use of the expression constant Θ.
Properties of Θ
If Θ = 1, there is no difference in the proportions of the targets of capture probes a and b in the two samples. In that case, even if alternative splicing occurs, it is not possible to determine whether more than a single splice form is present using this particular method. If Θ ≠ 1, there is a difference in the proportions of the targets of capture probes a and b in the two samples; thus there is a difference in splice pattern, and therefore there must be more than one splice form present.
Comparing Θ's
Θ can be compared between different transcripts to determine whether they have correlated expression, and Θ's from sets of capture probes from the same transcript (different probes) can be averaged.
Example
Consider a simple example of a single large variable region surrounded by constant regions, analyzed using a combination of a Merged Probe and a Shared Internal Probe.
Θ for a single splice form can be calculated using the following equation:
Θ = (Cy5MP * Cy3SIP) / (Cy3MP * Cy5SIP)
If Θ = 1, there is no difference in the proportions of the targets of the two capture probes (here, MP and SIP) in the two samples, and it may not be possible to determine whether multiple splice forms are present using this particular method. If Θ ≠ 1, there is a difference in the proportions of the targets of the two capture probes; thus there is a difference in splice pattern, and therefore there must be more than one splice form present.
Conclusions
It is possible to infer a difference in the expression level of two capture probe targets between two tissues by comparing the proportion of signals from one capture probe with the proportion of signals from the other probe. In contrast, single signals may be subject to biases from normalizations and standardizations for each probe and sample.
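The calculation of the expression constant in this example, and its independence from the probe factor P and the sample factor S, can be illustrated with a minimal sketch. The function name, the signal values, and the values chosen for P and S are hypothetical and serve only to show that the factors cancel.

```python
def expression_constant(cy5_a, cy3_a, cy5_b, cy3_b):
    """(Cy5a / Cy3a) / (Cy5b / Cy3b) for two capture probes a and b."""
    return (cy5_a * cy3_b) / (cy3_a * cy5_b)


# Hypothetical raw signals for probes a and b in the Cy5 and Cy3 channels.
cy5_a, cy3_a, cy5_b, cy3_b = 800.0, 400.0, 300.0, 600.0
raw = expression_constant(cy5_a, cy3_a, cy5_b, cy3_b)

# Apply an arbitrary probe factor P (to probe a) and sample factor S (to Cy5),
# following the signal definitions given in the text.
P, S = 1.7, 0.6
scaled = expression_constant(cy5_a * P * S, cy3_a * P, cy5_b * S, cy3_b)

# The two values agree up to floating-point rounding.
unchanged = abs(raw - scaled) < 1e-9
```

With the Merged Probe as probe a and the Shared Internal Probe as probe b, the same function gives the quantity used in the example above.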
Example 12: Exemplary Microarrays
The nucleic acid arrays of the invention can be generated by standard methods for either synthesis of nucleic acid probes that are then bonded to a solid support or synthesis of the nucleic acid probes on a solid support (e.g., by sequential addition of nucleotides to a reactive group on the solid support). In desirable methods for on-chip synthesis of the capture probes, photogenerated acids are produced in light-irradiated sites of the chip and used to deprotect the 5'-OH group of nucleic acid monomers and oligomers (e.g., to remove an acid-labile protecting group such as 5'-O-DMT) to which a nucleotide is to be added (Gao et al., Nucleic Acids Research 29:4744-4750, 2001). Standard methods can also be used to label the nucleic acids in a test sample with, e.g., a fluorescent label, incubate the labeled nucleic acid sample with the array, and remove any unbound or weakly bound test nucleic acids from the array. Exemplary methods are described, for example, in U.S.P.N. 6,410,229; 6,406,844; 6,403,957; 6,403,320; 6,403,317; 6,346,413; 6,344,316; 6,329,143; 6,310,189; 6,309,831; 6,309,823; 6,261,776; 6,239,273; 6,238,862; 6,156,501; 5,945,334; 5,919,523; 5,889,165; 5,885,837; 5,744,305; 5,445,934; 5,800,9927; and 5,874,219.
In an exemplary method for synthesis of an array, capture probes were immobilized using AQ technology with a HEG5 linker (U.S.P.N. 6,033,784) onto an Immobilizer™ slide. An exemplary chip consists of 288 spots in four replicates (i.e., 1152 spots) with a pitch of 250 μm, and an exemplary hybridization buffer is 5x SSCT (i.e., 750 mM NaCl, 75 mM Sodium Citrate, pH 7.2, 0.05% Tween) and 10 mM MgCl2. An exemplary target is a 45-mer oligonucleotide with Cy5 at the 5' end and with a final concentration in the hybridization solution of 1 μM. Hybridization was performed with 200 μL hybridization solution in a hybridization chamber created by attaching a CoverWell™ gasket to the Immobilizer™ slide. The incubation was conducted overnight at 4°C. After hybridization, the hybridization solution was removed, and the chamber was flushed three times with 1.0 mL of the hybridization buffer described above without any target nucleic acid. A CoverWell™ chamber was then filled with 200 μL hybridization solution without target. The slide was observed with a Zeiss Axioplan 2 epifluorescence microscope with a 5x Fluar objective and a Cy5 filter set from OMEGA. The temperature of the microscope stage was controlled with a Peltier element. Thirty-five images at each temperature were acquired automatically with a Photometrics camera, automated shutter, and motorized microscope stage. The images were acquired, stitched together, calibrated, and stored in a stack by the software package "MetaVue".
Arrays can be generated using capture probes of any desired length (e.g., arrays of pentamers, hexamers, or heptamers). In various embodiments, 1, 2, 3, 4, 5, 6, 7, 8, or more nucleotides of the probes are LNA nucleotides. Desirably, at least 1, 2, 3, 5, 7, 9, or all of the A and T nucleotides in the probes are LNA A and LNA T nucleotides. LNA nucleotides can be placed in any position of the capture probe, such as at the 5' terminus, between the 5' and 3' termini, or at the 3' terminus. LNA nucleotides may be consecutive or may be separated by one or more other nucleotides. The microarrays can be used to analyze target nucleic acids of any "AT" or "GC" content, and are especially useful for analyzing nucleic acids with high "AT" content because of the increased affinity of the microarrays of the present invention for such nucleic acids compared to traditional microarrays. Desirably, the array has at least 100, 200, 300, 400, 500, 600, 800, 1000, 2000, 5000, 8000, 10000, 15000, 20000, or more different probes. If desired, nucleotides with a universal base can be included in the capture probes to increase the Tm of the capture probes (e.g., capture probes of less than 7, 6, 5, or 4 nucleotides). Exemplary "non-discriminatory" nucleotides include inosine, random nucleotides, 5-nitroindole, LNA inosine, and LNA 2-aminopurine. In desirable embodiments, 1, 2, 3, 4, 5, or more nucleotides with a universal base are located at the 5' and/or 3' termini of the capture probes.
Example 13: Exemplary Application of Nucleic Acids of the Invention
An exemplary application of these methods includes comparing hybridization patterns of cDNA or cRNA from a patient sample to classify early tumors or detect an infection or a diseased state. The microarrays of the invention may also be used as a general tool to analyze the PCR products generated by amplification of a test sample with PCR primers for one or more nucleic acids of interest. For example, PCR primers can be used to amplify nucleic acids with a particular exon or exon-exon combination, and then the PCR products can be identified and/or quantified using a microarray of the invention. For identification of splice variants, PCR primers to specific exons can be used to amplify nucleic acids that are then applied to a microarray for detection and/or quantification as described herein. To detect microbial pathogens, species-specific PCR primers (e.g., primers specific for an exon whose sequence differs among species) can be used to amplify nucleic acids in a sample for subsequent analysis using a microarray. For example, the hybridization pattern of the PCR products to the array can be used to distinguish between different bacteria, viruses, or yeast and even between different strains of the same pathogenic species. In particular embodiments, the array is used to determine whether a patient sample contains a bacterial strain that is known to be resistant or susceptible to particular antibiotics or contains a virus or yeast strain known to be resistant or susceptible to certain drugs. Changes in product composition or raw material origin can also be detected using a microarray. The arrays can also be used to determine the composition of mRNA cocktails.
Exemplary environmental microbiology applications of these arrays include identification of major rRNA types in contaminated soil samples and classification of microbial isolates. These rRNA amplificates are formed from rRNA by RT-PCR or from the rDNA gene by conventional PCR. Numerous general and selective primers for different groups of organisms have been published. Most frequently, an almost full-length amplificate of the 16S rDNA gene is used (e.g., with the primers 26F and 1492R). For purifying rRNA from a soil sample, standard methods such as one or more commercial extraction kits (e.g., QIAGEN "RNeasy", Q-biogene "RNA PLUS," or "Total RNA safe") can be used.
Example 14: Methods for Minimizing the Variance in Melting Temperatures in Nucleic Acid Populations of the Invention
Any simultaneous use of more than one primer or probe is made difficult because the primers or probes involved must work under the same conditions. An indication of whether or not two or more primers or probes will work under the same conditions is the relative Tm at which the hybridized oligonucleotides dissociate. In cases where probes are applied for specific detection of homologous sequences such as splice variants, the ΔTm is of importance. ΔTm expresses the difference between the Tm of the match hybridization and the Tm of the mismatch hybridization. Generally, the larger the ΔTm obtained, the more specific the detection of the sequence of interest. In addition, a large ΔTm allows more probes to be used simultaneously, and in this way a higher degree of multiplexity can be applied.
High affinity nucleotide analogs such as LNA can also be used universally to equalize the melting properties of oligonucleotides with different AT and CG content. The increased affinity of LNA adenosine and LNA thymidine corresponds approximately to the normal affinity of DNA guanine and DNA cytosine. An overall substitution of all DNA-A and DNA-T with LNA-A and LNA-T results in melting properties that are nearly sequence independent and depend only on the length of the oligonucleotide. This may be important for design of oligonucleotide probes used in large multiplex analysis. The effect of LNA A and T substitutions has been evaluated by predicting the Tm value of all possible 9-mer oligonucleotides with different universal substitutions. The distribution of the roughly 262,000 predicted Tm values is very homogeneous for universally LNA A and T substituted oligonucleotides. The standard deviation of the melting temperature for all 9-mers drops from 7.7°C for pure DNA to only 2.2°C for LNA A and T substituted oligonucleotides. This equalizing effect may also be utilized for photomediated on-chip synthesis of oligonucleotides.
It is often difficult to design probes and primers with the same range of melting temperature due to the variance in A/T and G/C content of the probing sites. Highly A/T rich regions typically give lower Tm values. Furthermore, if single mismatches are to be resolved, G/T mismatches are known to contribute little to ΔTm. As discussed above, the use of LNA is a desirable way to solve problems related to multiplex use of primers and probes. LNA offers the possibility to adjust Tm and increase the ΔTm at the same time. LNA increases Tm by 4-8°C per substitution and increases ΔTm in many cases (Table 9).
Table 9. Demonstration of LNA controlled increase of Tm and ΔTm.
As LNA can be mixed with DNA during standard oligonucleotide synthesis, LNA can be placed at optimal positions in probes in order to adjust Tm. The specificity of PCR may also be enhanced by the use of LNA in primers or probes, and this facilitates a higher degree of multiplexity. By incorporation of LNA, the Tm of the primers or probes can be adjusted to work at the same temperature. Amplification or hybridization is more specific when LNA is included in primers or probes. This is due to the increase in ΔTm caused by LNA, which relates to higher specificity. Once the ΔTm of the primers or probes is high, more primers or probes can potentially be brought to work together.
Prediction of Tm
LNA can be used to enhance any experiment that is based on hybridization. The series of algorithms described herein has been developed to predict the optimal use of LNA. Melting properties of 129 different LNA-substituted capture probes described herein, hybridized to their corresponding DNA targets, were measured in solution using UV spectrophotometry. The data set was divided into a training set with 90 oligonucleotides and a test set with 39 oligonucleotides. The training set was used for training of both linear regression models and neural networks. Neural networks trained with nearest neighbour information, length, and DNA/LNA neighbour effect are efficient for prediction of Tm with the given set of data.
Applications of the Normalization of Thermal Stability by LNA A and T Nucleotide Substitutions
All assays in which DNA/RNA hybridization is conducted may benefit from the use of LNA in terms of increased specificity and quality. Exemplary uses include sequencing, primer extension assays, PCR amplification (such as multiplex PCR and allele-specific PCR amplification), molecular beacons (e.g., nucleic acids can be multiplexed with one colour based on multiple Tm's), TaqMan probes, in situ hybridization probes (e.g., chromosomal and bacterial 16S rRNA probes), capture probes to the mRNA poly-A tail, capture probes for microarray detection of SNPs, capture probes for expression microarrays (sensitivity increased 5-8 times), and capture probes for assessment of alternative mRNA splicing.
Example 15: Exemplary Methods for the Prediction of Melting Temperatures for Nucleic Acid Populations of the Invention
LNA units have different melting properties than DNA and RNA nucleotides. Until recently, thermodynamic models for melting temperature prediction existed for DNA and RNA only, not for LNA. A Tm prediction model for LNA/DNA mixed oligonucleotides has now been developed. The Tm prediction tool is available on-line at the Exiqon website (www.LNA-Tm.com and http://www.exiqon.com/Poster/Tmpred-ET-view.pdf).
Numerous applications in molecular biology are based on the ability of DNA and RNA to hybridize in a temperature-dependent manner (e.g., microarray techniques, PCR reactions, and blotting techniques). The melting properties of nucleic acid duplexes, in particular the melting temperature Tm, are crucial for optimal design of such experiments. Tm is usually computed using a two-state thermodynamic model (Breslauer, Meth. Enzymol., 259:221-242, 1995). Several different groups have estimated model parameters for nearest neighbors in the sequence based on experimental data (for a review see SantaLucia, Proc. Natl. Acad. Sci., 95:1460-1465, 1998).
The model described herein predicts the Tm of duplexes of mixed LNA/DNA oligonucleotides hybridized to their complementary DNA strands. DNA monomers are denoted with lowercase letters, and LNA monomers are denoted with uppercase letters; i.e., there are eight types of monomers in the mixed strand: a, c, g, t, A, C, G and T. The model is based on the formula (SantaLucia, 1998, supra; Allawi et al., Biochemistry 36:10581-10594, 1997).
\[ T_m = \frac{\Delta H}{\Delta S + R \ln(C - C_m/2) + 0.368\,(L-1)\,\ln[\mathrm{Na}^+]}, \]
in which the salt concentration [Na+] enters as an entropic correction together with the oligonucleotide concentrations. R is the gas constant, C and Cm are the concentrations of the two strands where C > Cm, and L is the length of the strands. For self-complementary sequences, C - Cm/2 is replaced by the total strand concentration CT, and a symmetry correction of -1.4 cal/(K·mol) is added to ΔS (SantaLucia, 1998, supra).
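For illustration, the formula above can be evaluated as in the following sketch. It is not the committee model of this example: ΔH and ΔS are simply passed in, and the values used are hypothetical. The sketch assumes ΔH in cal/mol, ΔS in cal/(K·mol), concentrations in mol/L, and a result in kelvin; the function and parameter names are likewise assumptions.

```python
import math

R_CAL = 1.987  # gas constant in cal/(K*mol)


def melting_temperature(delta_h, delta_s, c, c_m, length, na_molar):
    """Tm = dH / (dS + R*ln(C - Cm/2) + 0.368*(L - 1)*ln[Na+]).

    Assumed units: delta_h in cal/mol, delta_s in cal/(K*mol),
    concentrations in mol/L; the result is in kelvin.
    """
    if c < c_m:
        raise ValueError("C must be the concentration of the strand in excess")
    denominator = (
        delta_s
        + R_CAL * math.log(c - c_m / 2.0)
        + 0.368 * (length - 1) * math.log(na_molar)
    )
    return delta_h / denominator


# Hypothetical duplex parameters for a 9-mer; the strand concentrations are
# arbitrary example values.
tm_1m_salt = melting_temperature(-60000.0, -170.0, 2e-7, 1e-9, 9, 1.0)
tm_low_salt = melting_temperature(-60000.0, -170.0, 2e-7, 1e-9, 9, 0.1)
```

At 1 mol/L sodium the logarithm of the salt concentration is zero, so the salt term vanishes; at lower salt the term makes the denominator more negative and the computed Tm lower.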
The LNA model differs from SantaLucia's DNA model in the way the changes in enthalpy ΔH and entropy ΔS are calculated. As in SantaLucia's model, they depend on nearest neighbor sequence information and special contributions for the terminal base-pairs at the two ends of the duplex. However, with eight types of monomers (LNA and DNA), the increased number of nearest neighbor combinations requires more model parameters to be determined and hence more data.
Parameter Reduction
Usually ΔH and ΔS are calculated as a sum of contributions from all nearest neighbor pairs in the sequence. The inclusion of LNA doubles the number of monomer types and quadruples the number of possible nearest neighbor pairs. Parameter reduction strategies are used for matching the model complexity to limited data sets. A strategy for reducing model complexity is to sum ΔH from single base-pair contributions, which do not take the influence of adjacent nucleotides into account. However, nearest neighbor contributions are added as a correction term to the single base-pair contributions.
Another strategy is to use hierarchically reduced monomer alphabets. Here, similar monomers are identified with the same letter. A four-letter alphabet, {w,s,W,S}, defines classes according to binding strength: w={a,t}, s={c,g}, W={A,T} and S={C,G}. The smallest alphabet, {D,L}, simply identifies the monomer type: DNA or LNA. As an example, the sequence GcTAAcTt can be written as SsWWWsWw or as LDLLLDLD.
The principle is to split ΔH and ΔS into contributions that depend on different levels of detail of the sequence. The fine levels of detail require many parameters to be determined, while the coarse levels need fewer parameters. The more detailed contributions can then be treated as minor corrections, thus effectively reducing the total number of model parameters.
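The hierarchically reduced alphabets described above can be sketched as follows. The mapping tables are taken from the text; the function names are hypothetical.

```python
# Four-letter alphabet from the text: w={a,t}, s={c,g}, W={A,T}, S={C,G}.
STRENGTH_CLASS = {
    "a": "w", "t": "w", "c": "s", "g": "s",
    "A": "W", "T": "W", "C": "S", "G": "S",
}


def to_strength_alphabet(sequence):
    """Rewrite a mixed LNA/DNA sequence in the {w, s, W, S} alphabet."""
    return "".join(STRENGTH_CLASS[monomer] for monomer in sequence)


def to_type_alphabet(sequence):
    """Rewrite a sequence in the {D, L} alphabet (uppercase = LNA)."""
    return "".join("L" if STRENGTH_CLASS[m].isupper() else "D" for m in sequence)


def nearest_neighbor_pairs(sequence):
    """List the overlapping nearest neighbor pairs of a sequence."""
    return [sequence[i:i + 2] for i in range(len(sequence) - 1)]


example = "GcTAAcTt"
strength = to_strength_alphabet(example)
monomer_type = to_type_alphabet(example)
coarse_pairs = nearest_neighbor_pairs(monomer_type)
```

With eight monomer types there are 64 possible nearest neighbor pairs, whereas the four-letter alphabet has 16 and the two-letter alphabet has 4, which is why parameters fitted at the coarse levels are far fewer.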
Training
Model parameters were determined using data from melting experiments on hundreds of oligonucleotides. The oligonucleotides were random sequences with lengths between 8 and 20 and a percentage of LNA between 20 and 70. Melting curves were obtained using a Perkin-Elmer UV λ-40 spectrophotometer, but only the Tm values were used for modeling. Model parameters were adjusted using a gradient descent algorithm that minimizes the error function
\[ E = \sum_{\text{data set}} \left| T_m^{\text{pred}} - T_m^{\text{exp}} \right|^2, \]
i.e., the distance between predicted and experimental Tm values. Many different models were trained in this way and their performance was evaluated on test sets distinct from the training data. Seven reliable models were chosen and combined to form the committee model implemented at the Exiqon website (www.LNA-Tm.com).
Machine Learning And Thermodynamics
The aim of this work has been to estimate Tm values as accurately as possible. To this end, a machine learning approach has been adopted in which the prediction of the physical ΔH and ΔS quantities is less important. The parameters of this model may be inaccurate as thermodynamic quantities. First, the gradient descent algorithm produces a broad ensemble of models in which the ΔH and ΔS parameters can vary substantially while the accuracy of the predicted Tm is maintained. Second, the thermodynamic meaning of ΔH and ΔS is based on a two-state assumption, which may not be realistic in every case. Even short oligonucleotides can form different secondary structures or melt through multiple-state transitions (Tøstesen et al., J. Phys. Chem. B. 105:1618-1630, 2001). Third, the use of an optical instrument instead of a calorimetric instrument (DSC) introduces an error in the measured ΔH and ΔS. Nevertheless, the uncertain thermodynamic interpretation of the ΔH and ΔS model parameters does not imply that the Tm prediction model is unreliable.
Results
The Tm prediction model has been tested on two data sets that were not used during the training process. One set consisted of pure DNA oligonucleotides without LNA monomers and had a standard deviation of the residuals (SEP) of 1.57 degrees. The other set consisted of mixed oligonucleotides with both LNA and DNA and had a SEP of 5.25 degrees. The difference in prediction accuracy between the two types of oligonucleotides
suggests that Tm prediction of mixed strands is a more complex task than Tm prediction of pure DNA. This is possibly due to irregularities in the duplex helical structure induced by the LNA monomers (Nielsen et al., Bioconjug. Chem. 11:228-238, 2000). The obtained prediction accuracy is in both cases adequate for most biological applications. In conclusion, the reduced nearest neighbor model implemented at the Exiqon website (www.LNA-Tm.com) can predict Tm surprisingly well for both types of oligonucleotides. This indicates that the parameter reduction strategy is applicable to other types of modified oligonucleotides.
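The training step of this example (gradient descent on the summed squared difference between predicted and experimental Tm) can be illustrated with a deliberately simplified sketch. It does not reproduce the committee model: as a stand-in, Tm is assumed here to be a linear function of the number of w, s, W and S monomers in the probe, and the training data are made up from arbitrary weights. All names and values are hypothetical.

```python
def predict(weights, counts):
    """Stand-in model: Tm as a weighted sum of monomer-class counts."""
    return sum(w * n for w, n in zip(weights, counts))


def error(weights, data):
    """Sum over the data set of |Tm_pred - Tm_exp|**2."""
    return sum((predict(weights, counts) - tm) ** 2 for counts, tm in data)


def gradient_descent(data, steps=2000, rate=0.001):
    """Minimize error() by plain gradient descent, starting from zero weights."""
    weights = [0.0] * len(data[0][0])
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for counts, tm in data:
            residual = predict(weights, counts) - tm
            for j, n in enumerate(counts):
                gradient[j] += 2.0 * residual * n
        weights = [w - rate * g for w, g in zip(weights, gradient)]
    return weights


# Made-up training set: (counts of w, s, W, S monomers), "measured" Tm.
DATA = [
    ((3, 2, 2, 1), 34.0),
    ((1, 3, 2, 2), 42.0),
    ((2, 2, 1, 3), 42.0),
    ((2, 1, 3, 2), 42.0),
    ((4, 1, 1, 2), 34.0),
    ((1, 4, 2, 1), 38.0),
]

initial_error = error([0.0, 0.0, 0.0, 0.0], DATA)
fitted = gradient_descent(DATA)
final_error = error(fitted, DATA)
```

In practice the fitted model would be evaluated on a test set kept separate from the training data, as described above, and several independently trained models could be averaged to form a committee.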
Example 16A: Algorithm to Optimize the Substitution Pattern of Nucleic Acids of the Invention
High affinity nucleotides such as LNA and other nucleotides that are conformationally restricted to prefer the C3'-endo conformation or nucleotides with a modified backbone and/or nucleobase stabilize a double helix configuration. As these effects are generally additive, the most stable duplex between a high affinity capture oligonucleotide and an unmodified target oligonucleotide should generally arise when all nucleotides in the capture probe or primer are replaced by their high affinity analogue. The most stable duplex should thus be formed between a fully modified LNA capture probe and the corresponding DNA/RNA target molecule. Such a fully modified capture probe should be more efficient in capturing target molecules, and the resulting duplex is more thermally stable.
However, many high affinity nucleotides (e.g., LNA) have an even higher affinity for other high affinity nucleotides (e.g., LNA) than for DNA/RNA. A fully modified capture probe may thus form duplexes with itself or, if it is long enough, internal hairpins that are even more stable than duplexes with the desired target molecule. Probes with even a small inverse repeat segment in which all constituent positions are substituted with high affinity nucleotides may bind to themselves and be unable to bind the target. Thus, a sequence-dependent substitution pattern is desirably used to avoid substitutions in positions that may form self-complementary base-pairs.
For example, a computer algorithm can be used to automatically determine the optimal substitution pattern for any given capture probe sequence according to the following two criteria. First, the difference between the stability of (i) the duplex formed between the capture probe and the target molecule and (ii) the best possible duplex between two capture probes should be above a certain threshold. If this is not possible, then the substitution
pattern with the largest possible difference is chosen. Second, the capture probe should contain as many substitutions as possible in order to bind as much target as possible at any given temperature and to increase the thermal stability of the formed duplex. Alternatively, the second criterion is substituted with the following alternative criterion to obtain capture probes with similar thermal stability. The number and position of capture probe substitutions should be adjusted so that all the duplexes between capture probes and targets have a similar thermal stability (i.e., Tm equalization).
For such oligonucleotide capture probes, incomplete matches between target and capture probe are likely to be a reproducible feature of the recorded biosignatures. For short probes, the second criterion for increasing thermal stability is more desirable than the alternative second criterion for Tm equalization. For long capture probes and PCR primers, the alternative second criterion is desirably used since Tm equalization is desirable for these probes and primers.
An exemplary algorithm works as follows. For each nucleotide sequence of length n in an array, all possible substitution patterns, i.e., 2^n different sequences, are evaluated. Each evaluation consists of estimating the energetic stability of the duplex between the substituted capture sequence and a perfect match unmodified target ("target duplex") and the energetic stability of the most stable duplex that can be formed between two substituted capture probes themselves ("self duplex"). The energetic stability estimate for a duplex may be calculated, e.g., using a Smith-Waterman algorithm with the following scoring matrix.
Gap initiation penalty: -8
Gap continuation penalty: -50
    a   c   g   t   A   C   G   T
a  -2
c  -2  -2
g  -2   3  -2
t   2  -2   1  -2
A  -3  -3  -3   4  -3
C  -3  -3   6  -3  -3  -3
G  -3   6  -3   2  -3   9  -3
T   4  -3   2  -3   6  -3   3  -3
This scoring matrix was partly based on the best parameter fit to a large (over 1000) number of melting curves of different DNA and LNA containing duplexes and partly by visual scoring of test capture probe efficiency. If desired, this scoring matrix may be optimized by optimizing the parameter fit as well as increasing or optimizing the dataset used to obtain these parameters.
As an example of these calculations, the heptamer sequence ATGCAGA, in which each position can be either an LNA or a DNA nucleotide, is used. The target duplex formed between a fully modified capture probe with this sequence and its unmodified target receives a score of 34, as illustrated below.
Capture sequence: A-T-G-C-A-G-A
                  | | | | | | |
Target sequence:  t-a-c-g-t-c-t
Score:            4+4+6+6+4+6+4 = 34
The most stable self duplex that can be formed between two modified capture probes has an almost equivalent energetic stability with a score of 30 as illustrated below.
Capture sequence:   A-T-G-C-A-G-A
                      | | | |
Target sequence:  A-G-A-C-G-T-A
Score:                6+9+9+6 = 30
Thus, the capture probe efficiency of a fully modified probe is likely reduced by its propensity to form a stable duplex with itself. In contrast, by choosing a slightly different substitution pattern, ATGcaGA in which capital letters represent LNA nucleotides, the stability of the target duplex is reduced slightly from 34 to 29.
Capture sequence: A-T-G-c-a-G-A
                  | | | | | | |
Target sequence:  t-a-c-g-t-c-t
Score:            4+4+6+3+2+6+4 = 29
However, the most stable self complementary duplex that can be formed is reduced much more from 30 to 20, as illustrated below.
Capture sequence:   A-T-G-c-a-G-A
                      | | | |
Target sequence:  A-G-a-c-G-T-A
Score:                4+6+6+4 = 20
The difference between the stability of the desired target duplex and the undesired self duplex can be further increased by using the capture sequence AtgcaGA where the target duplex has a score of 24.
Capture sequence: A-t-g-c-a-G-A
                  | | | | | | |
Target sequence:  t-a-c-g-t-c-t
Score:            4+2+3+3+2+6+4 = 24
Whereas the score of the self duplex is only 10, as shown below.
Capture sequence:   A-t-g-c-a-G-A
                      | | | |
Target sequence:  A-G-a-c-g-t-A
Score:                2+3+3+2 = 10
The additional destabilization of the self duplex is generally not required if the difference in stability between the target duplex and self duplex is above a threshold of 25% of the target duplex stability, as illustrated below.
Discrimination for ATGCAGA = (34-30)/34 = 12% < threshold (25%)
Discrimination for ATGcaGA = (29-20)/29 = 31% > threshold (25%)
Discrimination for AtgcaGA = (24-10)/24 = 58% > threshold (25%)
Thus, ATGcaGA is the substitution pattern with the highest degree of substitution for which the target duplex is adequately more stable than the best self duplex (e.g., by more than 25%).
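The selection step can be sketched in Python as follows. The discrimination formula and the 25% threshold are taken from the text; the function names, and the tie-breaking by number of LNA positions, are illustrative assumptions. The scores are the ones quoted in the worked example.

```python
THRESHOLD = 0.25  # 25% of the target duplex stability, as in the text


def discrimination(target_score, self_score):
    """Relative stability gap between target duplex and best self duplex."""
    return (target_score - self_score) / target_score


def lna_count(pattern):
    """Number of LNA (uppercase) positions in a substitution pattern."""
    return sum(1 for base in pattern if base.isupper())


def pick_pattern(candidates, threshold=THRESHOLD):
    """Return the most highly substituted pattern that exceeds the threshold.

    candidates is a list of (pattern, target_score, self_score) tuples;
    None is returned when no pattern passes.
    """
    passing = [c for c in candidates if discrimination(c[1], c[2]) > threshold]
    return max(passing, key=lambda c: lna_count(c[0]), default=None)


# (pattern, target duplex score, self duplex score) from the worked example
candidates = [("ATGCAGA", 34, 30), ("ATGcaGA", 29, 20), ("AtgcaGA", 24, 10)]

for pattern, target, self_score in candidates:
    print(pattern, round(100 * discrimination(target, self_score)), "%")
print("selected:", pick_pattern(candidates)[0])
```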
This algorithm can be used to determine desirable substitution patterns for any size capture probe or any given probe sequence. The following simple design rules may also be applied for probe design, especially for short probes. The best self alignment for the corresponding DNA capture probe in the sequence is determined using a simple Smith-Waterman scoring matrix of:
      a    c    g    t
a    -2
c    -2   -2
g    -2    3   -2
t     2   -2    1   -2
Additionally, all possible positions in the sequence are substituted, except that substitution of both bases of a self-complementary base-pair is desirably avoided. The most stable self duplex thus does not contain any LNA:LNA base-pairs but only LNA:DNA base-pairs.
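A minimal Python sketch of this design rule is given below. It rests on several assumptions that are not in the text: gap penalties are not specified, so only ungapped local self alignments are considered; within each self-complementary base-pair the nucleotide nearer the 3' end is the one left as DNA (a choice made here because it reproduces the pattern ATGcaGA); and all function names are hypothetical.

```python
# Lower-triangular scoring matrix from the text (symmetric by construction).
MATRIX = {
    ("a", "a"): -2,
    ("c", "a"): -2, ("c", "c"): -2,
    ("g", "a"): -2, ("g", "c"): 3, ("g", "g"): -2,
    ("t", "a"): 2, ("t", "c"): -2, ("t", "g"): 1, ("t", "t"): -2,
}


def pair_score(x, y):
    """Look up the score for bases x and y (order does not matter)."""
    return MATRIX[(x, y)] if (x, y) in MATRIX else MATRIX[(y, x)]


def best_self_alignment(seq):
    """Best ungapped local self duplex of a DNA probe.

    Returns (score, pairs), where pairs lists the 0-based positions (i, j)
    that face each other. In an antiparallel self duplex, i + j is constant.
    """
    seq = seq.lower()
    n = len(seq)
    best_score, best_pairs = 0, []
    for total in range(2 * n - 1):
        run_score, run_pairs = 0, []
        for i in range(n):
            j = total - i
            if not 0 <= j < n:
                continue
            run_score += pair_score(seq[i], seq[j])
            run_pairs.append((i, j))
            if run_score <= 0:            # restart the local alignment
                run_score, run_pairs = 0, []
            elif run_score > best_score:
                best_score, best_pairs = run_score, list(run_pairs)
    return best_score, best_pairs


def substitution_pattern(seq):
    """Substitute every position with LNA (uppercase) except one partner of
    each base-pair in the best self alignment (assumed: the 3' partner)."""
    low = seq.lower()
    _, pairs = best_self_alignment(low)
    keep_dna = {max(i, j) for i, j in pairs if pair_score(low[i], low[j]) > 0}
    return "".join(b if k in keep_dna else b.upper() for k, b in enumerate(low))


print(best_self_alignment("ATGCAGA")[0], substitution_pattern("ATGCAGA"))
```

For the heptamer used above, the best self alignment under this matrix pairs positions 2 to 5 (tgca) with themselves, so positions 4 and 5 are left as DNA.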
Example 16B: Computer code for a preferred software program of the invention.
A. The oligod program takes a gene sequence as input and returns sequences for LNA-spiked oligonucleotides:
#!/usr/bin/perl -w
#$Revision: 1.18 $
#
#$Id: oligod,v 1.18 2002/07/29 06:26:35 nt Exp $
#NAME
#oligod - microarray oligod design
#
#SYNOPSIS
# oligod [-blastdb blastdb | -fastadb fastadb] fastaseq
#   [-length oligolength]
#   [-mindist minimumoligodistance]
#   [-maxhits n]
#   [-min_tm t]
#   [-max_tm t]
#   [-maxoligo n]
#   [-minscore n]
#   [-gene_ident_cutoff ident]
#   [-gene_len_cutoff len]
#   [-comp | -revcomp | -rev]
#   [-capitalize -cphase n -cfreq n]
#   [-verbose]
#   [-self_score]
#   [-param] parameterfile
#   [-conf] configurationfile
#   [-lna]
#   [-matrix] hybridisationmatrix
#
#DESCRIPTION
# 1. Blast against blastdb
# 2. Less than 17 consecutive matching nt's
# 3. Identity less than 60%
# 4. Palindrome = SmithWaterman against complement
# 5. Melting temperature (0.88xA/T) + (1.47xG/C) + (X)
#    [PICK70 script (DeRisi group)] (+X is by Niels Tolstrup)
# 6. Salt: [http://jsll.chem.wayne.edu/Hyther/hytherm2main.html]
#    a. Monovalent cation 1 mol/L
#    b. Mg2+ 0 mol/L
#    c. Hybridization temperature 37.0 degrees Celsius
#    d. Target 2e-7 mol/L
#    e. Primer 1e-9 mol/L
# 7. Number of primers to show.
# 8. Primer length.
#
# -maxhits n
#   n is the maximal number of alignments to show for each oligo
#
# -maxoligo n
#   n is the maximal number of oligos to suggest for each sequence
#
# -min_score n
#   n is the minimal score required for a hit to be included in the scoring of an oligo
#
# -mindist n
#   Oligos will not overlap if n = oligolength
#   0 and 1 allow for all oligos
#
# -gene_ident_cutoff ident
#   Mask hits with identity larger than or equal to ident.
#   Used together with gene_len_cutoff.
#   Default 98
#
# -gene_len_cutoff len
#   Mask hits with alignment length longer than or equal to len
#   Used together with gene_ident_cutoff.
#   Default 50
#
# -lna
#   If a LNA matrix is used for selfhybridisation this must be set.
#
# -matrix hybridisationmatrix
#   The format of the hybridisation matrix is that used by the fasta
#   package. LNA is represented by A=L C=I G=O and T=U in the matrix.
#
# -capitalize
# -cphase phase
# -cfreq freq
# -end_spike_len len
#   phase is 0 to freq - 1
#   freq is the frequency of LNA, default is 4
#
# -blastdb "db1 db2 db3"
#
# -sense reverse
#   Uses the reverse complement of the query, use with strand = bottom
#
# oligod reads the configuration file oligod.conf
#
#FILES
# blastall
# formatdb
# dan
# lynx
# ssearch
# dyp
#
#AUTHOR
# Copyright Niels Tolstrup 2001,2002
# tolstrup@exiqon.com
# Exiqon
#
use strict;
use Getopt::Long;
use FindBin qw($Bin);
use lib "$Bin/../lib";
use LWP::UserAgent;
use HTTP::Cookies;
my $glb;
$glb->{dir} = $Bin;
$glb->{conf} = "$glb->{dir}/oligod.conf";
$glb->{tmpdir} = "$glb->{dir}/tmp";
$glb->{program_name} = "$glb->{dir}/oligod";
$glb->{lynx_cookie} = "$glb->{dir}/lynx_cookies";
$glb->{dyp} = "$glb->{dir}/bin/dyp";
$glb->{bin} = "/usr/bin";
$glb->{formatdb} = "$glb->{bin}/formatdb";
$glb->{blastall} = "$glb->{bin}/blastall";
$glb->{ssearch} = "$glb->{bin}/ssearch";
$glb->{dan} = "$glb->{bin}/dan";
$glb->{lynx_env} = "LD_LIBRARY_PATH=/usr/local/lib ";
$glb->{lynx} = "$glb->{lynx_env}$glb->{bin}/lynx";
$glb->{tm_exiqon_url} = "http://armstrong/cgi-bin/tmpredict.cgi";
$glb->{tm_url} = 'http://lna-tm.com/';
$glb->{ua} = LWP::UserAgent->new;
$glb->{cookiefn} = "$glb->{dir}/lwpcookies.txt";
$glb->{ua}->cookie_jar(HTTP::Cookies->new(file => $glb->{cookiefn}, autosave => 1));
$glb->{version} = "0.9";
$glb->{verbose} = 0;
#$glb->{view} =
#  "BLAST:FASTA:SELF_EVIDENCE:PROBE:RUN_TIME:PARAMETER" .
#  ":SCORE_EVIDENCE";
#$glb->{view} = "BLAST:SELF_EVIDENCE:PROBE:RUN_TIME:SCORE_EVIDENCE";
$glb->{view} =
  "PARAMETER:FASTA:BLAST:SELF_EVIDENCE:PROBE:RUN_TIME" .
  ":SCORE_EVIDENCE:RUNTIME:TARGET_STRUCT";
#$glb->{view} = "";
######### XML I/O ###########
#use XML::Dumper;
#sub dump_parameter_xml ($$) {
#  my ($fn, $dat) = @_;
#  my $dump = new XML::Dumper;
#  my $xml = $dump->pl2xml($dat);
#  local *FILE;
#  open(FILE, "> $fn") || die "Could not open $fn $!";
#  print FILE "$xml\n";
#  close FILE;
#}
#
#sub read_parameter_xml ($) {
#  my ($fn) = @_;
#  my $xml = "";
#  local *FILE;
#  open(FILE, "< $fn") || die "Could not open $fn $!";
#  while (<FILE>) {
#    $xml .= $_;
#  }
#  my $dump = new XML::Dumper;
#  my $dat = $dump->xml2pl($xml);
#  close FILE;
#  $dat;
#}
######## PARAMETER I/O ##########
sub read_conf ($) {
  my ($glb) = @_;
  my $var;
  local *FILE;
  open(FILE, "$glb->{conf}") || die "Could not open $glb->{conf} ";
  while (<FILE>) {
    chomp;
    if (/^\s*#/) {
    } elsif (/^\s*$/) {
    } elsif (/^\s*(\S+)\s+([^#]+)\s*/) {
      if (defined $glb->{$1}) { $glb->{$1} = $2; }
    } else {
      print "$_ was not recognized\n";
    }
  }
  close FILE;
  foreach $var ("tmpdir", "lynx_cookie") {
    if ($glb->{$var} !~ /^\//) {
      $glb->{$var} = "$glb->{dir}/$glb->{$var}";
    }
  }
}
sub read_param ($$) {
  my ($param, $file) = @_;
  my $warning = '';
  my $tmp1;
  my $tmp2;
  while (<$file>) {
    if (/^\s*#/) {
    } elsif (/^\s*$/) {
    } elsif (/^\s*<PARAMETER>$/) {
    } elsif (/^\s*<\/PARAMETER>/) {
      last;
    } elsif (/^\s*oligod_param/i) {
      if (/^\s*oligod_param\s+(\S+)\s+(\S+)\s+(\S+)/i) {
        $tmp1 = $1;
        $tmp2 = $2;
        $tmp1 =~ tr/A-Z/a-z/;
        $tmp2 =~ tr/A-Z/a-z/;
        if (defined $param->{oligod_param}{$tmp1}) {
          if (defined $param->{oligod_param}{$tmp1}{$tmp2}) {
            $param->{oligod_param}{$tmp1}{$tmp2} = $3;
          } else {
            $warning .= "Did not recognize $tmp2 in $_";
          }
        } else {
          $warning .= "Did not recognize $tmp1 in $_";
        }
      } else {
        $warning .= "Not enough values fore oligod_param in $_";
      }
    } elsif (/^\s*blast_param/i) {
      if (/^\s*blast_param\s+(\S+)\s+(.+)/i) {
        $tmp1 = $1;
        $tmp1 =~ tr/A-Z/a-z/;
        if (defined $param->{blast_param}{$tmp1}) {
          $param->{blast_param}{$tmp1} = $2;
        } else {
          $warning .= "Did not recognize $tmp1 in $_";
        }
      } else {
        $warning .= "Not enough values fore blast_param in $_";
      }
    } elsif (/^\s*(\S+)\s+([^#]+)/) {
      $tmp1 = $1;
      $tmp1 =~ tr/A-Z/a-z/;
      if (defined $param->{$tmp1}) {
        $tmp2 = $2;
        $tmp2 =~ s/^(.*?)\s*$/$1/;
        $param->{$tmp1} = $tmp2;
      } else {
        $warning .= "Did not recognize $tmp1 in $_";
      }
    } else {
      $warning .= "Key value pair expected, line not recogniced $_";
    }
  }
  $warning;
}
sub update_derived_param ($) {
  my ($param) = @_;
  foreach (keys %{$param->{oligod_param}}) {
    $param->{oligod_param}{$_}{squash_factor} = squash_factor(
      $param->{oligod_param}{$_}{squash_dx},
      $param->{oligod_param}{$_}{squash_dy});
  }
}
sub initialize_param ($) {
  my ($popt) = @_;
  my $mat;
  # Parameter priority
  # 1. Commandline parameters.
  # 2. Parameters from parameter file.
  # 3. Default values.
  my $param = {
    "fastadb", "",
    "blastdb", "/home/nt/database/bf/h_sapiens",
    "oligo_length", 50,
    "oligo_sense", "direct",
    "cphase", 0,
    "cfreq", 3,
    "end_spike_len", 0,
    "mindist", 50,
    "maxhits", 8,
    "max_noligo", 10,
    "min_score", 0,   # 35 ~ ignore < 18 matches in oligo scoring
    "dnaconc", 2000,  # nMol
    "saltconc", 115,  # mMol
    "gene_len_cutoff", 50,
    "gene_ident_cutoff", 98,
    "blast_param", { "wordlen", 11, "strand", "both strands",
                     "expect", 50, "nproc", 2, "filter", "F" },
    "lna", 1 };
  $param->{oligod_param} = {
    "self_match",    { "weight", 1, "cutoff", 50,    "squash_dx", 10,   "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "self_hyp",      { "weight", 0, "cutoff", 50,    "squash_dx", 10,   "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "tm_min",        { "weight", 1, "cutoff", 10,    "squash_dx", 2,    "squash_dy", 0.9, "squash_factor", 1, "layer", 0, },
    "tm_max",        { "weight", 1, "cutoff", 100,   "squash_dx", 2,    "squash_dy", 0.9, "squash_factor", 1, "layer", 0, },
    "tm",            { "weight", 1, "cutoff", 1,     "squash_dx", 0.8,  "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "tm_dan_min",    { "weight", 1, "cutoff", 10,    "squash_dx", 2,    "squash_dy", 0.9, "squash_factor", 1, "layer", 0, },
    "tm_dan_max",    { "weight", 1, "cutoff", 100,   "squash_dx", 2,    "squash_dy", 0.9, "squash_factor", 1, "layer", 0, },
    "tm_dan",        { "weight", 1, "cutoff", 1,     "squash_dx", 0.8,  "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "hit_score",     { "weight", 0, "cutoff", 10000, "squash_dx", 5000, "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "target_struct", { "weight", 1, "cutoff", 30,    "squash_dx", 20,   "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "max_match",     { "weight", 1, "cutoff", 30,    "squash_dx", 5,    "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "max_stretch",   { "weight", 1, "cutoff", 20,    "squash_dx", 2,    "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "ggg",           { "weight", 0, "cutoff", 3,     "squash_dx", 1,    "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
    "cg",            { "weight", 0, "cutoff", 1,     "squash_dx", 1,    "squash_dy", 0.9, "squash_factor", 1, "layer", 1, },
  };
  if ($popt->{paramfn}) {
    local *FILE;
    open(FILE, "< $popt->{paramfn}") || die "Could not open $popt->{paramfn} $!";
    print read_param($param, \*FILE);
    close FILE;
  }
  foreach ("oligo_length", "oligo_sense", "mindist", "maxhits", "max_noligo",
           "cphase", "cfreq", "end_spike_len", "min_score", "lna", "matrix",
           "fastadb", "blastdb", "gene_ident_cutoff", "gene_len_cutoff") {
    if (defined $popt->{$_}) {
      $param->{$_} = $popt->{$_};
    }
  }
  if (defined $popt->{oligod_param_ggg_weight}) {
    $param->{oligod_param}{ggg}{weight} = $popt->{oligod_param_ggg_weight};
  }
  if (defined $popt->{oligod_param_cg_weight}) {
    $param->{oligod_param}{cg}{weight} = $popt->{oligod_param_cg_weight};
  }
  if (defined $popt->{oligod_param_tm_min}) {
    $param->{oligod_param}{tm_min}{cutoff} = $popt->{oligod_param_tm_min};
  }
  if (defined $popt->{oligod_param_tm_max}) {
    $param->{oligod_param}{tm_max}{cutoff} = $popt->{oligod_param_tm_max};
  }
  if (defined $popt->{oligod_param_tm_dan_min}) {
    $param->{oligod_param}{tm_dan_min}{cutoff} = $popt->{oligod_param_tm_dan_min};
  }
  if (defined $popt->{oligod_param_tm_dan_max}) {
    $param->{oligod_param}{tm_dan_max}{cutoff} = $popt->{oligod_param_tm_dan_max};
  }
  if ($param->{matrix}) {
    if (!-e $param->{matrix}) { die "Did not find $param->{matrix} $!"; }
  } else {
    if ($param->{lna}) {
      $param->{matrix} = "$glb->{dir}/lna.mat";
    } else {
      $param->{matrix} = "$glb->{dir}/dna.mat";
    }
    if (!-e $param->{matrix}) {
      if ($param->{lna}) {
        $mat = default_mat("lna");
      } else {
        $mat = default_mat("dna");
      }
      write_file($mat, $param->{matrix});
    }
  }
  update_derived_param($param);
  $param;
}
sub parse_argv ($) {
  my ($glb) = @_;
  my $opt;
  my $popt;
  GetOptions(
    "param:s" => \$popt->{paramfn},
    "conf:s" => \$glb->{conf},
    "fastadb:s" => \$popt->{fastadb},
    "blastdb:s" => \$popt->{blastdb},
    "length:i" => \$popt->{oligo_length},
    "sense:s" => \$popt->{oligo_sense},
    "mindist:i" => \$popt->{mindist},
    "cphase:i" => \$popt->{cphase},
    "cfreq:i" => \$popt->{cfreq},
    "end_spike_len:i" => \$popt->{end_spike_len},
    "min_tm_dan:i" => \$popt->{oligod_param_tm_dan_min},
    "max_tm_dan:i" => \$popt->{oligod_param_tm_dan_max},
    "min_tm:i" => \$popt->{oligod_param_tm_min},
    "max_tm:i" => \$popt->{oligod_param_tm_max},
    "min_score:i" => \$popt->{min_score},
    "maxhits:i" => \$popt->{maxhits},
    "maxoligo:i" => \$popt->{max_noligo},
    "lna:i" => \$popt->{lna},
    "gene_ident_cutoff:i" => \$popt->{gene_ident_cutoff},
    "matrix:s" => \$popt->{matrix},
    "rev" => \$opt->{rev},
    "comp" => \$opt->{comp},
    "revcomp" => \$opt->{revcomp},
    "capitalize" => \$opt->{capitalize},
    "verbose+" => \$glb->{verbose},
    "view:s" => \$glb->{view},
    "" => \$opt->{stdio}
  );
  if ($opt->{stdio}) { push @ARGV, "-"; }
  if ($#ARGV == -1) { usage($glb, "No sequence file"); }
  ($opt, $popt);
}
sub dump_param ($$) {
  my ($glb, $param) = @_;
  my $key;
  my $property;
  my $value;
  my $str = "<PARAMETER>\n";
  foreach $key (sort keys %$param) {
    if ($key eq "oligod_param") {
      foreach $property (sort keys %{$param->{$key}}) {
        foreach $value (sort keys %{$param->{$key}{$property}}) {
          $str .= sprintf("%-16s %-16s %-16s %g\n", $key, $property, $value,
                          $param->{$key}{$property}{$value});
        }
      }
    } elsif ($key eq "blast_param") {
      foreach $property (sort keys %{$param->{$key}}) {
        $str .= sprintf("%-16s %-16s %s\n", $key, $property,
                        $param->{$key}{$property});
      }
    } else {
      $str .= sprintf("%-16s %s\n", $key, $param->{$key});
    }
  }
  $str .= "</PARAMETER>\n";
  $str;
}
######## SEQUENCE I/O #########
sub print_fasta ($$) {
  my ($seq, $filept) = @_;
  my $sq = $seq->{seq};
  my $str;
  print $filept ">$seq->{name}";
  if (defined $seq->{comment}) {
    $str = substr($seq->{comment}, 0, 55);
    print $filept " $str";
  }
  print $filept "\n";
  if (defined $sq) {
    $sq =~ tr/A-Za-z//cd;
    while ($sq =~ /(.{60})/gc) {
      print $filept "$1\n";
    }
    if ($sq =~ /(.+)/gc) { print $filept "$1\n"; }
  } else {
    print $filept "\n";
  }
}
sub read_fasta ($$) {
  my ($lastline, $filept) = @_;
  my $seq;
  if (!$$lastline) {
    while (<$filept>) {
      if (/^>/) { $$lastline = $_; last; }
    }
  }
  if ($$lastline) {
    if ($$lastline =~ />(\S+)\s*(\S*)/) {
      $seq->{name} = $1;
      $seq->{comment} = $2;
      $seq->{seq} = "";
      $$lastline = "";
      while (<$filept>) {
        if (/^>/) {
          $$lastline = $_;
          $seq->{seq} =~ tr/a-zA-Z//cd;
          return $seq;
        }
        $seq->{seq} .= $_;
      }
    } else {
      $seq->{name} = "";
      $seq->{comment} = "";
      $seq->{seq} = "";
    }
  }
  if ($seq) {
    $seq->{seq} =~ tr/a-zA-Z//cd;
  }
  $seq;
}
sub print_fasta_fn ($$) {
  my ($seq, $fn) = @_;
  local *FILE;
  open(FILE, ">$fn") || die "Could not open $fn $!";
  print_fasta($seq, \*FILE);
  close FILE;
}
sub write_file ($$) {
  my ($str, $fn) = @_;
  local *FILE;
  open(FILE, ">$fn") || die "Could not open $fn $!";
  print FILE $str;
  close FILE;
}
sub read_file ($) {
  my ($fn) = @_;
  local *FILE;
  my $str = "";
  open(FILE, "<$fn") || die "Could not open $fn $!";
  while (<FILE>) {
    $str .= $_;
  }
  close FILE;
  $str;
}
########### BLAST ##################
sub formatblastdb ($$$) {
  my ($glb, $dbfn, $type) = @_;
  my $db = $dbfn;
  my $size;
  my $mtime;
  my $prev_size = 0;
  my $prev_mtime = 0;
  local *FILE;
  my $d;
  $db =~ s/(.*)\.([^.]*)$/$1/;   # Remove suffix
  $db =~ s/.*\/([^\/]*)$/$1/;    # Remove path
  print "Do I need to format $db?\n";
  ($size, $mtime) = (stat $dbfn)[7, 9];   # fields 7 and 9 of stat are size and mtime
  if (-e "$glb->{tmpdir}/$db.stat") {
    open(FILE, "<$glb->{tmpdir}/$db.stat") || die "Could not open $glb->{tmpdir}/$db.stat $!";
    while (<FILE>) {
      if (/size=\s*(\d+).*mtime=\s*(\d+)/) { $prev_size = $1; $prev_mtime = $2; }
    }
    close FILE;
  }
  if (!-e "$glb->{tmpdir}/$db" || $prev_size != $size || $prev_mtime < $mtime) {
    print " YES\n";
    $d = "$glb->{tmpdir}/$db";
    if (-l $d) { unlink $d; }
    if ($dbfn =~ /^\//) {
      $d = $dbfn;
    } else {
      my $cwd;
      chomp($cwd = `pwd`);
      $d = "$cwd/$dbfn";
    }
    print `ln -s $d $glb->{tmpdir}/$db`;
    print `cd $glb->{tmpdir}; $glb->{formatdb} -t $db -i $db -p $type`;
    print `echo "size= $size mtime= $mtime" > $glb->{tmpdir}/$db.stat`;
  } else {
    print " No, The database is ok.\n";
  }
  $db = "$glb->{tmpdir}/$db";
  $db;
}
sub blast ($$$$$) {
  my ($glb, $queryseq, $dbfn, $method, $parameters) = @_;
  my $queryfn = "$glb->{tmpdir}/$$.fasta";
  local *FILE;
  local *PIPE;
  my $result = "";
  open(FILE, ">$queryfn") || die "Could not open $queryfn $!";
  print_fasta($queryseq, \*FILE);
  close FILE;
  #print "$glb->{blastall} $parameters -p $method -d $dbfn -i $queryfn\n";
  open(PIPE, qq{$glb->{blastall} $parameters -p $method -d "$dbfn" -i $queryfn |})
    || die "Failed to run $glb->{blastall}";
  while (<PIPE>) {
    $result .= $_;
  }
  close PIPE;
  unlink $queryfn;
  $result;
}
sub parse_blast ($) {
  my ($blastout) = @_;
  my $query = q{Score =\s*(\d+).*?} .
    q{Expect =\s*([\deE\-.]+).*?} .
    q{\((\d+)%\).*?} .
    q{ Strand = (Plus|Minus) \\/ (Plus|Minus).*?} .
    q{((Query:\s*(\d+)\s+)([^ ]+)[^>%]*} .
    q{ (\d+)[^S]+Sbjct[^>%]*)\n\n[^>S]*};
  my %result = ();
  my $expect;
  my $name;
  my $score;
  my $ident;
  my $qstrand;
  my $tstrand;
  my $qstart;
  my $qend;
  my $tstart;
  my $tend;
  my $alignment;
  my $blastpos1;
  my $blastpos2;
  my $l;
  my $sl;
  my $b;
  my $queryseq = "";
  my $matchseq = "";
  my $targetseq = "";
  my $target;
  my $target_no = 0;
  my $qlen = 0;
  if ($blastout =~ /^Query=\s*(\S+).*?\((\d+)\s+letters\)/sm) {
    $result{query_name} = $1;
    $result{qlen} = $2;
  } else {
    $result{query_name} = "-";
    $result{qlen} = 0;
  }
  $result{hits} = [];
  while ($blastout =~ /(^>(\S+)[^>]*)/smgc) {
    $target = $1;
    $name = $2;
    $target_no++;
    $blastpos1 = pos;
    while ($target =~ /$query/smgc) {
      #print "$2, $3, $4\n";
      $tstart = '';
      $score = $1;
      $expect = $2;
      $ident = $3;
      $qstrand = $4;
      $tstrand = $5;
      $alignment = $6;
      $l = $7;
      $qstart = $8;
      $b = $9;
      $qend = $10;
      $blastpos2 = pos;
      $queryseq = "";
      $matchseq = "";
      $targetseq = "";
      $l = length($l);
      $sl = $l - length("Query:");
      $b = length($b);
      while ($alignment =~ /^Query:.{$sl}(.{1,$b})\ \d+\n.{$l}
                            (.{1,$b})\nSbjct:(.{$sl})(.{1,$b})\ (\d+)\n/mgcx) {
        $queryseq .= $1;
        $matchseq .= $2;
        $targetseq .= $4;
        if ($tstart eq '') {
          $tstart = $3;
        }
        $tend = $5;
      }
      #print "\n1: $queryseq\n2: $matchseq\n3: $targetseq\n";
      $expect =~ s/^e/1e/;
      $qlen = $qend - $qstart + 1;
      push @{$result{hits}}, {
        "tname", $name, "score", $score, "expect", $expect, "ident", $ident,
        "qstrand", $qstrand, "tstrand", $tstrand, "target_no", $target_no,
        "qstart", $qstart, "qend", $qend, "qlen", $qlen,
        "tstart", $tstart, "tend", $tend, "self", 0,
        "queryseq", $queryseq, "matchseq", $matchseq, "targetseq", $targetseq };
      pos = $blastpos2;
    }
    pos = $blastpos1;
  }
  %result;
}
##### perceptron #####
sub squash_factor ($$) {
  # This factor determines how steep the squashing function operates
  # a change of $x from 0 gives a $y response if used with $k
  my ($x, $y) = @_;
  my $k = - log(1/$y - 1) / $x;
  $k;
}
sub squash ($) {
  my ($x) = @_;
  my $y = 1 / (1 + exp(-$x));
  $y;
}
sub dot ($$) {
  my ($a, $b) = @_;
  my $sum = 0;
  my $i;
  if ($#$a != $#$b) {
    die "dot: a ($#$a) and b ($#$b) should have the same number of elements";
  }
  for ($i = 0; $i <= $#$a; $i++) {
    #if (!defined $a->[$i]) { die "OHOH a $i was not defined"; }
    #if (!defined $b->[$i]) { die "OHOH b $i was not defined"; }
    $sum += $a->[$i] * $b->[$i];
  }
  $sum;
}
sub sum ($) {
  my ($a) = @_;
  my $sum = 0;
  my $i;
  for ($i = 0; $i <= $#$a; $i++) { $sum += $a->[$i]; }
  $sum;
}
sub perceptron ($$) {
  my ($weight, $input) = @_;
  my $sum = sum($weight);
  #my $output = squash(dot($weight, $input));
  my $output = dot($weight, $input) / $sum;
  $output;
}
######## SYSTEM ################
sub openfile ($) {
  my ($fn) = @_;
  local *FILE;
  open(FILE, $fn) || die "Could not open $fn $!\n";
  *FILE;
}
sub makedir ($) {
  my ($dir) = @_;
  my @dir = split(/\//, $dir);
  my $d = '';
  foreach (@dir) {
    $d .= $_;
    if ($d && !-e $d) { mkdir $d; }
    $d .= "/";
  }
}
sub usage ($$) {
  my ($glb, $message) = @_;
  my $file = openfile($glb->{program_name});
  $message .= "\n";
  while (<$file>) {
    if (/^#(.*)/) { $message .= "$1\n"; } else { last; }
  }
  close $file;
  die "$message";
}
sub min (@) {
  my $min = 99999999999;
  foreach (@_) { if ($_ < $min) { $min = $_; } }
  $min;
}
sub max (@) {
  my $max = -99999999999;
  foreach (@_) { if ($_ > $max) { $max = $_; } }
  $max;
}
sub up ($) {
  my ($str) = @_;
  $str =~ tr/a-z/A-Z/;
  $str;
}
########## SEQUENCE FUNCTIONS ############
sub comp ($) {
  my ($seq) = @_;
  $seq =~ tr/acgtumrwsykvhdbxnACGTUMRWSYKVHDBXN/tgcaakywsrmbdhvxxTGCAAKYWSRMBDHVXX/;
  $seq;
}
sub rev ($) {
  my ($seq) = @_;
  $seq = reverse(split("", $seq));
  $seq;
}
sub revcomp ($) {
  my ($seq) = @_;
  $seq = rev($seq);
  $seq = comp($seq);
  $seq;
}
sub capitalize ($$$$) {
  my ($seq, $phase, $freq, $end_spike_len) = @_;
  my $pat = "." x $phase;
  my $n = $freq - $phase - 1;
  $seq =~ tr/A-Z/a-z/;
  if ($freq != 0) {
    if ($phase < 0 || $phase >= $freq) { die "phase ($phase) is out of range\n"; }
    if ($freq < 1) { die "freq ($freq) is out of range\n"; }
    $seq =~ s/($pat)(.)(.{0,$n})/$1 . up($2) . $3/eg;
  }
  if ($end_spike_len != 0) {
    $seq =~ s/(.{0,$end_spike_len})(.*)(.{$end_spike_len})/up($1) . $2 . up($3)/eg;
  }
  $seq;
}
######### MELTING TEMPERATURE ###################
sub melt_temp ($) {
  my ($seq) = @_;
  my $temp;
  my $n_at;
  my $n_gc;
  my $n_ot;
  $seq =~ tr/a-zA-Z//cd;
  $_ = $seq;
  $n_at = tr/aAtT//d;
  $_ = $seq;
  $n_gc = tr/gGcC//d;
  $_ = $seq;
  $n_ot = tr/aAtTgGcC//cd;
  $temp = (0.88 * $n_at) + (1.47 * $n_gc) + $n_ot;
  ($n_at, $n_gc, $n_ot, $temp);
}
sub tm_pred ($) {
  my ($seq) = @_;
  my $temp = 0;
  my $conf = 0;
  if (1) {
    $seq =~ tr/acgtACGT//cd;
    #Oligoconc is the sum of target and probe concentrations molar
    my $output = `$glb->{lynx} -cookie_file=$glb->{lynx_cookie} -source $glb->{tm_exiqon_url}?sequences='$seq'&saltconc='0.115'&oligoconc='0.000002'`;
    print "$glb->{lynx} -cookie_file=$glb->{lynx_cookie} -source $glb->{tm_exiqon_url}?sequences='$seq'\n";
    print "$output\n";
    if ($output =~ /<td>([\d.]+)<\/td>\s+<td>([\d.]+)<\/td><\/tr>/s) {
      $temp = $1;
      $conf = $2;
    }
  }
  ($conf, $temp);
}
sub tm ($$@) {
  my ($glb, $param, @seq) = @_;
  my $temp = 99999;
  my $conf = 0;
  my $comand;
  my $seqfn = "$glb->{tmpdir}/${$}tm.seq";
  my $resultfn = "$glb->{tmpdir}/${$}tm.dat";
  local *FILE;
  local *OLDERR;
  my $score_min;
  my $score_max;
  my $score;
  my $wsum;
  my $req;
  my $res;
  my $output;
  my @result;
  my $seq;
  my $content = '';
  foreach $seq (@seq) {
    $seq =~ tr/acgtACGT//cd;
    if ($content) {
      $content .= "\n$seq";
    } else {
      $content = "sequences=$seq";
    }
  }
  #$seq = $seq[0];
  $req = HTTP::Request->new(POST => $glb->{tm_url});
  $req->content_type('application/x-www-form-urlencoded');
  #$req->content('tmuser=oligod&email=tolstrup@exiqon.com');
  #$req->content("sequences=$seq");
  $req->content($content);
  $res = $glb->{ua}->request($req);
  #print "$seq\n";
  # $output = $res->content;
  #print "$output\n";
  # check the outcome
  if ($res->is_success) {
    $output = $res->content;
  } else {
    print "Unsuspected Tm result:" . $res->status_line . "\n";
  }
  #my $output =
  #  `$glb->{lynx} -cookie_file=$glb->{lynx_cookie} -source $glb->{tm_exiqon_url}?sequences='$seq'`;
  # print "#### THE OUTPUT WAS #####\n$output\n";
  if (defined $output) {
    while ($output =~ /<td>([\d.]+)<\/td>\s+<td>([\d.]+)<\/td><\/tr>/gis) {
      $temp = $2;
      $score_min = squash($param->{oligod_param}{tm_min}{squash_factor} *
        ($temp - $param->{oligod_param}{tm_min}{cutoff}));
      $score_max = squash($param->{oligod_param}{tm_max}{squash_factor} *
        ($param->{oligod_param}{tm_max}{cutoff} - $temp));
      $wsum = $score_min * $param->{oligod_param}{tm_min}{weight} +
              $score_max * $param->{oligod_param}{tm_max}{weight};
      $score = squash($param->{oligod_param}{tm}{squash_factor} *
        ($wsum - $param->{oligod_param}{tm}{cutoff}));
      push @result, { "tm", { "raw", $wsum, "score", $score },
                      "tm_min", { "raw", $temp, "score", $score_min },
                      "tm_max", { "raw", $temp, "score", $score_max } };
    }
  } else {
    print STDERR "Sorry, the tm prediction is not available.";
    foreach (@seq) {
      push @result, { "tm", { "raw", 0, "score", 0 },
                      "tm_min", { "raw", 0, "score", 0 },
                      "tm_max", { "raw", 0, "score", 0 } };
    }
  }
  \@result;
}
sub tm_dan ($$$) {
  my ($glb, $param, $seq) = @_;
  my $temp = 9;
  my $comand;
  my $seqfn = "$glb->{tmpdir}/${$}dan.seq";
  my $resultfn = "$glb->{tmpdir}/${$}dan.dat";
  local *FILE;
  local *OLDERR;
  $seq =~ tr/acgtACGT//cd;
  my $windowsize = length($seq);
  my $score_min;
  my $score_max;
  my $score;
  my $wsum;
  write_file($seq, $seqfn);
  $comand = "$glb->{dan} -sequence $seqfn -windowsize " .
    "$windowsize -shiftincrement 1 " .
    "-dnaconc $param->{dnaconc} -saltconc $param->{saltconc} " .
    "-outfile $resultfn";
  open(OLDERR, ">&STDERR");
  open(STDERR, '>', "/dev/null");
  my $output = `$comand`;
  close(STDERR);
  open(STDERR, ">&OLDERR");
  open(FILE, "<$resultfn") || die "Could not open $resultfn";
  while (<FILE>) {
    if (/Tm=([\d.]+)/) {
      $temp = $1;
    } elsif (/^\s+\d+\s+\d+\s+([0-9.]+)/) {
      $temp = $1;
    }
  }
  close FILE;
  unlink $seqfn;
  unlink $resultfn;
  $score_min = squash($param->{oligod_param}{tm_dan_min}{squash_factor} *
    ($temp - $param->{oligod_param}{tm_dan_min}{cutoff}));
  $score_max = squash($param->{oligod_param}{tm_dan_max}{squash_factor} *
    ($param->{oligod_param}{tm_dan_max}{cutoff} - $temp));
  $wsum = $score_min * $param->{oligod_param}{tm_dan_min}{weight} +
          $score_max * $param->{oligod_param}{tm_dan_max}{weight};
  $score = squash($param->{oligod_param}{tm_dan}{squash_factor} *
    ($wsum - $param->{oligod_param}{tm_dan}{cutoff}));
  ({ "raw", $wsum, "score", $score },
   { "raw", $temp, "score", $score_min },
   { "raw", $temp, "score", $score_max });
}
############ SELF HYBRIDISATION #################
sub default_mat ($) {
  my ($mat_name) = @_;
  my $mat;
  if ($mat_name eq "lna") {
    $mat = ";D LNA hybridisation scoring matrix
1 45 80 5 6 80 4 -8 -50 0 12
ACGTRYMWSKDHVBNLIOU
0 1 2 3 0 1 0 0 1 2 0 0 0 1 0 0 1 2 3
-4
-4 -4
-4 5 -4
3 -4 -4 -4
1 -1 1 -1 1
-1 1 -1 1 -4 1
1 1 -2 -2 -1 -1 1
1 -2 -2 1 -1 -1 -1 1
-2 1 1 -2 -1 -1 -1 -1 1
-2 -2 1 1 -1 -1 -1 -1 -1 1
1 -2 1 1 1 -1 -1 1 -1 1 1
1 1 -2 1 -1 1 1 1 -1 -1 -1 1
1 1 1 -2 1 -1 1 -1 1 -1 -1 -1 1
-2 1 1 1 -1 1 -1 -1 1 1 -1 -1 -1 1
1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1
-4 -4 -4 6 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-4 -4 6 -4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-4 6 -4 -4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 8 -1
6 -4 -4 -4 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 8 -1 -1
";
  } elsif ($mat_name eq "dna") {
    $mat = ";D hybridisation scoring matrix
1 45 80 5 6 80 4 -8 -50 0 12
ACGTRYMWSKDHVBN
0 1 2 3 0 1 0 0 1 2 0 0 0 1 0
-4
-4 -4
-4 5 -4
3 -4 -4 -4
1 -1 1 -1 1
-1 1 -1 1 -4 1
1 1 -2 -2 -1 -1 1
1 -2 -2 1 -1 -1 -1 1
-2 1 1 -2 -1 -1 -1 -1 1
-2 -2 1 1 -1 -1 -1 -1 -1 1
1 -2 1 1 1 -1 -1 1 -1 1 1
1 1 -2 1 -1 1 1 1 -1 -1 -1 1
1 1 1 -2 1 -1 1 -1 1 -1 -1 -1 1
-2 1 1 1 -1 1 -1 -1 1 1 -1 -1 -1 1
1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1
";
  } else {
    die "Ups $mat_name was not recogniced try lna or dna";
  }
  $mat;
}
sub self_match ($$$) {
  my ($glb, $param, $seq) = @_;
  local *PIPE;
  my $seqfn = "$glb->{tmpdir}/${$}seq.fasta";
  my $compseqfn = "$glb->{tmpdir}/${$}compseq.fasta";
  my $f_seq;
  my $score = 0;
  my $norm_score = 0;
  my $alignment = "";
  local *OLDERR;
  my $evidence = "";
  my $ssparam = "";
  my $str;
  if ($param->{matrix}) {
    $ssparam = " -s $param->{matrix} ";
  }
  open(OLDERR, ">&STDERR");
  open(STDERR, ">/dev/null") || die "Can't open /dev/null $!";
  $f_seq->{seq} = $seq;
  $f_seq->{name} = "seq";
  if ($param->{lna}) {
    $f_seq->{seq} =~ tr/ACGT/LIOU/;
  }
  print_fasta_fn($f_seq, $seqfn);
  $f_seq->{seq} = rev($seq);
  $f_seq->{name} = "rev";
  if ($param->{lna}) {
    $f_seq->{seq} =~ tr/ACGT/LIOU/;
  }
  print_fasta_fn($f_seq, $compseqfn);
  if ($glb->{verbose} > 1) { print "<SSEARCH>"; }
  #print "$glb->{ssearch} $ssparam $seqfn $compseqfn\n";
  open(PIPE, "$glb->{ssearch} $ssparam $seqfn $compseqfn |") || die "Failed to open pipe $!";
  while (<PIPE>) {
    if (/Smith-Waterman score:\s*(\d+);/) {
      $score = $1;
    }
    if ($glb->{verbose} > 1) { print; }
    if (/^[ .:\t]+$/) { tr/.:/: /; }
    if ($param->{lna}) {
      if (/^(\S+)(.*)/) {
        $alignment .= $1;
        $str = $2;
        $str =~ tr/ACGTLIOU/acgtACGT/;
        $alignment .= "$str\n";
      } else {
        $alignment .= $_;
      }
    } else {
      $alignment .= $_;
    }
  }
  close PIPE;
  if ($glb->{verbose} > 1) { print "</SSEARCH>\n"; }
  if ($alignment =~ /(Smith-Waterman score.*)Library sca/s) {
    $evidence = $1;
    if ($glb->{verbose} > 1) {
      print "<SELF_EVIDENCE>\n";
      print "$evidence";
      print "</SELF_EVIDENCE>\n";
    }
  }
  close(STDERR);
  open(STDERR, ">&OLDERR");
  unlink $seqfn;
  unlink $compseqfn;
  $norm_score = squash($param->{oligod_param}{self_match}{squash_factor} *
    ($param->{oligod_param}{self_match}{cutoff} - $score));
  { "score", $norm_score, "raw", $score, "evidence", $evidence };
}
sub self_hyp ($$@) {
  my ($glb, $param, @seqs) = @_;
  local *PIPE;
  my $score = 0;
  my $norm_score = 0;
  my $evidence = '';
  my $seqfn = "$glb->{tmpdir}/${$}seq_d.fasta";
  local *FILE;
  my $seq;
  my $i = 0;
  my @result = ();
  my $dyp_opt = '-min_score 0 -max_hyp 1';
  my $read;
  open(FILE, "> $seqfn") || die "Could not open $seqfn";
  foreach (@seqs) {
    $seq->{seq} = $_;
    $seq->{name} = sprintf("seq_%03d", $i);
    $i++;
    print_fasta($seq, \*FILE);
  }
  close FILE;
  if ($glb->{verbose} > 1) { print "<DYP>"; }
  #print "$glb->{dyp} -f $seqfn $dyp_opt\n";
  open(PIPE, "$glb->{dyp} -f $seqfn $dyp_opt|") || die "Failed to open pipe $!";
  $read = 1;
  while (<PIPE>) {
    if ($glb->{verbose} > 1) { print; }
    if (/>(\S*)/) {
    } elsif (/Score=\s*(\d+)/) {
      if ($evidence) {
        $norm_score = squash($param->{oligod_param}{self_hyp}{squash_factor} *
          ($param->{oligod_param}{self_hyp}{cutoff} - $score));
        my $evidence_str = "$evidence";
        push @result, { "score", $norm_score, "raw", $score, "evidence", $evidence_str };
        $evidence = '';
      }
      $score = $1;
      $read = 1;
    } elsif (/Mask:/) {
      $read = 0;
    } elsif ($read) {
      $evidence .= $_;
    }
  }
  close PIPE;
  $norm_score = squash($param->{oligod_param}{self_hyp}{squash_factor} *
    ($param->{oligod_param}{self_hyp}{cutoff} - $score));
  push @result, { "score", $norm_score, "raw", $score, "evidence", $evidence };
  if ($glb->{verbose} > 1) { print "</DYP>\n"; }
  if ($glb->{verbose} > 1) {
    print "<SELF_EVIDENCE>\n";
    print "$evidence";
    print "</SELF_EVIDENCE>\n";
  }
  @result;
}
sub target_struct ($$$) {
  my ($glb, $param, $seq) = @_;
  local *PIPE;
  my $score = 0;
  my $evidence = '';
  my $seqfn = "$glb->{tmpdir}/${$}seq_t.fasta";
  my @result = ();
  my $min_score = 15;
  my $dyp_opt = "-depth 100 -min_score $min_score";
  my $read_mask = 0;
  my $mask;
  print_fasta_fn($seq, $seqfn);
  #print "$glb->{dyp} -f $seqfn $dyp_opt";
  open(PIPE, "$glb->{dyp} -f $seqfn $dyp_opt|") || die "Failed to open pipe $!";
  while (<PIPE>) {
    print;
    if ($glb->{verbose} > 1) { print; }
    if (/>(\S*)/) {
      $mask = '';
    } elsif (/Score=\s*(\d+)/) {
      $score = $1;
    } elsif (/Mask:/) {
      $read_mask = 1;
    } elsif ($read_mask) {
      $mask .= $_;
    } else {
      $evidence .= $_;
    }
  }
  close PIPE;
  print "<TARGET_STRUCT>\n";
  print $evidence;
  if ($mask) {
    print $mask;
  } else {
    print "No significant secondary structure identified.";
  }
  print "</TARGET_STRUCT>\n";
  $evidence =~ s/\n//g;
  $mask =~ s/\n//g;
  if ($score < $min_score) { $mask = ""; }
  $mask;
}
######## PATTERN FILTERS #########
sub target_struct_score ($$$$) {
  my ($mask, $start, $end, $param) = @_;
  my $score = 1;
  my $count = 0;
  $_ = "";
  if ($mask) {
    my $len = $end - $start;
    $_ = substr($mask, $start, $len);
    $count = tr/-/#/;
    my $match = int($count / $len * 100);
    $score = squash($param->{oligod_param}{target_struct}{squash_factor} *
      ($param->{oligod_param}{target_struct}{cutoff} - $match));
  }
  { "score", $score, "raw", $count, "evidence", $_ };
}
sub cg_score ($) {
  my ($seq) = @_;
  if ($seq =~ /cg/i) {
    return { "score", 0, "raw", 1 };
  }
  { "score", 1, "raw", 0 };
}
sub ggg_score ($$) {
  my ($param, $seq) = @_;
  my $maxg = 0;
  my $ng;
  my $score;
  while ($seq =~ /(g+)/gi) {
    $ng = length($1);
    if ($ng > $maxg) { $maxg = $ng; }
  }
  $score = squash($param->{oligod_param}{ggg}{squash_factor} *
    ($param->{oligod_param}{ggg}{cutoff} - $maxg));
  { "score", $score, "raw", $maxg };
}
############ PROBE ############
sub oligo_property_score($$) {
    my ($param, $oligo_property) = @_;
    my $oligo_par;
    my $name;
    my @input;
    my @weight;
    my $output;
    foreach $name ( keys %{$param->{oligod_param}} ) {
        $oligo_par = $param->{oligod_param}{$name};
        if( $oligo_par->{weight} && $oligo_par->{layer} ) {
            #print "name = $name\n";
            #print "score = $oligo_property->{$name}{score}\n";
            #print "weight = $oligo_par->{weight}\n";
            push @input, $oligo_property->{$name}{score};
            push @weight, $oligo_par->{weight};
        }
    }
    $output = perceptron( \@weight, \@input);
    $output;
}
sub print_oligo_property($$) {
    my ($param, $oligo_property) = @_;
    my $name;
    my $oligo_par;
    print "<SCORE_EVIDENCE>\n";
    printf( "%-11s %5.2f\n", "oligo score :", $oligo_property->{score});
    foreach $name ( sort keys %{$param->{oligod_param}} ) {
        $oligo_par = $param->{oligod_param}{$name};
        if( $oligo_par->{weight}) {
            printf( " %-13s %7g score=%5.2f x %3g (%6g %5g %4g)\n",
                    $name,
                    $oligo_property->{$name}{raw},
                    $oligo_property->{$name}{score},
                    $oligo_par->{weight},
                    $oligo_par->{cutoff},
                    $oligo_par->{squash_dx},
                    $oligo_par->{squash_dy} );
        }
    }
    print "</SCORE_EVIDENCE>\n\n";
}
sub collect_score($$$$$$$) {
    my ($prev_region, $start_count, $end_count, $start_score, $end_score,
        $prev_pos, $region) = @_;
    my $next_region;
    my $extra_region;
    if( $start_count && $end_count ) {
        if( $prev_region ) {
            $prev_region->{end} = $prev_pos - 1;
            $next_region->{hit_count} = $prev_region->{hit_count};
            $next_region->{score} = $prev_region->{score};
            $extra_region->{hit_count} = $prev_region->{hit_count};
            $extra_region->{score} = $prev_region->{score};
        }
        $next_region->{start} = $prev_pos + 1;
        $extra_region->{start} = $prev_pos;
        $extra_region->{end} = $prev_pos;
        $extra_region->{length} = 1;
        $extra_region->{hit_count} += $start_count;
        $extra_region->{score} += $start_score;
        $next_region->{start} = $prev_pos + 1;
        $next_region->{hit_count} += $start_count;
        $next_region->{score} += $start_score;
        $next_region->{hit_count} -= $end_count;
        $next_region->{score} -= $end_score;
    } elsif( $start_count) {
        if( $prev_region ) {
            $prev_region->{end} = $prev_pos - 1;
            $next_region->{hit_count} = $prev_region->{hit_count};
            $next_region->{score} = $prev_region->{score};
        }
        $next_region->{start} = $prev_pos;
        $next_region->{hit_count} += $start_count;
        $next_region->{score} += $start_score;
    } elsif( $end_count) {
        if( $prev_region ) {
            $prev_region->{end} = $prev_pos;
            $next_region->{hit_count} = $prev_region->{hit_count};
            $next_region->{score} = $prev_region->{score};
        }
        $next_region->{start} = $prev_pos + 1;
        $next_region->{hit_count} -= $end_count;
        $next_region->{score} -= $end_score;
    } else {
        $next_region->{start} = $prev_pos;
        $next_region->{hit_count} = 0;
        $next_region->{score} = 0;
    }
    if( $prev_region) {
        $prev_region->{length} = $prev_region->{end} - $prev_region->{start} + 1;
        if( $prev_region->{length} > 0 ) { push @$region, $prev_region; }
    }
    if( $extra_region ) { push @$region, $extra_region; }
    ($prev_region, $next_region);
}

sub print_hit($$) {
    my ($hit, $newline) = @_;
    my $nl = " ";
    if( $newline) { $nl = "\n"; }
    printf( "%4d - %4d %3d%% %4d %10s %6d - %6d %s $nl",
            $hit->{qstart}, $hit->{qend},
            $hit->{ident}, $hit->{score},
            $hit->{tname}, $hit->{tstart}, $hit->{tend},
            $hit->{tstrand});
}
######## IDENTIFY THE SEARCH SEQUENCE IN THE DATABASE ########
sub check_self_hit($$) {
    my ($qlen, $hit) = @_;
    my $glb_ident = 0;
    if( $qlen ) {
        $glb_ident = ( ($hit->{qend} - $hit->{qstart}) * $hit->{ident}) / $qlen;
    }
    if( $glb_ident > 95) { return $glb_ident; }
    0;
}
sub overlap($$) {
    my ($hit1, $hit2) = @_;
    my $overlap = 1;
    my $minslack = 5;
    my $maxslack = 30;
    my $len = min( $hit2->{qend} - $hit2->{qstart},
                   $hit1->{qend} - $hit1->{qstart} );
    my $slack = max( $minslack, min( $maxslack, $len * 0.1));
    if( ( $hit1->{qstart} + $slack) > $hit2->{qend} ||
        ( $hit2->{qstart} + $slack) > $hit1->{qend} ) {
        $overlap = 0;
    }
    $overlap;
}
sub check_coverage($@) {
    my ($qlen, @exon) = @_;
    my $hit;
    my $prev_end = 0;
    my $overlap;
    my $maxgap = 30;
    my $maxoverlap = 0;
    print "<GENE>\n";
    print "Coverage of the query ($qlen):\n";
    foreach $hit ( sort {$a->{qstart} <=> $b->{qstart}} @exon) {
        $overlap = $hit->{qstart} - $prev_end;
        if( abs( $overlap) > $maxoverlap) { $maxoverlap = abs( $overlap); }
        if( $prev_end) {
            print "$overlap\n";
        }
        print_hit( $hit, 0);
        $prev_end = $hit->{qend};
    }
    if( $prev_end) {
        $overlap = $qlen - $prev_end;
        if( abs( $overlap) > $maxoverlap) { $maxoverlap = abs( $overlap); }
        print "$overlap\n";
    }
    if( $maxoverlap > $maxgap ) {
        print "Warning: The query sequence was not found completely in " .
              "the database\n";
    }
}
sub find_self($%) {
    my ( $param, %blast_parse) = @_;
    my $hit;
    my @exon;
    my $overlap_flag;
    my $overlap_hit;
    foreach $hit ( sort {$b->{score} <=> $a->{score}} @{$blast_parse{hits}} ) {
        if( $hit->{qlen} >= $param->{gene_len_cutoff} &&
            $hit->{ident} >= $param->{gene_ident_cutoff} ) {
            $hit->{self} = 1;
            $overlap_flag = 0;
            foreach (@exon) {
                if( overlap( $_, $hit) ) {
                    $overlap_flag = 1;
                    my %new_var = %$_;
                    $overlap_hit = \%new_var;
                    last;
                }
            }
            if( ! $overlap_flag ) {
                push @exon, $hit;
            } else {
                #print "Overlap to:\n";
                #print_hit( $overlap_hit, 1);
            }
        }
    }
    check_coverage( $blast_parse{qlen}, @exon);
}
sub print_exon(@) {
    my (@exon) = @_;
    my $hit;
    my $i = 0;
    foreach $hit ( sort { $a->{qstart} <=> $b->{qstart}} @exon) {
        $i++;
        print "$hit->{qstart} - $hit->{qend}\n";
    }
}
################################################################
sub blast_param2param_line($) {
    my( $blast_param) = @_;
    my %key2opt = ( "wordlen", "-W", "strand", "-S", "expect", "-e",
                    "filter", "-F", "nproc", "-a");
    my %strand2opt = ( "-", 3, "both strands", 3,
                       "direct strand only", 1, "reverse strand only", 2,
                       "top", 1, "bottom", 2);
    my $param_line = '';
    foreach (keys %$blast_param) {
        if( $key2opt{$_} ) {
            if( $_ eq "strand" ) {
                $param_line .= " $key2opt{$_} $strand2opt{$blast_param->{$_}}";
            } else {
                $param_line .= " $key2opt{$_} $blast_param->{$_}";
            }
        }
    }
    $param_line;
}
sub param_line2blast_param($) {
    my( $param_line) = @_;
    # Hashes are initialised from lists, so parentheses (not braces) are used.
    my %key2opt = ( "wordlen", "-W", "strand", "-S", "expect", "-e",
                    "filter", "-F", "nproc", "-a");
    my %opt2strand = ( 3, "both strands", 1, "direct strand only",
                       2, "reverse strand only");
    my $blast_param;
    my %opt2key = reverse %key2opt;
    my %param_hash = split( " ", $param_line);
    foreach (keys %param_hash) {
        if( $opt2key{$_} ) {
            if( $_ eq "-S" ) {
                $blast_param->{$opt2key{$_}} = $opt2strand{$param_hash{$_}};
            } else {
                $blast_param->{$opt2key{$_}} = $param_hash{$_};
            }
        } else {
            die "Sorry, the option $_ has not been implemented yet.";
        }
    }
    $blast_param;
}
sub position_score($$$) {
    my ( $glb, $param, $seq) = @_;
    my $hit;
    my $hit_id = 0;
    my $blast_out;
    my %blast_parse;
    # -e 100 : show all hits 50 = down to 17 nuc in human
    # -F F   : Don't filter
    # -S 1   : Search only top strand of query, default is 3=both strands
    # -a 4   : Use four CPU's
    # -W 30  : Word length 30 (default is 11)
    my @end_points;
    my @region;
    my $prev_pos = 1;
    my $p;
    my $score;
    my $hit_count;
    my $prev_region;
    my $next_region;
    my $one_nuc_region;
    my $start_score;
    my $end_score;
    my $start_count;
    my $end_count;
    my $ident;
    my $ignore = 0;
    my $param_line = blast_param2param_line( $param->{blast_param});

    $blast_out = blast( $glb, $seq, $param->{blastdb}, "blastn", $param_line);
    if( $glb->{view} =~ /BLAST/ ) {
        print "<BLAST>$blast_out</BLAST>\n";
    }
    %blast_parse = parse_blast( $blast_out);
    find_self( $param, %blast_parse);
    # push @{$blast_parse{hits}}, { "query_name", "X", "tname", "X", "score", 10,
    #     "qstrand", "P", "tstrand", "P", "qstart", 252, "qend", 269, "ident", 0};
    foreach $hit (@{$blast_parse{hits}}) {
        #if( $ident = check_self_hit( $blast_parse{qlen}, $hit) ) {
        if( $hit->{self}) {
            if( ! $ignore ) {
                $ignore = 1;
                print "Ignoring the following hit(s):\n";
            }
            $ident = sprintf( "%3.0f", $hit->{ident});
            printf( "%30s %6d %6d %s\n", $hit->{tname}, $hit->{tstart}, $hit->{tend},
                    "$hit->{ident}% identity to $blast_parse{query_name}" );
        } else {
            push @end_points, {"hit_id", $hit_id, "pos", $hit->{qstart}, "type", "start"};
            push @end_points, {"hit_id", $hit_id, "pos", $hit->{qend}, "type", "end"};
            if(0) {
                print "$blast_parse{query_name}, $hit->{tname} " .
                      "$hit->{score} " .
                      "$hit->{expect} " .
                      "$hit->{qstrand} " .
                      "$hit->{tstrand} " .
                      "$hit->{qstart} " .
                      "$hit->{qend} " .
                      "$hit->{ident}\n";
            }
            $hit_id++;
        }
    }
    if( ! $ignore ) {
        print "The query sequence was not found in the database!\n";
    }
    print "</GENE>\n";
    $start_count = 0;
    $end_count = 0;
    $start_score = 0;
    $end_score = 0;
    if( $#end_points >= 0 ) {
        foreach (sort { $a->{pos} <=> $b->{pos}} @end_points) {
            # print "$_->{pos}\n";
            if( $_->{pos} > $prev_pos) {
                $prev_region = collect_score( $prev_region, $start_count, $end_count,
                                              $start_score, $end_score, $prev_pos, \@region);
                $prev_pos = $_->{pos};
                $start_count = 0;
                $end_count = 0;
                $start_score = 0;
                $end_score = 0;
            }
            if( $_->{type} eq "start" ) {
                $start_score += $blast_parse{hits}->[ $_->{hit_id}]->{score};
                $start_count++;
            } else {
                $end_score += $blast_parse{hits}->[ $_->{hit_id}]->{score};
                $end_count++;
            }
        }
        ($prev_region, $p) = collect_score( $prev_region, $start_count, $end_count,
                                            $start_score, $end_score, $prev_pos, \@region);
        my $last_region;
        $last_region->{start} = $prev_region->{end} + 1;
        $last_region->{end} = length( $seq->{seq});
        $last_region->{length} = $last_region->{end} - $last_region->{start} + 1;
        $last_region->{score} = 0;
        $last_region->{hit_count} = 0;
        if( $last_region->{length} > 0 ) {
            push @region, $last_region;
        }
    }
    if(0) {
        foreach (@region) {
            print "$_->{start}, $_->{end}, $_->{score}, $_->{hit_count}, $_->{length}\n";
        }
    }
    (\@region, \%blast_parse);
}

sub hit_score($$$$) {
    my ($position_score, $start, $end, $param) = @_;
    my $sumscore = 0;
    my $n = 0;
    my $score;
    my $overlap;
    my $maxscore = -9999999999;
    my $norm_score;
    my $result;
    foreach (@$position_score) {
        if( ($_->{start} <= $end && $_->{end} >= $start) ) {
            $overlap = min( $end, $_->{end}) - max( $start, $_->{start});
            $score = $_->{score} * $overlap; # / $_->{length};
            if( $score > $maxscore ) { $maxscore = $score; }
            if( $score > $param->{min_score} ) {
                $sumscore += $score;
                #print "($score $_->{start} $_->{end} $_->{score} $_->{length} $overlap) ";
            }
            $n++;
        }
    }
    $norm_score = squash( $param->{oligod_param}{hit_score}{squash_factor} *
                          ( $param->{oligod_param}{hit_score}{cutoff} - $sumscore) );
    #print "SCORE: $sumscore, $param->{min_score}, $norm_score\n";
    #$score = int( $sumscore + 0.5); # 0.5 to get correct rounding
    $result = { "score", $norm_score, "raw", $sumscore};
    $result;
}

sub oligo_sort_diff($$) {
    my ($a,$b) = @_;
    my $result;
    $result = $a->{dbmatch} <=> $b->{dbmatch};
    if( ! $result) { $result = $a->{temp} <=> $b->{temp}; }
    #if( ! $result) { $result = $a->{self_score} <=> $b->{self_score}; }
    $result;
}
sub find_oligo_hits($$$) {
    my ($blast_parse, $start, $end) = @_;
    my $hit;
    my $overlap;
    my $qoffset;
    my @hits = ();
    my $qseq;
    my $mseq;
    my $tseq;
    my $l = ' ';
    my $r = ' ';
    foreach $hit (@{$blast_parse->{hits}}) {
        $overlap = min( $end, $hit->{qend}) - max( $start, $hit->{qstart});
        if( !$hit->{self} && $overlap > 0 ) {
            $qoffset = max( 0, $start - $hit->{qstart}) + 1;
            $l = ' ' x max( ($hit->{qstart} - $start), 0);
            $r = ' ' x max( ($end - $hit->{qend}), 0);
            #$qseq = $l . substr( $hit->{queryseq}, $qoffset, $overlap) . $r;
            $mseq = $l . substr( $hit->{matchseq}, $qoffset, $overlap) . $r;
            $tseq = $l . substr( $hit->{targetseq}, $qoffset, $overlap) . $r;
            push @hits, { "overlap", $overlap,
                          "qseq", $qseq, "mseq", $mseq, "tseq", $tseq,
                          "tname", $hit->{tname} };
        }
    }
    @hits;
}

sub match_score($$$$) {
    my ($param, $blast_parse, $seq_start, $seq_end) = @_;
    my $myhit;
    my $nm;
    my $max_match = 0;
    my $max_stretch = 0;
    my $l;
    my $max_stretch_score;
    my $max_match_score;
    my @hits = find_oligo_hits( $blast_parse, $seq_start, $seq_end);
    foreach $myhit (sort {$b->{overlap} <=> $a->{overlap}} @hits) {
        $_ = $myhit->{mseq};
        # Longest uninterrupted run of match characters ('|')
        while( /(\|+)/g ) {
            $l = length( $1);
            if( $l > $max_stretch) {
                $max_stretch = $l;
            }
        }
        # s/// returns the number of substitutions, i.e. the match count
        $nm = s/\|/./g;
        if( $nm > $max_match) { $max_match = $nm; }
    }
    $max_match_score = squash(
        $param->{oligod_param}{max_match}{squash_factor} *
        ( $param->{oligod_param}{max_match}{cutoff} - $max_match) );
    $max_stretch_score = squash(
        $param->{oligod_param}{max_stretch}{squash_factor} *
        ( $param->{oligod_param}{max_stretch}{cutoff} - $max_stretch) );
    #print "MATCH $max_stretch, $max_match\n";
    ({ "score", $max_match_score, "raw", $max_match},
     { "score", $max_stretch_score, "raw", $max_stretch});
}
sub find_oligo($$$$$$) {
    my ($glb, $seq, $position_score, $blast_parse, $param, $seq_number) = @_;
    my $i;
    my $l = length( $seq->{seq}) - $param->{oligo_length};
    my $s; my $temp = 0; my $temp_tm = 0;
    my $tm_dan = { "temp", 0};
    my $tm; my $self_score = 0; my $db_score;
    my @result;
    my $min_hit_score_score = 999999999;
    my @oligo_center = ();
    my $c = 0; my $dist; my $this_oligo_overlaps = 0;
    my @hits;
    my $hitcount = 0; my $nm; my $myhit; my $lnaseq; my $evidence;
    my $count; my $max_match; my $seq_start; my $seq_end; my $oligo_property;
    my @seqs; my @lnaseqs; my @dummy; my @self_hyp;
    my $self_hyp_batch = 1;
    my $tm_batch = 1;
    my $tm_result;
    my $target_mask;

    if( $param->{oligod_param}{target_struct}{weight} ) {
        $target_mask = target_struct( $glb, $param, $seq);
    }
    for( $seq_start=1; $seq_start<$l; $seq_start++) {
        $seq_end = $seq_start + $param->{oligo_length};
        $s = substr( $seq->{seq}, $seq_start, $param->{oligo_length});
        $i = $seq_start - 1;
        $seqs[$i] = $s;
        $lnaseqs[$i] = capitalize( $s, $param->{cphase},
                                   $param->{cfreq}, $param->{end_spike_len});
    }
    if( $self_hyp_batch) {
        if( $param->{oligod_param}{self_hyp}{weight} ) {
            if( $param->{lna}) {
                @self_hyp = self_hyp( $glb, $param, @lnaseqs);
            } else {
                @self_hyp = self_hyp( $glb, $param, @seqs);
            }
        }
    }
    if( $tm_batch) {
        if( $param->{oligod_param}{tm_min}{weight} &&
            $param->{oligod_param}{tm_max}{weight} ) {
            $tm = tm( $glb, $param, @lnaseqs);
        }
    }
    for( $seq_start=1; $seq_start<$l; $seq_start++) {
        $i = $seq_start - 1;
        $lnaseq = $lnaseqs[$i];
        $seq_end = $seq_start + $param->{oligo_length};
        $oligo_property = { "seq", $seqs[$i], "start", $seq_start,
                            "end", $seq_end};
        $s = $seqs[$i];
        if( $param->{oligod_param}{hit_score}{weight} ) {
            $oligo_property->{hit_score} =
                hit_score( $position_score, $seq_start, $seq_end, $param);
        }
        if( $param->{oligod_param}{target_struct}{weight} ) {
            $oligo_property->{target_struct} =
                target_struct_score( $target_mask, $seq_start, $seq_end, $param);
        }
        if( $param->{oligod_param}{max_match}{weight} ||
            $param->{oligod_param}{max_stretch}{weight} ) {
            ($oligo_property->{max_match}, $oligo_property->{max_stretch}) =
                match_score( $param, $blast_parse, $seq_start, $seq_end);
        }
        if( $param->{oligod_param}{self_match}{weight} ) {
            if( $param->{lna}) {
                $oligo_property->{self_match} = self_match( $glb, $param, $lnaseq);
            } else {
                $oligo_property->{self_match} = self_match( $glb, $param, $s);
            }
        }
        if( $param->{oligod_param}{self_hyp}{weight} ) {
            if( $self_hyp_batch ) {
                $i = $seq_start - 1;
                $oligo_property->{self_hyp} = $self_hyp[$i];
            } else {
                if( $param->{lna}) {
                    ($oligo_property->{self_hyp}, @dummy) =
                        self_hyp( $glb, $param, ($lnaseq));
                } else {
                    ($oligo_property->{self_hyp}, @dummy) =
                        self_hyp( $glb, $param, ($s));
                }
            }
        }
        if( $param->{oligod_param}{tm_dan_min}{weight} &&
            $param->{oligod_param}{tm_dan_max}{weight} ) {
            ($oligo_property->{tm_dan},
             $oligo_property->{tm_dan_min},
             $oligo_property->{tm_dan_max}) = tm_dan( $glb, $param, $s);
        }
        if( $param->{oligod_param}{tm_min}{weight} &&
            $param->{oligod_param}{tm_max}{weight} ) {
            if( $tm_batch ) {
                $i = $seq_start - 1;
                $tm_result = $tm->[$i];
                $oligo_property->{tm} = $tm_result->{tm};
                $oligo_property->{tm_min} = $tm_result->{tm_min};
                $oligo_property->{tm_max} = $tm_result->{tm_max};
            } else {
                $tm_result = tm( $glb, $param, ($lnaseq));
                $oligo_property->{tm} = $tm_result->[0]{tm};
                $oligo_property->{tm_min} = $tm_result->[0]{tm_min};
                $oligo_property->{tm_max} = $tm_result->[0]{tm_max};
            }
        }
        if( $param->{oligod_param}{ggg}{weight} ) {
            $oligo_property->{ggg} = ggg_score( $param, $s);
        }
        if( $param->{oligod_param}{cg}{weight} ) {
            $oligo_property->{cg} = cg_score( $s);
        }
        $oligo_property->{score} = oligo_property_score( $param, $oligo_property);
        push @result, $oligo_property;
    }
    $i = 0;
    foreach $oligo_property ( sort {$b->{score} <=> $a->{score}} @result) {
        $this_oligo_overlaps = 0;
        if( $param->{mindist} ) {
            $c = $oligo_property->{start} +
                 ($oligo_property->{end} - $oligo_property->{start}) / 2;
            foreach (@oligo_center) {
                $dist = abs( $_ - $c);
                if( $dist < $param->{mindist} ) {
                    $this_oligo_overlaps = 1;
                    last;
                }
            }
        }
        #print "Oligo center is $c, overlap is $this_oligo_overlaps\n";
        if( ! $this_oligo_overlaps) {
            $count++;
            if( $glb->{view} =~ /PROBE/ ) {
                print "<PROBE>\n";
                print "seqno: $seq_number oligono: $count\n";
            }
            #$temp = melt_temp( $oligo_property->{seq});
            $lnaseq = capitalize( $oligo_property->{seq}, $param->{cphase},
                                  $param->{cfreq}, $param->{end_spike_len});
            #$temp_tm = tm_pred( $lnaseq);
            #$tm_dan = tm_dan( $glb, $param, $oligo_property->{seq});
            #$self_score = self_match( $glb, $param, $oligo_property->{seq});
            @hits = find_oligo_hits( $blast_parse, $oligo_property->{start},
                                     $oligo_property->{end});
            $max_match = 0;
            foreach $myhit (sort {$b->{overlap} <=> $a->{overlap}} @hits) {
                $_ = $myhit->{mseq};
                $nm = s/\|/./g;
                if( $nm > $max_match) { $max_match = $nm; }
            }
            if( $glb->{view} =~ /SCORE_EVIDENCE/ ) {
                print_oligo_property( $param, $oligo_property);
            }
            #if( $glb->{view} =~ /PROBE/ ) {
            #XXX printf( "score=%4.2f tm_ex=$temp_tm", $oligo_property->{score});
            printf( "score=%4.2f ", $oligo_property->{score});
            if( $param->{oligod_param}{tm_min}{weight} &&
                $param->{oligod_param}{tm_max}{weight}) {
                #print ", tm=$oligo_property->{tm_min}{raw}\n\n";
                print ", tm=$oligo_property->{tm_min}{raw}";
            } else {
                print "\n";
            }
            if( $param->{oligod_param}{tm_dan_min}{weight} &&
                $param->{oligod_param}{tm_dan_max}{weight} ) {
                #print ", tm_dan=$oligo_property->{tm_dan_min}{raw}\n\n";
                print ", tm_dan=$oligo_property->{tm_dan_min}{raw}\n\n";
            } else {
                print "\n\n";
            }
            print "$lnaseq $oligo_property->{start} $oligo_property->{end}\n";
            #}
            if( $glb->{view} =~ /HITS/ ) {
                print "<HITS>\n";
                # "$oligo_property->{dbmatch}, ".
                # "$window->{self_score}, $max_match\n";
                $hitcount = 0;
                foreach $myhit (sort {$b->{overlap} <=> $a->{overlap}} @hits) {
                    if( $hitcount >= $param->{maxhits}) {
                        my $n = $#hits + 1;
                        print "Only showing $param->{maxhits} hits of $n hits\n";
                        last;
                    }
                    #print "\n$myhit->{qseq}\n$myhit->{mseq}\n$myhit->{tseq}\n";
                    $_ = $myhit->{mseq};
                    $nm = s/\|/./g;
                    printf( "$myhit->{mseq} %2d matches\n$myhit->{tseq} $myhit->{tname}\n",
                            $nm);
                    $hitcount++;
                }
                print "</HITS>\n";
            }
            if( $glb->{view} =~ /SELF_EVIDENCE/ ) {
                print "<SELF_EVIDENCE>\n";
                if( $param->{oligod_param}{self_hyp}{weight} ) {
                    print "$oligo_property->{self_hyp}{evidence}";
                }
                if( $param->{oligod_param}{self_match}{weight} ) {
                    print "$oligo_property->{self_match}{evidence}";
                }
                print "</SELF_EVIDENCE>\n";
            }
            if( $glb->{view} =~ /TARGET_STRUCT_HIT/ ) {
                print "<TARGET_STRUCT_HIT>\n";
                if( $param->{oligod_param}{target_struct}{weight} ) {
                    print "$oligo_property->{target_struct}{evidence}";
                }
                print "\n</TARGET_STRUCT_HIT>\n";
            }
            if( $glb->{view} =~ /PROBE/ ) { print "</PROBE>\n"; }
            push @oligo_center, $c;
            $i++;
        }
        if( $i > $param->{max_noligo} ) { last; }
    }
}
########## MAIN ######
sub main($) {
    my ($glb) = @_;
    my $lastline;
    my $seq;
    my $count = 0;
    my $position_score;
    my $blast_parse;
    my $hit;
    my $check_blast = 1;
    my $param;
    my $starttime;
    my $date;
    my $runtime;
    my $opt;
    my $popt;
    local *INFILE;
    my $oligo_design = 0;

    if(0) {
        $seq = read_file( "out.blast");
        #print $seq;
        parse_blast( $seq);
        exit( 0);
    }
    ($opt, $popt) = parse_argv( $glb);
    read_conf( $glb);
    $param = initialize_param( $popt);
    makedir( $glb->{tmpdir});
    foreach (@ARGV) {
        if( $_ eq "-") {
            *INFILE = *STDIN;
        } else {
            open( INFILE, "<$_") || die "Could not open $_ $!";
        }
        while( $seq = read_fasta( \$lastline, \*INFILE) ) {
            $count++;
            if( $opt->{capitalize}) {
                $seq->{name} = $seq->{name} . "_lna";
                $seq->{seq} = capitalize( $seq->{seq},
                                          $param->{cphase}, $param->{cfreq},
                                          $param->{end_spike_len});
                print_fasta( $seq, \*STDOUT);
            } elsif( $opt->{rev}) {
                $seq->{seq} = rev( $seq->{seq});
                $seq->{name} = $seq->{name} . "_rev";
                $seq->{seq} = capitalize( $seq->{seq}, 2, 3, 0);
                print_fasta( $seq, \*STDOUT);
            } elsif( $opt->{comp}) {
                $seq->{seq} = comp( $seq->{seq});
                $seq->{name} = $seq->{name} . "_comp";
                print_fasta( $seq, \*STDOUT);
            } elsif( $opt->{revcomp}) {
                $seq->{seq} = revcomp( $seq->{seq});
                $seq->{name} = $seq->{name} . "_revcomp";
                print_fasta( $seq, \*STDOUT);
            } elsif( $opt->{self_score}) {
                my ($evidence, $self_score) = self_hyp( $glb, $param, $seq->{seq});
                print( "score = $self_score\n");
            } else {
                $starttime = time();
                if( ! $oligo_design) {
                    my ($sec, $min, $hour, $mday, $mon, $year) =
                        (localtime( $starttime))[0..5];
                    $year += 1900;
                    $mon++;
                    $date = "$year $mon $mday $hour:$min:$sec";
                    $oligo_design = 1;
                    print "<PROBE_RUN $date version=$glb->{version}>\n";
                    if( $glb->{view} =~ /PARAMETER/ ) {
                        print dump_param( $glb, $param);
                    }
                }
                if( $param->{oligo_sense} eq "reverse") {
                    $seq->{seq} = revcomp( $seq->{seq});
                }
                print "<PROBE_DESIGN>\n";
                if( $check_blast && $param->{fastadb}) {
                    my @fastadbs = split( " ", $param->{fastadb});
                    my $bdb;
                    foreach (@fastadbs) {
                        $bdb = formatblastdb( $glb, $_, "F");
                        $param->{blastdb} .= " " . $bdb;
                    }
                    $check_blast = 0;
                }
                ($position_score, $blast_parse) = position_score( $glb, $param, $seq);
                if( $glb->{view} =~ /FASTA/ ) {
                    print "<FASTA>\n";
                    print_fasta( $seq, \*STDOUT);
                    print "</FASTA>\n";
                }
                find_oligo( $glb, $seq, $position_score, $blast_parse, $param, $count);
                $runtime = time() - $starttime;
                if( $glb->{view} =~ /RUNTIME/ ) {
                    print "<RUNTIME>$runtime s</RUNTIME>\n";
                }
                print "</PROBE_DESIGN>\n";
            }
        }
        close INFILE;
    }
    if( $oligo_design) { print "</PROBE_RUN>\n"; }
}

main( $glb);
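The pattern filters in the listing above (cg_score and ggg_score) each reduce a candidate oligonucleotide to a raw value and then pass squash_factor * (cutoff - raw) through squash(). The following is a minimal sketch of that idea in C, not part of the original program. The function names max_g_run, has_cg, squash_sketch and ggg_score_sketch are hypothetical, and because squash() is not defined in this part of the listing, squash_sketch is only an assumed monotone mapping onto the interval (0, 1).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest run of consecutive G's (case-insensitive).
   This mirrors the raw value that ggg_score derives from /(g+)/gi. */
int max_g_run(const char *seq) {
    int best = 0, run = 0;
    assert(seq != NULL);
    for (; *seq; seq++) {
        if (tolower((unsigned char)*seq) == 'g') {
            run++;
            if (run > best) best = run;
        } else {
            run = 0;
        }
    }
    return best;
}

/* 1 if the sequence contains a CG dinucleotide (case-insensitive), else 0.
   This mirrors the raw value that cg_score derives from /cg/i. */
int has_cg(const char *seq) {
    assert(seq != NULL);
    for (; seq[0] && seq[1]; seq++) {
        if (tolower((unsigned char)seq[0]) == 'c' &&
            tolower((unsigned char)seq[1]) == 'g')
            return 1;
    }
    return 0;
}

/* ASSUMPTION: stand-in for squash(); the real function is defined elsewhere
   in oligod. Maps any real x monotonically into (0, 1), with 0 -> 0.5. */
double squash_sketch(double x) {
    double ax = x < 0 ? -x : x;
    return 0.5 * (1.0 + x / (1.0 + ax));
}

/* Score is high when the longest G run is well below the cutoff and
   low when it exceeds the cutoff, following the form used by ggg_score. */
double ggg_score_sketch(const char *seq, double squash_factor, double cutoff) {
    return squash_sketch(squash_factor * (cutoff - max_g_run(seq)));
}
```

With these definitions, max_g_run("acgggta") is 3 and has_cg("gcat") is 0. A sequence whose longest G run equals the cutoff receives the midpoint score 0.5 under the assumed squash function.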
B. The dyp program is used by oligod to predict the secondary structure and self-annealing properties of the oligonucleotides.
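As a rough illustration of what a self-annealing check measures, the sketch below counts the longest uninterrupted run of Watson-Crick base pairs that two copies of the same oligonucleotide can form when aligned antiparallel. This is a simplified, ungapped proxy written for this explanation and is not the algorithm of dyp, whose data structures (score matrix, gap_start, gap_cont, loop_score) indicate a scored alignment with gaps and loops. The function names is_wc_pair and longest_self_pair_run are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* 1 if a and b form a Watson-Crick pair (A-T, A-U, C-G), case-insensitive. */
int is_wc_pair(char a, char b) {
    a = (char)tolower((unsigned char)a);
    b = (char)tolower((unsigned char)b);
    if (a == 'u') a = 't';
    if (b == 'u') b = 't';
    return (a == 'a' && b == 't') || (a == 't' && b == 'a') ||
           (a == 'c' && b == 'g') || (a == 'g' && b == 'c');
}

/* Longest ungapped run of consecutive Watson-Crick pairs between two
   antiparallel copies of seq, taken over all relative offsets.
   For a fixed offset k, base i of one copy faces base k - i of the other. */
int longest_self_pair_run(const char *seq) {
    int n, k, i, lo, hi, run, best = 0;
    assert(seq != NULL);
    n = (int)strlen(seq);
    for (k = 0; k <= 2 * (n - 1); k++) {
        lo = k - (n - 1) > 0 ? k - (n - 1) : 0;   /* first valid i */
        hi = k < n - 1 ? k : n - 1;               /* last valid i  */
        run = 0;
        for (i = lo; i <= hi; i++) {
            if (is_wc_pair(seq[i], seq[k - i])) {
                if (++run > best) best = run;
            } else {
                run = 0;
            }
        }
    }
    return best;
}
```

For the palindromic sequence "GAATTC" the function returns 6, since two copies can pair along their whole length, while "AAAA" returns 0. A real thermodynamic model would weight G-C and A-T pairs differently and allow bulges and loops.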
/*
#$Id: dyp.c,v 1.15 2002/07/29 06:28:00 nt Exp $

DESCRIPTION

Loading sequences:
  -f fastafile
  -seq sequence

Set min_score to 0 to see all structures
  -min_score min_score

Size of sliding window, default is INT_MAX
  -depth depth

Number of results to show
  -max_res max_res

-secondary_structure
  Calculates the secondary structure of an oligo

-self_annealing
  Calculates the binding energy between an oligo and itself.

-hybridization
  Calculates the binding energy between two different oligos,
  or an oligo and its target.

AUTHOR
Copyright Niels Tolstrup 2002 Exiqon
*/
#include <stdio.h>   /* fprintf, rand.. */
#include <stdlib.h>  /* calloc.. */
#include <stdarg.h>  /* va_list.. */
#include <string.h>  /* strlen.. */
#include <limits.h>  /* USHRT_MAX.. */
#include <math.h>    /* pow */
#include <ctype.h>   /* tolower */
#include "getopt.h"  /* getopt_long */
/*
export MALLOC_TRACE=dyp_memtrace
mtrace dyp $MALLOC_TRACE
#include <mcheck.h>
mtrace
*/

/*************** DEFINES ***************/
#define THEVERSION "1.2 2002-10-11-15-46"
#define TRACEBACK_INT unsigned short
#define TRACEBACK_INT_MAX USHRT_MAX
#define HI printf( "Hello\n");
#define SECONDARY_STRUCTURE 1
#define SELF_ANEALING 2
#define HYBRIDIZATION 4
#define TM 8
/* Increase the stack by this amount in case of overflow */
#define STACK_CHUNK 255
/* Number of sequences to allocate space for before realloc */
#define SEQS_CHUNK 255
/* Size of sequence to allocate space for before realloc */
#define SEQ_CHUNK 1024
/* Longest possible line in a matrix that can be read */
#define MAX_LINE_LEN 5000
#define MIN(a,b) (a<b?a:b)
#define MAX(a,b) (a>b?a:b)
/*********** GLOBAL VARIABLES ****************/
int verbose = 0; char* program_name;
/************ STRUCTURES *********************/
typedef struct sequences_t {
    int nseq;
    int maxnseq;
    char **seqs;
} sequences_type;

typedef struct dynamic_str_t {
    int len;
    int maxlen;
    char *s;
} dynamic_str_type;

typedef struct pair_t {
    TRACEBACK_INT i;
    TRACEBACK_INT j;
} pair_type;

typedef struct stack_t {
    TRACEBACK_INT sp;
    TRACEBACK_INT size;
    pair_type *stack;
} stack_type;

typedef struct tokenarray_t {
    int ntok;
    int maxntok;
    char **tok;
} tokenarray_type;

typedef struct matrix_t {
    char *seq1;
    char *seq2;
    int l1;
    int l2;
    int max_size;
    int *d;
} matrix_type;

typedef struct score_rec_t {
    int alph_len;
    int gap_start;
    int gap_cont;
    int loop_score;
    int match_cont_factor;
    int match_threshold;
    int min_strong_ident_score;
    int min_ident_score;
    int min_sim_score;
    char *alph;
    int *mat;
} score_rec_type;
typedef struct param_t {
    char *alph;
    int *mat_typ;
    int depth;
    int min_score;
    int max_res;
    sequences_type *seqs;
    sequences_type *seqs2;
    score_rec_type *score_rec;
} param_type;
/************** SYSTEM ****************/
void die( char *fmt, ...) {
    va_list ap;
    fprintf( stderr, "Uhoh: ");
    va_start( ap, fmt);
    vfprintf( stderr, fmt, ap);
    va_end( ap);
    fprintf( stderr, "\n");
    exit( 1);
}

void usage( char *fmt, ...) {
    va_list ap;
    va_start( ap, fmt);
    vfprintf( stderr, fmt, ap);
    va_end( ap);
    fprintf( stderr,
             "\nUsage: dyp -h, --help\n"
             "           -v, --verbose\n"
             "           -r, --version\n"
             "           -u, --secondary_structure\n"
             "           -a, --self_anealing\n"
             "           -y, --hybridization\n"
             "           -t, --tm\n"
             "           -o, --output filename\n"
             "           -i, --input filename\n"
             "           -j, --input2 filename2\n"
             "           -s, --seq sequence\n"
             "           -z, --seq2 sequence2\n"
             "           -d, --depth depth\n"
             "           -n, --min_score score\n"
             "           -p, --max_res n\n"
             "           -m, --matrix matfn\n"
             "           -l, --allolig length_of_oligo\n"
             "           -f, --spikefreq freq\n"
             "           -e, --sample n_samples\n");
    fprintf( stderr, "\n");
    exit( 1);
}

void hi( char *fmt, ...) {
    va_list ap;
    fprintf( stdout, "Hi there ");
    va_start( ap, fmt);
    vfprintf( stdout, fmt, ap);
    va_end( ap);
    fprintf( stdout, "\n");
}

void *salloc( int nobj, int size) {
    void *mem;
    mem = calloc( nobj, size);
    /*printf( "calloc: allocating %d objects of size %d bytes\n", nobj, size);*/
    if( mem == NULL) {
        die( "Could not allocate %d x %d bytes\n", nobj, size);
    }
    return mem;
}

FILE *sfopen( char *fn, const char *mode) {
    FILE *file;
    file = fopen( fn, mode);
    if( file == NULL )
        die( "Could not open %s for %s.", fn, mode);
    return file;
}
/********** STRING FUNCTIONS ***********/
char *empty_str( int len) {
    char *str;
    str = calloc( len+1, sizeof( char));
    memset( str, ' ', len);
    str[len] = 0;
    return str;
}

char *reverse( char *str) {
    int i, len;
    char c;
    len = strlen( str);
    for( i=0; i<len/2; i++) {
        c = str[i];
        str[i] = str[(len-1) - i];
        str[(len-1) - i] = c;
    }
    return str;
}

char *lower( char *s) {
    int i, l = strlen( s);
    for( i=0; i<l; i++)
        s[i] = tolower( s[i]);
    return s;
}

/* Complement of a nucleotide code (IUPAC ambiguity codes included);
   returns a space for characters outside the alphabet. */
char tocomp( char c) {
    char fr_s[35] = "acgtumrwsykvhdbxnACGTUMRWSYKVHDBXN";
    char to_s[35] = "tgcaakywsrmbdhvxxTGCAAKYWSRMBDHVXX";
    char *p;
    char cc;
    p = strchr( fr_s, c);
    if( p ) {
        p = to_s + (p - fr_s);
        cc = *p;
    }
    else
        cc = ' ';
    return cc;
}

char *comp( char *s) {
    int i, l = strlen( s);
    for( i=0; i<l; i++)
        s[i] = tocomp( s[i]);
    return s;
}
/******* I/O of strings *******/
void *print_lines( FILE *outfile, char **str, int nstr) {
    int linelen = 80;
    int i, j, p, len;
    if( nstr > 0 ) {
        len = strlen( str[0]);
        for( j=0; j<nstr; j++) {
            if( strlen( str[j]) != len ) {
                die( "String len = %d differs from first string len %d %s\n",
                     strlen( str[j]), len, str[j]);
            }
        }
        p = 0;
        while( p < len) {
            for( j=0; j<nstr; j++) {
                for( i = p; i < p + linelen && i < len; i++) {
                    putc( str[j][i], outfile);
                }
                putc( '\n', outfile);
            }
            p += linelen;
            putc( '\n', outfile);
        }
    }
}
/************ STACK **************/
stack_type *init_stack( TRACEBACK_INT size) {
    stack_type *stack;
    stack = calloc( 1, sizeof( stack_type));
    stack->sp = 0;
    stack->size = size;
    stack->stack = calloc( size, sizeof( pair_type));
    return stack;
}

void *free_stack( stack_type *stack) {
    free( stack->stack);
    free( stack);
}

void print_pair( pair_type pair) {
    printf( "%4d, %4d\n", pair.i, pair.j);
}

void push( stack_type *stack, pair_type pair) {
    int new_size;
    if( stack->sp < stack->size ) {
        stack->stack[ stack->sp++] = pair;
        if(0) { printf( "pushed :"); print_pair( pair); }
    } else {
        new_size = (stack->size + STACK_CHUNK) * sizeof( pair_type);
        if( stack->stack = realloc( stack->stack, new_size) ) {
            stack->size = stack->size + STACK_CHUNK;
            stack->stack[ stack->sp++] = pair;
        } else {
            die( "Failed to realloc stack to %d bytes", new_size);
        }
    }
}

pair_type setpair( int i, int j) {
    pair_type pair;
    pair.i = i;
    pair.j = j;
    return pair;
}

pair_type pop( stack_type *stack) {
    if( stack->sp > 0 )
        return stack->stack[ --(stack->sp)];
    else
        return setpair( TRACEBACK_INT_MAX, TRACEBACK_INT_MAX);
}

void test_stack() {
    stack_type *stack;
    int i;
    stack = init_stack( STACK_CHUNK);
    for( i=0; i<30; i++)
        push( stack, setpair( i, i+1));
    for( i=0; i<30; i++)
        print_pair( pop( stack));
    free_stack( stack);
}
/*********** DYNAMIC STRINGS *********/
void free_dynamic_str_type( dynamic_str_type *seq) {
    free( seq->s);
    free( seq);
}

dynamic_str_type *new_dynamic_str_type() {
    dynamic_str_type *seq;
    seq = calloc( 1, sizeof( dynamic_str_type));
    seq->len = 0;
    seq->maxlen = SEQ_CHUNK;
    seq->s = calloc( seq->maxlen, sizeof( char));
    seq->s[0] = '\0';
    return seq;
}

dynamic_str_type *addchar( dynamic_str_type *seq, int c) {
    if( seq == NULL) {
        seq = new_dynamic_str_type();
    }
    if( seq->len + 1 >= seq->maxlen ) {
        seq->maxlen += SEQ_CHUNK;
        if( !(seq->s = realloc( seq->s, seq->maxlen * sizeof( char))) )
            die( "Failed to realloc seq to %d chars", seq->maxlen);
    }
    seq->s[ seq->len++] = c;
    return seq;
}
/************* OLIGO FUNCTIONS ********/
long n_nmer( int n_mer, char *alph) {
    int alphlen = strlen( alph);
    long n_nmer = 0;
    if( alphlen == 0 )
        n_nmer = 0;
    else
        n_nmer = pow( alphlen, n_mer);
    return n_nmer;
}

char *make_nmer( long number, int n_mer, char *alph) {
    long i, j, n = number;
    int alphlen = strlen( alph);
    char *seq;
    seq = empty_str( n_mer);
    if( alphlen == 0 )
        strcpy( seq, "-");
    else {
        for( j=0; j<n_mer; j++) {
            i = n % alphlen;
            n = n / alphlen;
            seq[j] = alph[i];
        }
    }
    seq = reverse( seq);
    return seq;
}

/* Random indices are scaled in floating point so that the arithmetic
   cannot overflow on platforms where RAND_MAX equals INT_MAX. */
char *make_random_nmer( int n_mer, int spikefreq, char *alph) {
    long i, j;
    int alphlen = strlen( alph);
    int halflen = alphlen / 2;
    char *seq;
    char phase = 0;
    seq = empty_str( n_mer);
    if( spikefreq)
        phase = (char)( (double)rand() / ((double)RAND_MAX + 1.0) * spikefreq);
    if( alphlen == 0 )
        strcpy( seq, "-");
    else {
        if( spikefreq)
            for( j=0; j<n_mer; j++) {
                i = (long)( (double)rand() / ((double)RAND_MAX + 1.0) * halflen);
                if( ( (j+phase) % spikefreq) == 0)
                    i += halflen;
                seq[j] = alph[i];
            }
        else
            for( j=0; j<n_mer; j++) {
                i = (long)( (double)rand() / ((double)RAND_MAX + 1.0) * alphlen);
                seq[j] = alph[i];
            }
    }
    return seq;
}
/************* MATRIX I/O ************/
score_rec_type *init_score_rec_type( int gap_start, int gap_cont,
                                     int loop_score, int match_threshold,
                                     int match_cont_factor,
                                     int min_strong_ident_score,
                                     int min_ident_score, int min_sim_score,
                                     char *alph, int *mat) {
    score_rec_type *score_rec;
    score_rec = (score_rec_type *) calloc( 1, sizeof( score_rec_type));
    score_rec->alph_len = strlen( alph);
    score_rec->gap_start = gap_start;
    score_rec->gap_cont = gap_cont;
    score_rec->loop_score = loop_score;
    score_rec->match_threshold = match_threshold;
    score_rec->match_cont_factor = match_cont_factor;
    score_rec->alph = alph;
    score_rec->mat = mat;
    score_rec->min_strong_ident_score = min_strong_ident_score;
    score_rec->min_ident_score = min_ident_score;
    score_rec->min_sim_score = min_sim_score;
    return score_rec;
}

void free_score_rec_type( score_rec_type *score_rec) {
    free( score_rec->alph);
    free( score_rec->mat);
    free( score_rec);
}

void free_param_type( param_type *param) {
    free_score_rec_type( param->score_rec);
    free( param->seqs->seqs);
    free( param->seqs);
    free( param->seqs2->seqs);
    free( param->seqs2);
    free( param);
}

void validate_score_rec_type( score_rec_type *score_rec, int min, int max) {
    int i, j;
    if( strlen( score_rec->alph) < 1) {
        die( "No alphabet\n");
    }
    if( strlen( score_rec->alph) != score_rec->alph_len) {
        die( "alph length mismatch %d, %d\n",
             strlen( score_rec->alph), score_rec->alph_len);
    }
    for( i=0; i<score_rec->alph_len; i++) {
        for( j=0; j<score_rec->alph_len; j++) {
            if( score_rec->mat[i*score_rec->alph_len+j] < min )
                die( "Matrix out of range min=%d", min);
            if( score_rec->mat[i*score_rec->alph_len+j] > max )
                die( "Matrix out of range %d max=%d",
                     score_rec->mat[i*score_rec->alph_len+j], max);
        }
    }
    if( score_rec->gap_start > max )
        die( "gap_start out of range max");
    if( score_rec->gap_start < min )
        die( "gap_start out of range min");
    if( score_rec->gap_cont > max )
        die( "gap_cont out of range max");
    if( score_rec->gap_cont < min )
        die( "gap_cont out of range min");
    if( score_rec->loop_score > max )
        die( "loop_score out of range max");
    if( score_rec->loop_score < min )
        die( "loop_score out of range min");
}
void print_mat( char *seq1, char *seq2, int *mat)
{
  int i, j;
  int len1 = strlen( seq1);
  int len2 = strlen( seq2);
  int p_num = 1;   /* 1 = also print the column and row numbers */

  if( p_num == 1) {
    printf( " ");
    for( i=0; i<len1; i++) {
      printf( " %2d ", i);
    }
    printf( "\n ");
  }
  printf( " ");
  for( i=0; i<len1; i++) {
    printf( " %c ", seq1[i]);
  }
  printf( "\n\n");
  for( j=0; j<len2; j++) {
    if( p_num == 1) {
      printf( "%2d %c ", j, seq2[j]);
    } else {
      printf( "%c ", seq2[j]);
    }
    for( i=0; i<len1; i++) {
      printf( "%5d ", mat[j*len1+i]);
    }
    printf( "\n");
  }
}
/******* TOKENIZE A LINE *************/
void addtoken( char* token, tokenarray_type *tokena)
{
  tokena->ntok++;
  if( tokena->ntok > tokena->maxntok ) {
    tokena->maxntok += 100;
    /* The array holds pointers, so each slot is sizeof( char *) bytes. */
    if( !(tokena->tok = realloc( tokena->tok, tokena->maxntok * sizeof( char *))) )
      die( "Failed to realloc tokenarray to %d tokens", tokena->maxntok);
  }
  tokena->tok[ tokena->ntok - 1] = token;
}

tokenarray_type *alloc_tokenarray_type( )
{
  tokenarray_type *tokena;

  tokena = calloc( 1, sizeof( tokenarray_type));
  tokena->maxntok = 30;
  tokena->ntok = 0;
  tokena->tok = calloc( tokena->maxntok, sizeof( char *));
  return tokena;
}

void free_tokenarray_type( tokenarray_type *tokena)
{
  free( tokena->tok);
  free( tokena);
}
/* Splits s in place on spaces and newlines; the tokens point into s. */
void tokenize( char *s, tokenarray_type *tokena)
{
  char *token;

  tokena->ntok = 0;
  token = strtok( s, " \n");
  if( token ) addtoken( token, tokena);
  while( (token = strtok( NULL, " \n")) )
    addtoken( token, tokena);
}
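The tokenizer above relies on strtok splitting a line in place on spaces and newlines. The following is a minimal sketch of that behaviour, assuming a fixed-size output array instead of the growable tokenarray_type; the name split_tokens is hypothetical and does not appear in the listing.

```c
#include <string.h>

/* Hypothetical helper (not part of the listing): split `line` in place on
 * spaces and newlines and store up to `max` token pointers in `out`.
 * Returns the number of tokens stored. The tokens point into `line`. */
int split_tokens(char *line, char **out, int max)
{
    int n = 0;
    char *token = strtok(line, " \n");

    while (token != NULL && n < max) {
        out[n++] = token;
        token = strtok(NULL, " \n");
    }
    return n;
}
```

Because strtok writes terminators into its input, the caller has to pass a writable buffer (for example a char array), not a string literal.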
int tokenarray_is_num_list ( tokenarray_type *tokena) { int i = 0; char *endp; int res = 0; while ( i<tokena->ntok && strtol ( tokena->tok[i] , &endp, 10) == i) { if( *endp != 0 ) res = 0; i++; } if( i == tokena->ntok) res = i; else res = 0; return res; }
char *tokenarray2str( tokenarray_type *tokena)
{
  int i = 0;
  char *alph;

  alph = calloc( tokena->ntok + 1, sizeof( char));
  for( i=0; i<tokena->ntok; i++) {
    if( strlen( tokena->tok[i]) != 1)
      die( "Sorry, the word %s was found where a char was expected,", tokena->tok[i]);
    alph[i] = *tokena->tok[i];
  }
  alph[i] = 0; /* EOL; i equals ntok here, the last allocated byte */
  return alph;
}
int tokenarray2dat( tokenarray_type *tokena, int with_num, matrix_type *mt, char *c)
{
  int i = 0;
  int j = 0;
  char *endp;
  int err = 0;
  int *dat = NULL;   /* stays NULL when the token count is wrong */

  if( with_num ) {
    if( atoi( tokena->tok[i]) != mt->l2) err = 1;
    i++;
  }
  if( strlen( tokena->tok[i]) != 1) err = 2;
  *c = *tokena->tok[i];
  i++;
  if( tokena->ntok - i != mt->l1 )
    err = 3;
  else {
    dat = calloc( mt->l1, sizeof( int));
    for( ; i<tokena->ntok; i++) {
      dat[j] = strtol( tokena->tok[i], &endp, 10);
      if( *endp != 0 ) err = j + 10;
      j++;
    }
  }
  if( !err) {
    /* Make room for the row that is about to be appended. */
    if( mt->l1 * (mt->l2 + 1) > mt->max_size ) {
      while( mt->l1 * (mt->l2 + 1) > mt->max_size )
        mt->max_size += 100;
      if( !(mt->d = realloc( mt->d, mt->max_size * sizeof( int))) )
        die( "Failed to realloc matrix to %d int's", mt->max_size);
    }
    for( j=0; j<mt->l1; j++)
      mt->d[mt->l1 * mt->l2 + j] = dat[j];
    mt->l2++;
  }
  free( dat);
  return err;
}
/*********** i/o score_rec_type ************/
matrix_type *read_mat( FILE *file)
{
  char *s, *res;
  int n = MAX_LINE_LEN;
  int col_name_found = 0;
  int numbers_found = 0;
  tokenarray_type *tokena;
  dynamic_str_type *seq = NULL;
  char c;
  matrix_type *mt;

  mt = calloc( 1, sizeof( matrix_type));
  mt->max_size = 100;
  mt->d = calloc( mt->max_size, sizeof( int));
  mt->l2 = 0;
  tokena = alloc_tokenarray_type( );
  s = calloc( n + 1, sizeof( char));
  seq = new_dynamic_str_type( );
  do {
    res = fgets( s, n + 1, file);
    if( res != NULL ) {
      tokenize( s, tokena);
      /*for(i=0;i<tokena->ntok;i++) printf( "%d %s\n", i, tokena->tok[i]); */
      if( !col_name_found ) {
        /* Is it a column number line? */
        if( !numbers_found && (mt->l1 = tokenarray_is_num_list( tokena))) {
          numbers_found = 1;
        } else {
          /* Is it a column name line? */
          mt->seq1 = tokenarray2str( tokena);
          /* Braces added so that the else pairs with the numbers_found test. */
          if( numbers_found ) {
            if( mt->l1 != strlen( mt->seq1) )
              die( "Expected a string of length %d but found %s", mt->l1, mt->seq1);
          } else {
            mt->l1 = strlen( mt->seq1);
          }
          col_name_found = 1;
        }
      } else {
        /* skip empty lines */
        if( tokena->ntok > 0) {
          /* Is it a data line? */
          if( tokenarray2dat( tokena, numbers_found, mt, &c) == 0)
            seq = addchar( seq, c);
        } else {
          res = NULL;
        }
      }
    }
  } while( res);
  mt->seq2 = calloc( strlen( seq->s) + 1, sizeof( char));
  strcpy( mt->seq2, seq->s);
  free_dynamic_str_type( seq);
  free( s);
  free_tokenarray_type( tokena);
  return mt;
}
void print_score_rec_type( score_rec_type *score_rec)
{
  printf( "alph_len %d\n", score_rec->alph_len);
  printf( "gap_start %d\n", score_rec->gap_start);
  printf( "gap_cont %d\n", score_rec->gap_cont);
  printf( "alph %s\n", score_rec->alph);
  print_mat( score_rec->alph, score_rec->alph, score_rec->mat);
}
void read_score_rec_type( char *fn, score_rec_type *score_rec)
{
  FILE *file;
  matrix_type *mt;

  file = sfopen( fn, "r");
  fscanf( file, "alph_len %d\n", &(score_rec->alph_len));
  fscanf( file, "gap_start %d\n", &(score_rec->gap_start));
  fscanf( file, "gap_cont %d\n", &(score_rec->gap_cont));
  fscanf( file, "min_strong_ident_score %d\n", &(score_rec->min_strong_ident_score));
  fscanf( file, "min_ident_score %d\n", &(score_rec->min_ident_score));
  fscanf( file, "min_sim_score %d\n", &(score_rec->min_sim_score));
  fscanf( file, "alph %s\n", score_rec->alph);
  mt = read_mat( file);
  if( mt->l1 != mt->l2)
    die( "Matrix not quadratic %d %d", mt->l1, mt->l2);
  if( strcmp( mt->seq1, mt->seq2) != 0)
    die( "Matrix sequences not identical %s %s", mt->seq1, mt->seq2);
  if( strcmp( mt->seq1, score_rec->alph) != 0)
    die( "Matrix alphabet differs from alph %s %s", mt->seq1, score_rec->alph);
  if( score_rec->mat ) free( score_rec->mat);
  score_rec->mat = mt->d;
  free( mt->seq1);
  free( mt->seq2);
  free( mt);
  fclose( file);
}
/************* MATRIX *************/
int *seq2idx_seq( char *seq, score_rec_type *score_rec)
{
  char c[2];
  int i, l;
  int *idx_seq;
  int id;

  l = strlen( seq);
  idx_seq = calloc( l, sizeof( int));   /* one index per sequence position */
  for( i=0; i<l; i++) {
    c[0] = seq[i];
    c[1] = 0;
    id = strcspn( score_rec->alph, c);
    if( id >= score_rec->alph_len) {
      fprintf( stderr,
        "Sorry, the character %c was not found in the alphabet %s\n",
        seq[i], score_rec->alph);
      id = score_rec->alph_len - 1;
    }
    idx_seq[i] = id;
  }
  return idx_seq;
}
int matidx( int i, int j, int l1, int l2)
{
  if( i<0 ) die( "matrix index i (%d) less than zero", i);
  if( i>=l1 ) die( "matrix index i (%d) too large (%d) ", i, l1);
  if( j<0 ) die( "matrix index j (%d) less than zero", j);
  if( j>=l2 ) die( "matrix index j (%d) too large (%d) ", j, l2);
  return j*l1 + i;
}

int matsize( int l1, int l2)
{
  return l1*l2;
}

int m( int i, int j, matrix_type *mat)
{
  return mat->d[ matidx( i, j, mat->l1, mat->l2)];
}
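The index helper above stores a two-dimensional matrix in a flat array, with column i of row j at offset j*l1 + i. As a minimal sketch of that layout, assuming assert in place of the listing's die function; the names flat_index and make_demo_matrix are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as matidx() in the listing: column i, row j, row length l1.
 * Range errors are caught with assert() here instead of die(). */
int flat_index(int i, int j, int l1, int l2)
{
    assert(i >= 0 && i < l1);
    assert(j >= 0 && j < l2);
    return j * l1 + i;
}

/* Hypothetical demo: allocate an l1 x l2 matrix and store 10*j + i in each
 * cell so that the layout can be inspected. The caller frees the result. */
int *make_demo_matrix(int l1, int l2)
{
    int *d = calloc((size_t)l1 * (size_t)l2, sizeof(int));

    if (d == NULL)
        return NULL;
    for (int j = 0; j < l2; j++)
        for (int i = 0; i < l1; i++)
            d[flat_index(i, j, l1, l2)] = 10 * j + i;
    return d;
}
```

With l1 = 4 and l2 = 3, the cell at column 2 of row 1 lands at offset 6, which is why the alignment code can address mat and path with expressions such as j*l1+i.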
void print_annot( char *seq, char *annot)
{
  printf( "%s\n", seq);
  printf( "%s\n", annot);
}

char match_sym( char a, char b, score_rec_type *score_rec)
{
  int i1, i2;
  char *s;
  int score;

  s = strchr( score_rec->alph, a);
  if( s == NULL) return ' ';
  i1 = s - score_rec->alph;
  s = strchr( score_rec->alph, b);
  if( s == NULL) return ' ';
  i2 = s - score_rec->alph;
  score = score_rec->mat[ i2*score_rec->alph_len+i1];
  if( score >= score_rec->min_strong_ident_score )
    return '|';
  else if( score >= score_rec->min_ident_score )
    return ':';
  else if( score >= score_rec->min_sim_score )
    return '.';
  return ' ';
}
/************* ALIGN - self association ****************/
void align_global_traceback( FILE *outfile, param_type *param, char *seq1, char *seq2, int *path)
{
  int l1 = strlen( seq1);
  int l2 = strlen( seq2);
  int done = 0;
  int is = 0;
  int i, j;
  int i1, i2;
  char *qseq, *mseq, *tseq;

  if( l1>0 && l2>0 ) {
    qseq = calloc( l1+l2+1, sizeof( char));
    mseq = calloc( l1+l2+1, sizeof( char));
    tseq = calloc( l1+l2+1, sizeof( char));
    i = l1 - 1;
    j = l2 - 1;
    i1 = l1 - 1;
    i2 = l2 - 1;
    do {
      if( verbose ) printf( "%d %d\n", i, j);
      if( path[j*l1+i] == 0) {
        if( i1 >= 0 ) {
          qseq[is] = seq1[i1];
          i1--;
        } else {
          qseq[is] = '-';
          mseq[is] = ' ';
        }
        if( i2 >= 0 ) {
          tseq[is] = seq2[i2];
          i2--;
        } else {
          tseq[is] = '-';
          mseq[is] = ' ';
        }
        /*
        if( tseq[is] == qseq[is] ) { mseq[is] = ':'; } else { mseq[is] = ' '; }
        */
        mseq[is] = match_sym( tseq[is], qseq[is], param->score_rec);
        is++;
        i--;
        j--;
        if( i<0 && j<0 ) {
          done = 1;
        }
      } else if( path[j*l1+i] == 2) {
        qseq[is] = '-';
        mseq[is] = ' ';
        tseq[is] = seq2[i2];
        i2--;
        is++;
        j--;
      } else if( path[j*l1+i] == 1) {
        qseq[is] = seq1[i1];
        i1--;
        mseq[is] = ' ';
        tseq[is] = '-';
        is++;
        i--;
      }
      if( i < 0 ) i = 0;
      if( j < 0 ) j = 0;
      if( verbose )
        printf( "%d %d %c %c %c\n", i, j, qseq[is-1], mseq[is-1], tseq[is-1]);
    } while( !done);
    qseq[is] = 0;
    mseq[is] = 0;
    tseq[is] = 0;
    fprintf( outfile, "%s\n", reverse( qseq));
    fprintf( outfile, "%s\n", reverse( mseq));
    fprintf( outfile, "%s\n", reverse( tseq));
    free( qseq);
    free( mseq);
    free( tseq);
  }
}
int align_local_traceback( FILE *outfile, param_type *param, int l1, int l2,
    int *idx_seq1, int *idx_seq2, char *mask_seq1, char *mask_seq2,
    char *seq1, char *seq2, int *mat, int *path)
{
  stack_type *stack;
  int i, j, k;
  pair_type pair;
  char *annot;
  matrix_type *mt;
  int score, maxscore, firstscore=0, thescore=INT_MAX, max_i, max_j;
  int n_hyp = 0;
  char *outstr[3];
  char *qseq, *mseq, *tseq;

  stack = init_stack( STACK_CHUNK);
  mt = calloc( 1, sizeof( matrix_type));
  mt->d = mat;
  mt->l1 = l1;
  mt->l2 = l2;
  qseq = calloc( l1+l2+1, sizeof( char));
  mseq = calloc( l1+l2+1, sizeof( char));
  tseq = calloc( l1+l2+1, sizeof( char));
  while( thescore >= param->min_score && n_hyp < param->max_res) {
    /* Find the best scoring cell */
    maxscore = INT_MIN;
    for( j=0; j<l2; j++) {
      for( i=0; i<l1; i++) {
        score = mat[ matidx( i, j, l1, l2)];
        if( score > maxscore) {
          maxscore = score;
          max_i = i;
          max_j = j;
        }
      }
    }
    if( !firstscore && maxscore >= param->min_score) firstscore = maxscore;
    push( stack, setpair( max_i, max_j));
    thescore = mat[ matidx( max_i, max_j, l1, l2)];
    mat[ matidx( max_i, max_j, l1, l2)] = 1;
    /* Trace back */
    k = 0;
    while( stack->sp > 0 ) {
      pair = pop( stack);
      if(0){ printf( "pop :"); print_pair( pair);}
      i = pair.i;
      j = pair.j;
      if( path[ matidx( i, j, l1, l2)] == 1 ) {
        if( i>0 && mat[ matidx( i-1, j, l1, l2)] > 0 ) {
          push( stack, setpair( i-1, j));
        }
        qseq[k] = seq1[i];
        mseq[k] = ' ';
        tseq[k] = '-';
        k++;
      } else if( path[ matidx( i, j, l1, l2)] == 2 ) {
        if( j>0 && mat[ matidx( i, j-1, l1, l2)] > 0 ) {
          push( stack, setpair( i, j-1));
        }
        qseq[k] = '-';
        mseq[k] = ' ';
        tseq[k] = seq2[j];
        k++;
      } else if( path[ matidx( i, j, l1, l2)] == 0 ) {
        if( i>0 && j>0 && mat[ matidx( i-1, j-1, l1, l2)] > 0 ) {
          push( stack, setpair( i-1, j-1));
        }
        qseq[k] = seq1[i];
        tseq[k] = seq2[j];
        mseq[k] = match_sym( tseq[k], qseq[k], param->score_rec);
        k++;
        if(0) printf( " ( ) %d %d\n", j, i);
      }
    }
    if( thescore >= param->min_score ) {
      n_hyp++;
      if( n_hyp <= param->max_res) {
        qseq[k] = 0;
        mseq[k] = 0;
        tseq[k] = 0;
        outstr[0] = reverse( qseq);
        outstr[1] = reverse( mseq);
        outstr[2] = reverse( tseq);
        if( outfile != NULL ) {
          fprintf( outfile, "Score= %d\n", thescore);
          print_lines( outfile, outstr, 3);
        }
      }
    }
    annot = empty_str( l1);
    free( annot);
  }
  free( mt);
  free( qseq);
  free( mseq);
  free( tseq);
  free_stack( stack);
  return firstscore;
}
void out_align_fill( param_type *param, int l1, int l2, int *idx_seq1, int *idx_seq2,
    int *mat, int *path, char *seq1, char *seq2)
{
  int i, j, k, max_path;
  int score, max_score;
  int h_size = 3;
  int gap_score;
  int match_factor;

  /*
     0 = match or mismatch, go diagonally
     1 = insertion in query, go horizontally
     2 = gap in query, go vertically
  */
  for( j=0; j<l2; j++) {
    for( i=0; i<l1; i++) {
      if(1)printf( "%d %d\n", i, j);
      if( path[ matidx( i, j, l1, l2)] == -1 )
        gap_score = param->score_rec->gap_cont;
      else
        gap_score = param->score_rec->gap_start;
      max_score = mat[ matidx( i, j, l1, l2)] - gap_score;
      max_path = -1;
      if( path[ matidx( i, j+1, l1, l2)] == -2 )
        gap_score = param->score_rec->gap_cont;
      else
        gap_score = param->score_rec->gap_start;
      score = mat[ matidx( i, j+1, l1, l2)] - gap_score;
      if( score > max_score) { max_score = score; max_path = -2;}
      if( i-j <= h_size ) {
        score = param->score_rec->loop_score;
      } else {
        if( path[ matidx( i-1, j+1, l1, l2)] == 0 )
          match_factor = 1;
        else
          match_factor = param->score_rec->match_cont_factor;
        score = mat[ matidx( i-1, j+1, l1, l2)] + match_factor *
          param->score_rec->mat[ idx_seq1[i] * param->score_rec->alph_len + idx_seq2[j]];
      }
      if(0) {
        printf( "%d %d %d %d %d %d %d\n", i, j, idx_seq1[i], idx_seq2[j],
          param->score_rec->alph_len,
          param->score_rec->mat[ idx_seq1[i] * param->score_rec->alph_len + idx_seq2[j]],
          idx_seq1[i] * param->score_rec->alph_len + idx_seq2[j]);
      }
      if( score > max_score) { max_score = score; max_path = 0;}
      for( k=j+1; k<i; k++) {
        score = mat[ matidx( i, k+1, l1, l2)] + mat[ matidx( k, j, l1, l2)]
          - param->score_rec->gap_start;
        if( score > max_score) { max_score = score; max_path = k; }
        if(0)printf( "%d %d %d %d %d\n", i, k, k, j, score);
      }
      if( max_score > 0 ) {
        mat[ matidx( i, j, l1, l2)] = max_score;
      } else {
        mat[ matidx( i, j, l1, l2)] = 0;
      }
      path[ matidx( i, j, l1, l2)] = max_path;
      if( 0 ) {
        printf( "%c%c %3d %3d score: %3d path: %3d\n", seq1[i], seq2[j], i, j, max_score, max_path);
      }
    }
  }
}
void align_fill( param_type *param, int l1, int l2, int *idx_seq1, int *idx_seq2,
    int *mat, int *path, char *seq1, char *seq2)
{
  int i, j;
  char c1[2], c2[2];
  int i1, i2;
  int sm, si, sd, max_s;
  int dir;

  if( verbose ) printf( "%d %d\n", l1, l2);
  /*
     0 = match or mismatch, go diagonally
     1 = insertion in query, go horizontally
     2 = gap in query, go vertically
  */
  for( j=0; j<l2; j++) {
    c2[0] = seq2[j];
    c2[1] = 0;
    i2 = strcspn( param->score_rec->alph, c2);
    for( i=0; i<l1; i++) {
      c1[0] = seq1[i];
      c1[1] = 0;
      i1 = strcspn( param->score_rec->alph, c1);

      /* sm: score when arriving diagonally (match or mismatch) */
      if( i > 0 && j > 0) {
        sm = mat[(j-1)*l1+(i-1)];
      } else {
        if( i > 0 ) {
          sm = param->score_rec->gap_start + (i-1) * param->score_rec->gap_cont;
        } else if( j > 0 ) {
          sm = param->score_rec->gap_start + (j-1) * param->score_rec->gap_cont;
        }
        sm = 0;   /* border cells start from zero (local alignment) */
      }
      sm += param->score_rec->mat[ i2*param->score_rec->alph_len+i1];

      /* si: score when arriving horizontally */
      /*if( (i > 0) && (path[j*l1+(i-1)] > 0) ) { */
      if( i > 0 ) {
        si = mat[j*l1+(i-1)];
      } else {
        si = 0;
      }
      if( (i>0 && path[j*l1+(i-1)] > 0) || (j>0 && path[(j-1)*l1+i] > 0)) {
        si -= param->score_rec->gap_cont;
      } else {
        si -= param->score_rec->gap_start;
      }

      /* sd: score when arriving vertically */
      /*if( (j > 0) && (path[(j-1)*l1+i] > 0) ) { */
      if( j > 0 ) {
        sd = mat[(j-1)*l1+i];
      } else {
        sd = 0;
      }
      sd -= param->score_rec->gap_cont;

      max_s = sm;
      dir = 0;
      if( si > max_s) {
        max_s = si;
        dir = 1;
      }
      if( sd > max_s) {
        max_s = sd;
        dir = 2;
      }
      mat[j*l1+i] = MAX( max_s, 0);
      path[j*l1+i] = dir;
      /*printf( "%s %s %d %d %d %d\n", c1, c2, i1, i2, score, mat[j*l1+i]);*/
    }
  }
  if( 0) {
    print_mat( seq1, seq2, mat);
    printf( "\n");
    print_mat( seq1, seq2, path);
    printf( "\n");
  }
  /* printf( "score= %d\n", mat[l1*l2-1]); */
}
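The fill routine above is a local-alignment recurrence: each cell takes the best of a diagonal, a horizontal and a vertical predecessor and is clamped at zero. The following minimal sketch illustrates that recurrence under simplifying assumptions that are not in the listing: identical characters score match, all others mismatch (the listing instead looks up a hybridization matrix), and a single linear gap penalty replaces gap_start and gap_cont. The name local_align_score is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Hypothetical simplified scorer (not the listing's align_fill): returns the
 * best local alignment score of a against b, or -1 if allocation fails. */
int local_align_score(const char *a, const char *b,
                      int match, int mismatch, int gap)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t w = la + 1;              /* row width, including a zero border */
    int best = 0;
    int *h = calloc((la + 1) * (lb + 1), sizeof(int));

    if (h == NULL)
        return -1;
    for (size_t j = 1; j <= lb; j++) {
        for (size_t i = 1; i <= la; i++) {
            int sub = (a[i - 1] == b[j - 1]) ? match : mismatch;
            int s = h[(j - 1) * w + (i - 1)] + sub;   /* diagonal */
            s = max2(s, h[j * w + (i - 1)] - gap);    /* horizontal */
            s = max2(s, h[(j - 1) * w + i] - gap);    /* vertical */
            s = max2(s, 0);                           /* local: never negative */
            h[j * w + i] = s;
            best = max2(best, s);
        }
    }
    free(h);
    return best;
}
```

Clamping at zero is what makes the alignment local: a poorly matching prefix cannot drag down a well matching region that follows it.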
int align( FILE *outfile, param_type *param, char *seq1, char *seq2)
{
  int l1, l2;
  int *mat, *path;
  char *mask_seq1, *mask_seq2;
  int *idx_seq1 = NULL, *idx_seq2 = NULL;   /* passed on but not used */
  int score = INT_MAX;

  l1 = strlen( seq1);
  l2 = strlen( seq2);
  mat = calloc( matsize( l1, l2), sizeof( int));
  path = calloc( matsize( l1, l2), sizeof( int));
  mask_seq1 = calloc( l1 + 1, sizeof( char));
  mask_seq2 = calloc( l2 + 1, sizeof( char));
  strcpy( mask_seq1, seq1);
  strcpy( mask_seq2, seq2);
  if(0) {
    print_mat( seq1, seq2, mat);
    printf( "\n");
    if(1) print_mat( seq1, seq2, path);
    printf( "\n");
  }
  align_fill( param, l1, l2, idx_seq1, idx_seq2, mat, path, mask_seq1, mask_seq2);
  score = align_local_traceback( outfile, param, l1, l2, idx_seq1, idx_seq2,
    mask_seq1, mask_seq2, seq1, seq2, mat, path);
  free( mat);
  free( path);
  free( mask_seq1);
  free( mask_seq2);
  return score;
}
/******* Nussinov - secondary structure prediction ******/
int nussinov_local_traceback( FILE *outfile, param_type *param, int depth, int l,
    int *idx_seq, char *mask_seq, char *seq, int *mat, int *path)
{
  stack_type *stack;
  int i, j, k;
  pair_type pair;
  char *annot;
  matrix_type *mt;
  char *mask;
  int score, maxscore, thescore=INT_MAX, max_i, max_j;
  int maxpath;
  int overlap = 0;
  int n_hyp = 0;
  char *outstr[2];

  stack = init_stack( STACK_CHUNK);
  mt = calloc( 1, sizeof( matrix_type));
  mt->d = mat;
  mt->l1 = l;
  mt->l2 = l;
  annot = empty_str( l);
  mask = empty_str( l);
  while( thescore >= param->min_score && n_hyp < param->max_res) {
    maxscore = INT_MIN;
    for( j=0; j<l; j++) {
      if( mask[j] == ' ' ) {
        for( i=j; i<depth + j + 1 && i<l; i++) {
          if( mask[i] == ' ' ) {
            score = mat[ matidx( i, j, l, l)];
            if( score > maxscore) {
              maxscore = score;
              max_i = i;
              max_j = j;
            }
          }
        }
      }
    }
    push( stack, setpair( max_i, max_j));
    mask[max_i] = 's';
    mask[max_j] = 's';
    /* Trace back */
    while( stack->sp > 0 ) {
      pair = pop( stack);
      if(0){ printf( "pop :"); print_pair( pair);}
      i = pair.i;
      j = pair.j;
      if( i > j ) {
        if( path[ matidx( i, j, l, l)] == -1 ) {
          if( (overlap || mask[i-1] == ' ') && mat[ matidx( i-1, j, l, l)] > 0) {
            push( stack, setpair( i-1, j));
          }
          mask[i] = 'i';
        } else if( path[ matidx( i, j, l, l)] == -2 ) {
          if( (overlap || mask[j+1] == ' ') && mat[ matidx( i, j+1, l, l)] > 0) {
            push( stack, setpair( i, j+1));
          }
          mask[j] = 'd';
        } else if( path[ matidx( i, j, l, l)] == 0 ) {
          if( (overlap || (mask[i-1] == ' ' && mask[j+1] == ' '))
              && mat[ matidx( i-1, j+1, l, l)] > 0 ) {
            push( stack, setpair( i-1, j+1));
          }
          if( param->score_rec->match_threshold <=
              param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len + idx_seq[j]]) {
            annot[i] = ')';
            annot[j] = '(';
          } else {
            annot[i] = '.';
            annot[j] = '.';
          }
          /* The mask character is illegible in the source text; '-', the
             last symbol of the alphabet, is assumed here. */
          mask_seq[i] = mask_seq[j] = '-';
          mask[i] = 'm';
          mask[j] = 'm';
          if(0)printf( " ( ) %d %d\n", j, i);
        } else {
          k = path[ matidx( i, j, l, l)];
          push( stack, setpair( k, j));
          push( stack, setpair( i, k+1));
        }
      }
    }
    if(0) {
      /* Trace forward */
      push( stack, setpair( max_i, max_j));
      while( stack->sp > 0 ) {
        pair = pop( stack);
        if(1){ printf( "pop :"); print_pair( pair);}
        i = pair.i;
        j = pair.j;
        if( i<depth + j && i<l-1 && j>0 ) {
          maxscore = 0;
          maxpath = -3;
          if( path[ matidx( i+1, j, l, l)] == -1 &&
              ( overlap || mask[i+1] == ' ' ) &&
              mat[ matidx( i+1, j, l, l)] >= maxscore ) {
            maxscore = mat[ matidx( i+1, j, l, l)];
            maxpath = -1;
          }
          if( path[ matidx( i, j-1, l, l)] == -2 &&
              ( overlap || mask[j-1] == ' ' ) &&
              mat[ matidx( i, j-1, l, l)] >= maxscore ) {
            maxscore = mat[ matidx( i, j-1, l, l)];
            maxpath = -2;
          }
          if( path[ matidx( i+1, j-1, l, l)] == 0 &&
              ( overlap || (mask[i+1] == ' ' && mask[j-1] == ' ')) &&
              mat[ matidx( i+1, j-1, l, l)] >= maxscore ) {
            maxscore = mat[ matidx( i+1, j-1, l, l)];
            maxpath = 0;
          }
          if( path[ matidx( i+1, j-1, l, l)] > 0 &&
              mat[ matidx( i+1, j-1, l, l)] >= maxscore ) {
            maxscore = mat[ matidx( i+1, j-1, l, l)];
            maxpath = path[ matidx( i+1, j-1, l, l)];
          }
          if( maxpath == -1 ) {
            push( stack, setpair( i+1, j));
            mask[i+1] = 'I';
          } else if( maxpath == -2 ) {
            push( stack, setpair( i, j-1));
            mask[j-1] = 'D';
          } else if( maxpath == 0 ) {
            push( stack, setpair( i+1, j-1));
            if( param->score_rec->match_threshold <=
                param->score_rec->mat[ idx_seq[i+1] * param->score_rec->alph_len + idx_seq[j-1]]) {
              annot[i+1] = ')';
              annot[j-1] = '(';
            } else {
              annot[i+1] = '.';
              annot[j-1] = '.';
            }
            mask[i+1] = 'M';
            mask[j-1] = 'M';
            if(0) printf( " ( ) %d %d\n", j, i);
          } else if( maxpath > 0 ) {
            k = maxpath;
            push( stack, setpair( k, j-1));
            push( stack, setpair( i+1, k+1));
          }
        }
      }
    }
    thescore = mat[ matidx( max_i, max_j, l, l)];
    if( thescore >= param->min_score ) {
      n_hyp++;
      if( n_hyp <= param->max_res ) {
        if(0)printf( "%s\n", seq);
        if(0)printf( "%s\n", annot);
        outstr[0] = seq;
        outstr[1] = annot;
        if( outfile != NULL ) {
          fprintf( outfile, "Score= %d\n", thescore);
          print_lines( outfile, outstr, 2);
        }
      }
    }
    free( annot);
    annot = empty_str( l);
  }
  free( mt);
  free( annot);
  free( mask);
  free_stack( stack);
  return thescore;
}
char *nussinov_traceback( FILE *outfile, param_type *param, int depth, int l,
    int *idx_seq, char *seq, int *mat, int *path)
{
  stack_type *stack;
  int i, j, k, d;
  pair_type pair;
  char *annot;
  matrix_type *mt;
  int gap_score1, gap_score2;

  stack = init_stack( STACK_CHUNK);
  mt = calloc( 1, sizeof( matrix_type));
  mt->d = mat;
  mt->l1 = l;
  mt->l2 = l;
  annot = empty_str( l);
  for( d=depth; d<l; d++) {
    push( stack, setpair( d, d-depth));
    while( stack->sp > 0 ) {
      pair = pop( stack);
      if(0){ printf( "pop :"); print_pair( pair);}
      i = pair.i;
      j = pair.j;
      if( i > j ) {
        if( path[ matidx( i, j, l, l)] == -1 ) {
          push( stack, setpair( i-1, j));
        } else if( path[ matidx( i, j, l, l)] == -2 ) {
          push( stack, setpair( i, j+1));
        } else if( path[ matidx( i, j, l, l)] == 0 ) {
          if( param->score_rec->match_threshold <=
              param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len + idx_seq[j]]) {
            annot[i] = ')';
            annot[j] = '(';
          } else {
            annot[i] = '.';
            annot[j] = '.';
          }
          if(0)printf( " ( ) %d %d\n", j, i);
          push( stack, setpair( i-1, j+1));
        } else {
          k = path[ matidx( i, j, l, l)];
          push( stack, setpair( k, j));
          push( stack, setpair( i, k+1));
        }
        if(0) {
          /* Disabled alternative: re-derive the path from the score matrix */
          if( path[ matidx( i-1, j, l, l)] == -1 )
            gap_score1 = param->score_rec->gap_cont;
          else
            gap_score1 = param->score_rec->gap_start;
          if( path[ matidx( i, j+1, l, l)] == -2 )
            gap_score2 = param->score_rec->gap_cont;
          else
            gap_score2 = param->score_rec->gap_start;
          if( m( i-1, j, mt) == m( i, j, mt) - gap_score1) {
            if(0)printf( " mat: -1 %d %d\n", m( i-1, j, mt), m( i, j, mt));
            push( stack, setpair( i-1, j));
          } else if( m( i, j+1, mt) == m( i, j, mt) - gap_score2) {
            if(0)printf( " mat: -2 %d %d\n", m( i-1, j, mt), m( i, j, mt));
            push( stack, setpair( i, j+1));
          } else if( (mat[ matidx( i-1, j+1, l, l)] +
              param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len + idx_seq[j]])
              == mat[ matidx( i, j, l, l)]) {
            if(0)printf( " mat: 0 %d %d\n", m( i-1, j, mt), m( i, j, mt));
            annot[i] = ')';
            annot[j] = '(';
            if(0)printf( " ( ) %d %d\n", j, i);
            push( stack, setpair( i-1, j+1));
          } else {
            for( k=j+1; k<i; k++) {
              if(0)printf( " %d %d %d %d\n", i, k, k, j);
              if( mat[ matidx( i, k+1, l, l)] + mat[ matidx( k, j, l, l)]
                  == mat[ matidx( i, j, l, l)]) {
                if(0)printf( " mat: %d %d %d\n", k, m( i-1, j, mt), m( i, j, mt));
                push( stack, setpair( k, j));
                push( stack, setpair( i, k+1));
                break;
              }
            }
          }
        }
      }
    }
    fprintf( outfile, "Score = %d\n", mat[ matidx( d, d-depth, l, l)]);
    printf( "%d>%s\n", d, seq);
    printf( "%d>%s\n", d, annot);
    free( annot);
    annot = empty_str( l);
  }
  free_stack( stack);
  return annot;
}
void nussinov_fill( param_type *param, int depth, int l, int *idx_seq, int *mat, int *path, char *seq)
{
  int d;
  int i, j, k, max_path;
  int score, max_score;
  int h_size = 3;
  int gap_score;
  int match_factor;

  for( d=0; d<depth; d++) {
    for( i=d+1; i<l; i++) {
      if(0)printf( "%d %d %d\n", d, i, depth);
      j = i - 1 - d;
      /* path -1: position i left unpaired */
      if( path[ matidx( i-1, j, l, l)] == -1 )
        gap_score = param->score_rec->gap_cont;
      else
        gap_score = param->score_rec->gap_start;
      max_score = mat[ matidx( i-1, j, l, l)] - gap_score;
      max_path = -1;
      /* path -2: position j left unpaired */
      if( path[ matidx( i, j+1, l, l)] == -2 )
        gap_score = param->score_rec->gap_cont;
      else
        gap_score = param->score_rec->gap_start;
      score = mat[ matidx( i, j+1, l, l)] - gap_score;
      if( score > max_score) { max_score = score; max_path = -2;}
      /* path 0: i pairs with j, unless the enclosed loop is too short */
      if( i-j <= h_size ) {
        score = param->score_rec->loop_score;
      } else {
        if( path[ matidx( i-1, j+1, l, l)] == 0 )
          match_factor = 1;
        else
          match_factor = param->score_rec->match_cont_factor;
        score = mat[ matidx( i-1, j+1, l, l)] + match_factor *
          param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len + idx_seq[j]];
      }
      if(0)printf( "%d %d %d %d %d %d %d\n", i, j, idx_seq[i], idx_seq[j],
        param->score_rec->alph_len,
        param->score_rec->mat[ idx_seq[i] * param->score_rec->alph_len + idx_seq[j]],
        idx_seq[i] * param->score_rec->alph_len + idx_seq[j]);
      if( score > max_score) { max_score = score; max_path = 0;}
      /* path k > 0: bifurcation into two substructures split at k */
      for( k=j+1; k<i; k++) {
        score = mat[ matidx( i, k+1, l, l)] + mat[ matidx( k, j, l, l)]
          - param->score_rec->gap_start;
        if( score > max_score) { max_score = score; max_path = k; }
        if(0)printf( "%d %d %d %d %d\n", i, k, k, j, score);
      }
      if( max_score > 0 ) {
        mat[ matidx( i, j, l, l)] = max_score;
      } else {
        mat[ matidx( i, j, l, l)] = 0;
      }
      path[ matidx( i, j, l, l)] = max_path;
      if(0)printf( "%c%c %3d %3d score: %3d path: %3d\n", seq[i], seq[j], i, j, max_score, max_path);
    }
  }
}
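The fill routine above follows the Nussinov scheme: a cell covering a subsequence is derived from leaving one end unpaired, pairing both ends, or splitting the subsequence in two. The following minimal sketch shows the textbook form of that recurrence under assumptions that differ from the listing: it simply counts Watson-Crick pairs (a-t, c-g) instead of summing matrix scores, and it has no gap, loop or continuation terms. The names can_pair and nussinov_pairs are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

static int can_pair(char x, char y)
{
    return (x == 'a' && y == 't') || (x == 't' && y == 'a') ||
           (x == 'c' && y == 'g') || (x == 'g' && y == 'c');
}

/* Hypothetical sketch: maximum number of nested complementary pairs in seq,
 * where positions i < j may pair only if j - i > min_loop.
 * Returns -1 if allocation fails. */
int nussinov_pairs(const char *seq, int min_loop)
{
    int n = (int)strlen(seq);
    int result;
    int *dp;

    if (n < 2)
        return 0;
    dp = calloc((size_t)n * (size_t)n, sizeof(int));
    if (dp == NULL)
        return -1;
    for (int span = 1; span < n; span++) {
        for (int i = 0; i + span < n; i++) {
            int j = i + span;
            int best = dp[(i + 1) * n + j];            /* i unpaired */
            if (dp[i * n + (j - 1)] > best)            /* j unpaired */
                best = dp[i * n + (j - 1)];
            if (j - i > min_loop && can_pair(seq[i], seq[j])) {
                int inner = (i + 1 <= j - 1) ? dp[(i + 1) * n + (j - 1)] : 0;
                if (inner + 1 > best)                  /* i pairs with j */
                    best = inner + 1;
            }
            for (int k = i + 1; k < j; k++) {          /* bifurcation */
                int split = dp[i * n + k] + dp[(k + 1) * n + j];
                if (split > best)
                    best = split;
            }
            dp[i * n + j] = best;
        }
    }
    result = dp[n - 1];   /* cell for the whole sequence, i = 0, j = n-1 */
    free(dp);
    return result;
}
```

The min_loop argument plays the role that h_size plays in the listing: it prevents hairpins whose loop is too short to form.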
int nussinov( FILE *outfile, param_type *param, char *seq)
{
  int l, i;
  int *mat, *path;
  int *idx_seq = NULL;
  int score = INT_MAX;
  char *mask_seq;
  int max_score = 0;
  int depth = param->depth;

  l = strlen( seq);
  if( depth > l-1 ) { depth = l-1;}
  mat = calloc( matsize( l, l), sizeof( int));
  path = calloc( matsize( l, l), sizeof( int));
  mask_seq = calloc( l+1, sizeof( char));
  strcpy( mask_seq, seq);
  if(0) {
    print_mat( seq, seq, mat);
    printf( "\n");
    if(0)print_mat( seq, seq, path);
    printf( "\n");
  }
  if( 0) {
    idx_seq = seq2idx_seq( seq, param->score_rec);
    nussinov_fill( param, depth, l, idx_seq, mat, path, seq);
    nussinov_traceback( outfile, param, depth, l, idx_seq, seq, mat, path);
  } else {
    i = 0;
    while( score > param->min_score && i < param->max_res) {
      free( idx_seq);   /* release the index array of the previous pass */
      idx_seq = seq2idx_seq( mask_seq, param->score_rec);
      nussinov_fill( param, depth, l, idx_seq, mat, path, mask_seq);
      score = nussinov_local_traceback( outfile, param, depth, l, idx_seq,
        mask_seq, seq, mat, path);
      i++;
      if( score > max_score) max_score = score;
    }
    if( outfile != NULL ) {
      printf( "Mask:\n");
      if( score >= param->min_score) print_lines( outfile, &mask_seq, 1);
    }
  }
  free( mat);
  free( path);
  free( idx_seq);
  free( mask_seq);
  return max_score;
}
/********* OLIGO ******************/
void calc_olig( int id, char *seq, char *rev_seq, param_type *param)
{
  int score_self, score_target, score_struc;

  /* Self-annealing: the oligo against its own reverse */
  strcpy( rev_seq, seq);
  reverse( rev_seq);
  score_self = align( NULL, param, seq, rev_seq);
  /* Target duplex: lower-cased, complemented and reversed once more */
  lower( rev_seq);
  comp( rev_seq);
  reverse( rev_seq);
  score_target = align( NULL, param, seq, rev_seq);
  /* Secondary structure of the oligo itself */
  score_struc = nussinov( NULL, param, seq);
  printf( "%7d %s %7d %7d %7d\n", id, seq, score_self, score_target, score_struc);
}
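The routine above prepares its alignment partners with the helpers reverse, lower and comp, whose definitions are not shown in this part of the listing. As a minimal sketch of the underlying idea, assuming plain Watson-Crick complementation and keeping the case of each base (the listing lower-cases first), an in-place reverse complement could look as follows; the names complement_base and reverse_complement are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: Watson-Crick complement of one base; characters
 * outside acgtACGT are returned unchanged. */
static char complement_base(char c)
{
    switch (c) {
    case 'a': return 't';
    case 't': return 'a';
    case 'c': return 'g';
    case 'g': return 'c';
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return c;
    }
}

/* Hypothetical helper: reverse-complement the string s in place. */
void reverse_complement(char *s)
{
    size_t n = strlen(s);

    for (size_t i = 0; i < n / 2; i++) {
        char tmp = complement_base(s[i]);
        s[i] = complement_base(s[n - 1 - i]);
        s[n - 1 - i] = tmp;
    }
    if (n % 2 == 1)   /* the middle base of an odd-length string */
        s[n / 2] = complement_base(s[n / 2]);
}
```

Applied to "aacg" this yields "cgtt", the strand that would pair with the input when read in the opposite direction.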
void allolig( FILE *outfile, int oligolen, int oligosample, int spikefreq, param_type *param)
{
  char alph[255] = "acgtACGT";
  long n;
  char *seq, *rev_seq;
  long j;

  rev_seq = empty_str( oligolen);
  param->min_score = 2;
  param->max_res = 1;
  if( oligosample ) {
    /* Score a random sample of oligos */
    for( j=0; j<oligosample; j++) {
      seq = make_random_nmer( oligolen, spikefreq, alph);
      calc_olig( j, seq, rev_seq, param);
      free( seq);
    }
  } else {
    /* Score every possible oligo of the given length */
    n = n_nmer( oligolen, alph);
    for( j=0; j<n; j++) {
      seq = make_nmer( j, oligolen, alph);
      calc_olig( j, seq, rev_seq, param);
      free( seq);
    }
  }
  free( rev_seq);
}
/******* ARGUMENT PARSING AND INITIALIZATION **************/
void add_seq2seqs ( char *seq, sequences_type *seqs) { int new_size, i; if ( seqs->nseq < seqs->maxnseq ) { seqs->seqs [ seqs->nseq++] = seq; } else { new_size = (seqs->maxnseq + SEQS_CHUNK) * sizeof ( char *); if( seqs->seqs = realloc ( seqs->seqs, new_size) ) { seqs->maxnseq = seqs->maxnseq + SEQS_CHUNK; for ( i=seqs->nseq; i<seqs->maxnseq; i++) seqs->seqs [i] = NULL; seqs->seqs [ seqs->nseq++] = seq; } else { die ( "Failed to realloc seq array to %d bytes", new_size) ; } } }
void read_fasta( FILE *seqfile, sequences_type *seqs, char *alph)
{
  int c;
  char name[255];
  char comment[255];
  char s[255];
  int seq_flag = 0;
  dynamic_str_type *seq = NULL;

  while( (c = fgetc( seqfile)) != EOF) {
    if( c == '>') {
      /* A new header line: store the sequence read so far */
      if( seq != NULL && seq->len > 0 )
        add_seq2seqs( seq->s, seqs);
      free( seq);
      seq = new_dynamic_str_type( );
      fgets( s, 254, seqfile);
      sscanf( s, "%s%s", name, comment);
      seq_flag = 1;
    } else if( seq_flag == 1 ) {
      /* Keep only characters that belong to the alphabet */
      if( strchr( alph, c) != NULL ) {
        seq = addchar( seq, c);
      }
    }
  }
  if( seq != NULL && seq->len > 0 )
    add_seq2seqs( seq->s, seqs);
  free( seq);
}
param_type *init( )
{
  char a[255];
  int i, j;
  param_type *param;
  int *mat;
#define ALPHLEN 9
  /* Rows and columns follow the alphabet "acgtACGT-" */
  int mathyp_init[ALPHLEN][ALPHLEN] = {
    { -3,  -3,  -3,   3,  -5,  -5,  -5,   5, -99},
    { -3,  -3,   5,  -3,  -5,  -5,   9,  -5, -99},
    { -3,   5,  -3,   1,  -5,   9,  -5,   2, -99},
    {  3,  -3,   1,  -3,   5,  -5,   2,  -5, -99},
    { -5,  -5,  -5,   5,  -8,  -8,  -8,   8, -99},
    { -5,  -5,   9,  -5,  -8,  -8,  14,  -8, -99},
    { -5,   9,  -5,   2,  -8,  14,  -8,   4, -99},
    {  5,  -5,   2,  -5,   8,  -8,   4,  -8, -99},
    {-99, -99, -99, -99, -99, -99, -99, -99, -99}
  };

  param = calloc( 1, sizeof( param_type));

  /* alph */
  strcpy( a, "acgtACGT-");
  param->alph = calloc( strlen( a) + 1, sizeof( char));   /* +1 for the terminator */
  strcpy( param->alph, a);

  /* seqs */
  param->seqs = calloc( 1, sizeof( sequences_type));
  param->seqs->nseq = 0;
  param->seqs->maxnseq = SEQS_CHUNK;
  param->seqs->seqs = calloc( param->seqs->maxnseq, sizeof( char *));
  param->seqs2 = calloc( 1, sizeof( sequences_type));
  for( i=0; i<param->seqs->maxnseq; i++)
    param->seqs->seqs[i] = NULL;
  param->seqs2->nseq = 0;
  param->seqs2->maxnseq = SEQS_CHUNK;
  param->seqs2->seqs = calloc( param->seqs2->maxnseq, sizeof( char *));
  for( i=0; i<param->seqs2->maxnseq; i++)
    param->seqs2->seqs[i] = NULL;

  /* hybridization matrix */
  mat = calloc( ALPHLEN * ALPHLEN, sizeof( int));
  for( i=0; i<ALPHLEN; i++) {
    for( j=0; j<ALPHLEN; j++) {
      mat[i*ALPHLEN+j] = mathyp_init[i][j];
    }
  }

  /* defaults */
  param->depth = INT_MAX;
  param->min_score = 20;
  param->max_res = INT_MAX;
  param->score_rec = init_score_rec_type( 14, 7, -40, 2, 1, 8, 3, 1, param->alph, mat);
  validate_score_rec_type( param->score_rec, -200, 200);
  return param;
}
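The initialization above stores the hybridization matrix as a flat array indexed by alphabet position. A minimal sketch of how a pair of bases can be scored against such a matrix, assuming the same row-major layout that match_sym uses; the name pair_score is hypothetical, and the 4 x 4 matrix in the usage note is only the lower-case corner of the default matrix.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: look up the score of bases a and b in a flat
 * n x n matrix whose rows and columns follow the order of `alph`.
 * Returns INT_MIN when either character is not in the alphabet. */
int pair_score(const char *alph, const int *mat, char a, char b)
{
    /* strchr() would also match the terminator, so '\0' is excluded. */
    const char *pa = (a != '\0') ? strchr(alph, a) : NULL;
    const char *pb = (b != '\0') ? strchr(alph, b) : NULL;
    size_t n = strlen(alph);

    if (pa == NULL || pb == NULL)
        return INT_MIN;
    return mat[(size_t)(pb - alph) * n + (size_t)(pa - alph)];
}
```

With the alphabet "acgt" and the matrix rows {-3,-3,-3,3}, {-3,-3,5,-3}, {-3,5,-3,1}, {3,-3,1,-3}, the pair a/t scores 3, c/g scores 5 and the wobble pair g/t scores 1.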
/***************** MAIN *****************/
int main( int argc, char* argv[])
{
  int i;
  char *rev_seq;
  param_type *param;
  int algo = 0;
  char *options;
  int next_option;
  char* outfn = NULL;
  char* infn = NULL;
  FILE *outfile, *seqfile;
  int othermethod = 0;
  int oligolen = 7;
  int oligosample = 0;
  int spikefreq = 0;
  const char* const short_options = "hvruayto:i:j:s:z:d:n:p:m:l:e:f:";
  const struct option long_options[] = {
    { "help", 0, NULL, 'h' },
    { "verbose", 0, NULL, 'v' },
    { "version", 0, NULL, 'r' },
    { "secondary_structure", 0, NULL, 'u' },
    { "self_anealing", 0, NULL, 'a' },
    { "hybridization", 0, NULL, 'y' },
    { "tm", 0, NULL, 't' },
    { "output", 1, NULL, 'o' },
    { "input", 1, NULL, 'i' },
    { "seq", 1, NULL, 's' },
    { "seq2", 1, NULL, 'z' },
    { "depth", 1, NULL, 'd' },
    { "min_score", 1, NULL, 'n' },
    { "max_res", 1, NULL, 'p' },
    { "matrix", 1, NULL, 'm' },
    { "allolig", 1, NULL, 'l' },
    { "sample", 1, NULL, 'e' },
    { "spikefreq", 1, NULL, 'f' },
    { NULL, 0, NULL, 0 } /* Required at end of array. */
  };
  /* mtrace (); debug memory leaks under linux */
  param = init( );
  program_name = argv[0];
  do {
    next_option = getopt_long( argc, argv, short_options, long_options, NULL);
    switch( next_option)
    {
    case 'h': /* -h or --help */
      usage( "");
    case 'i': /* -i or --input */
      infn = optarg;
      seqfile = sfopen( infn, "r");
      read_fasta( seqfile, param->seqs, param->alph);
      fclose( seqfile);
      break;
    case 'j': /* -j or --input2 */
      infn = optarg;
      seqfile = sfopen( infn, "r");
      read_fasta( seqfile, param->seqs2, param->alph);
      fclose( seqfile);
      break;
    case 'o': /* -o or --output */
      outfn = optarg;
      break;
    case 'm': /* -m or --matrix */
      options = optarg;
      read_score_rec_type( options, param->score_rec);
      break;
    case 's': /* -s or --seq */
      options = optarg;
      add_seq2seqs( options, param->seqs);
      break;
    case 'z': /* -z or --seq2 */
      options = optarg;
      add_seq2seqs( options, param->seqs2);
      break;
    case 'd': /* -d or --depth */
      options = optarg;
      param->depth = atoi( options);
      break;
    case 'p': /* -p or --max_res */
      options = optarg;
      param->max_res = atoi( options);
      break;
    case 'n': /* -n or --min_score */
      options = optarg;
      param->min_score = atoi( options);
      break;
    case 'v': /* -v or --verbose */
      verbose += 1;
      break;
    case 'r': /* -r or --version */
      printf( "dyp version %s\n", THEVERSION);
      break;
    case 'u': /* -u or --secondary_structure */
      algo = algo | SECONDARY_STRUCTURE;
      break;
    case 'a': /* -a or --self_anealing */
      algo = algo | SELF_ANEALING;
      break;
    case 'y': /* -y or --hybridization */
      algo = algo | HYBRIDIZATION;
      break;
    case 't': /* -t or --tm */
      algo = algo | TM;
      break;
    case 'l': /* -l or --allolig */
      options = optarg;
      oligolen = atoi( options);
      othermethod = 1;
      break;
    case 'e': /* -e or --sample */
      options = optarg;
      oligosample = atoi( options);
      break;
    case 'f': /* -f or --spikefreq */
      options = optarg;
      spikefreq = atoi( options);
      break;
    case '?': /* The user specified an invalid option. */
      usage( "Sorry, one of the options was not recognized");
    case -1: /* Done with options. */
      break;
    default: /* Something else: unexpected. */
      abort( );
    }
  } while( next_option != -1);
  if (outfn == NULL)
    outfile = stdout;
  else
    outfile = sfopen(outfn, "w");

  if (verbose)
    print_score_rec_type(param->score_rec);

  if (othermethod) {
    allolig(outfile, oligolen, oligosample, spikefreq, param);
  } else {
    if (!algo)
      usage("please select a method: u, a or y");
    for (i = 0; i < param->seqs->nseq; i++) {
      fprintf(outfile, ">sequence_%03d\n", i + 1);
      if (algo & SECONDARY_STRUCTURE)
        nussinov(outfile, param, param->seqs->seqs[i]);
      if (algo & SELF_ANEALING) {
        /* +1 leaves room for the terminating NUL copied by strcpy */
        rev_seq = calloc(strlen(param->seqs->seqs[i]) + 1, sizeof(char));
        strcpy(rev_seq, param->seqs->seqs[i]);
        reverse(rev_seq);
        if (param->min_score < 2) {
          param->min_score = 2;
        }
        align(outfile, param, param->seqs->seqs[i], rev_seq);
        free(rev_seq);
      }
      if (algo & TM) {
        rev_seq = calloc(strlen(param->seqs->seqs[i]) + 1, sizeof(char));
        strcpy(rev_seq, param->seqs->seqs[i]);
        lower(rev_seq);
        comp(rev_seq);
        if (param->min_score < 2) {
          param->min_score = 2;
        }
        align(outfile, param, param->seqs->seqs[i], rev_seq);
        free(rev_seq);
      }
      if (algo & HYBRIDIZATION) {
        if (param->seqs2->seqs[i] == NULL)
          die("Sequence 2 is missing");
        rev_seq = calloc(strlen(param->seqs2->seqs[i]) + 1, sizeof(char));
        strcpy(rev_seq, param->seqs2->seqs[i]);
        reverse(rev_seq);
        align(outfile, param, param->seqs->seqs[i], rev_seq);
        free(rev_seq);
      }
    }
  }
  fclose(outfile);
  free_param_type(param);
  return 0;
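The driver above hands align() a transformed copy of each sequence: reversed for self-annealing and hybridization, lowercased and complemented for the Tm branch. The helpers reverse, lower, and comp are not listed in this section, so the following Java sketch only illustrates what such helpers are assumed to do. The class name StrandOpsSketch and the a/t, c/g pairing rule are assumptions made for illustration, not the program's actual code.

```java
final class StrandOpsSketch {
    /** Returns the sequence read back to front (assumed behaviour of reverse). */
    static String reverse(String seq) {
        return new StringBuilder(seq).reverse().toString();
    }

    /** Returns the base-wise complement of a lowercase DNA sequence, without reversing it
     *  (assumed behaviour of comp). */
    static String complement(String seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            switch (c) {
                case 'a': out.append('t'); break;
                case 't': out.append('a'); break;
                case 'c': out.append('g'); break;
                case 'g': out.append('c'); break;
                default: throw new IllegalArgumentException("Unexpected base: " + c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String seq = "AACG";
        System.out.println(reverse(seq));                  // prints GCAA
        System.out.println(complement(seq.toLowerCase())); // prints ttgc
    }
}
```

Keeping reversal and complementation as separate steps mirrors the driver, which applies only one of them in each branch.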
C. The expression_array_param file contains parameters used by the oligod program
<PARAMETER>
oligo_length 30
blastdb /home/probe/database/bf/c_elegans
mindist 30
#blast_param -e 50 -F F -a 4
blast_param wordlen 11
blast_param strand both strands
blast_param expect 5
blast_param nproc 2
blast_param filter
cfreq 3
cphase 0
end_spike_len 0
dnaconc 2000
saltconc 115
max_noligo 10
maxhits 8
min_score 35
#perceptron parameters
oligod_param max_match cutoff 30
oligod_param max_match squash_dx 5
oligod_param max_match squash_dy 0.9
oligod_param max_match weight 1
oligod_param max_stretch cutoff 20
oligod_param max_stretch squash_dx 2
oligod_param max_stretch squash_dy 0.9
oligod_param max_stretch weight 1
oligod_param self_hyp cutoff 25
oligod_param self_hyp squash_dx 10
oligod_param self_hyp squash_dy 0.9
oligod_param self_hyp weight 1
oligod_param self_match cutoff 50
oligod_param self_match squash_dx 10
oligod_param self_match squash_dy 0.9
oligod_param self_match weight 0
oligod_param tm_min cutoff
oligod_param tm_min squash_dx 2
oligod_param tm_min squash_dy 0.9
oligod_param tm_min weight 1
oligod_param tm_max cutoff 95
oligod_param tm_max squash_dx 2
oligod_param tm_max squash_dy 0.9
oligod_param tm_max weight 1
oligod_param tm cutoff 1.7
oligod_param tm squash_dx 0.1
oligod_param tm squash_dy 0.9
oligod_param tm weight 5
oligod_param tm_dan_min cutoff 50
oligod_param tm_dan_min squash_dx 2
oligod_param tm_dan_min squash_dy 0.9
oligod_param tm_dan_min weight 0
oligod_param tm_dan_max cutoff 95
oligod_param tm_dan_max squash_dx 2
oligod_param tm_dan_max squash_dy 0.9
oligod_param tm_dan_max weight 0
oligod_param tm_dan cutoff 1
oligod_param tm_dan squash_dx 0.1
oligod_param tm_dan squash_dy 0.9
oligod_param tm_dan weight 0
oligod_param cg weight 0
oligod_param ggg weight 0
oligod_param hit_score weight 0
</PARAMETER>
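The parameter file above is a flat list of whitespace-separated entries enclosed in a PARAMETER block, with lines starting with # treated as comments. As a minimal sketch of how the oligod_param entries could be read, the following Java example collects lines of the form "oligod_param feature field number" into a map. The class name OligodParamSketch, the "feature.field" key format, and the decision to skip every other line are assumptions for illustration; the parser used by the oligod program itself is not shown in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OligodParamSketch {
    /** Parses "oligod_param <feature> <field> <number>" lines into "feature.field" -> value. */
    static Map<String, Double> parse(String text) {
        Map<String, Double> out = new LinkedHashMap<>();
        boolean inside = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.equals("<PARAMETER>")) { inside = true; continue; }
            if (line.equals("</PARAMETER>")) { inside = false; continue; }
            // Skip anything outside the block, blank lines, and # comments.
            if (!inside || line.isEmpty() || line.startsWith("#")) continue;
            String[] tok = line.split("\\s+");
            // Entries with a missing value (fewer than four tokens) are ignored here.
            if (tok.length == 4 && tok[0].equals("oligod_param")) {
                out.put(tok[1] + "." + tok[2], Double.parseDouble(tok[3]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "<PARAMETER>",
            "oligo_length 30",
            "#perceptron parameters",
            "oligod_param max_match cutoff 30",
            "oligod_param max_match squash_dx 5",
            "oligod_param tm weight 5",
            "</PARAMETER>");
        System.out.println(parse(sample));
    }
}
```

A LinkedHashMap keeps the entries in file order, which makes the parsed result easy to compare against the listing by eye.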
D. The code for a Tm prediction program.
// $Id: Tm_predictor.java,v 1.6 2002/10/17 14:12:50 jgk Exp $
package tmpred;

import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;

// Compile with:
// /usr/java1.2/bin/javac -classpath .:/usr/java1.2/ tmpred/Tm_predictor.java
// GCC compile:
// gcj --main=tmpred.Tm_predictor tmpred/Tm_predictor.java tmpred/Tm_thermodynamic_model.java tmpred/BadOligoException.java

public abstract class Tm_predictor {
    public abstract double calcTm(String sequence) throws BadOligoException;
    public abstract String lastSequence();
    public abstract Double lastTm();
    public abstract String getParameter(String paramName);
    public abstract void setParameter(String paramName, String value);
    public abstract Enumeration getParameterNames();

    public static String getVersion() {
        return "$Id: Tm_predictor.java,v 1.6 2002/10/17 14:12:50 jgk Exp $";
    }

    /** Singleton instance. */
    private static Tm_predictor instance = null;

    /** Return a singleton instance of the default implementation.
     *
     * @author Kim Haagensen
     */
    public static Tm_predictor getTm_predictor() {
        if (instance == null) {
            instance = getInstance();
        }
        return instance;
    }

    /** Return an instance of the default implementation. */
    public static Tm_predictor getInstance() {
        // Default model (for now..)
        return new Tm_thermodynamic_model();
    }

    // For demo purposes
    public static void main(String args[]) {
        Tm_predictor tmmodel = Tm_predictor.getInstance();
        Enumeration pnames = tmmodel.getParameterNames();
        String paramName;
        // tmmodel.setParameter("debug", "true");
        System.out.println("Version: " + tmmodel.getVersion());
        System.out.println();
        System.out.println("Constants: ");
        while (pnames.hasMoreElements()) {
            paramName = (String) pnames.nextElement();
            System.out.println(paramName + "=" + tmmodel.getParameter(paramName));
        }
        System.out.println();
        System.out.println("Results: ");
        System.out.println("Input Sequence=" + args[0]);
        try {
            System.out.println("Tm=" + tmmodel.calcTm(args[0]));
            System.out.println("Translated Sequence: " + tmmodel.lastSequence());
        } catch (BadOligoException expt) {
            System.out.println("Error: " + expt.getMessage());
            System.out.println("Tm= " + tmmodel.lastTm());
            System.out.println("Translated Sequence: " + tmmodel.lastSequence());
        }
    }
}
E. The code for a Tm thermodynamic model.
package tmpred;

import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;

class Tm_param {
    public Hashtable table_deltaS_init, table_deltaH_init, table_H_nn, table_S_nn, table_H_mono;
};

class Tm_thermodynamic_model extends Tm_predictor {
    private Tm_param modparam[];
    private Hashtable table_LD_translate, table_ws_translate, table_ten_translate;
    private Properties parameterProperties;
    private String sequence; // Last translated sequence
    private Double tm;       // Last calculated Tm

    // Get methods
    public String lastSequence() { return sequence; }
    public Double lastTm() { return tm; }

    // Property methods
    public String getParameter(String paramName) {
        return (String) parameterProperties.get(paramName);
    }

    public void setParameter(String paramName, String value) {
        parameterProperties.put(paramName, value);
    }

    public Enumeration getParameterNames() {
        return parameterProperties.propertyNames();
    }
    // Utility functions:
    private double hget(Hashtable h, String key) {
        // get double value from hashtable, zero if it wasn't there.
        if (key != null) {
            Double val = (Double) h.get(key);
            if (val != null) {
                return val.doubleValue();
            } else {
                return 0.0;
            }
        }
        return 0.0;
    }

    private void hcount(Hashtable h, String key) {
        // Count entry in h(key) up one, if it exists.
        if (key != null) {
            Double val = (Double) h.get(key);
            if (val != null) {
                h.put(key, new Double(val.doubleValue() + 1));
            } else {
                h.put(key, new Double(1));
            }
        }
    }
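The two helpers above implement a null-safe lookup with a default of zero and a null-safe counter on a Hashtable. As a minimal sketch, the same behaviour can be expressed with the generic Map methods available since Java 8. The class name CountSketch is hypothetical, and the use of a nearest-neighbour key such as "-aa" in the demo is only an assumption about how the counters might be used; the calling code is not part of this section.

```java
import java.util.HashMap;
import java.util.Map;

final class CountSketch {
    /** Returns the stored value, or 0.0 when the key is null or absent. */
    static double hget(Map<String, Double> h, String key) {
        return key == null ? 0.0 : h.getOrDefault(key, 0.0);
    }

    /** Adds one to the counter for key, creating it at 1.0 if absent; null keys are ignored. */
    static void hcount(Map<String, Double> h, String key) {
        if (key != null) {
            h.merge(key, 1.0, Double::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new HashMap<>();
        hcount(counts, "-aa");
        hcount(counts, "-aa");
        hcount(counts, null);
        System.out.println(hget(counts, "-aa")); // prints 2.0
        System.out.println(hget(counts, "-cg")); // prints 0.0
    }
}
```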
    private String trans_LD(String seq) {
        int i;
        String result = "";
        for (i = 0; i < seq.length(); i++) {
            result = result + (String) table_LD_translate.get(seq.substring(i, i + 1));
        }
        return result;
    }

    private String trans_ws(String seq) {
        int i;
        String result = "";
        for (i = 0; i < seq.length(); i++) {
            result = result + (String) table_ws_translate.get(seq.substring(i, i + 1));
        }
        return result;
    }
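The two translation methods above map each monomer of an oligonucleotide to a one-letter class. According to the tables set up in the constructor, lowercase letters translate to "D" and uppercase letters to "L" in the LD alphabet, while a and t translate to w/W and g and c to s/S in the ws alphabet, with case preserved. The following Java sketch reproduces those mappings on a short example. The class name TranslateSketch is hypothetical, and unlike the listing, which would append the text "null" for an unknown letter, this sketch throws an exception.

```java
import java.util.HashMap;
import java.util.Map;

final class TranslateSketch {
    private static final Map<Character, Character> LD = new HashMap<>();
    private static final Map<Character, Character> WS = new HashMap<>();

    static {
        for (char c : "agct".toCharArray()) LD.put(c, 'D');
        for (char c : "AGCT".toCharArray()) LD.put(c, 'L');
        WS.put('a', 'w'); WS.put('t', 'w'); WS.put('g', 's'); WS.put('c', 's');
        WS.put('A', 'W'); WS.put('T', 'W'); WS.put('G', 'S'); WS.put('C', 'S');
    }

    /** Translates seq letter by letter through the given table. */
    private static String translate(String seq, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder(seq.length());
        for (char c : seq.toCharArray()) {
            Character t = table.get(c);
            if (t == null) {
                throw new IllegalArgumentException("Unexpected monomer: " + c);
            }
            out.append(t);
        }
        return out.toString();
    }

    static String transLD(String seq) { return translate(seq, LD); }
    static String transWS(String seq) { return translate(seq, WS); }

    public static void main(String[] args) {
        System.out.println(transLD("aCgT")); // prints DLDL
        System.out.println(transWS("aCgT")); // prints wSsW
    }
}
```

Using a StringBuilder avoids the repeated string concatenation of the original loop, which matters little for short oligonucleotides but is the idiomatic choice.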
    private String findReplace(String inpseq, String find, String replace) {
        String subseq;
        String seq = inpseq;
        int i;
        for (i = 0; i < seq.length() - find.length() + 1; i++) {
            subseq = seq.substring(i, i + find.length());
            if (subseq.equals(find)) {
                seq = seq.substring(0, i) + replace + seq.substring(i + find.length());
            }
        }
        return seq;
    }
    private void checkSequence(String sequence) throws BadOligoException {
        int goodlength = 0;
        int i;
        char mon;
        String badletters = "";
        for (i = 0; i < sequence.length(); i++) {
            mon = sequence.charAt(i);
            switch (mon) {
            case 'a': case 'g': case 'c': case 't':
            case 'A': case 'G': case 'C': case 'T':
                goodlength++;
                break;
            default:
                if (badletters.length() == 0) {
                    badletters = "" + mon;
                } else {
                    badletters = badletters + ", " + mon;
                }
            }
        }
        if (sequence.length() != goodlength) {
            this.tm = null;
            // this.sequence="";
            throw new BadOligoException("Wrong monomer(s) in sequence: " + badletters);
        }
    }

    Tm_thermodynamic_model() {
        parameterProperties = new Properties();
        parameterProperties.put("debug", "false"); // Set to true if debugging
        parameterProperties.put("oligo_conc", "0.000002"); // =2 micromolar, this is the total strand concentration, when the two complements are in 1 uM each
        parameterProperties.put("salt_conc", "0.115"); // =115mM, Exiqon most common concentration
        sequence = null;
        tm = null;
        // Translate tables:
        table_LD_translate = new Hashtable();
        table_LD_translate.put("a", "D");
        table_LD_translate.put("g", "D");
        table_LD_translate.put("c", "D");
        table_LD_translate.put("t", "D");
        table_LD_translate.put("A", "L");
        table_LD_translate.put("G", "L");
        table_LD_translate.put("C", "L");
        table_LD_translate.put("T", "L");

        table_ws_translate = new Hashtable();
        table_ws_translate.put("a", "w");
        table_ws_translate.put("g", "s");
        table_ws_translate.put("c", "s");
        table_ws_translate.put("t", "w");
        table_ws_translate.put("A", "W");
        table_ws_translate.put("G", "S");
        table_ws_translate.put("C", "S");
        table_ws_translate.put("T", "W");

        table_ten_translate = new Hashtable();
        table_ten_translate.put("aa", "-aa");
        table_ten_translate.put("tt", "-aa");
        table_ten_translate.put("at", "-at");
        table_ten_translate.put("ta", "-ta");
        table_ten_translate.put("ca", "-ca");
        table_ten_translate.put("tg", "-ca");
        table_ten_translate.put("gt", "-gt");
        table_ten_translate.put("ac", "-gt");
        table_ten_translate.put("ct", "-ct");
        table_ten_translate.put("ag", "-ct");
        table_ten_translate.put("ga", "-ga");
        table_ten_translate.put("tc", "-ga");
        table_ten_translate.put("cg", "-cg");
        table_ten_translate.put("gc", "-gc");
        table_ten_translate.put("gg", "-gg");
        table_ten_translate.put("cc", "-gg");
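The table_ten_translate table above collapses the sixteen possible dinucleotides onto ten keys. Each pair of entries that share a key (for example "tt" and "aa", or "tg" and "ca") consists of a dinucleotide and its reverse complement, and the four self-complementary dinucleotides (at, ta, cg, gc) map to themselves. The following Java sketch derives the same mapping from that rule instead of listing it by hand. The class name TenNeighborSketch and the helper names are hypothetical; the listing itself uses the explicit table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TenNeighborSketch {
    /** The ten representative dinucleotides used as keys in the listing. */
    private static final Set<String> CANONICAL = new HashSet<>(Arrays.asList(
        "aa", "at", "ta", "ca", "gt", "ct", "ga", "cg", "gc", "gg"));

    static char complement(char c) {
        switch (c) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default: throw new IllegalArgumentException("Unexpected base: " + c);
        }
    }

    static String reverseComplement(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            out.append(complement(s.charAt(i)));
        }
        return out.toString();
    }

    /** Maps a lowercase dinucleotide to its "-xy" key, as table_ten_translate does. */
    static String canonical(String dinucleotide) {
        String key = CANONICAL.contains(dinucleotide)
            ? dinucleotide
            : reverseComplement(dinucleotide);
        return "-" + key;
    }

    public static void main(String[] args) {
        String bases = "acgt";
        for (char x : bases.toCharArray()) {
            for (char y : bases.toCharArray()) {
                String d = "" + x + y;
                System.out.println(d + " -> " + canonical(d));
            }
        }
    }
}
```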
        //
        // Model Parameters
        //
        modparam = new Tm_param[7];

        // Parameterfil: /home/jgk/projects/tmpred/param/param.tmp
        modparam[0] = new Tm_param();
        modparam[0].table_deltaS_init = new Hashtable();
        modparam[0].table_deltaH_init = new Hashtable();
        modparam[0].table_H_nn = new Hashtable();
        modparam[0].table_S_nn = new Hashtable();
        modparam[0].table_H_mono = new Hashtable();
        // deltaS_init
        modparam[0].table_deltaS_init.put("A", new Double(2.63578811043473));
        modparam[0].table_deltaS_init.put("C", new Double(-4.60854049293735));
        modparam[0].table_deltaS_init.put("G", new Double(-4.56687267801977));
        modparam[0].table_deltaS_init.put("T", new Double(2.7258379273728));
        modparam[0].table_deltaS_init.put("s", new Double(-2.8));
        modparam[0].table_deltaS_init.put("w", new Double(4.1));
        // deltaH_init
        modparam[0].table_deltaH_init.put("A", new Double(1.41884851398012));
        modparam[0].table_deltaH_init.put("C", new Double(-2.01835247107257));
        modparam[0].table_deltaH_init.put("G", new Double(-0. 40703588206669));
        modparam[0].table_deltaH_init.put("T", new Double(1.19531513535407));
        modparam[0].table_deltaH_init.put("s", new Double(0.1));
        modparam[0].table_deltaH_init.put("w", new Double(2.3));
        // H_mono
        modparam[0].table_H_mono.put("A", new Double(-15.5177140147424));
        modparam[0].table_H_mono.put("C", new Double(-17.840200113071));
        modparam[0].table_H_mono.put("G", new Double(-18.2150417538429));
        modparam[0].table_H_mono.put("T", new Double(-16.4741049031613));
        modparam[0].table_H_mono.put("s", new Double(-9.36));
        modparam[0].table_H_mono.put("w", new Double(-7.35));
        // H_nn
        modparam[0].table_H_nn.put("-aa", new Double(-0.55));
        modparam[0].table_H_nn.put("-at", new Double(0.15));
        modparam[0].table_H_nn.put("-ca", new Double(-0.145));
        modparam[0].table_H_nn.put("-cg", new Double(-1.24));
        modparam[0].table_H_nn.put("-ct", new Double(0.555));
        modparam[0].table_H_nn.put("-ga", new Double(0.155));
        modparam[0].table_H_nn.put("-gc", new Double(-0.44));
        modparam[0].table_H_nn.put("-gg", new Double(1.36));
        modparam[0].table_H_nn.put("-gt", new Double(-0.045));
        modparam[0].table_H_nn.put("-ta", new Double(0.15));
        modparam[0].table_H_nn.put("SS", new Double(-0.130528857633394));
        modparam[0].table_H_nn.put("SW", new Double(-1.0637157585591));
        modparam[0].table_H_nn.put("Ss", new Double(-0.357926846466049));
        modparam[0].table_H_nn.put("Sw", new Double(0.379281367679472));
        modparam[0].table_H_nn.put("WS", new Double(-0.405577627528957));
        modparam[0].table_H_nn.put("WW", new Double(-0.726103393389269));
        modparam[0].table_H_nn.put("Ws", new Double(1.04679214548348));
        modparam[0].table_H_nn.put("Ww", new Double(0.0429919627021588));
        modparam[0].table_H_nn.put("sS", new Double(0.543872586888427));
        modparam[0].table_H_nn.put("sW", new Double(0.575455160673995));
        modparam[0].table_H_nn.put("wS", new Double(0.865853023978453));
        modparam[0].table_H_nn.put("wW", new Double(0.789896125879916));
        // S_nn
        modparam[0].table_S_nn.put("-aa", new Double(-22.2));
        modparam[0].table_S_nn.put("-at", new Double(-20.4));
        modparam[0].table_S_nn.put("-ca", new Double(-22.7));
        modparam[0].table_S_nn.put("-cg", new Double(-27.2));
        modparam[0].table_S_nn.put("-ct", new Double(-21));
        modparam[0].table_S_nn.put("-ga", new Double(-22.2));
        modparam[0].table_S_nn.put("-gc", new Double(-24.4));
        modparam[0].table_S_nn.put("-gg", new Double(-19.9));
        modparam[0].table_S_nn.put("-gt", new Double(-22.4));
        modparam[0].table_S_nn.put("-ta", new Double(-21.3));
        modparam[0].table_S_nn.put("DL", new Double(-32.6588469014549));
        modparam[0].table_S_nn.put("LD", new Double(-31.5601899937808));
        modparam[0].table_S_nn.put("LL", new Double(-46.8237053519389));
        // Parameterfil: /home/jgk/projects/tmpred/param/4.4.0.3.48.myhomo.ttall3
        modparam[1] = new Tm_param();
        modparam[1].table_deltaS_init = new Hashtable();
        modparam[1].table_deltaH_init = new Hashtable();
        modparam[1].table_H_nn = new Hashtable();
        modparam[1].table_S_nn = new Hashtable();
        modparam[1].table_H_mono = new Hashtable();
        // deltaS_init
        modparam[1].table_deltaS_init.put("A", new Double(-0.417636505370175));
        modparam[1].table_deltaS_init.put("C", new Double(-0.336962265530943));
        modparam[1].table_deltaS_init.put("G", new Double(0.158722707567369));
        modparam[1].table_deltaS_init.put("T", new Double(-0.208276246703806));
        modparam[1].table_deltaS_init.put("s", new Double(-0.372560237115056));
        modparam[1].table_deltaS_init.put("w", new Double(-0.369231842748097));
        // deltaH_init
        modparam[1].table_deltaH_init.put("A", new Double(0.782799015317204));
        modparam[1].table_deltaH_init.put("C", new Double(0.505143958976093));
        modparam[1].table_deltaH_init.put("G", new Double(-0.278435409858967));
        modparam[1].table_deltaH_init.put("T", new Double(0.435388592178298));
        modparam[1].table_deltaH_init.put("s", new Double(0.674274914671847));
        modparam[1].table_deltaH_init.put("w", new Double(0.737204216419899));
        // H_mono
        // H_nn
        modparam[1].table_H_nn.put("-aa", new Double(-7.48976721304652));
        modparam[1].table_H_nn.put("-at", new Double(-7.32484387195421));
        modparam[1].table_H_nn.put("-ca", new Double(-7.87301403612122));
        modparam[1].table_H_nn.put("-cg", new Double(-8.42170498069151));
        modparam[1].table_H_nn.put("-ct", new Double(-7.74869970168917));
        modparam[1].table_H_nn.put("-ga", new Double(-7.76121630019634));
        modparam[1].table_H_nn.put("-gc", new Double(-8.50240032966233));
        modparam[1].table_H_nn.put("-gg", new Double(-8.25010916492258));
        modparam[1].table_H_nn.put("-gt", new Double(-7.93137471336983));
        modparam[1].table_H_nn.put("-ta", new Double(-7.19824578597169));
        modparam[1].table_H_nn.put("DL", new Double(-7.64614418904991));
        modparam[1].table_H_nn.put("LD", new Double(-8.33674742394922));
        modparam[1].table_H_nn.put("LL", new Double(-8.48734357858688));
        // S_nn
        modparam[1].table_S_nn.put("-aa", new Double(-21.0183983393758));
        modparam[1].table_S_nn.put("-at", new Double(-20.9296841275317));
        modparam[1].table_S_nn.put("-ca", new Double(-20.9035893027225));
        modparam[1].table_S_nn.put("-cg", new Double(-20.6106325122583));
        modparam[1].table_S_nn.put("-ct", new Double(-20.9057484730661));
        modparam[1].table_S_nn.put("-ga", new Double(-20.9212322254303));
        modparam[1].table_S_nn.put("-gc", new Double(-20.5901247977925));
        modparam[1].table_S_nn.put("-gg", new Double(-20.7858543446338));
        modparam[1].table_S_nn.put("-gt", new Double(-20.8689105244148));
        modparam[1].table_S_nn.put("-ta", new Double(-20.9664930077991));
        modparam[1].table_S_nn.put("AA", new Double(-21.5617306575376));
        modparam[1].table_S_nn.put("AC", new Double(-20.3786877774106));
        modparam[1].table_S_nn.put("AG", new Double(-19.7518872341502));
        modparam[1].table_S_nn.put("AT", new Double(-21.2727546056293));
        modparam[1].table_S_nn.put("Aa", new Double(-20.938973530993));
        modparam[1].table_S_nn.put("Ac", new Double(-20.2351610010894));
        modparam[1].table_S_nn.put("Ag", new Double(-19.4518499755888));
        modparam[1].table_S_nn.put("At", new Double(-18.0784147699652));
        modparam[1].table_S_nn.put("CA", new Double(-20.7498315076301));
        modparam[1].table_S_nn.put("CC", new Double(-18.5922344843667));
        modparam[1].table_S_nn.put("CG", new Double(-18.0631881798201));
        modparam[1].table_S_nn.put("CT", new Double(-20.9602092321618));
        modparam[1].table_S_nn.put("Ca", new Double(-19.34976501959));
        modparam[1].table_S_nn.put("Cc", new Double(-20.0636169441132));
        modparam[1].table_S_nn.put("Cg", new Double(-18.939750726347));
        modparam[1].table_S_nn.put("Ct", new Double(-21.6068106111011));
        modparam[1].table_S_nn.put("GA", new Double(-20.6285173614998));
        modparam[1].table_S_nn.put("GC", new Double(-21.1393042416962));
        modparam[1].table_S_nn.put("GG", new Double(-19.2292577000144));
        modparam[1].table_S_nn.put("GT", new Double(-20.5200676805333));
        modparam[1].table_S_nn.put("Ga", new Double(-18.7514389680752));
        modparam[1].table_S_nn.put("Gc", new Double(-18.6326804946465));
        modparam[1].table_S_nn.put("Gg", new Double(-19.4129295709103));
        modparam[1].table_S_nn.put("Gt", new Double(-19.7139582131827));
        modparam[1].table_S_nn.put("TA", new Double(-21.7225383328128));
        modparam[1].table_S_nn.put("TC", new Double(-19.0166955249448));
        modparam[1].table_S_nn.put("TG", new Double(-18.1432831834084));
        modparam[1].table_S_nn.put("TT", new Double(-21.702655506544));
        modparam[1].table_S_nn.put("Ta", new Double(-21.6061975277002));
        modparam[1].table_S_nn.put("Tc", new Double(-21.5479504447966));
        modparam[1].table_S_nn.put("Tg", new Double(-20.6436696433912));
        modparam[1].table_S_nn.put("Tt", new Double(-21.6266985715776));
        modparam[1].table_S_nn.put("aA", new Double(-20.5319311697475));
        modparam[1].table_S_nn.put("aC", new Double(-19.025255715981));
        modparam[1].table_S_nn.put("aG", new Double(-20.3463306063697));
        modparam[1].table_S_nn.put("aT", new Double(-20.3507644210827));
        modparam[1].table_S_nn.put("cA", new Double(-18.6568044948628));
        modparam[1].table_S_nn.put("cC", new Double(-20.8609785312075));
        modparam[1].table_S_nn.put("cG", new Double(-19.1534484621275));
        modparam[1].table_S_nn.put("cT", new Double(-18.949763882851));
        modparam[1].table_S_nn.put("gA", new Double(-19.9252395614033));
        modparam[1].table_S_nn.put("gC", new Double(-19.5747039562859));
        modparam[1].table_S_nn.put("gG", new Double(-19.4942636766488));
        modparam[1].table_S_nn.put("gT", new Double(-19.9443842009211));
        modparam[1].table_S_nn.put("tA", new Double(-22.5442702941593));
        modparam[1].table_S_nn.put("tC", new Double(-19.7656230175562));
        modparam[1].table_S_nn.put("tG", new Double(-20.6803075424075));
        modparam[1].table_S_nn.put("tT", new Double(-20.9918041722088));
        // =========================================================
        // Parameterfil: /home/jgk/projects/tmpred/param/4.4.0.3.48.myhomo.ttugetre7
        modparam[2] = new Tm_param();
        modparam[2].table_deltaS_init = new Hashtable();
        modparam[2].table_deltaH_init = new Hashtable();
        modparam[2].table_H_nn = new Hashtable();
        modparam[2].table_S_nn = new Hashtable();
        modparam[2].table_H_mono = new Hashtable();
        // deltaS_init
        modparam[2].table_deltaS_init.put("A", new Double(-0.407793407213917));
        modparam[2].table_deltaS_init.put("C", new Double(-0.213527372713559));
        modparam[2].table_deltaS_init.put("G", new Double(0.35848328509707));
        modparam[2].table_deltaS_init.put("T", new Double(0.203944086326224));
        modparam[2].table_deltaS_init.put("s", new Double(-0.372560237115056));
        modparam[2].table_deltaS_init.put("w", new Double(-0.369231842748097));
        // deltaH_init
        modparam[2].table_deltaH_init.put("A", new Double(0. 86242607856365));
        modparam[2].table_deltaH_init.put("C", new Double(-0.0536311403356718));
        modparam[2].table_deltaH_init.put("G", new Double(-0.787818472323185));
        modparam[2].table_deltaH_init.put("T", new Double(-0.577191417685245));
        modparam[2].table_deltaH_init.put("s", new Double(0.674274914671847));
        modparam[2].table_deltaH_init.put("w", new Double(0.737204216419899));
        // H_mono
        // H_nn
        modparam[2].table_H_nn.put("-aa", new Double(-7.48976721304652));
        modparam[2].table_H_nn.put("-at", new Double(-7.32484387195421));
        modparam[2].table_H_nn.put("-ca", new Double(-7.87301403612122));
        modparam[2].table_H_nn.put("-cg", new Double(-8.42170498069151));
        modparam[2].table_H_nn.put("-ct", new Double(-7.74869970168917));
        modparam[2].table_H_nn.put("-ga", new Double(-7.76121630019634));
        modparam[2].table_H_nn.put("-gc", new Double(-8.50240032966233));
        modparam[2].table_H_nn.put("-gg", new Double(-8.25010916492258));
        modparam[2].table_H_nn.put("-gt", new Double(-7.93137471336983));
        modparam[2].table_H_nn.put("-ta", new Double(-7.19824578597169));
        modparam[2].table_H_nn.put("DL", new Double(-7.57500309750847));
        modparam[2].table_H_nn.put("LD", new Double(-8.62638977709659));
        modparam[2].table_H_nn.put("LL", new Double(-8.30977887038586));
        // S_nn
        modparam[2].table_S_nn.put("-aa", new Double(-21.0183983393758));
        modparam[2].table_S_nn.put("-at", new Double(-20.9296841275317));
        modparam[2].table_S_nn.put("-ca", new Double(-20.9035893027225));
        modparam[2].table_S_nn.put("-cg", new Double(-20.6106325122583));
        modparam[2].table_S_nn.put("-ct", new Double(-20.9057484730661));
        modparam[2].table_S_nn.put("-ga", new Double(-20.9212322254303));
        modparam[2].table_S_nn.put("-gc", new Double(-20.5901247977925));
        modparam[2].table_S_nn.put("-gg", new Double(-20.7858543446338));
        modparam[2].table_S_nn.put("-gt", new Double(-20.8689105244148));
        modparam[2].table_S_nn.put("-ta", new Double(-20.9664930077991));
        modparam[2].table_S_nn.put("AA", new Double(-20.9447059251337));
        modparam[2].table_S_nn.put("AC", new Double(-21.3977869950092));
        modparam[2].table_S_nn.put("AG", new Double(-19.3180598469252));
        modparam[2].table_S_nn.put("AT", new Double(-21.7372825480374));
        modparam[2].table_S_nn.put("Aa", new Double(-20.519746311685));
        modparam[2].table_S_nn.put("Ac", new Double(-20.1102797514339));
        modparam[2].table_S_nn.put("Ag", new Double(-19.8837896896234));
        modparam[2].table_S_nn.put("At", new Double(-16.6189306374549));
        modparam[2].table_S_nn.put("CA", new Double(-21.7677432772647));
        modparam[2].table_S_nn.put("CC", new Double(-18.6955344509688));
        modparam[2].table_S_nn.put("CG", new Double(-17.7951906152784));
        modparam[2].table_S_nn.put("CT", new Double(-20.9908366322904));
        modparam[2].table_S_nn.put("Ca", new Double(-19.5503830674945));
        modparam[2].table_S_nn.put("Cc", new Double(-20.3830477876892));
        modparam[2].table_S_nn.put("Cg", new Double(-18.9002469686417));
        modparam[2].table_S_nn.put("Ct", new Double(-21.408268210924));
        modparam[2].table_S_nn.put("GA", new Double(-20.8162756238568));
        modparam[2].table_S_nn.put("GC", new Double(-20.6272454690228));
        modparam[2].table_S_nn.put("GG", new Double(-18.323886570111));
        modparam[2].table_S_nn.put("GT", new Double(-20.3270306928687));
        modparam[2].table_S_nn.put("Ga", new Double(-18.8840728478426));
        modparam[2].table_S_nn.put("Gc", new Double(-19.469767014314));
        modparam[2].table_S_nn.put("Gg", new Double(-20.5553996930734));
        modparam[2].table_S_nn.put("Gt", new Double(-19.0425869495101));
        modparam[2].table_S_nn.put("TA", new Double(-22.4469367362978));
        modparam[2].table_S_nn.put("TC", new Double(-17.8972135549923));
        modparam[2].table_S_nn.put("TG", new Double(-18.5655748363087));
        modparam[2].table_S_nn.put("TT", new Double(-21.8796937437075));
        modparam[2].table_S_nn.put("Ta", new Double(-22.7345090588341));
        modparam[2].table_S_nn.put("Tc", new Double(-20.3745455995998));
        modparam[2].table_S_nn.put("Tg", new Double(-20.7769612524783));
        modparam[2].table_S_nn.put("Tt", new Double(-21.4720373674347));
        modparam[2].table_S_nn.put("aA", new Double(-20.5263443813538));
        modparam[2].table_S_nn.put("aC", new Double(-19.0058736831775));
        modparam[2].table_S_nn.put("aG", new Double(-20.642144617583));
        modparam[2].table_S_nn.put("aT", new Double(-20.3395587841216));
        modparam[2].table_S_nn.put("cA", new Double(-18.9418426287649));
        modparam[2].table_S_nn.put("cC", new Double(-20.4556223524877));
        modparam[2].table_S_nn.put("cG", new Double(-19.4702848936854));
        modparam[2].table_S_nn.put("cT", new Double(-18.7713856854974));
        modparam[2].table_S_nn.put("gA", new Double(-20.6980212771351));
        modparam[2].table_S_nn.put("gC", new Double(-19.6358318414429));
        modparam[2].table_S_nn.put("gG", new Double(-20.2590169785449));
        modparam[2].table_S_nn.put("gT", new Double(-19.6460069771352));
        modparam[2].table_S_nn.put("tA", new Double(-19.56763639202));
        modparam[2].table_S_nn.put("tC", new Double(-19.4028441502224));
        modparam[2].table_S_nn.put("tG", new Double(-21.5258339764989));
        modparam[2].table_S_nn.put("tT", new Double(-21.9790781626011));
        // =========================================================
        // Parameterfil: /home/jgk/projects/tmpred/param/4. .4.12.12.myhomo. ttallδ
        modparam[3] = new Tm_param();
        modparam[3].table_deltaS_init = new Hashtable();
        modparam[3].table_deltaH_init = new Hashtable();
        modparam[3].table_H_nn = new Hashtable();
        modparam[3].table_S_nn = new Hashtable();
        modparam[3].table_H_mono = new Hashtable();
        // deltaS_init
        modparam[3].table_deltaS_init.put("A", new Double(-5.14449897466861));
        modparam[3].table_deltaS_init.put("C", new Double(-6.98913655256153));
        modparam[3].table_deltaS_init.put("G", new Double(-5.36464180365539));
        modparam[3].table_deltaS_init.put("T", new Double(-4.19684771688626));
        modparam[3].table_deltaS_init.put("s", new Double(-0.372560237115056));
        modparam[3].table_deltaS_init.put("w", new Double(-0.369231842748097));
        // deltaH_init
        modparam[3].table_deltaH_init.put("A", new Double(-0.308850061360838));
        modparam[3].table_deltaH_init.put("C", new Double(-0.972721500770524));
        modparam[3].table_deltaH_init.put("G", new Double(-0.856281675074907));
        modparam[3].table_deltaH_init.put("T", new Double(-0.254870883861951));
        modparam[3].table_deltaH_init.put("s", new Double(0.674274914671847));
        modparam[3].table_deltaH_init.put("w", new Double(0.737204216419899));
        // H_mono
        modparam[3].table_H_mono.put("A", new Double(-13.0850625834165));
        modparam[3].table_H_mono.put("C", new Double(-15.4415946710241));
        modparam[3].table_H_mono.put("G", new Double(-15.5290562655932));
        modparam[3].table_H_mono.put("T", new Double(-12.8962280546374));
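        // H_nn
        modparam[3].table_H_nn.put("-aa", new Double(-7.48976721304652));
        modparam[3].table_H_nn.put("-at", new Double(-7.32484387195421));
        modparam[3].table_H_nn.put("-ca", new Double(-7.87301403612122));
        modparam[3].table_H_nn.put("-cg", new Double(-8.42170498069151));
        modparam[3].table_H_nn.put("-ct", new Double(-7.74869970168917));
        modparam[3].table_H_nn.put("-ga", new Double(-7.76121630019634));
        modparam[3].table_H_nn.put("-gc", new Double(-8.50240032966233));
        modparam[3].table_H_nn.put("-gg", new Double(-8.25010916492258));
        modparam[3].table_H_nn.put("-gt", new Double(-7.93137471336983));
        modparam[3].table_H_nn.put("-ta", new Double(-7.19824578597169));
        modparam[3].table_H_nn.put("SS", new Double(-4.94211739183454));
        modparam[3].table_H_nn.put("SW", new Double(-3.2298329995419));
        modparam[3].table_H_nn.put("Ss", new Double(-2.65193801394363));
        modparam[3].table_H_nn.put("Sw", new Double(-1.92736294288536));
        modparam[3].table_H_nn.put("WS", new Double(-3.74842584808002));
        modparam[3].table_H_nn.put("WW", new Double(0.250976824915542));
        modparam[3].table_H_nn.put("Ws", new Double(-2.52230884729791));
        modparam[3].table_H_nn.put("Ww", new Double(-2.47894638749835));
        modparam[3].table_H_nn.put("sS", new Double(-2.08573406157705));
        modparam[3].table_H_nn.put("sW", new Double(-3.16193826406854));
        modparam[3].table_H_nn.put("wS", new Double(-2.10629510531988));
        modparam[3].table_H_nn.put("wW", new Double(-2.34429223089846));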
        // S_nn
        modparam[3].table_S_nn.put("-aa", new Double(-21.0183983393758));
        modparam[3].table_S_nn.put("-at", new Double(-20.9296841275317));
        modparam[3].table_S_nn.put("-ca", new Double(-20.9035893027225));
        modparam[3].table_S_nn.put("-cg", new Double(-20.6106325122583));
        modparam[3].table_S_nn.put("-ct", new Double(-20.9057484730661));
        modparam[3].table_S_nn.put("-ga", new Double(-20.9212322254303));
        modparam[3].table_S_nn.put("-gc", new Double(-20.5901247977925));
        modparam[3].table_S_nn.put("-gg", new Double(-20.7858543446338));
        modparam[3].table_S_nn.put("-gt", new Double(-20.8689105244148));
        modparam[3].table_S_nn.put("-ta", new Double(-20.9664930077991));
        modparam[3].table_S_nn.put("SS", new Double(-52.7341097218716));
        modparam[3].table_S_nn.put("SW", new Double(-48.0173532462563));
        modparam[3].table_S_nn.put("Ss", new Double(-24.7053248747157));
        modparam[3].table_S_nn.put("Sw", new Double(-25.2483086657963));
        modparam[3].table_S_nn.put("WS", new Double(-46.7082776752559));
        modparam[3].table_S_nn.put("WW", new Double(-33.9766242546579));
        modparam[3].table_S_nn.put("Ws", new Double(-24.593626756038));
        modparam[3].table_S_nn.put("Ww", new Double(-24.1779478033306));
        modparam[3].table_S_nn.put("sS", new Double(-24.1344000692216));
        modparam[3].table_S_nn.put("sW", new Double(-23.8911849011862));
        modparam[3].table_S_nn.put("wS", new Double(-24.5041713703224));
        modparam[3].table_S_nn.put("wW", new Double(-23.8041757177075));
        // =========================================================
        // Parameterfil: /home/jgk/projects/tmpred/param/4.4. .12.3.myhomo. ttal16
        modparam[4] = new Tm_param();
        modparam[4].table_deltaS_init = new Hashtable();
        modparam[4].table_deltaH_init = new Hashtable();
        modparam[4].table_H_nn = new Hashtable();
        modparam[4].table_S_nn = new Hashtable();
        modparam[4].table_H_mono = new Hashtable();
        // deltaS_init
        modparam[4].table_deltaS_init.put("A", new Double(-1.80759598042465));
        modparam[4].table_deltaS_init.put("C", new Double(-1.64205431364765));
        modparam[4].table_deltaS_init.put("G", new Double(-1.53474825333411));
        modparam[4].table_deltaS_init.put("T", new Double(-1.56722518744529));
        modparam[4].table_deltaS_init.put("s", new Double(-0.372560237115056));
        modparam[4].table_deltaS_init.put("w", new Double(-0.369231842748097));
        // deltaH_init
        modparam[4].table_deltaH_init.put("A", new Double(0.562011155975154));
        modparam[4].table_deltaH_init.put("C", new Double(0.501526805191486));
        modparam[4].table_deltaH_init.put("G", new Double(-0.223329383181387));
        modparam[4].table_deltaH_init.put("T", new Double(0.404061229265622));
        modparam[4].table_deltaH_init.put("s", new Double(0.674274914671847));
        modparam[4].table_deltaH_init.put("w", new Double(0.737204216419899));
        // H_mono
        modparam[4].table_H_mono.put("A", new Double(-17.5329265252141));
        modparam[4].table_H_mono.put("C", new Double(-18.5157490113392));
        modparam[4].table_H_mono.put("G", new Double(-18.8714990370891));
        modparam[4].table_H_mono.put("T", new Double(-17.430598429616));
        // H_nn
        modparam[4].table_H_nn.put("-aa", new Double(-7.48976721304652));
        modparam[4].table_H_nn.put("-at", new Double(-7.32484387195421));
        modparam[4].table_H_nn.put("-ca", new Double(-7.87301403612122));
        modparam[4].table_H_nn.put("-cg", new Double(-8.42170498069151));
        modparam[4].table_H_nn.put("-ct", new Double(-7.74869970168917));
        modparam[4].table_H_nn.put("-ga", new Double(-7.76121630019634));
        modparam[4].table_H_nn.put("-gc", new Double(-8.50240032966233));
        modparam[4].table_H_nn.put("-gg", new Double(-8.25010916492258));
        modparam[4].table_H_nn.put("-gt", new Double(-7.93137471336983));
        modparam[4].table_H_nn.put("-ta", new Double(-7.19824578597169));
        modparam[4].table_H_nn.put("SS", new Double(-0.273099798601551));
        modparam[4].table_H_nn.put("SW", new Double(0.419286733365625));
        modparam[4].table_H_nn.put("Ss", new Double(-0.541700784403176));
        modparam[4].table_H_nn.put("Sw", new Double(0.350546387435068));
        modparam[4].table_H_nn.put("WS", new Double(-0.296817518810437));
        modparam[4].table_H_nn.put("WW", new Double(0.175879861412384));
        modparam[4].table_H_nn.put("Ws", new Double(0.328320257160163));
        modparam[4].table_H_nn.put("Ww", new Double(0.169293035915045));
        modparam[4].table_H_nn.put("sS", new Double(0.326542458181525));
        modparam[4].table_H_nn.put("sW", new Double(0.938829019706826));
        modparam[4].table_H_nn.put("wS", new Double(0.300181620507486));
        modparam[4].table_H_nn.put("wW", new Double(0.316641864778047));
        // S_nn
        modparam[4].table_S_nn.put("-aa", new Double(-21.0183983393758));
        modparam[4].table_S_nn.put("-at", new Double(-20.9296841275317));
        modparam[4].table_S_nn.put("-ca", new Double(-20.9035893027225));
        modparam[4].table_S_nn.put("-cg", new Double(-20.6106325122583));
        modparam[4].table_S_nn.put("-ct", new Double(-20.9057484730661));
        modparam[4].table_S_nn.put("-ga", new Double(-20.9212322254303));
        modparam[4].table_S_nn.put("-gc", new Double(-20.5901247977925));
        modparam[4].table_S_nn.put("-gg", new Double(-20.7858543446338));
        modparam[4].table_S_nn.put("-gt", new Double(-20.8689105244148));
        modparam[4].table_S_nn.put("-ta", new Double(-20.9664930077991));
        modparam[4].table_S_nn.put("DL", new Double(-22.2388993255255));
        modparam[4].table_S_nn.put("LD", new Double(-23.9336954436759));
        modparam[4].table_S_nn.put("LL", new Double(-48.2573001896195));
// =========================================================
// Parameter file: /home/jgk/projects/tmpred/param/.4.4.48.3.myhomo.ttall3
modparam[5]=new Tm_param();
modparam[5].table_deltaS_init=new Hashtable();
modparam[5].table_deltaH_init=new Hashtable();
modparam[5].table_H_nn=new Hashtable();
modparam[5].table_S_nn=new Hashtable();
modparam[5].table_H_mono=new Hashtable();
// deltaS_init
modparam[5].table_deltaS_init.put("A", new Double(-0.888469343292043));
modparam[5].table_deltaS_init.put("C", new Double(-1.02543607545834));
modparam[5].table_deltaS_init.put("G", new Double(-0.541648874710394));
modparam[5].table_deltaS_init.put("T", new Double(-0.65983861685686));
modparam[5].table_deltaS_init.put("s", new Double(-0.372560237115056));
modparam[5].table_deltaS_init.put("w", new Double(-0.369231842748097));
// deltaH_init
modparam[5].table_deltaH_init.put("A", new Double(0.615705041687039));
modparam[5].table_deltaH_init.put("C", new Double(0.174168418982924));
modparam[5].table_deltaH_init.put("G", new Double(-0.316854740516605));
modparam[5].table_deltaH_init.put("T", new Double(0.468853851084089));
modparam[5].table_deltaH_init.put("s", new Double(0.674274914671847));
modparam[5].table_deltaH_init.put("w", new Double(0.737204216419899));
// H_mono
modparam[5].table_H_mono.put("A", new Double(-12.2836541825738));
modparam[5].table_H_mono.put("C", new Double(-12.8366937840179));
modparam[5].table_H_mono.put("G", new Double(-13.1042874575601));
modparam[5].table_H_mono.put("T", new Double(-12.1930059340835));
// H_nn
modparam[5].table_H_nn.put("-aa", new Double(-7.48976721304652));
modparam[5].table_H_nn.put("-at", new Double(-7.32484387195421));
modparam[5].table_H_nn.put("-ca", new Double(-7.87301403612122));
modparam[5].table_H_nn.put("-cg", new Double(-8.42170498069151));
modparam[5].table_H_nn.put("-ct", new Double(-7.74869970168917));
modparam[5].table_H_nn.put("-ga", new Double(-7.76121630019634));
modparam[5].table_H_nn.put("-gc", new Double(-8.50240032966233));
modparam[5].table_H_nn.put("-gg", new Double(-8.25010916492258));
modparam[5].table_H_nn.put("-gt", new Double(-7.93137471336983));
modparam[5].table_H_nn.put("-ta", new Double(-7.19824578597169));
// The key strings of the remaining 48 table_H_nn entries are not legible in the source; their values, in original order:
//   0.178443264646713, 0.626677714801197, 0.596829770378585, 0.487815178989618,
//   0.771469450002954, -1.00851806983534, 1.42002924459608, 2.1602568782306,
//   0.551742312137283, 0.706523147036115, 1.0219743340256, 0.343403406607851,
//   1.42098576721322, 1.16189045546058, 1.50512036091287, 0.286178902087989,
//   0.325781391314861, 0.0408233779793723, 0.712259747306718, 0.228069270366563,
//   1.53023383659402, 1.42345356134335, 1.16308519064659, 0.902484755483895,
//   0.246725295583488, 0.920526169197009, 1.279823124502, 0.317238648246969,
//   0.656770342200844, 0.832334138900636, 0.807723137936193, 0.879062052445067,
//   0.959947989068327, 1.40288154815212, 0.791217583005563, 1.02678440448276,
//   1.72086127613672, 0.706538514572194, 1.27407148022265, 1.67492049706038,
//   1.32646498674451, 1.21084101077631, 1.17115932925001, 1.51513536216474,
//   0.166992658485835, 0.957616290525477, 0.724202653405165, -1.08943892482195
// S_nn
modparam[5].table_S_nn.put("-aa", new Double(-21.0183983393758));
modparam[5].table_S_nn.put("-at", new Double(-20.9296841275317));
modparam[5].table_S_nn.put("-ca", new Double(-20.9035893027225));
modparam[5].table_S_nn.put("-cg", new Double(-20.6106325122583));
modparam[5].table_S_nn.put("-ct", new Double(-20.9057484730661));
modparam[5].table_S_nn.put("-ga", new Double(-20.9212322254303));
modparam[5].table_S_nn.put("-gc", new Double(-20.5901247977925));
modparam[5].table_S_nn.put("-gg", new Double(-20.7858543446338));
modparam[5].table_S_nn.put("-gt", new Double(-20.8689105244148));
modparam[5].table_S_nn.put("-ta", new Double(-20.9664930077991));
modparam[5].table_S_nn.put("DL", new Double(-18.3725054920304));
modparam[5].table_S_nn.put("LD", new Double(-18.5867411854372));
modparam[5].table_S_nn.put("LL", new Double(-33.7400646113203));
// =========================================================
// Parameter file: /home/jgk/projects/tmpred/param/4.4.4.48.3.myhomo.ttugetrel
modparam[6]=new Tm_param();
modparam[6].table_deltaS_init=new Hashtable();
modparam[6].table_deltaH_init=new Hashtable();
modparam[6].table_H_nn=new Hashtable();
modparam[6].table_S_nn=new Hashtable();
modparam[6].table_H_mono=new Hashtable();
// deltaS_init
modparam[6].table_deltaS_init.put("A", new Double(-1.0899853854154));
modparam[6].table_deltaS_init.put("C", new Double(-1.26650514222434));
modparam[6].table_deltaS_init.put("G", new Double(-0.636096340366464));
modparam[6].table_deltaS_init.put("T", new Double(-0.692536920626161));
modparam[6].table_deltaS_init.put("s", new Double(-0.372560237115056));
modparam[6].table_deltaS_init.put("w", new Double(-0.369231842748097));
// deltaH_init
modparam[6].table_deltaH_init.put("A", new Double(0.346600533782424));
modparam[6].table_deltaH_init.put("C", new Double(-0.571391474074441));
modparam[6].table_deltaH_init.put("G", new Double(-0.431395953242384));
modparam[6].table_deltaH_init.put("T", new Double(-0.175680165547273));
modparam[6].table_deltaH_init.put("s", new Double(0.674274914671847));
modparam[6].table_deltaH_init.put("w", new Double(0.737204216419899));
// H_mono
modparam[6].table_H_mono.put("A", new Double(-13.0754325103332));
modparam[6].table_H_mono.put("C", new Double(-13.646666116136));
modparam[6].table_H_mono.put("G", new Double(-13.6293972843139));
modparam[6].table_H_mono.put("T", new Double(-12.9859631483842));
// H_nn
modparam[6].table_H_nn.put("-aa", new Double(-7.48976721304652));
modparam[6].table_H_nn.put("-at", new Double(-7.32484387195421));
modparam[6].table_H_nn.put("-ca", new Double(-7.87301403612122));
modparam[6].table_H_nn.put("-cg", new Double(-8.42170498069151));
modparam[6].table_H_nn.put("-ct", new Double(-7.74869970168917));
modparam[6].table_H_nn.put("-ga", new Double(-7.76121630019634));
modparam[6].table_H_nn.put("-gc", new Double(-8.50240032966233));
modparam[6].table_H_nn.put("-gg", new Double(-8.25010916492258));
modparam[6].table_H_nn.put("-gt", new Double(-7.93137471336983));
modparam[6].table_H_nn.put("-ta", new Double(-7.19824578597169));
// The key strings of the remaining 48 table_H_nn entries are not legible in the source; their values, in original order:
//   -0.550329001961804, 0.547445535909528, -0.921006379530219, -0.344957768635853,
//   -0.754556907253045, -1.24531973714279, -1.14112776038759, -2.40661512922826,
//   -0.599849310599913, -0.979896995845449, 1.40702090497018, 0.544368218865807,
//   -0.84623437898934, -1.22861371811788, -1.58027912989158, -0.475151811986212,
//   -0.162584226406409, -0.587156187857858, -1.41663092804754, -0.583688894933071,
//   -1.6538514342035, -1.12801570402914, -0.659846271417488, -0.881015310109863,
//   -0.346920026493557, -0.918176991777502, -1.44038679704405, -0.63544324592585,
//   -0.509070031861056, -1.03584670655476, -1.40946877218105, -0.870845046257428,
//   -1.20602481577836, -1.23960733216066, -0.633900561424835, -0.94390709787839,
//   -1.46284985286192, -0.816866823421651, -1.31737354533552, -1.70861548243372,
//   -1.0656492536418, -1.3976489761836, -0.645745276016066, -1.70832316347213,
//   -1.20474315180883, -1.1530087974343, -0.366342339192337, -0.709379925645181
// S_nn
modparam[6].table_S_nn.put("-aa", new Double(-21.0183983393758));
modparam[6].table_S_nn.put("-at", new Double(-20.9296841275317));
modparam[6].table_S_nn.put("-ca", new Double(-20.9035893027225));
modparam[6].table_S_nn.put("-cg", new Double(-20.6106325122583));
modparam[6].table_S_nn.put("-ct", new Double(-20.9057484730661));
modparam[6].table_S_nn.put("-ga", new Double(-20.9212322254303));
modparam[6].table_S_nn.put("-gc", new Double(-20.5901247977925));
modparam[6].table_S_nn.put("-gg", new Double(-20.7858543446338));
modparam[6].table_S_nn.put("-gt", new Double(-20.8689105244148));
modparam[6].table_S_nn.put("-ta", new Double(-20.9664930077991));
modparam[6].table_S_nn.put("DL", new Double(-19.0042167633551));
modparam[6].table_S_nn.put("LD", new Double(-19.3040440770709));
modparam[6].table_S_nn.put("LL", new Double(-37.0502230904161));
// End of model parameters
}

public double calcTm(String inputsequence) throws BadOligoException {
    boolean debug=Boolean.valueOf(parameterProperties.getProperty("debug", "null")).booleanValue();
    int i;
    double[] results=new double[7];
    double retval;
    double sum=0.0;
    int num=0;
    for (i=0; i<7; i++) {
        results[i]=docalcTm(modparam[i], inputsequence);
        if (debug) {
            System.out.println("Results[" + i + "]=" + results[i]);
        }
        sum+=results[i];
        num++;
    }
    Arrays.sort(results);
    // retval=sum/num; // Mean of results
    retval=Math.round(results[3]+0.05); // Median of results (add 0.05 to get same result as www.lna-tm.com)
    if (debug) {
        System.out.println("Retval =" + retval + " ("+results[3]+")");
    }
    return retval; // Median of results
}

private double docalcTm(Tm_param modparam, String inputsequence) throws BadOligoException {
    int i;
    String mon, dimer;
    Hashtable pairs=new Hashtable();
    Hashtable endmonos=new Hashtable();
    Hashtable monos=new Hashtable();
    double oligo_conc=Double.valueOf(parameterProperties.getProperty("oligo_conc", "null")).doubleValue();
    double salt_conc=Double.valueOf(parameterProperties.getProperty("salt_conc", "null")).doubleValue();
    boolean debug=Boolean.valueOf(parameterProperties.getProperty("debug", "null")).booleanValue();
    sequence=findReplace(inputsequence, "mC", "C");
    sequence=findReplace(sequence, "X", "C");
    if (debug) {
        System.out.println("Translated seq: "+sequence);
    }
    checkSequence(sequence);
    // Count endpairs
    String first_mon=sequence.substring(0, 1);
    String last_mon=sequence.substring(sequence.length()-1, sequence.length());
    if (debug) {
        System.out.println("first_mon="+first_mon+";"+(String)trans_ws(first_mon)+";"+(String)trans_LD(first_mon));
        System.out.println("last_mon="+last_mon+";"+(String)trans_ws(last_mon)+";"+(String)trans_LD(last_mon));
    }
    hcount(endmonos, (String)first_mon);
    hcount(endmonos, (String)trans_ws(first_mon));
    hcount(endmonos, (String)trans_LD(first_mon));
    hcount(endmonos, (String)last_mon);
    hcount(endmonos, (String)trans_ws(last_mon));
    hcount(endmonos, (String)trans_LD(last_mon));
    // Count monomers
    for (i=0; i<sequence.length(); i++) {
        mon=sequence.substring(i, i+1);
        if (debug) {
            System.out.println("mon="+mon+";"+(String)trans_ws(mon)+";"+(String)trans_LD(mon));
        }
        hcount(monos, (String)mon);
        hcount(monos, (String)trans_ws(mon));
        hcount(monos, (String)trans_LD(mon));
    }
    // Count nearest neighbors
    for (i=0; i<sequence.length()-1; i++) {
        dimer=sequence.substring(i, i+2);
        if (debug) {
            System.out.println("dimer="+dimer+";"+(String)trans_ws(dimer)+" "+(String)trans_LD(dimer)+";"+(String)table_ten_translate.get(dimer));
        }
        hcount(pairs, (String)dimer);
        hcount(pairs, (String)trans_LD(dimer));
        hcount(pairs, (String)trans_ws(dimer));
        hcount(pairs, (String)table_ten_translate.get(dimer));
    }
    // Do the calculation
    Enumeration hkeys;
    String key;
    double deltaH_sum=0, deltaS_sum=0;
    // delta_H
    hkeys=modparam.table_deltaH_init.keys();
    while (hkeys.hasMoreElements()) {
        key=(String)hkeys.nextElement();
        if (debug) {
            System.out.println("deltaH_init: "+ key + "; param=" + hget(modparam.table_deltaH_init, key) + "; val="+ hget(endmonos, key));
        }
        deltaH_sum+=hget(modparam.table_deltaH_init, key)*hget(endmonos, key);
    }
    hkeys=modparam.table_H_mono.keys();
    while (hkeys.hasMoreElements()) {
        key=(String)hkeys.nextElement();
        if (debug) {
            System.out.println("H_mono: "+ key+ "; param=" + hget(modparam.table_H_mono, key) + "; val="+ hget(monos, key));
        }
        deltaH_sum+=hget(modparam.table_H_mono, key)*(hget(monos, key)-0.5*hget(endmonos, key));
    }
    hkeys=modparam.table_H_nn.keys();
    while (hkeys.hasMoreElements()) {
        key=(String)hkeys.nextElement();
        if (debug) {
            System.out.println("H_nn: "+ key+ "; param=" + hget(modparam.table_H_nn, key) + "; val="+ hget(pairs, key));
        }
        deltaH_sum+=hget(modparam.table_H_nn, key)*hget(pairs, key);
    }
    // delta S
    hkeys=modparam.table_deltaS_init.keys();
    while (hkeys.hasMoreElements()) {
        key=(String)hkeys.nextElement();
        if (debug) {
            System.out.println("deltaS_init: "+ key+ "; param=" + hget(modparam.table_deltaS_init, key) + "; val="+ hget(endmonos, key));
        }
        deltaS_sum+=hget(modparam.table_deltaS_init, key)*hget(endmonos, key);
    }
    hkeys=modparam.table_S_nn.keys();
    while (hkeys.hasMoreElements()) {
        key=(String)hkeys.nextElement();
        if (debug) {
            System.out.println("S_nn: "+ key+ "; param=" + hget(modparam.table_S_nn, key) + "; val="+ hget(pairs, key));
        }
        deltaS_sum+=hget(modparam.table_S_nn, key)*hget(pairs, key);
    }
    // Calculate Tm
    tm=new Double((double)(-273.15 + 1000*(deltaH_sum/(deltaS_sum+0.368*(sequence.length()-1)*Math.log(salt_conc)+1.987*Math.log(oligo_conc/4)))));
    return tm.doubleValue();
}
}
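The last step of the listing above turns the summed enthalpy and entropy terms into a melting temperature, and calcTm reports the median of seven such predictions. The following is a minimal, self-contained sketch of those two steps only. The class and method names (TmSketch, tm, median) are hypothetical and do not appear in the listing. The units are an assumption inferred from the factor 1000 and the constant 1.987 (deltaH in kcal/mol, deltaS in cal/(K*mol), concentrations in mol/L), and the numbers passed in main are arbitrary illustrative inputs, not fitted parameters.

```java
import java.util.Arrays;

/** Illustrative sketch only; names are hypothetical and not part of the original listing. */
final class TmSketch {
    /** Gas constant in cal/(K*mol), the value used in the listing. */
    static final double R = 1.987;

    /**
     * Two-state melting temperature in degrees Celsius, using the same expression
     * as the listing. Assumed units: deltaH in kcal/mol, deltaS in cal/(K*mol),
     * saltConc and oligoConc in mol/L.
     */
    static double tm(double deltaH, double deltaS, int length, double saltConc, double oligoConc) {
        double saltTerm = 0.368 * (length - 1) * Math.log(saltConc); // salt correction from the listing
        double concTerm = R * Math.log(oligoConc / 4);               // strand concentration term
        return -273.15 + 1000 * deltaH / (deltaS + saltTerm + concTerm);
    }

    /** Median of an odd-length array of model predictions; the input array is not modified. */
    static double median(double[] predictions) {
        double[] sorted = predictions.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Arbitrary example inputs, chosen only to exercise the two functions.
        double single = tm(-100.0, -270.0, 15, 0.115, 1e-6);
        System.out.println("Tm from one parameter set: " + single);

        double[] sevenModels = {61.2, 58.9, 60.4, 59.7, 62.0, 60.1, 60.8};
        System.out.println("Median over seven parameter sets: " + median(sevenModels));
    }
}
```

Taking the median rather than the mean makes the reported value insensitive to a single parameter set that predicts an outlying temperature, which may be why the mean is commented out in the listing.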
Exemplary Computer
Any of the methods described herein may be implemented using virtually any computer. Figure 44 shows such an exemplary computer system. Computer system 2 includes internal and external components. The internal components include a processor 4 coupled to a memory 6. The external components include a mass-storage device 8, e.g., a hard disk drive, user input devices 10, e.g., a keyboard and a mouse, a display 12, e.g., a monitor, and usually, a network link 14 capable of connecting the computer system to other computers to allow sharing of data and processing tasks. Programs are loaded into the memory 6 of this system 2 during operation. These programs include an operating system 16, e.g., Microsoft Windows, which manages the computer system, software 18 that encodes common languages and functions to assist programs that implement the methods of this invention, and software 20 that encodes the methods of the invention in a procedural language or symbolic package. Languages that can be used to program the methods include, without limitation, Visual C/C++ from Microsoft. In preferred applications, the methods of the invention are programmed in mathematical software packages that allow symbolic entry of equations and high-level specification of processing, including algorithms used in the execution of the programs, thereby freeing a user of the need to program procedurally individual equations or algorithms. An exemplary mathematical software package useful for this purpose is Matlab from Mathworks (Natick, MA). Using the Matlab software, one can also apply the Parallel Virtual Machine (PVM) module and Message Passing Interface (MPI), which support processing on multiple processors. This implementation of PVM and MPI with the methods herein is accomplished using methods known in the art. Alternatively, the software or a portion thereof is encoded in dedicated circuitry by methods known in the art.
Example 17: Exemplary Locked Nucleic Acids (LNA)
As disclosed in WO 99/14226, LNA are DNA analogues that form DNA- or RNA-heteroduplexes with exceptionally high thermal stability. LNA units include bicyclic compounds as shown immediately below, where ENA refers to 2'-O,4'-C-ethylene-bridged nucleic acids:
References herein to Locked Nucleoside Analogues, LNA units, LNA monomers, or similar terms are inclusive of such compounds as disclosed in WO 99/14226, WO 00/56746, WO 00/56748, and WO 00/66604.
Desirable LNA monomers and oligomers share some chemical properties of DNA and RNA; they are water soluble, can be separated by agarose gel electrophoresis, and can be ethanol precipitated.
Desirable LNA monomers and oligonucleotide units include nucleoside units having a 2'-4' cyclic linkage, as described in International Patent Applications WO 99/14226, WO 00/56746, WO 00/56748, and WO 00/66604. Desirable LNA monomer structures are exemplified in the formulae Ia and Ib below. In formula Ia the configuration of the furanose is denoted D-β, and in formula Ib the configuration is denoted L-α. Configurations which are composed of mixtures of the two, e.g., D-β and L-α, are also included.
In Ia and Ib, X is oxygen, sulfur, or carbon; B is a universal or modified base (particularly a non-naturally occurring base), e.g., pyrene and pyridyloxazole derivatives, pyrenyl, pyrenylmethylglycerol moieties, all of which may be optionally substituted. Other desirable universal bases include pyrrole, diazole, or triazole moieties, all of which may be optionally substituted, and other groups, e.g., modified adenine, cytosine, 5-methylcytosine, isocytosine, pseudoisocytosine, guanine, thymine, uracil, 5-bromouracil, 5-propynyluracil, 5-propynyl-6-fluorouracil, 5-methylthiazoleuracil, 6-aminopurine, 2-aminopurine, inosine, diaminopurine, 7-propyne-7-deazaadenine, 7-propyne-7-deazaguanine. R , R or R , R or R , R and R are hydrogen, methyl, ethyl, propyl, propynyl, aminoalkyl, methoxy, propoxy, methoxy-ethoxy, fluoro, or chloro.
P designates the radical position for an internucleoside linkage to a succeeding monomer, or a 5'-terminal group; R3 or R3' is an internucleoside linkage to a preceding monomer, or a 3'-terminal group. The internucleotide linkage may be a phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, phosphoroselenoate, phosphorodiselenoate, alkylphosphotriester, or methyl phosphonate. The internucleotide linkage may also contain non-phosphorous linkers, hydroxylamine derivatives (e.g., -CH2-NCH3-O-CH2-), hydrazine derivatives, e.g., -CH2-NCH3-NCH3-CH2-, or amide derivatives, e.g., -CH2-CO-NH-CH2-, -CH2-NH-CO-CH2-. In Ia, R4' and R2' together designate -CH2-O-, -CH2-S-, -CH2-NH-, -CH2-NMe-, -CH2-CH2-O-, -CH2-CH2-S-, -CH2-CH2-NH-, or -CH2-CH2-NMe-, where the oxygen, sulfur or nitrogen, respectively, is attached to the 2'-position. In Formula Ib, R4' and R2 together designate -CH2-O-, -CH2-S-, -CH2-NH-, -CH2-NMe-, -CH2-CH2-O-, -CH2-CH2-S-, -CH2-CH2-NH-, or -CH2-CH2-NMe-, where the oxygen, sulphur or nitrogen, respectively, is attached to the 2'-position (R2 configuration).
Desirable LNA monomer structures are structures in which X is oxygen (Formula Ia and Ib); B is a universal base such as pyrene; R , R or R , R or R , R and R ' are hydrogen; P is a phosphate, phosphorothioate, phosphorodithioate, phosphoramidate, or methyl phosphonate; R3 or R3 is an internucleoside linkage to a preceding monomer, or a 3'-terminal group. In Formula Ia, R4 and R2 together designate -CH2-O-, -CH2-S-, -CH2-NH-, -CH2-NMe-, -CH2-CH2-O-, -CH2-CH2-S-, -CH2-CH2-NH-, or -CH2-CH2-NMe-, where the oxygen, sulphur or nitrogen, respectively, is attached to the 2'-position, and in Formula Ib, R4' and R2 together designate -CH2-O-, -CH2-S-, -CH2-NH-, -CH2-NMe-, -CH2-CH2-O-, -CH2-CH2-S-, -CH2-CH2-NH-, or -CH2-CH2-NMe-, where the oxygen, sulphur or nitrogen, respectively, is attached to the 2'-position in the R2 configuration.
Particularly desirable LNA monomers for incorporation into an oligonucleotide of the invention include those of the following formula IIa:
wherein X is oxygen, sulfur, nitrogen, substituted nitrogen, carbon, or substituted carbon, and desirably is oxygen; B is a modified base as discussed above, e.g., an optionally substituted carbocyclic aryl such as optionally substituted pyrene or optionally substituted pyrenylmethylglycerol, or an optionally substituted heteroalicyclic or optionally substituted heteroaromatic such as optionally substituted pyridyloxazole. Other desirable universal bases include pyrrole, diazole, or triazole moieties, all of which may be optionally substituted; R1, R2, R3, R5 and R5 are hydrogen; P designates the radical position for an internucleoside linkage to a succeeding monomer, or a 5'-terminal group; R is an internucleoside linkage to a preceding monomer, or a 3'-terminal group; and R2* and R4* together designate -O-CH2- or -CH2-CH2-O-, where the oxygen is attached in the 2'-position, or a linkage of -(CH2)n- where n is 2, 3 or 4, desirably 2, or a linkage of -S-CH2- or -NH-CH2-.
LNA units of formula IIa where R2* and R4* contain oxygen are sometimes referred to herein as "oxy-LNA"; units of formula IIa where R2* and R4* contain sulfur are sometimes referred to herein as "thio-LNA"; and units of formula IIa where R2* and R4* contain nitrogen are sometimes referred to herein as "amino-LNA". For many applications, oxy-LNA units are desirable modified nucleic acid units of oligonucleotides of the invention.
Particularly desirable LNA monomers for use in oligonucleotides of the invention are 2'-deoxyribonucleotides, ribonucleotides, and analogues thereof that are modified at the 2'-position in the ribose, such as 2'-O-methyl, 2'-fluoro, 2'-trifluoromethyl, 2'-O-(2-methoxyethyl), 2'-O-aminopropyl, 2'-O-dimethylamino-oxyethyl, 2'-O-fluoroethyl or 2'-O-propenyl, and analogues wherein the modification involves both the 2'- and 3'-position, desirably such analogues wherein the modification links the 2'- and 3'-position in the ribose, such as those described in Nielsen et al., J. Chem. Soc., Perkin Trans. 1, 1997, 3423-33, and in WO 99/14226, and analogues wherein the modification involves both the 2'- and 4'-position, desirably such analogues wherein the modification links the 2'- and 4'-position in the ribose, such as analogues having a -CH2-S- or a -CH2-NH- or a -CH2-NMe- bridge (see Singh et al., J. Org. Chem. 1998, 6, 6078-9). Although LNA monomers having the β-D-ribo configuration are often the most applicable, other configurations also are suitable for purposes of the invention. Of particular use are the α-L-ribo, the β-D-xylo and the α-L-xylo configurations (see Beier et al., Science, 1999, 283, 699 and Eschenmoser, Science, 1999, 284, 2118), in particular those having a 2'-4' -CH2-S-, -CH2-NH-, -CH2-O- or -CH2-NMe- bridge.
In another desirable embodiment, LNA modified oligonucleotides used in this invention comprise oligonucleotides containing at least one LNA monomeric unit of the general scheme A above, wherein X, B, P are defined as above. One of the substituents R2, R2*, R3, and R3 is a group P* which designates an internucleoside linkage to a preceding monomer, or a 2'/3'-terminal group. Two of the substituents of R1*, R2, R2*, R3, R4*, R5, R5*, R6, R6*, R7, and R7* when taken together designate a biradical structure selected from -(CR*R*)r-M-(CR*R*)s-, -(CR*R*)r-M-(CR*R*)s-M-, -M-(CR*R*)r+s-M-, -M-(CR*R*)r-M-(CR*R*)s-, -(CR*R*)r+s-, -M-, -M-M-, wherein each M is independently selected from -O-, -S-, -Si(R*)2-, -N(R*)-, >C=O, -C(=O)-N(R*)-, and -N(R*)-C(=O)-. Each R* and R^-R7^, which are not involved in the biradical, are independently selected from hydrogen, halogen, azido, cyano, nitro, hydroxy, mercapto, amino, mono- or di(C1-6-alkyl)amino, optionally substituted C1-6-alkoxy, optionally substituted C1-6-alkyl, DNA intercalators, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands, and/or two adjacent (non-geminal) R may together designate a double bond, and each of r and s is 0-4 with the proviso that the sum r+s is 1-5.
Examples of LNA units are shown in Figure 53, wherein the groups X and B are defined as above. P designates the radical position for an internucleoside linkage to a succeeding monomer, nucleoside such as an L-nucleoside, or a 5'-terminal group, such internucleoside linkage or 5'-terminal group optionally including the substituent R . One of the substituents R , R , R , and R is a group P* which designates an internucleoside linkage to a preceding monomer, or a 2'/3'-terminal group.
Desirable nucleosides are L-nucleosides such as, for example, derived dinucleoside monophosphates. The nucleoside can be comprised of either a beta-D, a beta-L or an alpha-L nucleoside. Desirable nucleosides may be linked as dimers wherein at least one of the nucleosides is a beta-L or alpha-L. B may also designate the pyrimidine bases cytosine, 5-methyl-cytosine, thymine, uracil, or 5-fluorouridine (5-FUdR), other 5-halo compounds, or the purine bases adenosine, guanosine or inosine.
As discussed above, a variety of LNA units may be employed in the monomers and oligomers of the invention, including bicyclic and tricyclic DNA or RNA having 2'-4' or 2'-3' sugar linkages; 2'-O,4'-C-methylene-β-D-ribofuranosyl moiety, known to adopt a locked C3'-endo RNA-like furanose conformation. Illustrative modified structures that may be included in oligonucleotides of the invention are shown in Figure 1. Other nucleic acid units that may be included in an oligonucleotide of the invention may comprise 2'-deoxy-2'-fluoro ribonucleotides; 2'-O-methyl ribonucleotides; 2'-O-methoxyethyl ribonucleotides; peptide nucleic acids; 5-propynyl pyrimidine ribonucleotides; 7-deazapurine ribonucleotides; 2,6-diaminopurine ribonucleotides; and 2-thio-pyrimidine ribonucleotides, and nucleotides with other sugar groups (e.g., xylose).
Oligonucleotides containing LNA are readily synthesized by standard phosphoramidite chemistry. The flexibility of the phosphoramidite synthesis approach further facilitates the easy production of LNA oligos carrying all types of standard linkers, fluorophores and reporter groups.
Example 18: Selective Binding Complementary (SBC) Nucleotides
Selective Binding Complementary (SBC) nucleotides are unable to form stable hybrids with each other, yet are able to form stable, sequence-specific hybrids with complementary unmodified strands of nucleic acids. Thus, the reduced ability of SBC oligonucleotides to form intramolecular hydrogen bond base-pairs between regions of substantially complementary sequence causes a reduced level of secondary structure. Self-complementarity is an important issue in nucleic acid technologies as reported for DNA, PNA and LNA, and in different biological applications, especially in the field of homogeneous assays. LNA:LNA duplexes are the most thermally stable nucleic acid type duplex system known, making the reduction of self-complementarity even more important. Exemplary SBC oligonucleotides contain 2-amino-A (D) and 2ST incorporated in the same oligonucleotide as replacements of A and T, respectively. The SBC name refers to the fact that D and 2ST form a destabilised base-pair compared to the A-T base-pair, but D-T and 2ST-A base-pairs are normally more stable than the original A-T base-pair. Exemplary SBC-G nucleotides include inosine or LNA-inosine, and exemplary SBC-C nucleotides include PyrroloPyr, LNA-PyrroloPyr, 2SC, and LNA-2SC (Figure 4). Other exemplary SBC nucleotides are shown in Figure 2 and Figure 4. If desired, SBC nucleotides may be incorporated into the nucleic acids and arrays of the invention, using standard methods.
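As a rough illustration of the two ideas in this example, the following sketch first scores how self-complementary an unmodified DNA oligonucleotide is and then writes out the A-to-D and T-to-2sT replacement described above. It is a sketch under stated assumptions, not a method taken from this document: the class and method names (SbcSketch, longestSelfComplement, toSbc) are hypothetical, the self-complementarity score is simply the longest stretch shared between the sequence and its own reverse complement, and the label "2sT" is used for the nucleotide written 2ST in the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names and scoring scheme are assumptions, not from the document. */
final class SbcSketch {

    /** Reverse complement of an unmodified DNA sequence (A, C, G, T only). */
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder();
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default: throw new IllegalArgumentException("Unexpected base: " + dna.charAt(i));
            }
        }
        return sb.toString();
    }

    /**
     * Crude self-complementarity score: length of the longest stretch of the oligo
     * that also occurs in its reverse complement, i.e. that could base-pair with
     * another stretch of the same sequence (in the same or a second copy).
     */
    static int longestSelfComplement(String dna) {
        String s = dna.toUpperCase();
        String rc = reverseComplement(s);
        int n = s.length();
        int best = 0;
        int[][] run = new int[n + 1][n + 1]; // run[i][j]: common run ending at s[i-1] and rc[j-1]
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                if (s.charAt(i - 1) == rc.charAt(j - 1)) {
                    run[i][j] = run[i - 1][j - 1] + 1;
                    best = Math.max(best, run[i][j]);
                }
            }
        }
        return best;
    }

    /** Replace every A with D (2-amino-A) and every T with 2sT; C and G are left unchanged. */
    static List<String> toSbc(String dna) {
        List<String> units = new ArrayList<>();
        for (char base : dna.toUpperCase().toCharArray()) {
            switch (base) {
                case 'A': units.add("D"); break;
                case 'T': units.add("2sT"); break;
                case 'C':
                case 'G': units.add(String.valueOf(base)); break;
                default: throw new IllegalArgumentException("Unexpected base: " + base);
            }
        }
        return units;
    }

    public static void main(String[] args) {
        String oligo = "GAATTC"; // arbitrary palindromic example sequence
        System.out.println("Longest self-complementary stretch: " + longestSelfComplement(oligo));
        System.out.println("SBC-substituted units: " + toSbc(oligo));
    }
}
```

A score of this kind only flags candidate sequences; whether the substitution actually suppresses self-hybridization for a given oligonucleotide would have to be established experimentally.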
The systems disclosed herein can provide significant nucleic acid probes for universal hybridization. In particular, universal hybridization can be accomplished with a conformationally restricted monomer, including a desirable pyrene LNA monomer.
Universal hybridization behavior also can be accomplished in an RNA context. Additionally, the binding affinity of probes for universal hybridization can be increased by the introduction of high affinity monomers without compromising the base-pairing selectivity of bases neighboring the universal base. Incorporation of one or more modified nucleobases or nucleosidic bases into an oligonucleotide can provide significant advantages. Among other things, LNA oligonucleotides can often self-hybridize, rather than hybridize to another oligonucleotide. Use of one or more modified bases with the LNA units can modulate the propensity of the oligonucleotide to form double stranded structures with other oligonucleotides containing modified nucleobases, including internal duplex formation, thereby inhibiting undesired self-hybridization.
Example 19: Exemplary Methods for Synthesizing LNA-2-thiopyrimidine Nucleosides and Nucleotides
2-Thiopyrimidine nucleosides can be prepared in several ways as described in Figure 6. For example, the 2-thiouridine nucleosides (IV) can be synthesized from a substituted uridine nucleoside (VIII). By protecting the O4-position (IX) of the nucleobase, thionation can be performed at the O2 position, which results in the 2-thio-uridine nucleoside (IV). Performing sulphurisation on both O2 and O4 results in the 2,4-dithio-uridine nucleoside (X), which may be transformed into the 2-thio-uridine nucleoside (IV) (Saladino et al., Tetrahedron, 1996, 52, 6759). Another way is to generate a cyclic ether (XI) through reaction with the 5' position; this product can then be transformed to the 2-thio-uridine nucleoside (IV) or the 2-O-alkyl-uridine nucleoside (XII). The 2-O-alkyl-uridine nucleoside (XII) can also be generated by direct alkylation of the uridine nucleoside (VIII). The 2-O-alkyl-uridine nucleoside (XII) can also be transformed into the 2-thio-uridine nucleoside (Brown et al., J. Chem. Soc. 1957, 868; Singer et al., Proc. Natl. Acad. Sci. USA, 1983, 80, 4884; Rajur and McLaughlin, Tetrahedron Lett., 1992, 33, 6081).
In another method (see Figure 7), Lewis acid-catalyzed condensation of a properly substituted sugar (I) and a substituted 2-thio-uracil (II) can result in a substituted 2-thio-uridine nucleoside of the structure (III), which by further synthetic manipulations can be transformed into the LNA 2-thio-uridine nucleoside (IV) (Hamamura et al., Moffatt, J. Med. Chem., 1972, 15, 1061; Bretner et al., J. Med. Chem., 1993, 36, 3611).
Using a properly substituted amino-sugar (V), a 2-thio-uridine nucleoside can be synthesized through ring synthesis of the nucleobase by reaction of the amino-sugar (V) and a substituted isothiocyanate (VI), yielding the substituted LNA 2-thio-uracil nucleoside (VI) (Shaw and Warrener, J. Chem. Soc. 1957, 153; Cusack et al., J. Chem. Soc. Perkin 1, 1973, 1721) (see Figure 8).
Example 20: Exemplary Methods for Synthesizing sT-LNA
Three different strategies for synthesis of 2sT-LNA are outlined in the Summary of the Invention section. Strategy A involves coupling a glycosyl donor and a nucleobase, using standard methodology for synthesis of existing LNA monomers. Strategy B involves ring synthesis of the nucleobase. This strategy is desirable because the availability of 1-amino-LNA enables introduction of a variety of new nucleobases. Strategy C involves modification of T-LNA; the easy synthesis of LNA-T diol makes this an attractive pathway.
In a desirable embodiment, 2sT-LNA is synthesized as illustrated in Figure 28. In particular, the known coupling sugar 1,2-di-O-acetyl-3,5-di-O-benzyl-4-C-mesyloxymethyl-α,β-D-ribofuranose 1 was coupled with the nucleobase 2-thio-thymine in a Vorbrüggen-type reaction. Thus, the nucleobase was silylated and condensed with the sugar using SnCl4 as catalyst to promote the reaction, affording nucleoside 2. Mass spectrometry and NMR subsequently identified the isolated product as the desired one. NMR data were compared with published data for a 2-thio-thymidine derivative (Kuimelis and Nambiar, Nucleic Acids Res., 1994, 22, 1429-1436) in order to validate the correct attachment point of the nucleobase. Subsequently, a base-mediated ring-closing reaction afforded the di-benzylated LNA derivative 3 in 77% yield. The signals in the 1H-NMR spectrum of the compound appeared as singlets, thus proving that the cyclization had occurred to give the LNA skeleton, in which the 1'-H and 2'-H are perpendicular to each other, causing the 3J(1',2') coupling constant to be 0 Hz. MALDI mass spectrometry was likewise used for the identification of the compound. The LNA derivative was protected at the nucleobase with the toluoyl protective group to give 4. This group is well known for the protection of 2-thio-thymidine derivatives (Kuimelis and Nambiar, Nucleic Acids Res., 1994, 22, 1429-1436). The protection of the nucleobase occurs at both the N-3 and the O-4 positions, and hence the product is isolated as a mixture of two isomers. NMR shows that the ratio of the two isomers in the isolated mixture is 2:1.
These methods are described further below.
1-(2-O-acetyl-3-O,5-O-dibenzyl-4-C-mesyloxymethyl-β-D-ribofuranosyl)-2-thio-thymine (2)
1,2-Di-O-acetyl-3,5-di-O-benzyl-4-C-mesyloxymethyl-α,β-D-ribofuranose (1, 2.0 g, 3.83 mmol) and 2-thio-thymine (552 mg, 3.89 mmol) were co-evaporated with anhydrous acetonitrile (100 ml) and redissolved in anhydrous acetonitrile (80 ml). N,O-Bistrimethylsilylacetamide (1.5, 5.85 mmol) was added, and the reaction was stirred at 80 °C for one hour. The mixture was cooled to 0 °C, SnCl4 (0.9 ml, 7.66 mmol) was added, and the reaction was left to stir for 24 hours. The reaction mixture was diluted with EtOAc and washed with NaHCO3 and subsequently with water. The organic phase was dried (Na2SO4) and evaporated to dryness. The product was purified using column chromatography, giving the thio-thymidine derivative 2 (1.1 g, 1.82 mmol, 40%) as a white foam. Rf (10% THF/dichloromethane): 0.75.
MALDI-MS: 627 (M+Na). 13C-NMR (CDCl3): δ = 174.40, 169.29, 159.89, 136.13, 136.51, 136.05, 128.62, 128.56, 128.41, 128.29, 128.07, 127.89, 127.67, 116.18, 91.41, 86.21, 75.59, 75.31, 74.46, 74.22, 73.61, 69.25, 69.04, 37.52, 20.62, 11.91.
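As an illustrative cross-check of the MALDI-MS value reported for compound 2, the expected m/z of the sodium adduct can be estimated from monoisotopic masses. The sketch below is not part of the original procedure. The molecular formula C28H32N2O9S2 is inferred here from the compound name and should be treated as an assumption, and the function names are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes (standard reference values).
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "S": 31.9720707, "Na": 22.9897693,
}
ELECTRON = 0.00054858  # mass of the electron lost on forming a cation


def monoisotopic_mass(counts):
    """Sum the monoisotopic masses for a dict of element counts."""
    return sum(MONOISOTOPIC[el] * n for el, n in counts.items())


def sodium_adduct_mz(counts):
    """m/z of the singly charged [M+Na]+ ion."""
    return monoisotopic_mass(counts) + MONOISOTOPIC["Na"] - ELECTRON


# Assumed formula for compound 2, inferred from its name (not stated in the text).
compound_2 = {"C": 28, "H": 32, "N": 2, "O": 9, "S": 2}
mz = sodium_adduct_mz(compound_2)
print(f"[M+Na]+ calc. {mz:.2f}")
```

Under these assumptions the calculated value is close to 627.1, in line with the reported MALDI-MS peak at 627 (M+Na).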
(1R,3R,4R,7S)-7-(benzyloxy)-1-(benzyloxymethyl)-3-(2-thiothymidine)-2,5-dioxabicyclo[2.2.1]heptane (3)
1-(2-O-acetyl-3-O,5-O-dibenzyl-4-C-mesyloxymethyl-β-D-ribofuranosyl)-2-thio-thymine (2, 630 mg, 1.04 mmol) was dissolved in dioxane (15 ml) and water (8 ml), aqueous NaOH (2 M, 5 ml) was added, and the reaction was left to stir at room temperature for one hour. The yellow solution was neutralized with HCl (1 M, 6 ml), affording a precipitate. The mixture was diluted with dichloromethane and ethyl acetate, causing an emulsion. After separation, the aqueous phase was extracted with ethyl acetate, and the combined organic phase was dried (Na2SO4) and evaporated to dryness. The compound was purified by column chromatography (0-2, then 5% THF/dichloromethane), giving the ring-closed compound 3 as a white foam (370 mg, 0.79 mmol, 77%). Rf (2% MeOH/dichloromethane): 0.23.
MALDI-MS: 488 (M+Na). 13C-NMR (CDCl3): δ = 173.14, 160.39, 137.20, 136.63, 136.00, 128.46, 128.34, 128.02, 127.66, 115.52, 90.29, 87.77, 77.39, 75.26, 73.77, 72.07, 71.70, 64.15, 30.17, 12.33.
1H-NMR (CDCl3): δ = 9.87 (s, 1H), 7.69 (d, J = 1.1 Hz, 1H), 7.26-7.37 (m, 10H), 6.13 (s, 1H), 4.84 (s, 1H), 4.66 (d, J = 11.3 Hz, 1H), 4.61 (s, 2H), 4.52 (d, J = 11.5 Hz, 1H), 4.04 (d, J = 7.7 Hz, 1H), 3.93 (s, 1H), 3.88 (d, J = 11.0 Hz, 1H), 3.82 (d, J = 7.7 Hz, 1H), 3.82 (d, J = 10.8 Hz, 1H), 1.59 (d, J = 1.1 Hz, 3H).
(1R,3R,4R,7S)-7-(benzyloxy)-1-(benzyloxymethyl)-3-(2-thio-N3/O4-toluoyl-thymidine)-2,5-dioxabicyclo[2.2.1]heptane (4)
(1R,3R,4R,7S)-7-(benzyloxy)-1-(benzyloxymethyl)-3-(2-thiothymidine)-2,5-dioxabicyclo[2.2.1]heptane (3, 290 mg, 0.62 mmol) was dissolved in anhydrous pyridine and diisopropylethylamine (0.2 ml, 1.15 mmol), toluoyl chloride (0.25 ml, 1.89 mmol) was added, and the reaction mixture was stirred at room temperature for three hours. After completion, the reaction mixture was diluted with dichloromethane, and the reaction was quenched by addition of water. The phases were separated, and the organic phase was dried (Na2SO4) and evaporated to dryness. The residue was co-evaporated with toluene. The product was purified by column chromatography (0-1% MeOH/dichloromethane) to give nucleoside 4 as a white foam (320 mg, 0.55 mmol, 89%). Rf (2% MeOH/dichloromethane): 0.78.
MALDI-MS: 606 (M+Na). 13C-NMR (CDCl3): δ = 171.98, 168.30, 160.30, 145.92, 145.82, 137.22, 136.65, 135.98, 130.39, 130.27, 129.85, 129.50, 128.51, 128.41, 128.08, 127.73, 115.11, 90.10, 87.81, 76.01, 75.80, 75.39, 75.01, 73.83, 72.19, 72.09, 71.74, 64.15, 21.75, 12.40.
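The stoichiometry quoted for the toluoyl protection step can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the millimole amounts stated above for compound 3, the two reagents, and product 4; the function names are hypothetical and chosen for illustration.

```python
def equivalents(reagent_mmol, substrate_mmol):
    """Molar equivalents of a reagent relative to the limiting substrate."""
    return reagent_mmol / substrate_mmol


def percent_yield(product_mmol, substrate_mmol):
    """Isolated yield in percent, assuming 1:1 substrate-to-product stoichiometry."""
    return 100.0 * product_mmol / substrate_mmol


substrate = 0.62                              # mmol of compound 3
dipea_eq = equivalents(1.15, substrate)       # diisopropylethylamine
toluoyl_eq = equivalents(1.89, substrate)     # toluoyl chloride
yield_pct = percent_yield(0.55, substrate)    # nucleoside 4

print(f"DIPEA: {dipea_eq:.2f} eq, toluoyl chloride: {toluoyl_eq:.2f} eq")
print(f"Yield of 4: {yield_pct:.0f}%")
```

With these inputs the acyl chloride is used at roughly three equivalents, and the computed yield rounds to the 89% stated in the procedure.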
Example 21: Exemplary Methods for Synthesizing LNA-I, LNA-D, and LNA-2AP
2'-O,4'-C-methylene-linked (LNA) nucleosides containing hypoxanthine (or inosine) (LNA-I), 2,6-diaminopurine (LNA-D), and 2-aminopurine (LNA-2AP) nucleobases were efficiently prepared via convergent syntheses. The nucleosides were converted into phosphoramidite monomers and incorporated into LNA oligonucleotides using an automated phosphoramidite method. The complexing properties of oligonucleotides containing these LNA nucleosides were assessed against perfectly matched and singly mismatched DNA.
[Chemical structures of LNA-I, LNA-D, and LNA-2AP]
Hypoxanthine, the base found in the nucleosides inosine and deoxyinosine, is considered a guanine analogue in nucleic acids.
Oligonucleotides containing 2,6-diaminopurine replacements for adenines are expected to bind more strongly to their complementary sequences, especially as part of A-type helices, due to the potential formation of three hydrogen bonds with thymine or uracil. The reported effect of 2,6-diaminopurine deoxyriboside (D) on the stability of polynucleotide duplexes reaches, on average, about 1.5 °C per modification. Higher stabilization effects for mismatches were observed for D nucleosides involved in formation of duplexes prone to form A-type helices. LNA D and LNA 2'-OMe-D are expected to have increased stabilization and mismatch discrimination. LNA can be used in combination with 2-thio-T for construction of selectively binding complementary oligonucleotides. Taking into consideration the extremely high stability of LNA:LNA duplexes, this approach might be very useful for constructing LNA-containing capture probes and antisense reagents, including drugs.
2-Aminopurine (2-AP) is a fluorescent nucleobase (emission at 363 nm), which is useful for probing nucleic acid structure and dynamics and for hybridizing with thymine in Watson-Crick geometry. LNA-I, LNA-D, and/or LNA-2AP may be used in the nucleic acids of the present invention, e.g., to increase the priming efficiency of DNA oligonucleotides in PCR experiments and to construct selectively binding complementary agents.
Synthesis of LNA-I
The convergent method adopted for preparation of LNA monomers (Koshkin et al., J. Org. Chem. 66:8504, 2001) was successfully applied for syntheses of the modified LNA nucleotides 1-3. The synthetic route to LNA-I phosphoramidite 11 is depicted in Figure 48.
The previously described 4-C-branched furanose 4 (Koshkin et al., supra) was used as a glycosyl donor in a coupling reaction with silylated hypoxanthine by the method of Vorbrüggen et al. (Vorbrüggen et al., Chem. Ber. 114:1234, 1981; Vorbrüggen et al., Chem. Ber. 114:1256, 1981; and Vorbrüggen, Acta Biochim. Pol., 43:25, 1996). The reaction resulted in high-yield formation of the desired β-configured nucleoside derivative 5. However, analogous to the coupling reaction of 4 with protected guanines, the formation of the undesired N-7 isomer (ratio of N-9/N-7 = 4:1) was also detected. The mixture of the isomers was used for the ring-closing reaction, and protected LNA nucleoside 6 was isolated in 68% yield as a crystalline compound. The correct structure of the isolated isomer was confirmed later by chemical conversion of LNA-I into LNA-A nucleoside (vide infra). Deprotection of the 5'-hydroxy group of 6 was accomplished via a two-step procedure developed for the syntheses of other LNA nucleosides (Koshkin et al., supra). First, the 5'-O-mesyl group was displaced by sodium benzoate to produce nucleoside 7. The latter was converted into 5'-hydroxy derivative 8 after saponification of the 5'-benzoate. Direct removal of the 3'-O-benzyl group from compound 8 was unsuccessful under the conditions tested due to a solubility problem. Therefore, compound 8 was converted to DMT-protected nucleoside 9 prior to catalytic debenzylation of the 3'-hydroxy group. The phosphoramidite 11 was finally afforded via standard phosphitylation (McBride et al., Tetrahedron Lett. 24:245, 1983; Sinha et al., Tetrahedron Lett. 24:5843, 1983; and Sinha et al., Nucleic Acids Res. 12:4539, 1984) of the nucleoside 10. In order to verify the correct orientation of the glycoside bond (N-9 isomer) in the synthesized LNA-I nucleoside, compound 7 was successfully converted into the known LNA-A derivative 13 (Koshkin et al., supra) (Scheme 2). Thus, treatment of 7 with phosphoryl chloride according to the procedure reported by Martin (Helv. Chim. Acta 78:486, 1995) resulted in high-yield formation of 6-chloropurine derivative 12. The adenosine derivative 13 was derived from 12 after reaction with ammonia.
Exemplary Analytical Data
Data for compound 8 include the following: mp 302-305 °C (dec). 1H NMR (DMSO-d6): δ 8.16 (s, 1H), 8.06 (s, 1H), 7.30-7.20 (m, 5H), 5.95 (s, 1H), 4.69 (s, 1H), 4.63 (s, 2H), 4.28 (s, 1H), 3.95 (d, J = 7.7, 1H), 3.83 (m, 3H). 13C NMR (DMSO-d6): δ 156.6, 147.3, 146.1, 137.9, 137.3, 128.3, 127.6, 127.5, 124.5, 88.2, 85.4, 77.0, 72.1, 71.3, 56.7. MALDI-MS m/z: (M+H)+. Anal. Calcd for C18H18N4O5·5/12 H2O: C, 57.21; H, 5.02; N, 14.82. Found: C, 57.47; H, 4.95; N, 14.17.
For compound 11, 31P NMR (DMSO-d6): δ 148.90.
The synthesis of LNA-I is illustrated in Figure 48 (Keys: (a) hypoxanthine, BSA, TMSOTf, 1,2-dichloromethane; 93%; (b) NaOH, THF, EtOH, H2O; 69%; (c) NaOBz, DMSO; 76%; (d) NaOH, THF, MeOH, H2O; 85%; (e) DMT-Cl, pyridine; 92%; (f) Pd/C, HCO2NH4; 77%; (g) 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite, DIPEA, DMF; 75%).
Exemplary Experimental Conditions
(1R,3R,4R,7S)-7-(2-Cyanoethoxy(diisopropylamino)phosphinoxy)-1-(4,4'-dimethoxytrityloxymethyl)-3-(hypoxanthin-9-yl)-2,5-dioxabicyclo[2.2.1]heptane (11)
Compound 10 (530 mg, 0.90 mmol; described previously, see, for example, WO 00/56746) was dissolved in anhydrous EtOAc (5 mL) and cooled in an ice-bath. DIPEA (0.47 mL, 2.7 mmol) and 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite (250 μL, 1.1 mmol) were added under intensive stirring. Formation of insoluble material was observed, and CH2Cl2 (3 mL) was added to produce a clear solution. More 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite (200 μL, 0.88 mmol) was added after one hour, and the mixture was stirred overnight. EtOAc (30 mL) was added, and the mixture was washed with sat. NaHCO3 (2 x 50 mL) and brine (50 mL), dried (Na2SO4), and concentrated to a solid residue. Purification by silica gel HPLC (1-5% MeOH/CH2Cl2 v/v, containing 0.1% of pyridine) gave compound 11 (495 mg, 75%) as a white solid material. 31P NMR (DMSO-d6): δ 148.90.
Synthesis of LNA-D
Taking advantage of the high availability of the natural deoxy- and riboguanosines, a number of effective methods were developed for their conversion into 2,6-diaminopurine (D) nucleosides (Fathi et al., Tetrahedron Lett. 31:319, 1990; Gryaznov et al., Tetrahedron Lett., 35:2489, 1994; and Lakshman et al., Org. Lett., 2:927, 2000). However, the production of LNA-G nucleoside is a multi-step synthetic procedure.
Scheme for Synthesis of LNA-G
For the synthesis of LNA-D nucleoside, a novel synthesis method was developed that employed a common convergent scheme, related to the strategy used earlier for the synthesis of its anhydrohexitol counterpart (Boudou et al., Nucleic Acids Res. 27:1450, 1999). In particular, a properly protected carbohydrate unit was conjugated with 6-chloro-2-aminopurine to give a stable 6-chloro intermediate derivative (scheme below), which was further converted into the desired diaminopurine nucleoside.
Thus, it was shown that glycosylation of 2-amino-6-chloropurine with compound 4 resulted in highly stereoselective formation of the nucleoside derivative 14. To promote the ring-closing reaction, a solution of 14 in aqueous 1,4-dioxane was treated with a 10-fold excess of sodium hydroxide to give bicyclic compound 15 in 87% yield. The standard reaction with sodium benzoate in hot DMF was then successfully applied for displacement of the 5'-mesylate of 15. Notably, this reaction proceeded in a very selective manner, and no side products originating from modification of the nucleobase were detected. The desired compound 16 was precipitated from the reaction mixture after addition of water. In order to introduce the 6-amino group into the nucleobase structure, the intermediate 6-azido derivative 17 was synthesized via reaction of 16 with sodium azide. The nucleoside derivative 18 was isolated as a crystalline compound after saponification of the 5'-benzoate of 17. Subsequent catalytic hydrogenation of 18 on palladium hydroxide resulted in simultaneous reduction of the 6-azido and 3'-benzyl groups to give LNA-D diol 19 after crystallization from water. By use of the peracylation method, the 2- and 6-amino groups of 19 were benzoylated in the next step to give the nucleobase-protected derivative 20, which was further converted in the standard way into phosphoramidite monomer 21. This phosphoramidite has been produced in a quantity of 0.5 grams.
Synthesis of LNA-D, see Figure 3 (Keys: (a) 2-amino-6-chloropurine, BSA, TMSOTf, 1,2-dichloromethane; 90%; (b) NaOH, 1,4-dioxane, H2O; 87%; (c) NaOBz, DMF; (d) NaN3, DMSO; (e) NaOH, EtOH; 79% (three steps); (f) 10% Pd/C, HCO2NH4, MeOH, H2O; 84%; (g) 1. BzCl, pyridine; 2. NaOH, EtOH, pyridine; 62%; (h) DMT-Cl, pyridine; 80%; (i) 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite, DIPEA, DMF; 74%.)
Exemplary Analytical Data
Data for compound 19 include the following: 1H NMR (DMSO-d6): δ 7.81 (s, 1H), 6.78 (br s, 2H), 5.91 (br s, 2H), 5.71 (s, 1H), 5.66 (br s, 1H), 5.04 (br s, 1H), 4.31 (s, 1H), 4.20 (s, 1H), 3.90 (d, J = 7.7 Hz, 1H), 3.77 (m, 2H), 3.73 (d, J = 7.7 Hz, 1H). 13C NMR (DMSO-d6): δ 160.5, 156.2, 150.9, 134.2, 113.4, 88.3, 85.0, 79.3, 71.5, 70.0, 56.8. MALDI-MS m/z: 295.0 (M+H)+. Anal. Calcd for C11H14N6O4·1.5 H2O: C, 41.12; H, 5.33; N, 26.15. Found: C, 41.24; H, 5.19; N, 25.80.
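The calculated elemental analysis for the sesquihydrate of compound 19 can be reproduced from the formula given above. The sketch below is illustrative only: it uses rounded standard atomic weights, handles only simple formulas without parentheses, and its function names are hypothetical.

```python
import re

# Rounded standard atomic weights (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Parse a simple formula such as 'C11H14N6O4' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def percent_composition(formula, waters=0.0):
    """Mass percent of each element for formula plus `waters` mol of H2O."""
    counts = parse_formula(formula)
    counts["H"] = counts.get("H", 0) + 2 * waters
    counts["O"] = counts.get("O", 0) + waters
    total = sum(ATOMIC_MASS[el] * n for el, n in counts.items())
    return {el: 100.0 * ATOMIC_MASS[el] * n / total for el, n in counts.items()}


calc = percent_composition("C11H14N6O4", waters=1.5)
for element in ("C", "H", "N"):
    print(f"{element}: {calc[element]:.2f}%")
```

With these atomic weights the computed values agree with the stated "Calcd" figures to within rounding (about 0.01 percentage points), which supports reading the garbled formula as a 1.5-hydrate.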
The 31P NMR (DMSO-d6) spectrum for compound 24 contained signals at δ 149.19 and 148.98.
Data for compound 23 include the following: crystallized from MeOH; mp 227.5-229 °C (dec). 1H NMR (DMSO-d6): δ 8.60 (s, 1H), 8.15 (s, 1H), 6.64 (br s, 2H), 5.82 (s, 1H), 5.71 (br s, 1H), 5.04 (br s, 1H), 4.40 (s, 1H), 4.21 (s, 1H), 3.92 (d, J = 7.7 Hz, 1H), 3.79 (m, 2H), 3.75 (d, J = 7.7 Hz, 1H). 13C NMR (DMSO-d6): δ 160.6, 152.0, 149.4, 139.3, 127.1, 88.6, 84.8, 79.1, 71.6, 70.2, 56.8. MALDI-MS m/z: 334.7 (M+H)+.
For protected compound 23, the 31P NMR (DMSO-d6) spectrum has signals at δ 148.93 and 148.85.
Exemplary Experimental Conditions
(1S,3R,4R,7S)-3-(2-amino-6-chloropurin-9-yl)-7-benzyloxy-1-methanesulfonoxymethyl-2,5-dioxabicyclo[2.2.1]heptane (15)
To a solution of compound 14 (40 g, 64.5 mmol) in 1,4-dioxane (300 mL) was added 1 M NaOH (350 mL). The mixture was stirred for one hour at 0 °C, neutralized with AcOH (40 mL), and extracted with CH2Cl2 (2 x 200 mL). The combined organic layers were dried (Na2SO4) and concentrated under reduced pressure. The solid residue was purified by silica gel flash chromatography to give compound 15 (27.1 g, 87%) as a white solid material. 1H NMR (CDCl3): δ 7.84 (s, 1H), 7.32-7.26 (m, 5H), 5.91 (s, 1H), 4.73 (s, 1H), 4.66 (d, J = 11.7 Hz, 1H), 4.61 (d, J = 11.7 Hz, 1H), 4.59 (s, 2H), 4.31 (s, 1H), 4.18 (d, J = 8.0 Hz, 2H), 3.99 (d, J = 7.9 Hz, 1H), 3.05 (s, 3H). 13C NMR (CDCl3): δ 158.9, 152.2, 151.4, 139.1, 136.4, 128.4, 128.2, 127.7, 125.3, 86.5, 85.2, 77.2, 76.8, 72.4, 72.1, 64.0, 37.7. MALDI-MS m/z: 482.1 [M+H]+.
(1S,3R,4R,7S)-3-(2-amino-6-chloropurin-9-yl)-1-benzoyloxymethyl-7-benzyloxy-2,5-dioxabicyclo[2.2.1]heptane (16)
A mixture of sodium benzoate (7.78 g, 54 mmol) and compound 15 (13 g, 27 mmol) was suspended in anhydrous DMF (150 mL) and stirred for two hours at 105 °C. Ice-cold water (500 mL) was added to the solution under intensive stirring. The precipitate was filtered off, washed with water, and dried in vacuo. The intermediate product 16 (8 g) was used for the next step without further purification. An analytical sample was additionally purified by silica gel HPLC (0-2% MeOH/CH2Cl2 v/v). 1H NMR (CDCl3): δ 7.98-7.95 (m, 2H), 7.79 (s, 1H), 7.62-7.58 (m, 1H), 7.48-7.44 (m, 2H), 7.24 (m, 5H), 5.93 (s, 1H), 4.80 (d, J = 12.6 Hz, 1H), 4.77 (s, 1H), 4.67 (d, J = 11.9 Hz, 1H), 4.65 (d, J = 12.6 Hz, 1H), 4.56 (d, J = 11.9 Hz, 1H), 4.27 (d, J = 8.0 Hz, 1H), 4.25 (s, 1H), 4.08 (d, J = 7.9 Hz, 1H). 13C NMR (CDCl3): δ 165.7, 158.8, 152.1, 151.3, 138.9, 136.4, 133.4, 129.4, 129.0, 128.5, 128.4, 128.2, 127.6, 125.4, 86.4, 85.7, 77.2, 76.7, 72.5, 72.3, 59.5. MALDI-MS m/z: 508.0 [M+H]+.
(1S,3R,4R,7S)-3-(2-amino-6-azidopurin-9-yl)-7-benzyloxy-1-hydroxymethyl-2,5-dioxabicyclo[2.2.1]heptane (18)
The entire amount of compound 16 from the previous experiment was dissolved in anhydrous DMSO (100 mL), and NaN3 (5.4 g, 83 mmol) was added. The mixture was stirred for two hours at 100 °C and cooled to room temperature. Water (400 ml) was added, and the mixture was stirred for 30 minutes at 0 °C (ice-bath) to give a yellowish precipitate 17. The precipitate was filtered off, washed with water, and dissolved in THF (25 mL). 2 M NaOH (30 mL) was then added to the solution, and after 15 minutes of stirring the mixture was neutralized with AcOH (4 mL). The mixture was concentrated to approximately 1/2 of its volume and cooled in an ice-bath. The title compound was collected by filtration, washed with cold water, and dried in vacuo. Yield: 8.8 g (79% from 15). 1H NMR (DMSO-d6): δ 8.53 (br s, 2H), 8.23 (s, 1H), 7.31-7.26 (m, 5H), 6.00 (s, 1H), 5.26 (t, J = 5.7 Hz, 1H), 4.76 (s, 1H), 4.64 (s, 1H), 4.31 (s, 1H), 3.99 (d, J = 7.9 Hz, 1H), 3.88-3.85 (m, 3H). 13C NMR (DMSO-d6): δ 146.0, 144.0, 143.8, 137.9, 137.0, 128.3, 127.7, 127.6, 112.3, 88.3, 85.6, 77.1, 77.0, 72.2, 71.4, 56.8. MALDI-MS m/z: 384.7 [M+H]+ for 2,6-diaminopurine product, 410.5 [M+H]+. Anal. Calcd for C18H18N8O4: C, 52.68; H, 4.42; N, 27.30. Found: C, 52.62; H, 4.36; N, 26.94.
(1S,3R,4R,7S)-3-(2,6-Diaminopurin-9-yl)-7-hydroxy-1-hydroxymethyl-2,5-dioxabicyclo[2.2.1]heptane (19)
To a suspension of compound 18 (8 g, 19.5 mmol) in MeOH (100 mL) were added Pd(OH)2/C (20%, 5.5 g) and HCO2NH4 (3 g). The mixture was refluxed for 30 minutes, and more HCO2NH4 (3 g) was added. After refluxing for a further 30 minutes, the catalyst was filtered off and washed with boiling MeOH/H2O (1/1 v/v, 200 mL). The combined filtrates were concentrated to approximately 100 mL and cooled in an ice-bath. The precipitate was filtered off, washed with ice-cold H2O, and dried in vacuo to give compound 19 (5.4 g, 94%) as a white solid material. 1H NMR (DMSO-d6): δ 7.81 (s, 1H), 6.78 (br s, 2H), 5.91 (br s, 2H), 5.71 (s, 1H), 5.66 (br s, 1H), 5.04 (br s, 1H), 4.31 (s, 1H), 4.20 (s, 1H), 3.90 (d, J = 7.7 Hz, 1H), 3.77 (m, 2H), 3.73 (d, J = 7.7 Hz, 1H). 13C NMR (DMSO-d6): δ 160.5, 156.2, 150.9, 134.2, 113.4, 88.3, 85.0, 79.3, 71.5, 70.0, 56.8. MALDI-MS m/z: 295.0 (M+H)+. Anal. Calcd for C11H14N6O4·1.5 H2O: C, 41.12; H, 5.33; N, 26.15. Found: C, 41.24; H, 5.19; N, 25.80.
(1S,3R,4R,7S)-3-(2,6-Di-(N-benzoylamino)purin-9-yl)-7-hydroxy-1-hydroxymethyl-2,5-dioxabicyclo[2.2.1]heptane (20)
A solution of compound 19 (0.5 g, 1.7 mmol) in anhydrous pyridine (20 mL) was cooled in an ice-bath, and benzoyl chloride (1.5 mL, 12.9 mmol) was added under intensive stirring. The mixture was allowed to warm to room temperature and was stirred overnight. Ethanol (20 mL) and 2 M NaOH (20 mL) were added, and the mixture was stirred for an additional hour. EtOAc (75 mL) was added, and the solution was washed with water (2 x 50 mL). The combined aqueous layers were washed with CH2Cl2 (2 x 50 mL). The combined organic phases were dried (Na2SO4) and concentrated under reduced pressure to a solid residue. The residue was suspended in Et2O (75 mL, under refluxing for 30 minutes) and cooled in an ice-bath. The product was collected by filtration, washed with cold Et2O, and dried in vacuo to give compound 20 (530 mg, 62%) as a slightly yellow solid material.
(1R,3R,4R,7S)-3-(2,6-Di-(N-benzoylamino)purin-9-yl)-1-(4,4'-dimethoxytrityloxymethyl)-7-hydroxy-2,5-dioxabicyclo[2.2.1]heptane (21)
Compound 20 (530 mg, 1.06 mmol) was co-evaporated with anhydrous pyridine (2 x 20 mL) and dissolved in anhydrous pyridine (10 mL). DMT-Cl (600 mg, 1.77 mmol) was added, and the solution was stirred overnight at rt. The mixture was diluted with EtOAc (100 mL) and washed with saturated NaHCO3 (100 mL) and brine (50 mL). The organic layer was dried over Na2SO4 and concentrated under reduced pressure. Purification by silica gel HPLC (20-100% EtOAc/hexane v/v, containing 0.1% of pyridine) gave compound 21 (670 mg, 79%) as a white solid material. 1H NMR (CD3OD): δ 8.41 (s, 1H), 8.15-8.03 (m, 4H), 7.71-7.22 (m, 15H), 6.92-6.86 (m, 4H), 6.23 (s, 1H), 4.77 (s, 1H), 4.62 (s, 1H), 4.03 (d, J = 7.9 Hz, 1H), 3.99 (d, J = 7.9 Hz, 1H), 3.79 (s, 6H), 3.67 (d, J = 10.9 Hz, 1H), 3.54 (d, J = 10.8 Hz, 1H). MALDI-MS m/z: 826 (M+Na)+. Anal. Calcd for C46H40N6O8·H2O: C, 67.14; H, 5.14; N, 10.21. Found: C, 67.24; H, 4.97; N, 10.11.
(1R,3R,4R,7S)-7-(2-Cyanoethoxy(diisopropylamino)phosphinoxy)-3-(2,6-di-(N-benzoylamino)purin-9-yl)-1-(4,4'-dimethoxytrityloxymethyl)-2,5-dioxabicyclo[2.2.1]heptane (21)
To a stirred solution of compound 20 (640 mg, 0.8 mmol) in anhydrous DMF (5 mL) were added DIPEA (420 μL, 2.4 mmol) and 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite (300 μL, 1.2 mmol). The mixture was stirred for 1.5 hours at room temperature, diluted with EtOAc (100 mL), and washed with saturated NaHCO3 (2 x 100 mL) and brine (50 mL). The organic layer was dried (Na2SO4) and concentrated under reduced pressure to give a yellow solid residue. Purification by silica gel HPLC (20-100% EtOAc/hexane, containing 0.1% of pyridine) gave compound 21 (590 mg, 74%) as a white solid material. 31P NMR (DMSO-d6): δ 149.19, 148.98.
Synthesis of Pac-protected LNA-D amidite
Figure 10 illustrates a method for synthesizing a Pac-protected version of LNA-D amidite.
Compound 17
Compound 7 (1 g, 3.39 mmol) was co-evaporated with anhydrous DMF (2 x 10 mL) and dissolved in DMF (10 mL). Imidazole (0.69 g, 10.17 mmol) and 1,3-dichloro-1,1,3,3-tetraisopropyldisiloxane (1.4 mL, 4.37 mmol) were added, and the mixture was stirred overnight. H2O (100 mL) was added under intensive stirring to precipitate nucleoside material. The precipitate was filtered off, washed with H2O, and dried in vacuo. Crystallization from ethanol gave compound 17 (1.15 g, 63%) as a white solid material. MALDI-MS: m/z 537.3 (M+H)+.
Compound 18
To a solution of compound 17 (1.15 g, 2.14 mmol) in anhydrous pyridine (5 mL) was added phenoxyacetic anhydride (2 g, 7.0 mmol) and the mixture was stirred for four hours.
EtOAc (100 mL) was added, and the solution was washed with sat. NaHCO3 (2 x 100 mL), brine (50 mL), dried (Na2SO4), and concentrated to a solid residue. Purification by silica gel
HPLC (50-100% v/v EtOAc/hexane) gave compound 18 (1.65 g, 95%) as a white solid material. MALDI-MS: m/z 827.3 (M+Na)+.
(1S,3R,4R,7S)-3-(2,6-Di-(N-phenoxyacetylamino)purin-9-yl)-7-hydroxy-1-hydroxymethyl-2,5-dioxabicyclo[2.2.1]heptane (19)
To a solution of compound 18 (0.96 g, 1.19 mmol) in anhydrous THF (10 mL) was added Et3N·3HF (0.2 mL), and the mixture was stirred overnight at room temperature. The formed precipitate was collected by filtration and washed with THF (5 mL) and pentane (5 mL) to give, after drying, compound 19 (650 mg, 97%) as a white solid material. MALDI-MS: m/z 563.0 (M+H)+.
(1R,3R,4R,7S)-3-(2,6-Di-(N-phenoxyacetylamino)-purin-9-yl)-1-(4,4'-dimethoxytrityloxymethyl)-7-hydroxy-2,5-dioxabicyclo[2.2.1]heptane (20)
To a solution of compound 19 (650 mg, 1.15 mmol) was added DMT-Cl (500 mg, 1.48 mmol). The mixture was stirred for five hours, diluted with EtOAc (100 mL), and washed with sat. NaHCO3 (2 x 100 mL). The organic layer was dried and concentrated to a solid residue. Crystallization from EtOAc gave compound 20 (810 mg, 81%) as a white solid material.
(1R,3R,4R,7S)-7-(2-Cyanoethoxy(diisopropylamino)phosphinoxy)-3-(2,6-di-(N-phenoxyacetylamino)-purin-9-yl)-1-(4,4'-dimethoxytrityloxymethyl)-2,5-dioxabicyclo[2.2.1]heptane (21)
To a solution of compound 20 (800 mg, 0.92 mmol) in anhydrous DMF (10 mL) were added a 0.75 M solution of DCI in EtOAc (0.7 mL) and 2-cyanoethyl tetraisopropylphosphorodiamidite (0.32 mL, 1.01 mmol). The mixture was stirred at room temperature overnight, and EtOAc (75 mL) was added. The resulting solution was washed with sat. NaHCO3 and brine, dried, and concentrated to a solid residue. Purification by silica gel HPLC (30-100% v/v EtOAc/hexane, containing 0.1% of pyridine) gave phosphoramidite 21 (550 mg, 56%) as a white solid material.
31P NMR (DMSO-d6): δ 149.08, 148.8.
Synthesis of LNA-2AP
The intermediate derivative 16 was also used for the synthesis of LNA-2AP nucleoside. First, the 5'-O-benzoyl group of 16 was hydrolyzed by aqueous sodium hydroxide to give the nucleoside derivative 22 in 72% yield. The conditions of catalytic transfer hydrogenation usually used for removal of the 3'-O-benzyl group turned out to be suitable for complete dechlorination of the nucleobase of 22. Thus, totally deprotected LNA-2AP nucleoside 23 was afforded in high yield after refluxing of the methanolic solution of 22 in the presence of palladium hydroxide and ammonium formate. The 2-amine of 23 was selectively protected with an amidine group after treatment with N,N-dimethylformamide dimethyl acetal. The resulting diol 24 was then 5'-O-DMT protected and 3'-O-phosphitylated to yield the desired phosphoramidite LNA-2AP monomer 25 (McBride et al., J. Am. Chem. Soc. 108:2040, 1986).
Synthesis of LNA-2AP, see Figure 54 (Keys: (a) NaOH, 1,4-dioxane, H2O; 72%; (b) 20% Pd(OH)2/C, HCO2NH4, MeOH, H2O; 89%; (c) N,N-dimethylformamide dimethyl acetal, DMF; (d) DMT-Cl, pyridine; 87% (two steps); (e) 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite, DIPEA, DMF; 64%.)
Exemplary Experimental Conditions
(1S,3R,4R,7S)-3-(2-amino-6-chloropurin-9-yl)-7-benzyloxy-1-hydroxymethyl-2,5-dioxabicyclo[2.2.1]heptane (22)
To a solution of compound 16 (3 g, 5.92 mmol) in 1,4-dioxane (20 mL) was added 2 M NaOH (20 mL), and the mixture was stirred for one hour. AcOH (3 mL) was added, and the solvents were removed under reduced pressure. The solid residue was re-dissolved in 20% MeOH/EtOAc (50 mL), washed with NaHCO3 (2 x 50 mL), dried (Na2SO4), and concentrated to a solid residue. The residue was purified by silica gel column chromatography (1-2% MeOH/EtOAc v/v) to give compound 22 (1.72 g, 72%) as a white solid material.
(1S,3R,4R,7S)-3-(2-aminopurin-9-yl)-7-hydroxy-1-hydroxymethyl-2,5-dioxabicyclo[2.2.1]heptane (23)
To a solution of compound 22 (0.72 g, 1.79 mmol) in MeOH/dioxane (1/1 v/v) were added Pd(OH)2/C (20%, 0.5 g) and HCO2NH4 (1.5 g, 23.8 mmol). The mixture was stirred under refluxing for 30 minutes and cooled to room temperature. The catalyst was filtered off and washed with MeOH. The combined filtrates were concentrated under reduced pressure to yield compound 23 (0.44 g, 89%) as a white solid material. An analytical sample was crystallized from MeOH. mp 227.5-229 °C (dec). 1H NMR (DMSO-d6): δ 8.60 (s, 1H), 8.15 (s, 1H), 6.64 (br s, 2H), 5.82 (s, 1H), 5.71 (br s, 1H), 5.04 (br s, 1H), 4.40 (s, 1H), 4.21 (s, 1H), 3.92 (d, J = 7.7 Hz, 1H), 3.79 (m, 2H), 3.75 (d, J = 7.7 Hz, 1H). 13C NMR (DMSO-d6): δ 160.6, 152.0, 149.4, 139.3, 127.1, 88.6, 84.8, 79.1, 71.6, 70.2, 56.8.
(1R,3R,4R,7S)-1-(4,4'-dimethoxytrityloxymethyl)-3-(2-N-(dimethylaminomethylidene)aminopurin-9-yl)-7-hydroxy-2,5-dioxabicyclo[2.2.1]heptane (5'-DMT-protected version of 24)
Compound 23 (0.4 g, 1.43 mmol) was co-evaporated with anhydrous DMF (10 mL) and dissolved in DMF (15 mL). N,N-Dimethylformamide dimethyl acetal (0.8 mL) was added, and the solution was stirred for three days at room temperature. Water (5 mL) was added, and the solvents were removed under reduced pressure. The solid residue was co-evaporated with anhydrous pyridine (2 x 10 mL) and dissolved in anhydrous pyridine (5 mL). DMT-Cl (0.7 g, 2.1 mmol) was added, and the solution was stirred for four hours, diluted with EtOAc (50 mL), and washed with NaHCO3 (2 x 50 mL) and brine (50 mL). The organic layer was dried (Na2SO4) and concentrated to a yellow solid residue. Purification by silica gel HPLC (1-6% MeOH/CH2Cl2 v/v, containing 0.1% of pyridine) gave the 5'-DMT-protected version of compound 24 (0.87 g, 87%) as a white solid material.
(1R,3R,4R,7S)-7-(2-Cyanoethoxy(diisopropylamino)phosphinoxy)-1-(4,4'-dimethoxytrityloxymethyl)-3-(2-N-(dimethylaminomethylidene)aminopurin-9-yl)-2,5-dioxabicyclo[2.2.1]heptane (25)
The 5'-DMT-protected version of compound 24 (0.5 g, 0.79 mmol) was dissolved in anhydrous DMF (10 mL), and DIPEA (350 μL) and 2-cyanoethyl-N,N-diisopropylphosphoramidochloridite (250 μL) were added. The mixture was stirred for one hour, diluted with EtOAc (50 mL), washed with saturated NaHCO3 (2 x 100 mL) and brine (50 mL), dried (Na2SO4), and concentrated to a solid residue. Purification by silica gel HPLC (0-3% MeOH/CH2Cl2 v/v, containing 0.1% of pyridine) gave compound 25 (0.42 g, 64%) as a white solid material. 31P NMR (DMSO-d6): δ 148.93, 148.85.
Synthesis of Oligomers
Along with previously described LNA phosphoramidites (Koshkin et al., supra; and Pedersen et al., Synthesis p. 802, 2002), the phosphoramidite monomers 11, 21, and 25 were successfully applied for automated oligonucleotide synthesis (Caruthers, Acc. Chem. Res. 24:278, 1991) to produce the LNA oligomers depicted in Table 9. Oligonucleotide syntheses were performed on a 0.2 μmol scale using an Expedite synthesizer (Applied Biosystems) with the recommended commercial reagents. Standard protocols for DNA synthesis were used, except that the coupling time was extended to 5 minutes and the oxidation time was extended to 30-second cycles. Deprotection of the oligonucleotides was performed by treatment with concentrated ammonium hydroxide for five hours at 60 °C. After that, the LNA-D containing oligonucleotides were additionally treated with AMA (concentrated ammonium hydroxide / 40% aqueous MeNH2; 1/1 v/v) for one hour at 60 °C. All the synthesized oligonucleotides were purified by RP-HPLC, and their structures were verified by MALDI-TOF mass spectra.
The complexing properties of oligonucleotides containing new LNA monomers 1-3 were assessed. Comparative binding data from an 8-mer LNA sequence are shown in Table 9 as the melting temperatures against complementary single stranded DNA. An exemplary sequence for this comparison is GACATAGG, which is the central part of a capture probe used for SNP detection in GluclNS7-7asA (A:a mismatch position). The thermal stabilities of reference DNA duplexes (entries 1-7, Table 9) can be directly compared with their LNA counterparts (entries 8-14). The hybridizing ability of all LNA 8-mers is superior to that of isosequential DNA oligonucleotides. The average melting temperatures of DNA and LNA 8-mers against complementary DNAs typically differ by about 40 °C. The replacement of one internal LNA-A nucleotide by LNA-D resulted in further stabilization of the complementary duplex by 6.2 °C (i.e., compare entries 8 and 11). Interestingly, the analogous replacement made in a DNA octamer destabilized the corresponding duplex by 0.5 °C (i.e., entries 1 and 4). D-nucleosides may facilitate a B to A helix transition, because the A-type structure of an LNA:DNA duplex is more suitable for effective D:t pairing. This stabilizing effect is expected to be even more pronounced for LNA:RNA duplexes, which can be very useful for construction of antisense or other gene-silencing reagents. The mismatch discrimination ability of the D-nucleoside was also studied (entry 11). In comparison to LNA-A (entry 8), the D-nucleoside demonstrated remarkably increased mismatch discrimination against the DNA-g nucleoside.
Table 9. Melting temperatures (Tm) of the complementary DNA-DNA and LNA-DNA duplexes.(a) Modified monomers (LNA are in CAPITALS): I = inosine; D = 2,6-diaminopurine; X = 2-aminopurine.
(a) The melting temperatures (Tm values) were obtained as the maxima of the first derivative of the corresponding melting curves (optical density at 260 nm versus temperature). Concentration of the duplexes: 2.5 μM. Buffer: 0.1 M NaCl; 10 mM Na-phosphate (pH 7.0); 1 mM EDTA. (b) Low cooperativity of transitions (accuracy ± 1 °C).
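The first-derivative determination of Tm described in the Table 9 footnote can be illustrated with a minimal sketch. The function name, the central-difference scheme, and the synthetic sigmoidal curve are assumptions made for illustration only; they are not the instrument software or the data used to generate Table 9.

```python
import math


def melting_temperature(temps, absorbances):
    """Return the temperature at which dA/dT is largest.

    temps: temperatures in ascending order (deg C).
    absorbances: optical density at 260 nm at each temperature.
    The derivative is approximated by central differences, so the
    two end points are never returned.
    """
    if len(temps) != len(absorbances) or len(temps) < 3:
        raise ValueError("need at least three paired data points")
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (absorbances[i + 1] - absorbances[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


# Synthetic (hypothetical) melting curve: a sigmoid centred at 52 deg C.
temps = [20 + 0.5 * i for i in range(121)]  # 20 to 80 deg C in 0.5 deg steps
curve = [0.5 + 0.1 / (1 + math.exp(-(t - 52.0) / 2.0)) for t in temps]
tm = melting_temperature(temps, curve)
```

Real melting curves are noisy, so in practice the data would usually be smoothed before differentiation; that step is omitted here.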
Table 10. The mismatch discrimination effect of the chimeric LNA-DNA 12-mers containing LNA-A or LNA-D nucleosides against the point of mutation
Concentration of duplexes: 2 μM. Buffer: see Table 9.
Table 11. Melting temperatures of the LNA and DNA duplexes (LNAs are CAPITALIZED) containing 2-thio-deoxythymidine (s) and diaminopurineriboside (d). See Table 9 for experimental conditions.
* Tm values in the shaded cells were measured in low salt buffers (1 mM Na-phosphate, pH 7.0). Low cooperativity of the transitions was observed (accuracy ±1.5°C)
Example 22: Exemplary Methods for Synthesizing LNA-PyrroloPyr-SBC-C
The furanopyrimidine phosphoramidite 6pC used for incorporation of the pyrroloC analogue can be synthesized from LNA-U through a series of reactions as illustrated in Figure 9. Starting from LNA-U 1pC, iodine can be introduced at the 5-position of the nucleobase (Chang and Welch, J. Med. Chem. 1963, 6, 428). This compound can be used in a Sonogashira-type palladium coupling reaction (Sonogashira, Tohda and Hagihara, Tetrahedron Lett. 1975, 4467), resulting in the 5-ethynyl-LNA-U 3pC. The 5-ethynyl-LNA-U 3pC can be transformed to the furanopyrimidine LNA analogue 4pC when reacted with CuI, and then transformed into the DMT-protected phosphoramidite 6pC (Woo, Meyer, and Gamper, Nucleic Acids Res., 1996, 24, 2470). LNA-PyrroloPyr-SBC-C is formed when 6pC or an oligonucleotide containing 6pC is deprotected with ammonia.
Example 23: Exemplary Modified Bases such as Universal Bases
Desirable modified bases are covalently linked to the 1'-position of a furanosyl ring, particularly to the 1'-position of a 2',4'-linked furanosyl ring, especially to the 1'-position of a 2'-O,4'-C-methylene-beta-D-ribofuranosyl ring.
As discussed above, other desirable modified bases contain one or more carbon alicyclic or carbocyclic aryl units, i.e. non-aromatic or aromatic cyclic units that contain only carbon atoms as ring members. Modified bases that contain carbocyclic aryl groups are generally desirable, particularly a moiety that contains multiple linked aromatic groups, particularly groups that contain fused rings. That is, optionally substituted polynuclear aromatic groups are especially desirable such as optionally substituted naphthyl, optionally substituted anthracenyl, optionally substituted phenanthrenyl, optionally substituted pyrenyl, optionally substituted chrysenyl, optionally substituted benzanthracenyl, optionally substituted dibenzanthracenyl, optionally substituted benzopyrenyl, with substituted or unsubstituted pyrenyl being particularly desirable.
Without being bound by any theory, it is believed that such carbon alicyclic and/or carbocyclic aryl modified bases can increase hydrophobic interaction with neighboring bases of an oligonucleotide. Those interactions can enhance the stability of a hybridized oligo pair, without necessity of interactions between bases of the distinct oligos of the hybridized pair. Again without being bound by any theory, it is further believed that such hydrophobic interactions can be particularly favored by platelike stacking of neighboring bases, i.e. intercalation. Such intercalation will be promoted if the base comprises a moiety with a relatively planar extended structure, such as provided by an aromatic group, particularly a carbocyclic aryl group having multiple fused rings. This is indicated by the increases in Tm values exhibited by oligos having LNA units with pyrenyl nucleobases relative to comparable oligos having LNA units with naphthyl nucleobases.
Modified bases that contain one or more heteroalicyclic or heteroaromatic groups also are suitable for use in LNA units, particularly such non-aromatic and aromatic groups that contain one or more N, O or S atoms as ring members, particularly at least one sulfur atom, and from 5 to about 8 ring members. Also desirable is a nucleobase that contains two or more fused rings, where at least one of the rings is a heteroalicyclic or heteroaromatic group containing 1, 2, or 3 N, O, or S atoms as ring members.
In general, desirable are modified bases that contain 2, 3, 4, 5, 6, 7 or 8 fused rings, which may be carbon alicyclic, heteroalicyclic, carbocyclic aryl and/or heteroaromatic; more desirably modified bases that contain 3, 4, 5, or 6 fused rings, which may be carbon alicyclic, heteroalicyclic, carbocyclic aryl and/or heteroaromatic, and desirably the fused rings are each aromatic, particularly carbocyclic aryl.
In some embodiments, the base is not an optionally substituted oxazole, optionally substituted imidazole, or optionally substituted isoxazole modified base.
Other suitable modified bases for use in LNA units in accordance with the invention include optionally substituted pyridyloxazole, optionally substituted pyrenylmethylglycerol, optionally substituted pyrrole, optionally substituted diazole and optionally substituted triazole groups.
Desirable modified bases of the present invention when incorporated into an oligonucleotide containing all LNA units or a mixture of LNA and DNA or RNA units will exhibit substantially constant Tm values upon hybridization with a complementary oligonucleotide, irrespective of the bases present on the complementary oligonucleotide.
In some embodiments, one or more of the common RNA units or commonly used derivatives thereof, such as 2'-O-methyl, 2'-fluoro, 2'-allyl, and 2'-O-methoxyethoxy derivatives, are combined with at least one nucleotide with a universal base to generate an oligonucleotide having from five to 100 nucleotides.
Modified nucleic acid compounds may comprise a variety of nucleic acid units, e.g. nucleoside and/or nucleotide units. As discussed above, an LNA nucleic acid unit has a carbon or hetero alicyclic ring with four to six ring members, e.g. a furanose ring, or other alicyclic ring structures such as a cyclopentyl, cycloheptyl, tetrahydropyranyl, oxepanyl, tetrahydrothiophenyl, pyrrolidinyl, thianyl, thiepanyl, piperidinyl, and the like.
In an aspect of the invention, at least one ring atom of the carbon or hetero alicyclic group is taken to form a further cyclic linkage to thereby provide a multi-cyclic group. The cyclic linkage may include one or more, typically two atoms, of the carbon or hetero alicyclic group. The cyclic linkage also may include one or more atoms that are substituents, but not ring members, of the carbon or hetero alicyclic group.
Unless indicated otherwise, an alicyclic group as referred to herein is inclusive of groups having all carbon ring members as well as groups having one or more hetero atom (e.g. N, O, S or Se) ring members. The disclosure of the group as a "carbon or hetero alicyclic group" further indicates that the alicyclic group may contain all carbon ring members (i.e. a carbon alicyclic) or may contain one or more hetero atom ring members (i.e. a hetero alicyclic). Alicyclic groups are understood not to be aromatic, and typically are fully saturated within the ring (i.e. no endocyclic multiple bonds). Desirably, the alicyclic ring is a hetero alicyclic, i.e. the alicyclic group has one or more hetero atom ring members, typically one or two hetero atom ring members such as O, N, S or Se, with oxygen being often desirable.
The one or more cyclic linkages of an alicyclic group may be comprised completely of carbon atoms, or generally more desirable, one or more hetero atoms such as O, S, N or Se, desirably oxygen for at least some embodiments. The cyclic linkage will typically contain one or two or three hetero atoms, more typically one or two hetero atoms in a single cyclic linkage.
The one or more cyclic linkages of a nucleic acid compound of the invention can have a number of alternative configurations. For instance, cyclic linkages of nucleic acid compounds of the invention will include at least one alicyclic ring atom. The cyclic linkage may be disubstituted to a single alicyclic atom, or two adjacent or non-adjacent alicyclic ring atoms may be included in a cyclic linkage. Still further, a cyclic linkage may include a single alicyclic ring atom, and a further atom that is a substituent but not a ring member of the alicyclic group. For instance, as discussed above, if the alicyclic group is a furanosyl-type ring, desirable cyclic linkages include the following: C-1', C-2'; C-2', C-3'; C-2', C-4'; or a C-2', C-5' linkage.
A cyclic linkage will typically comprise, in addition to the one or more alicyclic group ring atoms, 2 to 6 atoms in addition to the alicyclic ring members, more typically 3 or 4 atoms in addition to the alicyclic ring member(s).
The alicyclic group atoms that are incorporated into a cyclic linkage are typically carbon atoms, but hetero atoms such as nitrogen of the alicyclic group also may be incorporated into a cyclic linkage.
Specifically desirable modified nucleic acids for use in oligonucleotides of the invention include locked nucleic acids as disclosed in WO99/14226 (which include bicyclic and tricyclic DNA or RNA having a 2'-4' or 2'-3' sugar linkage); 2'-deoxy-2'-fluoro ribonucleotides; 2'-O-methyl ribonucleotides; 2'-O-methoxyethyl ribonucleotides; peptide nucleic acids; 5-propynyl pyrimidine ribonucleotides; 7-deazapurine ribonucleotides; 2,6-diaminopurine ribonucleotides; and 2-thio-pyrimidine ribonucleotides.
LNA units as disclosed in WO 99/14226 are in general particularly desirable modified nucleic acids for incorporation into an oligonucleotide of the invention. Additionally, the nucleic acids may be modified at either the 3' and/or 5' end by any type of modification known in the art. For example, either or both ends may be capped with a protecting group, attached to a flexible linking group, attached to a reactive group to aid in attachment to the substrate surface, etc. Desirable LNA units also are disclosed in WO 0056746, WO 0056748, and WO 0066604.
Desirable syntheses of pyrene-LNA monomers are shown in the following Schemes 1 and 2. The compound reference numerals in Figure 1, Figure 11, and Figure 25 are also referred to in the examples below.
A wide variety of modified nucleic acids may be employed, including those that have 2'-modification of hydroxyl, 2'-O-methyl, 2'-fluoro, 2'-trifluoromethyl, 2'-O-(2-methoxyethyl), 2'-O-aminopropyl, 2'-O-dimethylamino-oxyethyl, 2'-O-fluoroethyl or 2'-O-propenyl. The nucleic acid may further include a 3' modification, desirably where the 2'- and 3'-positions of the ribose group are linked. The nucleic acid also may contain a modification at the 4'-position, desirably where the 2'- and 4'-positions of the ribose group are linked such as by a 2'-4' link of -CH2-S-, -CH2-NH-, or -CH2-NMe- bridge. The nucleotide also may have a variety of configurations such as α-D-ribo, β-D-xylo, or α-L-xylo configuration.
The internucleoside linkages of the units of oligos of the invention may be natural phosphorodiester linkages, or other linkages such as -O-P(O)2-O-, -O-P(O,S)-O-, -O-P(S)2-O-, -NRH-P(O)2-O-, -O-P(O,NRH)-O-, -O-PO(R")-O-, -O-PO(CH3)-O-, and -O-PO(NHRN)-O-, where RH is selected from hydrogen and C1-4-alkyl, and R" is selected from C1-6-alkyl and phenyl.
A further desirable group of modified nucleic acids for incorporation into oligomers of the invention include those of the following formula:
wherein X is -O-; B is a modified base as discussed above, e.g. an optionally substituted carbocyclic aryl such as optionally substituted pyrene or optionally substituted pyrenylmethylglycerol, or an optionally substituted heteroalicyclic or optionally substituted heteroaromatic such as optionally substituted pyridyloxazole. Other desirable universal bases include pyrrole, diazole or triazole moieties, all of which may be optionally substituted. R1 is hydrogen. P designates the radical position for an internucleoside linkage to a succeeding monomer, or a 5'-terminal group, such internucleoside linkage or 5'-terminal group optionally including the substituent R , R being hydrogen or included in an internucleoside linkage. R3 is a group P* which designates an internucleoside linkage to a preceding monomer, or a 3'-terminal group. One or two pairs of non-geminal substituents selected from the present substituents of R2, R2*, R3, R4* may designate a biradical consisting of 1-4 groups/atoms selected from -C(RaRb)-, -C(Ra)=C(Ra)-, -C(Ra)=N-, -O-, -S-, -SO2-, -N(Ra)-, and >C=Z.
Z is selected from -O-, -S-, and -N(Ra)-, and Ra and Rb each is independently selected from hydrogen, optionally substituted C1-6-alkyl, optionally substituted C2-6-alkenyl, hydroxy, C1-6-alkoxy, C2-6-alkenyloxy, carboxy, C1-6-alkoxycarbonyl, C1-6-alkylcarbonyl, formyl, amino, mono- and di(C1-6-alkyl)amino, carbamoyl, mono- and di(C1-6-alkyl)-amino-carbonyl, amino-C1-6-alkyl-aminocarbonyl, mono- and di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl, C1-6-alkyl-carbonylamino, carbamido, C1-6-alkanoyloxy, sulphono, C1-6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1-6-alkylthio, halogen, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands, the possible pair of non-geminal substituents thereby forming a monocyclic entity together with (i) the atoms to which the non-geminal substituents are bound and (ii) any intervening atoms; and each of the substituents R2, R2*, R3, R4* which are present and not involved in the possible biradical is independently selected from hydrogen, optionally substituted C1-6-alkyl, optionally substituted C2-6-alkenyl, hydroxy, C1-6-alkoxy, C2-6-alkenyloxy, carboxy, C1-6-alkoxycarbonyl, C1-6-alkylcarbonyl, formyl, amino, mono- and di(C1-6-alkyl)amino, carbamoyl, mono- and di(C1-6-alkyl)-amino-carbonyl, amino-C1-6-alkyl-aminocarbonyl, mono- and di(C1-6-alkyl)amino-C1-6-alkyl-aminocarbonyl, C1-6-alkyl-carbonylamino, carbamido, C1-6-alkanoyloxy, sulphono, C1-6-alkylsulphonyloxy, nitro, azido, sulphanyl, C1-6-alkylthio, halogen, photochemically active groups, thermochemically active groups, chelating groups, reporter groups, and ligands; and basic salts and acid addition salts thereof.
Modified nucleobases and nucleosidic bases may comprise a cyclic unit (e.g. a carbocyclic unit such as pyrenyl) that is joined to a nucleic unit, such as a 1'-position of a furanosyl ring, through a linker, such as a straight or branched chain alkylene or alkenylene group. Alkylene groups suitably have from 1 (i.e. -CH2-) to about 12 carbon atoms, more typically 1 to about 8 carbon atoms, still more typically 1 to about 6 carbon atoms.
Alkenylene groups suitably have one, two or three carbon-carbon double bonds and from 2 to 12 carbon atoms, more typically 2 to 8 carbon atoms, still more typically 2 to 6 carbon atoms.
Example 24: Exemplary Nucleic Acid Monomers and Oligomers
Desirable LNA units include those that contain a furanosyl-type ring and one or more of the following linkages: C-1', C-2'; C-2', C-3'; C-2', C-4'; or a C-2', C-5' linkage. A C-2', C-4' linkage is particularly desirable. In another aspect of the invention, desirable LNA units are compounds having a substituent on the 2'-position of the central sugar moiety (e.g., ribose or xylose), or derivatives thereof, which favors the C3'-endo conformation, commonly referred to as the North (or simply N for short) conformation.
In various embodiments, the oligonucleotide has at least one LNA unit with a modified base as disclosed herein. Suitable oligonucleotides also may contain natural DNA or RNA units (e.g., nucleotides) with natural bases, as well as LNA units that contain natural bases. Furthermore, the oligonucleotides of the invention also may contain modified DNA or RNA, such as 2'-O-methyl RNA, with natural or modified nucleobases (e.g., pyrene). Desirable oligonucleotides contain at least one of and desirably both of 1) one or more DNA or RNA units (e.g., nucleotides) with natural bases, and 2) one or more LNA units with natural bases, in addition to LNA units with a modified base. In other embodiments, the nucleic acid does not contain a modified base.
Oligonucleotides of the invention desirably contain at least 50 percent or more, more desirably 55, 60, 65, or 70 percent or more of non-modified or natural DNA or RNA units (e.g., nucleotides) or units other than LNA units based on the total number of units or residues of the oligo. A non-modified nucleic acid as referred to herein means that the nucleic acid upon incorporation into a 10-mer oligomer will not increase the Tm of the oligomer in excess of 1°C or 2°C. More desirably, the non-modified nucleic acid unit (e.g., nucleotide) is a substantially or completely "natural" nucleic acid, i.e. containing a non-modified base of uracil, cytosine, 5-methyl-cytosine, thymine, adenine or guanine and a non-modified pentose sugar unit of β-D-ribose (in the case of RNA) or β-D-2'-deoxyribose (in the case of DNA).
Oligonucleotides of the invention suitably may contain only a single modified (i.e. LNA) nucleic acid unit, but desirably an oligonucleotide will contain 2, 3, 4 or 5 or more modified nucleic acid units. Typically desirable is where an oligonucleotide contains from about 5 to about 40 or 45 percent modified (LNA) nucleic acid units, based on total units of the oligo, more desirably where the oligonucleotide contains from about 5 or 10 percent to about 20, 25, 30 or 35 percent modified nucleic acid units, based on total units of the oligo. Typical oligonucleotides that contain one or more LNA units with a modified base as disclosed herein suitably contain from 3 or 4 to about 200 nucleic acid repeat units, with at least one unit being an LNA unit with a modified base, more typically from about 3 or 4 to about 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140 or 150 nucleic acid units, with 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 LNA units with a modified base being present. As discussed above, particularly desirable oligonucleotides contain a non-modified DNA or RNA unit at the 3' terminus and a modified DNA or RNA unit at one position upstream from the 3' terminal non-modified nucleic acid unit (generally referred to herein as the -1 or penultimate position). In some embodiments, the modified base is at the 3' terminal position of a nucleic acid primer, such as a primer for the detection of a single nucleotide polymorphism. Other particularly desirable nucleic acids have an LNA unit with or without a modified base in the 5' and/or 3' terminal position.
Also desirable are oligonucleotides that do not have extended stretches of modified DNA or RNA units, e.g. greater than about 4, 5 or 6 consecutive modified DNA or RNA units. That is, desirably one or more non-modified DNA or RNA units will be present after a consecutive stretch of about 3, 4 or 5 modified nucleic acids.
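The composition guidelines above (fraction of LNA units and the length of consecutive modified stretches) can be checked with a short sketch. It assumes the notation used in the tables of this document, in which LNA units are written in CAPITALS and DNA units in lowercase; the function names, the default thresholds (5 to 45 percent LNA, at most 4 consecutive LNA units), and the example sequences are illustrative assumptions rather than limits of the invention.

```python
def lna_profile(oligo):
    """Summarize LNA content of an oligo written with LNA units in CAPITALS."""
    units = [c for c in oligo if c.isalpha()]  # ignore spaces and separators
    if not units:
        raise ValueError("oligo contains no nucleotide units")
    lna_count = sum(1 for c in units if c.isupper())
    longest = run = 0
    for c in units:
        run = run + 1 if c.isupper() else 0  # length of the current LNA stretch
        longest = max(longest, run)
    return {
        "length": len(units),
        "lna_percent": 100.0 * lna_count / len(units),
        "longest_lna_run": longest,
    }


def meets_guidelines(oligo, min_pct=5.0, max_pct=45.0, max_run=4):
    """True if LNA percentage and longest LNA stretch fall within the given bounds."""
    profile = lna_profile(oligo)
    return (min_pct <= profile["lna_percent"] <= max_pct
            and profile["longest_lna_run"] <= max_run)


# Hypothetical examples: a mixed LNA/DNA 8-mer and a fully modified 8-mer.
mixed = lna_profile("gaCatAgg")
```

Here `meets_guidelines("gaCatAgg")` is true (25 percent LNA, no adjacent LNA units), whereas a fully modified oligo such as `"GACATAGG"` falls outside the percentage range.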
Generally desirable are oligonucleotides that contain a mixture of LNA units that have non-modified or natural nucleobases (i.e., adenine, guanine, cytosine, 5-methyl-cytosine, uracil, or thymine) and LNA units that have modified bases as disclosed herein.
Particularly desirable oligonucleotides of the invention include those where an LNA unit with a modified base is interposed between two LNA units each having non-modified or natural bases (adenine, guanine, cytosine, 5-methyl-cytosine, uracil, or thymine). The LNA "flanking" units with natural base moieties may be directly adjacent to the LNA with modified base moiety, or desirably are within 2, 3, 4 or 5 nucleic acid units of the LNA unit with modified base. Nucleic acid units that may be spaced between an LNA unit with a modified base and an LNA unit with natural nucleobases suitably are DNA and/or RNA and/or alkyl-modified RNA/DNA units, typically with natural base moieties, although the DNA and/or RNA units also may contain modified base moieties. The oligonucleotides of the present invention comprise at least about one universal base. Oligonucleotides of the present invention can also comprise, for example, between about one to six 2'-OMe-RNA units, at least about two LNA units and at least about one LNA pyrene unit.
Example 25: Exemplary Target Nucleic Acids
In the practice of the present invention, target genes may be suitably single-stranded or double-stranded DNA or RNA; however, single-stranded DNA or RNA targets are desirable. It is understood that the targets to which the nucleic acids of the invention are directed include allelic forms of the targeted gene and the corresponding mRNAs including splice variants. There is substantial guidance in the literature for selecting particular sequences for nucleic acids with LNA or other high affinity nucleotides given a knowledge of the sequence of the target polynucleotide, e.g., Peyman and Uhlmann, Chemical Reviews, 90:543-584, 1990; Crooke, Ann. Rev. Pharmacol. Toxicol., 32:329-376 (1992); and Zamecnik and Stephenson, Proc. Natl. Acad. Sci., 75:280-284 (1974). Desirable mRNA targets include the 5' cap site, tRNA primer binding site, the initiation codon site, the mRNA donor splice site, and the mRNA acceptor splice site, e.g., Goodchild et al., U.S. Patent 4,806,463.
Example 26: Exemplary Applications of Present Methods
The chimeric oligos of the present invention are highly suitable for a variety of diagnostic purposes such as for the isolation, purification, amplification, detection, identification, quantification, or capture of nucleic acids such as DNA, mRNA or non-protein coding cellular RNAs, such as tRNA, rRNA, snRNA and scRNA, or synthetic nucleic acids, in vivo or in vitro.
The oligomer can comprise a photochemically active group, a thermochemically active group, a chelating group, a reporter group, or a ligand that facilitates the direct or indirect detection of the oligomer or the immobilization of the oligomer onto a solid support. Such groups are typically attached to the oligo when it is intended as a probe for in situ hybridization, Southern hybridization, dot blot hybridization, reverse dot blot hybridization, or Northern hybridization.
When the photochemically active group, the thermochemically active group, the chelating group, the reporter group, or the ligand includes a spacer (K), the spacer may suitably comprise a chemically cleavable group.
An additional object of the present invention is to provide oligonucleotides that combine an increased ability to discriminate between complementary and mismatched targets with the ability to act as substrates for nucleic acid active enzymes such as, for example, DNA and RNA polymerases, ligases, and phosphatases. Such oligonucleotides may be used, for instance, as primers for sequencing nucleic acids and as primers in any of the several well known amplification reactions, such as the PCR reaction.
Introduction of LNA monomers with natural bases into either DNA, RNA, or pure LNA oligonucleotides can result in extremely high thermal stability of duplexes with complementary DNA or RNA, while at the same time obeying the Watson-Crick base-pairing rules. In general, the thermal stability of heteroduplexes is increased 3-8°C per LNA monomer in the duplex. Oligonucleotides containing LNA can be designed to be substrates for polymerases (e.g., Taq polymerase), and PCR based on LNA primers is more discriminatory towards single base mutations in the template DNA compared to normal DNA primers (e.g., allele specific PCR). Furthermore, very short LNA oligos (e.g. 5-mers or 8-mers), which have high Tm's when compared to similar DNA oligos, can be used as highly specific catching probes with outstanding discriminatory power towards single base mutations (e.g., SNP detection).
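As a rough illustration of the 3-8°C per LNA monomer figure stated above, the following sketch estimates a Tm window for an LNA-substituted duplex from the Tm of the unmodified reference duplex. The function name and the assumption that the gain is simply additive per monomer are illustrative only; actual gains depend on sequence, position, and buffer, and are not expected to scale linearly for heavily modified oligos.

```python
def estimated_tm_range(reference_tm_c, n_lna, per_lna_gain_c=(3.0, 8.0)):
    """Return (low, high) Tm estimates in deg C for a duplex carrying n_lna LNA monomers.

    reference_tm_c: measured Tm of the unmodified reference duplex.
    per_lna_gain_c: assumed additive Tm gain per LNA monomer (low, high).
    """
    if n_lna < 0:
        raise ValueError("n_lna must be non-negative")
    low_gain, high_gain = per_lna_gain_c
    return (reference_tm_c + n_lna * low_gain,
            reference_tm_c + n_lna * high_gain)


# Hypothetical example: a reference duplex melting at 30 deg C with three LNA units.
low, high = estimated_tm_range(30.0, 3)
```

For the hypothetical input above the estimate is a window of 39°C to 54°C; such a window is only a starting point for probe design and does not replace a measured melting curve.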
LNA oligonucleotides are capable of hybridizing with double-stranded DNA target molecules as well as RNA secondary structures by strand invasion, and of specifically blocking a wide selection of enzymatic reactions, such as digestion of double-stranded DNA by restriction endonucleases and digestion of DNA and RNA with deoxyribonucleases and ribonucleases, respectively.
In a further aspect, oligonucleotides of the invention may be used to construct new affinity pairs that exhibit enhanced specificity towards each other. The affinity constants can easily be adjusted over a wide range and a vast number of affinity pairs can be designed and synthesized. One part of the affinity pair can be attached to the molecule of interest (e.g. proteins, amplicons, enzymes, polysaccharides, antibodies, haptens, peptides, etc.) by standard methods, while the other part of the affinity pair can be attached to e.g. a solid support such as beads, membranes, micro-titer plates, sticks, tubes, etc. The solid support may be chosen from a wide range of polymer materials such as, for instance, polypropylene, polystyrene, polycarbonate or polyethylene. The affinity pairs may be used in selective isolation, purification, capture and detection of a diversity of target molecules.
Oligonucleotides of the invention also may be employed as probes in the purification, isolation and detection of, for instance, pathogenic organisms such as viruses, bacteria and fungi. Oligonucleotides of the invention also may be used as generic tools for the purification, isolation, amplification and detection of nucleic acids from groups of related species, such as rRNA from gram-positive or gram-negative bacteria, fungi, mammalian cells, etc. Oligonucleotides of the invention also may be employed as an aptamer in molecular diagnostics, e.g. in RNA mediated catalytic processes, in specific binding of antibiotics, drugs, amino acids, peptides, structural proteins, protein receptors, protein enzymes, saccharides, polysaccharides, biological cofactors, nucleic acids, or triphosphates, or in the separation of enantiomers from racemic mixtures by stereospecific binding.
Oligonucleotides of the invention also may be used for labeling of cells, e.g. in methods wherein the label allows the cells to be separated from unlabelled cells.
Oligonucleotides also may be conjugated to a compound selected from proteins, amplicons, enzymes, polysaccharides, antibodies, haptens, and peptides.
Kits are also provided containing one or more oligonucleotides of the invention for the isolation, purification, amplification, detection, identification, quantification, or capture of natural or synthetic nucleic acids. The kit typically will contain a reaction body, e.g. a slide or biochip. One or more oligonucleotides of the invention may be suitably immobilized on such a reaction body. The invention also provides methods for using kits of the invention for carrying out a variety of bioassays, e.g. for diagnostic purposes. Any type of assay wherein one component is immobilized may be carried out using the substrate platforms of the invention. Bioassays utilizing an immobilized component are well known in the art. Examples of assays utilizing an immobilized component include immunoassays, analysis of protein-protein interactions, analysis of protein-nucleic acid interactions, analysis of nucleic acid-nucleic acid interactions, receptor binding assays, enzyme assays, phosphorylation assays, diagnostic assays for determination of disease state, genetic profiling for drug compatibility analysis, and SNP detection (U.S.P.N. 6,316,198; 6,303,315).
Identification of a nucleic acid sequence capable of binding to a biomolecule of interest can be achieved by immobilizing a library of nucleic acids onto the substrate surface so that each unique nucleic acid was located at a defined position to form an array. The array would then be exposed to the biomolecule under conditions which favored binding of the biomolecule to the nucleic acids. Non-specifically binding biomolecules could be washed away using mild to stringent buffer conditions depending on the level of specificity of binding desired. The nucleic acid array would then be analyzed to determine which nucleic acid sequences bound to the biomolecule. Desirably the biomolecules would carry a fluorescent tag for use in detection of the location of the bound nucleic acids. Oligonucleotides of the invention can be employed in a wide range of applications, particularly those applications involving a hybridization reaction. Oligonucleotides also may be used in DNA sequencing aiming at improved throughput in large-scale, shotgun genome sequencing projects, improved throughput in capillary DNA sequencing (e.g. ABI prism 3700) as well as at an improved method for 1) sequencing large, tandemly repeated genomic regions, 2) closing gaps in genome sequencing projects and 3) sequencing of GC-rich templates. In DNA sequencing, oligonucleotide sequencing primers are combined with LNA enhancer elements for the read-through of GC-rich and/or tandemly repeated genomic regions, which often present many challenges for genome sequencing projects. LNA may increase the specificity of certain sequencing primers and thus facilitate selection of a particular version of a repeated sequence and possibly also use strand invasion to open up recalcitrant GC rich sequences.
The incorporation of one or more universal nucleosides into the oligomer makes bonding to unknown bases possible and allows the oligonucleotide to match ambiguous or unknown nucleic acid sequences. As discussed above, oligonucleotides of the invention may be used for therapeutic applications, e.g. as antisense, antigene, ribozyme, or double stranded nucleic acid therapeutic agents. In these therapeutic methods, one or more oligonucleotides of the invention are administered as desired to a patient suffering from or susceptible to the targeted disease or disorder, e.g. a viral infection.
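The way a universal nucleoside lets one oligonucleotide match ambiguous or unknown target bases, as described above, can be sketched as a simple sequence comparison. The symbol "N" for a universal base, the function name, and the example sequences are hypothetical choices for illustration; the sketch models only Watson-Crick complementarity for DNA and ignores any effect of the universal base on duplex stability.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probe_matches(probe, target, universal="N"):
    """Return True if probe (5'->3') can pair with target (5'->3') in antiparallel fashion.

    Positions in the probe holding the universal symbol pair with any target base;
    all other positions must be Watson-Crick complementary.
    """
    if len(probe) != len(target):
        return False
    for probe_base, target_base in zip(probe, reversed(target)):
        if probe_base == universal:
            continue  # universal base: accepts any opposing base
        if COMPLEMENT.get(probe_base) != target_base:
            return False
    return True


# Hypothetical probe with a universal base at the fourth position.
probe = "GACNTAGG"
matches_t = probe_matches(probe, "CCTATGTC")  # target carries T opposite the universal base
matches_g = probe_matches(probe, "CCTAGGTC")  # target carries G opposite the universal base
```

Both targets are accepted by the probe containing the universal position, whereas the corresponding probe with a natural A at that position (`"GACATAGG"`) pairs only with the first target.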
In an exemplary in vitro method for measuring the ability of a nucleic acid of the invention to silence a target gene, cells are cultured in standard medium supplemented with 1% fetal calf serum as previously described (Lykkesfeld et al., Int. J. Cancer 61:529-534, 1995). At the start of the experiment cells are approximately 40% confluent. The serum containing medium is removed and replaced with serum-free medium. Transfection is performed using, e.g., Lipofectin (GibcoBRL cat. No 18292-011) diluted 40X in medium without serum and combined with the oligo to a concentration of 750 nM oligo, 0.8 μg/ml Lipofectin. Then, the medium is removed from the cells and replaced with the medium containing the oligo-Lipofectin complex. The cells are incubated at 37°C for 6 hours, rinsed once with medium without serum and incubated for a further 18 hours in DME/F12 with 1% FCS at 37°C. Standard methods are used to measure the level of mRNA or protein encoded by the target gene and thereby the level of gene silencing.
It is also contemplated that information on the structures assumed by a target nucleic acid may be used in the design of the probes, such that regions that are known or suspected to be involved in folding may be chosen as hybridization sites. Such an approach will reduce the number of probes that are likely to be needed to distinguish between targets of interest.
There are many methods used to obtain structural information involving nucleic acids, including the use of chemicals that are sensitive to the nucleic acid structure, such as phenanthroline/copper, EDTA-Fe2+, cisplatin, ethylnitrosourea, dimethyl pyrocarbonate, hydrazine, dimethyl sulfate, and bisulfite. Enzymatic probing may also be performed using structure-specific nucleases from a variety of sources, such as the Cleavase™ enzymes (Third Wave Technologies, Inc., Madison, Wis.), Taq DNA polymerase, E. coli DNA polymerase I, eukaryotic structure-specific endonucleases (e.g., human, murine and Xenopus XPG enzymes, yeast RAD2 enzymes), murine FEN-1 endonucleases (Harrington and Lieber, Genes and Develop., 3:1344 [1994]) and calf thymus 5' to 3' exonuclease (Murante et al., J. Biol. Chem., 269:1191 [1994]). In addition, enzymes having 3' nuclease activity, such as members of the family of DNA repair endonucleases (e.g., the Rrp1 enzyme from Drosophila melanogaster, the yeast RAD1/RAD10 complex and E. coli Exo III), are also suitable for examining the structures of nucleic acids.
If analysis of structure as a step in probe selection is to be used for a segment of nucleic acid for which no information is available concerning regions likely to form secondary structures, the sites of structure-induced modification or cleavage must be identified. It is most convenient if the modification or cleavage can be done under partially reactive conditions (i.e., such that in the population of molecules in a test sample, each individual molecule will receive only one or a few cuts or modifications). When the sample is analyzed as a whole, each reactive site should be represented, and all the sites may thus be identified. Using a Cleavase Fragment Length Polymorphism™ cleavage reaction as an example, when the partial cleavage products of an end-labeled nucleic acid fragment are resolved by size (e.g., by electrophoresis), the result is a ladder of bands indicating the site of each cleavage, measured from the labeled end. Similar analysis can be done for chemical modifications that block DNA synthesis; extension of a primer on molecules that have been partially modified will yield a nested set of termination products. Determining the sites of cleavage/modification may be done with some degree of accuracy by comparing the products to size markers (e.g., commercially available fragments of DNA for size comparison), but a more accurate measure is to create a DNA sequencing ladder for the same segment of nucleic acid to resolve alongside the test sample. This allows rapid identification of the precise site of cleavage or modification.
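The size-marker comparison described above can be sketched in code. The following is a minimal illustration only, assuming that the logarithm of fragment length varies approximately linearly with migration distance between neighboring markers (a common gel-analysis approximation that is not stated in this specification). The function names and the marker values are hypothetical.

```python
import math

def estimate_fragment_size(migration, markers):
    """Estimate fragment length (nt) from a migration distance.

    markers: list of (migration_distance, size_nt) pairs for the size ladder.
    Assumption: log10(size) is interpolated linearly between the two markers
    that flank the band; this is an approximation, not a calibrated model.
    """
    points = sorted(markers)  # order markers by migration distance
    for (d_lo, s_lo), (d_hi, s_hi) in zip(points, points[1:]):
        if d_lo <= migration <= d_hi:
            frac = (migration - d_lo) / (d_hi - d_lo)
            log_size = math.log10(s_lo) + frac * (math.log10(s_hi) - math.log10(s_lo))
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the size markers")

def cleavage_sites(band_migrations, markers):
    """For an end-labeled fragment, each band length approximates the distance
    (in nt) of a cleavage site from the labeled end."""
    return [round(estimate_fragment_size(m, markers)) for m in band_migrations]

if __name__ == "__main__":
    # Hypothetical ladder: (migration in mm, fragment size in nt)
    ladder = [(10.0, 400), (20.0, 200), (30.0, 100), (40.0, 50)]
    print(cleavage_sites([20.0, 25.0, 30.0], ladder))
```

As the text notes, such interpolation gives only approximate positions; a sequencing ladder run alongside the sample resolves the exact site.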
Example 27: General Reaction Conditions for Synthesis of Some Compounds of the Invention
Reactions were conducted under an atmosphere of nitrogen when anhydrous solvents were used. All reactions were monitored by thin-layer chromatography (TLC) using EM reagent plates with fluorescence indicator (SiO2-60, F-254). The compounds were visualized under UV light and by spraying with a mixture of 5% aqueous sulfuric acid and ethanol followed by heating. Silica gel 60 (particle size 0.040-0.063 mm, Merck) was used for flash column chromatography. NMR spectra were recorded at 300 MHz for 1H NMR, 75.5 MHz for 13C NMR and 121.5 MHz for 31P NMR on a Varian Unity 300 spectrometer. δ values are in ppm relative to tetramethylsilane as internal standard (1H and 13C NMR) and relative to 85% H3PO4 as external standard (31P NMR). Coupling constants are given in Hertz. The assignments, when given, are tentative, and the assignments of methylene protons, when given, may be interchanged. Bicyclic compounds are named according to the Von Baeyer nomenclature. Fast atom bombardment mass spectra (FAB-MS) were recorded in positive ion mode on a Kratos MS50TC spectrometer. The composition of the oligonucleotides was verified by MALDI-MS on a Micromass Tof Spec E mass spectrometer using a matrix of diammonium citrate and 2,6-dihydroxyacetophenone.
Example 28: Synthesis of 1,2-O-Isopropylidene-5-O-methanesulfonyl-4-C-methanesulfonyloxymethyl-3-O-(p-methoxybenzyl)-α-D-ribofuranose [Compound 2 in Scheme 1 above]
Mesyl chloride (8.6 g, 75 mmol) was added dropwise to a stirred solution of 4-C-hydroxymethyl-1,2-O-isopropylidene-3-O-p-methoxybenzyl-α-D-ribofuranose [R. Yamaguchi, T. Imanishi, S. Kohgo, H. Horie and H. Ohrui, Biosci. Biotechnol. Biochem., 1999, 63, 736] (1, 10.0 g, 29.4 mmol) in anhydrous pyridine (30 cm3) and the reaction mixture was stirred overnight at room temperature. The mixture was evaporated to dryness under reduced pressure to give a residue which was co-evaporated with toluene (2 x 25 cm3), dissolved in CH2Cl2 (200 cm3) and washed successively with saturated aqueous NaHCO3 (2 x 100 cm3) and brine (50 cm3). The organic phase was dried (Na2SO4), filtered and evaporated to dryness under reduced pressure. The colorless viscous oil obtained was purified by column chromatography [0.5-1% (v/v) MeOH in CH2Cl2 as eluent], followed by crystallization from MeOH to give furanose 2 as a white solid material (13.6 g, 93%); Rf 0.57 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 7.30 (2 H, d, J 8.7), 6.90 (2 H, d, J 8.5), 5.78 (1 H, d, J 3.7), 4.86 (1 H, d, J 12.0), 4.70 (1 H, d, J 11.4), 4.62 (1 H, dd, J 5.0 and 3.8), 4.50 (1 H, d, J 11.1), 4.39 (1 H, d, J 12.3), 4.31 (1 H, d, J 11.0), 4.17 (1 H, d, J 5.1), 4.11 (1 H, d, J 11.0), 3.81 (3 H, s), 3.07 (3 H, s), 2.99 (3 H, s), 1.68 (3 H, s), 1.34 (3 H, s); δC (CDCl3) 159.8, 129.9, 128.8, 114.1, 114.0, 104.5, 83.2, 78.0, 77.9, 72.6, 69.6, 68.8, 55.4, 38.1, 37.5, 26.3, 25.7.
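The molar amounts and percentage yields quoted in these examples follow from simple stoichiometry. The sketch below is illustrative only: the molecular formulas C17H24O7 (diol 1) and C19H28O11S2 (dimesylate 2) are assumptions inferred from the compound names and are not stated in the text, and the atomic masses are rounded standard values.

```python
import re

# Rounded standard atomic masses (g/mol); extend as needed.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06}

def molar_mass(formula):
    """Molar mass (g/mol) of a simple formula such as 'C17H24O7' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total

def mmol(mass_g, formula):
    """Amount of substance in mmol for a given mass in grams."""
    return 1000.0 * mass_g / molar_mass(formula)

def percent_yield(product_g, product_formula, limiting_mmol):
    """Isolated yield as a percentage of the limiting reagent."""
    return 100.0 * mmol(product_g, product_formula) / limiting_mmol

if __name__ == "__main__":
    # Assumed formulas (not given in the text): 1 = C17H24O7, 2 = C19H28O11S2
    print(round(mmol(10.0, "C17H24O7"), 1))                     # starting diol 1
    print(round(percent_yield(13.6, "C19H28O11S2", 29.4)))      # furanose 2
```

Under these assumed formulas, the computed values agree with the 29.4 mmol and 93% reported for Example 28.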
Example 29: Synthesis of Methyl 5-O-methanesulfonyl-4-C-methanesulfonyloxymethyl-3-O-(p-methoxybenzyl)-D-ribofuranoside [Compound 3 in Scheme 1 above]
A suspension of furanose 2 (13.5 g, 27.2 mmol) in a mixture of H2O (45 cm3) and 15% HCl in MeOH (450 cm3, w/w) was stirred at room temperature for 72 h. The mixture was carefully neutralized by addition of saturated aqueous NaHCO3 (100 cm3) followed by NaHCO3 (s) whereupon the mixture was evaporated to dryness under reduced pressure. H2O (100 cm3) was added, and extraction was performed with EtOAc (3 x 100 cm3). The combined organic phase was washed with brine (100 cm3), dried (Na2SO4), filtered and then evaporated to dryness under reduced pressure. The residue was coevaporated with toluene (2 x 25 cm3) and purified by column chromatography [1-2% (v/v) MeOH in CH2Cl2] to give furanoside 3 as an anomeric mixture (clear oil, 11.0 g, 86%, ratio between anomers ca. 6:1); Rf 0.39, 0.33 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3, major anomer only) 7.28 (2 H, d, J 8.4), 6.91 (2 H, d, J 8.9), 4.87 (1 H, s), 4.62 (1 H, d, J 11.4), 4.53 (1 H, d, J 11.2), 4.41 (2 H, s), 4.31 (1 H, d, J 9.8), 4.24 (1 H, d, J 4.6), 4.06 (1 H, d, J 10.0), 3.98 (1 H, br s), 3.81 (3 H, s), 3.33 (3 H, s), 3.06 (3 H, s), 3.03 (3 H, s); δC (CDCl3, major anomer only) 160.0, 130.1, 128.5, 114.3, 107.8, 81.7, 81.2, 73.8, 73.6, 69.7, 69.6, 55.5, 55.4, 37.5, 37.4.
Example 30: Synthesis of (1R,3RS,4R,7S)-1-Methanesulfonyloxymethyl-3-methoxy-7-(p-methoxybenzyloxy)-2,5-dioxabicyclo[2.2.1]heptane [Compound 4 in Scheme 1 above]
To a stirred solution of the anomeric mixture of Compound 3 (10.9 g, 23.2 mmol) in anhydrous DMF (50 cm3) at 0 °C was added, over 10 min, sodium hydride (2.28 g of a 60% suspension in mineral oil (w/w), 95.2 mmol) and the mixture was stirred for 12 h at room temperature. Ice-cold H2O (200 cm3) was slowly added and extraction was performed using EtOAc (3 x 200 cm3). The combined organic phase was washed successively with saturated aqueous NaHCO3 (2 x 100 cm3) and brine (50 cm3), dried (Na2SO4), filtered and evaporated to dryness under reduced pressure. The residue was purified by column chromatography [0.5-1% (v/v) MeOH in CH2Cl2] to give first the major isomer (6.42 g, 74%) and then [1.5% (v/v) MeOH in CH2Cl2] the minor isomer (1.13 g, 13%), both as clear oils; Rf 0.56, 0.45 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3, major isomer) 7.16 (2 H, d, J 8.8), 6.74 (2 H, d, J 8.4), 4.65 (1 H, s), 4.42-4.32 (4 H, m), 3.95-3.94 (2 H, m), 3.84 (1 H, d, J 7.4), 3.66 (3 H, s), 3.54 (1 H, d, J 7.4), 3.21 (3 H, s), 2.90 (3 H, s); δC (CDCl3, major isomer) 159.6, 129.5, 129.3, 114.0, 105.3, 83.2, 78.6, 77.2, 72.1, 71.8, 66.3, 55.6, 55.4, 37.8; δH (CDCl3, minor isomer) 7.27 (2 H, d, J 8.9), 6.89 (2 H, d, J 8.6), 4.99 (1 H, s), 4.63-4.39 (4 H, m), 4.19 (1 H, s), 4.10-3.94 (2 H, m), 3.91 (1 H, s), 3.81 (3 H, s), 3.47 (3 H, s), 3.05 (3 H, s); δC (CDCl3, minor isomer) 159.7, 129.6, 129.5, 114.1, 104.4, 86.4, 79.3, 77.1, 72.3, 71.9, 66.2, 56.4, 55.4, 37.7.
Example 31: Synthesis of (1R,4R,7S)-1-Acetoxymethyl-3-methoxy-7-(p-methoxybenzyloxy)-2,5-dioxabicyclo[2.2.1]heptane [Compound 5 in Scheme 1]
To a stirred solution of furanoside 4 (major isomer, 6.36 g, 17.0 mmol) in dioxane (25 cm3) was added 18-crown-6 (9.0 g, 34.1 mmol) and KOAc (8.4 g, 85.6 mmol). The stirred mixture was heated under reflux for 12 h and subsequently evaporated to dryness under reduced pressure. The residue was dissolved in CH2Cl2 (100 cm3) and washed successively with saturated aqueous NaHCO3 (2 x 50 cm3) and brine (50 cm3). The separated organic phase was dried (Na2SO4), filtered and evaporated to dryness under reduced pressure. The residue was purified by column chromatography [1% (v/v) MeOH in CH2Cl2] to give furanoside 5 as a white solid material (one anomer, 5.23 g, 91%); Rf 0.63 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 7.27-7.24 (2 H, m), 6.90-6.87 (2 H, m), 4.79 (1 H, s), 4.61 (1 H, d, J 11.0), 4.49 (2 H, m), 4.28 (1 H, d, J 11.0), 4.04 (3 H, m), 3.80 (3 H, s), 3.68 (1 H, m), 3.36 (3 H, s), 2.06 (3 H, s); δC (CDCl3) 170.7, 159.5, 129.5, 129.4, 113.9, 105.1, 83.3, 78.9, 77.2, 72.0, 71.9, 61.0, 55.4, 55.3, 20.8.
Example 32: Synthesis of (1S,4R,7S)-1-Hydroxymethyl-3-methoxy-7-(p-methoxybenzyloxy)-2,5-dioxabicyclo[2.2.1]heptane [Compound 6 in Scheme 1]
A solution of furanoside 5 (one anomer, 5.16 g, 15.3 mmol) in saturated methanolic ammonia (200 cm3) was stirred at room temperature for 48 h. The reaction mixture was evaporated to dryness under reduced pressure, coevaporated with toluene (2 x 50 cm3), and the residue purified by column chromatography [2-3% (v/v) MeOH in CH2Cl2] to give furanoside 6 as a white solid material (one anomer, 3.98 g, 88%); Rf 0.43 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 7.27 (2 H, d, J 8.6), 6.88 (2 H, d, J 8.9), 4.79 (1 H, s), 4.59 (1 H, d, J 11.3), 4.53 (1 H, d, J 11.4), 4.09 (2 H, s), 3.97 (1 H, d, J 7.5), 3.86 (2 H, br s), 3.80 (3 H, s), 3.75-3.62 (2 H, m), 3.37 (3 H, s); δC (CDCl3) 159.4, 129.7, 129.3, 113.9, 105.2, 85.6, 78.3, 77.4, 71.9, 71.8, 58.8, 55.5, 55.3.
Example 33: (1S,4R,7S)-3-Methoxy-7-(p-methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 7 in Scheme 1]
To a stirred solution of furanoside 6 (one anomer, 3.94 g, 13.3 mmol) in anhydrous DMF (50 cm3) at 0 °C was added a suspension of NaH [60% in mineral oil (w/w), 1.46 g, 60.8 mmol] followed by dropwise addition of p-methoxybenzyl chloride (2.74 g, 17.5 mmol). The mixture was allowed to warm to room temperature and stirring was continued for another 4 h whereupon ice-cold H2O (50 cm3) was added dropwise. The mixture was extracted with CH2Cl2 (3 x 100 cm3) and the combined organic phase was washed with brine (100 cm3), dried (Na2SO4), filtered, evaporated to dryness under reduced pressure and coevaporated with toluene (3 x 50 cm3). The residue (4.71 g), tentatively assigned as a mixture of 7 and aldehyde 11, was used in the preparation of 11 (see below) without further purification.
Example 34: 4-C-Methanesulfonyloxymethyl-3,5-di-O-(p-methoxybenzyl)-1,2-O-isopropylidene-α-D-ribofuranose [Compound 9 in Scheme 1]
4-C-Hydroxymethyl-3,5-di-O-(p-methoxybenzyl)-1,2-O-isopropylidene-α-D-ribofuranose [R. Yamaguchi, T. Imanishi, S. Kohgo, H. Horie and H. Ohrui, Biosci. Biotechnol. Biochem., 1999, 63, 736] (8, 3.2 g, 6.95 mmol) was mesylated using MsCl (2.00 g, 17.5 mmol) and pyridine (10 cm3) following the procedure described for 2. After work-up, the colorless viscous oil was purified by column chromatography [1% (v/v) MeOH in CH2Cl2] to give derivative 9 in 89% yield (3.17 g) as a clear oil; Rf 0.45 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.22 (2 H, d, J 8.9), 7.18 (2 H, d, J 8.7), 6.86 (4 H, d, J 8.3), 5.76 (1 H, d, J 3.8), 4.83 (1 H, d, J 12.0), 4.64 (1 H, d, J 11.6), 4.59 (1 H, m), 4.49-4.35 (4 H, m), 4.24 (1 H, d, J 5.3), 3.80 (6 H, s), 3.56 (1 H, d, J 10.5), 3.45 (1 H, d, J 10.5), 3.06 (3 H, s), 1.67 (3 H, s), 1.33 (3 H, s); δC (CDCl3) 159.6, 159.4, 129.9, 129.8, 129.7, 129.5, 129.4, 129.3, 114.0, 113.9, 113.8, 113.7, 113.6, 104.5, 84.9, 78.6, 78.1, 73.4, 72.4, 71.0, 69.9, 55.3, 38.0, 26.4, 25.9.
Example 35: Methyl 4-C-methanesulfonyloxymethyl-3,5-di-O-(p-methoxybenzyl)-D-ribofuranose [Compound 10 in Scheme 1]
Methanolysis of furanoside 9 (3.1 g, 5.76 mmol) was performed using a mixture of a solution of 15% HCl in MeOH (w/w, 120 cm3) and H2O (12 cm3) following the procedure described for the synthesis of 3. After work-up, the crude product was purified by column chromatography [0.5-1% (v/v) MeOH in CH2Cl2] to give the major anomer of 10 (1.71 g, 58%) and [1-1.5% (v/v) MeOH in CH2Cl2] the minor anomer of 10 (0.47 g, 16%), both as clear oils; Rf 0.31, 0.24 (CH2Cl2/MeOH 98:2, v/v); δC (major anomer, CDCl3) 159.8, 159.5, 129.9, 129.8, 129.6, 129.5, 129.0, 114.2, 114.1, 114.0, 113.9, 107.9, 84.7, 79.9, 74.2, 73.5, 73.5, 70.2, 64.4, 55.6, 55.4, 37.4.
Example 36: Alternative preparation of Compound 7 in Scheme 1
Ring closure of furanoside 10 (major anomer, 1.68 g, 3.28 mmol) was achieved using NaH [60% suspension in mineral oil (w/w), 0.32 g, 13.1 mmol] in anhydrous DMF (10 cm3) following the procedure described for the synthesis of 4 to give a crude product tentatively assigned as a mixture of furanoside 7 and aldehyde 11 (see below) (1.13 g).
Example 37: (2R,3S,4S)-4-Hydroxy-3-(p-methoxybenzyloxy)-4-(p-methoxybenzyloxymethyl)-tetrahydrofuran-2-carbaldehyde [Compound 11 in Scheme 1]
A solution of crude furanoside 7 (as a mixture with 11, prepared as described above, 5.80 g) in 80% glacial acetic acid (100 cm3) was stirred at 50 °C for 4 h. The solvent was distilled off under reduced pressure and the residue was successively coevaporated with absolute ethanol (3 x 25 cm3) and toluene (2 x 25 cm3) and purified by column chromatography [4-5% (v/v) MeOH in CH2Cl2] to give aldehyde 11 as a colorless oil (4.60 g); Rf 0.37 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 9.64 (1 H, br s), 7.27-7.17 (4 H, m), 6.87-6.84 (4 H, m), 4.59 (1 H, d, J 11.6), 4.51-4.41 (2 H, m), 4.35 (1 H, s), 3.92-3.90 (2 H, m), 3.79 (6 H, s), 3.77-3.68 (3 H, m), 3.55 (2 H, br s); δC (CDCl3) 203.6, 159.5, 159.4, 129.7, 129.6, 129.5, 129.2, 114.0, 113.9, 113.8, 87.3, 86.7, 81.0, 75.1, 73.4, 71.6, 67.6, 55.3.
Example 38: General procedure for the reaction of aryl magnesium bromides with aldehyde 11 to give Compounds 12a-e in Scheme 2
A solution of aldehyde 11 (Scheme 2) in anhydrous THF (10 cm3) was added dropwise during 5 min to a stirred solution of the aryl magnesium bromide dissolved in anhydrous THF at 0 °C. The mixture was allowed to warm to room temperature and stirred for 12 h. The mixture was evaporated to dryness under reduced pressure and the residue diluted with CH2Cl2 and washed several times with saturated aqueous NH4Cl. The organic phase was dried (Na2SO4), filtered, and evaporated to dryness under reduced pressure. Column chromatography of the crude product obtained afforded the compounds 12a-e as shown in Scheme 2.
Example 38a: Synthesis of (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(phenyl)methyl]-4-(p-methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12a of Scheme 2]
Grignard reaction of phenylmagnesium bromide (1.0 M solution in THF, 14.2 cm3, 14.2 mmol) with aldehyde 11 (515 mg, 1.28 mmol) afforded 12a as shown in Scheme 2. The crude product was purified by column chromatography [4% (v/v) MeOH in CH2Cl2] to give tetrahydrofuran 12a (540 mg, 88%) as a colorless oil; Rf 0.34 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 7.40-7.19 (7 H, m), 6.91-6.73 (6 H, m), 4.73 (1 H, d, J 6.4), 4.48 (2 H, s), 4.08 (2 H, s), 3.88 (1 H, d, J 9.4), 3.79 (1 H, m), 3.78 (3 H, s), 3.76 (3 H, s), 3.75-3.69 (2 H, m), 3.50 (1 H, d, J 9.4), 3.45 (1 H, s), 3.42 (1 H, br s), 3.26 (1 H, br s); δC (CDCl3) 159.5, 159.3, 140.7, 129.7, 129.6, 129.5, 129.2, 128.5, 128.0, 127.3, 113.9, 113.8, 113.7, 89.4, 84.6, 81.8, 75.3, 74.7, 73.5, 71.6, 69.3, 55.3; m/z (FAB) 503 [M+Na]+, 479 [M-H]+, 461 [M-H-H2O]+.
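The FAB-MS ions reported for compounds 12a-e can be cross-checked against nominal masses. The sketch below is illustrative only: the molecular formulas C28H32O7 (12a) and C29H33FO7 (12b) are assumptions inferred from the compound names rather than taken from the text, and the helper names are hypothetical.

```python
import re

# Nominal (integer) masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16, "F": 19, "Na": 23}

def nominal_mass(formula):
    """Nominal mass of a simple formula such as 'C28H32O7' (no brackets)."""
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL_MASS[element] * (int(count) if count else 1)
    return total

def expected_ions(formula):
    """Nominal m/z values for the ion types reported in Examples 38a-e."""
    m = nominal_mass(formula)
    water = nominal_mass("H2O")
    return {
        "[M+Na]+": m + NOMINAL_MASS["Na"],
        "[M-H]+": m - NOMINAL_MASS["H"],
        "[M-H-H2O]+": m - NOMINAL_MASS["H"] - water,
    }

if __name__ == "__main__":
    # Assumed formulas (not given in the text)
    print(expected_ions("C28H32O7"))   # compound 12a
    print(expected_ions("C29H33FO7"))  # compound 12b
```

With these assumed formulas, the computed values match the m/z values listed for 12a (503, 479, 461) and 12b (535, 511, 493).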
Example 38b: Synthesis of (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(4-fluoro-3-methylphenyl)methyl]-4-(p-methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12b of Scheme 2]
Grignard reaction of 4-fluoro-3-methylphenylmagnesium bromide (1.0 M solution in THF, 15.0 cm3, 15.0 mmol) with aldehyde 11 (603 mg, 1.5 mmol) afforded 12b as shown in Scheme 2. The crude product was purified by column chromatography [4-5% (v/v) MeOH in CH2Cl2] to give tetrahydrofuran 12b (611 mg, 85%) as a colorless oil; Rf 0.34 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 7.24-7.12 (5 H, m), 6.98-6.84 (5 H, m), 6.77 (1 H, d, J 8.5), 4.65 (1 H, dd, J 2.8 and 6.4), 4.49 (2 H, s), 4.15 (2 H, s), 4.01 (1 H, dd, J 2.3 and 6.5), 3.87 (1 H, d, J 9.3), 3.79 (3 H, s), 3.78 (3 H, s), 3.76-3.68 (2 H, m), 3.52 (1 H, s), 3.47 (1 H, d, J 10.3), 3.42 (1 H, d, J 2.9), 3.22 (1 H, s), 2.24 (3 H, d, J 0.8); δC (CDCl3) 162.7, 159.5, 159.4, 136.2, 136.1, 130.3, 130.2, 129.7, 129.6, 129.5, 129.4, 129.1, 126.1, 126.0, 115.1, 114.8, 114.0, 113.9, 113.8, 89.3, 84.5, 81.8, 75.3, 74.0, 73.5, 71.7, 69.2, 55.4, 55.3, 14.7 (d, J 3.9); m/z (FAB) 535 [M+Na]+, 511 [M-H]+, 493 [M-H-H2O]+.
Example 38c: Synthesis of (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(1-naphthyl)methyl]-4-(p-methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12c of Scheme 2]
1-Bromonaphthalene (1.55 g, 7.5 mmol) was added to a stirred mixture of magnesium turnings (182 mg, 7.5 mmol) and iodine (10 mg) in THF (10 cm3). The mixture was stirred at 40 °C for 1 h whereupon it was allowed to cool to room temperature. A solution of aldehyde 11 (603 mg, 1.5 mmol) in THF (10 cm3) was added slowly and the reaction was stirred for 12 h. The crude product was purified by column chromatography [4-5% (v/v) MeOH in CH2Cl2] to give tetrahydrofuran 12c (756 mg, 95%) as a colorless oil; Rf 0.35 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 8.08 (1 H, m), 7.86 (1 H, m), 7.79 (1 H, d, J 8.2), 7.72 (1 H, d, J 7.2), 7.49-7.44 (3 H, m), 7.18 (2 H, d, J 8.4), 6.84 (2 H, d, J 8.6), 6.74 (2 H, d, J 8.7), 6.68 (2 H, d, J 8.8), 5.52 (1 H, dd, J 3.7 and 5.6), 4.45 (2 H, s), 4.34 (1 H, dd, J 2.5 and 5.9), 4.03 (1 H, d, J 11.0), 3.96 (1 H, d, J 11.0), 3.93 (1 H, d, J 9.5), 3.80 (1 H, d, J 9.3), 3.77 (3 H, s), 3.75 (1 H, d, J 2.6), 3.72 (3 H, s), 3.68 (1 H, d, J 9.3), 3.56 (1 H, d, J 3.7), 3.49 (1 H, d, J 9.3), 3.34 (1 H, s); δC (CDCl3) 159.5, 159.3, 136.3, 134.0, 131.0, 129.7, 129.6, 129.5, 129.4, 129.0, 128.6, 128.2, 125.6, 125.5, 123.5, 114.0, 113.8, 113.7, 88.7, 84.7, 81.9, 75.5, 73.5, 71.7, 71.3, 69.3, 55.4, 55.3; m/z (FAB) 553 [M+Na]+, 529 [M-H]+, 511 [M-H-H2O]+.
Example 38d: (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(1-pyrenyl)methyl]-4-(p-methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12d of Scheme 2]
Tetrahydrofuran 12d was synthesized from aldehyde 11 (515 mg, 1.28 mmol), 1-bromopyrene (1.0 g, 3.56 mmol), magnesium turnings (155 mg, 6.4 mmol), iodine (10 mg) and THF (20 cm3) following the procedure described for synthesis of compound 12c. The crude product was purified by column chromatography [3-4% (v/v) MeOH in CH2Cl2] to give tetrahydrofuran 12d (690 mg, 89%) as a pale yellow solid; Rf 0.35 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 8.23 (2 H, d, J 8.4 and 9.2), 8.19-8.13 (3 H, m), 8.05-7.99 (4 H, m), 7.14 (2 H, d, J 8.8), 6.82 (2 H, d, J 9.0), 6.30 (2 H, d, J 8.7), 6.20 (2 H, d, J 8.6), 5.87 (1 H, d, J 7.2), 4.43 (2 H, s), 4.41 (1 H, m), 4.01 (1 H, d, J 9.4), 3.91 (1 H, d, J 11.8), 3.86 (1 H, d, J 9.2), 3.77 (1 H, d, J 1.9), 3.76 (3 H, s), 3.70-3.64 (3 H, m), 3.52-3.45 (1 H, m), 3.44 (3 H, s); δC (CDCl3) 159.5, 158.9, 133.9, 131.4, 131.1, 130.7, 129.7, 129.5, 129.2, 128.9, 128.5, 127.8, 127.7, 127.5, 126.0, 125.5, 125.3, 125.2, 125.1, 125.0, 124.9, 122.9, 113.9, 113.3, 89.5, 83.5, 82.0, 75.7, 73.4, 71.3, 71.0, 69.3, 55.3, 55.0; m/z (MALDI) 627 [M+Na]+, 609 [M+Na-H2O]+.
Example 38e: (2S,3S,4S)-4-Hydroxy-2-[(R)-hydroxy(2,4,5-trimethylphenyl)methyl]-4-(p-methoxybenzyloxy)-3-(p-methoxybenzyloxymethyl)tetrahydrofuran [Compound 12e of Scheme 2]
Tetrahydrofuran 12e was synthesized from aldehyde 11 (515 mg, 1.28 mmol), 1-bromo-2,4,5-trimethylbenzene (1.28 g, 6.4 mmol), magnesium turnings (155 mg, 6.4 mmol), iodine (10 mg) and THF (20 cm3) following the procedure described for synthesis of compound 12c. The crude product was purified by column chromatography [3-4% (v/v) MeOH in CH2Cl2] to give tetrahydrofuran 12e (589 mg, 88%) as a colorless oil; Rf 0.34 (CH2Cl2/MeOH 95:5, v/v); δH (CDCl3) 7.25 (2 H, d, J 8.7), 7.21 (2 H, d, J 8.9), 6.90 (1 H, s), 6.87 (1 H, s), 6.85 (2 H, d, J 8.9), 6.76 (2 H, d, J 8.7), 4.95 (1 H, dd, J 3.6 and 5.9), 4.48 (2 H, s), 4.18-4.08 (3 H, m), 3.89 (1 H, d, J 9.6), 3.80 (1 H, m), 3.79 (3 H, s), 3.77 (3 H, s), 3.71 (1 H, d, J 9.2), 3.64 (1 H, d, J 2.6), 3.51 (1 H, d, J 9.4), 3.24 (1 H, s), 3.18 (1 H, d, J 3.4), 2.25 (3 H, s), 2.22 (3 H, s), 2.21 (3 H, s); δC (CDCl3) 159.5, 159.3, 136.0, 135.8, 134.2, 132.5, 132.0, 129.8, 129.7, 129.6, 129.5, 128.5, 113.9, 113.8, 88.6, 84.7, 81.7, 75.4, 73.5, 71.7, 70.9, 69.4, 55.3, 19.5, 19.4, 19.0; m/z (FAB) 545 [M+Na]+, 521 [M-H]+, 503 [M-H-H2O]+.
Example 39: General procedure for the cyclization of 12a-e to give compounds 13a-e as shown in Scheme 2.
N,N,N',N'-Tetramethylazodicarboxamide (TMAD) was added in one portion to a stirred solution of the compounds 12a-e as shown in Scheme 2 and tributylphosphine in benzene at 0 °C. The mixture was stirred for 12 h at room temperature whereupon it was diluted with diethyl ether (50 cm3). The organic phase was washed successively with saturated aqueous NH4Cl (2 x 20 cm3) and brine (25 cm3), dried (Na2SO4), filtered and evaporated to dryness under reduced pressure. The crude product obtained was purified by column chromatography [1.5-2% (v/v) MeOH in CH2Cl2] to give compounds 13a-e as shown in Scheme 2.
Example 39a: (1S,3S,4R,7S)-7-(p-Methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-3-phenyl-2,5-dioxabicyclo[2.2.1]heptane [Compound 13a of Scheme 2]
Cyclization of compound 12a (540 mg, 1.13 mmol) in the presence of TMAD (310 mg, 1.8 mmol), PBu3 (364 mg, 1.8 mmol) and benzene (10 cm3) followed by the general work-up procedure and column chromatography afforded compound 13a as a colorless oil (400 mg, 77%); Rf 0.51 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.36-7.33 (7 H, m), 7.10 (2 H, d, J 8.3), 6.88 (2 H, d, J 8.7), 6.78 (2 H, d, J 8.7), 5.17 (1 H, s, H-3), 4.59 (2 H, br s, -CH2(MPM)), 4.43 (1 H, d, J 11.3, -CH2(MPM)), 4.34 (1 H, d, J 11.3, -CH2(MPM)), 4.19 (1 H, s, H-4), 4.09 (1 H, d, J 7.7, H-6), 4.06 (1 H, d, J 7.7, H-6), 4.01 (1 H, s, H-7), 3.82-3.77 (5 H, m, -C1-CH2-O-, -OCH3), 3.76 (3 H, s, -OCH3); δC (CDCl3) 159.4, 159.3, 139.4 (C-1'), 130.3, 129.7, 129.5, 129.3, 128.5, 127.5, 125.4, 113.9, 113.8, 85.9 (C-1), 84.1 (C-3), 81.1 (C-4), 77.4 (C-7), 73.7 (-CH2(MPM)), 73.4 (C-6), 71.8 (-CH2(MPM)), 66.3 (-C1-CH2-O-), 55.4 (-OCH3), 55.3 (-OCH3); m/z (FAB) 467 [M+Na-H2O]+, 461 [M-H]+.
Example 39b: (1S,3S,4R,7S)-3-(4-Fluoro-3-methylphenyl)-7-(p-methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 13b of Scheme 2]
Cyclization of compound 12b (550 mg, 1.08 mmol) in the presence of TMAD (275 mg, 1.6 mmol), PBu3 (325 mg, 1.6 mmol) and benzene (10 cm3) followed by the general work-up procedure and column chromatography afforded compound 13b as a colorless oil (445 mg, 84%); Rf 0.52 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.28 (2 H, d, J 8.7), 7.11 (2 H, d, J 8.6), 7.08-7.09 (2 H, m, H-2' and H-6'), 6.94 (1 H, dd, J 8.5 and 9.2, H-5'), 6.88 (2 H, d, J 8.6), 6.79 (2 H, d, J 8.4), 5.08 (1 H, s, H-3), 4.62-4.55 (2 H, m, -CH2(MPM)), 4.45 (1 H, d, J 11.1, -CH2(MPM)), 4.36 (1 H, d, J 11.6, -CH2(MPM)), 4.13 (1 H, s, H-4), 4.07, 4.03 (1 H each, 2d, J 7.6 each, H-6), 3.99 (1 H, s, H-7), 3.81 (2 H, m, -C1-CH2-O-), 3.80 (3 H, s, -OCH3), 3.77 (3 H, s, -OCH3), 2.23 (3 H, d, J 1.6, Ar-CH3); δC (CDCl3) 162.3 (C-4'), 159.4, 159.3, 134.8, 134.7, 130.3, 129.6, 129.5, 129.2, 128.5, 128.4, 128.3, 124.2, 115.1, 114.8, 113.9, 113.8, 85.9 (C-1), 83.5 (C-3), 81.0 (C-4), 77.1 (C-7), 73.6 (-CH2(MPM)), 73.4 (C-6), 71.8 (-CH2(MPM)), 66.2 (-C1-CH2-O-), 55.4 (-OCH3), 55.3 (-OCH3), 14.7 (d, J 3.3, Ar-CH3); m/z (FAB) 494 [M]+, 493 [M-H]+.
Example 39c: (1S,3S,4R,7S)-7-(p-Methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-3-(1-naphthyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 13c of Scheme 2]
Cyclization of compound 12c (700 mg, 1.32 mmol) in the presence of TMAD (345 mg, 2.0 mmol), PBu3 (405 mg, 2.0 mmol) and benzene (15 cm3) followed by the general work-up procedure and column chromatography afforded compound 13c as a colorless oil (526 mg, 78%); Rf 0.53 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.91-7.86 (2 H, m), 7.78 (1 H, d, J 8.2), 7.73 (1 H, d, J 7.1), 7.53-7.46 (3 H, m), 7.32 (2 H, d, J 8.7), 7.04 (2 H, d, J 8.7), 6.90 (2 H, d, J 8.3), 6.71 (2 H, d, J 8.6), 5.79 (1 H, s, H-3), 4.67-4.61 (2 H, m, -CH2(MPM)), 4.43 (1 H, s, H-4), 4.38 (1 H, d, J 11.2, -CH2(MPM)), 4.27 (1 H, d, J 10.9, -CH2(MPM)), 4.16 (2 H, br s, H-6), 4.08 (1 H, s, H-7), 3.91, 3.87 (1 H each, 2d, J 11.0 each, -C1-CH2-O-), 3.81 (3 H, s, -OCH3), 3.72 (3 H, s, -OCH3); δC (CDCl3) 159.3, 134.6 (C-1'), 133.5, 130.3, 129.8, 129.7, 129.4, 129.3, 128.9, 128.1, 126.4, 125.8, 125.6, 123.8, 122.7, 113.9, 113.7, 85.7 (C-1), 82.3 (C-3), 79.9 (C-4), 78.2 (C-7), 73.7 (-OCH2(MPM)), 73.5 (C-6), 71.8 (-OCH2(MPM)), 66.3 (-C1-CH2-O-), 55.4 (-OCH3), 55.3 (-OCH3); m/z (FAB) 512 [M]+, 511 [M-H]+.
Example 39d: (1S,3S,4R,7S)-7-(p-Methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-3-(1-pyrenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 13d of Scheme 2]
Cyclization of compound 12d (650 mg, 1.08 mmol) in the presence of TMAD (275 mg, 1.6 mmol), PBu3 (325 mg, 1.6 mmol) and benzene (10 cm3) followed by the general work-up procedure and column chromatography afforded compound 13d as a pale yellow solid (496 mg, 79%); Rf 0.53 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 8.29 (1 H, d, J 8.2), 8.18-8.12 (5 H, m), 8.08-8.01 (2 H, m), 7.96 (1 H, d, J 7.5), 7.35 (2 H, d, J 8.5), 6.97 (2 H, d, J 8.9), 6.92 (2 H, d, J 8.8), 6.60 (2 H, d, J 8.8), 6.09 (1 H, s, H-3), 4.71-4.65 (2 H, m, -CH2(MPM)), 4.49 (1 H, s, H-4), 4.34 (1 H, d, J 11.4, -CH2(MPM)), 4.23 (1 H, d, J 11.1, -CH2(MPM)), 4.25 (1 H, d, J 7.6, H-6), 4.21 (1 H, d, J 7.8, H-6), 4.16 (1 H, s, H-7), 3.95-3.94 (2 H, m, -C1-CH2-O-), 3.81 (3 H, s, -OCH3), 3.59 (3 H, s, -OCH3); δC (CDCl3) 159.4, 159.3, 132.2 (C-1'), 131.4, 130.8, 130.7, 130.4, 129.5, 129.4, 128.0, 127.5, 127.4, 126.9, 126.1, 125.6, 125.4, 124.9, 124.8, 124.7, 123.6, 122.0, 113.9, 113.7, 85.9 (C-1), 82.7 (C-3), 80.6 (C-4), 77.9 (C-7), 73.9 (-OCH2(MPM)), 73.5 (C-6), 71.8 (-OCH2(MPM)), 66.3 (-C1-CH2-O-), 55.4 (-OCH3), 55.2 (-OCH3); m/z (FAB) 587 [M+H]+, 586 [M]+.
Example 39e: (1S,3S,4R,7S)-7-(p-Methoxybenzyloxy)-1-(p-methoxybenzyloxymethyl)-3-(2,4,5-trimethylphenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 13e of Scheme 2]
Cyclization of compound 12e (550 mg, 1.05 mmol) in the presence of TMAD (275 mg, 1.6 mmol), PBu3 (325 mg, 1.6 mmol) and benzene (10 cm3) followed by the general work-up procedure and column chromatography afforded compound 13e as a colorless oil (425 mg, 80%); Rf 0.52 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.30 (2 H, d, J 9.0), 7.24 (1 H, s, H-6'), 7.13 (2 H, d, J 8.9), 6.89 (1 H, s, H-3'), 6.88 (2 H, d, J 8.8), 6.79 (2 H, d, J 8.6), 5.18 (1 H, s, H-3), 4.64-4.57 (2 H, m, -CH2(MPM)), 4.46 (1 H, d, J 11.2, -CH2(MPM)), 4.36 (1 H, d, J 11.5, -CH2(MPM)), 4.18 (1 H, s, H-4), 4.14 (1 H, s, H-7), 4.09 (1 H, d, J 7.9, H-6), 4.04 (1 H, d, J 7.7, H-6), 3.86 (2 H, s, -C1-CH2-O-), 3.80 (3 H, s, -OCH3), 3.76 (3 H, s, -OCH3), 2.21 (6 H, s, 2 x Ar-CH3), 2.17 (3 H, s, Ar-CH3); δC (CDCl3) 159.4, 159.3, 135.5 (C-1'), 134.4, 134.0, 131.7, 131.3, 130.5, 129.9, 129.4, 129.2, 127.2, 113.9, 113.8, 85.6 (C-1), 82.4 (C-3), 79.4 (C-4), 77.6 (C-7), 73.5 (-OCH2(MPM)), 73.4 (C-6), 71.8 (-OCH2(MPM)), 66.3 (-C1-CH2-O-), 55.4 (-OCH3), 55.3 (-OCH3), 19.5 (-CH3), 19.3 (-CH3), 18.4 (-CH3); m/z (FAB) 504 [M]+, 503 [M-H]+.
Example 40: General procedure for the oxidative removal of the p-methoxybenzyl groups to give Compounds 14a-e as shown in Scheme 2.
To a stirred solution of Compound 13a-e in CH2Cl2 (containing a small amount of H2O) at room temperature was added 2,3-dichloro-5,6-dicyanoquinone (DDQ), which resulted in the immediate appearance of a deep greenish-black color that slowly faded to pale brownish-yellow. The reaction mixture was vigorously stirred at room temperature for 4 h. The precipitate was removed by filtration through a short pad of silica gel and washed with EtOAc. The combined filtrate was washed, successively, with saturated aqueous NaHCO3 (2 x 25 cm3) and brine (25 cm3). The separated organic phase was dried (Na2SO4), filtered and evaporated to dryness under reduced pressure. The crude product obtained was purified by column chromatography [4-5% (v/v) MeOH in CH2Cl2] to give compounds 14a-e.
Example 40a: (1S,3S,4R,7S)-7-Hydroxy-1-hydroxymethyl-3-phenyl-2,5-dioxabicyclo[2.2.1]heptane [Compound 14a of Scheme 2]
Compound 13a (400 mg, 0.86 mmol) was treated with DDQ (600 mg, 2.63 mmol) in a mixture of CH2Cl2 (10 cm3) and H2O (0.5 cm3). After the general work-up procedure and column chromatography, compound 14a was obtained as a white solid material (128 mg, 66%); Rf 0.30 (CH2Cl2/MeOH 9:1, v/v); δH ((CD3)2CO/CD3OD; (CD3)2CO was added to the compound followed by addition of CD3OD until a clear solution appeared) 7.40-7.22 (5 H, m), 4.99 (1 H, s), 4.09 (1 H, s), 4.04 (1 H, s), 4.01 (1 H, d, J 7.7), 3.86 (1 H, d, J 7.7), 3.90 (2 H, br s), 3.77 (2 H, br s); δC ((CD3)2CO/CD3OD; (CD3)2CO was added to the compound followed by addition of CD3OD until a clear solution appeared) 140.0, 128.2, 127.2, 125.4, 87.2, 83.7, 83.5, 72.3, 70.2, 58.4; m/z (FAB) 223 [M+H]+.
Example 40b: (1S,3S,4R,7S)-3-(4-Fluoro-3-methylphenyl)-7-hydroxy-1-hydroxymethyl-2,5-dioxabicyclo[2.2.1]heptane [Compound 14b of Scheme 2]
Compound 13b (400 mg, 0.81 mmol) was treated with DDQ (570 mg, 2.5 mmol) in a mixture of CH2Cl2 (10 cm3) and H2O (0.5 cm3). After the general work-up procedure and column chromatography, compound 14b was obtained as a white solid material (137 mg, 67%); Rf 0.31 (CH2Cl2/MeOH 9:1, v/v); δH (CD3OD) 7.23 (1 H, d, J 8.1), 7.19 (1 H, m), 6.99 (1 H, dd, J 8.5 and 9.3), 4.99 (1 H, s), 4.09 (1 H, s), 4.06 (1 H, s), 4.03 (1 H, d, J 7.6), 3.93-3.91 (3 H, m), 2.25 (3 H, d, J 1.4); δC (CD3OD) 161.9 (d, J 243.3), 136.4 (d, J 3.4), 129.6 (d, J 5.0), 126.1 (d, J 22.8), 125.5 (d, J 8.0), 115.7 (d, J 22.9), 88.5, 85.0, 84.3, 73.5, 71.3, 59.4, 14.5 (d, J 3.7); m/z (FAB) 255 [M+H]+.
Example 40c: (1S,3S,4R,7S)-7-Hydroxy-1-hydroxymethyl-3-(1-naphthyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 14c of Scheme 2]
Compound 13c (475 mg, 0.93 mmol) was treated with DDQ (600 mg, 2.63 mmol) in a mixture of CH2Cl2 (10 cm3) and H2O (0.5 cm3). After the general work-up procedure and column chromatography, compound 14c was obtained as a white solid material (170 mg, 67%); Rf 0.31 (CH2Cl2/MeOH 9:1, v/v); δH (CDCl3/CD3OD; CD3OD was added to the compound followed by addition of CDCl3 until a clear solution appeared) 7.94-7.86 (2 H, m), 7.80-7.74 (2 H, m), 7.55-7.46 (3 H, m), 5.74 (1 H, s), 4.56 (2 H, br s), 4.37 (1 H, s), 4.24 (1 H, s), 4.17-4.11 (2 H, m), 4.04 (2 H, br s); δC (CDCl3/CD3OD; CD3OD was added to the compound followed by addition of CDCl3 until a clear solution appeared) 134.7, 134.0, 130.2, 129.3, 128.6, 126.8, 126.2, 125.8, 123.8, 122.8, 87.4, 83.1, 82.2, 73.1, 71.5, 59.0; m/z (FAB) 273 [M+H]+, 272 [M]+.
Example 40d: (1S,3S,4R,7S)-7-Hydroxy-1-hydroxymethyl-3-(1-pyrenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 14d of Scheme 2]
Compound 13d (411 mg, 0.7 mmol) was treated with DDQ (570 mg, 2.5 mmol) in a mixture of CH2C12 (10 cm3) and H2O (0.5 cm3). After the general work-up procedure and column chromatography, compound 14d was obtained as a white solid material (182 mg, 75%); R/0.32 (CH2Cl2/MeOH 9:1, v/v); δ (CDCI3/CD3OD; CD3OD was added to the compound followed by addition of CDCI3 until a clear solution appeared) 8.32 (1 H, d, J 7.8), 8.23-8.18 (5 H, m), 8.06 (2 H, br s), 8.01 (1 H, d, J7.6), 6.06 (IH, s), 4.47 (1 H, s), 4.36 (1 H, s), 4.27-4.18 (2 H, m), 4.10 (2 H, br s); & (CDCI3/CD3OD) 132.2, 131.0, 128.5, 127.8, 127.3, 126.5, 125.9, 125.7, 125.1, 123.6, 122.1, 87.7, 83.7, 82.6, 73.1, 71.4, 58.9; m/z (FAB) 347 [M+H]+, 346 [M]+.
Example 40e: (1S,3S,4R,7S)-7-Hydroxy-1-hydroxymethyl-3-(2,4,5-trimethylphenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 14e of Scheme 2]
Compound 13e (355 mg, 0.7 mmol) was treated with DDQ (570 mg, 2.5 mmol) in a mixture of CH2Cl2 (10 cm3) and H2O (0.5 cm3). After the general work-up procedure and column chromatography, compound 14e was obtained as a white solid material (120 mg, 65%); Rf 0.31 (CH2Cl2/MeOH 9:1, v/v); δH (CDCl3/CD3OD; CD3OD was added to the compound followed by addition of CDCl3 until a clear solution appeared) 7.23 (1 H, s), 6.92 (1 H, s), 5.14 (1 H, s), 4.26 (1 H, s), 4.10 (1 H, s), 4.08 (1 H, d, J 7.7), 4.00-3.95 (3 H, m), 2.23 (6 H, s), 2.21 (1 H, s); δC (CDCl3/CD3OD; CD3OD was added to the compound followed by addition of CDCl3 until a clear solution appeared) 135.6, 133.9, 133.8, 131.7, 131.2, 126.6, 86.6, 82.1, 81.9, 72.3, 70.6, 58.5, 19.2, 19.0, 18.1; m/z (FAB) 265 [M+H]+, 264 [M]+.
Example 41: General procedure for dimethoxytritylation of compounds 14a-e to give Compounds 15a-e as shown in Scheme 2.
4,4'-Dimethoxytrityl chloride (DMTCl) was added in one portion to a stirred solution of compound 14a-e in anhydrous pyridine. After stirring the mixture at room temperature for 4 h, methanol (0.2 cm3) was added and the resulting mixture was evaporated to dryness under reduced pressure. The residue was coevaporated with anhydrous CH3CN (2 x 5 cm3) and anhydrous toluene (2 x 5 cm3) and then dissolved in CH2Cl2 (20 cm3, traces of acid removed by filtration through a short pad of basic alumina). The resulting solution was washed, successively, with saturated aqueous NaHCO3 (2 x 10 cm3) and brine (10 cm3). The separated organic phase was dried (Na2SO4), filtered and evaporated to dryness under reduced pressure. The crude product obtained was purified by column chromatography [0.25-0.50% (v/v) MeOH in CH2Cl2, containing 0.5% Et3N] affording compounds 15a-e.
Example 41a: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-7-hydroxy-3-phenyl-2,5-dioxabicyclo[2.2.1]heptane [Compound 15a of Scheme 2]
Dimethoxytritylation of compound 14a (108 mg, 0.49 mmol) using DMTCl (214 mg, 0.63 mmol) in anhydrous pyridine (2 cm3) followed by the general work-up procedure and column chromatography afforded compound 15a as a white solid material (180 mg, 71%); Rf 0.31 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.66-7.21 (14 H, m), 6.84 (4 H, d, J 8.8), 5.19 (1 H, s), 4.29 (1 H, s), 4.13 (1 H, s), 4.07 (1 H, d, J 8.4), 4.01 (1 H, d, J 8.3), 3.78 (6 H, s), 3.55 (1 H, d, J 10.2), 3.50 (1 H, d, J 10.7), 2.73 (1 H, br s); δC (CDCl3) 158.6, 149.8, 144.9, 139.4, 136.2, 135.9, 135.8, 130.3, 130.2, 128.5, 128.3, 128.0, 127.6, 126.9, 125.4, 123.9, 113.3, 86.4, 86.0, 83.8, 83.4, 73.0, 71.6, 60.2, 55.3; m/z (FAB) 525 [M+H]+, 524 [M]+.
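The yield and reagent stoichiometry quoted for compound 15a can be checked with a short calculation. The following is only an illustrative sketch: it assumes that the nominal masses read from the FAB data above (222 for 14a, 524 for 15a) are adequate stand-ins for the molar masses, and the function names are hypothetical.

```python
def percent_yield(start_mg, start_mw, product_mg, product_mw):
    """Isolated yield (%) relative to the limiting starting material."""
    start_mmol = start_mg / start_mw
    theoretical_mg = start_mmol * product_mw
    return 100.0 * product_mg / theoretical_mg


def equivalents(reagent_mmol, substrate_mmol):
    """Molar equivalents of reagent used per mole of substrate."""
    return reagent_mmol / substrate_mmol


# Nominal masses taken from the FAB data in Examples 40a and 41a
# (assumption: nominal mass is close enough to the molar mass for this check).
MW_14A = 222  # from m/z 223 [M+H]+
MW_15A = 524  # from m/z 524 [M]+

yield_15a = percent_yield(108, MW_14A, 180, MW_15A)
dmtcl_equiv = equivalents(0.63, 0.49)

print(f"15a: {yield_15a:.0f}% yield using {dmtcl_equiv:.2f} equiv. DMTCl")
```

Under these assumptions the calculation reproduces the reported 71% yield and shows that DMTCl was used in a modest excess (about 1.3 equivalents).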
Example 41b: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-3-(4-fluoro-3-methylphenyl)-7-hydroxy-2,5-dioxabicyclo[2.2.1]heptane [Compound 15b of Scheme 2]
Dimethoxytritylation of compound 14b (95 mg, 0.38 mmol) using DMTCl (129 mg, 0.42 mmol) in anhydrous pyridine (2 cm3) followed by the general work-up procedure and column chromatography afforded compound 15b as a white solid material (126 mg, 61%); Rf 0.32 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.53-7.15 (11 H, m), 6.97 (1 H, dd, J 8.7 and 8.9), 6.84 (4 H, d, J 8.8), 5.11 (1 H, s), 4.26 (1 H, d, J 3.9), 4.08 (1 H, s), 4.03 (1 H, d, J 8.0), 3.95 (1 H, d, J 8.0), 3.78 (6 H, s), 3.54 (1 H, d, J 10.5), 3.47 (1 H, d, J 10.1), 2.26 (3 H, d, J 1.5), 2.08 (1 H, br s); δC (CDCl3) 160.8 (d, J 244.1), 158.7, 144.9, 135.9, 134.7, 134.6, 130.3, 130.2, 130.1, 128.5, 128.4, 128.3, 128.0, 127.0, 125.2, 124.9, 124.4, 124.3, 115.2, 114.9, 113.4, 86.5, 86.0, 83.7, 83.0, 72.9, 71.7, 60.1, 55.3, 14.8 (d, J 3.1); m/z (FAB) 556 [M]+.
Example 41c: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-7-hydroxy-3-(1-naphthyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 15c of Scheme 2]
Dimethoxytritylation of compound 14c (125 mg, 0.46 mmol) using DMTCl (170 mg, 0.5 mmol) in anhydrous pyridine (2 cm3) followed by the general work-up procedure and column chromatography afforded compound 15c as a white solid material (158 mg, 60%); Rf 0.35 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.95-7.86 (3 H, m), 7.79 (1 H, d, J 8.3), 7.58-7.41 (9 H, m), 7.35-7.23 (3 H, m), 6.86 (4 H, d, J 8.8), 5.80 (1 H, s), 4.36 (1 H, s), 4.32 (1 H, d, J 6.5), 4.17 (1 H, d, J 8.3), 4.06 (1 H, d, J 8.0), 3.78 (6 H, s), 3.62-3.56 (2 H, m), 2.00 (1 H, d, J 6.6); δC (CDCl3) 158.7, 144.9, 136.0, 135.9, 134.5, 133.6, 130.3, 129.8, 129.0, 128.3, 128.2, 128.1, 127.0, 126.5, 125.9, 125.6, 123.9, 122.6, 113.4, 86.6, 85.7, 82.5, 81.7, 73.1, 72.6, 60.2, 55.3; m/z (FAB) 575 [M+H]+, 574 [M]+.
Example 41d: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-7-hydroxy-3-(1-pyrenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 15d of Scheme 2]
Dimethoxytritylation of compound 14d (130 mg, 0.38 mmol) using DMTCl (140 mg, 0.42 mmol) in anhydrous pyridine (2 cm3) followed by the general work-up procedure and column chromatography afforded compound 15d as a white solid material (147 mg, 61%); Rf 0.37 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 8.46 (1 H, d, J 8.0), 8.19-8.00 (7 H, m), 7.61 (2 H, dd, J 1.6 and 7.4), 7.48 (4 H, d, J 8.3), 7.35 (2 H, dd, J 7.2 and 7.5), 7.25 (1 H, m), 7.15 (1 H, m), 6.88 (4 H, d, J 9.0), 6.10 (1 H, s), 4.46 (1 H, s), 4.43 (1 H, br s), 4.25 (1 H, d, J 8.1), 4.12 (1 H, d, J 8.1), 3.79 (6 H, s), 3.71-3.63 (2 H, m), 2.22 (1 H, br s); δC (CDCl3) 158.7, 149.8, 144.9, 136.1, 136.0, 135.9, 132.1, 131.4, 130.9, 130.6, 130.3, 130.2, 129.2, 129.1, 128.4, 128.3, 128.2, 128.1, 127.5, 127.4, 127.0, 126.9, 126.2, 125.5, 125.4, 124.9, 124.8, 124.7, 123.8, 123.7, 121.9, 113.4, 86.6, 86.1, 83.2, 82.2, 73.2, 72.4, 60.3, 55.3; m/z (FAB) 649 [M+H]+, 648 [M]+.
Example 41e: (1R,3S,4R,7S)-1-(4,4'-Dimethoxytrityloxymethyl)-7-hydroxy-3-(2,4,5-trimethylphenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 15e of Scheme 2]
Dimethoxytritylation of compound 14e (80 mg, 0.3 mmol) using DMTCl (113 mg, 0.33 mmol) in anhydrous pyridine (2 cm3) followed by the general work-up procedure and column chromatography afforded compound 15e as a white solid material (134 mg, 78%); Rf 0.32 (CH2Cl2/MeOH 98:2, v/v); δH (CDCl3) 7.55 (2 H, d, J 7.9), 7.45-7.42 (4 H, m), 7.32-7.21 (4 H, m), 6.93 (1 H, s), 6.84 (4 H, d, J 8.2), 5.20 (1 H, s), 4.40 (1 H, s), 4.08 (1 H, s), 4.04 (1 H, d, J 8.3), 3.95 (1 H, d, J 8.2), 3.78 (6 H, s), 3.56 (1 H, d, J 10.5), 3.47 (1 H, d, J 10.2), 2.24 (3 H, s), 2.22 (3 H, s), 2.19 (3 H, s); δC (CDCl3) 158.6, 145.0, 136.0, 135.7, 134.4, 134.2, 131.8, 131.3, 130.3, 130.2, 128.3, 128.0, 127.2, 126.9, 113.3, 86.4, 85.7, 82.1, 81.8, 73.0, 71.8, 60.2, 55.3, 19.6, 19.3, 18.4; m/z (FAB) 567 [M+H]+, 566 [M]+.
Example 42: General procedure for synthesis of the phosphoramidite derivatives 16a-e as shown in Scheme 2.
2-Cyanoethyl N,N-diisopropylphosphoramidochloridite was added dropwise to a stirred solution of nucleoside 15a-e and N,N-diisopropylethylamine (DIPEA) in anhydrous CH2Cl2 at room temperature. After stirring the mixture at room temperature for 6 h, methanol (0.2 cm3) was added and the resulting mixture diluted with EtOAc (20 cm3, containing 0.5% Et3N, v/v). The organic phase was washed, successively, with saturated aqueous NaHCO3 (2 x 10 cm3) and brine (10 cm3). The separated organic phase was dried (Na2SO4), filtered and evaporated to dryness under reduced pressure. The residue obtained was purified by column chromatography [25-30% (v/v) EtOAc in n-hexane containing 0.5% Et3N] to give the amidites 16a-e.
Example 42a: Synthesis of (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-1-(4,4'-dimethoxytrityloxymethyl)-3-phenyl-2,5-dioxabicyclo[2.2.1]heptane [Compound 16a of Scheme 2]
Treatment of compound 15a (170 mg, 0.32 mmol) with 2-cyanoethyl N,N-diisopropylphosphoramidochloridite (85 mg, 0.36 mmol) in the presence of DIPEA (0.4 cm3) and anhydrous CH2Cl2 (2.0 cm3) followed by the general work-up procedure and column chromatography afforded phosphoramidite 16a as a white solid material (155 mg, 66%); Rf 0.45, 0.41 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.3, 148.9.
Example 42b: (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-1-(4,4'-dimethoxytrityloxymethyl)-3-(4-fluoro-3-methylphenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 16b of Scheme 2]
Treatment of compound 15b (95 mg, 0.17 mmol) with 2-cyanoethyl N,N-diisopropylphosphoramidochloridite (53 mg, 0.22 mmol) in the presence of DIPEA (0.3 cm3) and anhydrous CH2Cl2 (2.0 cm3) followed by the general work-up procedure and column chromatography afforded phosphoramidite 16b as a white solid material (85 mg, 66%); Rf 0.45, 0.41 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.3, 148.8.
Example 42c: Synthesis of (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-1-(4,4'-dimethoxytrityloxymethyl)-3-(1-naphthyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 16c of Scheme 2]
Treatment of compound 15c (158 mg, 0.28 mmol) with 2-cyanoethyl N,N-diisopropylphosphoramidochloridite (75.7 mg, 0.32 mmol) in the presence of DIPEA (0.4 cm3) and anhydrous CH2Cl2 (2.0 cm3) followed by the general work-up procedure and column chromatography afforded phosphoramidite 16c as a white solid material (127 mg, 60%); Rf 0.47, 0.44 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.2, 149.1.
Example 42d: Synthesis of (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-1-(4,4'-dimethoxytrityloxymethyl)-3-(1-pyrenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 16d of Scheme 2]
Treatment of compound 15d (140 mg, 0.22 mmol) with 2-cyanoethyl N,N-diisopropylphosphoramidochloridite (64 mg, 0.27 mmol) in the presence of DIPEA (0.3 cm3) and anhydrous CH2Cl2 (2.0 cm3) followed by the general work-up procedure and column chromatography afforded phosphoramidite 16d as a white solid material (124 mg, 68%); Rf 0.51, 0.47 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.4, 149.1.
Example 42e: Synthesis of (1R,3S,4R,7S)-7-[2-Cyanoethoxy(diisopropylamino)phosphinoxy]-1-(4,4'-dimethoxytrityloxymethyl)-3-(2,4,5-trimethylphenyl)-2,5-dioxabicyclo[2.2.1]heptane [Compound 16e of Scheme 2]
Treatment of compound 15e (130 mg, 0.23 mmol) with 2-cyanoethyl N,N-diisopropylphosphoramidochloridite (64 mg, 0.27 mmol) in the presence of DIPEA (0.3 cm3) and anhydrous CH2Cl2 (2.0 cm3) followed by the general work-up procedure and column chromatography afforded phosphoramidite 16e as a white solid material (111 mg, 63%); Rf 0.44, 0.42 (CH2Cl2/MeOH 98:2, v/v); δP (CDCl3) 149.0.
Example 43: Synthesis, deprotection and purification of oligonucleotides
All oligomers were prepared using the phosphoramidite approach on a Biosearch 8750 DNA synthesizer in 0.2 μmol scale on CPG solid supports (BioGenex). The stepwise coupling efficiencies for phosphoramidites 16a-c (10 min coupling time) and phosphoramidites 16d and 16e (20 min coupling time) were >96% and for unmodified deoxynucleoside and ribonucleoside phosphoramidites (with standard coupling time) generally >99%, in all cases using 1H-tetrazole as activator. After standard deprotection and cleavage from the solid support using 32% aqueous ammonia (12 h, 55 °C), the oligomers were purified by precipitation from ethanol. The composition of the oligomers was verified by MALDI-MS analysis and the purity (>80%) by capillary gel electrophoresis. Selected MALDI-MS data ([M-H]-; found/calcd.: ON3 2731/2733; ON4 2857/2857; ON6 3094/3093).
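The stepwise coupling efficiencies above determine how much full-length oligomer can be expected before purification. The sketch below multiplies the per-step efficiencies for a 9-mer containing one modified monomer. It assumes, purely for illustration, that the first nucleoside is pre-loaded on the support (so that eight couplings are needed) and that the lower bounds of 96% and 99% are the actual efficiencies; the function name is hypothetical.

```python
def full_length_fraction(step_efficiencies):
    """Fraction of chains that complete every coupling step."""
    fraction = 1.0
    for efficiency in step_efficiencies:
        fraction *= efficiency
    return fraction


# Assumption: 9-mer built on a pre-loaded support, i.e. eight couplings,
# seven with unmodified amidites (0.99) and one with a modified amidite (0.96).
couplings = [0.99] * 7 + [0.96]
fraction_9mer = full_length_fraction(couplings)

print(f"Expected full-length fraction: {fraction_9mer:.1%}")
```

Under these assumptions roughly nine chains in ten reach full length, which is consistent with the purity of >80% reported after ethanol precipitation.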
Example 44. Thermal denaturation studies
The thermal denaturation experiments were performed on a Perkin-Elmer UV/VIS spectrometer fitted with a PTP-6 Peltier temperature-programming element using a medium salt buffer solution (10 mM sodium phosphate, 100 mM sodium chloride, 0.1 mM EDTA, pH 7.0). Concentrations of 1.5 μM of the two complementary strands were used assuming identical extinction coefficients for modified and unmodified oligonucleotides. The absorbance was monitored at 260 nm while raising the temperature at a rate of 1 °C per min. The melting temperatures (Tm values) of the duplexes were determined as the maximum of the first derivatives of the melting curves obtained.
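The Tm determination described above (maximum of the first derivative of the A260 versus temperature curve) can be sketched as follows. This is a minimal illustration only: the melting curve is synthetic, generated from an assumed sigmoidal model, and the function names and parameter values are hypothetical rather than taken from the instrument software.

```python
import math


def melting_temperature(temps, absorbances):
    """Return the temperature at which dA/dT (central difference) is largest."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (absorbances[i + 1] - absorbances[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_slope, best_temp = slope, temps[i]
    return best_temp


def synthetic_curve(tm=28.0, width=3.0, t_start=5.0, t_end=60.0, step=0.5):
    """Synthetic sigmoidal A260 melting curve (assumed model, not measured data)."""
    n_points = int((t_end - t_start) / step) + 1
    temps = [t_start + i * step for i in range(n_points)]
    absorbances = [0.20 + 0.05 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]
    return temps, absorbances


temps, a260 = synthetic_curve()
tm_estimate = melting_temperature(temps, a260)
print(f"Estimated Tm: {tm_estimate:.1f} °C")
```

With real data, smoothing the curve before differentiation is usually advisable, since noise in the absorbance readings is amplified by the derivative.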
Example 45: Synthesis of compounds 16a-16e and oligomers containing monomers 17a-17e
LNAs containing the derivatives 17a-17e (Figure 1, Figure 11 and Figure 25) were synthesized, all based on the LNA-type 2'-O,4'-C-methylene-β-D-ribofuranosyl moiety which is known to adopt a locked C3'-endo RNA-like furanose conformation [S. Obika, D. Nanbu, Y. Hari, K. Morio, Y. In, T. Ishida, and T. Imanishi, Tetrahedron Lett., 1997, 38, 8735; S. K. Singh, P. Nielsen, A. A. Koshkin and J. Wengel, Chem. Commun., 1998, 455; A. A. Koshkin, S. K. Singh, P. Nielsen, V. K. Rajwanshi, R. Kumar, M. Meldgaard, C. E. Olsen and J. Wengel, Tetrahedron, 1998, 54, 3607; S. Obika, D. Nanbu, Y. Hari, J. Andoh, K. Morio, T. Doi and T. Imanishi, Tetrahedron Lett., 1998, 39, 5401]. The syntheses of the phosphoramidite building blocks 16a-16e suitable for incorporation of the LNA-type aryl C-glycosides 17a-17e are shown in Scheme 1 and Scheme 2 and described in detail in the experimental section. In the design of an appropriate synthetic route, it was decided to utilize a reaction similar to one described recently in the literature. Thus, stereoselective attack of Grignard reagents of various heterocycles on the carbonyl group of an aldehyde corresponding to aldehyde 11 (Scheme 2), but with two O-benzyl groups instead of the two p-methoxybenzyl groups of aldehyde 11 (Scheme 2), has been reported to furnish locked-C-nucleosides [S. Obika, Y. Hari, K. Morio and T. Imanishi, Tetrahedron Lett., 2000, 41, 215; S. Obika, Y. Hari, K. Morio and T. Imanishi, Tetrahedron Lett., 2000, 41, 221]. The key intermediate in the synthetic route selected herein, namely the novel aldehyde 11, was synthesized from the known furanoside 1 [R. Yamaguchi, T. Imanishi, S. Kohgo, H. Horie and H. Ohrai, Biosci. Biotechnol. Biochem., 1999, 63, 736] following two different routes. In general, O-(p-methoxy)benzyl protection was desirable instead of O-benzyl protection, as removal of the benzyl protection at a later stage (i.e. 13 → 14) could also likely result in cleavage of the benzylic O-C1 bond present, e.g., in compounds 13 and 14 (Scheme 2). In one route to aldehyde 11, regioselective p-methoxybenzylation of the furanoside 1, followed by mesylation and methanolysis, yielded the anomeric mixture of the methyl furanosides 9. Base induced cyclization followed by acetyl hydrolysis afforded the aldehyde 11 in approximately 24% overall yield from 1 (Scheme 1 and Scheme 2). This yield was improved following a different strategy.
Thus, di-O-mesylation of 1 followed by methanolysis and base induced intramolecular nucleophilic attack from the 2-OH group afforded the cyclized anomeric mixture of methyl furanoside 4. Substitution of the remaining mesyloxy group of 4 with an acetate group, followed by deacetylation, p-methoxybenzylation and then acetyl hydrolysis afforded the required aldehyde 11 (Scheme 1).
Coupling of the aldehyde 11 with different aryl Grignard reagents yielded selectively one epimer of each of the compounds 12a-e in good yields (see experimental section for further details on this and other synthetic steps). Each of the diols 12a-e was cyclized under Mitsunobu conditions (TMAD, PBu3) to afford the bicyclic β-C-nucleoside derivatives 13a-e. Oxidative removal of the p-methoxybenzyl protections was achieved in satisfactory yields using DDQ. Subsequent selective 4,4'-dimethoxytritylation (to give compounds 15a-e) followed by phosphorylation afforded the phosphoramidite building blocks 16a-e in satisfactory yields. The configuration of compounds 13, and thus also of compounds 11, 12 and 14-17, was assigned based on 1H NMR spectroscopy, including NOE experiments.
All oligomers were prepared in the 0.2 μmol scale using the phosphoramidite approach. The stepwise coupling efficiencies for phosphoramidites 16a-c (10 min coupling time) and phosphoramidites 16d and 16e (20 min coupling time) were >96% and for unmodified deoxynucleoside and ribonucleoside phosphoramidites (with standard coupling time) generally >99%, in all cases using 1H-tetrazole as activator. After standard deprotection and cleavage from the solid support using 32% aqueous ammonia (12 h, 55 °C), the oligomers were purified by precipitation from ethanol. The composition of the oligomers was verified by MALDI-MS analysis and the purity (>80%) by capillary gel electrophoresis.
Example 46 Thermal denaturation studies to evaluate hybridization properties
The hybridization of the oligonucleotides ON1-ON11 (Table 8 below) toward four 9-mer DNA targets with the central base being each of the four natural bases was studied by thermal denaturation experiments (Tm measurements; see the experimental section for details). Compared to the DNA reference ON1, introduction of one abasic LNA monomer AbL (ON2) has earlier been reported to prevent the formation of a stable duplex above 0 °C (only evaluated with adenine as the opposite base) [L. Kvaernø and J. Wengel, Chem. Commun., 1999, 657]. With the phenyl monomer 17a (ON3), Tm values in the range of 5-12 °C were observed. Thus, the phenyl moiety stabilizes the duplexes compared to AbL, but universal hybridization is not achieved as a preference for a central adenine base in the complementary target strand is indicated (Table 8). In addition, significant destabilization compared to the ON1:DNA reference duplex was observed. Results similar to those obtained for ON3 were obtained for oligomers isosequential with ON3 but containing 17b, 17c or 17e instead of 17a as the central monomer (Table 8, ON7, ON8 and ON9, respectively).
Table 8. Thermal denaturation experiments (Tm values shown) for ON1-ON11 towards DNA complements with each of the four natural bases in the central position (a)

DNA target: 3'-d(CACTYTACG)

                                          Y:   A      C      G      T
ON1    5'-d(GTGATATGC)                         28     11     12     19
ON2    5'-d(GTGAAbLATGC)                       <3     n.d.   n.d.   n.d.
ON3    5'-d(GTGA17aATGC)                       12     5      6      7
ON4    5'-d(GTGA17dATGC)                       18     17     18     19
ON5    5'-d[2'-OMe(GTGATATGC)]                 35     14     19     21
ON6    5'-d[2'-OMe(GTLGA17dATLGC)]             39     38     37     40
ON7    5'-d(GTGA17bATGC)                       15     7      6      8
ON8    5'-d(GTGA17cATGC)                       15     7      6      9
ON9    5'-d(GTGA17eATGC)                       13     6      6      7
ON10   5'-d[2'-OMe(GTLGA17bATLGC)]             31     25     26     27
ON11   5'-d[2'-OMe(GTLGA17cATLGC)]             34     27     27     32

(a) Melting temperatures (Tm values/°C) measured as the maximum of the first derivative of the melting curve (A260 vs temperature) recorded in medium salt buffer (10 mM sodium phosphate, 100 mM sodium chloride, 0.1 mM EDTA, pH 7.0) using 1.5 μM concentrations of the two strands; A = adenine monomer, C = cytosine monomer, G = guanine monomer, T = thymine monomer; see Figure 1 and Figure 25 for structures of TL, AbL and 17a-17e; DNA sequences are shown as d(sequence) and 2'-OMe-RNA sequences as 2'-OMe(sequence); "n.d." denotes "not determined". The data reported for ON1 have been reported earlier [A. A. Koshkin, S. K. Singh, P. Nielsen, V. K. Rajwanshi, R. Kumar, M. Meldgaard, C. E. Olsen and J. Wengel, Tetrahedron, 1998, 54, 3607]. The data reported for ON2 have been reported earlier [L. Kvaernø and J. Wengel, Chem. Commun., 1999, 657].
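One way to read Table 8 is to compare, for each oligonucleotide, the spread of the four Tm values: a small spread indicates little preference for any base opposite the central monomer. The sketch below computes that spread from the tabulated values. The 3 °C cut-off used to flag universal hybridization is an assumption chosen for illustration and is not a criterion stated in the text; ON2 is omitted because three of its values were not determined.

```python
# Tm values (°C) from Table 8, ordered by central target base Y = A, C, G, T.
TABLE_8 = {
    "ON1": (28, 11, 12, 19),
    "ON3": (12, 5, 6, 7),
    "ON4": (18, 17, 18, 19),
    "ON5": (35, 14, 19, 21),
    "ON6": (39, 38, 37, 40),
    "ON7": (15, 7, 6, 8),
    "ON8": (15, 7, 6, 9),
    "ON9": (13, 6, 6, 7),
    "ON10": (31, 25, 26, 27),
    "ON11": (34, 27, 27, 32),
}

# Assumed, illustrative cut-off: a Tm spread of at most 3 °C counts as universal.
UNIVERSAL_SPREAD_C = 3


def tm_spread(tms):
    """Difference between the highest and lowest Tm across the four targets."""
    return max(tms) - min(tms)


spreads = {name: tm_spread(tms) for name, tms in TABLE_8.items()}
universal = [name for name, spread in spreads.items() if spread <= UNIVERSAL_SPREAD_C]

for name, spread in spreads.items():
    print(f"{name}: spread {spread} °C, mean Tm {sum(TABLE_8[name]) / 4:.1f} °C")
print("Universal under the assumed cut-off:", universal)
```

Under this assumed cut-off only the two oligomers containing the pyrene monomer 17d (ON4 and ON6) are flagged, in line with the discussion that follows.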
The pyrene LNA nucleotide 17d (in ON4) displays more encouraging properties (Table 8). Firstly, the binding affinity towards all four complements is increased compared to ON3 (containing 17a). Secondly, universal hybridization is observed as shown by the four Tm values all being within 17-19 °C. With respect to universal hybridization, 17d thus parallels the pyrene DNA derivative Py [T. J. Matray and E. T. Kool, J. Am. Chem. Soc., 1998, 120, 6191], but the decrease in thermal stability compared to the ON1:DNA reference is more pronounced for 17d (-10 °C) than reported for Py (-5 °C in a 12-mer polypyrimidine DNA sequence) [T. J. Matray and E. T. Kool, J. Am. Chem. Soc., 1998, 120, 6191]. It therefore appears that stacking (or intercalation) by the pyrene moiety is not favored by the conformational restriction of the furanose ring of 17d, although comparison of the thermal stabilities of ON2, ON3 and ON4 strongly indicates interaction of the pyrene moiety within the helix.
When measured against an RNA target [3'-r(CACUAUACG)], under experimental conditions identical to those of the experiments described above, the Tm value of ON3 was 11.9 °C and that of ON4 was 12.7 °C. For oligomers ON7, ON8 and ON9 (Table 8), the corresponding Tm values were 11.7, 8.8 and 10.2 °C, respectively.
Example 47: The effect of pyrene LNA monomers in an RNA-like strand.
ON5, ON6, ON10 and ON11 (see Table 8 above) were synthesized. The first (ON5) is composed entirely of 2'-OMe-RNA monomers, and the latter three each consist of six 2'-OMe-RNA monomers (see Figure 1), two LNA thymine monomers TL (see Figure 1), and one central monomer: the LNA pyrene monomer 17d (ON6), monomer 17b (ON10) or monomer 17c (ON11). A sequence corresponding to ON6 but with three T monomers has earlier been shown to form a duplex with complementary DNA of very high thermal stability. ON6 is therefore suitable for evaluation of the effect of introducing high-affinity monomers around a universal base. As seen in Table 8, the 2'-OMe-RNA reference ON5 binds to the DNA complement with slightly increased thermal stability and conserved Watson-Crick discrimination (compared to the DNA reference ON1). Indeed, the LNA/2'-OMe-RNA chimera ON6 displays universal hybridization behavior as revealed from the four Tm values (37, 38, 39 and 40 °C). All four Tm values obtained for ON6 are higher than the Tm values obtained for the two fully complementary reference duplexes ON1:DNA (Tm = 28 °C) and ON5:DNA (Tm = 35 °C).
These novel data demonstrate that the pyrene LNA monomer 17d displays universal hybridization behavior both in a DNA context (ON4) and in an RNA-like context (ON6), and that the problem of decreased affinity of universal hybridization probes can be solved by the introduction of high-affinity monomers, e.g. 2'-OMe-RNA and/or LNA monomers. Increased affinities compared to ON7 and ON8 were obtained for ON10 and ON11, respectively, but universal hybridization behavior was not obtained, as a preference for a central adenine base in the complementary target strand is indicated (Table 8 above).
Example 48: Base-pairing selectivity in hybridization probes.
A systematic thermal denaturation study with ON6 (Table 11) was performed to determine base-pairing selectivity. For each of the four DNA complements (DNA target strands; monomer Y = A, C, G or T) used in the study shown in Table 8 above, ON6, containing a central pyrene LNA monomer 17d, was hybridized with all four base combinations in the neighboring position towards the 3'-end of ON6 (DNA target strands; monomer Z = A, C, G or T, monomer X = T) and the same towards the 5'-end of ON6 (DNA target strands; monomer X = A, C, G or T, monomer Z = T). In all eight subsets of four data points, satisfactory to excellent Watson-Crick discrimination was observed between the match and the three mismatches (Table 11 below, ΔTm values in the range of 5-25 °C).
Table 11. Thermal denaturation experiments (Tm values shown) to evaluate the base-pairing selectivity of the bases neighboring the universal pyrene LNA monomer 17d in the 2'-OMe-RNA/LNA chimera ON6. In the target strand [3'-d(CAC-XYZ-ACG)], the central three bases XYZ are varied among each of the four natural bases (a)

ON6:    5'-[2'-OMe(GTLG-A17dA-TLGC)]
Target: 3'-d(CAC-XYZ-ACG)

XYZ   Tm/°C     XYZ   Tm/°C     XYZ   Tm/°C     XYZ   Tm/°C
TAA   26        TCA   22        TGA   22        TTA   29
TAC   26        TCC   29        TGC   26        TTG   31
TAG   24        TCG   24        TGG   30        TTC   32
TAT   39        TCT   38        TGT   37        TTT   40
AAT   18        ACT   27        AGT   22        ATT   28
CAT   30        CCT   31        CGT   27        CTT   35
GAT   14        GCT   28        GGT   16        GTT   27
TAT   39        TCT   38        TGT   37        TTT   40

(a) See caption below Table 8 for abbreviations and conditions used; the data for matched neighboring bases (X = Z = T) are shown in bold.
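The ΔTm range quoted for Table 11 can be reproduced by subtracting each mismatch Tm from the Tm of the corresponding match (X = Z = T) with the same central base Y. The sketch below does this for all 24 single-mismatch targets in the table; the variable and function names are hypothetical.

```python
# Tm values (°C) from Table 11, keyed by the central target bases XYZ.
TABLE_11 = {
    "TAA": 26, "TCA": 22, "TGA": 22, "TTA": 29,
    "TAC": 26, "TCC": 29, "TGC": 26, "TTG": 31,
    "TAG": 24, "TCG": 24, "TGG": 30, "TTC": 32,
    "TAT": 39, "TCT": 38, "TGT": 37, "TTT": 40,
    "AAT": 18, "ACT": 27, "AGT": 22, "ATT": 28,
    "CAT": 30, "CCT": 31, "CGT": 27, "CTT": 35,
    "GAT": 14, "GCT": 28, "GGT": 16, "GTT": 27,
}


def delta_tm(xyz):
    """Tm of the matched target (X = Z = T, same Y) minus Tm of target xyz."""
    match = "T" + xyz[1] + "T"
    return TABLE_11[match] - TABLE_11[xyz]


# Every entry with X != T or Z != T is a single neighboring-base mismatch.
mismatches = [xyz for xyz in TABLE_11 if xyz[0] != "T" or xyz[2] != "T"]
deltas = {xyz: delta_tm(xyz) for xyz in mismatches}

print(f"{len(deltas)} mismatches, dTm range "
      f"{min(deltas.values())}-{max(deltas.values())} °C")
```

The smallest difference arises for the CTT target and the largest for the GAT target, which gives the 5-25 °C range cited in the text.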
The results reported herein have several important implications for the design of probes for universal hybridization: (1) Universal hybridization is possible with a conformationally restricted monomer as demonstrated for the pyrene LNA monomer; (2) Universal hybridization behavior is feasible in an RNA context; (3) The binding affinity of probes for universal hybridization can be increased by the introduction of high-affinity
monomers without compromising the universal hybridization and the base-pairing selectivity of bases neighboring the universal base.
Based on the results reported herein, it is expected that chimeric oligonucleotides comprising pyrene and other known universal bases attached at various backbones (e.g. LNA-type monomers, ribofuranose monomers or deoxyribose monomers in 2'-OMe-RNA/LNA chimeric oligos) will likewise display attractive properties with respect to universal hybridization behavior. One example is an oligomer identical with the 2'-OMe-RNA/LNA oligo ON6 but with the 17d monomer substituted by a pyrenyl-2'-OMe-ribonucleotide monomer.
Example 49: Chimeric oligonucleotides
These chimeric oligonucleotides comprise pyrene and other known universal bases attached at various backbones (e.g. LNA-type monomers, ribofuranose monomers or deoxyribose monomers in 2'-OMe-RNA/LNA oligos). Experimentation with these chimeric oligonucleotides serves to evaluate the possibility of obtaining results similar to those for the 2'-OMe-RNA/LNA oligo ON6 at a lower cost, for example, by substituting PyL with a pyrenyl-2'-OMe-ribonucleotide monomer.
Example 50: The use of LNA Oligonucleotide Microarrays Provides Superior Sensitivity and Specificity in Expression Profiling
A. In vitro synthesis of the yeast spike RNAs
Amplification of the yeast genes was carried out by standard two-step PCR using yeast genomic DNA as template (21). In the second PCR, a poly-T20 tail was inserted in the amplicon. The DNA fragments were ligated into the pTRIamp18 vector (Ambion, USA) using the Quick Ligation Kit (New England Biolabs, USA) according to the manufacturer's instructions and transformed into E. coli DH-5α (21). Synthesis of in vitro RNA was done using the MEGAscript™ T7 Kit (Ambion, USA) according to the manufacturer's instructions.
B. Design of the LNA expression arrays
Capture probes were designed using the OligoDesign™ software as described in the previous examples.
C. Printing and hybridisation of the LNA expression microarrays
The LNA oligonucleotide microarrays were printed onto Immobilizer™ MicroArray Slides (Exiqon, Denmark) using the Packard Biochip I Arrayer (Packard, USA), with a spot volume of 2 x 300 pl of a 10 μM capture probe solution. Four replicas of each capture probe were printed on each slide. Mixed-stage Caenorhabditis elegans worm cultures were cultivated according to standard protocols. RNA was extracted from worm samples using the FastRNA Kit, GREEN (Q-BIO, USA) according to the supplier's instructions. Fluorochrome-labelled first strand cDNA was synthesized from worm total RNA or in vitro synthesized RNA as described (22), followed by purification of the cDNA target, hybridisation of the microarrays overnight at 65 °C, washing of the slides and drying of the arrays (22). The slides were scanned using a ScanArray 4000 XL scanner (Perkin-Elmer, USA), and the array data were processed using the GenePix™ Pro 4.0 software package (Axon, USA).
D. Assessment of sensitivity and specificity in LNA expression microarrays
To enable direct comparisons between LNA and DNA capture probes in measuring gene expression levels, specific oligonucleotide capture probes for the Saccharomyces cerevisiae genes SWI5 and THI4 were designed in the 3'-end of the two ORFs. The capture probes were synthesized as 50-mer DNA and corresponding LNA-modified oligonucleotides, respectively, with an LNA substitution at every 3rd nucleotide position. In addition, 40-mer DNA and LNA oligonucleotides were designed as truncated versions of the 50-mer capture probes, along with oligonucleotides with 1 to 5 consecutive mismatches positioned centrally in the 50-mer and 40-mer capture probes. All capture probes were synthesized with an anthraquinone group at the 5'-end and a hexaethyleneglycol dimer linker region (HEG2 spacer arm) enabling photocoupling onto polymer microarray slides as described in US patent No. 6,531,591.
To assess the sensitivity and specificity of the oligonucleotide microarrays, in vitro synthesized yeast RNA for either SWI5 or THI4 was spiked into C. elegans total RNA for cDNA target synthesis followed by hybridization of the microarrays with fluorochrome- labelled cDNA target pools. The incorporation of LNA nucleotides into 50-mer DNA oligonucleotide capture probes results in a 3 to 4-fold increase in fluorescence intensity levels, when hybridized with the spiked, complex cDNA target pools under standard stringency conditions (Figure 45 (a and c)). The sensitivity increase is even more pronounced, 5 to 12-fold, when 40-mer LNA capture probes are employed. None of the yeast capture probes showed cross-hybridization to C. elegans cDNA target control without yeast spike RNA under the same conditions.
The specificity of the oligonucleotide capture probes was examined using a panel of LNA mismatch oligonucleotides together with the DNA controls. As demonstrated in Figure 45 (a and c), the fluorescence intensities obtained with the LNA-modified 40-mer triple mismatch oligonucleotides show a 3-fold intensity decrease relative to the perfectly matched duplexes. In contrast, the corresponding 40-mer standard DNA capture probes are neither capable of forming stable duplexes nor discriminating between the perfect match and mismatched targets under standard hybridization stringency conditions, resulting in low intensity values from all 40-mer DNA capture probes (Figure 45 (a and c)). Interestingly, mismatch discrimination with the 50-mer LNA probes could be significantly improved by increasing the hybridization temperature from 65 °C to 70 °C (Figure 45 (b and d)), without compromising their capture sensitivity. By comparison, the signal intensities from all 50-mer DNA capture probes including the perfect match oligonucleotides were reduced under the same conditions (Figure 45 (b and d)). Considered together, our results strongly support the contention that LNA oligonucleotide capture probes are significantly more sensitive and specific than DNA probes, being able to discriminate between highly homologous (90 %) mRNAs with a 5 to 10-fold increase in sensitivity.
In a typical cell, mRNAs can be subdivided into three kinetic classes: (i) highly abundant (30-90 % of the total mRNA mass, 0.1 % of the sequence complexity); (ii) medium abundant (50 % mass, 2-5 % of complexity); and (iii) low-abundant mRNAs (<1% mass, >90 % of complexity). In addition, alternative splicing has been shown to be prevalent in higher eukaryotes, where at least 50 % of the genes appear to be alternatively spliced, thereby generating additional diversity within the transcriptome. It is thus of utmost importance that the dynamic range, sensitivity and specificity of the expression profiling technology used are optimal, especially when analyzing expression levels of messages and mRNA splice variants belonging to the low-abundant class of high mRNA sequence complexity. A common problem for all DNA oligonucleotide microarrays is the need for an adequate compromise with respect to the sensitivity and specificity of the platform. In the present example LNA oligonucleotide microarrays perform better in expression profiling than microarrays with corresponding DNA probes. Our results clearly demonstrate that both the specificity and sensitivity in target molecule capture can be improved using LNA oligonucleotide microarrays, enabling discrimination between highly homologous mRNAs and alternative splice variants with a simultaneous increase in sensitivity.
Figure 45 shows the sensitivity and specificity of LNA oligonucleotide capture probes (black bars) compared to DNA capture probes (white bars) on expression microarrays. Fluorescence intensity is shown in arbitrary units (relative measurements). The arrays comprising 50-mer and 40-mer perfect match and 1-5 mismatch capture probes were hybridized at 65 °C in 3xSSC with Cy3-labelled cDNA from 10 μg C. elegans total RNA spiked with yeast a) SWI5 RNA and c) THI4 RNA. Improved mismatch discrimination with the 50-mer LNA probes, obtained by increasing the hybridization temperature from 65 °C to 70 °C, is demonstrated for arrays hybridized with Cy3-labelled cDNA from 10 μg C. elegans total RNA spiked with yeast b) SWI5 RNA and d) THI4 RNA.
Example 51. Improved sensitivity in the on-chip capture of yeast HSP78 mRNA using LNA-substituted 25-mer oligonucleotide capture probes.
A. Capture probe design
Unique capture probes for the yeast HSP78 gene were designed using the OligoDesign software, described in Figure 27. The design options used were: (i) length of each oligonucleotide probe was 25 nucleotides; (ii) Blast word length 7; (iii) Blast expectation cutoff 1000; other options were as default. In total, 24 DNA capture probes were selected. Furthermore, three different LNA-substituted probe designs were derived from the sequences of the 24 DNA capture probes selected by OligoDesign: optimal LNA_T, optimal LNA_TC and LNA_3. In the LNA_T design the DNA t nucleotides were substituted with LNA T. For the LNA_TC design, LNA T and C nucleotides were used to substitute DNA t and c. In the LNA_3 design every third DNA nucleotide was substituted with the corresponding LNA nucleotide. For the LNA_T and LNA_TC designs, no blocks of LNAs were allowed; in addition, the LNAs were substituted in a pattern providing a narrower duplex melting temperature range compared to the DNA Tm range. An equivalent set of capture probes with a single mismatch in the central nucleotide position was also designed. Altogether, 192 capture probes were designed, each including an anthraquinone (AQ29) 5'-modifier and a hexaethyleneglycol dimer (HEG2) at the 5'-end of the probe, as shown in Table 13.
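The LNA_3 and central-mismatch designs can be illustrated with a minimal Python sketch. The function names are hypothetical, and two details are assumptions inferred from the Table 13 entries rather than stated in the text: the 3'-terminal nucleotide is left as DNA, and the central mismatch is made by swapping the central base for its complement (a/t, c/g). LNA methyl-C is written as mC, as in the table. The LNA_T and LNA_TC designs are not reproduced here, because their Tm-optimized substitution patterns are not fully specified.

```python
# Illustrative sketch only; rules marked "assumed" are inferred from Table 13.
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def lna3_design(dna: str) -> str:
    """Substitute every third nucleotide (positions 1, 4, 7, ...) with LNA.

    LNA nucleotides are written in capitals, LNA methyl-C as 'mC'.
    Assumed: the 3'-terminal nucleotide is never substituted.
    """
    dna = dna.lower()
    last = len(dna) - 1
    out = []
    for i, base in enumerate(dna):
        if i % 3 == 0 and i != last:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


def central_mismatch(dna: str) -> str:
    """Introduce a single mismatch at the central nucleotide position.

    Assumed: the central base is replaced by its complement.
    """
    dna = dna.lower()
    mid = len(dna) // 2
    return dna[:mid] + COMPLEMENT[dna[mid]] + dna[mid + 1:]


pm_043 = "tttggtagcacgacaagcttagtat"
print(lna3_design(pm_043))
print(central_mismatch(pm_043))
print(lna3_design(central_mismatch(pm_043)))
```

Applied to the YDR258C_PM_043 sequence, the sketch reproduces the corresponding LNA3 and MM entries of Table 13.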
B. Determination of the duplex melting temperatures (Tm).
The duplex melting temperatures of the DNA, LNA_T and LNA_TC designed probes were measured using the Perkin Elmer Lambda 40 Spectrophotometer and according to Wahlestedt et al. PNAS 97/10 2000. All oligos were measured twice, and if the replicate values deviated by more than 1 °C, a third or a fourth measurement was carried out. The average Tm for each oligonucleotide duplex is presented in Table 14.
C. In vitro synthesis of fluorochrome-labeled yeast HSP78 RNA
C1. Genomic DNA was prepared from a wild type standard laboratory strain of Saccharomyces cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences) according to the supplier's instructions.
C2. PCR amplification of the partial yeast gene was done by standard PCR using yeast genomic DNA as template. In the first step of amplification, a forward primer containing a restriction enzyme site and a reverse primer containing a universal linker sequence were used. In this step 20 bp was added to the 3'-end of the amplicon, next to the stop codon. In the second step of amplification, the reverse primer was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme site. The SWI5 amplicon contains 730 bp of the SWI5 ORF plus 20 bp universal linker sequence and a poly-A20 tail.
The PCR primers used were:
YDR258C-For-SacI: acgtgagctcttttgacatgtcagaatttcaag
YDR258C-Rev-Uni: gatccccgggaattgccatgttacttttcagcttcctcttcaac
Uni-polyT-BamHI: acgtggatccttttttttttttttttttttgatccccgggaattgccatg
C3. Plasmid DNA constructs
The PCR amplicon was cut with the restriction enzymes EcoRI + BamHI. The DNA fragment was ligated into the pTRIamp18 vector (Ambion) using the Quick Ligation Kit (New England Biolabs) according to the supplier's instructions and transformed into E. coli DH-5α by standard methods.
C4. DNA sequencing
To verify the cloning of the PCR amplicon, plasmid DNA was sequenced using M13 forward and M13 reverse primers and analysed on an ABI 377.
C5. Biotin labeling of cRNA
One μg of plasmid containing the HSP78 sequence was linearized with restriction enzyme BamHI (Amersham Pharmacia Biotech, USA) for 2 hours at 37 °C. The RNA was labeled with biotin-CTP and biotin-UTP using the MessageAmp aRNA kit from Ambion (USA) according to the manufacturer's instructions. Following hybridisation, the slides were stained with Streptavidin Phycoerythrin (Molecular Probes, S-866, USA) according to the GeneChip Expression Analysis Technical Manual (Affymetrix, USA).
C6. Fluorochrome labeling of spike RNA
In vitro synthesized spike RNA from the HSP78 plasmid construct was labeled with either Cy3-ULS or Cy5-ULS (Amersham Biosciences, USA) according to the manufacturer's instructions, followed by filtration through a ProbeQuant G50 Micro Column and Microcon-30 (Millipore, USA). The labeling efficiency was monitored using the Nanodrop spectrophotometer (Nanodrop Technologies, USA).
D. Microarray fabrication
The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark) using the MicroGrid II from Biorobotics (UK) with a 20 μM capture probe solution for each oligonucleotide probe. Four replicates of each capture probe were printed on the slides.
E. Hybridization with fluorochrome-labelled cRNA
The arrays were hybridized for 16 hours using the following protocol. The labelled RNA samples were hybridized in a hybridization solution (20 μL final volume) containing 3xSSC (final concentration), 25 mM HEPES, pH 7.0 (final concentration), 1.25 μg/μL yeast tRNA and 0.3% SDS. The labeled RNA target sample was filtered in a Millipore 0.22 micron spin column according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by incubating the reaction at 100 °C for 2 min. The sample was cooled at 20-25 °C for 5 min by spinning at max speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide and the hybridization mixture was applied to the array from the side. An aliquot of 30 μL of 3xSSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber. The chamber was sealed watertight and incubated at 45 °C, 55 °C or 65 °C for 16-18 hours submerged in a water bath. After hybridisation, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The LifterSlip coverslip was washed off in 6xSSC, pH 7.0 containing 0.1% Tween20 at 50 °C for 15 min, followed by washing of the microarrays in 0.4xSSC, pH 7.0 at 50 °C for 30 min. Finally the slides were washed for 5 seconds in 0.05xSSC, pH 7.0. The slides were then dried by centrifugation in a swinging bucket rotor at approximately 200 x g for 2 min.
F. Data analysis.
Following washing and drying, the slides were scanned using a ScanArray 4000XL scanner (Perkin-Elmer Life Sciences, USA), and the array data were processed using the GenePix™ Pro 4.0 software package (Axon, USA).
G. RESULTS
G1. Duplex melting temperatures
The Tm data clearly show that LNA-substituted oligonucleotide capture probes have a significantly increased average duplex melting temperature compared to the corresponding DNA probes. Furthermore, the difference in melting temperature between the perfectly matched (PM) and single mismatched (MM) LNA-substituted probes, designated as ΔTm, is significantly higher than the corresponding ΔTm for DNA probes (Table 12).
Table 12. The average difference in melting temperature between the perfectly matched (PM) and single mismatched (MM) probes in different capture probe designs. The observed difference between the DNA and LNA substituted probes is statistically significant as revealed by a t-Test; Two-Sample Assuming Unequal Variances.
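The two-sample t-test assuming unequal variances (Welch's test) named in the Table 12 legend can be sketched as follows. This is a minimal illustration, not the analysis actually run for Table 12: the function name and the ΔTm values in the usage lines are hypothetical placeholders, and only the t statistic and the Welch-Satterthwaite degrees of freedom are computed (no p-value).

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical delta-Tm values (deg C), for illustration only.
delta_tm_dna = [3.0, 4.0, 5.0, 4.0]
delta_tm_lna = [6.0, 8.0, 7.0, 9.0]
t_stat, dof = welch_t(delta_tm_lna, delta_tm_dna)
print(round(t_stat, 2), round(dof, 1))
```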
G2. Microarray hybridization results
Both LNA_T and LNA_3 substituted 25-mer probes are capable of providing highly accurate measurements of fold changes in gene expression levels, as depicted in Figure 46. The DNA capture probes did not provide any hybridisation signals under the given microarray hybridisation conditions (Figure 46).
Figure 46 shows the expected (black bars) and observed (white bars) fold change in the expression levels of the Cy3-ULS labelled HSP78 spike RNA as measured by on-chip capture using different oligonucleotide capture probes. In the two hybridization experiments, 1 ng and 200 pg of HSP78 in vitro spike RNA were used, respectively. Thus, the fold change of the HSP78 RNA between the two hybridizations in the comparison is 5-fold. Fourteen additional synthetic in vitro mRNA spike controls were included in the hybridisation solution as a semi-complex background RNA mixture. Seven of these spikes were used as normalization controls; the other seven were used as negative controls. Hybridization temperature was 65 °C for 16 hours, with post-hybridization washes as described above. Under these conditions the DNA capture probes did not produce hybridization signals.
Figure 47 shows measured intensity levels by on-chip capture using three different 25-mer oligonucleotide capture probe designs. One (1) ng biotin-labeled HSP78 target was used in the hybridization experiments, followed by staining with Streptavidin Phycoerythrin. The LNA_T and LNA_TC substituted 25-mer capture probes show a significantly enhanced on-chip capture of the HSP78 RNA target, compared to the DNA 25-mer control probes under four different hybridization stringency conditions.
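The expected versus observed fold change described for Figure 46 can be sketched as below. The expected value follows directly from the spike amounts (1 ng versus 200 pg). Normalizing each hybridization by the median signal of its normalization control spikes is an assumption made for illustration, since the normalization procedure is not detailed here; the function names and the signal values in the usage lines are hypothetical.

```python
from statistics import median


def expected_fold_change(amount_a_pg: float, amount_b_pg: float) -> float:
    """Fold change expected from the amounts of spike RNA added."""
    return amount_a_pg / amount_b_pg


def observed_fold_change(signal_a, signal_b, controls_a, controls_b):
    """Fold change between two hybridizations after control-spike normalization.

    Assumed scheme: each probe signal is divided by the median signal of the
    normalization control spikes measured in the same hybridization.
    """
    return (signal_a / median(controls_a)) / (signal_b / median(controls_b))


# 1 ng (1000 pg) versus 200 pg of HSP78 spike RNA.
print(expected_fold_change(1000.0, 200.0))

# Hypothetical intensities, for illustration only.
print(observed_fold_change(10000.0, 1000.0, [200.0, 400.0, 600.0], [100.0, 200.0, 300.0]))
```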
Table 13. Design of the yeast HSP78 capture probes. YDR258C denotes the ORF name of the S. cerevisiae HSP78 gene. The numbers refer to the nucleotide position from the 3'-end of the HSP78 mRNA sequence. PM = perfectly matched probe, MM = single mismatch probe, LNA substitutions are depicted by capital letters, mC denotes LNA methyl-C
Oligo name Sequence Oligo name Sequence
YDR258C_PM_043 tttggtagcacgacaagcttagtat YDR258C_PM_043T TTTggTagcacgacaagcTTagTat
YDR258C_PM_078 cactctaacagtttcgccgtttcta YDR258C_PM_078T cacTcTaacagtTtcgccgTTTcTa
YDR258C_PM_124 gttgccatggagttcaaaatctgtc YDR258C_PM_124T gTTgccaTggagTtcaaaaTcTgTc
YDR258C_PM_164 tcaatggccttgcaccatataattg YDR258C_PM_164T TcaaTggccTTgcaccaTaTaaTTg
YDR258C_PM_201 tcagttagccaatccttcgcttcat YDR258C_PM_201T TcagTTagccaaTccTTcgcTTcat
YDR258C_PM_249 tttttcggccaaacgatcttgaatt YDR258C_PM_249T TtTTTcggccaaacgaTcTTgaaTt
YDR258C_PM_295 attgacctcaaaactttcttggata YDR258C_PM_295T aTTgaccTcaaaacTTTcTTggaTa
YDR258C_PM_356 gatgaactcaggtggataggatctt YDR258C_PM_356T gaTgaacTcaggTggaTaggaTcTt
YDR258C_PM_424 accatcatcacccaactttgtgtcg YDR258C_PM_424T accaTcaTcacccaacTTTgTgTcg
YDR258C_PM_433 cccaactttgtgtcgtttaataaaa YDR258C_PM_433T cccaacTTTgTgTcgTTTaaTaaaa
YDR258C_PM_486 caatgatcgtgttacggaaatcaac YDR258C_PM_486T caaTgaTcgTgtTacggaaaTcaac
YDR258C_PM_515 tggcctagggaatcggtcagcttac YDR258C_PM_515T TggccTagggaaTcggTcagcTTac
YDR258C_PM_566 ttggaaacatcggggtgcgcttttt YDR258C_PM_566T TTggaaacaTcggggTgcgcTTTTt
YDR258C_PM_569 gaaacatcggggtgcgctttttcaa YDR258C_PM_569T gaaacaTcggggTgcgcTTTTTcaa
YDR258C_PM_604 aaaacgacagcataaggctttcttc YDR258C_PM_604T aaaacgacagcaTaaggcTTTcTTc
YDR258C_PM_631 gacagcctcagttaattggccacca YDR258C_PM_631T gacagccTcagtTaaTTggccacca
YDR258C_PM_686 ccgattaaacgagagacagtatgct YDR258C_PM_686T ccgaTTaaacgagagacagTaTgct
YDR258C_PM_757 atcaaataggaattcagctaaagcc YDR258C_PM_757T aTcaaaTaggaaTTcagcTaaagcc
YDR258C_PM_813 gacctaagaacataaagctggcaat YDR258C_PM_813T gaccTaagaacaTaaagcTggcaat
YDR258C_PM_823 ataaagctggcaataggtctctttt YDR258C_PM_823T aTaaagcTggcaaTaggTcTcTTTt
YDR258C_PM_870 agacgtacagcatcagaaatagcag YDR258C_PM_870T agacgTacagcaTcagaaaTagcag
YDR258C_PM_888 aaatagcagcaatggcctcgtcttg YDR258C_PM_888T aaaTagcagcaaTggccTcgTcTTg
YDR258C_PM_890 atagcagcaatggcctcgtcttggc YDR258C_PM_890T aTagcagcaaTggccTcgTcTTggc
YDR258C_PM_896 gcaatggcctcgtcttggccaacga YDR258C_PM_896T gcaaTggccTcgTcTTggccaacga
YDR258C_PM_043TC TtTggTagmCamCgamCaagmCtTagTat YDR258C_PM_LNA3_043 TttGgtAgcAcgAcaAgcTtaGtat
YDR258C_PM_078TC mCamCtmCtaamCagtTtcgmCcgTtTmCTa YDR258C_PM_LNA3_078 mCacTctAacAgtTtcGccGttTcta
YDR258C_PM_124TC gTtgmCmCaTggagTtmCaaaaTmCtgTc YDR258C_PM_LNA3_124 GttGccAtgGagTtcAaaAtcTgtc
YDR258C_PM_164TC TmCaaTggmCcTTgmCacmCaTaTaaTTg YDR258C_PM_LNA3_164 TcaAtgGccTtgmCacmCatAtaAttg
YDR258C_PM_201TC TmCagTtagcmCaaTcmCtTmCgmCttmCat YDR258C_PM_LNA3_424 AccAtcAtcAccmCaamCttTgtGtcg
YDR258C_PM_249TC TtTtTmCggmCmCaaamCgatmCtTgaaTt YDR258C_PM_LNA3_486 mCaaTgaTcgTgtTacGgaAatmCaac
YDR258C_PM_295TC aTTgamCmCTmCaaaamCTTtmCTTggaTa YDR258C_PM_LNA3_515 TggmCctAggGaaTcgGtcAgcTtac
YDR258C_PM_356TC gaTgaamCTmCaggTggaTaggaTmCTt YDR258C_PM_LNA3_566 TtgGaaAcaTcgGggTgcGctTttt
YDR258C_PM_424TC amCcaTcaTcacmCmCaaccTtgTgtmCg YDR258C_PM_LNA3_569 GaaAcaTcgGggTgcGctTttTcaa
YDR258C_PM_433TC mCmCmCaacTTtgTgTcgTTTaaTaaaa YDR258C_PM_LNA3_604 AaaAcgAcaGcaTaaGgcTttmCttc
YDR258C_PM_486TC mCaaTgaTmCgTgtTamCggaaaTmCaac YDR258C_PM_LNA3_757 AtcAaaTagGaaTtcAgcTaaAgcc
YDR258C_PM_515TC TggmCcTagggaaTmCggTcagmCtTac YDR258C_PM_LNA3_813 GacmCtaAgaAcaTaaAgcTggmCaat
YDR258C_PM_566TC TTggaaamCatmCggggTgmCgctTtTt YDR258C_PM_LNA3_823 AtaAagmCtgGcaAtaGgtmCtcTttt
YDR258C_PM_569TC gaaamCaTmCggggTgmCgctttTtmCaa YDR258C_PM_LNA3_870 AgamCgtAcaGcaTcaGaaAtaGcag
YDR258C_PM_604TC aaaamCgamCagmCaTaaggmCTtTmCtTc YDR258C_PM_LNA3_888 AaaTagmCagmCaaTggmCctmCgtmCttg
YDR258C_PM_631TC gamCagmCcTcagtTaaTTggcmCacmCa YDR258C_PM_LNA3_890 AtaGcaGcaAtgGccTcgTctTggc
YDR258C_PM_686TC mCmCgaTTaaamCgagagamCagTaTgmCt YDR258C_PM_LNA3_896 GcaAtgGccTcgTctTggmCcaAcga
YDR258C_PM_757TC aTmCaaaTaggaaTtmCagmCTaaagmCc YDR258C_PM_LNA3_631 GacAgcmCtcAgtTaaTtgGccAcca
YDR258C_PM_813TC gamCmCTaagaamCaTaaagmCTggmCaat YDR258C_PM_LNA3_686 mCcgAttAaamCgaGagAcaGtaTgct
YDR258C_PM_823TC aTaaagmCTggmCaaTaggTmCTcTTTt YDR258C_PM_LNA3_356 GatGaamCtcAggTggAtaGgaTctt
YDR258C_PM_870TC agamCgTamCagmCaTmCagaaaTagmCag YDR258C_PM_LNA3_201 TcaGttAgcmCaaTccTtcGctTcat
YDR258C_PM_888TC aaaTagmCagmCaaTggmCcTmCgTctTg YDR258C_PM_LNA3_249 TttTtcGgcmCaaAcgAtcTtgAatt
YDR258C_PM_890TC aTagmCagcaaTggmCcTcgtmCtTggc YDR258C_PM_LNA3_295 AttGacmCtcAaaActTtcTtgGata
YDR258C_PM_896TC gcaatggcctmCgTmCttggccaacga YDR258C_PM_LNA3_433 mCccAacTttGtgTcgTttAatAaaa
YDR258C_MM_043 tttggtagcacgtcaagcttagtat YDR258C_MM_043T TTTggTagcacgtcaagcTTagTat
YDR258C_MM_078 cactctaacagtatcgccgtttcta YDR258C_MM_078T cacTcTaacagtatcgccgTTTcTa
YDR258C_MM_124 gttgccatggagatcaaaatctgtc YDR258C_MM_124T gTTgccaTggagatcaaaaTcTgTc
YDR258C_MM_164 tcaatggccttggaccatataattg YDR258C_MM_164T TcaaTggccTTggaccaTaTaaTTg
YDR258C_MM_201 tcagttagccaaaccttcgcttcat YDR258C_MM_201T TcagTTagccaaaccTTcgcTTcat
YDR258C_MM_249 tttttcggccaatcgatcttgaatt YDR258C_MM_249T TtTTTcggccaatcgaTcTTgaaTt
YDR258C_MM_295 attgacctcaaatctttcttggata YDR258C_MM_295T aTTgaccTcaaatcTTTcTTggaTa
YDR258C_MM_356 gatgaactcaggaggataggatctt YDR258C_MM_356T gaTgaacTcaggaggaTaggaTcTt
YDR258C_MM_424 accatcatcaccgaactttgtgtcg YDR258C_MM_424T accaTcaTcaccgaacTTTgTgTcg
YDR258C_MM_433 cccaactttgtgacgtttaataaaa YDR258C_MM_433T cccaacTTTgTgacgTTTaaTaaaa
YDR258C_MM_486 caatgatcgtgtaacggaaatcaac YDR258C_MM_486T caaTgaTcgTgtaacggaaaTcaac
YDR258C_MM_515 tggcctagggaaacggtcagcttac YDR258C_MM_515T TggccTagggaaacggTcagcTTac
YDR258C_MM_566 ttggaaacatcgcggtgcgcttttt YDR258C_MM_566T TTggaaacaTcgcggTgcgcTTTTt
YDR258C_MM_569 gaaacatcggggagcgctttttcaa YDR258C_MM_569T gaaacaTcggggagcgcTTTTTcaa
YDR258C_MM_604 aaaacgacagcaaaaggctttcttc YDR258C_MM_604T aaaacgacagcaaaaggcTTTcTTc
YDR258C_MM_631 gacagcctcagtaaattggccacca YDR258C_MM_631T gacagccTcagtaaaTTggccacca
YDR258C_MM_686 ccgattaaacgacagacagtatgct YDR258C_MM_686T ccgaTTaaacgacagacagTaTgct
YDR258C_MM_757 atcaaataggaaatcagctaaagcc YDR258C_MM_757T aTcaaaTaggaaatcagcTaaagcc
YDR258C_MM_813 gacctaagaacaaaaagctggcaat YDR258C_MM_813T gaccTaagaacaaaaagcTggcaat
YDR258C_MM_823 ataaagctggcattaggtctctttt YDR258C_MM_823T aTaaagcTggcaaaaggTcTcTTTt
YDR258C_MM_870 agacgtacagcaacagaaatagcag YDR258C_MM_870T agacgTacagcaacagaaaTagcag
YDR258C_MM_888 aaatagcagcaaaggcctcgtcttg YDR258C_MM_888T aaaTagcagcaaaggccTcgTcTTg
YDR258C_MM_890 atagcagcaatgccctcgtcttggc YDR258C_MM_890T aTagcagcaaTgcccTcgTcTTggc
YDR258C_MM_896 gcaatggcctcgacttggccaacga YDR258C_MM_896T gcaaTggccTcgacTTggccaacga
YDR258C_MM_043TC TtTggTagmCamCgtcaagmCtTagTat YDR258C_MM_LNA3_043 TttGgtAgcAcgTcaAgcTtaGtat
YDR258C_MM_078TC mCamCtmCtaamCagtatcgmCcgTtTmCTa YDR258C_MM_LNA3_078 mCacTctAacAgtAtcGccGttTcta
YDR258C_MM_124TC gTtgmCmCaTggagatmCaaaaTmCtgTc YDR258C_MM_LNA3_124 GttGccAtgGagAtcAaaAtcTgtc
YDR258C_MM_164TC TmCaaTggmCcTTggacmCaTaTaaTTg YDR258C_MM_LNA3_164 TcaAtgGccTtgGacmCatAtaAttg
YDR258C_MM_201TC tmCagTtagcmCaaacctTcgmCttmCat YDR258C_MM_LNA3_201 TcaGttAgcmCaaAccTtcGctTcat
YDR258C_MM_249TC TtTtTmCggmCmCaatcgatmCtTgaaTt YDR258C_MM_LNA3_249 TttTtcGgcmCaaTcgAtcTtgAatt
YDR258C_MM_295TC aTTgamCmCTmCaaatcTTtmCTTggaTa YDR258C_MM_LNA3_295 AttGacmCtcAaaTctTtcTtgGata
YDR258C_MM_356TC gaTgaamCTmCaggaggaTaggaTmCTt YDR258C_MM_LNA3_356 GatGaamCtcAggAggAtaGgaTctt
YDR258C_MM_424TC amCcaTcaTcacccaaccTtgTgtmCg YDR258C_MM_LNA3_424 AccAtcAtcAccGaamCttTgtGtcg
YDR258C_MM_433TC mCmCmCaacTTtgTgacgTTTaaTaaaa YDR258C_MM_LNA3_433 mCccAacTttGtgAcgTttAatAaaa
YDR258C_MM_486TC mCaaTgaTmCgTgttamCggaaaTmCaac YDR258C_MM_LNA3_486 mCaaTgaTcgTgtAacGgaAatmCaac
YDR258C_MM_515TC TggmCcTagggaaacggTcagmCtTac YDR258C_MM_LNA3_515 TggmCctAggGaaAcgGtcAgcTtac
YDR258C_MM_566TC TTggaaamCatmCgcggTgmCgctTtTt YDR258C_MM_LNA3_566 TtgGaaAcaTcgmCggTgcGctTttt
YDR258C_MM_569TC gaaamCaTmCggggagmCgctttTtmCaa YDR258C_MM_LNA3_569 GaaAcaTcgGggAgcGctTttTcaa
YDR258C_MM_604TC aaaamCgamCagmCaaaaggmCTtTmCtTc YDR258C_MM_LNA3_604 AaaAcgAcaGcaAaaGgcTttmCttc
YDR258C_MM_631TC gamCagmCcTcagtaaaTTggcmCacmCa YDR258C_MM_LNA3_631 GacAgcmCtcAgtAaaTtgGccAcca
YDR258C_MM_686TC mCmCgaTTaaamCgacagamCagTaTgmCt YDR258C_MM_LNA3_686 mCcgAttAaamCgamCagAcaGtaTgct
YDR258C_MM_757TC aTmCaaaTaggaaatmCagmCTaaagmCc YDR258C_MM_LNA3_757 AtcAaaTagGaaAtcAgcTaaAgcc
YDR258C_MM_813TC gamCmCTaagaamCaaaaagmCTggmCaat YDR258C_MM_LNA3_813 GacmCtaAgaAcaAaaAgcTggmCaat
YDR258C_MM_823TC aTaaagmCTggmCattaggTmCTcTTTt YDR258C_MM_LNA3_823 AtaAagmCtgGcaTtaGgtmCtcTttt
YDR258C_MM_870TC agamCgTamCagmCaacagaaaTagmCag YDR258C_MM_LNA3_870 AgamCgtAcaGcaAcaGaaAtaGcag
YDR258C_MM_888TC aaaTagmCagmCaaaggmCcTmCgTctTg YDR258C_MM_LNA3_888 AaaTagmCagmCaaAggmCctmCgtmCttg
YDR258C_MM_890TC aTagmCagcaaTgcccTcgtmCtTggc YDR258C_MM_LNA3_890 AtaGcaGcaAtgmCccTcgTctTggc
YDR258C_MM_896TC gmCaaTggcctmCgactggccaamCga YDR258C_MM_LNA3_896 GcaAtgGccTcgActTggmCcaAcga
Table 14. Duplex melting temperatures (Tm) for the 144 different 25-mer oligonucleotide capture probes. The design column denotes the sequence design of the probe. PM = perfectly matched probe, MM = single mismatch probe, LNA substitutions are depicted by capital letters, mC denotes LNA methyl-C
Example 52. Performance analysis of LNA substituted oligonucleotide capture probes designed to detect splice variants in complex RNA pools.
A. Oligonucleotide design for microarrays.
The methods for designing exon-specific internal oligonucleotide capture probes have been described in Example 2.
A1. Design of the LNA-modified capture probes
For the internal LNA-modified oligonucleotide capture probes, every third DNA nucleotide was substituted with an LNA nucleotide. The probes designed to capture the splice junction of the recombinant splice variants were designed with LNA substitutions at every third nucleotide position. All capture probes are shown in Table 15.
Table 15. Internal, exon-specific and merged, exon-exon splice junction specific oligonucleotide capture probes used in the example. Capital letters denote LNA nucleotides and mC LNA methyl-cytosine
1380 gene78.INS4b_40 LNA3 cGctGatTatActGcgGagAagGtgGgtGagTatAaaGac
B. Printing and coupling of the splice isoform-specific microarrays
The splice variant capture probes were synthesized with a 5' anthraquinone (AQ) modification, followed by a hexaethyleneglycol-2 (HEG2) linker. The capture probes were first diluted to a 20 μM final concentration in 100 mM Na-phosphate buffer pH 7.0, and spotted on the Immobilizer polymer microarray slides (Exiqon, Denmark) using the Biochip Arrayer One (Packard Biochip Technologies, USA) with a spot volume of 2 x 300 pl and 300 μm between the spots. The capture probes were immobilized onto the microarray slide by UV irradiation in a Stratalinker with 2300 μjoules (Stratagene, USA). Non-immobilized capture probe oligonucleotides were removed by washing the slides twice for 15 min in 1xSSC. After washing, the slides were dried by centrifugation at 1000 x g for 2 min, and stored in a slide box until microarray hybridization.
C. Construction of the Splice Variant Clones
The recombinant splice variant constructs were cloned into the pTRIamp18 vector (Ambion, USA). The constructs were sequenced to confirm their construction. The plasmid clones were transformed into E. coli XL10-Gold (Stratagene, USA).
Genomic DNA was prepared from a wild type standard laboratory strain of Saccharomyces cerevisiae using the Nucleon MiY DNA extraction kit (Amersham Biosciences, USA) according to the supplier's instructions. Amplification of the partial yeast gene was done by standard PCR using yeast genomic DNA as template. In the first step of amplification, a forward primer containing a restriction enzyme site and a reverse primer containing a universal linker sequence were used. In this step 20 bp was added to the 3'-end of the amplicon, next to the stop codon. In the second step of amplification, the reverse primer was exchanged with a nested primer containing a poly-T20 tail and a restriction enzyme site. The SWI5 amplicon contains 730 bp of the SWI5 ORF plus 20 bp universal linker sequence and a poly-A20 tail.
The PCR primers used were:
YDR146C-For-EcoRI: acgtgaattcaaatacagacaatgaaggagatga
YDR146C-Rev-Uni: gatccccgggaattgccatgttacctttgattagttttcattggc
Uni-polyT-BamHI: acgtggatccttttttttttttttttttttgatccccgggaattgccatg
The PCR amplicon was cut with the restriction enzymes EcoRI + BamHI. The DNA fragment was ligated into the pTRIamp18 vector (Ambion, USA) using the Quick Ligation Kit (New England Biolabs, USA) according to the supplier's instructions and transformed into E. coli DH-5α by standard methods.
C1. Construction of the recombinant splice variant #1 (Triamp18/swi5-rubisco)
The Arabidopsis thaliana Rubisco small subunit ssu2b gene fragment (gil7064721) was amplified from genomic DNA by primers named DJ305 5'-ACTATGATGGACGATACTGGAC-3' and DJ306 5'-ATTGGATCGATCCGATGATCCTAATGAAGGC-3', containing ClaI restriction site linkers. The purified PCR fragment was digested with ClaI and then cloned into the swi5 (gl:7839148) vector at the unique ClaI site (atcgat), giving each insert a flanking sequence from the original yeast SWI5 insert (named exon01 and exon03, see Figure 19). The product was inserted in the reverse orientation, so that the insert sequence is:
atcgatCCGATGATCCTAATGAAGGCGCCCGGGTACTCCTTCTTGCATTCTTCAACTT CCTTCAACACTTGAGCGGAGTCGGTGCATCCGAACAATGGAAGCTTCCACATTGT CCAGTATCGTCCATCATAGTatcgat
Nucleotide sequence analysis revealed a difference between the sequence of A. thaliana Rubisco expected from the GenBank database and that obtained from all sequenced constructs and PCR products. Position 30 in the Rubisco insert is C rather than the expected A. This SNP was probably created by PCR. None of the oligonucleotide capture probes used in the example cover this region.
Rubisco sequence in GenBank: TCCTAATGAAGGCGCCA
Sequence obtained from the plasmid construct: TCCTAATGAAGGCGCCC
C2. Construction of the recombinant splice variant #2 (Triamp18/swi5-lea)
The Arabidopsis thaliana Lea gene (gil 526423) was amplified from genomic DNA with primers named DJ307 5'-GGAATTATCGATGTGTGATAGGATCAGTGTTCAG-3', and DJ308 5'-AATTGGATCGATATTAGCAGTCTCCTTCGCC-3', including the ClaI linker sites as above. The PCR fragment was digested with ClaI and cloned into the yeast SWI5 INT construct as above at the unique ClaI site.
The fragment was inserted in the forward orientation, resulting in the following insert sequence:
atcgatGTGTGATAGGTTCAGTGTTCAGGGCTGTCCAAGGAACGTATGAGCATGCGA GAGACGCTGTAGTTGGAAAAACCCACGAAGCGGCTGAGTCTACCAAAGAAGGA GCTCAGATAGCTTCAGAGAAAGCGGTTGGAGCAAAGGACGCAACCGTCGAGAA AGCTAAGGAAACCGCTGATTATACTGCGGAGAAGGTGGGTGAGTATAAAGACTA TACGGTTGATAAAGCTAAAGAGGCTAAGGACACAACTGCAGAGAAGGCGAAGG AGACTGCTAATatcgat.
Figure 19 shows the construction of the recombinant splice variants in the in vitro transcription vector. The small bars show the location of the oligonucleotide capture probes used in this example. The sequences of the capture probes are shown in Table 15.
D. Preparation of target
D1. In vitro RNA preparation from splice variant vectors
In vitro RNA from the splice variants was made using the MEGAscript™ high yield transcription kit according to the manufacturer's instructions (Ambion, USA). The yield of IVT RNA was quantified on a Nanodrop spectrophotometer (Nanodrop Technologies, USA).
D2. Isolation of total RNA from C. elegans
Strains and growth conditions: C. elegans wild-type strain (Bristol-N2) was maintained on nematode growth medium (NG) plates seeded with Escherichia coli strain OP50 at 20 °C, and the mixed stages of the nematode were prepared as described in Hope, I. A. (ed.) "C. elegans - A Practical Approach", Oxford University Press 1999. The samples were immediately flash frozen in liquid nitrogen and stored at -80 °C until RNA isolation.
A 100 μl aliquot of packed C. elegans worms from a mixed stage population was homogenized using the FastPrep Bio101 from Kem-En-Tec for 1 min at speed 6, followed by isolation of total RNA from the extracts using the FastPrep Bio101 kit (Kem-En-Tec) according to the manufacturer's instructions. The eluted total RNA was ethanol precipitated for 24 hours at -20 °C by addition of 2.5 volumes of 96% EtOH and 0.1 volume of 3 M Na-acetate, pH 5.2 (Ambion, USA), followed by centrifugation of the total RNA sample for 30 min at 13200 rpm. The total RNA pellet was air-dried, redissolved in 10 μl of diethylpyrocarbonate (DEPC)-treated water (Ambion, USA) and stored at -80 °C.
E. Fluorochrome-labelling of the target
The following fluorochrome-labelled cDNA targets were synthesized to test the performance of 'merged' splice junction probes that encompass exon borders. Synthetic RNAs corresponding to three artificial splice variants, #1 (exon01-INS3-exon03) (01-INS3-3), #2 (exon01-INS4-exon03) (01-INS4-3) and #3 without the middle exon (01-03), were spiked into 10 μg of C. elegans reference total RNA samples in various combinations and concentrations prior to fluorochrome-labelling with either Cy3 or Cy5 as indicated in Table 16. At the same time 10 μg of C. elegans reference total RNA was labeled with Cy3 for control experiments. Hybridizations were performed with Cy3- and Cy5-labeled C. elegans RNA + spike RNA mix. The details of RNA samples and synthetic RNA spikes are shown in Table 16. The RNA samples were combined in individual labeling reactions with 5 μg anchored oligo(dT20) primer and DEPC-treated water to a final volume of 8 μl. The mixture was heated at 70 °C for 10 min and quenched on ice for 5 min, followed by addition of 20 units of Superasin RNase inhibitor (Ambion, USA), 1 μl dNTP solution (10 mM each dATP, dGTP, dTTP and 0.4 mM dCTP), 3 μl of Cy3-dCTP or Cy5-dCTP (Amersham Biosciences, USA), 4 μl 5x RTase buffer (Invitrogen), 2 μl 0.1 mM DTT (Invitrogen), 400 units of Superscript II reverse transcriptase (Invitrogen, USA) and DEPC-treated water to 20 μl final volume. Background hybridization to merged capture probes was monitored with 10 μg of C. elegans reference RNA alone labeled with Cy3-dCTP, according to the labeling method described above for the splice variant spikes. All cDNA syntheses were carried out at 42 °C for 2 hours, and the reaction was stopped by incubation at 70 °C for 5 min, followed by incubation on ice for 5 min.
Unincorporated dNTPs were removed by gel filtration using MicroSpin S-400 HR columns as described in the following: Pre-spin the column for 1 min at 1500 x g in a 1.5 ml tube and place the column in a new 1.5 ml tube. Slowly apply the cDNA sample to the top centre of the resin, spin at 1500 x g for 2 min and collect the eluate. The RNA was hydrolyzed by adding 3 μl of 0.5 M NaOH, mixing and incubating at 70 °C for 15 min. The samples were neutralized by adding 3 μl of 0.5 M HCl and mixing, followed by addition of 450 μl 1xTE, pH 7.5 to the neutralized sample and transfer onto a Microcon-30 concentrator (prior to use, spin 500 μl 1xTE through the column to remove residual glycerol). The samples were centrifuged at 14000 x g in a microcentrifuge for 12 min. Spinning was continued until the volume was reduced to 5 μl. The labelled cDNA probes were eluted by inverting the Microcon-30 tube and spinning at 1000 x g for 3 min.
Table 16. Synthetic splice variant RNAs spiked into C. elegans samples*.
* Parts per million (ppm) calculations indicate spike transcripts per total transcripts in the hybridisation mix. Calculations are based on an average C. elegans RNA being 1000 nucleotides as in Hill et al. (2000). Science 290:809-812.
F. Microarray hybridization
The fluorochrome-labelled cDNA samples were combined. The following was added: 3.75 μl 20x SSC (3x SSC final, passed through a 0.22 μ filter prior to use to remove particulates); yeast tRNA (1 μg/μl final); 0.625 μl 1 M HEPES, pH 7.0 (25 mM final, passed through a 0.22 μ filter prior to use to remove particulates); 0.75 μl 10 % SDS (0.3 % final); and DEPC-water to 25 μl final volume. The labelled cDNA target samples were filtered in a Millipore 0.22 μ filter spin column (Ultrafree-MC, Millipore, USA) according to the manufacturer's instructions, followed by incubation of the reaction mixture at 100 °C for 2-5 min. The cDNA probes were cooled at room temperature for 2-5 min by spinning at max speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide and the hybridization mixture was applied to the array from the side. An aliquot of 30 μL of 3xSSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber (DieTech, USA). The chamber was sealed watertight and incubated at 65 °C for 16-18 hours submerged in a water bath. After hybridisation, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The slides were washed sequentially by plunging gently in 2x SSC/0.1% SDS at room temperature until the cover slip fell off into the washing solution, then in 1x SSC, pH 7.0 (150 mM NaCl, 15 mM Sodium Citrate) at room temperature for 1 min, then in 0.2x SSC, pH 7.0 (30 mM NaCl, 3 mM Sodium Citrate) at room temperature for 1 min, and finally in 0.05x SSC (7.5 mM NaCl, 0.75 mM Sodium Citrate) for 5 sec, followed by drying of the slides by spinning at 1000 x g for 2 min. The slides were stored in a slide box in the dark until scanning.
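The stock volumes listed for the 25 μl hybridization mixture follow from the dilution relation C1 x V1 = C2 x V2. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and concentrations only need to share a unit within each call.

```python
def stock_volume_ul(stock_conc: float, final_conc: float, final_volume_ul: float) -> float:
    """Volume of stock needed so that final_conc is reached in final_volume_ul.

    Uses C1 * V1 = C2 * V2; stock_conc and final_conc must be in the same unit.
    """
    return final_conc * final_volume_ul / stock_conc


FINAL_VOLUME_UL = 25.0

# 20x SSC stock to 3x SSC final
print(stock_volume_ul(20.0, 3.0, FINAL_VOLUME_UL))
# 1 M (1000 mM) HEPES stock to 25 mM final
print(stock_volume_ul(1000.0, 25.0, FINAL_VOLUME_UL))
# 10 % SDS stock to 0.3 % final
print(stock_volume_ul(10.0, 0.3, FINAL_VOLUME_UL))
```

The three calls correspond to the 3.75 μl, 0.625 μl and 0.75 μl volumes given in the protocol.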
G. Microarray data analysis.
The splice variant microarray was scanned in a ScanArray 4000XL confocal laser scanner (Packard Instruments, USA). The hybridisation data were analysed using the GenePix Pro 4.01 microarray analysis software (Axon, USA). The C. elegans reference RNA alone converted to first strand cDNA and labelled with Cy3-dCTP did not produce significant fluorescence intensity signals from the LNA substituted spike RNA specific capture probes.
Gl. A mathematical formula for analysis of the microarray data for alternative splicing One of the major limitations to comparative microarray hybridisation assays is that only identical probes can be compared between samples. Different alternative splice forms are detected using different probes, and this will tell directly if one splice form is more abundant in a given tissue compared to another. However, the estimation of the ratios between splice forms in a single tissue is not directly accessible. Given an example similar to that described below we employ the following calculations to calculate quantities of splice variants from array data. The theoretical justification is shown. To our knowledge this justification has not been used by any previous analysis.
splice form A: exon1 - exon3 (probes: probe2, probe1)
splice form B: exon1 - exon2 - exon3 (probes: probe3, probe1)
The above scenario is tested in a comparative hybridisation with two channels, I and II (the signal from probe2 in channel I is called probe2(I), and so forth). Probe1 hybridises to both splice forms, probe2 hybridises to A only, and probe3 hybridises to B only.
Since every transcript will hybridise to probe1, and every transcript will hybridise to either probe2 or probe3, there exists some relationship between the following:
probe1(I) ~ {probe2(I) and probe3(I)}
probe1(II) ~ {probe2(II) and probe3(II)}
For simplicity we assume that systematic differences between channels have already been eliminated through normalisation, although this is not essential. We now imagine a factor (x) that will transform the signal of probe2 into a value directly comparable to probe1. Likewise we imagine a factor (y) for probe3. As long as we are not facing saturation in the hybridisations, the assumption of a linear relationship between absolute probe signals is reasonable.
The introduction of variables x and y gives the following equations:
probe1(I) = (x)probe2(I) + (y)probe3(I)
probe1(II) = (x)probe2(II) + (y)probe3(II)
Since all signals are measurable, the above is two linear equations with two unknown variables, that can easily be solved. Further the ratio between (x)probe2(I) & (y)probe3(I) will provide the ratio between splice form A and B in channel I. Similarly, the ratio of (x)probe2(II) to (y)probe3(II) is used for channel II.
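The two equations can be solved in closed form. The following Python sketch is illustrative only. It solves for x and y by Cramer's rule for one probe triplet and, as described for the probe groups further below, averages the resulting quantities over all combinations of one probe per group, skipping combinations that give a non-positive x or y. The function names, the (channel I, channel II) tuple layout, and the reading of "non-positive solutions" as x <= 0 or y <= 0 are assumptions, not part of the original analysis.

```python
from itertools import product


def solve_factors(p1, p2, p3):
    """Solve p1 = x*p2 + y*p3 in both channels.

    Each argument is a (channel I, channel II) signal pair.
    Returns (x, y), or None if the system is singular, which happens
    when the A:B signal ratio is identical in the two channels.
    """
    det = p2[0] * p3[1] - p3[0] * p2[1]
    if det == 0:
        return None
    x = (p1[0] * p3[1] - p3[0] * p1[1]) / det
    y = (p2[0] * p1[1] - p1[0] * p2[1]) / det
    return x, y


def estimate_quantities(common, a_specific, b_specific):
    """Average splice-form quantities over all probe combinations.

    common, a_specific and b_specific are lists of (I, II) signal pairs
    for the probe1, probe2 and probe3 groups. Returns the averaged
    (A in I, B in I, A in II, B in II), or None if no combination
    yields a positive solution.
    """
    estimates = []
    for p1, p2, p3 in product(common, a_specific, b_specific):
        solution = solve_factors(p1, p2, p3)
        if solution is None:
            continue
        x, y = solution
        if x <= 0 or y <= 0:
            continue  # assumed meaning of "non-positive solutions"
        estimates.append((x * p2[0], y * p3[0], x * p2[1], y * p3[1]))
    if not estimates:
        return None
    n = len(estimates)
    return tuple(sum(e[i] for e in estimates) / n for i in range(4))
```

The ratio of the first two returned values estimates A:B in channel I, and the ratio of the last two estimates A:B in channel II. Note that the system cannot be solved when both channels contain the splice forms in the same proportion, because the two equations are then linearly dependent.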
Data normalization is not required for this method.
In the above equations, probe1 denotes all probes that hybridize to both splice forms, probe2 denotes probes that hybridize specifically to splice form A but not B, and probe3 denotes probes that hybridize specifically to splice form B but not A. In the case where the two splice forms consist of gene78 with two different inserts (middle exons INS3 and INS4), probes can be grouped as follows (only LNA 40-mers are considered here):
The equations can be solved with any combination of one representative from each probe group. This gives a total of 48 (4x3x3) possible ways of solving the equations. The estimated quantities of the constructs are given as the average of all possible solutions (equations yielding non-positive solutions are ignored). This was done for all comparative hybridizations. Note that when comparing with gene78 with no insert, only 12 equations are possible, since the artificial splice variant construct with no insert has only one specific probe. The results from analysis of the microarray hybridization data from the RNA pools spiked with different splice isoforms at different ratios and concentrations are shown in Figure 49 and Figure 50.
RESULTS
Figure 49 shows detection of alternatively spliced mRNAs using LNA-substituted 50-mer oligonucleotide capture probes in a bar diagram format. Figure 50 shows detection of alternatively spliced mRNAs using LNA-substituted 40-mer oligonucleotide capture probes. Both 50-mer and 40-mer LNA-DNA mixmer oligonucleotide capture probes, substituted with an LNA nucleotide at every third nucleotide position, were able to provide highly accurate measurements of fold-changes in the expression of three homologous, alternatively spliced mRNA variants in the concentration range of 1000 ppm to 10 ppm. The quantification of the splice isoforms was carried out using a set of both internal, exon-specific probes and merged, splice junction-specific probes, printed onto microarrays and hybridized with complex cDNA target pools spiked with different cloned artificial splice isoforms in which the middle exon was either alternatively skipped or excluded completely, resulting in the three different splice isoforms: 01-INS3-03, 01-INS4-03 and 01-03.
Example 53. Expression Profiling of Toxicological responses in Caenorhabditis elegans using LNA Oligonucleotide Microarrays and beta-naphthoflavone and primaquine as model compounds.
The present patent example demonstrates the use of the Exiqon C. elegans LNA tox oligoarray in gene expression profiling experiments in the nematode Caenorhabditis elegans. The C. elegans tox oligoarray monitors the expression of a selection of 110 genes relevant for general stress response and for the metabolism of toxic compounds. Two different capture probes for each of these target genes have been designed and included on the LNA tox array. In addition, the C. elegans LNA tox oligoarray contains capture probes providing controls for cDNA synthesis efficiency and the developmental stage of the nematode. Capture probes for constitutively expressed genes for data set normalization are also included on the C. elegans LNA tox oligoarray.
A. Cultivation of C. elegans worms
For all cultures the sample was divided into two, and one half of the sample was used as the control, the other as the treated sample. Worm samples were harvested and sucrose cleaned by standard methods. For heat shock treatment, the heat shock sample was added to S-media preheated to 33°C in a 1 L flask suspended in a water bath at 33°C, and the other sample was added to a 1 L flask with S-media at 25°C. Both samples were shaken at approximately 100 rpm for an hour. For β-naphthoflavone and primaquine treatment, 0.5 mL of 5 mg/mL β-naphthoflavone in DMSO or 0.5 mL of 20 mg/mL primaquine in DMSO was added to each 500 mL volume of S-media culture after 28 hours of growth from L1. At the same time 0.5 mL of DMSO was added to the control. Incubation was for 24 hours. Samples were then harvested by centrifugation at 3000 x g, suspended in RNAlater™ (Ambion), and immediately frozen in liquid nitrogen.
B. RNA extraction
RNA was extracted from the worm samples using the FastRNA® Kit, GREEN (Q-BIO) essentially according to the supplier's instructions.
C. Design and synthesis of the LNA capture probes
Capture probes were designed using in-house developed software. Regions with unique mRNA sequence of the selected target genes were identified. The optimal 50-mer oligonucleotide sequences with respect to Tm, self-complementarity and secondary structure were selected. LNA modifications were incorporated to increase affinity and specificity.
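The capture probes listed in Table 18 carry the suffix LNA3 and are written with LNA positions in uppercase and LNA methyl cytosine as mC. As a minimal sketch of that notation, assuming an LNA monomer at every third position starting from the first nucleotide (a pattern inferred from the listed sequences, not a description of the in-house design software), the conversion to and from a plain DNA sequence could look as follows. Both function names are hypothetical.

```python
def to_lna3(dna):
    """Write a DNA sequence in the Table 18 notation.

    Positions 1, 4, 7, ... are rendered as LNA (uppercase), with LNA
    cytosine written as 'mC'; all other positions stay lowercase DNA.
    """
    out = []
    for index, base in enumerate(dna.lower()):
        if base not in "acgt":
            raise ValueError(f"unexpected base {base!r} at position {index + 1}")
        if index % 3 == 0:
            out.append("mC" if base == "c" else base.upper())
        else:
            out.append(base)
    return "".join(out)


def from_lna3(probe):
    """Recover the plain lowercase DNA sequence from the Table 18 notation."""
    return probe.replace("mC", "c").lower()


if __name__ == "__main__":
    # First twelve nucleotides of probe CEABC_C34G6.4_u293_LNA3.
    print(to_lna3("tgccattgcacg"))
```

Converting back with `from_lna3` is convenient for checking probe length, since each `mC` counts as a single nucleotide.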
D. Printing of the LNA microarrays
The microarrays were printed on Immobilizer™ MicroArray Slides (Exiqon, Denmark) using the MicroGrid II from Biorobotics (UK). The arrays were printed with a 10 μM capture probe solution. Two replicas of the capture probes were printed on each slide.
E. Synthesis of fluorochrome labelled first strand cDNA from total RNA
15 μg of C. elegans total RNA was combined with 5 μg oligo dT primer (T20NN) in an RNase-free, pre-siliconized 1.5 mL tube, and the final volume was adjusted with DEPC-water to 14.5 μL. The reaction mixture was heated at +70°C for 10 min, quenched on ice for 5 min, and spun for 20 seconds, followed by addition of 1 μL SUPERase-In™ (20 U/μL, Ambion, USA), 6 μL 5x RTase buffer (Invitrogen, USA), 3 μL 0.1 M DTT (Invitrogen, USA), 1.5 μL dNTP (20 mM dATP, dGTP, dTTP; 4 mM dCTP in DEPC-water, Amersham Biosciences, USA), and 3 μL Cy3™-dCTP or Cy5™-dCTP (Amersham Biosciences, USA). First strand cDNA synthesis was carried out by adding 1 μL of Superscript™ II (Invitrogen, 200 U/μL), mixing and incubating the reaction mixture for 1 hour at 42°C. An additional 1 μL of Superscript™ II was added and the cDNA synthesis reaction mixture was incubated for an additional 1 hour at 42°C; the reaction was stopped by heating at 70°C for 5 min and quenching on ice for 2 min. The RNA was hydrolyzed by adding 5 μL of 1 M NaOH and incubating at 70°C for 15 min. The samples were neutralized by adding 5 μL of 1 M HCl, and purified by adding 450 μL 1x TE buffer, pH 7.5 to the neutralized sample and transferring the samples onto a Microcon-30 concentrator. The samples were centrifuged at 14000 x g in a microcentrifuge for ~8 min, the flow-through was discarded, and the washing step was repeated twice by refilling the filter with 450 μL 1x TE buffer and spinning for ~12 min. Centrifugation was continued until the volume was reduced to about 5 μL, and finally the labelled cDNA probe was eluted by inverting the Microcon-30 tube and spinning at 1000 x g for 3 min.
F. Synthesis of fluorochrome labelled cRNA from total RNA
First and second strand cDNA syntheses were carried out using the MessageAmp™ aRNA Kit (Ambion) according to the supplier's instructions. Five micrograms of C. elegans total RNA was used as template for cDNA syntheses. Syntheses of fluorescent cRNA were carried out according to the MessageAmp™ aRNA Kit (Ambion) protocol with minor modifications. Cy3™-UTP or Cy5™-UTP (6 μL of a 5 mM solution, Amersham Biosciences, USA) replaced biotin-CTP. The final concentration of ATP, CTP, and GTP was 7.5 mM, whereas the concentration of UTP was reduced to 4.9 mM.
G. Hybridization with fluorochrome-labelled cDNA or cRNA
The arrays were hybridized overnight using the following protocol. The Cy3™- and Cy5™-labelled cDNA or cRNA samples were combined in one tube followed by addition of 3 μL 20x SSC (3x SSC final), 0.5 μL 1 M HEPES, pH 7.0 (25 mM final), 25 μg yeast tRNA (1.25 μg/μL final), 0.6 μL 10% SDS (0.3% final), and DEPC-treated water to 20 μL final volume. The labelled target sample was filtered in a Millipore 0.22 micron spin column according to the manufacturer's instructions (Millipore, USA), and the probe was denatured by incubating the reaction at 100°C for 2 min. The sample was cooled at 20-25°C for 5 min by spinning at max speed in a microcentrifuge. A LifterSlip (Erie Scientific Company, USA) was carefully placed on top of the microarray spotted on the Immobilizer™ MicroArray Slide and the hybridization mixture was applied to the array from the side. An aliquot of 30 μL of 3x SSC was added to both ends of the hybridization chamber, and the Immobilizer™ MicroArray Slide was placed in the hybridization chamber. The chamber was sealed watertight and incubated at 65°C for 16-18 hours submerged in a water bath. After hybridisation, the slide was removed carefully from the hybridization chamber and washed using the following protocol. The LifterSlip coverslip was washed off in 2x SSC, pH 7.0 containing 0.1% SDS at room temperature for 1 min, followed by washing of the microarrays in 1.0x SSC, pH 7.0 at room temperature for 1 min, and then in 0.2x SSC, pH 7.0 at room temperature for 1 min. Finally the slides were washed for 5 seconds in 0.05x SSC, pH 7.0. The slides were then dried by centrifugation in a swinging bucket rotor at approximately 200 x g for 2 min. The slide is now ready for scanning.
H. Data analysis.
Following washing and drying, the slides were scanned using a ScanArray 4000XL scanner (Perkin-Elmer Life Sciences, USA), and the array data were processed using the GenePix™ Pro 4.0 software package (Axon, USA). The data in each image were normalized so that the ratio of means of all of the features is equal to 1.
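To illustrate the normalization step and the log2-transformed fold changes reported for the expression profiling experiments, the sketch below scales the Cy5/Cy3 ratios of one image so that their mean equals 1 and then takes log2 of each scaled ratio. This is one possible reading of the normalization described above, not the GenePix implementation; the function name and the intensities used in the example are hypothetical.

```python
import math


def normalized_log2_ratios(cy5, cy3):
    """Return log2 ratios after scaling so the mean Cy5/Cy3 ratio is 1.

    cy5 and cy3 are equal-length lists of positive, background-corrected
    feature intensities from the two channels of one image.
    """
    if not cy5 or len(cy5) != len(cy3):
        raise ValueError("cy5 and cy3 must be non-empty and of equal length")
    if min(cy5) <= 0 or min(cy3) <= 0:
        raise ValueError("intensities must be positive")
    ratios = [red / green for red, green in zip(cy5, cy3)]
    # Normalization factor: the mean ratio over all features in the image.
    factor = sum(ratios) / len(ratios)
    return [math.log2(ratio / factor) for ratio in ratios]


if __name__ == "__main__":
    # Hypothetical intensities for three features.
    print(normalized_log2_ratios([200.0, 400.0, 800.0], [100.0, 100.0, 100.0]))
```

A positive value indicates a feature that is relatively stronger in the Cy5 channel after normalization, and a negative value one that is relatively stronger in the Cy3 channel.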
RESULTS
Use of LNA-modified oligonucleotide capture probes in Exiqon's C. elegans LNA tox oligoarray clearly allows the identification of distinct expression profiles for C. elegans genes relevant for general stress response and for the metabolism of toxic compounds.
Table 17. Expression profiling using LNA Oligonucleotide Microarrays. Log2-transformed fold changes for selected genes in the two expression profiling experiments hybridised with cRNA target.
Table 18. LNA-modified oligonucleotide capture probes. LNA modifications are depicted by uppercase letters in the sequence, mC denotes LNA methyl cytosine.
Oligo Name Sequence
CEABC_C34G6.4_u293_LNA3 TgcmCatTgcAcgGgcActTgtTcgAtcTccTtcTgtTttActTttGgaTg CEABC_C34G6.4_u375_LNA3 TcaTtcTagGatTgcmCagAtgGttAtgAtamCtcAtgTcgGagAgaAagGa CEABC_F57C12.4_u15_LNA3 mCcaAtgTtgTttAatTggTtgTaaTgtmCttGatGacmCtgmCatAatmCatAt CEABC_F57C12.4_u480_LNA3 mCacAagAtcmC.gTgtTgtTctmCcgGaamCaaTgaAaaTgaActTagAtcmCa CEABC_F57C12.5_u111_LNA3 TacTtgTtcTcgAcaAagGttGtgTagmCcgAgtTtgAcamCtcmCgaAgaAa CEABC_F57C12.5_u444_LNA3 TgaActTggAtcmCctTctTtgmCatTtaGcgAtgAtcAaaTttGggAagmCg CEABC_K08E7.9_d8_LNA3 TcaTtaAttTtgTgtAgcTttmCttTctmCgaTttTtgmCacGatmCttTccmCc CEABC_K08E7.9_u51_LNA3 AggGtgmCctActAcaAacTgamCccAaaAgcAgaTgamCcgAgaAgaAatAa CEABC_Y39D8C.1_u37_LNA3 AttGaaAgcGacGcgGaaAgtGccAtgTatTtcTaaTttTgtTttmCttTa CEABC_Y39D8C.1_u422_LNA3 TtgTcaGcaTatmCaaGagTagAtaTggAagTggAtamCacTctGctAatmCc CEADH_H24K24.3a_d3_LNA3 mCacmCttAttGcgTtcAatTttTgtTtcmCacmCtamCtamCtamCgaAtamCgtTg CEADH_H24K24.3a_u50_LNA3 TcamCaaGggAgaGagTctGcgGtcGgtGctGgcGttmCgaGaaAatAtaAc CEAPEX_R09B3.1_u191_LNA3 mCatGcaTccmCgamCgaGaaGaaGtamCtcAttTtgGagTtaTctGgcGaaTt CEAPEX_R09B3.1_u37_LNA3 GacmCatGctmCcgGtcGtcAtgmCaaAtcGacTtcTaaAttGctTctGatTa CEAPO_C35D10.9_u15_LNA3 TtgmCatGctGttAaaAccTatmCgtGtamCaaTatTgcmCtgTatAttmCccmCt CEAPO_C35D10.9_u609_LNA3 TggmCacAgcTtaAtaAcaAatTggAaaGtcGagGatTagTcgGtgTtgAa CEAPO_C48D1.2_u176_LNA3 GacAcamCgcAaaGgaTatGgaTgtTgtTgaGctGctGacTgaAgtmCaaTa CEAPO_C48D1.2_u23_LNA3 AgcAcgAaamCtcTgcmCgtmCtaAaaTtcActmCgtGatTcaTtgmCccAatTg CEAPO_F20C5.1_u453_LNA3 AtgGtcAtamCtcTaaAatGggmCagAacTtcAacmCaaAtcAttmCtcGtcAg CEAPO_F20C5.1_u96_LNA3 AacmCcgAgcTtgmCcgmCaaAgtGcaAgaAaaTtaTagAacGaaTgaAacAg CEATPase B0365.3 u31 LNA3 GgaTggGtcGagmCgtGagAccTacTacTaaAgaAcaGctTgtGaaTctTt
CEATPase_B0365.3_u386_LNA3 mCaamCgtTctmCgaTtcmCtamCggAcaAgaAtgGacmCtaTgcmCaamCagAaaGa
CEATPase_C17H12.14_u356_LNA3 TgcTcgTtaTccAgcTatTttGaaGggActTgtmCatGcaAggActTctTc
CEATPase_C17H12.14_u89_LNA3 mCcgTttAgaGctTatTgcTaamCcaGatTgtmCccAcaAgtmCagAacAgcTc CEATPase_F55F3.3_u215_LNA3 TgamCggAcgmCtamCtamCccAtaTgtAttTgtTccAtcTtamCcaGcaAccAa
AgcTacTtcAttmCgamCaaGgaAcaTctmCggAaaAgtmCaaGtamCatmCccG CEATPase_F55F3.3_u275_LNA3 g CEATPase_Y49A3A.2__u103_LNA3 AaaTtcAagGatmCcaGttGccGatGgtGaaGccAagAttmCgcAagGatTa mCgaTcgTttmCtgmCccAttmCtamCaaGacTgtmCggTatGctmCaaGaaTatG CEATPase_Y49A3A.2_u272_LNA3 a
CECALR_Y38A10A.5_u238_LNA3 TcaGgaAcgAtcTttGacAacAttAtcAtcAccGacTctGttGagGagGc CECALR_Y38A10A.5_u296_LNA3 TgaActmCtamCtcTtaTgaAagmCtgGggAgcmCatmCggAttmCgaTttGtgGc CECAT_Y54G11A.5b_u137_LNA3 GaamCttTgcAggGccGctmCggGgaAtgTcaTgaTttmCatTatTaaGggAa CECAT_Y54G11 A.5b_u189_LNA3 GtcAatTctGggAgaAggTgtTggAtamCcgGggmCtcGggAgaGaaTgtGc CECC_C03D6.3_u275_LNA3 AtgTaaAgaAggAatGctTccmCgaAtgGatTggAtaTttAttTgtmCcaGa CECC_C03D6.3_u430_LNA3 GgamCcgAaaTttGtgmCagmCatGtcGgamCacGaaAttGatGgtmCtcAttTt CECC_C07G2.3_d9_LNA3 mCagAcamCgaAggTtamCgaTagAtaAccAtcTctmCaaAgtmCtaTcgAccTc CECC_C07G2.3_u44_LNA3 mCgamCgaTgtGcgTgtTccTgamCgaTgaAagAatGggAtaTtaAgaAaamCc CECC_Y46G5A.2_u331 _LNA3 TtgTgcTccAtcGctGctmCcgmCttAcaGacTtgAcaAcgmCtcAccTttGc CECC_Y46G5A.2_u385_LNA3 AatGagmCggTtgTgcmCgtGtgAcgTcamCttmCgtmCacAgtGttGctmCtamCt CECoA_C29F3.1_u316_LNA3 AaaTtgAcamCcaAtcAaaTctGtcTcaTctmCctGagGacmCgtmCaamCttmCg CECoA_C29F3.1_u392_LNA3 AatmCttTgtGtamCggAgaTggGgcAaaAggmCagmCaaGaaAgtAaamCcaAg CECoA_F08A8.4_u1094_LNA3 AggAcaAggGgcActActGgcAcaGgcTttGatTatTgcAgtGagAtaTt CECoA F08A8.4 u1260 LNA3 TtaAtgGagGtgAcaAtgGgtTccTtgGatTcgAtaAatTccGagTgcmCc
CECoA_F59F4.1_u109_LNA3 GctmCttmCtcmCagTggGctmCaaAatAgtmCaamCtcAacAgaTcgGaaGttmCt
CECoA_F59F4.1_u424_LNA3 AaaGctTcgAgaTggmCacGttmCgtmCtgTatmCtcGtgAagAacTtaTtgmCa
CECoA_Y25C1A.13_u115_LNA3 GatTcgmCtgAacTttAtcAagAcgTggAatAtgAgcmCagmCtcmCtgTcgAc
CECoA_Y25C1A.13_u451_LNA3 GatmCttAtcAccGcgTgcGatAttmCgaGtaGctTcamCagGatGcgAttTt
CECOL_C27H5.5_u493_LNA3 GgaAagGaaGgaTccAttmCtcAgcTctGcamCttmCcamCcaTcaGagmCcaTg
CECOL_C27H5.5_u680_LNA3 TggAtamCaaGgaGggAtcTggmCagTggTggAtcTggAagTggTggAtaTg
CECOQ_ZC395.2_u199_LNA3 TtgAaaGaamCtcmCttGccGacGatmCctGaaAcamCacAaaGaaTtgmCtgAa
CECOQ_ZC395.2_u400_LNA3 AtgTggGatGagGagAaaGaamCatTtaGatAcaAtgGaaAgaTtaGctGc
CECRYZ_F39B2.3_u171_LNA3 AggmCtgAgcTctTggActTtgGcaTcaAcaTtgTctmCatTctTgaAggAa
CECRYZ_F39B2.3_u222_LNA3 TtaTggTtamCagAagGagmCtgTttAcgGtgTagmCatTggGaaTgtmCttmCc
CECyclin_R02F2.1a_u24_LNA3 mCacTtcAacmCaamCtcmCgtGttAatmCaaGcaAgcmCgcmCacmCatmCtaAtgAg
CECyclin_R02F2.1 a_u312_LNA3 TctmCatTgcTcgTcg AggmCtamCcaAcaAacActGgcAatAccmCaaTtaAt CECyclin_ZC168.4_u203_LNA3 TaaGaaAgtmCatTgaGgaTgcTgtmCgcTttGctmCgcmCgaAgtmCtcGtaTa CECyclin_ZC168.4_u273_LNA3 AagTtcAtcmCtgTtgAcgGaaTcgAggmCggAgaAtgmCtgTatmCggTcaTt CECYP_B0213.15_u133_LNA3 AcaGgaAatAtgAttTtgGatTtcGatTttGaaTcgGttGgtGctGccmCc CECYP_B0213.15_u202_LNA3 GctGagmCtgTatTtgGctAgtGaaAtgTgtGttTttGatActTtaAatGa
CECYP_B0304.3_u38_LNA3 AcgAggTttGgaTcamCaaTcaGaaTtcTgtGaaAtaAgcGttTttTggGa CECYP_B0304.3_u89_LNA3 AgtTctmCggTctAacAgtGtcTccmCgtTgaAtaTtcTtgTaaAatmCacAc CECYP_C03G6.14_u706_LN A3 AtgAccActmCaaAatActGctAaaAgaTttGcaGcgGcaGaaGccGttAa CECYP_C03G6.14_u768_LNA3 TtgAtaTggmCtgTacmCtgTatGgtTttTgaGgamCgtTttTtaGgaGtcGa CECYP_C03G6.15_d9_LNA3 AttTatTcaTtcAtcmCatGtaAacTgtAtaTttTgaAttTgtGttGtaAa CECYP_C03G6.15_u148_LNA3 GccAaaGcaGaaTtgTatTtgAtcTtcGgtAacmCttmCtcmCttmCgcTacAa CECYP_C06B3.3_u102_LNA3 AttTtgAatmCttmCtgGgaAaaTgcmCatmCcamCtcGagAaamCcgTtcmCgtTt CECYP_C06B3.3_u474_LNA3 mCtaAcgGagGatmCtcGccAatTatmCttTgaGagAcaAaamCtgAaamCtcmCt CECYP_C12D5.7_u399_LNA3 AtcTagTccmCaaTgaAtcTccmCacAtgmCtgTtamCtcGtgAtgTtcAacTc CECYP_C12D5.7_u65_LNA3 TttTgcTttmCatmCgcAaaAgcTcaAgaTtamCacAtgTcaGgtmCaaGccAa CECYP_C45H4.17_u27_LNA3 mCcgmCgamCttTaaAgaGaaGatmCatAaaTttGcaTtgTttTttGttTgtAt CECYP_C45H4.17_u598_LNA3 mCgaGggTgaTtcGgaGacTttmCagTaaTgtmCcaActTtcAaaTgtTtgmCa CECYP_C45H4.2_u110_LNA3 TagAtamCaaGatAcaTccmCtcAaaAgaAggmCctAccGtcAatGgcmCaaAg CECYP_C45H4.2_u429_LNA3 TcaAcgmCgtmCtaTaaAtgAatmCacAacGagGtaTcaAcaTtcTccmCccTg CECYP_C49C8.4_u363_LNA3 AtgmCtgAtgTtgAaaTtgmCtgGctAccGtaTtcmCaaAagAtamCtgTaaTc CECYP_C49C8.4_u883_LNA3 AtgAatmCcaTggmCttGgamCatmCtcmCcgTttTtcAagGgaTatAaaAatGt CECYP_C49G7.8_d6_LNA3 AtgmCaamCgaAttAgtGaaAaaTtcAtcmCtgGaaTaaAaaAtaAttmCtaAa CECYP_C49G7.8_u795_LNA3 AtcGctAcgAcaAtcTttmCcgAtgmCctTcgAagTttmCgaAagmCttTctmCt CECYP_F01 D5.9_u374_LNA3 GagGtcGgtGgaGgaGgaAgtGgaAatTgamCggmCaaAatmCctGccmCaaGg CECYP_F01 D5.9_u46_LNA3 mCccTctTtgGgaTttmCcamCtcAagTttActGttmCggmCagmCagTgaTatAa CECYP_F08F3.7_u25_LNA3 GagTtgGttmCcamCagAatGctTagGacGttTaaAttmCgtmCacAaamCttTt CECYP_F08F3.7_u401_LNA3 mCaaTatGgtTccmCatTttAgcAacTcaTatGaamCacAgaAgaTgtmCctTg CECYP_F14F7.2_u397_LNA3 GaaAaaGgcGtcGacAttTtaTgtGacAcgTggAcamCttmCacTatGacAa CECYP_F14F7.2_u68_LNA3 TaaTtgAatTacGggTctTttGtamCatAttAatTttAgtAtamCttTgtGa CECYP_F42A9.5_u435_L.IMA3 AtaTcaAtgmCaamCtaTtaAtgAatmCacAacGtcTtgmCcaAtcTtcTccmCg CECYP_F42A9.5_u55J.NA3 
GgaGtgActAtgAaaGcaAagAgtTacmCgaTtgAaamCtgAaaGacAgamCa CECYP_K07C6.3_u3_LNA3 AatmCttTaaTgaTaaTttAtgGgaTctGtaTttmCtcTttmCtgTcaAtaAa CECYP_K07C6.3_u354_LNA3 AtgAgcmCcamCaaAtgTaaAagGatAcgAgaTtgAttmCggGaamCagTcaTg CECYP_K07C6.4_u118_LNA3 AtcmCtgmCgaTatGacAttAagmCcamCatGgtTctGaamCctTcaAcaGaaGa mCtgAacmCttmCaamCagAagAtaAacTtcmCgtAtaGcgmCtgGaaAaamCtc
CECYP_K07C6.4_u87_LNA3 mCt CECYP_K07C6.5_u7_LNA3 AttTaaAggAatTcamCagmCtcAaaAaaTaaTaamCtamCcgGttmCagAgaTt CECYP_K07C6.5_u99J.NA3 AatTtgAgcmCacAtgGcaAgtTatmCaamCagAggAgamCaaTgcmCgtAcaGt CECYP_K09A11.3_u362_LNA3 TgamCatTctActTaaAggGaaGaaAatAccAacTggTacmCctTgtAttTg
TcamCcamCaaAgcmCatAcaTatGcgAgcTagTtcmCtcAggmCtgmCttAaam
CECYP_K09A11.3_u48_LNA3 Cc CECYP_K09A11.4_u238_LNA3 TtcGacAaaActAttTtgGaaAgaAcaAtcmCcaTtcAgtGtcGgcAaamCg CECYP_K09A11.4_u68_LNA3 TctGacAacAaaGccAtamCacGtgmCcgActAatTccAcaAtcAgcTagAa CECYP_K09D9.2_u151_LNA3 TtgGcaAaaGcaGaaTtgTatTtaAtcTttGgaAacmCtcmCttmCttmCgcTa CECYP_K09D9.2_u866_LNA3 TgaAtcTttmCaaActTatmCacTccTttTaaTacTacmCgtTccTgtTtgGa CECYP_T10B9.10_u410_LN A AttGagAttGtaTccAttGgcGtcTctTgtTcamCaaTcgAaaAtgTctmCa CECYP T10B9.10_u56_LNA AacTgcTacTatTgcGccAtcAagTgtGctGctmCaaActTaaAtcmCagGt
CECYPJT10B9.7_u102_LNA3 TtgAgamCagGaaAtaAgamCtaGaaTtcmCttTgaAacTggTggGaaGtgmCt CECYP _T10B9.7_u267_LNA3 AagAtgTcaAagAatTcaAgcmCagAacGatGgtmCcamCcgAcgAgcmCatTa CECYP_T19B10.1_u100_LNA3 AttGaamCcaActmCtgAaaTatAatGacAcaAaamCcaTgtmCtgGaaGtgGt CECYP_T19B10.1_u319_LNA3 GgcAatGtgAcaAtaTctmCcaAtgGttmCttmCacAgcAatmCatmCacGtgTt CECYP_Y49C4A.9_u121_LNA3 mCtaTtcAatmCgaTatTttAtcAcamCcaTccAgtGctGgamCctmCcaTcaTt CECYP_Y49C4A.9_u413_LNA3 GtcTcaGagAtgTgtAaaTttActTccmCtgmCaaTttGttTcamCgcAacTa CECYP_ZK177.5_u394_LNA3 TtcmCgaAtgTttmCcaAttGggActGaaGttTcaAgaGtcAccmCagAaaAa CECYP_ZK177.5_u445_LN A3 GatmCcaGcaTctTccAagmCttAcaTtcmCtcmCgtGctTgtAtcAagGaaAc CEDA0_C47A10.5_d9_LN A3 TttGaaAacmCtgTttTatTatTaaAatAgaTaaTtgAttAgtTctGtamCg CEDAO_C47A10.5_u269_LNA3 AtamCgtTgcActGcaTccGgcTatGagGgaGccAaaAatmCttAggGgaGt CEDC_C01A2.3_u373_LNA3 GcamCttmCcaTtcAtcTctGcaGctActAtgGctTtgGtgAcaAaaGttGg CEDC_C01A2.3_u96_LNA4 mCcgTccAaaAgaAtgmCcaTctmCacAagTctTgaAatmCttAtaAagGtaGt CEDC_C34F6.1_u301_LNA3 GagGgaTcaAcaGtaAccTcgTgcGgtAttGacAagGgaTgtmCcgGaaGg CEDC_C34F6.1_u450_LNA3 GatGgtTctTcgAtcGcaAacAaaAcaGatGtgmCtcmCatTtamCatAcgGa CEDC_F33D11.3_u126_LNA3 AtgGagAaaAtgGatmCtgAtgGagTtgmCagGaaGtgAtgGagmCtcmCagGa CEDC F33D11.3 u14 LNA3 TgaAtcTccAtaAatTatTcaAtgTttmCcaAatAttTaaTttAtcAatTg
GctmCaamCacGgtAggAtcmCtaTggAacmCgtmCggAggAgcAggmCctmCg
CEDC_F46E10.2_u392_LNA3 gAg CEDC_F46E10.2_u54_LNA3 mCgtGacAacmCtcTtaTttAttTctGtaAaamCtgAttmCgcmCaaActTttGt CEDC_F56G4.2_u382J.NA3 GaaGctTtcAaamCcaAatGagTtcmCttmCccGgaAtcmCcaAagAatAccAa CEDC_F56G4.2_u82_LNA3 AcaAtgAaaAgaGagGatGgaAagGaaAtcGaaGtcTctGttmCttGacGa CEDC_M 162.2_u103_LN A3 GatGagGtamCatAacTttGtgTgcAgtTatAggmCcaTctAcaGtamCctGc CEDC_ 162.2_u480_LNA3 TtcmCatmCatmCacTaamCcgAttGtcmCtgAcaTtgAtgGccAaamCcaGggAa CEDC_R10E4.11_u274_LNA3 TcamCatTatmCgaAcaAgtActAgtAagmCatGctGtgAtgGagTgcmCgcTa CEDC_R10E4.11_u397_LNA3 mCacGgaGatmCacGacAtcAaaGcgGatTgcTtaGagTgtGgaAacmCgtmCt CEDC_T04C9.1_u321_LNA3 ActAtcTacGtgGcamCgtTggActmCatmCatmCgaTggGaamCgamCgtAtaAg CEDC_T04C9.1_u64_LNA3 TctmCtgGccAgtTcamCttTgtGatmCaaTctmCagAttmCgtmCcamCacAagAt mCtamCttmCcgmCaaGaaGgcmCcgTcgTttmCtaAtcGatmCgaAcaTctmCac
CEDC_W02A2.3_u32_LNA3 Ac
AtgGatGatmCgamCccActTgcmCacTgamCccAcaAtcmCcgmCacTcamCta
CEDC Λ 02A2.3_u374_LNA3 mCc CEDC_W05G11.3_u153_LNA3 AagAcgGagAggmCtgGagAgaAcgGtamCcgAtgGagAgcmCagGaamCtgAt CEDC_W05G11.3_u51_LNA3 mCcamCccAggAggAggGatAcaAgaGaaGaaAgtAcaGatTctmCcaActAa CEDC_ZK863.5_u256_LNA3 AgtTtcAcamCttmCttTttGccGttTtgGttmCccGttAtcAatmCcaTtgAt CEDC ZK863.5 u324 LNA3 mCttTtaTatTctmCatmCaaTttGttTccTacTtgGtcAgcTgaGgaTcgTt CEEPHX_Y55B1 BR.4_u161J-NA3 TtcGgcAcaAatGgaGcaAaaGtaTcgTggTtaTtgTgaTgcGatTatTc CEEPHX_Y55B1BR.4_u93_LNA3 mCtamCtaTgaAtgAgcTcamCtgGacTcaTttAtcAacTcgAgtmCaaAagmCc CEER_18S_u388_LNA3 GttGgcGaaTctTcgGgtTcgTatAacTtcTtaGagGgaTaaGcgGtgTt
CEER_18S_u82_LNA3 GaamCtgAttmCgaGaaGagTggGgamCtgTcgmCttmCgaGgtTtaAcgActTc
CEER_26S_u342_LNA3 TgtTatTgcGaaAgtAatmCctGctTagTacGagAggAacAgcGggTtcAa
CEER_26S_u38_LNA3 TgcAtamCgamCttGgtmCtcTtgGtcAagGtgTtgTatTcaGtaGagmCagTc
CEFOXO_R13H8.1b_u331_LNA3 TgtGctmCagAatmCcamCttmCttmCgaAatmCcaAttGtgmCcaAgcActAacTt
CEFOXO_R13H8.1b_u393_LNA3 TtaAgamCggAacmCaaTtgmCtcmCacmCacmCatmCatAccAcgAgtTgaAcaGt
AcaTtgmCtamCcaAggmCctAagmCcgmCttmCaaAttmCtcTaaGtcTgaAatG
CEGAPDH_K10B3.7_u21_LNA3 CEGAPDH_K10B3.7_u727_LN A3 GttGagTccAccGgaGtcTtcAccAccAtcGagAagGccAatGctmCacTt CEGBA_F11 E6.1 a_u232_LNA3 AgtAaaTtcmCttmCcamCgtGgaTctActmCgtGtgTtcAcaAagAtcGagGg CEGBA_F11 E6.1 a_u451_LNA3 GgtmCcaAtaAtgGgaGacTggTtcmCgcGcaGaaAgtTatGcaGatGatAt CEGLU_C02A12.1_u264_LNA3 AgaAaamCttmCgtTggAccmCtgmCtaAggAgaAgtAttTcaAgcTtcTgaGc CEGLU_C02A12.1_u55_LNA3 GagmCacmCcgAagmCtcAagmCcaTatTtgGaaAcaAgamCcaTacTctTcaAa CEGLU_C46F11.2_u271_LNA3 GttAccmCtcTacAaaTctmCgcTtcAatmCcaAtgTtgTtcGcaGtcAccAa CEGLU_C46F11.2_u45_LNA3 mCcgAagAgcTcgTtamCtaTgcGagGagGtgTgaAgcmCggAatAatTttTt CEGLU_F26E4.12_u109_LNA3 AagTtcTtgGttGgamCgcGatGggAaaAttAtcAagAgaTttGgamCcaAc CEGLU_F26E4.12_u480_LNA3 AcgAttTcaAcgTcaAaaAtgmCtaAtgGtgAtgAcgTgtmCacTttmCggAt CEGLU_R07B1.4_u166_LNA3 AccTggGttGatGttTttGcgGctGaaAgtTtcTccAagmCtcAttGatTa CEGLU_R07B1.4_u38_LNA3 GaaGtamCgtmCtcmCcaAagAaaAgcTacmCccAgcTtaAggmCatTgcAcaAt CEGLU JTO9A12.2_u220_LNA3 GcgmCcaGatAtgTatTcaAagAtcGagGtaAatGgtmCagAacActmCatmCc CEGLU_T09A12.2_u335_LNA3 AatmCtamCagGgaAaaAggAttTcgAgtTgcmCgcGttTccAtgmCaaTcaAt CEGLU_T28A11.11_u299_LNA3 AgaTggmCaaAgaAgcAtamCatAacTgaAacTctTccmCggGgaGctActAc CEGLU_T28A11.11_u54_LNA3 TgaAtaAacGggmCcgAacTaaAtcmCatTcgTcaGtgGaaAtgGgaAacAa CEGPD_B0035.5_U256_LNA3 GtcmCgtmCttmCctGatGctTatGaamCgcmCtaTttmCtcGaaGtaTtcAtgGg CEGPD_B0035.5_u478_LNA3 TgtGgaAaaGctmCtcAacGagAagAaaGcaGaaGttmCgtAtamCaaTtcAa
AtaTcgmCcgmCctGctTccTcamCcaAccmCgaAtaAcgmCaamCaaAaamCtt
CEHSP_C09B8.6_d8_LNA3 Ta CEHSP_C09B8.6_u286_LNA3 AagAgcmCcamCtcAtcAagGatGaaAgtGatGgaAagActmCttmCgtmCtcAg CEHSP_C12C8.1_u127_LNA3 mCaaGatAttTtaAcaAaaAtgmCatmCaamCaaGaaGccmCaaTcaGgtTccGg CEHSP_C12C8.1_u1531_LNA3 mCttGggmCatTctGtamCggGatGctGtcAttActGtgmCctGcaTatTttAa CEHSP_C47E8.5_u310_LNA3 AagAagmCatmCtcGaaAtcAacmCcaGacmCacGctAtcAtgAagAcamCttmCg CEHSP_C47E8.5_u361_LNA3 AtgAaaGctmCaaGctmCttmCgtGatTccTctActAtgGgaTacAtgGccGc CEHSP_F26D10.3_u276_LNA3 TtaAgcAgamCcaTtgAggAcgAgaAgcTcaAggAtaAgaTcaGccmCagAa CEHSP_F26D10.3_u397_LNA3 mCgtmCttTccAagGatGacAttGaamCgcAtgGtcAacGaaGctGagAaaTa CEHSP_F43D9.4_u169_LNA3 GtcGacTtgGctmCacAtcmCacAccGtcAtcAacAagGaaGgamCagAtgAc mCaaTctTgaGggAcamCgtTctmCacmCatTgaGggAcamCcamCgaGgtmCa
CEHSP_F43D9.4_u275_LNA3 aGa
TcamCtaAaaTgcAccAatmCtgGacAatmCttmCtgmCttmCtgmCtgGatGcgmC
CEHSP_F44E5.4/5_u123_LNA3 t CEHSP_F44E5.4/5_u380_LNA3 TcaTgaAgcTaaAcaAttmCgaAaaGgaAgaTggTgaAcaAcgGgaAcgTg CEHSP_F52E1.7_u175_LNA3 AagTatAacmCttmCcaAcaGggGtcmCgtmCcaGaamCaaAtcAagTccGaaTt CEHSP_F52E1.7_u448_LNA3 TttAacmCatGgcmCgcAgaTtcTtcGatGacGtcGacTttGatmCgcmCacAt CEHSP_F54D5.8_u252_LNA3 GcgTcgAaaAgaTctmCccTgaAgtmCtgmCatTgamCtgGccTtgAtaTtaTg CEHSP_F54D5.8_u318_LNA3 AcaTagTctTcgTcaTcaAggAtaAgcmCacAccmCgaAatTcaAgcGagAg CEHUS_H26D21.1_u117_LNA3 TcgmCcaAcamCtcGgamCacGtgmCcaAaaTgaAtaTcaTctmCaaAtcGaaTg CEHUS H26D21.1 u478 LNA3 GtcGaaGttAgaAatmCcaGaaGccGatAttGttTctmCatmCaaAttmCcaAt
CEMRE_ZC302.1_u169_LNA3 ActActmCgtGgaAgaTccAatAaaGttGttTcaAcgmCgamCaaAtcGatTc CEMRE_ZC302.1_u292_LNA3 GgcAgtGaaGatGaaGtgGcaAatTctGatGaaGaaAtgGgaAgcAgtAt CEMTLJT08G5.10_d 127_LNA3 TtgTcaAcgAccAgaAgcAaaAatTatGggAatmCgcGatAaaAttmCaaGg CEMTLJTO8G5.10_u45_LNA3 GatGcaAgtGtgmCcaActGcgAatGtgmCtcAggmCtgmCtcAttAatTtgAa CENAP_D2096.8_u356_LNA3 GacGatAtgTtcGatTtcmCcaGgaGagGacGgtGatGatGtgTcaGacTt CENAP_D2096.8_u70_LNA3 GacGatAtgTtcGatTtcmCcaGgaGagGacGgtGatGatGtgTcaGacTt CEPAJF56D12.5_u241_LNA3 GagGtcGtcGtaAtcmCacAagGctmCcaAgaAagmCaaGtgmCtcGacAttTc CEPAJF56D12.5_u301_LNA3 GatActTttGgcAagmCtcGttmCcaAtcAagAagGagGtcAtcmCcaGatmCg CEPDJC07A12.4_u28_LNA3 GatGagGagGgamCacAccGagmCtcTaaAtcmCacAttmCcaAtamCagTtcAa CEPDJC07A12.4_u433_LNA3 mCttAtgTccGaaGatAtcmCcaGagGatTggGacAagAacmCcaGtcAagAt
TacmCccAgtmCgamCtaTgaTggAgamCagAaamCctmCgaGaaGttmCgaAg
CEPDI_C14B1.1_u119_LNA3 aAt CEPDI_C14B1.1_u358_LNA3 mCtcGtcGccTccAacTtcAacGaaAttGccmCttGatGaaAccAagActGt CEPGK_T03F1.3_d9_LNA3 TtcTatTgtTtaTtcmCttGccmCaaTagTgtAttTgtAttTatTctTtcTc CEPGK_T03F1.3_u424_LNA3 mCaaAtcmCatmCtcmCcaGtgGatTtcGtcAttGctGacAagTtcGccGagGa CEPON_E01A2.7_u223_LNA3 GttTctGatTcgAcamCttTatGgamCcaTctmCaaGttmCtgmCgaGttTctTt CEPON_E01A2.7_u79_LNA3 GggAaamCaaAtgAttGttGgtAcaGtaGccmCgcmCctGctAttmCacTgtGa mCgaGcamCatmCatmCcaAtcGttmCctGttmCaamCaaGgcmCttmCtaAtcGtt
CEPPGB_F13D12.6_U44_LNA3 Ag CEPPGB_F13D12.6_u440_LNA3 TgaTgaGagmCccAgtAacmCaaTtaTttGaamCcgTcaGgaTgtGcgTaaGg mCgtmCtaAtcGaaGaaGggGatmCgtGggmCaaTcaTaamCtaAttAacmCttm
CEPPS_T14G10.1_d2J.NA3 Ca
CEPPS_T14G10.1_u240_LNA3 mCaaTggmCtcmCagGtcTttmCtgmCtcTtcAtaTacTtcmCatTccGagTtgmCt
CEPRDX_R07E5.2_u405_LNA3 GttmCtcTtgGagmCtgAagTtgTcgmCgtGctmCgtGtgAttmCtcActTctmCt
CEPRDX_R07E5.2_u42_LNA3 TcgmCtamCcaGcaAggAatActTcaAcaAggTcaAcaAgtGatmCacAcaGa
CEPYC_D2023.2_u256_LNA3 AagGaaAttGtaActmCgcmCcaAgaGctmCtcmCcaGgtGtcmCgtGgamCatAt
CEPYC_D2023.2_u427_LNA3 TtgActGgaTtgGagAttGcgGaaGaaGttGatGttGaaAtcGagAgtGg
CERAD_F10G7.4_u169_LNA3 GccAagTctmCaaGcaAtaAgtGttGatmCaaTcaGagmCcaTacGgaGagAt
AtaTtgAgamCttmCggGacAagmCggActTctmCatmCtgTcamCagmCaamCtg
CERAD_F10G7.4_u267_LNA3 mCc CERAD_F32A11.2_u250_LNA3 GatmCcgmCagAgaAtcGagTatTtcmCtcTcgAgamCccAtgGatAtcAacTg CERAD_F32A11.2_u380_LNA3 TccGttAagAagmCtcActGgaAaaAcamCacGgcTcgAacGaaAttGgaAt CERAD_T04H1.4_u274_LNA3 AatTtgGatGagAgcAaaGtgGaaGgaAtgGctAtcGttTtgGcaGatAt CERAD_T04H1.4_u375_LNA3 GtgmCtgGtcAaaAaaTgcTtgmCttmCgtTgcTtaTtcGcaTtgmCacTcgmCa CERAD_W06D4.6_u325_LNA3 mCttmCgaGaamCtcTtcAagTtgGaaTcaAcaGtgGcaTcgGatAcamCatGa CERAD_W06D4.6_u34_LNA3 GtgmCctTctGaaGccGaaGaaAacGacGatTagTtaAatGttTccAagTt CERAD_Y116A8C.13_u289_LN A3 GatAaaAtcGatAgcGacGacGatGagGaaGccGatGatGagGagmCtcGa CERAD_Y116A8C.13_u59_LNA3 GcaGgtGgaTacGgaTgtGgaGctGacTttTgcGttTtaTcaAgaAtcTc CERAD_Y39A1A.23_u221_LNA3 TccmCgtAgaAgtAgaAatGctAgaAgaAccTgaAcaAgaAgaTcaAgaAa CERAD_Y39A1A.23_u276_LNA3 TgcAagAtgTcaGtaTtgAaamCaaTtcmCtgTagAgamCccmCcgAagAaaAt CERAD_Y41 C4A.14_u509_LNA3 AgtmCtcGtaTccGggAatGttTcaGccTgtGaaAatGctTgtTgaAgamCg CERAD Y41C4A.14 u731 LNA3 mCttmCaaAacmCgtmCgcTttTaaGgaTacAggAacGtgGcamCgcTtcmCgaG
CERAD_Y43C5A.6_u131 J.NA3 mCagAttGtamCctTcgAaaAggAaaAggAgaGaaTcgmCgtmCgcAaaAatGg CERAD_Y43C5A.6_u429_LNA3 TgaTggmCttTgaTtaTtcGagmCagGagmCaaTgaTgtmCcgAgaGtcGttAt CERFC_F31E3.3_u128_LNA3 mCaaTgamCgaGaaTatTggAgtAatGggGaaActGgtTgcGacTtgmCgaAa CERFC_F31 E3.3_u55_LNA3 TtgGaaAacAatmCtcmCtcGacTttmCtgmCtcActmCttmCgtGaaActAtcmCa CERPL_K11 H12.2_d1J.NA3 TctTgtTatTttAttTtgTttTggGctTgtTccGaaAatGaaAtgGttGt mCaaTggAtcAccAagmCcaGttmCacAagmCacmCgtGagmCaaAgaGgamCt
CERPL_K11 H12.2_u172_LNA3 cAc CERT_F36A4.7_u1396_LNA3 mCttTgtGatGtgAtgActGcgAagGgamCacTtgAtgGctAttAcgAgamCa CERT_F36A4.7_u2302_LNA3 GagmCcaGctActmCagAtgAcamCtcAacAcgTtcmCatTatGcaGgaGttTc CERT_F36A4.7_u289_LNA3 TacActmCcaTccTcgmCcgAcaTacAatmCcaAcaTctmCcamCgcGgaTtcTc CERT_F36A4.7_u2919_LNA3 AtgGagAagAtgGttTggAtgGaaTgtGggTtgAgaAtcAgaAtaTgcmCg CERT_F36A4.7_u4269_LNA3 AacmCggGatAccGtgTcgAacGtcAcaTgaAagAtgGcgAtaTaaTcgTc CERT_F36A4.7_u5485_LNA3 GagGagAttAaamCgcAtgTcaGtgGctmCatGtcGagTttmCcaGaaGtcTa CESLC_F52F12.1 a_u249_LNA3 AgaTatTgcmCtcTacTtaTcaTggGccTgaTggmCttTgtmCtgmCcgGtaTt
GaaTctmCaamCcamCttmCtgGaamCccmCatAcamCcaAtgGatAgaAgamC
CESLC_F52F12.1 a_u76_LNA3 ggAg CESLC_K11 G9.5_u400_LNA3 GttGttmCttTttTccGtgAtcTttTcaTgtTtaTgtmCtgAacGtgGcaGg CESLC_K11 G9.5_u462_LNA3 GacTcgTtgGtgTctTgcTagGatGtcTtgGgtTcaTtcmCtcAatmCgtTg CESLC_Y32F6B.1_u179_LNA3 GtamCtgGgcTcgAggGctGaaActAatmCgaAgaAgaAacTccAgaAgaTa CESLC_Y32F6B.1_u280_LNA3 GgaTcaTgcTctGttTacGacActGatGagTtaAgaGtcAgamCtgmCacGt CESLC_Y37A1 C.1 a_u104_LNA3 mCgaTggTtcTtcTcgTctAtcAtaTcgGggTagTtgmCcgAagTgtTgaAa CESLC_Y37A1C.1a_u404_LNA3 mCaaAtcGaamCtgGtaTaaAggAggAccGacGgaGacGaaTttGaamCgaGa CESLC_Y70G10A.3_u383_LNA3 AttmCgaTcaAagAacTctGgcTctmCggmCgtTaamCtgGacAttTgtTcgTc CESLC_Y70G10A.3_u46_LNA3 mCtcmCccGagmCagGcgAttAttmCacGctAgtTatGctmCaaAtgTgaTctGt CES0D_C15F1.7_u435_LNA3 mCcgGtamCtaTctGgaTcamCacAgaAgtmCcgAaaAtgAccAggmCagTtaTt CES0D_C15F1.7_u9_LNA3 mCccAgtGacTacmCtgAatmCgcGtcTctGaaTctmCcamCacAatTccTacTa CESOD_F10D11.1_u326_LNA3 GgaGttGctmCacmCgcAatTaaGagmCgamCttmCggAtcTctGgaTaaTctTc CESOD_F10D11.1_u477_LNA3 AaaTtgAggAaaAgcTtcAcgAggmCggTctmCcaAagGaaAcgTcaAagAa mCaaTcgTacmCatGaaAgaAgtTggAagmCcamCgtGcaAgaGaaGaaAtcmC
CESULT_EEED8.2_u316_LNA3 a CESULT_EEED8.2_u82_LNA3 AagAagAttmCctGacmCagAgaGacTcamCgtGctTacmCcaAgaAgcAtcTa CESULT_Y113G7A.11_u252_LNA 3 AgcAttGgtGgaAatAcgAaaTggmCatGggAagAgaAacmCccTctmCaaTt
CESULT_Y113G7A.11_u96_LNA3 mCtgGttAcgGtaGtgTatGgtmCccTgtmCctmCtcAgaAtgmCaaAtaTgtmCg CESULT_Y67A10A.4_u108_LNA3 TctAcgTcgAtgGaaAagmCcgAttTaamCaaTcaAagmCcaAcaAcgmCagTt CESULT .Y67A10A.4_u327_LNA3 GgaAagGtgmCcaAaaAgtTgamCagmCaaTtgGagGatmCttAttmCatTgcmCa CET0P0_K12D12.1_u398_LNA3 AgaTgaTgaTgaAgtTccTgcAaaGaaGccTgcTccAgcGaaGaaAgcTg CETOPO_K12D12.1_u449_LN A3 AaaAccTcgTacTggAaaAggAgcTgcGaaAgcGgaAgtTatmCgaTttGt CETOPO_M01 E5.5b_u256_LNA3 GagAagGccmCagAagAagTacGacAgamCtgAagGagmCagTtgAaaAagTt CETOPO_M01 E5.5b_u429_LNA3 TtcTgtmCatAcaAtcGtgmCtaAtcGgcAggTtgmCgaTccTttGtaAccAt CEUbi F25B5.4 u186 LNA3 AagmCttmCggAcamCcaTtgAgaAtgTcaAagmCcaAaaTccAggAtaAggAg
CEUbi_F25B5.4_u2_LNA3 AatmCgaAccmCatmCaaTtcActmCgtTatTccTccTcgAtcTccGttmCaaGt
CEUbi_F29B9.6_u145_LNA3 mCtgAacmCatmCcaAatAttGaaGatmCcaGctmCagGctGaaGccTatmCagAt
CEUbi_F29B9.6_u230_LNA3 mCgtGtgmCttAtcTctTctGgaTgaAaamCaaGgaTtgGaaGccGtcAatmCt
CEUbi_M7.1_u239_LNA3 mCggAagmCatmCtgmCctTgamCatTctmCcgTtcGcaGtgGtcGccGgcTctG
CEUbi_M7.1_u53_LNA3 AaaGtamCgcTatGtgAggAggmCtaAcamCcaTtcAtaTaaGaamCgcAgcmCa
CEUGT_F39G3.1_u40_LNA3 TgtTgcmCgtAgaAgaGagActAaaActAagAacGatTgaTtgAagGtcTg
CEUGT_F39G3.1_u466_LNA3 TacAatTctTtgmCagGaaGcaAtaTccGccGgaGtcmCccmCttAtcActAt
CEUGT_M88.1_u480_LNA3 mCtcAcgGagGttAtaAttmCtaTgcAggAggmCaaTttmCtgmCtgGagTtcmCa
CEUGT_M88.1_u72_LNA3 AccGttTcaTgaGagmCtgTaaTcaGgtGttGttTctGtaAaaAgtGtgAa
YAL009W_u145_LNA3 GtgGatGtgAaaTtaGtcmCtcAacmCccAgaGcaTttAgtGcaGagAttAg
YAL009W_u341_LNA3 GcaGttTaaTgtGaaGctAgtTaaAgtAcaGtcTacGtgGgamCgaGaaAt
YAL059W_u262_LNA3 AttGccAagTccAttTctmCgtGccAagTacAttmCaaAatAcaAgaAagGc
YAL059W_u51_LNA3 AgamCtcmCtamCaaAtaGatTcgGtgTccTgcmCagAcgAtgTtgAagAatAg
YER109C_u109_LNA3 TtgAagTttGggAatAttGgtAtgGttGaaGacmCaaGgamCcgGatTacGa
YER109C_u436_LNA3 GagGcgmCaaGtaGgcAatGatTcaAgaAgtAgtAaaGgcAatmCgtAacAc
YHR152W_u128_LNA3 TgaGcamCaaAgtTaaGatGttmCggAaaGaaAaaGaaAgtmCaaTccTatGa
YHR152W_u510_LNA3 mCaaGtgAccAatmCagmCacGcamCggmCttmCcaTccTcaAgamCtgAtaTtamCc
YKL130C_u211_LNA3 AtlAaaTgcGcaGatGagGacGgaAcgAatAtcGgaGaaActGatAatAt
YKL130C_u85_LNA3 GatGgtAagmCtgAgcGccTtgGacGaaGaaTttGatGttGtcGctActAa
YKL178C_u199_LNA3 TacGtcAcgmCaaGgamCagAgcTttGacGacGaaAtaTcamCttGgaGgaTt
YKL178C_u367_LNA3 TctmCccTgtGtaGgtAcamCcaAtaTcamCaaGcgmCatTtcTatGtcGacTa
YLR443W_u179_LNA3 TgcTaamCacmCagTttAgamCcaTggAaaTccmCacmCgcAaaTatAagmCaaTg
YLR443W_u86_LNA3 GcaGgamCatAagAttmCcgGtcAagmCaamCgamCagTgaAgaAagTatGcaAa
YOR092W_u251_LNA3 mCcgTctAgtGaaAgcGggAtgGctAaaTtgGgaAaamCgamCaaGatGttAt
YOR092W_u82_LNA3 GatGctTcaAtaTccTttGatGgtmCgtTagTttAccAttTttGgtGtcTt
YPL263C_u132_LNA3 AgtmCatTtgAgtTatGtgAagAccGttGgtGggAaaGaaGagAtcAggTg
YPL263C_u257_LNA3 GtcTtgGctAccAcamCccAaaAccGttmCgaAacTttAagAgcAttmCtamCt
Example 54. Evaluation of different LNA substituted oligonucleotides as probes for fluorescence in situ hybridization (FISH) on metaphase chromosomes and interphase nuclei.
Locked Nucleic Acids (LNA) constitute a novel class of DNA analogues that have an exceptionally high affinity towards complementary DNA and RNA. Using human classical satellite-2 repeat sequence clusters as targets, we demonstrate that LNA/DNA mixmer oligonucleotides are excellent probes for FISH, combining high binding affinity with short hybridization time and even with the ability to hybridize without prior thermal denaturation of the template.
The development of molecular probes and image analysis has made fluorescence in situ hybridization (FISH) a powerful investigative tool. Although FISH has proved to be a useful technique in many areas, it is a fairly time-consuming procedure with limitations in sensitivity. Probes with higher DNA affinity may potentially reduce the time needed for hybridization and increase the sensitivity of the technique. Thus, improvement in hybridization characteristics has been reported for the DNA mimic peptide nucleic acid (PNA). This example describes the development of LNA substituted oligonucleotides as probes for fluorescence in situ hybridization on metaphase chromosomes and interphase nuclei. In each experiment a different LNA substituted oligonucleotide of the same 23-bp human satellite-2 repeat sequence (attccattcgattccattcgatc) was used, cf. Jeanpierre, M. (1994). Human satellites 2 and 3. Annals of Genetics 37, 163-171. Oligomers with various LNA content and different labels were compared under different hybridization conditions, and the optimal conditions for an efficient LNA-FISH protocol were determined.
A. MATERIALS AND METHODS
A1. Chromosome Preparations
Chromosome preparations were made by standard methods from peripheral lymphocyte cultures of normal males. Slides were prepared 1-4 days prior to an experiment and treated with RNase (10 μg/μl) at 37°C for one hour before hybridization.
A2. Probe preparation
The 23-bp human satellite-2 repeat sequence, attccattcgattccattcgatc, was used to prepare the LNA/DNA mixmers with different content and sequence order of LNA modifications (Table 19). All mixmers were labeled at the 5' end with either Cy3 or biotin. Biotin amidite was purchased from Applied Biosystems and Cy3 amidite was purchased from Amersham Biosciences. A DNA oligonucleotide of the same sequence without any LNA modifications was used as a control in each experiment.
A3. Fluorescence in situ hybridization
FISH was carried out as described by Silahtaroglu AN, Hacihanefioglu S, Guven GS, Cenani A, Wirth J, Tommerup N, Tümer Z. (1998) Not para-, not peri-, but centric inversion of chromosome 12. J Med Genet. 35(8):682-4 (1), with the following modifications. The amounts of probe were 6.4, 10, 13.4 and 20 pmoles. Denaturation of the target DNA and the probe was performed at 75°C for 5 minutes, either separately using 70% formamide or simultaneously under the coverslip in the presence of the hybridization mixture containing 50% formamide. In addition, the effect of omitting the denaturation step was tested. Two alternative hybridization mixtures were used: 50% formamide/2xSSC (pH 7.0)/10% dextran sulphate or 2xSSC (pH 7.0)/10% dextran sulphate. Hybridization times were 30 min, 1 hr, 2 hrs, 3 hrs and overnight. Hybridization temperatures were 37°C, 55°C, 60°C and 72°C. Post washing was either as for standard FISH (1), or with 50% formamide/2xSSC at 60°C, or without formamide.
Hybridization signals with biotin labeled LNA substituted oligonucleotide probes were visualized indirectly using two layers of fluorescein-labeled avidin (Vector Labs) linked by a biotinylated anti-avidin molecule, which amplified the signal 8-64 times. The hybridization of Cy3 labeled molecules, however, was visualized directly after a short washing procedure. Slides were mounted in Vectashield (Vector Laboratories) containing 4',6-diamidino-2-phenylindole (DAPI). The whole procedure was carried out in the dark. The signals were visualized within two days after hybridization using a Leica DMRB epifluorescence microscope equipped with a SenSys charge-coupled device camera (Photometrics, Tucson, AZ), and IPLAB Spectrum Quips FISH software (Applied Imaging International Ltd, Newcastle, UK).
B. RESULTS AND DISCUSSION
Satellite-II DNA, composed of multiple repeats of a 23-bp and a 26-bp sequence, is especially concentrated in the large heterochromatic regions of human chromosomes 1 and 16, but is also found in the heterochromatic regions of chromosomes 9, Y and 15, and in other minor sites such as the short arms/satellites of the acrocentric chromosomes and some centromeric regions. Classical satellite DNA can be visualised by FISH with traditional genomic and DNA oligonucleotide probes (see Kokalj-Vokac N, Alemeida A, Gerbault-Seureau M, Malfoy B, Dutrillaux B. (1993) Two-color FISH characterization of i(1q) and der(1;16) in human breast cancer cells. Genes Chromosomes Cancer. 1, 8-14; and Tagarro I, Fernandez-Peralta AM, Gonzales-Aguilera JJ. (1994) Chromosomal localization of human satellites 2 and 3 by a FISH method using oligonucleotides as probes. Hum Genet. 93(4):383-8). For this reason, and because of the presence of distinct major and minor sites of satellite-2 DNA in the genome, we used the 23-bp satellite-2 repeat sequence, attccattcgattccattcgatc, as a convenient model to test the efficiency of various DNA/LNA mixmers for FISH analysis and the effect of different experimental conditions by recording the number, location and strength of signals on each metaphase. To compare the efficiency of mixmers with different LNA content (Table 19) and to optimize the LNA-FISH protocol, different conditions were tried at each step of a standard FISH protocol as described in Materials and Methods.
All LNA substituted oligonucleotides (LNA/DNA mixmer oligonucleotides) for the human satellite-2 sequence gave very prominent signals when used as FISH probes. In general, the signal on chromosome 1 was always stronger and appeared earlier, followed by signals on chromosomes 16, 9, Y, 15, other acrocentric chromosomes and the centromeric regions of other chromosomes, respectively (Figure 51). Biotin labeled mixmers generally gave stronger signals with a higher background, whereas Cy3-labeled molecules gave a significantly lower background.
B1. Effect of LNA content of the LNA substituted oligonucleotides (LNA/DNA mixmers)
The LNA-2 molecule, which had every other nucleotide modified as LNA (aTtCcAtTcGaTtCcAtTcGaTc), gave the best results in all the experiments performed. The LNA-3 molecule, with every third nucleotide modified as LNA (aTtcCatTcgAtTccAttCgaTc), also gave hybridization signals, but with less efficiency than the LNA-2 probes. Preferably, an LNA-2 oligonucleotide molecule has an LNA unit at every other nucleotide position in the sequence and an LNA-3 oligonucleotide molecule has an LNA unit at every third position of the sequence. However, minor deviations, e.g. in one position or in less than 5-10 percent of the nucleotide positions in the sequence, may still provide the general features of an LNA-2 or an LNA-3 molecule.
The Dispersed LNA (aTtccatTcgaTtccAttcgaTc), which had 5 dispersed LNA modifications, was less efficient in short term hybridization, but gave signals on both chromosomes 1 and 16 after overnight hybridization. The LNA/DNA mixmer with 3 LNA blocks (aTTCcattcgATTccattcGATc) was comparatively inferior as a FISH probe.
B2. Effect of amount of the LNA/DNA mixmers
The initial experiments performed with 20 pmol of LNA/DNA mixmer resulted in bright and large signals, but with an extremely high background. Thus, lower amounts were tested (13.4 pmol, 10 pmol and 6.4 pmol). The amount giving the optimal signal to noise ratio was found to be 6.4 pmol.
B3. Effect of denaturation
The signals on the major sites of hybridization (1q, 16q) were equally bright after both types of denaturation. However, smaller and weaker signals were observed on the minor sites with the simultaneous denaturation protocol. To check the potential "strand invasion" property of LNA, some of the experiments were performed without a denaturation step. As expected, no signals were obtained with the control DNA oligonucleotide probe. In contrast, hybridization signals on chromosomes 1 and 16 were observed after overnight hybridization with LNA probes, with the LNA-2 mixmer giving the best signals. Compared to the signals obtained in experiments involving a denaturation step, the signals were smaller, but prominent and without any background.
B4. Effect of hybridization time, temperature and post-hybridization washes
Although signals could be observed after only 30 min of hybridization, the optimal hybridization time and temperature for LNA-2, which gave the best signals, was 1 hr at 37°C. A 3x5 min wash with 0.1xSSC at 60°C and 4xSSC/0.05% Tween at 37°C, respectively, followed by a 5 min PBS wash was found to be sufficient for washing the slides after hybridization with DNA-LNA mixmers. There was no noticeable difference between a wash with 50% formamide at 42°C or at 60°C.
The signals faded away in most of the slides within two days. When hybridized with directly labeled LNA, the whole slide was stained with Cy3 after three days. Thus, slides had to be analyzed within 48 hours after hybridization.
C. CONCLUSION
The experiments have demonstrated that LNA substituted oligonucleotides are very efficient FISH probes. LNA substituted oligonucleotide probes gave strong signals after only 1 hr of hybridization, and it was possible to omit the use of formamide both from the denaturation and from the post hybridization washing steps and still obtain a very good signal to noise ratio. The ability of LNA to hybridize without prior denaturation could be due to a strand invasion property of LNA, and this warrants further investigation with other LNA probes and under different conditions. Based on the combined results of these experiments, the optimal LNA-FISH procedure was defined as follows: 6.4 pmoles of Cy3 labeled LNA-2 probe was denatured together with the target at 75°C for 5 minutes and hybridized for one hour, followed by a short post wash without any formamide (3x 5 minutes 0.1xSSC at 60°C; 2x 5 minutes 4xSSC/0.05% Tween at 37°C; 5 minutes PBS). The FISH experiments indicate that LNA containing probes would be valuable for the detection of a variety of other repetitive elements, such as centromeric α-repeats and telomeric repeats. In addition, the superior hybridization characteristics of LNA containing oligonucleotides could lead to detection of single base pair differences between repetitive sequences as well as single copy sequences.
C1. Figure 51 shows a comparison of different LNA/DNA mixmer oligonucleotides. Experiment conditions: 6.4 pmoles of Cy3 labeled probe was hybridized for 30 minutes at 37°C, after simultaneous denaturation of the target and the probe at 75°C for 5 minutes. A. LNA-2 giving signals on chromosomes 1, 16, 9 and 15; B. LNA-3 giving bright signals on chromosomes 1, 16 and 9; C. Dispersed LNA giving signals on chromosomes 1 and 16 only; D. LNA Block giving smaller signals on chromosome 1; E. DNA oligo giving no signals on any of the chromosomes.
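The optimal LNA-FISH procedure stated in the Conclusion can be written down as a list of timed steps, which makes the total duration easy to check. The following Python sketch is illustrative only: the `PROTOCOL` structure and the `total_minutes` helper are names assumed here, and the totals cover only the incubation times quoted in the text, not slide handling, mounting or imaging.

```python
# Timed steps of the optimal LNA-FISH procedure quoted in the Conclusion.
# Each entry: (step description, number of repeats, minutes per repeat).
# The data layout is an assumption made for this sketch.
PROTOCOL = [
    ("denaturation of probe and target at 75 C", 1, 5),
    ("hybridization", 1, 60),
    ("wash 0.1xSSC at 60 C", 3, 5),
    ("wash 4xSSC/0.05% Tween at 37 C", 2, 5),
    ("wash PBS", 1, 5),
]


def total_minutes(steps):
    """Sum repeats * minutes over a list of (name, repeats, minutes) steps."""
    return sum(repeats * minutes for _, repeats, minutes in steps)


# Post-hybridization washes only, then the whole timed procedure.
wash_steps = [step for step in PROTOCOL if step[0].startswith("wash")]
wash_minutes = total_minutes(wash_steps)
procedure_minutes = total_minutes(PROTOCOL)

print(f"post-hybridization washes: {wash_minutes} min")
print(f"denaturation + hybridization + washes: {procedure_minutes} min")
```

With the times given in the text, the washes add up to 30 minutes and the timed steps to 95 minutes in total.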
Table 19. DNA/LNA mixmers for human satellite 2 repeat sequence used in this study.
Name            FISH probe sequence        LNA monomers   Tm* (°C)
DNA oligo       attccattcgattccattcgatc    0              60
Dispersed LNA   aTtccatTcgaTtccAttcgaTc    5              71
LNA-3           aTtcCatTcgAtTccAttCgaTc    8              77
LNA Blocks      aTTCcattcgATTccattcGATc    9              73
LNA-2           aTtCcAtTcGaTtCcAtTcGaTc    11             84
LNA modifications are depicted in capital letters. *Tm values for each molecule have been calculated using Exiqon's Tm Prediction program, accessible at http://lna-tm.com/, and as appears from Figure 27 herein.
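The regular substitution patterns described for the LNA-2 and LNA-3 designs can be illustrated with a short sketch. The Python snippet below is a minimal illustration and not part of the original study: the helper names `lna_mixmer` and `count_lna` and the `period` and `offset` parameters are assumptions introduced here. It follows the convention that LNA monomers are written in capital letters, generates a strictly periodic mixmer from the 23-bp satellite-2 sequence, and counts the LNA monomers in each probe of Table 19.

```python
def lna_mixmer(seq: str, period: int, offset: int = 1) -> str:
    """Return seq with every `period`-th base, starting at 0-based index
    `offset`, in upper case to mark an LNA monomer (hypothetical helper)."""
    seq = seq.lower()
    return "".join(
        base.upper() if i >= offset and (i - offset) % period == 0 else base
        for i, base in enumerate(seq)
    )


def count_lna(mixmer: str) -> int:
    """Count LNA monomers, i.e. upper-case letters, in a mixmer string."""
    return sum(1 for base in mixmer if base.isupper())


SAT2 = "attccattcgattccattcgatc"  # 23-bp satellite-2 repeat from the text

# Probe sequences of Table 19. LNA-2 is taken as written in section B1
# (leading 'a' is DNA), which gives the 11 LNA monomers stated in the table.
TABLE_19 = {
    "DNA oligo": "attccattcgattccattcgatc",
    "Dispersed LNA": "aTtccatTcgaTtccAttcgaTc",
    "LNA-3": "aTtcCatTcgAtTccAttCgaTc",
    "LNA Blocks": "aTTCcattcgATTccattcGATc",
    "LNA-2": "aTtCcAtTcGaTtCcAtTcGaTc",
}

lna2 = lna_mixmer(SAT2, period=2)           # every other position
lna3_strict = lna_mixmer(SAT2, period=3)    # strictly every third position
lna_counts = {name: count_lna(seq) for name, seq in TABLE_19.items()}

print(lna2)
print(lna3_strict)
print(lna_counts)
```

Under these assumptions the `period=2` call reproduces the LNA-2 probe exactly. The strictly periodic `period=3` result (`aTtcCatTcgAttCcaTtcGatC`) has the same number of LNA monomers as the LNA-3 probe in Table 19 but places them slightly differently in the second half, which is the kind of minor deviation the text allows for.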
Example 55. Highly efficient fluorescence in situ hybridization (FISH) using an LNA probe specific for human telomere repeat
1. Chromosome Preparations
Chromosome preparations were made by standard methods from peripheral lymphocyte cultures of two normal males. Slides were prepared 1-6 days prior to an experiment and treated with RNase (10 μg/μl) at 37°C for one hour before hybridization.
2. FISH probe preparation
A Cy3-labelled, LNA-2 design of the 24-bp telomere sequence (ttagggttagggttagggttaggg), representing four copies of the 6-bp telomere repeat (ttaggg), was used as a probe. A DNA oligomer of the same sequence without any LNA modifications was used as a control in each experiment.
3. Fluorescence in situ hybridization
FISH was carried out as described previously (Silahtaroglu et al, 1998) with the following modifications. The amount of probe was 5 pmoles. Denaturation of the target DNA and the probe was performed at 75°C for 5 minutes simultaneously under the coverslip in the presence of hybridization mix containing 50% formamide. Slides were washed after 30 min. hybridization at 37°C. Post washing steps included 2x5min 0.1XSSC at 60°C; 5min 2XSSC at 37°C; 3min 4XSSC/0.05%Tween20 at 37°C and 5min PBS. Slides were mounted in Vectashield (Vector Laboratories) containing 4',6-diamidino-2-phenylindole (DAPI). The whole procedure was carried out in the dark. The signals were visualized using a Leica DMRB epifluorescence microscope equipped with a SenSys charge-coupled device camera (Photometrics, Tucson, AZ), and IPLAB Spectrum Quips FISH software (Applied Imaging International Ltd., Newcastle, UK).
4. Results
The human telomere repeat specific LNA oligonucleotide probe gave prominent signals on the telomeres when used as a FISH probe (Figure 52), whereas no signals could be detected with the corresponding DNA control probe under the hybridization conditions specified above. Thus, the experiments described here for the human telomere repeat demonstrate that LNA substituted oligonucleotides are highly efficient as FISH probes.
5. References
Silahtaroglu, A.N., Hacihanefioglu, S., Guven, G.S., Cenani, A., Wirth, J., Tommerup, N., Tümer, Z. (1998) Not para-, not peri-, but centric inversion of chromosome 12. Journal of Medical Genetics 35(8), 682-684.
Example 56. Fluorescence in situ hybridization using chromosome-21 specific centromere LNA probes.
1. Chromosome Preparations
Chromosome preparations were made by standard methods from peripheral lymphocyte cultures of a normal female. Slides were prepared 5 days prior to use. Before use, slides were treated with RNase A (10 μg/μl) at 37°C for one hour and with proteinase K for 10 minutes, then washed with 2xSSC three times for 3 min before dehydration through a cold ethanol series.
2. Probe preparation
A 5' biotin-labelled, LNA substituted 15-mer FISH probe (aCcCaGcCaAaGgAg, LNA uppercase, DNA lowercase) and a 5' biotin-labelled LNA substituted 24-mer FISH probe
(TgTgTaCcCaGcCaAaGgAgTtGa, LNA uppercase, DNA lowercase) specific for the centromeric human chromosome 21 alphaRI(680) locus alpha-satellite repeat were used as probes. Biotin-labelled DNA probes of the same sequence without any LNA modifications were used as controls in each experiment.
3. Fluorescence in situ hybridization
FISH was carried out as described previously (Silahtaroglu et al, 1998) with the following modifications. The amount of probe was 1 μM for the 15-mer chromosome 21 FISH probe and 1.4 μM for the 24-mer FISH probe. Denaturation of the target DNA and the probe was performed simultaneously at 79°C for 4 minutes under the coverslip in the presence of hybridization mix containing 50% formamide. Slides were washed after 40 min. hybridization at RT. Post washing steps included 2x5min 0.1XSSC at 65°C; 5min 3min 4XSSC/0.05%Tween20 at 37°C. Slides were then incubated for 10 min with 1% blocking reagent, and a layer of fluorescein-conjugated avidin (Vector Labs) was applied for 20 minutes at 37°C. After three 3-minute washes with 4XSSC/0.05% Tween20, slides were dehydrated and mounted in Vectashield (Vector Laboratories) containing 4',6-diamidino-2-phenylindole (DAPI). The whole procedure was carried out in the dark. The signals were visualized using a Leica DMRB epifluorescence microscope equipped with a SenSys charge-coupled device camera (Photometrics, Tucson, AZ), and IPLAB Spectrum Quips FISH software (Applied Imaging International Ltd., Newcastle, UK).
4. Results
The LNA substituted 15-mer oligonucleotide probe specific for the centromeric human chromosome 21 alphaRI(680) locus alpha-satellite repeat gave prominent signals on chromosome 21 when used as a FISH probe, whereas no signals could be detected with the corresponding DNA control probe under the hybridization conditions specified above. The LNA substituted 24-mer oligonucleotide probe specific for the centromeric alphaRI(680) locus alpha-satellite repeat gave prominent signals on both chromosomes 13 and 21 when used as a FISH probe, while no signals were observed with the DNA control probe. This is expected, since the aforementioned chromosomes differ at only one nucleotide position in the given probe sequence. On the other hand, the results obtained with the 15-mer LNA FISH probe clearly demonstrate that the LNA substituted probe is capable of discriminating a single mismatch between chromosomes 13 and 21 in the centromeric alpha-satellite repeat.
Thus, the experiments described here for the centromeric repeat-specific LNA probes for human chromosome 21 demonstrate that LNA substituted oligonucleotides are highly efficient as FISH probes and can be used in the diagnosis of chromosome 21 trisomy.
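The relationship between the two chromosome 21 probes can be checked directly from their sequences. The following Python sketch is illustrative and rests on stated assumptions: the variable names are introduced here, and the "mismatch fraction" is only a simple heuristic for why a single mismatch weighs more heavily in a shorter probe. It is not a thermodynamic model and not a calculation from the original study, and the position of the chromosome 13/21 difference is not given in the text, so none is assumed.

```python
# Probe sequences as given in the text (LNA uppercase, DNA lowercase).
PROBE_15 = "aCcCaGcCaAaGgAg"
PROBE_24 = "TgTgTaCcCaGcCaAaGgAgTtGa"

# Locate the 15-mer inside the 24-mer, ignoring the LNA/DNA case marking.
start = PROBE_24.lower().find(PROBE_15.lower())
nested = start != -1

# Check whether the LNA pattern is also identical over the shared region.
same_lna_pattern = nested and PROBE_24[start:start + len(PROBE_15)] == PROBE_15

# Heuristic only: share of probe positions affected by one mismatch.
mismatch_fraction_15 = 1 / len(PROBE_15)
mismatch_fraction_24 = 1 / len(PROBE_24)

print(f"15-mer nested in 24-mer: {nested} (0-based start {start})")
print(f"identical LNA pattern in shared region: {same_lna_pattern}")
print(f"one mismatch = {mismatch_fraction_15:.1%} of the 15-mer, "
      f"{mismatch_fraction_24:.1%} of the 24-mer")
```

The 15-mer turns out to be an exact internal segment of the 24-mer (0-based positions 5 to 19), with the same LNA substitution pattern, so the two probes differ only in the flanking bases of the longer probe.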
Other Embodiments
From the foregoing description, it will be apparent that variations and modifications may be made to the invention described herein to adapt it to various usages and conditions. The foregoing description of the invention is merely illustrative thereof, and it is understood that variations and modifications can be effected without departing from the scope or spirit of the invention.
All publications, patent applications, and patents mentioned in this specification are herein incorporated by reference to the same extent as if each independent publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.