CA2342903A1 - Differential genetic display technique and vector - Google Patents
- Publication number
- CA2342903A1
- Authority
- CA
- Canada
- Prior art keywords
- length
- mono
- library
- gene
- complementary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
Abstract
This invention discloses a method of determining differential display of gene expression comprising comparisons of mono-length cRNA libraries. These libraries are probe hybridized to accessible ordered arrays to determine differential hybridization display sites between mono-length segment libraries and to locate genes of expression differential. Also disclosed is a mono-length library corresponding to or complementary with expressed mRNA sequences having cRNA elements 23 nucleotides in length of the formula 5'-gcuggagaucggnnnnnnnnnnn-3', and wherein n represents one of 11 ribonucleotides in sequence corresponding to a complementary 11 nucleotide sequence of the mRNA.
Description
DIFFERENTIAL GENETIC DISPLAY TECHNIQUE AND VECTOR
Field of the Invention
This invention discloses a method of determining differential display of gene expression comprising comparisons of mono-length cRNA libraries. These libraries are probe hybridized to accessible ordered arrays to determine differential hybridization display sites between mono-length segment libraries and to locate genes of expression differential. Also disclosed is a mono-length library corresponding to expressed mRNA sequences having cRNA elements 23 nucleotides in length of the formula 5'-gcuggagaucggnnnnnnnnnnn-3', and wherein n represents one of 11 nucleotides in sequence corresponding to a complementary 11 nucleotide sequence of the mRNA.
Background of the Invention
The isolation and identification of differentially expressed genes are significant aspects of understanding molecular mechanisms underlying various biological processes, such as cellular growth and differentiation, disease, and cell response to altered environment. During the last ten years, several techniques have been developed to isolate and identify these genes. Differential display (DD), first developed by Liang and Pardee, has proven to be a useful technique for the detection of differentially expressed genes. Reference is made to Science 257:967 (1992), and to US patent No. 5,262,311 to Liang et al., the teachings of which are incorporated herein by reference. This technique allows the detection of differences in RNA expression profiles in multiple samples by the creation of an RNA
fingerprint for each sample. Briefly, partial complementary cDNA sequences are amplified from subsets of mRNAs by reverse transcription using modified oligo(dT) primers. The cDNA is used as a template in PCRs with the original 3' primer and an additional 5' 10-mer of arbitrary but defined sequence. Radiolabeled nucleotide is incorporated into the display reactions, which allows the resulting PCR products to be visualized by autoradiography following standard polyacrylamide gel electrophoresis. Regions corresponding to bands of interest are excised from the dried gel, and the cDNA is amplified using the original primer combination and cloned into a vector. Clones can then be used as probes to confirm differential expression. Since its original description, DD has become an increasingly used methodology for cloning differentially expressed genes. DD has been used in monitoring quantitative gene expression using known cDNA sequences as targets (Science 270:467 (1995)) and as a tool for the identification of differences in normal and cancer gene expression (Science 276:1268 (1997)) (the teachings of such references are incorporated herein by reference).
Compared to other methods used, such as differential screening and subtractive hybridization as noted in J. Natl. Cancer Inst. 80:200 (1988); Nature (Lond.) 348:699 (1990); Proc. Natl Acad. Sci. USA 88:2825 (1991) (the teachings of which are incorporated herein by reference), DD presents technical advantages. However, there have been several reports documenting problems with the prior DD technique.
The main criticism has been a frequent failure to confirm the original expression profile (Cancer Res. 54:1139 (1994)) and significantly high false-positive rates have been reported (Nucleic Acids Res. 22:1764 (1994); Cancer Res. 54:1139 (1994)). One report suggests three main sources of false positives: (1) artificial differences created in the original RNA populations by non-standardized extraction procedures, (2) identical-sized DNA fragments that co-migrate with the band of interest on display gels and (3) DNA contamination introduced into the re-amplification PCR (Miele et al., BioTechniques 25:138 (1998)). The teachings of Miele and the foregoing cited materials are incorporated herein by reference. Indeed, clones derived from an apparently single display band frequently represent a number of different sequences (BioTechniques 16:1096 (1994); Nucleic Acids Res. 22:1764 (1994)). Thus, the method is poorly reproducible and has been difficult to standardize. Several modifications of the original method have been made in attempts to circumvent these problems. Note is made of Curr. Opin. Immunol 7:274 (1995); Biochem. Biophys. Res Commun. 199:564 (1994); PCR Methods Appl. 4:97 (1994), US 5,712,126 to Sherman et al.; US 5,459,037 to Sutcliffe et al.; US 5,695,937 to Kinzler et al. The teachings of the foregoing cited materials are incorporated herein by reference. Chuang et al. have developed a new approach based on hybridization for the analysis of differentially expressed genes (J. Bacteriol. 175:2026 (1993)). Other new procedures for expression monitoring of large numbers of genes have been developed using DNA microarrays (Science 270:467 (1995); Genome Research 6:639 (1996)) or oligonucleotide microarrays (Biochips) (Proc. Natl Acad. Sci.
USA 91:5022 (1994); Nature Biotechnology 14:1675 (1996)). The microarrays are used to analyze the expression of large numbers of genes in a single hybridization experiment. For example, oligonucleotide probe arrays have been used to screen the reverse transcriptase and protease genes of the HIV-1 genome to explore genetic diversity and detect mutations conferring resistance to antiviral drugs. BioTechniques 19(3):442 (1995); Nat Med 2(7):753 (1996). The teachings of the foregoing cited materials are incorporated herein by reference.
Although these alternatives have improved the original method, problems associated with RNA quality, false positives, and reproducibility still remain.
A DD methodology which overcomes shortcomings exposed earlier has now been developed.
Summary of the Invention
This invention comprises a method of determining differential display of gene expression comprising the steps of:
(a) preparing at least two substantially identical accessible ordered arrays of synthetic substantially mono-length oligonucleotide DNA segments representing permutations (and optionally substantially all or all permutations) of possible oligonucleotide sequences,
(b) preparing partial-length first cDNA library being a partial-length segment library corresponding to substantially all complementary expressed mRNA sequences of a first gene source;
(c) using the library of step (b) and preparing mono-length first cRNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of the first gene source;
(d) preparing substantially expression-length first cDNA library corresponding to or complementary with expressed mRNA sequences of the first gene source, wherein said library is a substantially expression-length transcript library;
(e) preparing like-condition partial-length second cDNA library being a substantially partial-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(f) using the library of step (e) and preparing like-condition mono-length second cRNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(g) preparing like-condition substantially expression-length second cDNA library corresponding to or complementary with expressed mRNA sequences of the second gene source, wherein said library is a substantially expression-length transcript library;
(h) probe hybridizing the first and second mono-length segment libraries, each with an ordered array of step (a); and
(i) comparing the probe hybridized accessible ordered arrays to determine differential hybridization display sites between first and second mono-length segment libraries; and,
(k) referencing differential hybridization sites to at least one of said expression-length libraries to locate the gene of expression differential.
In one embodiment the method further comprises amplifying partial-length segments corresponding to cRNA from the differential hybridization display sites by amplification of corresponding hybridization segments of a corresponding partial-length library prior to referencing to at least one of said expression-length libraries to identify the gene of expression differential. It will be understood that in particular an expression-length library is complementary to the mRNA of the gene source and in other embodiments an expression-length library is a corresponding sequence. Conversion of sequences from complementary to corresponding is well understood in the art.
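As an illustration of converting between a complementary and a corresponding sequence, the following minimal Python sketch computes a reverse complement. The function name, the lowercase alphabet, and the example input are assumptions made for this illustration and are not part of the specification.

```python
# Illustrative sketch only; the names here are hypothetical.
DNA_COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.lower()))

# Applying the function twice returns the original (corresponding) strand.
provided = "gctggagatcgg"
complementary = reverse_complement(provided)
assert reverse_complement(complementary) == provided
```

Applying the conversion twice restores the input, which is a convenient sanity check when moving between the two strand orientations.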
Within the practice of the method, a significant determined differential is noted to be a nucleotide binding differential.
In the practice of this invention mono-length libraries are prepared from partial-length libraries by cleaving said partial-length cDNA corresponding to or complementary with expressed mRNA sequences by a remote-site restriction enzyme, such as with Bpm I. It is understood that such libraries can be either cRNA or cDNA depending on the procedures employed. It is further understood that one strand of cDNA will be complementary to the mRNA and one strand will be corresponding.
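The effect of a remote-site cut can be sketched in a few lines of Python. The definitions later in this description state that Bpm I recognizes 5'cuggag3' (ctggag in DNA) and cuts 16 nucleotides downstream on that strand; the sketch uses only that fact. The function names, the toy sequence, and the choice of index 0 as the first transcribed base are assumptions for illustration.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
BPM_I_SITE = "ctggag"  # Bpm I recognition sequence on the top DNA strand
CUT_OFFSET = 16        # nucleotides downstream of the site, per the description

def bpm_i_top_strand_cut(dna):
    """Return the index just past the top-strand cut, or None if there is no cut."""
    start = dna.lower().find(BPM_I_SITE)
    if start < 0:
        return None
    cut = start + len(BPM_I_SITE) + CUT_OFFSET
    return cut if cut <= len(dna) else None

def tag_from(dna, first_base=0):
    """Slice from the first transcribed base up to the Bpm I cut."""
    cut = bpm_i_top_strand_cut(dna)
    return None if cut is None else dna[first_base:cut]

# Toy construct: g + ctggag + at + cgg + 11 insert bases + downstream filler.
construct = "g" + "ctggag" + "at" + "cgg" + "acgtacgtacg" + "tttttttt"
tag = tag_from(construct)
assert len(tag) == 23
```

Counting from the end of the recognition site, the 2 + 3 + 11 bases that follow add up to the 16-nucleotide reach of the enzyme, which is why every tag comes out at the same length regardless of the insert.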
In some embodiments, gene sources for expressed cellular cDNA
from mRNA are from cells in either affected or non-affected states.
Particular note is made of mono-length sequences that are cRNA
elements 23 nucleotides in length of the formula 5'-gcuggagaucggnnnnnnnnnnn-3', wherein "n" represents any nucleotide and the 11 n nucleotides correspond to a complementary 11 nucleotide sequence of mRNA. In some embodiments the mono-length oligonucleotide will be substantially about 13-mer and in others substantially about 23-mer.
One embodiment of a mono-length cRNA segment library comprises a provided nucleotide portion and an inquiry nucleotide portion, and in some embodiments the provided nucleotide portion comprises 5'-gcuggagaucgg-3', or 5'-gcuggagau-3', or optionally at least about 9-mer and the inquiry portion about 14-mer. In some embodiments the inquiry portion comprises a constant and a variable portion and the variable portion is at least about 9-mer or about 11-mer, and, optionally, the constant portion of the said inquiry nucleotide portion is about 3-mer. In some such embodiments it is contemplated that the provided portion comprises 5'cuggag3'. It is further contemplated that the inquiry portion has a 3'-end and a 5'-end, and said 5'-end is 5'cgg3'.
Depending on the embodiment, the inquiry portion is useful in a range of from about 10- to about 36-mer, the provided portion is useful in a range of from about 1- to 20-mer, the variable portion in a range of from about 9- to about 36-mer, and mono-lengths of about 11- to about 56-mer.
Consequently, nucleotide segments of the target array are usefully from about 10- to about 56-mer. The inquiry portion is understood to optionally include a constant portion of about 1- to about 6-mer.
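For the 23-mer embodiment, the division into a 9-mer provided portion and a 14-mer inquiry portion (a 3-mer constant part followed by an 11-mer variable part) can be sketched as follows. This is a minimal Python illustration; the function name, the dictionary keys, and the example probe are assumptions and do not appear in the specification.

```python
import re

# Illustrative sketch only; layout of the 23-mer cRNA probe embodiment:
# provided 9-mer, constant 3-mer, variable 11-mer.
PROBE_PATTERN = re.compile(r"(gcuggagau)(cgg)([acgu]{11})")

def split_probe(probe):
    """Split a 23-mer cRNA probe into its portions, or return None if it does not fit."""
    match = PROBE_PATTERN.fullmatch(probe.lower())
    if match is None:
        return None
    provided, constant, variable = match.groups()
    return {
        "provided": provided,            # 9-mer taken from the vector
        "inquiry": constant + variable,  # 14-mer taken from the cDNA insert
        "constant": constant,            # 3-mer at the 5' end of the inquiry portion
        "variable": variable,            # 11-mer that differs between gene tags
    }

parts = split_probe("gcuggagaucggacguacguacg")
assert len(parts["inquiry"]) == 14
```

A probe of any other length, or one whose first twelve bases differ from the provided and constant portions, is rejected rather than split.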
The method also includes the differential being determined by comparing comparison reporters selected from the group consisting of chemiluminescent labeling detection or radioactive labeling detection of hybridization of cRNA sequences. Such detection permits quantitative gene expression analysis based on reporter differentials from complementary oligonucleotides bound to a membrane or DNA micro array, or combinations of such methods. With such methods, comparison is usefully made by alternating steps of quantifying said comparison reporters and incremental washing between about 40°C and 70°C. Useful increments are intervals of about 5° or 6°C for broader determinations and about 3°C or less for finer determinations. Increments of about 1°C or less, and more particularly about 0.5°C, are useful for the highest level of discrimination, and are useful with careful attention to temperature regulation.
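The alternating quantify-and-wash cycle implies a series of wash temperatures. A minimal Python sketch of such a series follows; the function name and its defaults are assumptions chosen to match the approximate 40°C to 70°C range and the increments mentioned in the text, not a prescribed protocol.

```python
# Illustrative sketch only; the function and its defaults are hypothetical.
def wash_temperatures(start_c=40.0, stop_c=70.0, step_c=5.0):
    """List wash temperatures from start_c upward in steps of step_c, not exceeding stop_c."""
    if step_c <= 0:
        raise ValueError("step_c must be positive")
    # The small epsilon guards against floating-point error in the division.
    steps = int((stop_c - start_c) / step_c + 1e-9)
    return [start_c + i * step_c for i in range(steps + 1)]

broad = wash_temperatures(step_c=5.0)   # broader determinations
fine = wash_temperatures(step_c=3.0)    # finer determinations
finest = wash_temperatures(step_c=0.5)  # highest level of discrimination
```

Each temperature in the list would be followed by a reporter quantification step, so a smaller increment trades more measurement rounds for finer discrimination.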
The method further includes embodiments in which substantially identical ordered arrays of synthetic mono-length oligonucleotides are accessibly affixed to a substrate, optionally on a nylon membrane, or a biochip. Oligonucleotides are usefully attached to substrate by way of an intervening spacer molecule of at least about 6 carbons, or at least about 12 carbons.
In another aspect, the invention comprises a substantially mono-length segment cRNA library wherein the mono-length segments comprise cRNA corresponding to or complementary with substantially all expressed mRNA sequences of a gene source, and optionally, wherein the mono-lengths are substantially about 23-mer. In some embodiments, said mono-length segments are about 14-mer with about 11-mer corresponding to or complementary with the mRNA.
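Whether a set of segments counts as "substantially mono-length" can be checked against a tolerance. The definitions later in this description allow about ±15%, and preferably about ±5%, around the nominal length; the Python sketch below applies that rule. The function name and the example lengths are assumptions for illustration.

```python
# Illustrative sketch only; the function name and sample data are hypothetical.
def is_mono_length(lengths, nominal, tolerance=0.15):
    """True if every length lies within tolerance (a fraction) of the nominal length."""
    return all(abs(n - nominal) <= tolerance * nominal for n in lengths)

# A 23-mer library with one-base variation passes at both 15% and 5%.
observed = [23, 23, 22, 24]
assert is_mono_length(observed, nominal=23)
assert is_mono_length(observed, nominal=23, tolerance=0.05)
```

For a nominal 23-mer, ±15% tolerates up to three bases of deviation and ±5% tolerates one.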
In another embodiment, the invention includes a vector-insertable linker (such as a BssH II recognition site adjacent to) comprising a promoter site such as a T3 RNA polymerase promoter site in functional connection with a Bpm I recognition site readably adjacent to a Cla I recognition site which is adjacent to a Not I recognition site, which is adjacent to a Kpn I recognition site which is adjacent to a T7 RNA polymerase promoter site which is adjacent to a BssH II recognition site. One such vector for the vector-insertable linker is pBluescript II SK+ cloning vector.
Noted is a vector-insertable linker comprising a T3 RNA polymerase promoter site in functional connection with a Bpm I recognition site, adjacent to Cla I/Msp I residual recognition site adjacent to an insert, and particularly a 14-mer insert, from a cDNA probe source adjacent to a Not I recognition site adjacent to Kpn I recognition site adjacent to either a T3 or T7 promoter site which is adjacent to a BssH II recognition site. Optionally, the aforementioned vector-insertable linker is inserted into the pBluescript II SK+ cloning vector.
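The definitions later in this description explain that only fragments with an Msp I end and a Not I end insert into the vector. A simplified in silico digest can illustrate which fragments qualify. The Python sketch below treats a single top strand only and uses the commonly documented cut positions (Msp I: C^CGG, Not I: GC^GGCCGC), which are background facts rather than statements from this specification; all names and the toy sequence are assumptions.

```python
import re

# Illustrative sketch only. Recognition site and top-strand cut offset
# for each enzyme, in lowercase DNA.
ENZYMES = {"MspI": ("ccgg", 1), "NotI": ("gcggccgc", 2)}

def digest(dna):
    """Cut a linear top strand; return (left_end, right_end, fragment) tuples."""
    dna = dna.lower()
    cuts = {}
    for name, (site, offset) in ENZYMES.items():
        # A lookahead also finds overlapping occurrences of a site.
        for match in re.finditer("(?=" + site + ")", dna):
            cuts[match.start() + offset] = name
    positions = sorted(cuts)
    bounds = [0] + positions + [len(dna)]
    ends = ["end"] + [cuts[p] for p in positions] + ["end"]
    return [
        (ends[i], ends[i + 1], dna[bounds[i]:bounds[i + 1]])
        for i in range(len(bounds) - 1)
    ]

def insertable_fragments(dna):
    """Keep only fragments with an Msp I left end and a Not I right end."""
    return [frag for left, right, frag in digest(dna)
            if left == "MspI" and right == "NotI"]

toy_cdna = "aaa" + "ccgg" + "acgtacgtacgtt" + "gcggccgc" + "ttt"
fragments = insertable_fragments(toy_cdna)
```

In this toy case the single insertable fragment begins with cgg, consistent with the constant 5'cgg3' moiety described for the inquiry portion.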
In yet another embodiment, the invention comprises a mono-length cRNA library corresponding to or complementary with expressed mRNA sequences, wherein the library consists of cDNA elements 23 nucleotides in length of the formula 5'-gctggagatcggnnnnnnnnnnn-3', and wherein n represents one of 11 nucleotides in sequence corresponding to a complementary 11 nucleotide sequence of mRNA.
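The cDNA formula here and the cRNA formula given earlier differ only in t versus u. A minimal Python sketch of that conversion is shown for orientation; the function names are assumptions, and the placeholder n is passed through unchanged.

```python
# Illustrative sketch only; function names are hypothetical.
def dna_to_rna(dna):
    """Rewrite a DNA-alphabet sequence in the RNA alphabet (t becomes u)."""
    return dna.lower().replace("t", "u")

def rna_to_dna(rna):
    """Rewrite an RNA-alphabet sequence in the DNA alphabet (u becomes t)."""
    return rna.lower().replace("u", "t")

cdna_formula = "gctggagatcggnnnnnnnnnnn"
crna_formula = dna_to_rna(cdna_formula)
assert rna_to_dna(crna_formula) == cdna_formula
```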
In another embodiment, the invention includes a method of determining differential display of gene expression comprising the steps of:
(a) preparing at least two substantially identical accessible ordered arrays of synthetic substantially mono-length oligonucleotide DNA segments representing all permutations of possible oligonucleotide sequences,
(b) preparing a partial-length first cDNA library being a partial-length segment library corresponding to substantially all complementary expressed mRNA sequences of a first gene source;
(c) using the library of step (b) and preparing mono-length first cDNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of the first gene source;
(d) preparing like-condition partial-length second cDNA library being a substantially partial-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(e) using the library of step (d) and preparing like-condition mono-length second cRNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(f) probe hybridizing the first and second mono-length segment libraries, each with an ordered array of step (a); and
(g) comparing the probe hybridized accessible ordered arrays to determine differential hybridization display sites between first and second mono-length segment libraries; (such as in forensic applications) and in some embodiments,
(h) referencing differential hybridization sites to at least one nucleotide sequence database to identify the gene of expression differential.
With such method it is useful to amplify the mono-length segments of the differential hybridization display sites with corresponding hybridization segments of a partial-length library prior to referencing the database to identify the gene of expression differential. It is understood that a substantially expression-length library database is usefully employed in some embodiments of this aspect of the invention.
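The comparison of two probe-hybridized arrays can be sketched as a site-by-site ratio test. In the minimal Python illustration below, each array is a mapping from an array position to a background-corrected signal. The two-fold threshold, the signal floor, and all names are assumptions made for the example; the specification does not prescribe them.

```python
# Illustrative sketch only; thresholds and names are hypothetical.
def differential_sites(first, second, min_ratio=2.0, floor=1.0):
    """Return {position: first/second ratio} for sites differing by at least min_ratio.

    Signals below `floor` are raised to `floor` so that an absent signal
    does not cause a division by zero.
    """
    sites = {}
    for position in sorted(set(first) | set(second)):
        a = max(first.get(position, 0.0), floor)
        b = max(second.get(position, 0.0), floor)
        ratio = a / b
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            sites[position] = ratio
    return sites

first_array = {"A1": 100.0, "A2": 50.0, "A3": 0.0}
second_array = {"A1": 95.0, "A2": 10.0, "A3": 40.0}
hits = differential_sites(first_array, second_array)
```

Positions returned by such a comparison are the candidates that would then be referenced against a partial-length library, an expression-length library, or a sequence database.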
The invention includes a method of detecting a point mismatch probe nucleotide in short mono-length nucleotides derived from partial-length segment or expression-length segment nucleotide libraries comprising
(a) hybridizing probe nucleotides against an accessible ordered array of synthetic substantially short mono-length target oligonucleotide DNA segments representing all permutations of possible oligonucleotide sequences wherein the probe nucleotides and the target nucleotides are of substantially equal length, and
(b) quantitatively assessing hybridization sites
(c) washing these hybridization sites under stringent conditions
(d) detecting post-washing variation in hybridization sites representative of a point mismatch nucleotide, and, in particular embodiments further
(e) referencing the site variation(s) to a partial-length or expression-length segment of at least one of the libraries.
In this method, optionally, the probe nucleotide has a constant portion and an inquiry portion, and the target site oligonucleotide has a portion complementary to the constant portion of the inquiry oligonucleotide. In such method, the washing of step (c) usefully further comprises the step of competitively inhibiting hybridization of the provided portions.
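Steps (b) through (d) amount to measuring each site before and after a stringent wash and flagging sites that lose an unusual share of their signal. The Python sketch below shows one way to express that; the retained-fraction threshold of 0.5 and all names are assumptions for illustration, since the specification gives no numeric cutoff.

```python
# Illustrative sketch only; the threshold and names are hypothetical.
def post_wash_losses(before, after, max_retained=0.5):
    """Return {site: retained fraction} for sites keeping at most max_retained of their signal."""
    flagged = {}
    for site, pre in before.items():
        if pre <= 0:
            continue  # nothing hybridized here, so nothing to compare
        retained = after.get(site, 0.0) / pre
        if retained <= max_retained:
            flagged[site] = retained
    return flagged

pre_wash = {"s1": 200.0, "s2": 100.0, "s3": 0.0}
post_wash = {"s1": 190.0, "s2": 20.0}
candidates = post_wash_losses(pre_wash, post_wash)
```

A site that drops sharply while perfectly matched sites hold their signal is a candidate for a point mismatch, which step (e) would then trace back to a library segment.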
In a particular embodiment the invention comprises a mono-length gene tag library, wherein the mono-length gene tags are either cDNA or cRNA. The mono-length insert segments are in specific embodiments about 23-mer for cRNA and about 23-bp for cDNA, and optionally about 14-mer or 14-bp segments respectively correspond to or are complementary with expressed mRNA sequences of a gene source. In one embodiment of a cRNA mono-length library, mono-length segments are substantially 23-mer in length.
Brief Description of the Drawings
Fig. 1(A)-(F) is a diagrammatic flow chart of the differential display technique in stages.
Fig. 2(A)-(D) provides sequence information for specific vector, probes and targets.
Fig. 3 is a flow chart of procedures in a particular embodiment.
Fig. 4(A), (B) and (C) depict accessible array test results.
Detailed Description of the Invention
This invention will be better understood with reference to the following definitions:
A. Gene is a unit of genetic information. In one embodiment it is a region of DNA that encodes information for a discrete gene product that is a protein or RNA. Depending on the organism, genes can reside in DNA or RNA form. As used herein, genes shall be expansively understood to mean genetic information that encodes cellular products which are required for sustaining normal cellular processes in all living organisms. Genes are typically organized into coding and non-coding sequences (known as exons and introns, respectively) and are also comprised of other elements which control expression (e.g., promoters, enhancers, operons, and other regulatory regions). Particular reference is made to the use of the present technology as applicable to the analysis of gene expression in any organism including without limitation, viruses, bacteria, fungi, plants, and animals including humans, and further including neoplasms and cell cultures including chimeric cell types.
B. Expressed genes or expressed nucleotide sequences as to a gene source shall mean the mRNA expressed in a living cell or in a cellular system under the conditions of culture. For stability and ease of manipulation, in many embodiments the mRNA expressed is then reverse transcribed into complementary DNA (cDNA) for differential diagnostic procedures.
C. Differential Display of gene expression shall mean an assay system which detects differences in the level or type of expression of particular genes by comparison of two populations of cells. Typically, DD is capable of monitoring a wide range of expressed genes up to and including the entire complement of expressed sequences in a given cell population. Because only expressed genes are monitored in relation to a particular trait by the present technique, DD represents a functional genomics approach with respect to gene discovery experimentation. DD by the present method permits the identification of single and multiple genes whose regulation has been perturbed and/or are involved in deregulation. Deregulation may involve gross or subtle changes in gene expression including under-expression (including zero expression) and over-expression (including always present expression).
D. Detected differentials arise from a number of affected and non-affected gene sources. Affected cells shall mean cells which are diseased, or which have been induced by chemical agents, toxins, biological agents, etc. Non-affected shall mean the opposite, or lack, of the particular property under study in the affected cell. Non-affected shall also mean what is generally referred to in scientific studies as a control subject, as e.g. in a case-control association study. Again, without being bound by any particular theory, it is believed that in particular cases such differentials will arise from, without limitation,
(i) gene induction or repressor agents (e.g. agents which through their presence can activate, or repress, gene transcription, thereby causing altered or deregulated metabolism)
(ii) chemical toxins (e.g. chemical compounds which through their presence in sufficient concentration will poison the cell)
(iii) physical stress (e.g., cold or heat shock, UV exposure etc.)
(iv) disease or resistance state (e.g. susceptibility to disease or reduced vigor caused either by genetic factors or by pathogens, viruses, bacteria, diet and other environmental factors)
(v) tissue specificity (e.g. expression which is specific to brain, muscle, liver, spleen, etc.)
(vi) strain or varietal specificity (e.g. particularly in Ag-bio applications: weight, height, color, etc.)
(vii) developmental specificity (e.g. genes specifically expressed during specific developmental time periods such as prenatal, young, adult, aged etc.), and
(viii) gene mutation (e.g. deletions, insertions, expansions and amplifications, point mutations (transitions, transversions), missense mutations, nonsense mutations etc.).
E. Library shall mean substantially the totality of genetic material in the sample which the library defines. Thus an expressed gene library shall represent substantially the totality of the expressed genes. It is understood that depending on the library construction methodology, some segments of genes will not be included in the library. By way of example, a library constructed from Msp I and Not I and a vector cloning site compatible with these enzymes results in a library that is greater than about 90% (estimated) representative of the totality of expressed sequences. The proportion of representation depends on multiple factors including the cutting frequency of the enzymes used, the nature of cut sites and cloning efficiency. The palindromic recognition site of Msp I is compatible with CpG islands which tend to be located in gene-rich regions of the genome. Additionally, since Not I is a rare cutter and Msp I is a frequent cutter, the vast majority of fragments will be Msp I - Not I. In the pALLgenes construct disclosed, only Msp I-Not I fragments will insert into the vector. The remainder of fragments will be Not I - Not I fragments or fragments which do not have an Msp I site, both of which are not vector insertable and thus are not represented in the library.
F. Substantially mono-length segment library shall mean a library comprising gene tag probes, either cDNA or cRNA, that are short (from about 9-mer to about 56-mer, or -bp as to DNA) and uniform in length. Uniform in length shall be understood to mean either exactly equal, or in particular instances, about ±15%, and preferably about ±5%. Such mono-length segments function as probes and are gene tags representing expressed mRNA sequences from a gene source. Mono-lengths of cRNA are used for hybridization against an immobilized array of complementary target sequences. In the context of this invention, the shortness and uniformity of length favors the specificity and reproducibility of probe to target hybridization. While it is assumed that an ideal hybridization takes place when both probe and target are exactly the same length, this is not required in every embodiment. Some variation in length (e.g. ±1 or 2-mer) will be tolerated in some cases (depending on the application) while still producing a high degree of specificity and reproducibility. It will be understood by those skilled in the art that the direction of transcription creates either a complementary or corresponding mono-length library relative to the mRNA.
G. Provided nucleotide portion shall mean the nucleotide sequence specific to the probe which corresponds to a segment of vector sequence before recombinant insertion of any foreign sequence. In the embodiment of Fig. 2, the sequence of the probe (or mono-length segment) is 5'-gcuggagaucggnnnnnnnnnnn-3', which is exactly 23 nucleotides in length. Starting from the 5' end, the probe sequence breaks down as follows:
1) 5'g3' is a provided nucleotide portion: it is the last nucleotide of the T3 promoter site and the first base to be polymerized by the T3 RNA polymerase enzyme. The complete T3 RNA polymerase promoter site is 5'aattaaccctcactaaaggg3'.
2) 5'cuggag3', an element of the provided nucleotide portion, is the recognition site of Bpm I restriction enzyme. Bpm I docks onto this site and cuts 16 nucleotides downstream from this site (or 14 nucleotides on the opposite 3' strand).
3) 5'au3' is an element of the provided portion. This sequence is part of the Cla I recognition site. Prior to insertion, the full Cla I
recognition site is 5'atcgat3'.
4) 5'cgg3' is the constant moiety of the inquiry nucleotide portion.
Because it was originally the 5' end of the DNA (5' Msp I - Not I 3' cDNA fragment) and becomes part of the Cla I / Msp I recognition site, it is constant. Starting from 5'cgg3' and proceeding all the way to the 3' end of the probe sequence, the inquiry portion is then a 14-mer and corresponds to the cDNA-inserted portion of the vector.
5) 5'nnnnnnnnnnn3' is the variable moiety of the inquiry portion in this example. It is cRNA corresponding to the first 11 bases after the Msp I recognition site at the 5' end of the Msp I - Not I cDNA fragment reverse-transcribed from mRNA of the genomic material being examined. The n's represent the nucleotide or ribonucleotide bases (i.e., n = a (adenine), g (guanine), c (cytosine), and t (thymine) or u (uracil); t with DNA, u with RNA); thus there are 4^11 permutations of possible sequence of the 11-mer nucleotide.
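The breakdown of the Fig. 2 probe into its five elements can be sketched in Python as follows. This is an illustrative sketch only: the labels in `PROBE_LAYOUT`, the function `split_probe`, and the example variable moiety are hypothetical, while the element lengths are those listed in items 1) to 5).

```python
# Element lengths follow the Fig. 2 breakdown; the labels are hypothetical.
PROBE_LAYOUT = [
    ("t3_start", 1),    # 5'g3', last base of the T3 promoter site
    ("bpm1_site", 6),   # 5'cuggag3', Bpm I recognition site
    ("cla1_part", 2),   # 5'au3', part of the Cla I recognition site
    ("constant", 3),    # 5'cgg3', constant moiety of the inquiry portion
    ("variable", 11),   # 5'nnnnnnnnnnn3', variable moiety of the inquiry portion
]


def split_probe(probe):
    """Split a 23-mer probe into its named elements, in 5' to 3' order."""
    expected = sum(size for _, size in PROBE_LAYOUT)
    if len(probe) != expected:
        raise ValueError(f"expected a {expected}-mer, got a {len(probe)}-mer")
    parts, pos = {}, 0
    for name, size in PROBE_LAYOUT:
        parts[name] = probe[pos:pos + size]
        pos += size
    return parts


# Hypothetical probe: the n's are filled with an arbitrary 11-mer.
parts = split_probe("gcuggagaucggacguacguacg")
inquiry = parts["constant"] + parts["variable"]   # the 14-mer inquiry portion
variable_tags = 4 ** len(parts["variable"])       # number of possible 11-mers
print(parts, len(inquiry), variable_tags)
```

The provided portion is the first nine bases (1 + 6 + 2) and the inquiry portion the remaining fourteen (3 + 11), which together give the 23-mer.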
H. Inquiry nucleotide portion shall mean the nucleotide sequence specific to the probe which corresponds to the insert sequence after recombinant insertion of a foreign nucleotide sequence. The inquiry nucleotide portion is variable for any given probe generated from a given vector. In the embodiment of Fig. 2, the inquiry nucleotide portion is a 3' cRNA tag since its sequence is ultimately derived from the 3' reverse-transcribed portion of expressed mRNAs. In particular embodiments "short" mono-length segments, that is from about 11- to about 56-mer, are useful, with particular reference to 9- to 13-mer, and particularly about 9-mer. Such short mono-length segments consist of a "provided nucleotide portion" of from about 1- to about 20-mer. A provided nucleotide portion is made continuous with an "inquiry nucleotide portion" of from about 10- to about 36-mer, with particular reference to about 12- to about 16-mer and more particularly about 14-mer. The inquiry portion can be further segregated into a variable portion or moiety and a constant portion or moiety. The variable moiety is noted in particular embodiments as being from about 9- to about 36-mer with particular reference to about 11-mer and the constant portion of about 1- to about 6-mer with particular reference to about 3-mer.
I. Partial-length segment library shall mean a cDNA library generated by cloning of partial, as opposed to complete, sequence representations of expressed RNAs. By way of example, Msp I - Not I DNA
fragments cloned into the pALLgenes vector (Example 3) constitute a partial-length segment library. The construction of the pALLgenes vector is defined in Example 2. The partial-length segment library is, in a particular embodiment, a precursor library used to generate the short mono-length cRNA segment library by digestion with Bpm I
followed by polymerization by T3 RNA polymerase enzyme. It is also the library used in an intermediate selective amplification step in one back figuring strategy used to obtain the entire sequence of the differentially expressed gene (Fig. 3). It is noted that Not I shall mean the restriction endonuclease or enzyme from Nocardia otitidis-caviarum. Msp I shall mean the restriction endonuclease or enzyme from Moraxella species. Cla I is the restriction endonuclease or enzyme from Caryophanon latum. Bpm I shall mean the restriction endonuclease or enzyme from Bacillus pumilus which was obtained from New England Biolabs (Degtyarev, S.K. and Morgan, R.; NEB
Culture Collection 711, Beverly, MA, USA). T3 RNA polymerase shall mean the bacteriophage T3 DNA-dependent RNA polymerase (Morris, C.E. et al., Gene 41: 193) that recognizes a bacteriophage-specific promoter and initiates synthesis of RNA on double-stranded DNA templates. T3 initiates synthesis of RNA in a particular orientation on a double-stranded DNA template. Examples of other RNA polymerases include bacteriophage SP6 and T7 DNA-dependent RNA polymerase (Butler, E.T. and Chamberlain, M.J., J. Biol. Chem.
257: 5772; Davanloo, P. et al., Proc. Natl. Acad. Sci. 81: 2035).
WO 00/14273 PCT/CA99/00789
J. A library of substantially expression length sequences shall mean a library comprising a substantially complete representation of substantially complete sequence mRNA. This library, reverse-transcribed to cDNA, is conveniently cloned into a suitable phage vector for subsequent retrieval of the predominantly 5' end of the RNA sequence using a back figuring strategy (Fig. 3). Using this strategy, the entire sequence of any differentially expressed gene can be obtained by selective amplification and sequencing since this library contains substantially full-length cDNA sequencing templates.
As RNA is relatively less stable than DNA and may possess secondary structures which inhibit or prevent elongation by polymerase enzymes, manipulation artifacts may cause these templates to be substantially, as opposed to completely, full-length.
However, when required, obtaining a complete sequence can be achieved by supplementary strategies and steps. For example, it is possible to generate an expression-length cDNA library using 5'-end mRNA-based capture methods (Edery, I., Chu, L.L., Sonenberg, N., and Pelletier, J., "An efficient strategy to isolate full-length cDNAs based on an mRNA Cap Retention Procedure (CAPture)," Mol. Cell.
Biol. 15: 3363-3371, 1995). Of course, with the availability of searchable databases of expression length sequences or other representative sequences, in some embodiments, it is possible to search directly from partial-length sequence to database without need for hybridization to a substantially expression-length library.
K. DNA shall be expansively understood to mean a molecule of deoxyribonucleic acid, comprising the component purines and pyrimidines a, t, g, and c in any combination. DNA is single-stranded or double-stranded, and in certain systems, triplex. Sources of DNA
include genomic sources (e.g., from the cell nuclei of living organisms), recombinant vectors (e.g., plasmids, phage, cosmids, BACs, YACs, etc.) or artificial synthesis products (e.g., in vitro polymerized sequences such as PCR products, sequencing reaction products, etc.). A string of DNA is polymerized by addition of nucleotides. In some instances, the term nucleotides is broadly intended to encompass ribonucleotides, particularly in discussions of RNA.
L. RNA shall be expansively understood to mean a molecule of ribonucleic acid comprising the component purines and pyrimidines a, u, g, and c in any combination and particularly messenger ribonucleic acid (mRNA). RNA is generally transcribed from DNA. mRNA is single-stranded and normally contains a poly-A tail (i.e., a string of adenines) at its 3' end. A string of RNA is polymerized by addition of ribonucleotides.
M. Like-condition replication shall mean substantially similar culture and library preparation conditions but for a perturbation in the comparison replication (e.g., induced vs. non-induced, diseased vs. normal, trait-plus vs. trait-minus, etc.). In the method of this invention, care in preparing corresponding affected/non-affected cell libraries reduces or eliminates extraneous differentials of expression.
N. bp and -mer shall mean the abbreviated forms of the terms base pair(s) and oligomer(s), respectively. The number of oligomers shall mean the number of nucleotides contained in a given single-stranded DNA
or RNA string. For example, a 20-mer refers to a single-stranded DNA or RNA sequence comprising 20 bases. A base pair refers to the number of nucleotides contained in a given double-stranded complementary DNA string. Thus, a 20-bp fragment refers to a double-stranded DNA sequence comprising 20 complementary base pairs.
O. Remote-site restriction endonucleases shall mean enzymes which do not fall within the normal category of restriction endonucleases which typically recognize palindromic sequences and cut within the palindromic sequence. Remote-site restriction enzymes typically do not require perfect palindromes as recognition sites and typically cut at a distance from their specific recognition site. In the practice of this invention it is contemplated that these remote distances are in a range of between about 3 and about 50 nucleotides as being particularly useful. Particular note is made of Bpm I with a recognition site of 5'ctggag3' and also of the isoschizomer Gsu I.
Examples of other remote-site restriction enzymes include Bsa I, BseR I, BsmF I, Fok I, Hga I, Hph I, Mbo II, Mnl I, Ple I and SfaN I.
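The way a remote-site enzyme fixes the cut position relative to its recognition site can be sketched in Python. This is a minimal sketch under stated assumptions: the function `remote_cut_sites` and the example template are hypothetical, the default offsets are the Bpm I values given in this description (16 nucleotides downstream, 14 on the opposite strand), and only sites on the strand supplied are considered.

```python
def remote_cut_sites(dna, site="ctggag", top_offset=16, bottom_offset=14):
    """Locate cuts made by a remote-site enzyme on the given strand.

    Returns (top_cut, bottom_cut) index pairs counted from the start of `dna`.
    Reverse-strand recognition sites are ignored for brevity.
    """
    dna = dna.lower()
    cuts = []
    start = dna.find(site)
    while start != -1:
        site_end = start + len(site)
        top_cut = site_end + top_offset
        if top_cut <= len(dna):              # skip sites too close to the end
            cuts.append((top_cut, site_end + bottom_offset))
        start = dna.find(site, start + 1)
    return cuts


# Hypothetical template: recognition site, 2 linker bases, then a 20-base insert.
template = "ctggag" + "at" + "cggacgtacgtacgttttaa"
top_cut, bottom_cut = remote_cut_sites(template)[0]
print(template[:top_cut])   # recognition site plus the 16 bases retained
```

Because the offset is constant, every insert is truncated to the same length regardless of its sequence, which is what makes the resulting segments mono-length.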
P. An ordered array or oligonucleotide array detector refers to an ordered array of specific oligonucleotides affixed or immobilized on a substrate suitable for carrying out multiple hybridizations at one time. "Specific" as to oligonucleotides means a collection of oligonucleotides which comprise all possible permutations of nucleotide sequences. The ordered function arises from identified oligonucleotides being attached at specific and known loci. Substrate shall mean any solid phase surface to which oligonucleotides, whether with or without linker or spacer groups, may be attached, e.g., nylon membrane, glass slide, biochip. Attachment methods include chemical treatment (e.g., EDC), UV cross-linking, covalent bonding, or other methods which allow the oligonucleotide to be stably immobilized or affixed onto a solid phase.
Q. Accessible array shall mean that the oligonucleotides of an ordered array are positioned so as to be available for hybridization with minimized steric or mechanical hindrance. Accessibility is achieved by a variety of means including attaching a spacer molecule of from about 2 or 3 carbons to as many as 20 or more carbons between the oligonucleotide and the substrate. A carbon spacer molecule refers to, as an example, an N-MMT-protected aminododecanol phosphoramidite which is added to the 5' end of an oligonucleotide during its synthesis.
R. Indicator of complementarity shall mean a reporter system for the purposes of establishing the occurrence and the amount of hybridization resulting from complementary base-pairing of target to probe sequences. As examples, the reporter may be a signal emitter such as a radioactive label, fluorescent label or chemiluminescent label physically attached to the probe so as to permit signal detection by a fluorescence scanner, confocal microscope, phosphor screen, photographic paper or other signal detection system. In some embodiments, the indicator of complementarity is of sufficient sensitivity and dynamic range so as to be capable of detecting a minimum of hybridization (including zero complementarity) to a maximum of hybridization (including all sequences hybridized).
S. Incremental wash conditions shall mean washing the hybridized array with incrementally stringent conditions, usually by changing temperature or salt concentrations in the wash buffer. At each incremental wash step, the hybridization pattern is monitored and recorded and subsequently analyzed as to changes. By comparing target sequence information and the different hybridization patterns obtained under different stringency conditions, a determination of the differential expression signals is made. By carefully monitoring small changes over small increments in wash conditions, it is possible to distinguish false positive signals (e.g., differences due to SNPs or GC-rich sequences) and false negative signals (e.g., differences due to secondary structure or rare RNAs) from actual signals.
T. Target shall mean a DNA sequence immobilized onto a solid phase substrate for hybridization with complementary sequences. In the context of one example provided, the target shall be a correspondingly short complementary sequence substantially identical in length to the probe (here, about 23-mer). However, given cost and feasibility constraints related to working with conventionally-synthesized oligos and membranes, it is also contemplated that the target is not always of identical length (ranging from about 8-mer to about 23-mer in particular examples).
In such circumstances, specificity and reproducibility of probe to target hybridization is augmented with "blocker" oligos added during the hybridization step to mask complementary sequences (competitive inhibition) in the provided nucleotide portion of the probe, thus preventing this portion from interfering with base-pairing of the inquiry nucleotide portion with the target (Fig. 2).
U. Complex traits shall mean physiological, physical, or metabolic traits which cannot be explained by a single causal gene but rather by multiple genetic and/or environmental factors. For example, a trait may arise as a result of multiple gene products interacting, and/or interacting in a dose-dependent manner. Dissection of complex traits, therefore, requires special tools that monitor multiple gene events, including inactivity, activity, and level of activity, simultaneously.
V. Gene site anomaly shall mean a DNA sequence which is deviant from the normal, control or non-affected state, e.g., single nucleotide polymorphisms (SNP), point mutations including transitions and transversions, deletions, etc. In some instances, a gene site anomaly causes lack of expression, over-expression, silent and functional changes in protein sequences, differential splicing of RNA, etc.
An overview of the process of the technology is presented in Fig. 3.
Fig. 3 discloses one line of cells to be compared as to affected and non-affected states or populations. First, mRNA is extracted and reverse-transcribed to generate a cDNA library as to both the affected and non-affected states. This cDNA is then used to generate an expression-length library for each population consisting of complete gene sequences and a partial-length library consisting of the 3' end of gene sequences. The latter library is digested with a remote-site cutter followed by polymerization with an RNA polymerase to generate a short mono-length library consisting of cRNA probes. These probes are used to hybridize against a set of complementary and specific oligonucleotide targets. Hybridization patterns, conveniently on a chip or microarray, are compared between populations of cells in order to determine differential gene tag signals corresponding to a gene or genes of interest. If desired, these signals can be traced back to an address on the target array and thus an oligonucleotide sequence. The latter sequence is used to produce one of the primers required for selectively amplifying and sequencing the 3' portion of each gene of interest from the partial-length library. In turn, the 3' sequence of each gene of interest is used to generate one of the primers required for selectively amplifying and sequencing the remaining 5' portion of each gene of interest from the expression-length library. If the gene of interest is known, comparing the obtained sequence against available genomic databases will identify the differentially expressed gene and possibly its function. If the gene of interest is novel, further experimentation is initiated to identify its function with respect to the trait of interest.
In a broad embodiment, the technology entails generating a library of substantially mono-length cRNA gene tags using a vector and remote-site restriction enzyme which cuts at a fixed distance downstream from its recognition site. Libraries of such substantially mono-length gene tags are compared between or among two or more populations of cells by hybridization to respective ordered arrays of specific oligonucleotide target sequences. Such oligonucleotides are conveniently immobilized onto a substrate such as a membrane, a chip, or the like to provide stable loci for an "indicator of complementarity." The presence or absence of complementarity, often as disclosed by hybridization patterns on an immobilized oligonucleotide array under incremental wash conditions, is noted. These patterns and changes in these patterns, with particular reference to differentials, are analyzed for changes in signal intensity.
Clearly, such analysis is facilitated by the use of computers.
In some embodiments, analysis is performed as described in U.S.
5,795,716 to Chee; U.S. 5,800,992 (Affymetrix); U.S. 4,74,043 to Bacus;
Drmanac et al., "An algorithm for DNA Sequence Generation from k-Tuple Word Contents of the Minimal Number of Random Fragments," J. of Biomolecular Structure & Dynamics, 8:1085-1102 (1991); and Southern et al., "Analyzing and Comparing Nucleic Acid Sequences by Hybridization to Arrays of Oligonucleotides: Evaluation Using Experimental Models,"
Genomics, 13:1008-1017 (1992), the teachings of which are incorporated herein by reference.
Target sequences which show significant differences in hybridization patterns between the compared cell populations are traced back to the appropriate gene tag. A corresponding full-length nucleotide sequence is subsequently retrieved by a first hybridization with a partial-length gene source library and then from an expression-length database or source library. In some embodiments using libraries, PCR and sequencing using primer sequences respectively corresponding to a) the gene tag sequence of interest and an internal vector sequence and b) the obtained 3' mRNA
sequence and an internal vector sequence is useful.
In a particular embodiment, the invention comprises generating a library of 3'-specific short mono-length cRNA probes (gene tags) using a cloning vector and the remote-site restriction enzyme Bpm I. Bpm I cuts 16 nucleotides downstream from its recognition site. For a two population comparison, libraries of such gene tags derived from two populations of cells are compared by parallel hybridizations to ordered arrays of specific oligonucleotides immobilized onto a membrane or a chip. The presence or absence of hybridization signals on an array under incremental wash conditions is monitored and analyzed and gene tags which show significant differences in hybridization patterns between the two cell populations are traced back to the appropriate specific oligonucleotide target. The corresponding full-length RNA sequence is subsequently retrieved from prepared cDNA libraries, usually in two steps, by selective PCR and sequencing. This technology permits the rapid and cost-effective identification of candidate genes associated with different states of induction or disease.
In one embodiment a specialized cloning vector, pALLgenes™, is formed as follows. This specialized vector is a modified pBluescript II SK+
cloning vector (Short, J.M. et al., Nucleic Acids Res. 16:7583-7600 (1988)) in which the original linker is replaced. It is understood that other cloning vectors with replaceable linkers are also useful, including by way of example, pGEM, pUC19, and pBR322.
The choice of replacement linker is an important enabler of the pALLgenes vector because it allows the insertion of Msp I / Not I
fragments, and the subsequent generation of mono-length cRNA probe sequences. Other useful linkers are those which contain enzyme recognition sites, arranged in the proper spatial arrangement, so as to permit, in order, the ligation of cDNA restriction fragments from a gene source, the linearization of the vector by action of a remote-site cutter, and the polymerization of mono-length sequences by action of an RNA
polymerase. The spatial arrangement of the enzyme recognition sites, as well as special design of the enzymes themselves, determines the final overall length of the mono-length sequences. By way of example, the probe length is shortened if the choice of enzymes is one in which their recognition sites overlap in the vector. Conversely, and also by way of example, the enzymes are selectively engineered such that they will be compatible with overlapping recognition sites.
In some embodiments, the linear amplification of inserts during the cloning process permits the detection of quantitative differences in mRNA
expression, including the detection of both abundant and rare or less-represented mRNA transcripts. Because rare mRNAs are less abundant in the overall ratio of the library transcripts, they tend to be more difficult to detect and can sometimes fall below the threshold of the limits of detection. Linear amplification of the library promotes the detection of rare transcripts while largely conserving the actual proportional relationship of transcript abundance in the gene source.
The particular spatial arrangement of the Bpm I restriction enzyme in the pALLgenes vector is useful in generating a library of short mono-length probes. Hybridization of these probes to an array of correspondingly short and uniform length target sequences tends to be specific and reproducible compared to hybridizations obtained with variable length probes and targets.
The use of short mono-length probes and targets which are similarly short and mono-length improves upon conventional differential display techniques which tend to generate a multitude of false positive and false negative signals, thereby making the resulting data more difficult to interpret. Without being bound by any particular theory, it is believed that false positives and false negatives are caused by non-specific hybridization of variable, random, and longer mRNA probes since multiple sites will be randomly recognized by a coincidentally complementary target in a longer and more variable length probe. Having a tightly restricted range of probe and target length reduces the overall complexity and randomness of hybridization such that the resulting hybridization patterns are available to be unambiguously interpreted and reproduced.
The particular gene structural region represented by the probe is useful for detecting functionally important differential sequences or expression. Without being bound by any particular theory, and in view of the observations which suggest that 3' and 5' end sequences are more likely to be divergent, there is less interference from false positive hybridization signals arising from gene homolog sequence as the probes generated by this technology in one embodiment represent the 3' end of expressed gene sequences. Also, because the gene extremities are less conserved, it is more likely that sequence differences will be present in the gene regions represented.
In a particular embodiment, the Bpm I enzyme can also be used to generate a library of short mono-length cDNA fragments. Such a library is generated by cutting the cDNA-inserted vector such as the pALLgenes vector with Bpm I, digesting the linearized vector with Not I to remove a section of the inserted cDNA followed by religation of the vector. This procedure results in a cDNA library of gene tags representing equivalent sequences to the cRNA probes described above. Such a library is useful for generating a complementary set of cRNA probe sequences by polymerization from the reverse orientation using T7 RNA polymerase and the T7 polymerase promoter site located within the replacement linker.
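The relationship between a gene tag and its complementary counterpart can be illustrated by a reverse-complement calculation. The Python sketch below is illustrative only: the function name `complementary_tag` is hypothetical, and the exact sequence of a T7 transcript from the religated vector would also depend on flanking vector sequence, which is not modelled here.

```python
# Base-pairing table for RNA: a-u and g-c.
RNA_COMPLEMENT = str.maketrans("acgu", "ugca")


def complementary_tag(tag):
    """Return the reverse complement of an RNA tag, written 5' to 3'."""
    return tag.lower().translate(RNA_COMPLEMENT)[::-1]


# Hypothetical 14-mer inquiry portion and its complementary sequence.
tag = "cggacguacguacg"
print(tag, complementary_tag(tag))
```

Applying the function twice returns the original tag, which reflects that the two probe sets are complements of each other.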
In a particularly efficient embodiment, use of floating targets and incremental stringency increases specificity. The use of a spacer molecule attached to each oligonucleotide sequence on the membrane array favors probe hybridization to its complementary target. Particular note is made of 6- to 20-carbon spacers, with more particular reference to 12-carbon spacers. Oligonucleotide attachment to the membrane is particularly effective with 12-carbon spacers, and this attachment is stable under different wash conditions. It is believed that the spacer enables the entire oligonucleotide sequence to "float" freely over the membrane during hybridization and thus allows maximum availability for complementary sequence hybridization with the probe (Zhang et al., "Single-base mutational analysis of cancer and genetic diseases using membrane bound modified oligonucleotides," Nucl. Acids Res. 19: 3929-3933 (1991), the teachings of which are incorporated herein by reference). Specific conditions to optimize complementary sequence hybridization with the probe vary with membrane type, oligonucleotide concentration, spacer type, and spacer attachment conditions. Different membrane types will accept attachment of spacer-linked oligos with varying degrees of effectiveness, thereby affecting the final amount of target attached to the membrane. The length of the spacer molecule will affect the degree to which the complete nucleotide sequence of the oligo target is exposed and accessible to the complete nucleotide sequence of the probe. Efficient probe hybridization will also depend on the relative annealing strengths of complementary base-pairs. The stronger hydrogen bonding of gc compared to at base-pairing means that gc-rich probe sequences will tend to hybridize to their complementary targets at a greater efficiency in spite of other factors such as spacer length or oligo attachment method.
Optimization of these parameters increases the possibility of detecting hybridization differentials due to a single nucleotide mismatch.
In particular embodiments, the specificity of probe-to-target hybridization is further increased by subjecting the hybridized membrane or array to a series of increasingly stringent wash steps at incremental temperature settings. Under these conditions, the specificity of probe-to-target hybridization is very high and is able to discriminate down to single base mismatches. This feature can be employed as a "variable specificity"
adjustment, according to the particular requirements of the application involved. That is, wash temperature increments on the order of about every 0.5°C to about 1°C, or about 3°C, are useful for single base mismatches, with about 3°C or greater increments for more gross differentiation.
By way of example, assays seeking to detect single mismatches in a collectively unknown sample of probes would be monitored at finer increments than an assay seeking to detect a deletion comprising several nucleotides of sequence, which could be effectively monitored at larger temperature intervals. At each stringency increment the hybridization pattern is recorded. Subsequent analysis of the totality of recorded patterns is made to determine the degree of complementarity of the probe with respect to the target. In deducing complementarity, it is important to take into account interacting thermodynamic properties and other factors including the sequence of the target, the melting temperature of the double stranded DNA, gc and at content, self-annealing loop and stem secondary structures, constitutive binding of constant provided sequences, polymorphism discrimination, temperature, salt concentration and other hybridization buffer conditions, etc. Analysis can be performed by any differential computational method.
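One simple way to analyze the patterns recorded at each stringency increment is sketched below in Python. This is only one possible differential computational method, given as an assumption: the functions `melt_off_temperature` and `differential_spots`, the 50% signal-loss threshold, the spot names and all signal values are hypothetical.

```python
def melt_off_temperature(signals, fraction=0.5):
    """Return the first wash temperature at which the signal drops below
    `fraction` of the signal at the lowest temperature, or None if it never does.

    `signals` maps wash temperature (deg C) to recorded signal intensity.
    """
    temps = sorted(signals)
    baseline = signals[temps[0]]
    for temp in temps[1:]:
        if signals[temp] < fraction * baseline:
            return temp
    return None


def differential_spots(pop_a, pop_b, fraction=0.5):
    """List array loci whose melt-off temperature differs between populations."""
    return sorted(
        spot for spot in pop_a.keys() & pop_b.keys()
        if melt_off_temperature(pop_a[spot], fraction)
        != melt_off_temperature(pop_b[spot], fraction)
    )


# Hypothetical recordings at 0.5 deg C increments for two array loci.
affected = {
    "A1": {40.0: 100, 40.5: 96, 41.0: 90, 41.5: 85, 42.0: 30},
    "A2": {40.0: 80, 40.5: 78, 41.0: 75, 41.5: 70, 42.0: 66},
}
control = {
    "A1": {40.0: 100, 40.5: 40, 41.0: 10, 41.5: 5, 42.0: 2},
    "A2": {40.0: 82, 40.5: 80, 41.0: 74, 41.5: 71, 42.0: 65},
}
print(differential_spots(affected, control))
```

A locus that loses signal at a lower temperature in one population than in the other is a candidate mismatch or expression differential; a fuller analysis would also weigh the sequence-dependent factors listed above.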
In particular embodiments, the present invention permits simultaneous monitoring of all expressed genes. This is accomplished by employing an ordered array with a density of specific oligonucleotide sequences sufficient to cover all or substantially all possible permutations of expressed gene sequence tags. By simultaneously monitoring the expression of multiple genes against a background of all or substantially all possible genes, the technology provides a complete and detailed profile of gene expression. This feature improves upon similar and conventional hybridization-based differential display techniques that are, for example, limited to detecting known RNA sequences. It is estimated that cells of complex organisms, such as humans, contain between 20,000 and 30,000 different RNA species. In one example, probe generation using Msp I and Not I, and hybridization against an array of > 64,000 targets, is sufficient to cover the vast majority of, but not all, expressed genes. The reservation is because Not I - Not I fragments and gene sequences without an Msp I site will not be ligated into the cloning vector and therefore will not be represented in the library. To compensate for this, the use of additional sets of restriction enzymes proportionally increases the chances of covering the totality of RNA species expressed in a cell since fragments not vector-insertable by a first combination or set of enzymes will likely be vector-insertable with a different combination or set of enzymes and/or a different set of cloning vectors.
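An ordered array covering all permutations of a k-mer tag implies a one-to-one mapping between array address and sequence, which can be sketched in Python as a base-4 encoding. The functions `tag_to_address` and `address_to_tag`, the base ordering "acgt" and the row-free linear addressing are assumptions made for illustration. It may be noted that 4^8 = 65,536 is of the same order as the > 64,000 targets mentioned; whether that example array used 8-mer targets is likewise an assumption.

```python
BASES = "acgt"   # arbitrary but fixed ordering of the four bases


def tag_to_address(tag):
    """Map a DNA tag to its index on an array holding every k-mer."""
    address = 0
    for base in tag.lower():
        address = address * 4 + BASES.index(base)
    return address


def address_to_tag(address, k):
    """Recover the k-mer sequence stored at a given array address."""
    chars = []
    for _ in range(k):
        address, digit = divmod(address, 4)
        chars.append(BASES[digit])
    return "".join(reversed(chars))


k = 8
array_size = 4 ** k                        # number of distinct k-mer targets
address = tag_to_address("acgtacgt")       # hypothetical hybridizing target
print(array_size, address, address_to_tag(address, k))
```

Tracing a differential signal back to a sequence then amounts to calling `address_to_tag` on the locus where the signal was recorded.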
A number of restriction enzymes can be used at particular stages of the process. The invention is not restricted to Not I and Msp I and alternate enzymes will work. Favored enzymes have some notable properties. An important property of Not I enzyme is that it is a rare-cutter.
An important property of Msp I is that it is a frequent-cutter and has a recognition site which favors the probability that it will cut within CpG
islands. CpG islands are associated with regions of the genome that are gene-rich and implicated in DNA regulation. Since CpG islands are more likely to be conserved, a directed evaluation of these genomic regions will be particularly useful in gene identification studies using founder populations. Additionally, since Msp I - Not I fragments generated by this protocol will represent the vast majority, but not all, of the totality of RNA
species, alternate sets of restriction enzymes and cloning vectors are useful to increase the representation of RNA species.
Examples of other useful enzyme sets are offered. Using the same pALLgenes vector and Not I, the following restriction enzymes will adequately replace Msp I: Taq I, Acc I, Hpa I and Nar I. Selecting a different rare-cutter is possible by way of constructing a new replacement linker wherein the Not I cloning site is replaced by one of the following: Nar I, Nru I, Nhe I, Nde I and Nsi I. A further example is constructing a new replacement linker wherein the compatible cohesive end recognition site for Cla I is replaced by one for either BamH I or EcoR V. For such a construction, inserts are generated by digestion with 1) Sau3A I or 2) Alu I, Hae III and Nru I, respectively, and an appropriate rare-cutter (as listed above). Permutations of the above provide an extensive set of enzymes with which to generate additional gene tag libraries.
In one embodiment, the present invention also detects the DNA
sequence of differentially-expressed genes. Since the address of hybridizing gene tags is known (i.e., location of signal on the ordered array), it is possible to retrieve and sequence the corresponding full-length transcript from a prepared library in two steps. In the first step, the entire Msp I - Not I fragment derived from the 3' end of the original transcript is selectively amplified by PCR with the aid of the target sequence and an internal vector sequence as reaction primers. The PCR product is sequenced. In the second step, a suitable primer is designed using the sequence from the first step. A suitable primer is an oligonucleotide sequence having characteristics which favor specific primer annealing to the template DNA including having minimal secondary structure and balanced gc content. Such primers are about 18- to 23-mer in length and have a melting temperature approximately that of the primer at the opposite extremity of the amplicon. Primers, along with an internal sequence from the phage vector, are used to selectively amplify by PCR the substantially full-length cloned transcript. This PCR product is subsequently sequenced to obtain the remaining unknown sequence of the transcript. This feature permits rapid and efficient identification of RNAs underlying a positive hybridization signal.
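The primer criteria stated above (length of about 18- to 23-mer, balanced gc content, melting temperature close to that of the opposite primer) can be screened with a short Python sketch. Several elements are assumptions not found in this description: the function names, the 40-60% window taken to represent "balanced" gc content, the 5°C tolerance between primer melting temperatures, and the use of the common 2(A+T) + 4(G+C) rule of thumb as a rough melting temperature estimate. Secondary structure is not evaluated.

```python
def gc_fraction(seq):
    """Fraction of g and c bases in a DNA sequence."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


def rough_tm(seq):
    """Rough melting temperature (deg C) by the 2(A+T) + 4(G+C) rule of thumb."""
    seq = seq.lower()
    at = seq.count("a") + seq.count("t")
    gc = seq.count("g") + seq.count("c")
    return 2 * at + 4 * gc


def is_suitable_primer(seq, partner_tm=None, gc_range=(0.4, 0.6), max_tm_gap=5):
    """Screen a candidate primer on length, gc balance and Tm matching."""
    if not 18 <= len(seq) <= 23:
        return False
    low, high = gc_range
    if not low <= gc_fraction(seq) <= high:
        return False
    if partner_tm is not None and abs(rough_tm(seq) - partner_tm) > max_tm_gap:
        return False
    return True


# Hypothetical 20-mer candidate checked against a partner primer Tm of 62 deg C.
candidate = "acgtacgtacgtacgtacgt"
print(gc_fraction(candidate), rough_tm(candidate),
      is_suitable_primer(candidate, partner_tm=62))
```

In practice a dedicated primer design tool with nearest-neighbor thermodynamics would replace the rule of thumb; the sketch only shows how the stated criteria combine.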
Monitoring quantitative differences in the levels of mRNA expression is an aspect of the present invention. Since the probe generation protocol involves a linear amplification step, low-abundance transcripts are detectable using the method disclosed herein. Quantitative differences in the levels of mRNA between populations of cells are also detectable, but careful dosing of starting materials and adequate sensitivity and dynamic range of the signal detection system (e.g. PhosphorImager from Molecular Dynamics, Sunnyvale) are important conditions of the operation. The concentrations or amounts of extracted RNA, cell transformants, enzymes, incubation times, etc., are also precisely measured and controlled in order to ensure justifiable comparisons later on in the results. To detect subtle differences in two cell populations, it is important to manipulate populations to be compared in as identical a way as possible, mutatis mutandis.
Alternatively or in conjunction, it is useful to calibrate and normalize different signal levels between arrays since all hybridizing signals will serve as internal controls for expected signal intensities.
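One way to picture normalization between arrays is to rescale each array by a summary of its own hybridizing signals. The sketch below assumes that signals are held in a dictionary keyed by array address and that the median positive signal is an acceptable scaling reference; both choices, and the function name, are illustrative assumptions rather than part of the disclosed method.

```python
from statistics import median


def normalize_to_median(signals: dict) -> dict:
    """Scale every signal so that the array's median positive signal becomes 1.0."""
    positive = [value for value in signals.values() if value > 0]
    scale = median(positive)
    return {address: value / scale for address, value in signals.items()}


# Two hypothetical arrays read at different overall intensities.
array_a = {"r1c1": 100.0, "r1c2": 200.0, "r1c3": 400.0}
array_b = {"r1c1": 200.0, "r1c2": 400.0, "r1c3": 800.0}

norm_a = normalize_to_median(array_a)
norm_b = normalize_to_median(array_b)
```

After scaling, the two hypothetical arrays above carry identical normalized values, so remaining differences between real arrays would more plausibly reflect expression than loading or exposure.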
Particular attention is directed to the production of the arrayed membrane or biochip. In many embodiments, once the array substrate is optimized by choice of substrate material, method of target oligonucleotide attachment and other appropriate conditions for sensitive, specific, and reproducible probe hybridization, replicas of the array are produced for testing expression-based differences in an unlimited set of different organisms. The production and optimization of the array is likely the most time-consuming and expensive part of the technology since in many embodiments thousands of oligonucleotide sequences must be appropriately immobilized onto a solid phase substrate. The substrate must be capable of creating a stable attachment of each oligo sequence at spatially discrete loci. The method of attachment is by any available method. Particular note is made of attachment by bonding reactive groups joined to the oligonucleotide either by UV cross-linking, chemical treatment, heat treatment or other methods used to generate a stable bond between the oligonucleotide and the solid phase substrate that is resistant to multiple wash steps. In particular applications, oligonucleotide sequences are synthesized separately and subsequently attached to a membrane using a gridding robot. Alternatively, the oligonucleotide sequences are synthesized directly on a biochip substrate by photolithographic methods.
Attachment of the oligonucleotide is usefully carried out in such a way that all nucleotides of the oligonucleotide sequences are accessible to complementary binding by a non-immobilized probe sequence. As noted, this is often accomplished by insertion of a spacer molecule permitting the target nucleotide sequence to float above the membrane surface and thus be more available for hybridization to complementary sequences present in the probe cocktail. Finally, the substrate is more useful in a format from which signal intensities are convenient to scan or read.
An array serves as a fixed matrix of possible permutations of oligonucleotide sequences upon which gene tags from any test genome are applied to determine differential expression profiles of genes. In many instances a single method for generating a gene tag library is used for many different organisms. Particular note is made of the assay utility in identifying genes expressed as a result of chemical, biological and physical induction (or repression), disease (or resistance) state, varietal specificity, tissue specificity, developmental specificity, or mutation. It is particularly useful to immobilize the ordered array of target sequences onto a nylon membrane, a chip, or similar format. The chip and other small formats offer advantages in portability and convenience, such as with a diagnostic kit.
A significant use of the present invention is in gene identification related to complex traits. Genes involved in complex traits, which by definition arise from the action and interaction of multiple genetic sites, have historically posed substantial difficulty for identification. The present technology is a useful tool for genetic analysis of complex traits because it allows the single time point mapping of quantitatively differential gene expression, encompassing both up-regulation and down-regulation. As such, this approach does not rely on the identification (predominantly by positional cloning methods) of structural mutations in genes as contributors to the trait but rather on altered amounts of gene products associated with the trait. In a particular embodiment, the claimed method of gene expression profiling and monitoring provides a non-ambiguous portrait of multiple genes expressed simultaneously in response to a given stimulus (induction agent, disease, infection, toxin, etc.), thus allowing multiple contributors to a trait to be identified. It is understood that the instant method is usefully employed in parallel and complementary experimentation with other gene identification strategies such as genetic mapping and cloning methods.
The scope of the invention is understood to reach all organisms. The analysis of the genomes of any organism, whether animal, plant, fungal, bacterial or viral, is possible in so far as probes from RNA libraries can be produced from these organisms. The target array configuration remains fixed regardless of the type of genome assayed. This technology is advantageous in gene identification studies in genomes which have not been extensively studied at the genomic level (e.g. genomes with limited genetic marker maps, physical maps or sequence information) since little a priori information is needed to identify candidate genes. As an example, the identification of candidate genes involved in disease or pathogen resistance in agronomically important crop plants or domestic livestock is noted.
The present invention is usefully employed in drug and toxicology screening, discovery and development. The instant DD methodology permits monitoring drug response at the gene expression level and locating genes involved in a particular response. Similarly, diagnostic uses are contemplated, based on the assay for known gene expression profiles related to specific induction or disease states (e.g. monitoring expression of known genes which are expressed during cancer or other illness). This aspect of the application of the technology will be of particular use for pharmacogenomic research in evaluating how variability in genetic background influences positive or negative response to a drug.
The following examples disclose the methods of the present invention. Particular reference is made to available texts in this area, the teachings of which are incorporated herein by reference:
1. Current Protocols in Molecular Biology, F.M. Ausubel et al. (John Wiley and Sons, Inc., New York (1988)) ISBN 047150338X.
2. Molecular Cloning: A Laboratory Manual, J. Sambrook et al. (Cold Spring Harbor Laboratory Press (1989)) ISBN 0879693096.
3. A Practical Guide to Molecular Cloning, Bernard V. Perbal et al. (John Wiley and Sons, Inc., New York (1988)) ISBN 0471850713.
4. Short Protocols in Molecular Biology, (Second Edition), F.M. Ausubel et al. (John Wiley and Sons, Inc., New York (1992)) ISBN 0471577359.
5. Gene Transfer and Expression, Michael Kreigler (W.H. Freeman & Company, New York (1990)) ISBN 0716770040.
Although these alternatives have improved the original method, problems associated with RNA quality, false positives, and reproducibility still remain.
A DD methodology which overcomes shortcomings exposed earlier has now been developed.
Summary of the Invention
This invention comprises a method of determining differential display of gene expression comprising the steps of:
(a) preparing at least two substantially identical accessible ordered arrays of synthetic substantially mono-length oligonucleotide DNA segments representing permutations (and optionally substantially all or all permutations) of possible oligonucleotide sequences;
(b) preparing a partial-length first cDNA library being a partial-length segment library corresponding to substantially all complementary expressed mRNA sequences of a first gene source;
(c) using the library of step (b) and preparing a mono-length first cRNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of the first gene source;
(d) preparing a substantially expression-length first cDNA library corresponding to or complementary with expressed mRNA sequences of the first gene source, wherein said library is a substantially expression-length transcript library;
(e) preparing a like-condition partial-length second cDNA library being a substantially partial-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(f) using the library of step (e) and preparing a like-condition mono-length second cRNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(g) preparing a like-condition substantially expression-length second cDNA library corresponding to or complementary with expressed mRNA sequences of the second gene source, wherein said library is a substantially expression-length transcript library;
(h) probe hybridizing the first and second mono-length segment libraries, each with an ordered array of step (a); and
(i) comparing the probe hybridized accessible ordered arrays to determine differential hybridization display sites between first and second mono-length segment libraries; and,
(k) referencing differential hybridization sites to at least one of said expression-length libraries to locate the gene of expression differential.
In one embodiment the method further comprises amplifying partial-length segments corresponding to cRNA from the differential hybridization display sites by amplification of corresponding hybridization segments of a corresponding partial-length library prior to referencing to at least one of said expression-length libraries to identify the gene of expression differential. It will be understood that in particular embodiments an expression-length library is complementary to the mRNA of the gene source and in other embodiments an expression-length library is a corresponding sequence. Conversion of sequences from complementary to corresponding is well understood in the art.
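The comparison of the two hybridized arrays can be illustrated with a small sketch that reports the array addresses whose signals differ between the first and second gene sources. The dictionary layout, the two-fold threshold, the background floor and the function name are assumptions made for illustration; the specification does not prescribe them.

```python
import math


def differential_sites(first, second, min_fold=2.0, floor=1.0):
    """Return {address: log2(second/first)} for sites differing by at least min_fold.

    Signals below `floor` are raised to `floor` so that absent signals
    (zero expression) still give a finite ratio.
    """
    hits = {}
    for address in sorted(first.keys() & second.keys()):
        a = max(first[address], floor)
        b = max(second[address], floor)
        log_ratio = math.log2(b / a)
        if abs(log_ratio) >= math.log2(min_fold):
            hits[address] = log_ratio
    return hits


# Hypothetical signals from two arrays probed with the two mono-length libraries.
first_source = {"r1c1": 100.0, "r1c2": 100.0, "r1c3": 0.0}
second_source = {"r1c1": 105.0, "r1c2": 400.0, "r1c3": 50.0}

hits = differential_sites(first_source, second_source)
```

Each reported address identifies a target oligonucleotide of known sequence, which is what allows the later referencing step to locate the corresponding transcript.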
Within the practice of the method, a significant determined differential is noted to be a nucleotide binding differential.
In the practice of this invention mono-length libraries are prepared from partial-length libraries by cleaving said partial-length cDNA corresponding to or complementary with expressed mRNA sequences by a remote-site restriction enzyme, such as with Bpm I. It is understood that such libraries can be either cRNA or cDNA depending on the procedures employed. It is further understood that one strand of cDNA will be complementary to the mRNA and one strand will be corresponding.
In some embodiments, gene sources for expressed cellular cDNA
from mRNA are from cells in either affected or non-affected states.
Particular note is made of mono-length sequences that are cRNA elements 23 nucleotides in length of the formula 5'-gcuggagaucggnnnnnnnnnnn-3', wherein "n" represents any nucleotide and the 11 n nucleotides correspond to a complementary 11 nucleotide sequence of mRNA. In some embodiments the mono-length oligonucleotide will be substantially about 13-mer and in others substantially about 23-mer.
One embodiment of a mono-length cRNA segment library comprises a provided nucleotide portion and an inquiry nucleotide portion, and in some embodiments the provided nucleotide portion comprises 5'-gcuggagaucgg-3', or 5'-gcuggagau-3', or optionally at least about 9-mer and the inquiry portion about 14-mer. In some embodiments the inquiry portion comprises a constant and a variable portion and the variable portion is at least about 9-mer or about 11-mer, and, optionally, the constant portion of the said inquiry nucleotide portion is about 3-mer. In some such embodiments it is contemplated that the provided portion comprises 5'cuggag3'. It is further contemplated that the inquiry portion has a 3'-end and a 5'-end, and said 5'-end is 5'cgg3'.
Depending on the embodiment, the inquiry portion is useful in a range of from about 10- to about 36-mer, the provided portion is useful in a range of from about 1- to 20-mer, the variable portion in a range of from about 9- to about 36-mer, and mono-lengths of about 11- to about 56-mer. Consequently, nucleotide segments of the target array are usefully from about 10- to about 56-mer. The inquiry portion is understood to optionally include a constant portion of about 1- to about 6-mer.
The method also includes the differential being determined by comparing comparison reporters selected from the group consisting of chemiluminescent labeling detection or radioactive labeling detection of hybridization of cRNA sequences. Such detection permits quantitative gene expression analysis based on reporter differentials from complementary oligonucleotides bound to a membrane or DNA micro array, or combinations of such methods. With such methods, comparison is usefully made by alternating steps of quantifying said comparison reporters and incremental washing between about 40°C and 70°C. Useful increments are intervals of about 5°C or 6°C for broader determinations and about 3°C or less for finer determinations. Increments of about 1°C or less, and more particularly about 0.5°C, are useful for the highest level of discrimination, and are useful with careful attention to temperature regulation.
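The alternating quantify-and-wash cycle implies a series of wash temperatures between about 40°C and 70°C. A minimal sketch of generating such a series follows; the function name and defaults are illustrative assumptions, with the 5°C and 0.5°C steps taken from the increments mentioned above.

```python
def wash_schedule(start_c=40.0, stop_c=70.0, step_c=5.0):
    """List wash temperatures from start_c to stop_c inclusive, in step_c increments."""
    if step_c <= 0:
        raise ValueError("step_c must be positive")
    # The small epsilon guards against floating-point error in the division.
    count = int((stop_c - start_c) / step_c + 1e-9) + 1
    return [round(start_c + i * step_c, 2) for i in range(count)]


broad = wash_schedule()            # about 5 degree steps for broader determinations
fine = wash_schedule(step_c=0.5)   # about 0.5 degree steps for highest discrimination
```

Signal would be quantified after each wash in the series, so a finer step yields more readings per site at the cost of more wash cycles.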
The method further includes embodiments in which substantially identical ordered arrays of synthetic mono-length oligonucleotides are accessibly affixed to a substrate, optionally on a nylon membrane, or a biochip. Oligonucleotides are usefully attached to substrate by way of an intervening spacer molecule of at least about 6-carbons, or at least about 12-carbons.
In another aspect, the invention comprises a substantially mono-length segment cRNA library wherein the mono-length segments comprise cRNA corresponding to or complementary with substantially all expressed mRNA sequences of a gene source, and optionally, wherein the mono-lengths are substantially about 23-mer. In some embodiments, said mono-length segments are about 14-mer with about 11-mer corresponding to or complementary with the mRNA.
In another embodiment, the invention includes a vector-insertable linker (such as a BssH II recognition site adjacent to) comprising a promoter site such as a T3 RNA polymerase promoter site in functional connection with a Bpm I recognition site readably adjacent to a Cla I recognition site which is adjacent to a Not I recognition site, which is adjacent to a Kpn I recognition site which is adjacent to a T7 RNA polymerase promoter site which is adjacent to a BssH II recognition site. One such vector for the vector-insertable linker is pBluescript II SK + cloning vector.
Noted is a vector-insertable linker comprising a T3 RNA polymerase promoter site in functional connection with a Bpm I recognition site, adjacent to a Cla I/Msp I residual recognition site adjacent to an insert, and particularly a 14-mer insert, from a cDNA probe source adjacent to a Not I recognition site adjacent to a Kpn I recognition site adjacent to either a T3 or T7 promoter site which is adjacent to a BssH II recognition site. Optionally, the aforementioned vector-insertable linker is inserted into the pBluescript II SK + cloning vector.
In yet another embodiment, the invention comprises a mono-length cRNA library corresponding to or complementary with expressed mRNA sequences, wherein the library consists of cDNA elements 23 nucleotides in length of the formula 5'-gctggagatcggnnnnnnnnnnn-3', and wherein n represents one of 11 nucleotides in sequence corresponding to a complementary 11 nucleotide sequence of mRNA.
In another embodiment, the invention includes a method of determining differential display of gene expression comprising the steps of:
(a) preparing at least two substantially identical accessible ordered arrays of synthetic substantially mono-length oligonucleotide DNA segments representing all permutations of possible oligonucleotide sequences;
(b) preparing a partial-length first cDNA library being a partial-length segment library corresponding to substantially all complementary expressed mRNA sequences of a first gene source;
(c) using the library of step (b) and preparing a mono-length first cDNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of the first gene source;
(d) preparing a like-condition partial-length second cDNA library being a substantially partial-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(e) using the library of step (d) and preparing a like-condition mono-length second cRNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(f) probe hybridizing the first and second mono-length segment libraries, each with an ordered array of step (a); and
(g) comparing the probe hybridized accessible ordered arrays to determine differential hybridization display sites between first and second mono-length segment libraries (such as in forensic applications); and, in some embodiments,
(h) referencing differential hybridization sites to at least one nucleotide sequence database to identify the gene of expression differential.
With such method it is useful to amplify the mono-length segments of the differential hybridization display sites with corresponding hybridization segments of a partial-length library prior to referencing the database to identify the gene of expression differential. It is understood that a substantially expression-length library database is usefully employed in some embodiments of this aspect of the invention.
The invention includes a method of detecting a point mismatch probe nucleotide in short mono-length nucleotides derived from partial-length segment or expression-length segment nucleotide libraries comprising
(a) hybridizing probe nucleotides against an accessible ordered array of synthetic substantially short mono-length target oligonucleotide DNA segments representing all permutations of possible oligonucleotide sequences, wherein the probe nucleotides and the target nucleotides are of substantially equal length, and
(b) quantitatively assessing hybridization sites,
(c) washing these hybridization sites under stringent conditions,
(d) detecting post-washing variation in hybridization sites representative of a point mismatch nucleotide, and, in particular embodiments, further
(e) referencing the site variation(s) to a partial-length or expression-length segment of at least one of the libraries.
In this method, optionally, the probe nucleotide has a constant portion and an inquiry portion, and the target site oligonucleotide has a portion complementary to the constant portion of the inquiry oligonucleotide. In such method, the washing of step (c) usefully further comprises the step of competitively inhibiting hybridization of the provided portions.
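Steps (b) through (d) amount to comparing each site's signal before and after the stringent wash. The following sketch flags sites that lose most of their signal; the 50% retention threshold, the dictionary layout and the function name are assumptions for illustration only, since the specification does not state a numerical criterion.

```python
def mismatch_candidates(before, after, min_retention=0.5):
    """Return sorted addresses whose post-wash signal is below min_retention
    of the pre-wash signal. Sites with no pre-wash signal are skipped."""
    flagged = []
    for address, pre in before.items():
        if pre <= 0:
            continue
        if after.get(address, 0.0) / pre < min_retention:
            flagged.append(address)
    return sorted(flagged)


# Hypothetical readings before and after a stringent wash.
pre_wash = {"r1c1": 100.0, "r1c2": 100.0, "r1c3": 0.0}
post_wash = {"r1c1": 90.0, "r1c2": 20.0, "r1c3": 0.0}

candidates = mismatch_candidates(pre_wash, post_wash)
```

A flagged site would then be referenced back to the partial-length or expression-length library as in step (e).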
In a particular embodiment the invention comprises a mono-length gene tag library, wherein the mono-length gene tags are either cDNA or cRNA. The mono-length insert segments are in specific embodiments about 23-mer for cRNA and about 23-bp for cDNA, and optionally about 14-mer or 14-bp segments respectively correspond to or are complementary with expressed mRNA sequences of a gene source. In one embodiment of a cRNA mono-length library, mono-length segments are substantially 23-mer in length.
Brief Description of the Drawings
Fig. 1(A)-(F) is a diagrammatic flow chart of the differential display technique in stages.
Fig. 2(A)-(D) provides sequence information for specific vector, probes and targets.
Fig. 3 is a flow chart of procedures in a particular embodiment.
Fig. 4(A), (B) and (C) depict accessible array test results.
Detailed Description of the Invention
This invention will be better understood with reference to the following definitions:
A. Gene is a unit of genetic information. In one embodiment it is a region of DNA that encodes information for a discrete gene product that is a protein or RNA. Depending on the organism, genes can reside in DNA or RNA form. As used herein, genes shall be expansively understood to mean genetic information that encodes cellular products which are required for sustaining normal cellular processes in all living organisms. Genes are typically organized into coding and non-coding sequences (known as exons and introns, respectively) and are also comprised of other elements which control expression (e.g., promoters, enhancers, operons, and other regulatory regions). Particular reference is made to the use of the present technology as applicable to the analysis of gene expression in any organism including, without limitation, viruses, bacteria, fungi, plants, and animals including humans, and further including neoplasms and cell cultures including chimeric cell types.
B. Expressed genes or expressed nucleotide sequences as to a gene source shall mean the mRNA expressed in a living cell or in a cellular system under the conditions of culture. For stability and ease of manipulation, in many embodiments the mRNA expressed is then reverse-transcribed into complementary DNA (cDNA) for differential diagnostic procedures.
C. Differential Display of gene expression shall mean an assay system which detects differences in the level or type of expression of particular genes by comparison of two populations of cells. Typically, DD is capable of monitoring a wide range of expressed genes up to and including the entire complement of expressed sequences in a given cell population. Because only expressed genes are monitored in relation to a particular trait by the present technique, DD represents a functional genomics approach with respect to gene discovery experimentation. DD by the present method permits the identification of single and multiple genes whose regulation has been perturbed and/or are involved in deregulation. Deregulation may involve gross or subtle changes in gene expression including under-expression (including zero expression) and over-expression (including always present expression).
D. Detected differentials arise from a number of affected and non-affected gene sources. Affected cells shall mean cells which are diseased, or which have been induced by chemical agents, toxins, biological agents, etc. Non-affected shall mean the opposite, or lack, of the particular property under study in the affected cell. Non-affected shall also mean what is generally referred to in scientific studies as a control subject, as e.g. in a case-control association study. Again, without being bound by any particular theory, it is believed that in particular cases such differentials will arise from, without limitation,
(i) gene induction or repressor agents (e.g. agents which through their presence can activate, or repress, gene transcription, thereby causing altered or deregulated metabolism)
(ii) chemical toxins (e.g. chemical compounds which through their presence in sufficient concentration will poison the cell)
(iii) physical stress (e.g., cold or heat shock, UV exposure etc.)
(iv) disease or resistance state (e.g. susceptibility to disease or reduced vigor caused either by genetic factors or by pathogens, viruses, bacteria, diet and other environmental factors)
(v) tissue specificity (e.g. expression which is specific to brain, muscle, liver, spleen, etc.)
(vi) strain or varietal specificity (e.g. particularly in Ag-bio applications: weight, height, color, etc.)
(vii) developmental specificity (e.g. genes specifically expressed during specific developmental time periods such as prenatal, young, adult, aged etc.), and
(viii) gene mutation (e.g. deletions, insertions, expansions and amplifications, point mutations (transitions, transversions), missense mutations, nonsense mutations etc.).
E. Library shall mean substantially the totality of genetic material in the sample which the library defines. Thus an expressed gene library shall represent substantially the totality of the expressed genes. It is understood that depending on the library construction methodology, some segments of genes will not be included in the library. By way of example, a library constructed from Msp I and Not I and a vector cloning site compatible with these enzymes results in a library that is greater than about 90% (estimated) representative of the totality of expressed sequences. The proportion of representation depends on multiple factors including the cutting frequency of the enzymes used, the nature of cut sites and cloning efficiency. The palindromic recognition site of Msp I is compatible with CpG islands which tend to be located in gene-rich regions of the genome. Additionally, since Not I is a rare cutter and Msp I is a frequent cutter, the vast majority of fragments will be Msp I - Not I. In the pALLgenes construct disclosed, only Msp I-Not I fragments will insert into the vector. The remainder of fragments will be Not I - Not I fragments or fragments which do not have an Msp I site, both of which are not vector insertable and thus are not represented in the library.
F. Substantially mono-length segment library shall mean a library comprising gene tag probes, either cDNA or cRNA, that are short (from about 9-mer to about 56-mer, or -bp as to DNA) and uniform in length. Uniform in length shall be understood to mean either exactly equal, or in particular instances, within about ± 15%, and preferably about ± 5%. Such mono-length segments function as probes and are gene tags representing expressed mRNA sequences from a gene source. Mono-lengths of cRNA are used for hybridization against an immobilized array of complementary target sequences. In the context of this invention, the shortness and uniformity of length favors the specificity and reproducibility of probe to target hybridization. While it is assumed that an ideal hybridization takes place when both probe and target are exactly the same length, this is not required in every embodiment. Some variation in length (e.g. ± 1 or 2-mer) will be tolerated in some cases (depending on the application) while still producing a high degree of specificity and reproducibility. It will be understood by those skilled in the art that the direction of transcription creates either a complementary or corresponding mono-length library relative to the mRNA.
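The uniformity criterion in this definition can be expressed as a simple length check. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the tolerance is applied as a fraction of a nominal length supplied by the caller.

```python
def is_mono_length(lengths, nominal, tolerance=0.15):
    """True if every segment length lies within nominal * (1 +/- tolerance)."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return all(low <= n <= high for n in lengths)


# Hypothetical segment lengths around a nominal 23-mer.
print(is_mono_length([23, 23, 22, 24], nominal=23))                # within 15%
print(is_mono_length([23, 25], nominal=23, tolerance=0.05))        # 25 exceeds 5%
```

For a 23-mer, a 5% tolerance admits only a single-nucleotide deviation, which matches the ± 1 or 2-mer variation mentioned above more closely than the 15% bound does.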
G. Provided nucleotide portion shall mean the nucleotide sequence specific to the probe which corresponds to a segment of vector sequence before recombinant insertion of any foreign sequence. In the embodiment of Fig. 2, the sequence of the probe (or mono-length segment) is 5'-gcuggagaucggnnnnnnnnnnn-3', which is exactly 23 nucleotides in length. Starting from the 5' end, the probe sequence breaks down as follows:
1) 5'g3' is a provided nucleotide portion; it is the first base to be polymerized by the T3 RNA polymerase enzyme immediately downstream and the last nucleotide of the T3 promoter site. The complete T3 RNA polymerase promoter site is 5'aattaaccctcactaaaggg3'.
2) 5'cuggag3', an element of the provided nucleotide portion, is the recognition site of the Bpm I restriction enzyme. Bpm I docks onto this site and cuts 16 nucleotides downstream from this site (or 14 nucleotides on the opposite 3' strand).
3) 5'au3' is an element of the provided portion. This sequence is part of the Cla I recognition site. Prior to insertion, the full Cla I recognition site is 5'atcgat3'.
4) 5'cgg3' is the constant moiety of the inquiry nucleotide portion. Because it was originally the 5' end of the DNA (5' Msp I - Not I 3' cDNA fragment) and becomes part of the Cla I / Msp I recognition site, it is constant. Starting from 5'cgg3' and proceeding all the way to the 3' end of the probe sequence, the inquiry portion is then a 14-mer and corresponds to the cDNA-inserted portion of the vector.
5) 5'nnnnnnnnnnn3' is the variable moiety of the inquiry portion in this example. It is cRNA corresponding to the first 11 bases after the Msp I recognition site of the 5' end of the Msp I - Not I cDNA fragment reverse-transcribed from mRNA of the genomic material being examined. The n's represent the nucleotide or ribonucleotide bases (i.e. n = a (adenine), g (guanine), c (cytosine), and t (thymine) or u (uracil); t with DNA, u with RNA); thus there are 4^11 permutations of possible sequence of the 11-mer nucleotide.
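The breakdown of the 23-mer probe can be checked with a short sketch that splits a cRNA probe into its provided, constant and variable parts and counts the possible variable sequences. The constant and function names are hypothetical; the sequences and lengths are those given in the definition above.

```python
import re

PROVIDED = "gcuggagau"   # 5'g3' + Bpm I site 5'cuggag3' + 5'au3' from the Cla I site
CONSTANT = "cgg"         # constant moiety of the inquiry portion
VARIABLE_LEN = 11        # length of the variable moiety of the inquiry portion

PROBE_PATTERN = re.compile(
    "(%s)(%s)([acgu]{%d})" % (PROVIDED, CONSTANT, VARIABLE_LEN)
)


def split_probe(probe):
    """Split a 23-mer cRNA probe into provided, constant, variable and inquiry parts."""
    match = PROBE_PATTERN.fullmatch(probe.lower())
    if match is None:
        raise ValueError("probe does not fit the 23-mer cRNA formula")
    provided, constant, variable = match.groups()
    return {
        "provided": provided,
        "constant": constant,
        "variable": variable,
        "inquiry": constant + variable,  # the 14-mer derived from the cDNA insert
    }


def variable_permutations(length=VARIABLE_LEN):
    """Number of possible sequences for a variable moiety of the given length."""
    return 4 ** length


parts = split_probe("gcuggagaucggacguacguacg")  # arbitrary example probe
print(len(parts["inquiry"]), variable_permutations())
```

With an 11-mer variable moiety the count is 4^11 = 4,194,304, which is the number of distinct target sequences an array covering all permutations would need.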
H. Inquiry nucleotide portion shall mean the nucleotide sequence specific to the probe which corresponds to the insert sequence after recombinant insertion of a foreign nucleotide sequence. The inquiry nucleotide portion is variable for any given probe generated from a given vector. In the embodiment of Fig. 2, the inquiry nucleotide portion is a 3' cRNA tag since its sequence is ultimately derived from the 3' reverse-transcribed portion of expressed mRNAs. In particular embodiments "short" mono-length segments, that is from about 11- to about 56-mer, are useful, with particular reference to 9- to 13-mer, and particularly about 9-mer. Such short mono-length segments consist of a "provided nucleotide portion" of from about 1- to about 20-mer. A provided nucleotide portion is made continuous with an "inquiry nucleotide portion" of from about 10- to about 36-mer, with particular reference to about 12- to about 16-mer and more particularly about 14-mer. The inquiry portion can be further segregated into a variable portion or moiety and a constant portion or moiety. The variable moiety is noted in particular embodiments as being from about 9- to about 36-mer with particular reference to about 11-mer and the constant portion of about 1- to about 6-mer with particular reference to about 3-mer.
I. Partial-length segment library shall mean a cDNA library generated by cloning of partial, as opposed to complete, sequence representations of expressed RNAs. By way of example, Msp I - Not I DNA fragments cloned into the pALLgenes vector (Example 3) constitute a partial-length segment library. The construction of the pALLgenes vector is defined in Example 2. The partial-length segment library is, in a particular embodiment, a precursor library used to generate the short mono-length cRNA segment library by digestion with Bpm I followed by polymerization by T3 RNA polymerase enzyme. It is also the library used in an intermediate selective amplification step in one back figuring strategy used to obtain the entire sequence of the differentially expressed gene (Fig. 3). It is noted that Not I shall mean the restriction endonuclease or enzyme from Nocardia otitidis-caviarum. Msp I shall mean the restriction endonuclease or enzyme from Moraxella species. Cla I is the restriction endonuclease or enzyme from Caryophanon latum. Bpm I shall mean the restriction endonuclease or enzyme from Bacillus pumilus which was obtained from New England Biolabs (Degtyarev, S.K. and Morgan, R.; NEB Culture Collection 711, Beverly, MA, USA). T3 RNA polymerase shall mean the bacteriophage T3 DNA-dependent RNA polymerase (Morris, C.E. et al., Gene 41: 193) that recognizes a bacteriophage-specific promoter and initiates synthesis of RNA on double-stranded DNA templates. T3 initiates synthesis of RNA in a particular orientation on a double-stranded DNA template. Examples of other RNA polymerases include bacteriophage SP6 and T7 DNA-dependent RNA polymerase (Butler, E.T. and Chamberlain, M.J., J. Biol. Chem. 257: 5772; Davanloo, P. et al., Proc. Natl. Acad. Sci. 81: 2035).
J. A library of substantially expression length sequences shall mean a library comprising a substantially complete representation of substantially complete sequence mRNA. This library, reverse-transcribed to cDNA, is conveniently cloned into a suitable phage vector for subsequent retrieval of the predominantly 5' end of the RNA sequence using a back figuring strategy (Fig. 3). Using this strategy, the entire sequence of any differentially expressed gene can be obtained by selective amplification and sequencing since this library contains substantially full-length cDNA sequencing templates.
As RNA is relatively less stable than DNA and may possess secondary structures which inhibit or prevent elongation by polymerase enzymes, manipulation artifacts may cause these templates to be substantially, as opposed to completely, full-length.
However, when required, obtaining a complete sequence can be achieved by supplementary strategies and steps. For example, it is possible to generate an expression-length cDNA library using 5'-end mRNA-based capture methods (Edery, I., Chu, L.L., Sonenberg, N., and Pelletier, J., "An efficient strategy to isolate full-length cDNAs based on an mRNA Cap Retention Procedure (CAPture)," Mol. Cell. Biol. 15: 3363-3371, 1995). Of course, with the availability of searchable databases of expression length sequences or other representative sequences, in some embodiments, it is possible to search directly from partial-length sequence to database without need for hybridization to a substantially expression-length library.
K. DNA shall be expansively understood to mean a molecule of deoxyribonucleic acid, comprising the component purines and pyrimidines a, t, g, and c in any combination. DNA is single-stranded or double-stranded, and in certain systems, triplex. Sources of DNA include genomic sources (e.g., from the cell nuclei of living organisms), recombinant vectors (e.g., plasmids, phage, cosmids, BACs, YACs, etc.) or artificial synthesis products (e.g., in vitro polymerized sequences such as PCR products, sequencing reaction products, etc.). A string of DNA is polymerized by addition of nucleotides. In some instances, the term nucleotides is broadly intended to encompass ribonucleotides, particularly in discussions of RNA.
L. RNA shall be expansively understood to mean a molecule of ribonucleic acid comprising the component purines and pyrimidines a, u, g, and c in any combination and particularly messenger ribonucleic acid (mRNA). RNA is generally transcribed from DNA. mRNA is single-stranded and normally contains a poly-A tail (i.e., a string of adenines) at its 3' end. A string of RNA is polymerized by addition of ribonucleotides.
M. Like-condition replication shall mean substantially similar culture and library preparation conditions but for a perturbation in the comparison replication (e.g., induced vs. non-induced, diseased vs. normal, trait-plus vs. trait-minus, etc.). In the method of this invention, care in preparing corresponding affected/non-affected cell libraries reduces or eliminates extraneous differentials of expression.
N. bp and -mer shall mean the abbreviated forms of the terms base pair(s) and oligomer(s), respectively. The number of oligomers shall mean the number of nucleotides contained in a given single-stranded DNA or RNA string. For example, a 20-mer refers to a single-stranded DNA or RNA sequence comprising 20 bases. The number of base pairs refers to the number of complementary nucleotide pairs contained in a given double-stranded DNA string. Thus, a 20-bp fragment refers to a double-stranded DNA sequence comprising 20 complementary base pairs.
O. Remote-site restriction endonucleases shall mean enzymes which do not fall within the normal category of restriction endonucleases which typically recognize palindromic sequences and cut within the palindromic sequence. Remote-site restriction enzymes typically do not require perfect palindromes as recognition sites and typically cut at a distance from their specific recognition site. In the practice of this invention it is contemplated that these remote distances are in a range of between about 3 and about 50 nucleotides as being particularly useful. Particular note is made of Bpm I with a recognition site of 5'ctggag3' and also of the isoschizomer Gsu I.
Examples of other remote-site restriction enzymes include Bsa I, BseR I, BsmF I, Fok I, Hga I, Hph I, Mbo II, Mnl I, Ple I and SfaN I.
P. An ordered array or oligonucleotide array detector refers to an ordered array of specific oligonucleotides affixed or immobilized on a substrate suitable for carrying out multiple hybridizations at one time. "Specific" as to oligonucleotides means a collection of oligonucleotides which comprise all possible permutations of nucleotide sequences. The ordered function arises from identified oligonucleotides being attached at specific and known loci. Substrate shall mean any solid phase surface to which oligonucleotides, whether with or without linker or spacer groups, may be attached, e.g., nylon membrane, glass slide, biochip. Attachment methods include chemical treatment (e.g., EDC), UV cross-linking, covalent bonding, or other methods which allow the oligonucleotide to be stably immobilized or affixed onto a solid phase.
Q. Accessible array shall mean that the oligonucleotides of an ordered array are positioned so as to be available for hybridization with minimized steric or mechanical hindrance. Accessibility is achieved by a variety of means including attaching a spacer molecule of from about 2 or 3 carbons to as many as 20 or more carbons between the oligonucleotide and the substrate. A carbon spacer molecule refers to, as an example, an N-MMT-protected aminododecanol phosphoramidite which is added to the 5' end of an oligonucleotide during its synthesis.
R. Indicator of complementarity shall mean a reporter system for the purposes of establishing the occurrence and the amount of hybridization resulting from complementary base-pairing of target to probe sequences. As examples, the reporter may be a signal emitter such as a radioactive label, fluorescent label or chemiluminescent label physically attached to the probe so as to permit signal detection by a fluorescence scanner, confocal microscope, phosphor screen, photographic paper or other signal detection system. In some embodiments, the indicator of complementarity is of sufficient sensitivity and dynamic range so as to be capable of detecting a minimum of hybridization (including zero complementarity) to a maximum of hybridization (including all sequences hybridized).
S. Incremental wash conditions shall mean washing the hybridized array with incrementally stringent conditions, usually by changing temperature or salt concentrations in the wash buffer. At each incremental wash step, the hybridization pattern is monitored and recorded and subsequently analyzed as to changes. By comparing target sequence information and the different hybridization patterns obtained under different stringency conditions, a determination of the differential expression signals is made. By carefully monitoring small changes over small increments in wash conditions, it is possible to distinguish false positive signals (e.g., differences due to SNPs or GC-rich sequences) and false negative signals (e.g., differences due to secondary structure or rare RNAs) from actual signals.
T. Target shall mean a DNA sequence immobilized onto a solid phase substrate for hybridization with complementary sequences. In the context of one example provided, the target shall be a correspondingly short complementary sequence substantially identical in length to the probe (here, about 23-mer). However, given cost and feasibility constraints related to working with conventionally-synthesized oligos and membranes, it is also contemplated that the target is not always of identical length (ranging from about 8-mer to about 23-mer in particular examples).
In such circumstances, specificity and reproducibility of probe to target hybridization is augmented with "blocker" oligos added during the hybridization step to mask complementary sequences (competitive inhibition) in the provided nucleotide portion of the probe, thus preventing this portion from interfering with base-pairing of the inquiry nucleotide portion with the target (Fig. 2).
U. Complex traits shall mean physiological, physical, or metabolic traits which cannot be explained by a single causal gene but rather by multiple genetic and/or environmental factors. For example, a trait may arise as a result of multiple gene products, interacting, and/or interacting in a dose dependent manner. Dissection of complex traits, therefore, requires special tools that monitor multiple gene events, including inactivity, activity, and level of activity, simultaneously.
V. Gene site anomaly shall mean a DNA sequence which is deviant from the normal, control or non-affected state, e.g., single nucleotide polymorphisms (SNP), point mutations including transitions and transversions, deletions, etc. In some instances, a gene site anomaly causes lack of expression, over-expression, silent and functional changes in protein sequences, differential splicing of RNA, etc.
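The "blocker" oligos mentioned under definition T mask the provided nucleotide portion of the probe by competitive base-pairing. As a minimal sketch, assuming a blocker is simply the DNA reverse complement of the provided portion, the function below derives one. The name `blocker_for` and the 9-mer used in the usage note are hypothetical and are not taken from the specification.

```python
# Watson-Crick partners, written as DNA; "U" is accepted so cRNA input also works.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}

def blocker_for(provided_portion: str) -> str:
    """Return the DNA reverse complement of the provided nucleotide portion.

    Assumption: a blocker oligo is fully complementary to the constant,
    vector-derived part of the probe, so it competes for that part only.
    """
    return "".join(COMPLEMENT[base] for base in reversed(provided_portion.upper()))
```

For a hypothetical provided portion `GGGAACAAA`, `blocker_for` returns `TTTGTTCCC`; the inquiry nucleotide portion is left unmasked and remains free to pair with the target.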
An overview of the process of the technology is presented in Fig. 3.
Fig. 3 discloses one line of cells to be compared as to affected and non-affected states or populations. First, mRNA is extracted and reverse-transcribed to generate a cDNA library as to both the affected and non-affected states. This cDNA is then used to generate an expression-length library for each population consisting of complete gene sequences and a partial-length library consisting of the 3' end of gene sequences. The latter library is digested with a remote-site cutter followed by polymerization with an RNA polymerase to generate a short mono-length library consisting of cRNA probes. These probes are used to hybridize against a set of complementary and specific oligonucleotide targets. Hybridization patterns, conveniently on a chip or microarray, are compared between populations of cells in order to determine differential gene tag signals corresponding to a gene or genes of interest. If desired, these signals can be traced back to an address on the target array and thus an oligonucleotide sequence. The latter sequence is used to produce one of the primers required for selectively amplifying and sequencing the 3' portion of each gene of interest from the partial-length library. In turn, the 3' sequence of each gene of interest is used to generate one of the primers required for selectively amplifying and sequencing the remaining 5' portion of each gene of interest from the expression-length library. If the gene of interest is known, comparing the obtained sequence against available genomic databases will identify the differentially expressed gene and possibly its function. If the gene of interest is novel, further experimentation is initiated to identify its function with respect to the trait of interest.
In a broad embodiment, the technology entails generating a library of substantially mono-length cRNA gene tags using a vector and remote-site restriction enzyme which cuts at a fixed distance downstream from its recognition site. Libraries of such substantially mono-length gene tags are compared between or among two or more populations of cells by hybridization to respective ordered arrays of specific oligonucleotide target sequences. Such oligonucleotides are conveniently immobilized onto a substrate such as a membrane, a chip, or the like to provide stable loci for an "indicator of complementarity." The presence or absence of complementarity -- often as disclosed by hybridization patterns on an immobilized oligonucleotide array under incremental wash conditions -- is noted. These patterns and changes in these patterns, with particular reference to differentials, are analyzed for changes in signal intensity.
Clearly, such analysis is facilitated by the use of computers.
In some embodiments, analysis is performed as described in U.S. 5,795,716 to Chee; U.S. 5,800,992 (Affymetrix); U.S. 4,74,043 to Bacus; Drmanac et al., "An algorithm for DNA Sequence Generation from k-Tuple Word Contents of the Minimal Number of Random Fragments," J. of Biomolecular Structure & Dynamics, 8:1085-1102 (1991); and Southern et al., "Analyzing and Comparing Nucleic Acid Sequences by Hybridization to Arrays of Oligonucleotides: Evaluation Using Experimental Models," Genomics, 13:1008-1017 (1992), the teachings of which are incorporated herein by reference.
Target sequences which show significant differences in hybridization patterns between the compared cell populations are traced back to the appropriate gene tag. A corresponding full-length nucleotide sequence is subsequently retrieved by a first hybridization with a partial-length gene source library and then from an expression-length data base or source library. In some embodiments using libraries, PCR and sequencing using primer sequences respectively corresponding to a) the gene tag sequence of interest and an internal vector sequence and b) the obtained 3' mRNA sequence and an internal vector sequence is useful.
In a particular embodiment, the invention comprises generating a library of 3'-specific short mono-length cRNA probes (gene tags) using a cloning vector and the remote-site restriction enzyme Bpm I. Bpm I cuts 16 nucleotides downstream from its recognition site. For a two population comparison, libraries of such gene tags derived from two populations of cells are compared by parallel hybridizations to ordered arrays of specific oligonucleotides immobilized onto a membrane or a chip. The presence or absence of hybridization signals on an array under incremental wash conditions is monitored and analyzed, and gene tags which show significant differences in hybridization patterns between the two cell populations are traced back to the appropriate specific oligonucleotide target. The corresponding full-length RNA sequence is subsequently retrieved from prepared cDNA libraries, usually in two steps, by selective PCR and sequencing. This technology permits the rapid and cost-effective identification of candidate genes associated with different states of induction or disease.
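The fixed cutting distance of Bpm I is what makes the resulting tags mono-length. The following is a minimal sketch of that idea, assuming only what the text states (recognition site 5'ctggag3', cut 16 nucleotides downstream). It scans the top strand only and ignores vector context, the bottom-strand cut, and sites in the reverse orientation; the function name `bpm1_tags` is hypothetical.

```python
from typing import List

BPM1_SITE = "CTGGAG"  # Bpm I recognition site, as given in the text
CUT_OFFSET = 16       # nucleotides between the site and the cut, as given in the text

def bpm1_tags(seq: str) -> List[str]:
    """Return the stretch lying between each Bpm I site and its remote cut point."""
    seq = seq.upper()
    tags = []
    start = seq.find(BPM1_SITE)
    while start != -1:
        site_end = start + len(BPM1_SITE)
        cut = site_end + CUT_OFFSET
        if cut <= len(seq):  # skip sites too close to the end of the molecule
            tags.append(seq[site_end:cut])
        start = seq.find(BPM1_SITE, start + 1)
    return tags
```

Whatever insert sits downstream of the site, every tag returned has the same length (16 nucleotides here), which is the property the mono-length library relies on.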
In one embodiment a specialized cloning vector, pALLgenes™, is formed as follows. This specialized vector is a modified pBluescript II SK+ cloning vector (Short, J.M. et al., Nucleic Acids Res. 16:7583-7600 (1988)) in which the original linker is replaced. It is understood that other cloning vectors with replaceable linkers are also useful, including by way of example, pGEM, pUC19, and pBR322.
The choice of replacement linker is an important enabler of the pALLgenes vector because it allows the insertion of Msp I / Not I fragments, and the subsequent generation of mono-length cRNA probe sequences. Other useful linkers are those which contain enzyme recognition sites, arranged in the proper spatial arrangement, so as to permit, in order, the ligation of cDNA restriction fragments from a gene source, the linearization of the vector by action of a remote-site cutter, and the polymerization of mono-length sequences by action of an RNA polymerase. The spatial arrangement of the enzyme recognition sites, as well as special design of the enzymes themselves, determines the final overall length of the mono-length sequences. By way of example, the probe length is shortened if the choice of enzymes is one in which their recognition sites overlap in the vector. Conversely, and also by way of example, the enzymes are selectively engineered such that they will be compatible with overlapping recognition sites.
In some embodiments, the linear amplification of inserts during the cloning process permits the detection of quantitative differences in mRNA expression, including the detection of both abundant and rare or less-represented mRNA transcripts. Because rare mRNAs are less abundant in the overall ratio of the library transcripts, they tend to be more difficult to detect and can sometimes fall below the threshold of the limits of detection. Linear amplification of the library promotes the detection of rare transcripts while largely conserving the actual proportional relationship of transcript abundance in the gene source.
The particular spatial arrangement of the Bpm I restriction enzyme in the pALLgenes vector is useful in generating a library of short mono-length probes. Hybridization of these probes to an array of correspondingly short and uniform length target sequences tends to be specific and reproducible compared to hybridizations obtained with variable length probes and targets.
The use of short mono-length probes and targets which are similarly short and mono-length improves upon conventional differential display techniques which tend to generate a multitude of false positive and false negative signals, thereby making the resulting data more difficult to interpret. Without being bound by any particular theory, it is believed that false positives and false negatives are caused by non-specific hybridization of variable, random, and longer mRNA probes since multiple sites will be randomly recognized by a coincidentally complementary target in a longer and more variable length probe. Having a tightly restricted range of probe and target length reduces the overall complexity and randomness of hybridization such that the resulting hybridization patterns are available to be unambiguously interpreted and reproduced.
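The argument that longer probes offer more coincidental binding sites can be put in rough numbers. This is a back-of-the-envelope sketch, not a model taken from the specification: it assumes bases are independent and equally frequent, so a target of length k matches any given window of a probe with probability 4^-k. The function name and the 2000-nucleotide comparison probe are hypothetical.

```python
def expected_chance_sites(probe_len: int, target_len: int) -> float:
    """Expected number of windows in a random probe that pair perfectly with one target.

    Assumes independent, equiprobable bases; real transcripts deviate from this.
    """
    if probe_len < target_len:
        return 0.0
    windows = probe_len - target_len + 1  # positions at which the target could pair
    return windows / 4 ** target_len

short_probe = expected_chance_sites(23, 8)    # about 23-mer probe, 8-mer target
long_probe = expected_chance_sites(2000, 8)   # hypothetical full-length mRNA probe
```

Under these assumptions the expected number of chance sites grows linearly with probe length, so the long probe is roughly two orders of magnitude more likely to carry a coincidental match than the 23-mer.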
The particular gene structural region represented by the probe is useful for detecting functionally important differential sequences or expression. Without being bound by any particular theory, and in view of the observations which suggest that 3' and 5' end sequences are more likely to be divergent, there is less interference from false positive hybridization signals arising from gene homolog sequence as the probes generated by this technology in one embodiment represent the 3' end of expressed gene sequences. Also, because the gene extremities are less conserved, it is more likely that sequence differences will be present in the gene regions represented.
In a particular embodiment, the Bpm I enzyme can also be used to generate a library of short mono-length cDNA fragments. Such a library is generated by cutting the cDNA-inserted vector such as the pALLgenes vector with Bpm I, digesting the linearized vector with Not I to remove a section of the inserted cDNA followed by religation of the vector. This procedure results in a cDNA library of gene tags representing equivalent sequences to the cRNA probes described above. Such a library is useful for generating a complementary set of cRNA probe sequences by polymerization from the reverse orientation using T7 RNA polymerase and the T7 polymerase promoter site located within the replacement linker.
In a particularly efficient embodiment, use of floating targets and incremental stringency increases specificity. The use of a spacer molecule attached to each oligonucleotide sequence on the membrane array favors probe hybridization to its complementary target. Particular note is made of 6- to 20-carbon spacers, with more particular reference to 12-carbon spacers. Oligonucleotide attachment to the membrane is particularly effective with 12-carbon spacers, and this attachment is stable under different wash conditions. It is believed that the spacer enables the entire oligonucleotide sequence to "float" freely over the membrane during hybridization and thus allows maximum availability for complementary sequence hybridization with the probe (Zhang et al., "Single-base mutational analysis of cancer and genetic diseases using membrane bound modified oligonucleotides," Nucl. Acids Res. 19: 3929-3933 (1991), the teachings of which are incorporated herein by reference). Specific conditions to optimize complementary sequence hybridization with the probe vary with membrane type, oligonucleotide concentration, spacer-type, and spacer attachment conditions. Different membrane types will accept attachment of spacer-linked oligos with varying degrees of effectiveness thereby reducing the final amount of target attached to the membrane. The length of the spacer molecule will affect the degree to which the complete nucleotide sequence of the oligo target is exposed and accessible to the complete nucleotide sequence of the probe. Efficient probe hybridization will also depend on the relative annealing strengths of complementary base-pairs. The stronger hydrogen bonding of gc compared to at base-pairing means that gc-rich probe sequences will tend to hybridize to their complementary targets at a greater efficiency in spite of other factors such as spacer length or oligo attachment method.
Optimization of these parameters increases the possibility of detecting hybridization differentials due to a single nucleotide mismatch.
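The effect of gc versus at content on annealing strength can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides that assigns about 2°C per a/t pair and 4°C per g/c pair. The rule is brought in here purely for illustration and is not part of the specification; it ignores salt, spacer, and membrane effects, and the two 14-mers in the usage note are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature in °C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper().replace("U", "T")  # treat cRNA input like DNA for counting
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

For two hypothetical 14-mers, `wallace_tm("GCGCGCGCGCGCGC")` gives 56 while `wallace_tm("ATATATATATATAT")` gives 28, which is why equal-length probes can still differ markedly in the wash temperature at which they release from their targets.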
In particular embodiments, the specificity of probe-to-target hybridization is further increased by subjecting the hybridized membrane or array to a series of increasingly stringent wash steps at incremental temperature settings. Under these conditions, the specificity of probe-to-target hybridization is very high and is able to discriminate down to single base mismatches. This feature can be employed as a "variable specificity" adjustment, according to the particular requirements of the application involved. That is, wash temperature increments on the order of about every 0.5°C to about 1°C, or about 3°C are useful for single base mismatches, with about 3°C or greater increments for more gross differentiation.
By way of example, assays seeking to detect single mismatches in a collectively unknown sample of probes would be monitored by finer increments than would be the case in an assay seeking to detect a deletion comprising several nucleotides of sequence, which could be effectively monitored at larger temperature intervals. At each stringency increment the hybridization pattern is recorded. Subsequent analysis of the totality of recorded patterns is made to determine the degree of complementarity of the probe with respect to the target. In deducing complementarity, it is important to take into account interacting thermodynamic properties and other factors including the sequence of the target, the melting temperature of the double stranded DNA, gc and at content, self-annealing loop and stem secondary structures, constitutive binding of constant provided sequences, polymorphism discrimination, temperature, salt concentration and other hybridization buffer conditions, etc. Analysis can be performed by any differential computational method.
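The incremental-wash procedure above amounts to recording one signal per array spot at each temperature step and then asking where each signal is lost. The sketch below shows one simple way to organize that; the threshold-crossing criterion, the function names, and the signal values in the usage note are all assumptions made for illustration, not the analysis method of the specification.

```python
from typing import List, Optional

def wash_schedule(start_c: float, stop_c: float, step_c: float) -> List[float]:
    """Wash temperatures from start_c to stop_c inclusive, in steps of step_c."""
    temps = []
    n = 0
    while start_c + n * step_c <= stop_c + 1e-9:  # tolerance for float rounding
        temps.append(round(start_c + n * step_c, 2))
        n += 1
    return temps

def loss_temperature(signals: List[float], temps: List[float],
                     threshold: float) -> Optional[float]:
    """First wash temperature at which a spot's signal drops below threshold, else None."""
    for temp, signal in zip(temps, signals):
        if signal < threshold:
            return temp
    return None
```

With `temps = wash_schedule(40.0, 42.0, 0.5)`, a hypothetical perfectly matched spot reading `[100, 98, 95, 90, 85]` never falls below a threshold of 50, whereas a hypothetical mismatched spot reading `[100, 80, 40, 10, 5]` is lost at 41.0°C. A coarser step of 3°C would not resolve where between the first and last wash that loss occurred.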
In particular embodiments, the present invention permits simultaneous monitoring of all expressed genes. This is accomplished by employing an ordered array with a density of specific oligonucleotide sequences sufficient to cover all or substantially all possible permutations of expressed gene sequence tags. By simultaneously monitoring the expression of multiple genes against a background of all or substantially all possible genes, the technology provides a complete and detailed profile of gene expression. This feature improves upon similar and conventional hybridization-based differential display techniques that are, for example, limited to detecting known RNA sequences. It is estimated that cells of complex organisms, such as humans, contain between 20,000 and 30,000 different RNA species. In one example, probe generation using Msp I and Not I, and hybridization against an array of > 64,000 targets, is sufficient to cover the vast majority of, but not all, expressed genes. The reservation is because Not I - Not I fragments and gene sequences without an Msp I site will not be ligated into the cloning vector and therefore, will not be represented in the library. To compensate for this, the use of additional sets of restriction enzymes proportionally increases the chances of covering the totality of RNA species expressed in a cell since fragments not vector-insertable by a first combination or set of enzymes will likely be vector-insertable with a different combination or set of enzymes and/or a different set of cloning vectors.
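An array that covers all permutations of an n-nucleotide variable sequence needs 4^n targets. The sketch below enumerates such a set and maps each sequence to an address. Reading the "> 64,000 targets" figure as the 65,536 permutations of an 8-mer is an inference from the 8-mer lower bound given in the definition of Target, not a statement in the text, and the row-major address scheme is hypothetical.

```python
from itertools import product

BASES = "ACGT"

def array_size(n: int) -> int:
    """Number of distinct n-mers over four bases."""
    return 4 ** n

def all_targets(n: int):
    """Yield every n-mer in a fixed (lexicographic) order."""
    return ("".join(p) for p in product(BASES, repeat=n))

def address_of(seq: str) -> int:
    """Position of seq in the all_targets ordering (base-4 reading, A=0 ... T=3)."""
    index = 0
    for base in seq.upper():
        index = index * 4 + BASES.index(base)
    return index
```

`array_size(8)` is 65,536. Because the ordering is fixed, a hybridization signal at a given address identifies its oligonucleotide sequence directly, which is the property the trace-back step depends on.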
A number of restriction enzymes can be used at particular stages of the process. The invention is not restricted to Not I and Msp I and alternate enzymes will work. Favored enzymes have some notable properties. An important property of Not I enzyme is that it is a rare-cutter. An important property of Msp I is that it is a frequent-cutter and has a recognition site which favors the probability that it will cut within CpG islands. CpG islands are associated with regions of the genome that are gene-rich and implicated in DNA regulation. Since CpG islands are more likely to be conserved, a directed evaluation of these genomic regions will be particularly useful in gene identification studies using founder populations. Additionally, since Msp I - Not I fragments generated by this protocol will represent the vast majority, but not all, of the totality of RNA species, alternate sets of restriction enzymes and cloning vectors are useful to increase the representation of RNA species.
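Which cDNAs yield a clonable Msp I - Not I fragment can be checked by a simple in silico digest. The sketch below uses the standard recognition sequences and cut positions for the two enzymes (Msp I: C^CGG; Not I: GC^GGCCGC), considers the top strand only, and keeps fragments bounded by one site of each enzyme. The function names and the test sequences are hypothetical.

```python
import re
from typing import List, Tuple

# enzyme name -> (recognition site, cut offset within the site on the top strand)
SITES = {"MspI": ("CCGG", 1), "NotI": ("GCGGCCGC", 2)}

def cut_positions(seq: str) -> List[Tuple[int, str]]:
    """Sorted (cut coordinate, enzyme) pairs for every site, overlapping sites included."""
    cuts = []
    for name, (site, offset) in SITES.items():
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)

def msp_not_fragments(seq: str) -> List[str]:
    """Fragments whose two ends were produced by different enzymes."""
    seq = seq.upper()
    cuts = cut_positions(seq)
    fragments = []
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        if {enz1, enz2} == {"MspI", "NotI"}:
            fragments.append(seq[pos1:pos2])
    return fragments
```

A sequence that carries a Not I site but no Msp I site returns an empty list, which mirrors the reservation in the text that such transcripts are not represented in the library and motivates the alternate enzyme sets.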
Examples of other useful enzyme sets are offered. Using the same pALLgenes vector and Not I, the following restriction enzymes will adequately replace Msp I: Taq I, Acc I, Hpa I and Nar I. Selecting a different rare-cutter is possible by way of constructing a new replacement linker wherein the Not I cloning site is replaced by one of the following: Nar I, Nru I, Nhe I, Nde I and Nsi I. A further example is constructing a new replacement linker wherein the compatible cohesive end recognition site for Cla I is replaced by one for either BamH I or EcoR V. For such a construction, inserts are generated by digestion with 1) Sau3A I or 2) Alu I, Hae III and Nru I, respectively, and an appropriate rare-cutter (as listed above). Permutations of the above provide an extensive set of enzymes with which to generate additional gene tag libraries.
In one embodiment, the present invention also detects the DNA sequence of differentially-expressed genes. Since the address of hybridizing gene tags is known (i.e., location of signal on the ordered array), it is possible to retrieve and sequence the corresponding full-length transcript from a prepared library in two steps. In the first step, the entire Msp I - Not I fragment derived from the 3' end of the original transcript is selectively amplified by PCR with the aid of the target sequence and an internal vector sequence as reaction primers. The PCR product is sequenced. In the second step, a suitable primer is designed using the sequence from the first step. A suitable primer is an oligonucleotide sequence having characteristics which favor specific primer annealing to the template DNA including having minimal secondary structure and balanced gc content. Such primers are about 18- to 23-mer in length and have a melting temperature approximately that of the primer at the opposite extremity of the amplicon. Primers, along with an internal sequence from the phage vector, are used to selectively amplify by PCR the substantially full-length cloned transcript. This PCR product is subsequently sequenced to obtain the remaining unknown sequence of the transcript. This feature permits rapid and efficient identification of RNAs underlying a positive hybridization signal.
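The primer criteria named above (about 18- to 23-mer, balanced gc content) lend themselves to a quick screen. The sketch below is illustrative only: the 40% to 60% gc window is an assumed reading of "balanced", and neither secondary structure nor melting-temperature matching to the opposite primer is checked. The function names and example sequences are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of g and c bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def is_candidate_primer(primer: str, min_len: int = 18, max_len: int = 23,
                        gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """True if length and gc content fall inside the (assumed) acceptable windows."""
    if not min_len <= len(primer) <= max_len:
        return False
    return gc_low <= gc_fraction(primer) <= gc_high
```

A hypothetical 20-mer such as `ACGTACGTACGTACGTACGT` passes, while an all-at 20-mer or an 8-mer is rejected; candidates that pass would still need to be examined for secondary structure and paired by melting temperature before use.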
Monitoring quantitative differences in the levels of mRNA expression is an aspect of the present invention. Since the probe generation protocol involves a linear amplification step, low-abundance transcripts are detectable using the method disclosed herein. Quantitative differences in the levels of mRNA between populations of cells are also detectable, but careful dosing of starting materials and adequate sensitivity and dynamic range of the signal detection system (e.g., PhosphorImager from Molecular Dynamics, Sunnyvale) are important conditions of the operation. The concentrations or amounts of extracted RNA, cell transformants, enzymes, incubation times, etc., are also precisely measured and controlled in order to ensure justifiable comparisons later on in the results. To detect subtle differences in two cell populations, it is important to manipulate populations to be compared in as identical a way as possible, mutatis mutandis.
Alternatively or in conjunction, it is useful to calibrate and normalize different signal levels between arrays since all hybridizing signals will serve as internal controls for expected signal intensities.
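One simple way to normalize signal levels between two arrays, using all hybridizing signals as internal controls, is to rescale one array so that its median matches the other's. Median scaling is chosen here as an illustrative assumption and is not prescribed by the specification; the function name and the intensity values in the usage note are hypothetical.

```python
from statistics import median
from typing import List

def normalize_to_reference(reference: List[float], other: List[float]) -> List[float]:
    """Scale `other` so that its median signal equals the median of `reference`.

    Assumes most spots are not differentially expressed, so the medians
    reflect array-wide differences in loading or detection rather than biology.
    """
    factor = median(reference) / median(other)
    return [value * factor for value in other]
```

With a hypothetical reference array reading `[100, 200, 300]` and a second array reading `[50, 100, 600]`, the scale factor is 2 and the normalized second array reads `[100, 200, 1200]`; only the third spot then stands out as a candidate differential signal.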
Particular attention is directed to the production of the arrayed membrane or biochip. In many embodiments, once the array substrate is optimized by choice of substrate material, method of target oligonucleotide attachment and other appropriate conditions for sensitive, specific, and reproducible probe hybridization, replicas of the array are produced for testing expression-based differences in an unlimited set of different organisms. The production and optimization of the array is likely the most time-consuming and expensive part of the technology since in many embodiments thousands of oligonucleotide sequences must be appropriately immobilized onto a solid phase substrate. The substrate must be capable of creating a stable attachment of each oligo sequence at spatially discrete loci. The method of attachment is by any available method. Particular note is made of attachment by bonding reactive groups joined to the oligonucleotide either by UV cross-linking, chemical treatment, heat treatment or other methods used to generate a stable bond between the oligonucleotide and the solid phase substrate that is resistant to multiple wash steps. In particular applications, oligonucleotide sequences are synthesized separately and subsequently attached to a membrane using a gridding robot. Alternatively, the oligonucleotide sequences are synthesized directly on a biochip substrate by photolithographic methods.
Attachment of the oligonucleotide is usefully carried out in such a way that all nucleotides of the oligonucleotide sequences are accessible to complementary binding by a non-immobilized probe sequence. As noted, this is often accomplished by insertion of a spacer molecule permitting the target nucleotide sequence to float above the membrane surface and thus be more available for hybridization to complementary sequences present in the probe cocktail. Finally, the substrate is more useful in a format from which it is convenient to scan or read signal intensities.
An array serves as a fixed matrix of possible permutations of oligonucleotide sequences upon which gene tags from any test genome are applied to determine differential expression profiles of genes. In many instances a single method for generating a gene tag library is used for many different organisms. Particular note is made of the assay utility in identifying genes expressed as a result of chemical, biological and physical induction (or repression), disease (or resistance) state, varietal specificity, tissue specificity, developmental specificity, or mutation. It is particularly useful to immobilize the ordered array of target sequences onto a nylon membrane, a chip, or similar format. The chip and other small formats offer advantages in portability and convenience, such as with a diagnostic kit.
A significant use of the present invention is in gene identification related to complex traits. Genes involved in complex traits, which by definition arise from the action and interaction of multiple genetic sites, have historically posed substantial difficulty for identification. The present technology is a useful tool for genetic analysis of complex traits because it allows the single time point mapping of quantitatively differential gene expression, encompassing both up-regulation and down-regulation. As such, this approach does not rely on the identification (predominantly by positional cloning methods) of structural mutations in genes as contributors to the trait but rather on altered amounts of gene products associated with the trait. In a particular embodiment, the claimed method of gene expression profiling and monitoring provides a non-ambiguous portrait of multiple genes expressed simultaneously in response to a given stimulus (induction agent, disease, infection, toxin, etc.) thus allowing multiple contributors to a trait to be identified. It is understood that the instant method is usefully employed in parallel and complementary experimentation with other gene identification strategies such as genetic mapping and cloning methods.
The scope of the invention is understood to reach all organisms. The analysis of the genome of any organism, whether animal, plant, fungal, bacterial or viral, is possible insofar as probes from RNA libraries can be produced from that organism. The target array configuration remains fixed regardless of the type of genome assayed. This technology is advantageous in gene identification studies in genomes which have not been extensively studied at the genomic level (e.g. genomes with limited genetic marker maps, physical maps or sequence information), since little a priori information is needed to identify candidate genes. As an example, the identification of candidate genes involved in disease or pathogen resistance in agronomically important crop plants or domestic livestock is noted.
The present invention is usefully employed in drug and toxicology screening, discovery and development. The instant DD methodology permits monitoring drug response at the gene expression level and locating genes involved in a particular response. Similarly, diagnostic uses are based on assaying known gene expression profiles related to specific induction or disease states (e.g. monitoring expression of known genes which are expressed during cancer or other illness). This aspect of the application of the technology will be of particular use for pharmacogenomic research in evaluating how variability in genetic background influences positive or negative response to a drug.
The following examples disclose the methods of the present invention. Particular reference is made to available texts in this area, the teachings of which are incorporated herein by reference:
1. Current Protocols in Molecular Biology, F.M. Ausubel et al. (John Wiley and Sons, Inc., New York (1988)) ISBN 047150338X.
2. Molecular Cloning: A Laboratory Manual, J. Sambrook et al. (Cold Spring Harbor Laboratory Press (1989)) ISBN 0879693096.
3. A Practical Guide to Molecular Cloning, Bernard V. Perbal et al. (John Wiley and Sons, Inc., New York (1988)) ISBN 0471850713.
4. Short Protocols in Molecular Biology (Second Edition), F.M. Ausubel et al. (John Wiley and Sons, Inc., New York (1992)) ISBN 0471577359.
5. Gene Transfer and Expression, Michael Kriegler (W.H. Freeman & Company, New York (1990)) ISBN 0716770040.
6. cDNA Library Protocols, Ian G. Cowell et al. (Humana Press, New Jersey (1997)) ISBN 089603383X.
Example 1 ACCESSIBLE ARRAY
Oligonucleotide attachment to different membranes
Five different oligo sequences were tested for complementary binding on the membrane (see Table 1). Oligos B, C and D contained 1, 2 and 3 mismatches, respectively, with respect to Oligo A. Oligo F contained a hairpin structure. Oligo G was at-rich. Oligo H was gc-rich. Complementary sequences to Oligos A, E, F, G and H were attached to the test membranes. The probe oligos were radioactively end-labeled with 32P, according to standard procedures. Briefly, 100 pmole of oligonucleotide was end-labeled with (gamma-32P)ATP using T4 polynucleotide kinase in a final reaction volume of 50 uL. The mixture was incubated for 15 min at 37°C and the reaction was stopped by incubating at 90°C for 5 min. The labeled oligonucleotide was then purified on Sephadex G-25 columns.
Table 1
Oligo   Probe oligo (5' to 3')   Target oligo (5' to 3')
A       atc gctagcat             atgcta ccgat
B       atc g ttagcat
C       atcgg ttag tat
D       atcgg ttag to
E       atc gaattca              cctgattccgat
F       atc gttccgat             atc gaacc at
G       atagttactaag             ctta taactat
H       caccga tcc               ggactc t cc

Three different types of spacer oligos were tested: 1) oligo without spacer, 2) oligo with 6-C spacer, and 3) oligo with 12-C spacer. Oligos with C-spacers had a primary reactive amine group added to the 5' end via a carbon spacer.
Three different membranes were tested: Type #1, a negatively charged membrane (Pall Biodyne C, East Hills, NY); Type #2 membrane (Hybond, Amersham, Arlington Heights, IL); and Type #3 membrane (Genescreen, NEN Life Sciences, Boston, MA). Attachment of target oligos to the membranes was effected as follows. The oligonucleotides were covalently linked to the negatively charged nylon membrane (Pall membrane only) through the amine group. The Pall membrane was chemically treated with EDC (1-ethyl-3-(dimethylaminopropyl)carbodiimide hydrochloride) (Sigma, St. Louis, MO) according to the methods described in Kawasaki et al., Methods in Enzymology 218: 369-381, 1993. Oligos were attached to the Hybond and Genescreen membranes by UV cross-linking and heat treatment at 80°C for 3 hours.
Accessibility of Probe Hybridization
Probe hybridization of the oligonucleotides to the membrane was carried out as follows. Membranes were incubated with pre-hybridization buffer (5X SSC, 5X Denhardt's, 0.1% SDS, 0.2 mg/mL salmon sperm DNA) for 2 hours at 50°C. Radioactive probes (7.5 x 10^5 cpm/mL) were denatured by boiling, then added directly to the pre-hybridization buffer and incubated overnight at 50°C. The membranes were washed at increasing stringency up to 0.5X SSC at 50°C. Membranes were then exposed on Kodak XAR film overnight at -80°C using intensifying screens.
These data established that probe hybridization is highly specific and capable of discriminating down to a single nucleotide mismatch. Results are summarized in Fig. 4. Hybridization of oligonucleotides was tested using immobilized oligonucleotides containing no modification (C = 0) and modifications consisting of hexamethylene and dodecamethylene spacer molecules added at the 5' end (C = 6 and C = 12, respectively). Wells 4-9 were empty. Wells 3, 12, 15, 18, 21 and 24 contained the specific oligo of interest (Oligo A) while all other wells contained non-complementary oligos (Oligos E and F). Probes consisted of complementary oligos with no mismatch (Oligo A), one mismatch (Oligo B) or two mismatches (Oligo C).
Results showed that 0.5 pmole of oligo slot-blotted on a membrane and hybridized with 100 pmoles of the complementary oligo was detected. Probes containing one or more mismatches could not detect the immobilized oligo at the stringency used. Hybridization with the hairpin-structure probe was not detected at the stringency used. Note that the data are easily interpreted due to the lack of background. In the vast majority of the membrane and oligo combinations tested (data not shown), background signal was minimal or absent. A C = 12 spacer gave a stronger signal than a C = 6 spacer. Immobilized (less accessible) oligonucleotides without spacers were not detected at the stringency used.
Complementary oligos did not hybridize, or hybridized weakly, to the Hybond and Genescreen membranes. This was probably the result either of inaccessible complementary base-pairing caused by the oligo attachment method or of poor attachment of the oligo to the membrane.
It is possible that the oligos with spacers did not attach to the Hybond and Genescreen membranes because the latter were not negatively charged. As an exception, the gc-rich oligos tended to hybridize to their attached complementary oligos on these membranes, probably because of the stronger triple hydrogen bonding in gc compared to at base-pairing, considering the stringencies used in this experiment. It is to be understood that the degree of oligo attachment to the membrane can be modified, such that accessibility of target sequences is maintained, by altering the oligo attachment conditions and by selection of the linker.
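The greater stability of gc-rich duplexes can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides (about 2°C per a/t base plus 4°C per g/c base). The rule is not part of the procedure described here and gives only a rough estimate. The sketch below applies it to the Oligo G probe as printed in Table 1 and to a hypothetical gc-rich 12-mer invented for illustration; the function name is likewise an assumption.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (deg C) of a short oligo by the Wallace rule.

    2 deg C per a/t base plus 4 deg C per g/c base. Salt, probe
    concentration and membrane effects are ignored.
    """
    seq = oligo.lower()
    at = seq.count("a") + seq.count("t")
    gc = seq.count("g") + seq.count("c")
    if at + gc != len(seq):
        raise ValueError("sequence may contain only a, c, g and t")
    return 2 * at + 4 * gc


at_rich = "atagttactaag"  # Oligo G probe as printed in Table 1
gc_rich = "gccgacggctcc"  # hypothetical gc-rich 12-mer, not from Table 1

print(wallace_tm(at_rich), wallace_tm(gc_rich))
```

Under this estimate the two 12-mers differ by more than 10°C, which is consistent with a gc-rich duplex surviving a wash that removes an at-rich one.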
Results establish that 1) oligo attachment to the membrane was particularly effective with the accessibility provided by 12-carbon spacers and that this attachment was stable under different wash conditions, 2) probe hybridization was highly specific and capable of discriminating down to a single nucleotide mismatch, and 3) the data were easily interpreted due to the lack of background (non-specific hybridization).
Example 2 Vector Construction
The objective was to construct a cloning vector with the promoter site of an RNA polymerase and the recognition site of a remote-site cutter endonuclease in functional relationship with one another, thus allowing the generation of short mono-length segments from inserted cDNA fragments. This specialized cloning vector, pALLgenes, was constructed as follows.
The vector was a modified pBluescript II SK+ (Short, J.M., et al., Nucleic Acids Res. 16: 7583-7600, 1988) in which the original polylinker in the Multiple Cloning Site (Bss HII - Bss HII) was replaced with a polylinker containing, in order, the following restriction or polymerase promoter recognition sites: Bss HII - T3 promoter site - Bpm I - Cla I - Not I - Kpn I - T7 promoter site - Bss HII. The replacement linker was artificially synthesized and cloned into pBluescript II SK+. Briefly, two oligonucleotides were synthesized:
Oligo I = 5'attgcgcgcaattaaccctcactaaagggctggagatcgatactagccgat3'
Oligo J = 5'ttagcgcgctaatacgactcactatagggggtaccgcggccgcatcaattaac3'
The two oligos were designed to anneal together at a short internal complementary region (underlined). Oligos I and J were used as a template in a PCR amplification with a final reaction volume of 200 uL containing 500 ng Oligo I, 500 ng Oligo J, 100 uM dNTP, 1.2 mM MgCl2 and 2 U Taq (Gibco, Gaithersburg, MD, USA). The PCR program was 100°C for 3 min; 5 cycles of 95°C for 20 seconds, 52°C for 20 seconds and 68°C for 20 seconds; and 1 cycle of 68°C for 60 seconds. The PCR product was digested with Bss HII and subsequently purified by band excision from a 2% agarose gel after staining with ethidium bromide. The original linker from pBluescript II SK+ was excised by Bss HII digestion followed by gel purification of the resulting vector band, to which the replacement linker was ligated.
The sequence of the replacement linker, located between two Bss HII recognition sites, was confirmed by sequencing (data not shown) and was as follows:
5'gcgcgcaattaaccctcactaaagggctggagatcgatgctagccgatgcggccgcggtaccccctatagtgagtcgtattagcgcgc3'.
As shown in Fig. 1, Msp I - Not I digested fragments are inserted into the pALLgenes vector at the Cla I - Not I cloning site. Bpm I is then used to linearize the vector at a specific site within the insert, exactly 16 nucleotides downstream from the Bpm I recognition site. Transcription of the linearized vector with T3 RNA polymerase then produces a single-stranded oligomer of exactly 23 nucleotides.
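The position of the Bpm I cut relative to the confirmed linker sequence can be checked with a short script. The sketch below is illustrative only: it treats the linker as a plain string, considers the top strand alone, and uses a helper name (`bpm1_cut_index`) that is hypothetical. Only the linker sequence, the recognition site and the 16-nucleotide offset are taken from the text.

```python
# Replacement linker sequence as confirmed by sequencing (Example 2).
LINKER = (
    "gcgcgcaattaaccctcactaaagggctggagatcgatgctagccgat"
    "gcggccgcggtaccccctatagtgagtcgtattagcgcgc"
)

BPM1_SITE = "ctggag"
BPM1_OFFSET = 16  # nucleotides between the recognition site and the cut


def bpm1_cut_index(seq: str) -> int:
    """Return the top-strand index of the first base after the Bpm I cut."""
    start = seq.find(BPM1_SITE)
    if start < 0:
        raise ValueError("no Bpm I recognition site found")
    cut = start + len(BPM1_SITE) + BPM1_OFFSET
    if cut > len(seq):
        raise ValueError("sequence ends before the cut position")
    return cut


site_end = LINKER.find(BPM1_SITE) + len(BPM1_SITE)
cut = bpm1_cut_index(LINKER)
print(LINKER[site_end:cut])  # the 16 nucleotides between the site and the cut
```

In the empty linker these 16 nucleotides are vector sequence beginning at the Cla I site. Once an Msp I - Not I fragment has been ligated at that site, the same offset places the cut inside the insert.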
Example 3 Preparation of 3'-specific cDNA libraries and cRNA probes
An mRNA library is produced from each of two populations of cells (affected vs. non-affected) for which differential gene expression is being assayed (Fig. 1A). The following procedures are performed on each individual population of cells.
Using standard procedures, or an RNA extraction kit such as the FastTrack 2.0 mRNA Isolation Kit (Invitrogen Corporation, CA), 5-10 ug of poly-A mRNA is extracted from the cells.
In order to obtain cDNA, the mRNA is reverse-transcribed by the method of Okayama and Berg (Mol. Cell Biol. 2: 161 (1982), the teachings of which are incorporated by reference) using reverse transcriptase and a capture primer consisting of a poly(dT)/Not I sequence (5'gcggccgcttttttttttttttt3'). This generates a series of RNA-DNA hybrid fragments of variable length, all of which contain a Not I site at the 5' end. In particular embodiments, in order to increase the efficiency of mRNA capture, it is advisable to create additional libraries by extending the nucleotide sequence at either or both the 3' and 5' ends of the capture primer, thus permitting a stronger hybridization with certain mRNAs.
In organisms which do not produce RNA with a 3' poly-A tail, the capture of RNA is accomplished using an alternate method. One such method is a strategy based on capture at a 5'-specific sequence or property such as the RNA cap. This approach is particularly useful for isolating an RNA library from bacteria.
Using standard procedures for synthesis of blunt-ended double-stranded cDNA, such as those available in commercial kits (e.g., CopyKit (Invitrogen, Carlsbad, CA)), the RNA-DNA hybrid fragments are digested with RNAse H enzyme, thus permitting DNA Polymerase I enzyme to use the digested RNA fragments as primers for second-strand synthesis of the cDNA. Nicks in the double-stranded DNA are repaired by DNA Ligase enzyme, followed by treatment with T4 DNA Polymerase enzyme to create blunt-ended cDNA fragments.
The cDNA fragments are then separated into two aliquots. One aliquot is used to generate a substantially expression-length library (Fig. 1B). The other aliquot is used to generate a partial-length library from which a short mono-length library is subsequently generated (Fig. 1C).
Expression-length library
One aliquot of blunt-ended cDNA fragments is cloned, using standard procedures, into Lambda-ZAP phage to generate an expression-length cDNA library, according to the method of Short, J.M.: "Lambda ZAP: a bacteriophage lambda expression vector with in vivo excision properties," Nuc. Acids Res. 16: 7583-7600 (1988), the teachings of which are incorporated by reference.
The library is titrated to > 3 x 10^6 plaque-forming units / ug of RNA to ensure adequate representation of all RNAs. This library is stored frozen and is used in subsequent steps for retrieval of cDNA corresponding to sequences of interest from probe hybridization.
Partial-length library
Another aliquot of blunt-ended cDNA is digested simultaneously with Msp I and Not I restriction enzymes. This generates a series of fragments of which the vast majority have 5' Msp I and 3' Msp I sticky ends. A minority of fragments will include Not I - Not I and 5' Msp I - Not I 3' fragments.
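The double digest can be mimicked in silico to see which fragment carries the 3' end of a transcript. The sketch below assumes the commonly cited cut positions (Msp I at c^cgg, Not I at gc^ggccgc), works on the top strand only, assumes the only Not I site of interest is the one introduced by the capture primer, and uses an invented cDNA sequence. The function name is hypothetical.

```python
MSP1_SITE, MSP1_CUT = "ccgg", 1      # cut after the first c
NOT1_SITE, NOT1_CUT = "gcggccgc", 2  # cut after gc


def msp1_not1_fragment(cdna: str):
    """Return the top strand of the 5' Msp I - Not I 3' fragment, or None.

    The fragment runs from the Msp I cut closest to the 3' end up to the
    cut in the last Not I site.
    """
    not1 = cdna.rfind(NOT1_SITE)
    if not1 == -1:
        return None
    # Last Msp I site ending no later than the first base of the Not I site.
    msp1 = cdna.rfind(MSP1_SITE, 0, not1 + 1)
    if msp1 == -1:
        return None
    return cdna[msp1 + MSP1_CUT:not1 + NOT1_CUT]


# Invented sense-strand cDNA: body, poly-A stretch, then the primer's Not I site.
cdna = "atgccggttcaaccggacgtgcattgacta" + "a" * 15 + "gcggccgc"
print(msp1_not1_fragment(cdna))
```

The returned fragment begins with cgg, the end that pairs with the cut Cla I site of the vector in the ligation step that follows.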
Using standard procedures, the digested fragments are ligated into the cloning vector of Example 2 between the Cla I - Not I cloning sites (Fig. 2A). Since Msp I sticky ends are compatible with the Cla I site (5'...atcgat...3'), only Msp I - Not I fragments are inserted into the cloning vector (Fig. 2C). The vector construct is used to transform competent bacteria. After growing up the bacterial clones in suitable media and antibiotic, the vector DNA is separated from bacterial genomic DNA using a standard DNA purification kit such as the Mini-Prep DNA Purification Kit (Qiagen, Hilden, Germany). One aliquot of the purified vector DNA is stored frozen and is used in subsequent steps for retrieval of cDNA corresponding to sequences of interest from probe hybridization. Another aliquot is used to generate the short mono-length library.
Short Mono-length library
The purified vector DNA from the last step is linearized using Bpm I (5'...ctggag(n)16...3') restriction enzyme (Fig. 2B). The Bpm I enzyme cuts at a site exactly 16 nucleotides downstream from its recognition site, which is situated adjacent to the T3 promoter site in the vector of Example 2.
The digested fragments are then transcribed using T3 RNA Polymerase and labeled NTPs (either radioactive 32P or fluorescent tags) (Fig. 2C). The RNA polymerase specifically recognizes the T3 bacteriophage-specific promoter site in the linearized plasmid. This reaction generates a library of labeled single-stranded RNA fragments which are exactly 23 nucleotides long (Fig. 2D). These fragments comprise a library of 3' gene tags which are subsequently used as hybridization probes against an array of complementary target oligonucleotides affixed onto a membrane or biochip substrate (Fig. 2D). Ideally, the target sequences are 23-mers complementary to both the provided and inquiry nucleotide portions of the probe. However, as a cost-cutting measure, alternative target sequences will also be useful in applications requiring only a lesser specificity of hybridization. Examples of the latter include target sequences complementary to only the variable moiety of the inquiry portion of the probe, and target sequences complementary to the entire inquiry portion of the probe but with a blocker oligo, complementary to the provided portion of the probe, added to the hybridization reagents.
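The relationship between a cloned insert, its 23-nucleotide tag and the complementary target can be sketched as follows. Several points are assumptions rather than statements from the text: the tag is modelled as one g, the Bpm I site and the 16 nucleotides that follow it (a layout inferred from the 23-mer target sequence given in Example 5), the insert sequence is invented, and the function names are hypothetical. The tag is written in DNA letters so that it can be compared directly with the target.

```python
# Assumed start of the tag: last g of the T3 promoter, the Bpm I site
# (ctggag) and the "at" left on the vector side of the cut Cla I site.
VECTOR_PREFIX = "gctggagat"
TAG_LENGTH = 23

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gene_tag(insert: str) -> str:
    """Return the 23-nt tag for an Msp I - Not I insert beginning with cgg."""
    if not insert.startswith("cgg"):
        raise ValueError("insert must begin with the Msp I end 'cgg'")
    tag = (VECTOR_PREFIX + insert)[:TAG_LENGTH]
    if len(tag) < TAG_LENGTH:
        raise ValueError("insert too short to yield a full-length tag")
    return tag


insert = "cggacgtgcattgacta" + "a" * 15 + "gc"  # invented Msp I - Not I fragment
tag = gene_tag(insert)
target = reverse_complement(tag)
print(tag)
print(target)
```

Under these assumptions the first 12 nucleotides of every tag are identical (the provided portion) and the last 11 come from the insert (the variable part of the inquiry portion), so every full-length target ends in ccgatctccagc.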
Example 4 Preparation of a membrane array
In this procedure, an ordered array of specific oligonucleotides is immobilized onto a membrane. Briefly, a complete set of specific 13-mer oligonucleotides is synthesized, of which 5 nucleotides (at the 3' end) are fixed and the remaining 8 nucleotides are variable and exhaustively cover all possible sequence permutations. These 13-mer specific oligonucleotides have the sequence 5'-(carbon spacer)-nnnnnnnnccgat-3' where n = g, a, t, or c. This set of 13-mer specific oligonucleotides is equivalent to 4^8 (N = 65,536) different oligonucleotide sequences. Each oligo is synthesized with a primary reactive amine group added to the 5' end via a 12-carbon spacer and an amino-linker.
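The size of the oligonucleotide set follows directly from the 8 variable positions and can be confirmed by enumeration. The sketch below is a minimal illustration using the Python standard library; the function and constant names are hypothetical and the spacer chemistry is not represented.

```python
from itertools import product

FIXED_3PRIME = "ccgat"  # fixed portion of each 13-mer target
VARIABLE_LENGTH = 8     # number of variable (n) positions


def target_oligos(variable_length=VARIABLE_LENGTH, fixed=FIXED_3PRIME):
    """Yield every target sequence (5' to 3'), one at a time."""
    for bases in product("acgt", repeat=variable_length):
        yield "".join(bases) + fixed


oligos = list(target_oligos())
print(len(oligos))            # number of distinct 13-mers
print(oligos[0], oligos[-1])  # first and last sequence in this ordering
```

Because the generator is lazy, the same function can be called with `variable_length=11` to step through a larger set without holding every sequence in memory.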
Each oligonucleotide is diluted to an identical concentration and distributed to 384-well format microplates. Using the Q-BOT gridding robot (Genetix, UK), an identical amount of each oligonucleotide is double-spotted to specific addresses on a 22 x 22 cm Biodyne C membrane (Pall Biosupport, Port Washington, New York). A series of replica membranes is produced.
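The scale of the gridding step can be estimated with simple arithmetic. The sketch below assumes that the spots are laid out on a square grid covering the whole membrane, which the text does not specify, so the resulting pitch is only an upper bound under that assumption. The function name and its parameters are hypothetical.

```python
import math


def array_logistics(n_oligos, wells_per_plate=384, replicates=2,
                    membrane_side_mm=220.0):
    """Return (plates, spots, grid side, maximum spot pitch in mm)."""
    plates = math.ceil(n_oligos / wells_per_plate)  # source microplates needed
    spots = n_oligos * replicates                   # double-spotting
    grid_side = math.ceil(math.sqrt(spots))         # smallest square grid
    pitch_mm = membrane_side_mm / grid_side         # centre-to-centre spacing
    return plates, spots, grid_side, pitch_mm


plates, spots, grid_side, pitch_mm = array_logistics(4 ** 8)
print(f"{plates} plates, {spots} spots, "
      f"{grid_side} x {grid_side} grid, pitch <= {pitch_mm:.2f} mm")
```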
The spotted oligonucleotides are covalently attached to the membrane using chemical treatment with EDC as disclosed in Kawasaki, E.; Randall, S.; Erlich, H.: "Genetic analysis using polymerase chain reaction-amplified DNA and immobilized oligonucleotide probes: reverse dot-blot typing," Methods in Enzymology 218: 369-381 (1993), the teachings of which are incorporated by reference. The immobilization configuration with the 12-carbon spacer ensures that the entire target sequence is freely accessible during subsequent annealing steps (Fig. 2D).
Example 5 Alternative Array Preparation
Following the preparation of Example 4, array preparation is varied as follows. The oligonucleotides are immobilized onto a biochip in an ordered array. The oligonucleotides are 23-mers with the sequence 5'-(optional spacer)-nnnnnnnnnnnccgatctccagc3' where n = g, a, t, or c. This is equivalent to 4^11 (N = 4,194,304) specific oligonucleotides covering all possible permutations of complementary probes in the variable region. In the biochip array, oligos are synthesized directly on the solid phase substrate (for example, glass or polymer) by a synthesis method known as photolithography.
Example 6 Probe hybridization
In this procedure, labeled mono-length 23-mer cRNA probes from Example 3 are hybridized to the membrane array at incremental wash stringencies.
Using standard procedures, the cRNA probes are hybridized to the membrane array. The hybridized membrane is washed over a range of stringencies. The temperature increments are from 1°C to 5°C, over an overall range of 45°C to 65°C. At each successive wash temperature, an image of the membrane is recorded by a PhosphorImager (Molecular Dynamics, Sunnyvale, California) or by a similar signal recorder.
The hybridization signal intensities are scanned and converted into numeric values by image-analysis software. For signal intensities generated by the PhosphorImager, an image resolution generally greater than 1024 grey tones per pixel is sufficient to detect quantitative differences in hybridization per target. These signals are monitored for each wash step and subsequently analyzed by a computer algorithm that distinguishes true positive signals from false positive and false negative signals (see Definition S). This permits generating a profile of gene expression which is compared against the profiles of differential populations. Positive and negative controls are added to each hybridization array in order to validate and normalize signal quality.
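The comparison of two expression profiles can be sketched in a few lines. The text does not specify the algorithm, so everything below is an assumption made for illustration: each array is normalised to its own positive control, spots are compared as log2 ratios, and a two-fold change is used as the threshold. The intensities, spot addresses and function names are invented, and the handling of successive wash steps is not modelled.

```python
import math


def normalise(signals, control_key):
    """Scale each spot by the positive-control intensity of the same array."""
    control = signals[control_key]
    if control <= 0:
        raise ValueError("positive control intensity must be greater than 0")
    return {k: v / control for k, v in signals.items() if k != control_key}


def differential_calls(affected, unaffected, control_key="pos_ctrl",
                       min_log2_ratio=1.0, floor=0.01):
    """Return {address: log2 ratio} for spots changing at least two-fold."""
    a = normalise(affected, control_key)
    u = normalise(unaffected, control_key)
    calls = {}
    for address in a.keys() & u.keys():
        # The floor avoids division by zero for spots with no signal.
        ratio = math.log2(max(a[address], floor) / max(u[address], floor))
        if abs(ratio) >= min_log2_ratio:
            calls[address] = ratio
    return calls


# Invented intensities for three spots and one positive control per array.
affected = {"pos_ctrl": 1000, "A1": 800, "A2": 100, "A3": 500}
unaffected = {"pos_ctrl": 2000, "A1": 400, "A2": 800, "A3": 1000}
print(differential_calls(affected, unaffected))
```

In this made-up data set spot A1 is called as up-regulated and A2 as down-regulated, while A3 is unchanged once both arrays are normalised.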
Example 7 Identification of hybridized cRNA
Substantially expression-length sequences corresponding to positive hybridization signals on the membrane array are isolated and sequenced as follows.
Hybridization signals are traced back to their corresponding addresses on the membrane and, therefore, to their corresponding 13-mer DNA sequences. Accordingly, the sequences corresponding to differentially expressed cRNAs are retrieved from the libraries by selective PCR amplification in 2 steps.
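Tracing a signal back from an array address to its target sequence amounts to a lookup. The sketch below assumes a hypothetical layout in which spot index 0 to 4^8 - 1 encodes the 8 variable nucleotides as a base-4 number (a = 0, c = 1, g = 2, t = 3); the actual assignment of sequences to membrane addresses is set by the gridding scheme and is not given in the text.

```python
BASES = "acgt"
FIXED_3PRIME = "ccgat"
VARIABLE_LENGTH = 8


def address_to_target(index: int) -> str:
    """Map a spot index (0 .. 4**8 - 1) to its 13-mer target sequence."""
    if not 0 <= index < 4 ** VARIABLE_LENGTH:
        raise ValueError("index outside the array")
    digits = []
    for _ in range(VARIABLE_LENGTH):
        index, digit = divmod(index, 4)  # peel off base-4 digits, lowest first
        digits.append(BASES[digit])
    return "".join(reversed(digits)) + FIXED_3PRIME


def target_to_address(target: str) -> int:
    """Inverse of address_to_target for a 13-mer ending in the fixed portion."""
    if len(target) != VARIABLE_LENGTH + len(FIXED_3PRIME) \
            or not target.endswith(FIXED_3PRIME):
        raise ValueError("not a target sequence of this array")
    index = 0
    for base in target[:VARIABLE_LENGTH]:
        index = index * 4 + BASES.index(base)
    return index


print(address_to_target(27))
print(target_to_address("aaaaacgtccgat"))
```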
A. Selective amplification from the partial-length library
The first step consists of PCR amplification and sequencing of the entire 3' sequence downstream of the first Cla I / Msp I hybrid site using a purified aliquot of the partial-length library as a template (Fig. 1E). In order to do this, a 20-mer forward primer complementary to the 11-mer unique cRNA sequence plus the upstream Bpm I/Cla I/Msp I site, and a reverse primer complementary to an internal vector sequence (e.g. T7 promoter primer sequence), are used to generate a PCR product for sequencing. The PCR product is sequenced using standard automated systems such as the ABI-377 DNA sequencer (Perkin Elmer, CT) (Fig. 1E).
B. Selective amplification from the expression-length library
The second step consists of selective PCR amplification of the full-length mRNA sequence using the phage library in Example 3 as a template (Fig. 1F). Primers used in this reaction are a forward primer consisting of a suitable internal phage sequence and a reverse primer consisting of a suitable sequence obtained from the selective amplification and sequencing of the 3' Msp I - Not I fragment from the partial-length library (Fig. 1F). The resulting PCR product corresponds to the expression-length mRNA, which is then sequenced using standard procedures to obtain the identity of the expressed gene. Some characteristics of the identified gene, such as putative function, can be obtained using comparative bioinformatics searches of existing biological databases. Reference is made to BLAST (Basic Local Alignment Search Tool), which is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA, and is found at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/). The differentially expressed gene sequence is confirmed by Northern blot against affected and non-affected cell RNA extracts. In alternative embodiments, the PCR fragment is used as a probe for additional hybridization experiments.
Example 1 ACCESSIBLE ARRAY
Oligonucleotide attachment to different membranes Five different oligo sequences were tested for complementary binding on the membrane (see Table 1 ). Oligos B, C and D contained 1, 2 and 3 mismatches, respectively, with respect to Oligo A. Oiigo F contained a hairpin structure. Oligo G was at-rich. Oligo H was gc-rich. Complementary sequences to Oligos A, E, F, G and H were attached to the test membranes. The probe oligos were radioactively end-labeled with 32P, according to standard procedures. Briefly, 100 pmole of oligonucleotide was end-labeled with (gamma-32P)ATP using T4 polynucleotide kinase in a final reaction volume of 50 uL. The mixture was incubated 15 min. at 37°C and the reaction was stopped by incubating at 90°C for 5 min. The labeled oligonucleotide was then purified on Sephadex G-25 columns.
Table 1 Oligo Probe oligo (5' to Target oligo 3') (5' to 3') is A atc gctagcat atgcta ccgat B atc g ttagcat 2o C atcgg ttag tat D atcgg ttag to E atc gaattca cctgattccgat F atc gttccgat atc gaacc at G atagttactaag ctta taactat 3o H caccga tcc ggactc t cc Three different types of spacer oligos were tested: 1 ) Oligo without spacer, 2) Oligo with 6-C spacer, and 3) Oligo with 12-C spacer. Oligos with C-spacers had a primary reactive amine group added to the 5' ends via a carbon spacer.
Three different membranes were tested: Type #1, a negatively charged membrane (Pall Biodyne C, East Hills, NY), Type #2 membrane s (Amersham, Arlington Heights, IL), and Type #3 membrane (Genescreen, NEN Lite Sciences, Boston, MA ). Attachment of target oligos to the membranes was effected as follows. The oligonucleotides were covalently linked to a negatively charged nylon membrane (Pall membrane only) by the amine group. The Pall membrane was chemically treated with EDC ([1-ethyl-3-(dimethyiaminopropyl)carbodiamide hydrochloride] ) (Sigma, St-Louis, MO) according to methods already described in Kawasaki et al., Methods in Enz~rmoloav, 218: 369-381, 1993. Oligos were attached to Hybond and Genescreen membranes with UV cross-linking and heat treatment at 80° C for 3 hours.
is Accessibility of Probe Hybridization Probe hybridization of the oligonucleotides to the membrane was carried out as follows. Membranes were incubated with pre-hybridization buffer (5X SSC, 5X Denhardts, 0.1 % SDS, 0.2 mgimL salmon sperm DNA) for 2 hours at 50 C. Radioactive probes (7.5 x 105 cpmimL) were denatured by 2o boiling, then added directly to the pre-hybridization buffer and incubated overnight at 50 C. The membranes were washed at increasing stringency up to 0.5X SSC at 50 C. Membranes were then exposed on Kodak XAR
film overnight at -80 C using a intensifying screens.
These data established that probe hybridization is highly specific and capable of discriminating down to a single nucleotide mismatch. Results are summarized in Fig. 4. Hybridization of oligonucleotides was tested using immobilized oligonucleotides containing no modification (C=0) and s modifications consisting of hexamethylene and dodecamethylene spacer molecules added at the 5' end (C = 6 and C =12, respectively). Wells 4-9 were empty. Wells 3, 12, 15, 1$, 21, 24 contained the specific oligo of interest (Oligo A) while all other wells contained non-complementary oligos (Oligos E and F). Probes consisted of complementary oligos with no mismatch (Oligo A), one mismatch (Oligo B), two mismatches (Oligo C).
Results showed that 0.5 pmoles slot-blotted on a membrane and hybridized with 100 pmoles of the complementary oligo were detected. Probes containing one or more mismatches could not detect the immobilized oligo with the stringency used. Hybridization with the probe with the hairpin is structure were not be detected with the stringency used. Note that the data is easily interpreted due to the lack of background. In the vast majority of the membrane and oligo combinations tested (data not shown), background signal was minimal or absent. A C = 12 spacer gave a stronger signal than a C = 6 spacer. Immobilized (less accessible) oligonucleotides 2o without spacers were not detected with the stringency used.
Complementary oligos did not hybridize, or hybridized weakly, to the Hybond and Genescreen membranes. This was probably as a result of inaccessible complementary base-pairing caused by oligo attachment method or poor attachment of the oligo to the membrane.
It is possible that the oligos with spacers did not attach to the Hybond and Genescreen membranes because the latter were not negatively charged. As an exception, the gc-rich oligos tended to hybridize to their attached complementary oligos on these membranes, probably because of s the stronger triple hydrogen bonding in gc compared to at base-pairing, considering the stringencies used in this experiment. !t is to be understood that modification of the degree of oligo attachment to membrane such that accessibility of target sequences is maintained is available by modification of oligo attachment conditions and by selection of linker.
Results establish that 1 ) oligo attachment to the membrane was particularly effective with the accessability provided by 12-carbon spacers and that this attachment was stable under different wash conditions, 2) probe hybridization was highly specific and capable of discriminating down to a single nucleotide mismatch, and 3) the data was easily interpreted due ~s to the lack of background (non-specific hybridization).
Example 2 Vector Construction The objective was to construct a cloning vector with the promoter 2o site of an RNA polymerase and the recognition site of a remote-site cutter endonuclease in functional relationship with one another, thus allowing the generation of short mono-length segments from inserted cDNA fragments.
This specialized cloning vector, pALLgenes, was constructed as follows.
The vector was a modified pBluescript II SK + (Short, J.M., et al., 2s Nucleic Acids Res. 16: 7583-7600, 1988) in which the original polylinker in the Multiple Cloning Site (Bss HII - Bss HII) was replaced with a polylinker containing, in order, the following restriction or polymerise promoter recognition sites: Bss HII - T3 promoter site - Bpm I - Cla I - Not I
- Kpn I - T7 promoter site - Bss HII. The replacement linker was artificially s synthesized and cloned into pBluescript II SK + . Briefly, two oligonucleotides were synthesized: Oligo I =
5'attgcgcgcaattaaccctcactaaagggctggagatcgatactagccqat3' and Oligo J =
5'ttagcgcgctaatacgactcactatagggggtaccgcggccgcatcaattaac3'. The two oligos were designed to anneal together at a short internal complementary to region (underlined). Oligos I and J were used as a template in a PCR
amplification with a final reaction volume of 200 uL and containing 500 ng Oligo I, 500 ng Oligo J, 100 uM dNTP, 1.2 mM MgCl2, 2 U Taq (Gibco, Gaithersburg, MD, USA). The PCR program was100 C for 3 m; 5 cycles of 95°C for 20 seconds, 52°C for 20 seconds, 68° C for 20 seconds; and 1 is cycle of 68°C for 60 seconds. The PCR product was digested with Bss HII
and subsequently purified by band excision from a 2% agarose gel after staining with ethidium bromide. The original linker from pBluescript II SK+
was excised by Bss HII digestion followed by gel purification of the resulting vector band to which the replacement linker was ligated.
2o The sequence of the replacement linker, located between two Bss HII
recognition sites, was confirmed by sequencing (data not shown) and was as follows:
5'gcgcgcaattaaccctcactaaagggctggagatcgatgctagccgatgcggccgcggtaccccct atagtgagtcgtattagcgcgc3'.
As shown in Fig. 1, Msp I - Not I digested fragments are inserted into the pALLgenes vector at the Cla I - Not I cloning site. Bpm I is then used to linearize the vector at a specific site within the insert and at exactly 16 nucleotides downstream from the Bpml recognition site. Polymerization s of the linearized vector with T3 RNA pol then produces a mono-stranded oligomer of exactly 23 nucleotides.
Example 3 Prei~aration of 3'-sj~ecific cDNA libraries and cRNA probes An mRNA library is produced from two populations of cells (affected vs. non-affected) for which differential gene expressed is being assayed (Fig. 1 A). The following procedures are performed on each individual population of cells.
Using standard procedures, or an RNA extraction kit such as Fast ~5 track 2.0 mRNA Isolation Kit (Invitrogen Corporation, CA), 5-10 ug of poly-A mRNA is extracted from the cells.
In order to obtain cDNA, the mRNA is reverse-transcribed by the method of Okayama and Berg (Mol. Cell Biol. 2: 161 ( 1982), the teachings of which are incorporated by reference) using reverse transcriptase and a 2o capture primer consisting of a poly(dT)/Not I sequence (5'gcggccgcttttttttttttttt3'). This generates a series of RNA-DNA hybrid fragments of variable length, all of which contain a Not I site at the 5' end.
In particular embodiments, in order to increase the efficiency of mRNA
capture, it is advisable to create additional libraries by extending the nucleotide sequence at either or both the 3' and 5' ends of the capture primer, thus permitting a stronger hybridization with certain mRNAs.
In organisms which do not produce RNA with a 3' poly-A tail the capture of RNA is accomplished using an alternate method. One such s method is a strategy based on capture at a 5'-specific sequence or property such as the RNA cap. This approach is particularly useful for isolating an RNA library from bacteria.
Using standard procedures for synthesis of blunt-ended double-stranded cDNA such as that available by commercial kits, (e.g.,CopyKit to (Invitrogen, Carlsbad, CA)), the RNA-DNA hybrid fragments are digested with RNAse H enzyme thus permitting DNA Polymerase I enzyme to use the digested RNA fragment as a template for second strand synthesis of the cDNA. Nicks in the double-stranded DNA are repaired by DNA Ligase enzyme followed by treatment with T4 DNA Polymerase enzyme to create ~s blunt-ended cDNA fragments.
The cDNA fragments are then separated into two aliquots. One aliquot is used to generate a substantially expression-length library (Fig. 1B). The other aliquot is used to generate a partial-length library from which a short mono-length library is subsequently generated (Fig. 1C).
Expression-length library
One aliquot of blunt-ended cDNA fragments is cloned, using standard procedures, into Lambda-ZAP phage to generate an expression-length cDNA library, according to the method of Short, J.M.: "Lambda ZAP: a bacteriophage lambda expression vector with in vivo excision properties," Nuc. Acids Res. 16:7583-7600 (1988), the teachings of which are incorporated by reference.
The library is titrated to > 3 x 10^6 plaque-forming units/µg of RNA to ensure adequate representation of all RNAs. This library is stored frozen and is used in subsequent steps for retrieval of cDNA corresponding to sequences of interest from probe hybridization.
Partial-length library
Another aliquot of blunt-ended cDNA is digested simultaneously with Msp I and Not I restriction enzymes. This generates a series of fragments of which the vast majority have 5' Msp I and 3' Msp I sticky ends. A minority of fragments will include Not I - Not I and 5' Msp I - Not I 3' fragments.
Using standard procedures, the digested fragments are ligated into the cloning vector of Example 2 between the Cla I - Not I cloning sites (Fig. 2A). Since Msp I sticky ends are compatible with the Cla I site (5'...atcgat...3'), only Msp I - Not I fragments are inserted into the cloning vector (Fig. 2C). The vector construct is used to transform competent bacteria. After growing up the bacterial clones in suitable media and antibiotic, the vector DNA is separated from bacterial genomic DNA using a standard DNA purification kit such as the Mini-Prep DNA Purification Kit (Qiagen, Hilden, Germany). One aliquot of the purified vector DNA is stored frozen and is used in subsequent steps for retrieval of cDNA corresponding to sequences of interest from probe hybridization. Another aliquot is used to generate the short mono-length library.
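The compatibility of the Msp I and Cla I ends can be checked with a short Python sketch. The cut positions used (Msp I c^cgg, Cla I at^cgat, Not I gc^ggccgc) are the standard published ones for these enzymes. The helper name `five_prime_overhang` and the tuple layout are illustrative assumptions, not part of the disclosure.

```python
def five_prime_overhang(site: str, cut_top: int) -> str:
    """5' overhang left by an enzyme that cuts a palindromic site
    `cut_top` bases from the 5' end of each strand."""
    cut_bottom = len(site) - cut_top
    return site[cut_top:cut_bottom]

# (recognition site, cut position on the top strand)
MSP_I = ("ccgg", 1)      # c^cgg
CLA_I = ("atcgat", 2)    # at^cgat
NOT_I = ("gcggccgc", 2)  # gc^ggccgc

msp_overhang = five_prime_overhang(*MSP_I)
cla_overhang = five_prime_overhang(*CLA_I)
not_overhang = five_prime_overhang(*NOT_I)

# Junction formed when an Msp I end is ligated into the Cla I-cut vector:
# the vector contributes "at", the insert contributes "cgg".
hybrid_junction = CLA_I[0][:CLA_I[1]] + MSP_I[0][MSP_I[1]:]

print(msp_overhang, cla_overhang, not_overhang, hybrid_junction)
```

Msp I and Cla I leave the same two-base overhang, whereas the Not I overhang is different, so a fragment can only enter the Cla I - Not I vector with its Msp I end at the Cla I site. The resulting hybrid junction is the "aucgg" that appears in the cRNA formula of Claim 8.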
Short Mono-length library
The purified vector DNA from the last step is linearized using Bpm I (5'...ctggag(n)16...3') restriction enzyme (Fig. 2B). The Bpm I enzyme cuts at a site exactly 16 nucleotides downstream from its recognition site, which is situated adjacent to the T3 promoter site in the vector of Example 2.
The digested fragments are then transcribed using T3 RNA Polymerase and labeled NTPs (either radioactive 32-P or fluorescent tags) (Fig. 2C). The RNA polymerase specifically recognizes the T3 bacteriophage-specific promoter site in the linearized plasmid. This reaction generates a library of labeled single-stranded RNA fragments which are exactly 23 nucleotides long (Fig. 2D). These fragments comprise a library of 3' gene tags which are subsequently used as hybridization probes against an array of complementary target oligonucleotides affixed onto a membrane or biochip substrate (Fig. 2D). Ideally, the target sequences are 23-mer in length and complementary to both the provided and inquiry nucleotide portions of the probe. However, as a cost-cutting measure, alternative target sequences will also be useful in applications requiring only a lesser specificity of hybridization. Examples of the latter include target sequences complementary to only the variable moiety of the inquiry portion of the probe, and target sequences complementary to the entire inquiry portion of the probe but with a blocker oligo complementary to the provided portion of the probe added to the hybridization reagents.
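The arithmetic behind the 23-nucleotide tag can be illustrated with a minimal Python sketch. It assumes that transcription starts one base (g) upstream of the Bpm I site, consistent with the formula 5'-gcuggagaucggnnnnnnnnnnn-3' given in Claim 8. Only the top-strand cut is modelled, and the function name and example insert are hypothetical.

```python
BPM_I_SITE = "ctggag"
BPM_I_REACH = 16  # Bpm I cuts 16 nucleotides downstream of its site

def mono_length_tag(transcribed_region: str) -> str:
    """Run-off transcript (as RNA) of a sense-strand DNA region that starts
    at the transcription start and has been cut by Bpm I."""
    site_start = transcribed_region.find(BPM_I_SITE)
    if site_start < 0:
        raise ValueError("no Bpm I site in the transcribed region")
    cut = site_start + len(BPM_I_SITE) + BPM_I_REACH
    if cut > len(transcribed_region):
        raise ValueError("region is too short to reach the Bpm I cut site")
    return transcribed_region[:cut].replace("t", "u")

# Start base + Bpm I site + Cla I/Msp I hybrid junction + hypothetical insert.
example_region = "g" + BPM_I_SITE + "atcgg" + "acgtacgtacgtaaaa"
example_tag = mono_length_tag(example_region)
print(example_tag, len(example_tag))
```

Of the 16 nucleotides downstream of the Bpm I site, 5 belong to the hybrid junction and 11 come from the cDNA insert, so every tag carries 12 provided nucleotides followed by 11 insert-derived nucleotides.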
Example 4 Preparation of a membrane array
In this procedure, an ordered array of specific oligonucleotides is immobilized onto a membrane. Briefly, a complete set of specific 13-mer oligonucleotides is synthesized, of which the first 5 nucleotides are fixed and the last 8 nucleotides are variable and exhaustively cover all possible sequence permutations. These 13-mer specific oligonucleotides have sequence 5'-(carbon spacer)-nnnnnnnnccgat-3' where n = g, a, t, or c.
This set of 13-mer specific oligonucleotides is equivalent to 4^8, or N = 65,536, different oligonucleotide sequences. Each oligo is synthesized with a primary reactive amine group added to the 5' end via a 12-carbon spacer and an amino-linker.
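The size of this oligonucleotide set can be confirmed by enumerating it. The following Python sketch generates every 13-mer of the form nnnnnnnnccgat; the generator name `membrane_targets` is an illustrative assumption.

```python
from itertools import product

FIXED_3PRIME = "ccgat"  # fixed portion of each 13-mer target

def membrane_targets(variable_length: int = 8):
    """Yield every target oligonucleotide: variable region + fixed 3' portion."""
    for combo in product("acgt", repeat=variable_length):
        yield "".join(combo) + FIXED_3PRIME

targets = list(membrane_targets())
print(len(targets), targets[0], targets[-1])
```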
Each oligonucleotide is diluted to an identical concentration and distributed to 384-well format microplates. Using the Q-BOT gridding robot (Genetix, UK), an identical amount of each oligonucleotide is double-spotted to specific addresses on a 22 x 22 cm Biodyne C membrane (Pall Biosupport, Port Washington, New York). A series of replica membranes is produced.
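As a rough illustration of the plate bookkeeping involved, the sketch below assigns each of the 65,536 oligonucleotides to a well of a 384-well plate (16 rows by 24 columns). The row-major addressing scheme and the function names are assumptions made for this example; the disclosure does not specify how wells are assigned.

```python
import math
import string

WELLS_PER_PLATE = 384
COLUMNS_PER_ROW = 24
ROW_LABELS = string.ascii_uppercase[:16]  # rows A to P

def plates_needed(oligo_count: int) -> int:
    """Number of 384-well plates required to hold `oligo_count` oligos."""
    return math.ceil(oligo_count / WELLS_PER_PLATE)

def well_address(index: int) -> tuple:
    """Map a zero-based oligo index to (plate number, well label)."""
    plate, within_plate = divmod(index, WELLS_PER_PLATE)
    row, column = divmod(within_plate, COLUMNS_PER_ROW)
    return plate + 1, f"{ROW_LABELS[row]}{column + 1}"

print(plates_needed(4 ** 8))
print(well_address(0), well_address(4 ** 8 - 1))
```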
The spotted oligonucleotides are covalently attached to the membrane using chemical treatment with EDC as disclosed in Kawasaki, E.; Randall, S.; Erlich, H.: "Genetic analysis using polymerase chain reaction-amplified DNA and immobilized oligonucleotide probes: reverse dot-blot typing," Methods in Enzymology 218:369-381 (1993), the teachings of which are incorporated by reference. The immobilization configuration with the 12-carbon spacer ensures that the entire target sequence is freely accessible during subsequent annealing steps (Fig. 2D).
Example 5 Alternative Array Preparation
Following the preparation of Example 4, array preparation is varied as follows. The oligonucleotides are immobilized onto a biochip in an ordered array. The oligonucleotides are 23-mers with sequence 5'-(optional spacer)-nnnnnnnnnnnccgatctccagc-3' where n = g, a, t, or c. This is equivalent to 4^11 (N = 4,194,304) specific oligonucleotides covering all possible permutations of complementary probes in the variable region. In the biochip array, oligos are synthesized directly on the solid phase substrate (for example, glass or polymer) by a synthesis process known as photolithography.
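The relationship between a 23-mer cRNA probe and its full-length DNA target is a reverse complement, which the following Python sketch makes explicit. The function name `target_for_probe` and the example tag are hypothetical. The provided portion is taken from the cRNA formula of Claim 8.

```python
RNA_TO_DNA_COMPLEMENT = {"a": "t", "c": "g", "g": "c", "u": "a"}

def target_for_probe(probe_rna: str) -> str:
    """DNA target (5'->3') that is fully complementary to an RNA probe."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(probe_rna.lower()))

PROVIDED_PORTION = "gcuggagaucgg"                  # 12 provided nucleotides
example_probe = PROVIDED_PORTION + "acguacguacg"   # plus 11 hypothetical tag nucleotides
example_target = target_for_probe(example_probe)
array_size = 4 ** 11                               # targets needed for 11 variable positions

print(example_target, array_size)
```

The fixed 3' portion of every target, ccgatctccagc, is the reverse complement of the provided portion of the probe, and the 11 variable positions account for the 4,194,304 array elements.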
Example 6 Probe hybridization
In this procedure, labeled mono-length 23-mer cRNA probes from Example 3 are hybridized to the membrane array at incremental wash stringencies.
Using standard procedures, the cRNA probes are hybridized to the membrane array. The hybridized membrane is washed over a range of stringencies. The temperature increments are from 1°C to 5°C, over an overall range of 45°C to 65°C. At each successive wash temperature, an image of the membrane is recorded by a PhosphorImager (Molecular Dynamics, Sunnyvale, California) or by a similar signal recorder.
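A wash schedule consistent with the stated ranges can be generated as follows. This Python sketch is illustrative only: the function name, the default step of 2°C, and the validation rule are assumptions rather than part of the procedure.

```python
import math

def wash_temperatures(start_c: float = 45.0,
                      stop_c: float = 65.0,
                      step_c: float = 2.0) -> list:
    """Successive wash temperatures from start_c up to (at most) stop_c."""
    if not 1.0 <= step_c <= 5.0:
        raise ValueError("increment must be between 1 and 5 degrees C")
    if stop_c < start_c:
        raise ValueError("stop temperature must not be below start temperature")
    step_count = math.floor((stop_c - start_c) / step_c + 1e-9)
    return [start_c + i * step_c for i in range(step_count + 1)]

coarse_schedule = wash_temperatures(step_c=5.0)
fine_schedule = wash_temperatures(step_c=1.0)
print(coarse_schedule)
print(len(fine_schedule))
```

A smaller increment gives more images per membrane and therefore a finer view of how each signal declines as stringency rises.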
The hybridization signal intensities are scanned and converted into numeric values by image-analysis software. For signal intensities generated by the PhosphorImager, an image resolution of generally greater than 1024 grey tones per pixel is sufficient to detect quantitative differences in hybridization per target. These signals are monitored for each wash step and subsequently analyzed by a computer algorithm that distinguishes true positive signals from false positive and false negative signals (see Definition S). This permits generating a profile of gene expression which is compared against profiles of differential populations. Positive and negative controls are added to each hybridization array in order to validate and normalize signal quality.
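The normalization and comparison steps can be sketched in Python as shown below. This is not the algorithm of Definition S, which is not reproduced here. It is a simple fold-change comparison offered as an assumption, and the function names, the two-fold threshold, and the signal floor are all hypothetical.

```python
import math

def normalize(raw: dict, negative_control: float, positive_control: float) -> dict:
    """Rescale raw intensities so the negative control maps to 0 and the positive to 1."""
    span = positive_control - negative_control
    if span <= 0:
        raise ValueError("positive control must exceed negative control")
    return {addr: max((value - negative_control) / span, 0.0)
            for addr, value in raw.items()}

def differential_addresses(profile_a: dict, profile_b: dict,
                           min_fold: float = 2.0, floor: float = 0.01) -> dict:
    """Return log2(b/a) for array addresses whose signal differs by at least min_fold."""
    hits = {}
    for addr in profile_a.keys() & profile_b.keys():
        a = max(profile_a[addr], floor)  # floor avoids division by zero
        b = max(profile_b[addr], floor)
        ratio = b / a
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits[addr] = math.log2(ratio)
    return hits

# Hypothetical intensities for two spots on two membranes.
non_affected = normalize({"spot1": 600, "spot2": 200}, 100, 1100)
affected = normalize({"spot1": 580, "spot2": 900}, 100, 1100)
hits = differential_addresses(non_affected, affected)
print(hits)
```

In practice this comparison would be repeated at each wash temperature, so that signals lost at higher stringency can be treated separately from signals that persist.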
Example 7 Identification of hybridized cRNA
Substantially expression-length sequences corresponding to positive hybridization signals on the membrane array are isolated and sequenced as follows.
Hybridization signals are traced back to their corresponding addresses on the membrane, and therefore, to their corresponding 13-mer DNA sequences. Accordingly, the cRNA sequences which are differentially expressed are fished out from the phage library by a selective PCR amplification in 2 steps.
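Tracing an address back to a sequence requires a fixed correspondence between array positions and 13-mer targets. One possible correspondence, a base-4 encoding of the 8 variable positions, is sketched below in Python. The encoding, the base order "acgt", and all function names are assumptions; the disclosure does not prescribe a particular addressing scheme.

```python
BASES = "acgt"
FIXED_3PRIME = "ccgat"
VARIABLE_LENGTH = 8
DNA_COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

def index_to_target(index: int) -> str:
    """13-mer target stored at a zero-based array index."""
    if not 0 <= index < 4 ** VARIABLE_LENGTH:
        raise ValueError("index outside the array")
    letters = []
    for _ in range(VARIABLE_LENGTH):
        index, digit = divmod(index, 4)
        letters.append(BASES[digit])
    return "".join(reversed(letters)) + FIXED_3PRIME

def target_to_index(target: str) -> int:
    """Inverse of index_to_target."""
    index = 0
    for base in target[:VARIABLE_LENGTH]:
        index = index * 4 + BASES.index(base)
    return index

def tag_bases_from_target(target: str) -> str:
    """The 8 tag bases (probe sense, written as DNA) that pair with the variable region."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(target[:VARIABLE_LENGTH]))

print(index_to_target(1), tag_bases_from_target(index_to_target(1)))
```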
A. Selective amplification from the partial-length library
The first step consists of PCR amplification and sequencing of the entire 3' sequence downstream of the first Cla I / Msp I hybrid site using a purified aliquot of the partial-length library as a template (Fig. 1E). In order to do this, a 20-mer forward primer complementary to the 11-mer unique cRNA sequence plus the upstream Bpm I/Cla I/Msp I site, and a reverse primer complementary to an internal vector sequence (e.g. T7 promoter primer sequence), are used to generate a PCR product for sequencing. The PCR product is sequenced using standard automated systems such as the ABI-377 DNA sequencer (Perkin Elmer, CT) (Fig. 1E).
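The composition of the 20-mer forward primer can be illustrated as 9 upstream site nucleotides followed by the 11-mer tag. The Python sketch below assumes that the 9 upstream nucleotides are the last 9 of the provided portion (ggagatcgg) and that the primer is written in the same sense as the cRNA. Both points are assumptions made for illustration, since the exact primer sequence is not specified here.

```python
UPSTREAM_SITE = "ggagatcgg"  # assumed: last 9 nt of the provided portion, as DNA

def forward_primer(tag_11mer: str, upstream: str = UPSTREAM_SITE) -> str:
    """Join the upstream site nucleotides to an 11-mer tag (RNA or DNA letters)."""
    tag_dna = tag_11mer.lower().replace("u", "t")
    if len(tag_dna) != 11 or set(tag_dna) - set("acgt"):
        raise ValueError("tag must be 11 nucleotides of a, c, g, t/u")
    primer = upstream + tag_dna
    if len(primer) != 20:
        raise ValueError("primer is expected to be a 20-mer")
    return primer

example_primer = forward_primer("acguacguacg")  # hypothetical tag
print(example_primer, len(example_primer))
```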
B. Selective amplification from the expression-length library
The second step consists of selective PCR amplification of the full-length mRNA sequence using the phage library in Example 3 as a template (Fig. 1F). Primers used in this reaction are a forward primer consisting of a suitable internal phage sequence and a reverse primer consisting of a suitable sequence obtained from the selective amplification and sequencing of the 3' Msp I - Not I fragment from the partial-length library (Fig. 1F). The resulting PCR product corresponds to the expression-length mRNA, which is then sequenced using standard procedures to obtain the identity of the expressed gene. Some characteristics of the identified gene, such as putative function, can be obtained using comparative bioinformatics searches of existing biological databases. Reference is made to BLAST (Basic Local Alignment Search Tool), which is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA, and is found at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/). The differentially expressed gene sequence is confirmed by Northern blot against affected and non-affected cell RNA extracts. In alternative embodiments, the PCR fragment is used as a probe for additional hybridization experiments.
Claims (52)
1. A method of determining differential display of gene expression comprising the steps of:
(a) preparing at least two substantially identical accessible ordered arrays (target arrays) of synthetic substantially mono-length oligonucleotide DNA segments representing permutations of possible oligonucleotide sequences,
(b) preparing partial-length first cDNA library being a partial-length segment library corresponding to substantially all complementary expressed mRNA sequences of a first gene source;
(c) using the library of step (b) and preparing a mono-length first cRNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of the first gene source;
(d) preparing substantially expression-length first cDNA library corresponding to or complementary with expressed mRNA sequences of the first gene source, wherein said library is a substantially expression-length transcript library;
(e) preparing like-condition partial-length second cDNA library being a substantially partial-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(f) using the library of step (e) and preparing like-condition mono-length second cRNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(g) preparing like-condition substantially expression-length second cDNA library corresponding to or complementary with expressed mRNA sequences of the second gene source, wherein said library is a substantially expression-length transcript library;
(h) probe hybridizing the first and second mono-length segment libraries, each with an ordered array of step (a); and
(i) comparing the probe hybridized accessible ordered arrays to determine differential hybridization display sites between first and second mono-length segment libraries; and,
(k) referencing differential hybridization sites to at least one of said expression-length libraries to locate the gene of expression differential.
2. The method of Claim 1 further comprising amplifying partial-length segments of said partial length libraries corresponding to said differential hybridization display sites by amplification of corresponding hybridization segments of a corresponding partial-length library prior to referencing to at least one of said expression-length libraries to identify the gene of expression differential.
3. The method of Claim 1 wherein the determined differential is a quantitative differential.
4. The method of Claim 1 wherein the determined differential is a nucleotide binding differential.
5. The method of Claim 1 wherein said mono-length libraries are prepared from said partial-length libraries by cleaving said partial-length cDNA
corresponding to or complementary with expressed mRNA sequences by a remote-site restriction endonuclease.
6. The method of Claim 5 wherein the remote-site restriction endonuclease is Bpm I.
7. The method of Claim 1 wherein said first gene source is cDNA
transcribed from expressed cellular mRNA from cells in non-affected state and said second gene source is cDNA transcribed from expressed cellular mRNA from cells in an affected state.
8. A mono-length library of Claim 1 wherein said mono-length sequences are cRNA elements 23 nucleotides in length of the formula 5'-gcuggagaucggnnnnnnnnnnn-3', wherein "n" represents any nucleotide and said 11 n nucleotides corresponding to a complementary 11 nucleotide sequence of said mRNA.
9. The method of Claim 1 wherein said mono-length oligonucleotide is substantially about 13-mer.
10. The method of Claim 1 wherein said mono-length oligonucleotide is substantially about 23-mer.
11. The method of Claim 1 wherein each segment of said mono-length segment library comprises a provided nucleotide portion and an inquiry nucleotide portion.
12. The method of Claim 11 wherein the provided nucleotide portion of said mono-length comprises 5'-gcuggagaucgg-3'.
13. The method of Claim 11 wherein the provided nucleotide portion of said mono-length is at least about 9-mer.
14. The method of Claim 11 wherein the provided nucleotide portion of said mono-length comprises 5'-gcuggagau-3'.
15. The method of Claim 11 wherein the inquiry nucleotide portion of said mono-length is about 14-mer.
16. The method of Claim 15 wherein the provided nucleotide portion of said mono-length comprises 5'-gcuggagaucgg-3'.
17. The method of Claim 11 wherein said inquiry portion comprises a constant and a variable portion and said variable portion is at least about 9-mer.
18. The method of Claim 17 wherein the variable portion of the said inquiry nucleotide portion is about 11-mer.
19. The method of Claim 18 wherein the constant portion of the said inquiry nucleotide portion is about 3-mer.
20. The method of Claim 11 wherein the provided portion comprises 5'cuggag3'.
21. The method of Claim 11 wherein said inquiry portion has a 3'-end and a 5'-end, and said 3'-end is 5'cgg3'.
22. The method of Claim 1 wherein said differential is quantitative gene expression by comparing comparison reporters selected from the group consisting of radioactive label, fluorescent label or chemiluminescent labeling detection of cRNA hybridization with target DNA sequences of modified oligonucleotides bound to a membrane or in a micro array.
23. The method of Claim 22 wherein said comparing further comprises the steps of quantifying said comparison reporters by incremental washing at temperatures between about 40°C and 70°C.
24. The method of Claim 23 wherein said increments are about 3°C or less.
25. The method of Claim 1 wherein said substantially identical ordered arrays of synthetic mono-length oligonucleotides are accessibly affixed to a substrate.
26. The method of Claim 25 wherein the substrate is a nylon membrane.
27. The method of Claim 25 wherein the said substrate is a biochip.
28. The method of Claim 25 wherein the said oligonucleotide is attached to said substrate by way of an intervening spacer molecule of at least about 6-carbons.
29. The method of Claim 28 wherein said spacer comprises at least about a 12-carbon chain.
30. A substantially mono-length segment cDNA library wherein said mono-length segments comprise cDNA corresponding to substantially all complementary expressed mRNA sequences of a gene source.
31. The library of Claim 30 wherein said mono-length is substantially about 23-mer.
32. The library of Claim 30 wherein said correspondence to complementary mRNA sequences comprises about 11-mer of each said mono-length segment.
33. A vector-insertable linker comprising a BssH II recognition site adjacent to an RNA polymerase promoter site in functional connection with a Bpm I
recognition site readably adjacent to a Cla I recognition site which is adjacent to a Not I recognition site, which is adjacent to a Kpn1 recognition site which is adjacent to either an RNA polymerase promoter site which is adjacent to a Bss HII recognition site.
34. The linker of Claim 33 wherein at least one promoter site is T3 RNA
polymerase promoter site adjacent to said Bpm I recognition site.
35. The linker of Claim 33 wherein at least one promoter site is T7 RNA
polymerase promoter site adjacent to said Bpm I recognition site.
36. The vector-insertable linker of Claim 35 inserted into the pBluescript II
SK + cloning vector.
37. A vector-insertable linker comprising an RNA polymerase promoter site in functional connection with a Bpm I recognition site, adjacent to a Cla I/Msp I residual recognition site adjacent to an insert from a cDNA gene source adjacent to a Not I recognition site adjacent to Kpn1 recognition site adjacent to either a T3 or T7 RNA polymerase promoter site which is adjacent to a Bss HII recognition site.
38. The vector-insertable linker of Claim 37 inserted into the pBluescript II SK + cloning vector.
39. A mono-length cRNA library corresponding to complementary expressed mRNA sequences wherein said library consists of cRNA
elements 23 nucleotides in length of the formula 5'-gcuggagaucggnnnnnnnnnnn-3', and wherein n represents one of 11 nucleotides in sequence corresponding to a complementary 11 nucleotide sequence of said mRNA.
40. A method of determining differential display of gene expression comprising the steps of:
(a) preparing at least two substantially identical accessible ordered arrays of synthetic substantially mono-length oligonucleotide DNA segments representing all permutations of possible oligonucleotide sequences,
(b) preparing partial-length first cDNA library being a partial-length segment library corresponding to substantially all complementary expressed mRNA sequences of a first gene source;
(c) using the library of step (b) and preparing mono-length first cDNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of the first gene source;
(d) preparing like-condition partial-length second cDNA library being a substantially partial-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(e) using the library of step (d) and preparing like-condition mono-length second cDNA library being a substantially mono-length segment library corresponding to or complementary with expressed mRNA sequences of a second gene source;
(f) probe hybridizing the first and second mono-length segment libraries, each with an ordered array of step (a); and
(g) comparing the probe hybridized accessible ordered arrays to determine differential hybridization display sites between first and second mono-length segment libraries.
41. The method of Claim 40 further comprising, (h) referencing differential hybridization sites to at least one oligonucleotide sequence data base to identify the gene of expression differential.
42. The method of Claim 40 further comprising amplifying the mono-length segments of said differential hybridization display sites with corresponding hybridization segments of a partial-length library prior to referencing said data base to identify the gene of expression differential.
43. The method of Claim 40 wherein said data base is substantially an expression-length library data base.
44. A method of detecting a point mismatch probe nucleotide in short mono-length oligonucleotides derived from partial-length segment or expression-length segment oligonucleotide libraries comprising
(a) hybridizing probe oligonucleotides against an accessible ordered array of synthetic substantially short mono-length target oligonucleotide DNA segments representing all permutations of possible oligonucleotide sequences wherein the probe oligonucleotides and the target site oligonucleotides are of substantially equal length, and
(b) quantitatively assessing hybridization sites
(c) washing said hybridization sites under stringent conditions
(d) detecting post-washing variation in hybridization sites representative of a point mismatch nucleotide, and
(e) referencing said site variation to a partial-length or expression length segment of at least one of said libraries.
45. The method of Claim 44 wherein the probe oligonucleotide has a constant portion and an inquiry portion; and, the target site oligonucleotide has a portion complementary to said constant portion of said probe nucleotide wherein the washing of step (c) further comprises the step of (c') inhibiting competitively hybridization of the provided portion.
46. A mono-length gene tag library.
47. The library of Claim 46 wherein the mono-length gene tags are cRNA.
48. The library of Claim 47 wherein said mono-length insert segments comprise substantially 23-mer inserts.
49. The library of Claim 48 wherein said mono-length insert segments comprise substantially 14-mer inserts.
50. The library of Claim 46 wherein the mono-length gene tags are cDNA.
51. The library of Claim 50 wherein said mono-length insert segments comprise substantially 23-bp inserts.
52. The library of Claim 51 wherein said mono-length insert segments comprise substantially 14-bp inserts.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14593698A | 1998-09-03 | 1998-09-03 | |
US09/145,936 | 1998-09-03 | ||
PCT/CA1999/000789 WO2000014273A2 (en) | 1998-09-03 | 1999-08-26 | Differential genetic display technique and vector |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2342903A1 true CA2342903A1 (en) | 2000-03-16 |
Family
ID=22515207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002342903A Abandoned CA2342903A1 (en) | 1998-09-03 | 1999-08-26 | Differential genetic display technique and vector |
Country Status (2)
Country | Link |
---|---|
CA (1) | CA2342903A1 (en) |
WO (1) | WO2000014273A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2808287B1 (en) * | 2000-04-26 | 2004-09-17 | Ipsogen | METHOD FOR QUANTITATIVE MEASUREMENT OF GENE EXPRESSION |
US20050014168A1 (en) * | 2003-06-03 | 2005-01-20 | Arcturus Bioscience, Inc. | 3' biased microarrays |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997013877A1 (en) * | 1995-10-12 | 1997-04-17 | Lynx Therapeutics, Inc. | Measurement of gene expression profiles in toxicity determination |
US5658736A (en) * | 1996-01-16 | 1997-08-19 | Genetics Institute, Inc. | Oligonucleotide population preparation |
GB9620749D0 (en) * | 1996-10-04 | 1996-11-20 | Brax Genomics Ltd | Identifying antisense oligonucleotides |
-
1999
- 1999-08-26 CA CA002342903A patent/CA2342903A1/en not_active Abandoned
- 1999-08-26 WO PCT/CA1999/000789 patent/WO2000014273A2/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2000014273A3 (en) | 2000-06-02 |
WO2000014273A2 (en) | 2000-03-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FZDE | Dead |