NOVEL NUCLEOTIDE AND AMINO ACID SEQUENCES, AND ASSAYS AND METHODS OF USE THEREOF FOR DIAGNOSIS
FIELD OF THE INVENTION
The present invention relates to novel nucleotide and protein sequences, and to assays and methods of use thereof.
BACKGROUND OF THE INVENTION
Diagnostic markers are important for early diagnosis of many diseases, as well as for monitoring treatment and determining prognosis of such diseases. Serum markers are examples of such diagnostic markers and are used for diagnosis of many different diseases. Such serum markers typically encompass secreted proteins and/or peptides; however, some serum markers may be released to the blood upon trauma, such as trauma to the heart (for example through cardiac failure).
Immunohistochemistry (IHC) is the study of the distribution of an antigen of choice in a sample based on specific antibody-antigen binding, typically on tissue slices. The antibody features a label which can be detected, for example as a stain which is detectable under a microscope. The tissue slices are prepared by being fixed. IHC is therefore particularly suitable for antibody-antigen reactions that are not disturbed or destroyed by the process of fixing the tissue slices. IHC permits determining the localization of binding, and hence mapping of the presence of the antigen within the tissue and even within different compartments in the cell. Such mapping can provide useful diagnostic information, including:
1) the histological type of the tissue sample
2) the presence of specific cell types within the sample
3) information on the physiological and/or pathological state of cells (e.g. which phase of the cell-cycle they are in)
4) the presence of disease related changes within the sample
5) differentiation between different specific disease subtypes where it is already known the tissue is of disease state (for example, the differentiation between different tumor types when it is already known the sample was taken from cancerous tissue).
IHC information is valuable for more than diagnosis. It can also be used to determine prognosis and therapy (as in the case of HER-2 in breast cancer) and to monitor disease. IHC protein markers can come from any cellular location. Most often these markers are membrane proteins, but secreted proteins or intracellular proteins (including intranuclear proteins) can also be used as IHC markers. IHC has at least two major disadvantages. First, it is performed on tissue samples, and therefore a tissue sample has to be collected from the patient, which most often requires invasive procedures such as biopsy, associated with pain, discomfort, hospitalization and risk of infection. Second, the interpretation of the result is observer-dependent and therefore subjective. There is no measured value but rather an estimation (on a scale of 1-4) of how prevalent the target antigen is.
SUMMARY OF THE INVENTION
The present invention provides, in different embodiments, many novel amino acid and nucleic acid sequences, which may optionally be used as diagnostic markers. For example, the present invention provides a number of different variants of known serum proteins, which may optionally be used as diagnostic markers, preferably as serum markers, or optionally as IHC markers. The present invention therefore overcomes the many deficiencies of the background art with regard to the need to obtain tissue samples and subjective interpretations of results. For example, serum markers require only a simple blood test and their result is typically a scientifically measured number. As IHC markers, the variants of the present invention may also provide different and/or better measurement parameters for various diseases and/or pathological conditions.
The present invention also provides a number of different variants of known IHC proteins, which may optionally be used as diagnostic markers, preferably as serum markers, or optionally as IHC markers. The present invention therefore overcomes the many deficiencies of the background art with regard to the need to obtain tissue samples and subjective interpretations of results. For example, serum markers require only a simple blood test and their result is typically a scientifically measured number. As IHC markers, the variants of the present invention may also provide different and/or better measurement parameters for various diseases and/or pathological conditions.
Other variants are also provided by the present invention as described in greater detail below. The diseases for which such variants may be useful diagnostic markers are described in greater detail below for each of the variants. The variants themselves are described by "cluster" or by gene, as these variants are splice variants of known proteins. Therefore, a "marker-detectable disease" refers to a disease that may be detected by a particular marker, with regard to the description of such diseases below. The markers of the present invention, alone or in combination, show a high degree of differential detection between disease and non-disease states. The present invention therefore also relates to diagnostic assays for disease detection optionally and preferably in a biological sample taken from a subject (patient), which is more preferably some type of body fluid or secretion including but not limited to seminal plasma, blood, serum, urine, prostatic fluid, seminal fluid, semen, the external secretions of the skin, respiratory, intestinal, and genitourinary tracts, tears, cerebrospinal fluid, sputum, saliva, milk, peritoneal fluid, pleural fluid, cyst fluid, bronchoalveolar lavage, lavage of the reproductive system and/or lavage of any other part of the body or system in the body, and stool or a tissue sample. The term may also optionally encompass samples of in vivo cell culture constituents. The sample can optionally be diluted with a suitable eluant before contacting the sample to an antibody and/or performing any other diagnostic assay.
Information given in the text with regard to cellular localization was determined according to four different software programs: (i) tmhmm (from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/TMHMM/TMHMM2.0b.guide.php) or (ii) tmpred (from EMBnet, maintained by the ISREC Bioinformatics group and the LICR Information Technology Office, Ludwig Institute for Cancer Research, Swiss Institute of Bioinformatics, http://www.ch.embnet.org/software/TMPRED_form.html) for transmembrane region prediction; (iii) signalp_hmm or (iv) signalp_nn (both from Center for Biological Sequence Analysis, Technical University of Denmark DTU, http://www.cbs.dtu.dk/services/SignalP/background/prediction.php) for signal peptide prediction. The terms "signalp_hmm" and "signalp_nn" refer to two modes of operation for the program SignalP: hmm refers to Hidden Markov Model, while nn refers to neural networks. Localization was also determined through manual inspection of known protein localization and/or gene structure, and the use of heuristics by the individual inventor. In some cases, for the manual inspection of cellular localization prediction, the inventors used the ProLoc computational platform [Einat Hazkani-Covo, Erez Levanon, Galit Rotman, Dan Graur and Amit Novik; (2004) "Evolution of multicellularity in metazoa: comparative analysis of the subcellular localization of proteins in Saccharomyces, Drosophila and Caenorhabditis." Cell Biology International 2004;28(3):171-8.], which predicts protein localization based on various parameters including protein domains (e.g., prediction of trans-membranous regions and localization thereof within the protein), pI, protein length, amino acid composition, homology to pre-annotated proteins, recognition of sequence patterns which direct the protein to a certain organelle (such as a nuclear localization signal, NLS, or a mitochondria localization signal), signal peptide and anchor modeling, and the use of unique domains from Pfam that are specific to a single compartment.
Information is given in the text with regard to SNPs (single nucleotide polymorphisms).
A description of the abbreviations is as follows. "T -> C", for example, means that the SNP results in a change at the position given in the table from T to C. Similarly, "M -> Q", for example, means that the SNP has caused a change in the corresponding amino acid sequence, from methionine (M) to glutamine (Q). If, in place of a letter at the right hand side for the nucleotide sequence SNP, there is a space, it indicates that a frameshift has occurred. A frameshift may also be indicated with a hyphen (-). A stop codon is indicated with an asterisk at the right hand side (*). As part of the description of an SNP, a comment may be found in parentheses after the above description of the SNP itself. This comment may include an FTId, which is an identifier to a SwissProt entry that was created with the indicated SNP. An FTId is a unique and stable feature identifier, which allows construction of links directly from position-specific annotation in the feature table to specialized protein-related databases. The FTId is always the last component of a feature in the description field, as follows: FTId=XXX_number, in which XXX is the 3-letter code for the specific feature key, separated by an underscore from a 6-digit number. In the table of the amino acid mutations of the wild type proteins of the selected splice variants of the invention, the header of the first column is "SNP position(s) on amino acid sequence", representing a position of a known mutation on the amino acid sequence.
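The SNP notation described above can be illustrated with a minimal Python sketch. The function names `parse_snp` and `parse_ftid`, the returned field names, and the example FTId value are hypothetical and chosen only for illustration; the sketch assumes each SNP is written as a single "X -> Y" string and that an FTId follows the FTId=XXX_number pattern with a 3-letter key and a 6-digit number.

```python
import re

# "X -> Y" with optional spaces, including the OCR-style "X - > Y".
# The right-hand side may be empty (a space), "-" or "*".
SNP_RE = re.compile(r"^\s*(\S+)\s*-\s*>\s*(\S*)\s*$")

# FTId=XXX_number: 3-letter feature key, underscore, 6-digit number.
FTID_RE = re.compile(r"FTId=([A-Za-z]{3})_(\d{6})")


def parse_snp(text):
    """Classify one SNP description written as 'X -> Y'."""
    match = SNP_RE.match(text)
    if match is None:
        raise ValueError("not an SNP description: %r" % text)
    original, changed = match.group(1), match.group(2)
    if changed in ("", "-"):
        kind = "frameshift"      # space or hyphen on the right-hand side
    elif changed == "*":
        kind = "stop"            # asterisk marks a stop codon
    else:
        kind = "substitution"
    return {"from": original, "to": changed, "kind": kind}


def parse_ftid(comment):
    """Return (feature_key, number) from a comment, or None if absent."""
    match = FTID_RE.search(comment)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for example in ["T -> C", "M -> Q", "A -> ", "A -> -", "Q -> *"]:
        print(repr(example), parse_snp(example))
    # The FTId value below is an invented example, not taken from the text.
    print(parse_ftid("(FTId=VAR_012345)"))
```

The sketch treats nucleotide and amino acid changes identically; distinguishing the two would require knowing which table the entry came from.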
SNPs may optionally be used as diagnostic markers according to the present invention, alone or in combination with one or more other SNPs and/or any other diagnostic marker. Preferred embodiments of the present invention comprise such SNPs, including but not limited to novel SNPs on the known (WT or wild type) protein sequences given below, as well as novel nucleic acid and/or amino acid sequences formed through such SNPs, and/or any SNP on a variant amino acid and/or nucleic acid sequence described herein.
Information given in the text with regard to the homology to the known proteins was determined by Smith-Waterman version 5.1.2 using special (non-default) parameters as follows: -model=sw.model -GAPEXT=0 -GAPOP=100.0 -MATRIX=blosum100
Information is given with regard to overexpression of a cluster in cancer based on ESTs. A key to the p values with regard to the analysis of such overexpression is as follows:
- library-based statistics: P-value without including the level of expression in cell-lines (P1)
- library-based statistics: P-value including the level of expression in cell-lines (P2)
- EST clone statistics: P-value without including the level of expression in cell-lines (SP1)
- EST clone statistics: predicted overexpression ratio without including the level of expression in cell-lines (R3)
- EST clone statistics: P-value including the level of expression in cell-lines (SP2)
- EST clone statistics: predicted overexpression ratio including the level of expression in cell-lines (R4)
Library-based statistics refer to statistics over an entire library, while EST clone statistics refer to expression only for ESTs from a particular tissue or cancer.
Information is given with regard to overexpression of a cluster in cancer based on microarrays. As a microarray reference, in the specific segment paragraphs, the unabbreviated tissue name was used as the reference to the type of chip for which expression was measured. There are two types of microarray results: those from microarrays prepared according to a design by the present inventors, for which the microarray fabrication procedure is described in detail in the Materials and Experimental Procedures section herein; and those from microarrays using Affymetrix technology. For microarrays prepared according to a design by the present inventors, the probe name begins with the name of the cluster (gene), followed by an identifying number. Oligonucleotide microarray results taken from Affymetrix data were from chips available from Affymetrix Inc, Santa Clara, CA, USA (see for example data regarding the Human Genome U133 (HG-U133) Set at www.affymetrix.com/products/arrays/specific/hgu133.affx; GeneChip Human Genome U133A 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133av2.affx; and Human Genome U133 Plus 2.0 Array at www.affymetrix.com/products/arrays/specific/hgu133plus.affx). The probe names follow the Affymetrix naming convention. The data is available from NCBI Gene Expression Omnibus (see www.ncbi.nlm.nih.gov/projects/geo/ and Edgar et al, Nucleic Acids Research, 2002, Vol. 30, No. 1 207-210). The dataset (including results) is available from www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1133 for the Series GSE1133 database (published on March 2004); a reference to these results is as follows: Su et al (Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. Epub 2004 Apr 09).
Oligonucleotide probes for use with arrays designed by the present inventors:
>S67314_0_0_741
CACAGAGCCAGGATGTTCTTCTGACCTCAGTATCTACTCCAGCTCCAGCT
>S67314_0_0_744
TGGCATGCTGGAACATGGACTCTAGCTAGCAAGAAGGGCTCAAGGAGGTG
>HSPROSAP_0_0_11823
CCTCTGGGGTAGGTTACTATCCTCTTTGTCCTGCCAGTACCCCTAGAAAT
>HSPROSAP_0_9_0
TTGGTGTTTCGGCATGGAGACCGAAGTCCCATTGACACCTTTCCCACTGA
>D11581_0_0_2570
ATGAGGGGAGATTGCCTTCCACTACACATAAGTATGGTCAAGTATGAAAT
>HSMUC1A_0_37_0
AAAAGGAGACTTCGGCTACCCAGAGAAGTTCAGTGCCCAGCTCTACTGAG
>HSMUC1A_0_0_11364
AAAGGCTGGCATAGGGGGAGGTTTCCCAGGTAGAAGAAGAAGTGTCAGCA
>HSMUC1A_0_0_11365
AATTAACCCTTTGAGAGCTGGCCAGGACTCTGGACTGATTACCCCAGCCT
>HSAPHOL_0_11_0
GGAACATTCTGGATCTGACCCTCCCAGTCTCATCTCCTGACCCTCCCACT
>HSCREACT_0_31_0
CCTCCCCTTTTCCACACGAACCTTGTGGGGCTGTGAATTCTTTCTTCATC
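As an illustration of how a probe list in this layout might be read programmatically, the following minimal Python sketch parses name and sequence pairs and extracts the cluster name from each probe name. The function names `parse_probes` and `cluster_of` are hypothetical, and the rule that the cluster name is everything before the first underscore is an assumption based on the statement that the probe name begins with the name of the cluster (gene).

```python
def parse_probes(text):
    """Parse '>name' / sequence lines into a {probe_name: sequence} dict."""
    probes = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            probes[name] = ""
        elif name is not None:
            # Sequence lines are concatenated in case a probe is wrapped.
            probes[name] += line.upper()
    return probes


def cluster_of(probe_name):
    """Assumed convention: the cluster name precedes the first underscore."""
    return probe_name.split("_", 1)[0]


# Two probes copied from the list above.
EXAMPLE = """\
>S67314_0_0_741
CACAGAGCCAGGATGTTCTTCTGACCTCAGTATCTACTCCAGCTCCAGCT
>HSAPHOL_0_11_0
GGAACATTCTGGATCTGACCCTCCCAGTCTCATCTCCTGACCCTCCCACT
"""

if __name__ == "__main__":
    for probe_name, sequence in parse_probes(EXAMPLE).items():
        valid = set(sequence) <= set("ACGT")
        print(probe_name, cluster_of(probe_name), len(sequence), valid)
```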
In the heart specific clusters, a first set of abbreviations is used for the first histogram:
ADP = adipocyte
BLD = blood
BLDR = bladder
BRN = brain
BONE = bone
BM = bone marrow
BRS = mammary gland
CAR = cartilage
CNS = central nervous system
COL = colon
E-ADR = endocrine_adrenal_gland
E-PAN = endocrine_pancreas
E-PT = endocrine_parathyroid_thyroid
ENDO = endocrine_unchar
EPID = epididymis
GI = gastrointestinal tract
GU = genitourinary
HN = head and neck
HRT = heart
KD = kidney
LI = liver
LUNG = lung
LN = lymph node
MUS = muscle
OV = ovary
PNS = peripheral nervous system
PRO = prostate
SKIN = skin
SPL = spleen
SYN = synovial membrane
TCELL = immune T cells
THYM = thymus
TST = testes
UTER = cervix-uterus
VAS = vascular
In the second histogram(s) of the heart paragraph, the oligo-probe names are abbreviated/enumerated as follows:
"adipocyte", "A1";
"adrenalcortex", "A2";
"adrenalgland", "A3";
"amygdala", "A4";
"appendix", "A5";
"atrioventricularnode", "A6";
"bm_cd105_endothelial", "E1";
"bm_cd33_myeloid", "M1";
"bm_cd34_", "B1";
"bm_cd71_earlyerythroid", "E1";
"bonemarrow", "B2";
"bronchialepithelialcells", "B3";
"cardiacmyocytes", "C1";
"caudatenucleus", "C2";
"cerebellum", "C3";
"cerebellumpeduncles", "C4";
"ciliaryganglion", "C5";
"cingulatecortex", "C6";
"globuspallidus", "G1";
"heart", "H1";
"hypothalamus", "H2";
"kidney", "K1";
"liver", "L1";
"lung", "L2";
"lymphnode", "L3";
"medullaoblongata", "M1";
"occipitallobe", "O1";
"olfactorybulb", "O2";
"ovary", "O3";
"pancreas", "P1";
"pancreaticislets", "P2";
"parietallobe", "P3";
"pb_bdca4_dentritic_cells", "P4";
"pb_cd14_monocytes", "P5";
"pb_cd19_bcells", "P6";
"pb_cd4_tcells", "P7";
"pb_cd56_nkcells", "P8";
"pb_cd8_tcells", "P9";
"pituitary", "Pa";
"placenta", "Pb";
"pons", "Pc";
"prefrontalcortex", "Pd";
"prostate", "Pe";
"salivarygland", "S1";
"skeletalmuscle", "S2";
"skin", "S3";
"smoothmuscle", "S4";
"spinalcord", "S5";
"subthalamicnucleus", "S6";
"superiorcervicalganglion", "S7";
"temporallobe", "T1";
"testis", "T2";
"testisgermcell", "T3";
"testisinterstitial", "T4";
"testisleydigcell", "T5";
"testisseminiferoustubule", "S6";
"thalamus";
"thymus", "T8";
"thyroid", "T9";
"tonsil", "Ta";
"trachea", "Tb";
"trigeminalganglion", "Tc";
"uterus", "U1";
"uteruscorpus", "U2";
"wholeblood", "W1";
"wholebrain", "W2"
It should be noted that the terms "segment", "seg" and "node" are used interchangeably in reference to nucleic acid sequences of the present invention; they refer to portions of nucleic acid sequences that were shown to have one or more properties as described below. They are also the building blocks that were used to construct complete nucleic acid sequences as described in greater detail below. Optionally and preferably, they are examples of oligonucleotides which are embodiments of the present invention, for example as amplicons, hybridization units and/or from which primers and/or complementary oligonucleotides may optionally be derived, and/or for any other use. As used herein the phrase "disease" includes any type of pathology and/or damage, including both chronic and acute damage, as well as a progress from acute to chronic damage. The term "marker" in the context of the present invention refers to a nucleic acid fragment, a peptide, or a polypeptide, which is differentially present in a sample taken from patients (subjects) having one of the herein-described diseases or conditions, as compared to a comparable sample taken from subjects who do not have one of the above-described diseases or conditions. The phrase "differentially present" refers to differences in the quantity of a marker present in a sample taken from patients having one of the herein-described diseases or conditions as compared to a comparable sample taken from patients who do not have one of the herein-described diseases or conditions. For example, a nucleic acid fragment may optionally be differentially present between the two samples if the amount of the nucleic acid fragment in one sample is significantly different from the amount of the nucleic acid fragment in the other sample, for example as measured by hybridization and/or NAT-based assays. A polypeptide is differentially present between the two samples if the amount of the polypeptide in one sample is significantly different from the amount of the polypeptide in the other sample. It should be noted that if the marker is detectable in one sample and not detectable in the other, then such a marker can be considered to be differentially present. Optionally, a relatively low amount of up-regulation may serve as the marker, as described herein. One of ordinary skill in the art could easily determine such relative levels of the markers; further guidance is provided in the description of each individual marker below.
As used herein the phrase "diagnostic" means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The
"sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of "true positives"). Diseased individuals not detected by the assay are "false negatives." Subjects who are not diseased and who test negative in the assay are termed "true negatives." The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the "false positive" rate is defined as the proportion of those without the disease who test positive.
While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis. As used herein the phrase "diagnosing" refers to classifying a disease or a symptom, determining a severity of the disease, monitoring disease progression, forecasting an outcome of a disease and/or prospects of recovery. The term "detecting" may also optionally encompass any of the above. Diagnosis of a disease according to the present invention can be effected by determining a level of a polynucleotide or a polypeptide of the present invention in a biological sample obtained from the subject, wherein the level determined can be correlated with predisposition to, or presence or absence of, the disease. It should be noted that a "biological sample obtained from the subject" may also optionally comprise a sample that has not been physically removed from the subject, as described in greater detail below. As used herein, the term "level" refers to expression levels of RNA and/or protein or to DNA copy number of a marker of the present invention. Typically the level of the marker in a biological sample obtained from the subject is different (i.e., increased or decreased) from the level of the same variant in a similar sample obtained from a healthy individual (examples of biological samples are described herein). Numerous well known tissue or fluid collection methods can be utilized to collect the biological sample from the subject in order to determine the level of DNA, RNA and/or polypeptide of the variant of interest in the subject. Examples include, but are not limited to, fine needle biopsy, needle biopsy, core needle biopsy and surgical biopsy (e.g., brain biopsy), and lavage. Regardless of the procedure employed, once a biopsy/sample is obtained the level of the variant can be determined and a diagnosis can thus be made.
Determining the level of the same variant in normal tissues of the same origin is preferably effected alongside, to detect an elevated expression and/or amplification and/or a decreased expression of the variant as opposed to the normal tissues. A "test amount" of a marker refers to an amount of a marker in a subject's sample that is consistent with a diagnosis of a particular disease or condition. A test amount can be either an absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals).
A "control amount" of a marker can be any amount or a range of amounts to be compared against a test amount of a marker. For example, a control amount of a marker can be the amount of a marker in a patient with a particular disease or condition or a person without such a disease or condition. A control amount can be either an absolute amount (e.g., microgram/ml) or a relative amount (e.g., relative intensity of signals). "Detect" refers to identifying the presence, absence or amount of the object to be detected. A "label" includes any moiety or item detectable by spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For example, useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and proteins for which antisera or monoclonal antibodies are available, or nucleic acid molecules with a sequence complementary to a target. The label often generates a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be used to quantify the amount of bound label in a sample. The label can be incorporated in or attached to a primer or probe either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., incorporation of radioactive nucleotides, or biotinylated nucleotides that are recognized by streptavidin. The label may be directly or indirectly detectable. Indirect detection can involve the binding of a second label to the first label, directly or indirectly. For example, the label can be the ligand of a binding partner, such as biotin, which is a binding partner for streptavidin, or a nucleotide sequence, which is the binding partner for a complementary sequence, to which it can specifically hybridize. The binding partner may itself be directly detectable, for example, an antibody may be itself labeled with a fluorescent molecule.
The binding partner also may be indirectly detectable, for example, a nucleic acid having a complementary nucleotide sequence can be a part of a branched DNA molecule that is in turn detectable through hybridization with other labeled nucleic acid molecules (see, e.g., P. D. Fahrlander and A. Klausner, Bio/Technology 6:1165 (1988)). Quantitation of the signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry. Exemplary detectable labels, optionally and preferably for use with immunoassays, include but are not limited to magnetic beads, fluorescent dyes, radiolabels, enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic beads. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture. "Immunoassay" is an assay that uses an antibody to specifically bind an antigen. The immunoassay is characterized by the use of specific binding properties of a particular antibody to isolate, target, and/or quantify the antigen. The phrase "specifically (or selectively) binds" to an antibody or "specifically (or selectively) immunoreactive with," when referring to a protein or peptide (or other epitope), refers to a binding reaction that is determinative of the presence of the protein in a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein at least two times greater than the background (non-specific signal) and do not substantially bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, polyclonal antibodies raised to seminal basic protein from specific species such as rat, mouse, or human can be selected to obtain only those polyclonal antibodies that are specifically immunoreactive with seminal basic protein and not with other proteins, except for polymorphic variants and alleles of seminal basic protein. This selection may be achieved by subtracting out antibodies that cross-react with seminal basic protein molecules from other species. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein.
For example, solid-phase ELISA immunoassays are routinely used to select antibodies specifically immunoreactive with a protein (see, e.g., Harlow & Lane, Antibodies, A Laboratory Manual (1988), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity). Typically a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
a nucleic acid sequence comprising a sequence in the table below:
HSAPHOL_node_11
HSAPHOL_node_13
HSAPHOL_node_15
HSAPHOL_node_19
HSAPHOL_node_2
HSAPHOL_node_21
HSAPHOL_node_23
HSAPHOL_node_26
HSAPHOL_node_28
HSAPHOL_node_38
HSAPHOL_node_40
HSAPHOL_node_42
HSAPHOL_node_16
HSAPHOL_node_25
HSAPHOL_node_34
HSAPHOL_node_35
HSAPHOL_node_36
HSAPHOL_node_41
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HSAPHOL_P2
HSAPHOL_P3
HSAPHOL_P4
HSAPHOL_P5
HSAPHOL_P6
HSAPHOL_P7
HSAPHOL_P8
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCA corresponding to amino acids 1 - 22 of HSAPHOL_P2, a second amino acid sequence being at least 90% homologous to
PATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 27 of AAH21289, which also corresponds to amino acids 23 - 49 of HSAPHOL_P2, and a third amino acid sequence being at least 90% homologous to EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQL HHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAAT ERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNE MPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLD GLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVT DPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAG SLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYK VVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQN YVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 83 - 586 of AAH21289, which also corresponds to amino acids 50 - 553 of HSAPHOL_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCA of HSAPHOL_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino
acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 50; and ending at any of amino acid numbers 50+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 49 of HSAPHOL_P2, and a second amino acid sequence being at least 90% homologous to
EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQL HHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAAT ERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNE MPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLD GLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVT DPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAG SLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYK VVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQN YVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 21 - 524 of PPBT_HUMAN, which also corresponds to amino acids 50 - 553 of
HSAPHOL_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA of HSAPHOL_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally
at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 50; and ending at any of amino acid numbers 50+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 63 - 82 of AAH21289, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90 % homologous to
GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYL CGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSA AYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTD VEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFE PGDMQYELNRNNVTDPSLSEMWVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQAL HEAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKK PFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKG PMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSV LF corresponding to amino acids 123 - 586 of AAH21289, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid
sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 1 - 20 of PPBT_HUMAN, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90 % homologous to GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYL CGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSA AYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTD VEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFE PGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQAL HEAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKK PFTAILYGNGPGYKWGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKG PMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSV LF corresponding to amino acids 61 - 524 of PPBT_HUMAN, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90 % homologous to
MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLC GVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAA YAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDV EYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEP GDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKP
FTAILYGNGPGYKWGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGP MAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVL F corresponding to amino acids 124 - 586 of AAH21289, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90 % homologous to
MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLC GVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAA YAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDV EYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEP GDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKP FTAILYGNGPGYKWGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGP MAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVL F corresponding to amino acids 62 - 524 of PPBT_HUMAN, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL GLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKA KQALHEAVEM corresponding to amino acids 63 - 417 of AAH21289, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90 % homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVD YAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIG
ANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 440 - 586 of AAH21289, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL GLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKA KQALHEAVEM corresponding to amino acids 1 - 355 of PPBT_HUMAN, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90 % homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKWGGERENVSMVD YAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIG ANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 377 - 524 of PPBT_HUMAN, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second 
amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally
at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL corresponding to amino acids 63 - 349 of AAH21289, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90 % homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTP RGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKWGGERENVSMVDYAHNNYQAQSAV PLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAG SLAAGPLLLALALYPLSVLF corresponding to amino acids 395 - 586 of AAH21289, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order.
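The edge-portion definition given above for HSAPHOL_P5 (junction residues MD at positions 355 and 356) can be read as a sliding window of length n that always spans the junction. The following is a minimal sketch of that reading, not a definitive implementation: the function name `edge_windows` is hypothetical, and it assumes that "starting from any of amino acid numbers 355-x" means the window starts at position 355 - x, with 1-based inclusive coordinates.

```python
def edge_windows(junction_left, n):
    """Enumerate (start, end) coordinates of every length-n window that
    spans the junction between junction_left and junction_left + 1.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    junction_right = junction_left + 1
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2
        start = junction_left - x
        end = junction_right + ((n - 2) - x)
        windows.append((start, end))
    return windows


# Edge portion of HSAPHOL_P5 with n = 10: junction between residues 355 and 356.
windows = edge_windows(355, 10)
for start, end in windows:
    print(start, end, end - start + 1)
```

Under this reading, every window has exactly n residues and contains both junction residues, and there are n - 1 such windows for a given n.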
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid
sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL corresponding to amino acids 1 - 287 of PPBT_HUMAN, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90 % homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTP RGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAV PLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAG SLAAGPLLLALALYPLSVLF corresponding to amino acids 333 - 524 of PPBT_HUMAN, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 63 -
326 of AAH21289, which also corresponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 265 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPR corresponding to amino acids 1 - 262 of PPBT_HUMAN, which also corresponds to amino acids 1 - 262 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 263 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
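The percentage thresholds recited above (at least 70%, 80%, 85%, 90% or 95% homologous) can be illustrated with a simple comparison against the 44-residue HSAPHOL_P7 tail. This section does not state how homology is computed, so the sketch below substitutes ungapped percent identity between two equal-length sequences purely as a stand-in; an actual determination would normally rely on an alignment tool and its scoring parameters. The function name `percent_identity` and the `candidate` sequence are hypothetical.

```python
def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences.

    This is a simplification used for illustration only; it is not the
    homology measure defined by the specification.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Tail of HSAPHOL_P7 (amino acids 263 - 306), copied from the text above.
TAIL_P7 = "YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP"

# Hypothetical candidate: the same tail with its last residue changed to A.
candidate = "YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRA"

score = percent_identity(TAIL_P7, candidate)
meets_95 = score >= 95.0
print(round(score, 2), meets_95)
```

A single substitution in 44 residues leaves 43 matches, so under this simplified measure the candidate would still fall within the narrowest (95%) tier.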
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 1 - 264 of O75090, which also corresponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 265 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL G corresponding to amino acids 63 - 350 of AAH21289, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL G corresponding to amino acids 1 - 288 of PPBT_HUMAN, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFL GDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHA TPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPK NKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL G corresponding to amino acids 1 - 288 of O75090, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8. Optionally the alkaline phosphatase variant-detectable disease comprises one or more of the following: liver diseases including but not limited to infectious, malignant, degenerating, cholestatic and autoimmune diseases; bone conditions including but not limited to Paget's disease, osteomalacia, rickets, bone tumors, osteoporosis, and bone changes occurring due to parathyroid disorders; tumors (either benign, malignant or metastatic) in general; and more specific diseases including but not limited to Hodgkin's disease, diabetes, hyperthyroidism and congestive heart failure. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSKITCR_T2
HSKITCR_T4
HSKITCR_T5
HSKITCR_T6
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HSKITCR_node_0
HSKITCR_node_11
HSKITCR_node_17
HSKITCR_node_2
HSKITCR_node_21
HSKITCR_node_27
HSKITCR_node_3
HSKITCR_node_31
HSKITCR_node_33
HSKITCR_node_34
HSKITCR_node_36
HSKITCR_node_44
HSKITCR_node_46
HSKITCR_node_5
HSKITCR_node_50
HSKITCR_node_7
HSKITCR_node_9
HSKITCR_node_13
HSKITCR_node_15
HSKITCR_node_19
HSKITCR_node_23
HSKITCR_node_25
HSKITCR_node_29
HSKITCR_node_37
HSKITCR_node_39
HSKITCR_node_41
HSKITCR_node_43
HSKITCR_node_47
HSKITCR_node_48
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSKITCR_P2, comprising a first amino acid sequence being at least 90 % homologous to
MAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKEGFRMLSPEH APAEMYDLMKTCWDADPLKRPTFKQIVQLIEKQISESTNHIYSNLANCSPNRQKPWDH SVRINSVGSTASSSQPLLVHDDV corresponding to amino acids 836 - 976 of KIT_HUMAN, which also corresponds to amino acids 1 - 141 of HSKITCR_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSKITCR_P3, comprising a first amino acid sequence being at least 90 % homologous to
MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTD PGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNKHGLSNSIYVFVRDPAKLFL VDRSLYGKEDNDTLVRCPLTDPEVTNYSLKGCQGKPLPKDLRFIPDPKAGIMIKSVKRA YHRLCLHCSVDQEGKSVLSEKFILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKD VSSSVYSTWKRENSQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANN TFGSANVTTTLEWDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIYMNR TFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVNAAIAFNVYVNTK PEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRCSASVLPVDVQTLNSSGPPFGK LVVQSSIDSSAFKHNGTVECKAYNDVGKTSAYFNFAFKGNNKEQIHPHTLFTPLLIGFVI VAGMMCIIVMILTYKYLQKPMYEVQWKVVEEINGNNYVYIDPTQLPYDHKWEFPRNR LSFGKTLGAGAFGKVVEATAYGLIKSDAAMTVAVKMLKPSAHLTEREALMSELKVLS YLGNHMNIVNLLGACTIGGPTLVITEYCCYGDLLNFLRRKRDSFICSKQEDHAEAALYK NLLHSKESSCSDSTNEYMDMKPGVSYVVPTKADKRRSVRIGSYIERDVTPAIMEDDELA LDLEDLLSFSYQVAKGMAFLASKNCIHRDLAARNILLTHGRITKICDFGLARDIKNDSNY VVKGNARLPVKWMAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYK MIKEGFRMLSPEHAPAEMYDIMKTCWDADPLKRPTFKQIVQLIEKQISESTNHIYSNLAN CSPNRQKPW corresponding to amino acids 1 - 951 of KIT_HUMAN, which also corresponds to amino acids 1 - 951 of HSKITCR_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
LQGHFIESFVLDILESLYFYNFFLHQMFLCSGLMFEIILWLFL corresponding to amino
acids 952 - 994 of HSKITCR_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSKITCR_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LQGHFIESFVLDILESLYFYNFFLHQMFLCSGLMFEIILWLFL in HSKITCR_P3. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSKITCR_P4, comprising a first amino acid sequence being at least 90 % homologous to
MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTD PGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNKHGLSNSIYVFVRDPAKLFL VDRSLYGKEDNDTLVRCPLTDPEVTNYSLKGCQGKPLPKDLRFIPDPKAGIMIKSVKRA YHRLCLHCSVDQEGKSVLSEKFILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKD VSSSVYSTWKRENSQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANN TFGSANVTTTLEVVDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIYMNR TFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVNAAIAFNVYVNTK PEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRCSASVLPVDVQTLNSSGPPFGK LVVQSSIDSSAFKHNGTVECKAYNDVGKTSAYFNFAFKGNNKEQIHPHTLFTPLLIGFVI VAGMMCIIVMILTYKYLQKPMYEVQWKVVEEINGNNYVYIDPTQLPYDHKWEFPRNR LSFGKTLGAGAFGKVVEATAYGLIKSDAAMTVAVKMLKPSAHLTEREALMSELKVLS YLGNHMNIVNLLGACTIGGPTLVITEYCCYGDLLNFLRRKRDSFICSKQEDHAEAALYK NLLHSKESSCSDSTNEYMDMKPGVSYWPTKADKRRSVRIGSYIERDVTPAIMEDDELA
LDLEDLLSFSYQVAKGMAFLASKNCIHRDLAARNILLTHGRITKICDFGLARDIKNDSNY VVKGN corresponding to amino acids 1 - 828 of KIT_HUMAN, which also corresponds to amino acids 1 - 828 of HSKITCR_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSTHSLLDSPAKDF corresponding to amino acids 829 - 842 of HSKITCR_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
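Each chimeric polypeptide above is described as contiguous segments, where a segment either maps a coordinate range of a known protein onto the variant or is a unique tail. The following is a minimal sketch of how such a description could be checked for internal consistency (segments contiguous in the variant, and mapped ranges of equal length), using the HSKITCR_P4 coordinates recited above. The `Segment` class and `check_segments` function are hypothetical helpers introduced for illustration and are not part of the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    variant_start: int                  # 1-based, inclusive, in the variant
    variant_end: int
    source: Optional[str] = None        # None for a unique head or tail
    source_start: Optional[int] = None  # 1-based, inclusive, in the source
    source_end: Optional[int] = None


def check_segments(segments: List[Segment]) -> List[str]:
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    expected_start = 1
    for index, seg in enumerate(segments, start=1):
        if seg.variant_start != expected_start:
            problems.append(
                f"segment {index}: starts at {seg.variant_start}, "
                f"expected {expected_start}"
            )
        variant_len = seg.variant_end - seg.variant_start + 1
        if seg.source is not None:
            source_len = seg.source_end - seg.source_start + 1
            if source_len != variant_len:
                problems.append(
                    f"segment {index}: {source_len} residues in {seg.source} "
                    f"mapped onto {variant_len} residues in the variant"
                )
        expected_start = seg.variant_end + 1
    return problems


# HSKITCR_P4 as recited: KIT_HUMAN 1 - 828, then a unique tail at 829 - 842.
HSKITCR_P4 = [
    Segment(1, 828, "KIT_HUMAN", 1, 828),
    Segment(829, 842),
]
TAIL_P4 = "VSTHSLLDSPAKDF"

problems = check_segments(HSKITCR_P4)
tail_length_ok = len(TAIL_P4) == 842 - 829 + 1
print(problems, tail_length_ok)
```

The same check can be applied to any of the other variants by listing their recited coordinate ranges as `Segment` entries.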
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSKITCR_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSTHSLLDSPAKDF in HSKITCR_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSKITCR_P5, comprising a first amino acid sequence being at least 90 % homologous to
MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTD PGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNKHGLSNSIYVFVR corresponding to amino acids 1 - 112 of KIT_HUMAN, which also corresponds to amino acids 1 - 112 of HSKITCR_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKCLAFCSAVLSRI corresponding to amino acids 113 - 126 of HSKITCR_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSKITCR_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKCLAFCSAVLSRI in HSKITCR_P5. Optionally CD117-detectable cancers comprise gastrointestinal stromal tumors, mast cell tumors, and/or seminomatous germ cell tumors. According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSCREACT_PEA_1_T12
HSCREACT_PEA_1_T13
HSCREACT_PEA_1_T15
HSCREACT_PEA_1_T22
HSCREACT_PEA_1_T29
HSCREACT_PEA_1_T30
HSCREACT_PEA_1_T32
HSCREACT_PEA_1_T33
HSCREACT_PEA_1_T38
HSCREACT_PEA_1_T39
a nucleic acid sequence comprising a sequence in the table below:
HSCREACT_PEA_1_node_63
HSCREACT_PEA_1_node_10
HSCREACT_PEA_1_node_11
HSCREACT_PEA_1_node_12
HSCREACT_PEA_1_node_13
HSCREACT_PEA_1_node_14
HSCREACT_PEA_1_node_15
HSCREACT_PEA_1_node_16
HSCREACT_PEA_1_node_17
HSCREACT_PEA_1_node_18
HSCREACT_PEA_1_node_19
HSCREACT_PEA_1_node_2
HSCREACT_PEA_1_node_20
HSCREACT_PEA_1_node_21
HSCREACT_PEA_1_node_22
HSCREACT_PEA_1_node_23
HSCREACT_PEA_1_node_24
HSCREACT_PEA_1_node_3
HSCREACT_PEA_1_node_30
HSCREACT_PEA_1_node_31
HSCREACT_PEA_1_node_32
HSCREACT_PEA_1_node_33
HSCREACT_PEA_1_node_34
HSCREACT_PEA_1_node_35
HSCREACT_PEA_1_node_36
HSCREACT_PEA_1_node_37
HSCREACT_PEA_1_node_38
HSCREACT_PEA_1_node_39
HSCREACT_PEA_1_node_4
HSCREACT_PEA_1_node_40
HSCREACT_PEA_1_node_41
HSCREACT_PEA_1_node_42
HSCREACT_PEA_1_node_43
HSCREACT_PEA_1_node_44
HSCREACT_PEA_1_node_45
HSCREACT_PEA_1_node_46
HSCREACT_PEA_1_node_47
HSCREACT_PEA_1_node_48
HSCREACT_PEA_1_node_49
HSCREACT_PEA_1_node_5
HSCREACT_PEA_1_node_50
HSCREACT_PEA_1_node_51
HSCREACT_PEA_1_node_52
HSCREACT_PEA_1_node_53
HSCREACT_PEA_1_node_54
HSCREACT_PEA_1_node_55
HSCREACT_PEA_1_node_56
HSCREACT_PEA_1_node_57
HSCREACT_PEA_1_node_58
HSCREACT_PEA_1_node_59
HSCREACT_PEA_1_node_60
HSCREACT_PEA_1_node_61
HSCREACT_PEA_1_node_64
HSCREACT_PEA_1_node_8
HSCREACT_PEA_1_node_9
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSCREACT_PEA_1_P9
HSCREACT_PEA_1_P10
HSCREACT_PEA_1_P12
HSCREACT_PEA_1_P16
HSCREACT_PEA_1_P22
HSCREACT_PEA_1_P28
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P9, comprising a first amino acid sequence being at least 90 % homologous to
MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYT ELSST corresponding to amino acids 1 - 64 of CRP_HUMAN, which also corresponds to amino acids 1 - 64 of HSCREACT_PEA_1_P9, a second (bridging) amino acid sequence comprising H, and a third amino acid sequence being at least 90 % homologous to EINTIYLGGPFSPNVLNWRALKYEVQGEVFTKPQLWP corresponding to amino acids 188 - 224 of CRP_HUMAN, which also corresponds to amino acids 66 - 102 of
HSCREACT_PEA_1_P9, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HSCREACT_PEA_1_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise THE, having a structure as follows
(numbering according to HSCREACT_PEA_1_P9): a sequence starting from any of amino acid numbers 64-x to 64; and ending at any of amino acid numbers 66 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P10, comprising a first amino acid sequence being at least 90 % homologous to MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYT
ELSSTRG corresponding to amino acids 1 - 66 of CRP_HUMAN, which also corresponds to amino acids 1 - 66 of HSCREACT_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to
MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYT
ELSSTRG corresponding to amino acids 1 - 66 of CRP_HUMAN, which also corresponds to amino acids 1 - 66 of HSCREACT_PEA_1_P12, and a second amino acid sequence being at least 90 % homologous to PNVLNWRALKYEVQGEVFTKPQLWP corresponding to amino acids 200 - 224 of CRP_HUMAN, which also corresponds to amino acids 67 - 91 of
HSCREACT_PEA_1_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSCREACT_PEA_1_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino
acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise GP, having a structure as follows: a sequence starting from any of amino acid numbers 66-x to 66; and ending at any of amino acid numbers 67 + ((n-2) - x), in which x varies from 0 to n-2. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P16, comprising a first amino acid sequence being at least 90 % homologous to
MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYT ELSSTRGYSIFSYATKRQDNEILIFWSKDIGYSFTVGGSEILFEVPEVTVAPVHICTSWESA SGIVEFWVDGKPRVRKSLKKGYTVGAEASIILGQEQDSF corresponding to amino acids 1 - 160 of CRP_HUMAN, which also corresponds to amino acids 1 - 160 of HSCREACT_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSESGHWPGVWFGSRVLIIMS corresponding to amino acids 161 - 181 of HSCREACT_PEA_1_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCREACT_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSESGHWPGVWFGSRVLIIMS in HSCREACT_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P22, comprising a first amino acid sequence being at least 90 % homologous to MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYT ELSSTRG corresponding to amino acids 1 - 66 of CRP_HUMAN, which also corresponds to amino acids 1 - 66 of HSCREACT_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AFLILWLFWETPPLFHTNLVGL corresponding to amino acids 67 - 88 of HSCREACT_PEA_1_P22, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCREACT_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AFLILWLFWETPPLFHTNLVGL in HSCREACT_PEA_1_P22.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P28, comprising a first amino acid sequence being at least 90 % homologous to
MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYT ELSST corresponding to amino acids 1 - 64 of CRP_HUMAN, which also corresponds to amino acids 1 - 64 of HSCREACT_PEA_1_P28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLS corresponding to amino acids 65 - 67 of HSCREACT_PEA_1_P28, wherein said first and second amino acid sequences are contiguous and in a sequential order.
The CRP variant-detectable disease may comprise one or more of the following: Assessment of disease activity in inflammatory conditions (Juvenile rheumatoid arthritis, Rheumatoid arthritis, Ankylosing spondylitis, Reiter disease, Psoriatic arthropathy, Vasculitides Behcet syndrome, Wegener granulomatosis, Polyarteritis nodosa, Polymyalgia rheumatica, Crohn's disease, Rheumatic fever, Familial fevers including familial Mediterranean fever, Acute pancreatitis); Diagnosis and management of infection (Bacterial endocarditis, Neonatal septicemia and meningitis, Intercurrent infection in systemic lupus erythematosus, Intercurrent infection in leukemia and its treatment, Postoperative complications including infection, and thromboembolism); Differential diagnosis/classification of inflammatory disease (Systemic lupus erythematosus vs. rheumatoid arthritis, Crohn disease vs. ulcerative colitis); Tissue necrosis after myocardial infarction and/or outcome of such an infarction;
1. Coronary artery disease, non-fatal and fatal
2. Stroke
3. Progression of peripheral vascular disease
4. Development of Congestive Heart Failure
5. Sudden Cardiac Death
6. Poor prognosis in severe unstable angina
7. Poor prognosis after angioplasty.
Also optionally and preferably, low-grade upregulation of CRP variant production may be detected for predicting coronary events, stroke and cerebrovascular events, and other cardiovascular diseases such as unstable angina.
According to preferred embodiments of the present invention, there is provided a diagnostic method or assay according to measurement of upregulated, slightly upregulated and/or baseline levels of a CRP variant according to the present invention.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Z25227_T18
Z25227_T19
a nucleic acid sequence comprising a sequence in the table below:
Z25227_node_34
Z25227_node_39
Z25227_node_40
Z25227_node_46
Z25227_node_49
Z25227_node_51
Z25227_node_53
Z25227_node_35
Z25227_node_36
Z25227_node_47
Z25227_node_50
Z25227_node_52
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Z25227_P10
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
T87719_T1
T87719_T9
a nucleic acid sequence comprising a sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
HSCAMPAT1_PEA_1_node_0
HSCAMPAT1_PEA_1_node_3
HSCAMPAT1_PEA_1_node_4
HSCAMPAT1_PEA_1_node_5
HSCAMPAT1_PEA_1_node_8
HSCAMPAT1_PEA_1_node_2
HSCAMPAT1_PEA_1_node_7
HSCAMPAT1_PEA_1_node_9
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSTIR_PEA_1_T2
HSTIR_PEA_1_T3
a nucleic acid sequence comprising a sequence in the table below:
HSTIR_PEA_1_node_0
HSTIR_PEA_1_node_4
HSTIR_PEA_1_node_8
HSTIR_PEA_1_node_10
HSTIR_PEA_1_node_17
HSTIR_PEA_1_node_21
HSTIR_PEA_1_node_2
HSTIR_PEA_1_node_6
HSTIR_PEA_1_node_14
HSTIR_PEA_1_node_15
HSTIR_PEA_1_node_19
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSTIR_PEA_1_P4
HSTIR_PEA_1_P6
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
a nucleic acid sequence comprising a sequence in the table below:
HSALK1A_PEA_1_node_0
HSALK1A_PEA_1_node_8
HSALK1A_PEA_1_node_9
HSALK1A_PEA_1_node_5
HSALK1A_PEA_1_node_7
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSALK1A_PEA_1_P14
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSCDIA_PEA_1_T11
a nucleic acid sequence comprising a sequence in the table below:
HSCDIA_PEA_1_node_3
HSCDIA_PEA_1_node_7
HSCDIA_PEA_1_node_11
HSCDIA_PEA_1_node_14
HSCDIA_PEA_1_node_15
HSCDIA_PEA_1_node_18
HSCDIA_PEA_1_node_20
HSCDIA_PEA_1_node_21
HSCDIA_PEA_1_node_24
HSCDIA_PEA_1_node_1
HSCDIA_PEA_1_node_6
HSCDIA_PEA_1_node_10
HSCDIA_PEA_1_node_13
HSCDIA_PEA_1_node_16
HSCDIA_PEA_1_node_17
HSCDIA_PEA_1_node_19
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSCDIA_PEA_1_P5
HSCDIA_PEA_1_P6
HSCDIA_PEA_1_P7
HSCDIA_PEA_1_P8
HSCDIA_PEA_1_P9
HSCDIA_PEA_1_P11
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
S69686_T36
S69686_T48
a nucleic acid sequence comprising a sequence in the table below:
S69686_node_54
S69686_node_55
S69686_node_56
S69686_node_57
S69686_node_58
S69686_node_59
S69686_node_60
S69686_node_61
S69686_node_62
S69686_node_63
S69686_node_64
S69686_node_65
S69686_node_66
S69686_node_67
S69686_node_68
S69686_node_69
S69686_node_70
S69686_node_71
S69686_node_72
S69686_node_73
S69686_node_74
S69686_node_75
S69686_node_76
S69686_node_77
S69686_node_78
S69686_node_79
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
S69686_P2
S69686_P6
S69686_P7
S69686_P13
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HUMTCXAAA_PEA_1_T12
a nucleic acid sequence comprising a sequence in the table below:
HUMTCXAAA_PEA_1_node_0
HUMTCXAAA_PEA_1_node_1
HUMTCXAAA_PEA_1_node_2
HUMTCXAAA_PEA_1_node_4
HUMTCXAAA_PEA_1_node_6
HUMTCXAAA_PEA_1_node_8
HUMTCXAAA_PEA_1_node_17
HUMTCXAAA_PEA_1_node_20
HUMTCXAAA_PEA_1_node_21
HUMTCXAAA_PEA_1_node_22
HUMTCXAAA_PEA_1_node_9
HUMTCXAAA_PEA_1_node_10
HUMTCXAAA_PEA_1_node_11
HUMTCXAAA_PEA_1_node_13
HUMTCXAAA_PEA_1_node_15
HUMTCXAAA_PEA_1_node_16
HUMTCXAAA_PEA_1_node_18
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HUMTCXAAA_PEA_1_P6
HUMTCXAAA_PEA_1_P12
HUMTCXAAA_PEA_1_P13
HUMTCXAAA_PEA_1_P14
HUMTCXAAA_PEA_1_P15
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSPPI_PEA_1_T3
HSPPI_PEA_1_T5
HSPPI_PEA_1_T6
HSPPI_PEA_1_T12
HSPPI_PEA_1_T13
HSPPI_PEA_1_T17
HSPPI_PEA_1_T18
a nucleic acid sequence comprising a sequence in the table below:
HSPPI_PEA_1_node_2
HSPPI_PEA_1_node_7
HSPPI_PEA_1_node_13
HSPPI_PEA_1_node_0
HSPPI_PEA_1_node_1
HSPPI_PEA_1_node_3
HSPPI_PEA_1_node_4
HSPPI_PEA_1_node_5
HSPPI_PEA_1_node_6
HSPPI_PEA_1_node_8
HSPPI_PEA_1_node_9
HSPPI_PEA_1_node_10
HSPPI_PEA_1_node_11
HSPPI_PEA_1_node_12
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSPPI_PEA_1_P6
HSPPI_PEA_1_P8
HSPPI_PEA_1_P9
HSPPI_PEA_1_P10
HSPPI_PEA_1_P12
HSPPI_PEA_1_P14
HSPPI_PEA_1_P15
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
D11581_PEA_1_T4
D11581_PEA_1_T5
D11581_PEA_1_T12
D11581_PEA_1_T14
a nucleic acid sequence comprising a sequence in the table below:
D11581_PEA_1_node_4
D11581_PEA_1_node_8
D11581_PEA_1_node_16
D11581_PEA_1_node_23
D11581_PEA_1_node_27
D11581_PEA_1_node_55
D11581_PEA_1_node_6
D11581_PEA_1_node_10
D11581_PEA_1_node_11
D11581_PEA_1_node_12
D11581_PEA_1_node_13
D11581_PEA_1_node_14
D11581_PEA_1_node_18
D11581_PEA_1_node_20
D11581_PEA_1_node_21
D11581_PEA_1_node_29
D11581_PEA_1_node_30
D11581_PEA_1_node_33
D11581_PEA_1_node_34
D11581_PEA_1_node_36
D11581_PEA_1_node_38
D11581_PEA_1_node_39
D11581_PEA_1_node_40
D11581_PEA_1_node_41
D11581_PEA_1_node_42
D11581_PEA_1_node_43
D11581_PEA_1_node_44
D11581_PEA_1_node_45
D11581_PEA_1_node_49
D11581_PEA_1_node_50
D11581_PEA_1_node_52
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
D11581_PEA_1_P6
D11581_PEA_1_P10
D11581_PEA_1_P12
D11581_PEA_1_P16
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
S42303_PEA_1_T4
S42303_PEA_1_T5
S42303_PEA_1_T6
S42303_PEA_1_T8
S42303_PEA_1_T9
S42303_PEA_1_T10
a nucleic acid sequence comprising a sequence in the table below:
S42303_PEA_1_node_1
S42303_PEA_1_node_2
S42303_PEA_1_node_3
S42303_PEA_1_node_10
S42303_PEA_1_node_14
S42303_PEA_1_node_17
S42303_PEA_1_node_20
S42303_PEA_1_node_23
S42303_PEA_1_node_25
S42303_PEA_1_node_27
S42303_PEA_1_node_29
S42303_PEA_1_node_31
S42303_PEA_1_node_33
S42303_PEA_1_node_35
S42303_PEA_1_node_41
S42303_PEA_1_node_44
S42303_PEA_1_node_46
S42303_PEA_1_node_48
S42303_PEA_1_node_50
S42303_PEA_1_node_4
S42303_PEA_1_node_6
S42303_PEA_1_node_8
S42303_PEA_1_node_12
S42303_PEA_1_node_21
S42303_PEA_1_node_37
S42303_PEA_1_node_38
S42303_PEA_1_node_47
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
S42303_PEA_1_P2
S42303_PEA_1_P3
S42303_PEA_1_P4
S42303_PEA_1_P5
S42303_PEA_1_P6
S42303_PEA_1_P7
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
T87096_PEA_1_T14
T87096_PEA_1_T18
T87096_PEA_1_T37
T87096_PEA_1_T39
a nucleic acid sequence comprising a sequence in the table below:
T87096_PEA_1_node_2
T87096_PEA_1_node_21
T87096_PEA_1_node_22
T87096_PEA_1_node_27
T87096_PEA_1_node_38
T87096_PEA_1_node_55
T87096_PEA_1_node_11
T87096_PEA_1_node_12
T87096_PEA_1_node_13
T87096_PEA_1_node_14
T87096_PEA_1_node_15
T87096_PEA_1_node_16
T87096_PEA_1_node_17
T87096_PEA_1_node_18
T87096_PEA_1_node_23
T87096_PEA_1_node_24
T87096_PEA_1_node_26
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
T87096_PEA_1_P11
T87096_PEA_1_P27
T87096_PEA_1_P29
T87096_PEA_1_P39
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HSPRO204_PEA_1_T3
HSPRO204_PEA_1_T4
HSPRO204_PEA_1_T5
HSPRO204_PEA_1_T6
HSPRO204_PEA_1_T11
HSPRO204_PEA_1_T12
HSPRO204_PEA_1_T17
HSPRO204_PEA_1_T18
HSPRO204_PEA_1_T22
a nucleic acid sequence comprising a sequence in the table below:
HSPRO204_PEA_1_node_2
HSPRO204_PEA_1_node_6
HSPRO204_PEA_1_node_20
HSPRO204_PEA_1_node_28
HSPRO204_PEA_1_node_35
HSPRO204_PEA_1_node_40
HSPRO204_PEA_1_node_41
HSPRO204_PEA_1_node_0
HSPRO204_PEA_1_node_9
HSPRO204_PEA_1_node_10
HSPRO204_PEA_1_node_11
HSPRO204_PEA_1_node_12
HSPRO204_PEA_1_node_13
HSPRO204_PEA_1_node_14
HSPRO204_PEA_1_node_15
HSPRO204_PEA_1_node_16
HSPRO204_PEA_1_node_17
HSPRO204_PEA_1_node_18
HSPRO204_PEA_1_node_22
HSPRO204_PEA_1_node_23
HSPRO204_PEA_1_node_24
HSPRO204_PEA_1_node_25
HSPRO204_PEA_1_node_26
HSPRO204_PEA_1_node_30
HSPRO204_PEA_1_node_31
HSPRO204_PEA_1_node_32
HSPRO204_PEA_1_node_33
HSPRO204_PEA_1_node_34
HSPRO204_PEA_1_node_37
HSPRO204_PEA_1_node_38
HSPRO204_PEA_1_node_39
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSPRO204_PEA_1_P3
HSPRO204_PEA_1_P4
HSPRO204_PEA_1_P5
HSPRO204_PEA_1_P6
HSPRO204_PEA_1_P11
HSPRO204_PEA_1_P12
HSPRO204_PEA_1_P16
HSPRO204_PEA_1_P21
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSMUC1A_PEA_1_T12
HSMUC1A_PEA_1_T26
HSMUC1A_PEA_1_T28
HSMUC1A_PEA_1_T29
HSMUC1A_PEA_1_T30
HSMUC1A_PEA_1_T31
HSMUC1A_PEA_1_T33
HSMUC1A_PEA_1_T34
HSMUC1A_PEA_1_T35
HSMUC1A_PEA_1_T36
HSMUC1A_PEA_1_T40
HSMUC1A_PEA_1_T42
HSMUC1A_PEA_1_T43
HSMUC1A_PEA_1_T47
a nucleic acid sequence comprising a sequence in the table below:
HSMUC1A_PEA_1_node_0
HSMUC1A_PEA_1_node_14
HSMUC1A_PEA_1_node_24
HSMUC1A_PEA_1_node_29
HSMUC1A_PEA_1_node_35
HSMUC1A_PEA_1_node_38
HSMUC1A_PEA_1_node_3
HSMUC1A_PEA_1_node_4
HSMUC1A_PEA_1_node_5
HSMUC1A_PEA_1_node_6
HSMUC1A_PEA_1_node_7
HSMUC1A_PEA_1_node_17
HSMUC1A_PEA_1_node_18
HSMUC1A_PEA_1_node_20
HSMUC1A_PEA_1_node_21
HSMUC1A_PEA_1_node_23
HSMUC1A_PEA_1_node_26
HSMUC1A_PEA_1_node_27
HSMUC1A_PEA_1_node_31
HSMUC1A_PEA_1_node_34
HSMUC1A_PEA_1_node_36
HSMUC1A_PEA_1_node_37
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSMUC1A_PEA_1_P25
HSMUC1A_PEA_1_P29
HSMUC1A_PEA_1_P30
HSMUC1A_PEA_1_P32
HSMUC1A_PEA_1_P36
HSMUC1A_PEA_1_P39
HSMUC1A_PEA_1_P45
HSMUC1A_PEA_1_P49
HSMUC1A_PEA_1_P52
HSMUC1A_PEA_1_P53
HSMUC1A_PEA_1_P56
HSMUC1A_PEA_1_P58
HSMUC1A_PEA_1_P59
HSMUC1A_PEA_1_P63
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63, comprising a first amino acid sequence being at least 90 % homologous to MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV corresponding to amino acids 1 - 45 of MUC1_HUMAN, which also corresponds to amino acids 1 - 45 of HSMUC1A_PEA_1_P63, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK corresponding to amino acids 46 - 85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S42303_PEA_1_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SRRNYGKWKLDGMFLLRRYVCIFTEKLKNQAELYVFLS corresponding to amino acids 1 - 38 of S42303_PEA_1_P2, and a second amino acid sequence being at least 90 % homologous to VKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEK WQVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRG PFPQELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHLR AHAVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDA DDPNALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDM EGNPTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDKDQPHT PAWNAVYRISGGDPTGRFAIQTDPNSNDGLVTVVKPIDFETNRMFVLTVAAENQVPLA KGIQHPPQSTATVSVTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQ NIRYTKLSDPANWLKIDPVNGQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGT LQIYLLDLNDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRN WTITRLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDSNGDCTDV DRIVGAGLGTGAIIAILLCIIILLILVLMFVVWMKRRDKERQAKQLLIDPEDDVRDNILKY DEEGGGEEDQDYDLSQLQQPDTVEPDAIKPVGIRRMDERPIHAEPQYPVRSAAPHPGDI GDFTNEGLKAADNDPTAPPYDSLLVFDYEGSGSTAGSLSSLNSSSSGGEQDYDYLNDW GPRFKKLADMYGGGDD corresponding to amino acids 58 - 906 of CAD2_HUMAN, which also corresponds to amino acids 39 - 887 of S42303_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of S42303_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SRRNYGKWKLDGMFLLRRYVCIFTEKLKNQAELYVFLS of S42303_PEA_1_P2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S42303_PEA_1_P3, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MCNTQRM corresponding to amino acids 1 - 7 of S42303_PEA_1_P3, and a second amino acid sequence being at least 90 % homologous to
KFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKW QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGP FPQELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRA HAVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDAD DPNALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDME GNPTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDKDQPHTP AWNAVYRISGGDPTGRFAIQTDPNSNDGLVTVVKPIDFETNRMFVLTVAAENQVPLAK GIQHPPQSTATVSVTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNI RYTKLSDPANWLKIDPVNGQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQ IYLLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTI TRLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDSNGDCTDVDRIV GAGLGTGAIIAILLCIIILLILVLMFVVWMKRRDKERQAKQLLIDPEDDVRDNILKYDEE GGGEEDQDYDLSQLQQPDTVEPDAIKPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDFI NEGLKAADNDPTAPPYDSLLVFDYEGSGSTAGSLSSLNSSSSGGEQDYDYLNDWGPRF KKLADMYGGGDD corresponding to amino acids 59 - 906 of CAD2_HUMAN, which also corresponds to amino acids 8 - 855 of S42303_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of S42303_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MCNTQRM of S42303_PEA_1_P3.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S42303_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to
MVYAVRSFPLSSEHAKFLIYAQDKETQEKWQVAVKLSLKPTLTEESVKESAEVEEIVFP RQFSKHSGHLQRQKRDWVIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGPGAD QPPTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENPIDIVINVIDMNDNRP EFLHQVWNGTVPEGSKPGTYVMTVTAIDADDPNALNGMLRYRIVSQAPSTPSPNMFTI NNETGDIITVAAGLDREKVQQYTLIIQATDMEGNPTYGLSNTATAVITVTDVNDNPPEF TAMTFYGEVPENRVDIIVANLTVTDKDQPHTPAWNAVYRISGGDPTGRFAIQTDPNSND GLVTVVKPIDFETNRMFVLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPN PKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPANWLKIDPVNGQITTIAVL DRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDINDNAPQVLPQEAETCETPDPN SINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTITRLNGDFAQLNLKIKFLEAGIYEVPIII TDSGNPPKSNISILRVKVCQCDSNGDCTDVDRIVGAGLGTGAIIAILLCIIILLILVLMFW WMKRRDKERQAKQLLIDPEDDVRDNILKYDEEGGGEEDQDYDLSQLQQPDTVEPDAI KPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDFINEGLKAADNDPTAPPYDSLLVFDYE GSGSTAGSLSSLNSSSSGGEQDYDYLNDWGPRFKKLADMYGGGDD corresponding to amino acids 86 - 906 of CAD2_HUMAN, which also corresponds to amino acids 1 - 821 of S42303_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S42303_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVHEGQPLLNVK
FSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKWQ VAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGPFP QELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAH AVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDADDP NALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDMEGN PTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDKDQPHTPAW NAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPIDFETNRMFVLTVAAENQVPLAKGIQ HPPQSTATVSVTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRY TKLSDPANWLKIDPVNGQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQLY LLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTIT RLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILR corresponding to amino acids 1 - 697 of CAD2_HUMAN_V1, which also corresponds to amino acids 1 - 697 of S42303_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SLC corresponding to amino acids 698 - 700 of S42303_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S42303_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVHEGQPLLNVK FSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKWQ VAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGPFP QELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAH AVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDADDP NALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDMEGN PTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDKDQPHTPAW NAVYRISGGDPTGRFAIQTDPNSNDGLVTVVKPIDFETNRMFVLTVAAENQVPLAKGIQ HPPQSTATVSVTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRY TKLSDPANWLKIDPVNGQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIY LLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTIT RLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDSNGDCTDVDRIV GAGLGTGAIIAILLCIIILLILVLMFWWMKRRDKERQAKQLLIDPEDDVRDNILKYDEE GGGEEDQDYDLSQLQQPDTVEPDAIKPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDFI NE corresponding to amino acids 1 - 838 of CAD2_HUMAN_V1, which also corresponds to amino acids 1 - 838 of S42303_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KTWPIESLHL corresponding to amino acids 839 - 848 of S42303_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S42303_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KTWPIESLHL in S42303_PEA_1_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S42303_PEA_1_P7, comprising a first amino acid sequence being at least 90 % homologous to
MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVHEGQPLLNVK FSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKWQ VAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGPFP QELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFH corresponding to amino acids 1 - 234 of CAD2_HUMAN_V1, which also corresponds to amino acids 1 - 234 of S42303_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRFQPADN corresponding to amino acids 235 - 242 of S42303_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of S42303_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRFQPADN in S42303_PEA_1_P7.
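The coordinate bookkeeping in the chimeric polypeptide descriptions above (a first sequence shared with the known protein, followed contiguously by a unique tail) can be checked mechanically. The following is a minimal sketch in Python using the S42303_PEA_1_P7 sequences as printed above, with the typographic spaces removed; the function name `segment_coordinates` and the variable names are illustrative assumptions and do not appear in the specification.

```python
# Sketch only: assemble a chimeric polypeptide from contiguous segments and
# report the 1-based, inclusive amino acid coordinates of each segment.

def segment_coordinates(segments):
    """Return (name, start, end) triples for segments joined in sequential order."""
    coords = []
    position = 1
    for name, seq in segments:
        end = position + len(seq) - 1
        coords.append((name, position, end))
        position = end + 1
    return coords

# First amino acid sequence of S42303_PEA_1_P7, as printed in the text above.
HEAD = (
    "MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVHEGQPLLNVK"
    "FSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKWQ"
    "VAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGPFP"
    "QELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFH"
)
# Second (tail) amino acid sequence of S42303_PEA_1_P7.
TAIL = "VRFQPADN"

segments = [("first", HEAD), ("tail", TAIL)]
chimera = "".join(seq for _, seq in segments)
coords = segment_coordinates(segments)

for name, start, end in coords:
    print(f"{name}: amino acids {start} - {end}")
```

Run as written, the sketch reports amino acids 1 - 234 for the first sequence and 235 - 242 for the tail, matching the numbering stated for S42303_PEA_1_P7. The same check can be applied to any of the other head/tail or first/second sequence pairs by substituting the corresponding sequences.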
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T87096_PEA_1_P11, comprising a first amino acid sequence being at least 90 % homologous to MQPSSLLPLALCLLAAPASALVRTPLHKFTSIRRTMSEVGGSVEDLIAKGPVSKYSQAVP AVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGSSNLWVPSIHCKLLDIACWTH HKYNSDKSSTYVKNGTSFDIHYGSGSLSGYLSQDTVSVPCQSASSASALGGVKVERQVF GEATKQPGITFIAAKFDGILGMAYPRISVNNVLPVFDNLMQQKLVDQNIFSFYLSRDPD AQPGGELMLGGTDSKYYKGSLSYLNVTRKAYWQVHLDQVEVASGLTLCKEGCEAIVD TGTSLMVGPVDEVRELQKAIGAVPLIQGE corresponding to amino acids 1 - 324 of CATD_HUMAN, which also corresponds to amino acids 1 - 324 of T87096_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSAGGWGWGWGWQGEPQGHHYHPDTAVTPLST corresponding to amino acids 325 - 356 of T87096_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T87096_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSAGGWGWGWGWQGEPQGHHYHPDTAVTPLST in T87096_PEA_1_P11.
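The graded homology thresholds recited for the tail sequences (at least 70%, 80%, 85%, 90% and 95%) can be illustrated with a small calculation. The sketch below is in Python and rests on a loudly stated assumption: it uses ungapped, position-by-position identity between two sequences of equal length as a simple stand-in for "homologous", whereas this section does not itself define how homology is to be computed and an alignment-based measure may be intended. The function names and the candidate sequence are hypothetical; only the reference tail of T87096_PEA_1_P11 is taken from the text.

```python
# Sketch only: ungapped percent identity as an ASSUMED stand-in for the
# homology measure; the thresholds are those recited in the text.
TIERS = [70, 80, 85, 90, 95]

def percent_identity(candidate, reference):
    """Percentage of positions carrying the same residue (equal lengths only)."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)

def highest_tier(identity):
    """Return the highest recited threshold that is met, or None if below 70%."""
    met = [tier for tier in TIERS if identity >= tier]
    return max(met) if met else None

# Tail of T87096_PEA_1_P11 as given in the text (amino acids 325 - 356).
REFERENCE_TAIL = "VSAGGWGWGWGWQGEPQGHHYHPDTAVTPLST"

# Hypothetical candidate: the reference tail with its first two residues replaced.
candidate = "AA" + REFERENCE_TAIL[2:]

identity = percent_identity(candidate, REFERENCE_TAIL)
print(f"{identity:.2f}% identical; highest threshold met: {highest_tier(identity)}")
```

With two of 32 positions changed, the hypothetical candidate is 93.75% identical under this simplified measure, so it meets the 90% threshold but not the 95% one. A gapped alignment would be needed to compare sequences of different lengths, which this sketch deliberately does not attempt.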
According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T87096_PEA_1_P27, comprising a first amino acid sequence being at least 90 % homologous to MQPSSLLPLALCLLAAPASALVRIPLHKFTS1RRTMSEVGGSVEDLIAKGPVSKYSQAVP AVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTWFDTGSSNLWVPSIHCKLLDIACWIH HKYNSDKSSTYVKNGTSFDIHYGSGSLSGYLSQDTVSVPCQSASSASALGGVKVERQVF GEATKQPGITFIAAKFDGILGMAYPRISVNNVLPVFDNLMQQKLVDQNIFSFYLSRDPD AQPGGELMLGGTDSKYYKGSLSYLNVTRKAYWQVHLDQV conesponding to amino acids 1 - 277 of CATD_HUMAN, which also conesponds to amino acids 1 - 277 of T87096_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95%
homologous to a polypeptide having the sequence WAAVG corresponding to amino acids 278 - 283 of T87096_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T87096_PEA_1_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence WAAVG in T87096_PEA_1_P27. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T87096_PEA_1_P39, comprising a first amino acid sequence being at least 90 % homologous to
MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKGPVSKYSQAVP AVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGSSNLWVPSIHCKLLDIAC corresponding to amino acids 1 - 117 of CATD_HUMAN, which also corresponds to amino acids 1 - 117 of T87096_PEA_1_P39, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CESRTLAPSPRSCPSGMSLQGCLRNHLGNAILLPLGPVSQASPPPCSSH corresponding to amino acids 118 - 166 of T87096_PEA_1_P39, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T87096_PEA_1_P39, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CESRTLAPSPRSCPSGMSLQGCLRNHLGNAILLPLGPVSQASPPPCSSH in T87096_PEA_1_P39. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P3, comprising a first amino acid sequence being at least 90 % homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHN LSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMN corresponding to
amino acids 1 - 104 of PRL_HUMAN, which also corresponds to amino acids 1 - 104 of HSPRO204_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KTF corresponding to amino acids 105 - 107 of HSPRO204_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KTF in HSPRO204_PEA_1_P3. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHN LSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMNQKDFLSLIVSILRSW NEPLYHLVTEVRGMQEAPEAILSKAVEIEEQTKRLLEGMELIVSQ corresponding to amino acids 1 - 164 of PRL_HUMAN, which also corresponds to amino acids 1 - 164 of HSPRO204_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LERTRTYKY corresponding to amino acids 165 - 173 of HSPRO204_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LERTRTYKY in HSPRO204_PEA_1_P4. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to
MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHN LSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMNQKDFLSLIVSILRSW NEPLYHLVTEVRGMQEAPEAILSKAVEIEEQTKRLLEGMELIVSQ corresponding to amino acids 1 - 164 of PRL_HUMAN, which also corresponds to amino acids 1 - 164 of HSPRO204_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SLFFCVMRFILKPKKMRSTLSGRDFHPCRWLMKSLAFLLIITCSTAYAGIHIKSTIISSS corresponding to amino acids 165 - 224 of HSPRO204_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SLFFCVMRFILKPKKMRSTLSGRDFHPCRWLMKSLAFLLIITCSTAYAGIHIKSTIISSS in HSPRO204_PEA_1_P5. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHN LSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMNQKDFLSLIVSILRSW NEPLYHLVTEVRGMQEAPEAILSKAVEIEEQTKRLLEGMELIVSQV corresponding to amino acids 1 - 165 of PRL_HUMAN, which also corresponds to amino acids 1 - 165 of HSPRO204_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SSLLVLLCFSH corresponding to amino acids 166 - 176 of HSPRO204_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably
at least about 90% and most preferably at least about 95% homologous to the sequence SSLLVLLCFSH in HSPRO204_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P11, comprising a first amino acid sequence being at least 90 % homologous to
MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHN
LSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMN corresponding to amino acids 1 - 104 of PRL_HUMAN, which also corresponds to amino acids 1 - 104 of HSPRO204_PEA_1_P11, and a second amino acid sequence being at least 90 % homologous to VHPETKENEIYPVWSGLPSLQMADEESRLSAYYNLLHCLRRDSHKIDNYLKLLKCRIIH NNNC corresponding to amino acids 165 - 227 of PRL_HUMAN, which also corresponds to amino acids 105 - 167 of HSPRO204_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSPRO204_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise NV, having a structure as follows: a sequence starting from any of amino acid numbers 104-x to 104; and ending at any of amino acid numbers 105+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHN LSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMN corresponding to amino acids 1 - 104 of PRL_HUMAN, which also corresponds to amino acids 1 - 104 of HSPRO204_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AKRTDCSASSMGQAVV
corresponding to amino acids 105 - 120 of HSPRO204_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AKRTDCSASSMGQAVV in HSPRO204_PEA_1_P12. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P21, comprising a first amino acid sequence being at least 90 % homologous to
MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQ corresponding to amino acids 1 - 40 of PRL_HUMAN, which also corresponds to amino acids 1 - 40 of HSPRO204_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
LPHFFPCHPRRQGASPTDESKRLSEPDSQHIAILE corresponding to amino acids 41 - 75 of HSPRO204_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPHFFPCHPRRQGASPTDESKRLSEPDSQHIAILE in HSPRO204_PEA_1_P21. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S69686_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLRLPYPLTWRQRPKQLEALCVGAATGPRA corresponding to amino acids 1 - 30 of S69686_P2, and a second amino acid sequence being at least 90 % homologous to MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGLKGDPGPPG PMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLPAHLDEELQATLHDFRHQ
ILQTRGALSLQGSIMTVGEKVFSSNGQSITFDAIQEACARAGGRIAVPRNPEENEAIASFV KKYNTYAYVGLTEGPSPGDFRYSDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWN DRNCLYSRLTICEF corresponding to amino acids 1 - 248 of PSPA_HUMAN_V1, which also corresponds to amino acids 31 - 278 of S69686_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of S69686_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRLPYPLTWRQRPKQLEALCVGAATGPRA of S69686_P2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S69686_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLRLPYPLTWRQRPKQLEALCVATGPRA corresponding to amino acids 1 - 28 of S69686_P6, and a second amino acid sequence being at least 90 % homologous to MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGLKGDPGPPG PMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLPAHLDEELQATLHDFRHQ ILQTRGALSLQGSIMTVGEKVFSSNGQSITFDAIQEACARAGGRIAVPRNPEENEAIASFV KKYNTYAYVGLTEGPSPGDFRYSDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWN DRNCLYSRLTICEF corresponding to amino acids 1 - 248 of PSPA_HUMAN_V1, which also corresponds to amino acids 29 - 276 of S69686_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
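The correspondence between variant numbering and known-protein numbering recited above for S69686_P2 and S69686_P6 can be sketched as a small lookup. The segment boundaries are taken from the text; the table layout and the function name `locate` are hypothetical and serve only as an illustration.

```python
# Illustrative sketch: each variant is described as an ordered list of
# segments (label, start in known protein or None, start in variant, length).
# Boundaries come from the text; the data layout itself is an assumption.
SEGMENTS = {
    "S69686_P2": [
        ("head", None, 1, 30),            # amino acids 1 - 30, unique head
        ("PSPA_HUMAN_V1", 1, 31, 248),    # 1 - 248 of PSPA_HUMAN_V1 -> 31 - 278
    ],
    "S69686_P6": [
        ("head", None, 1, 28),            # amino acids 1 - 28, unique head
        ("PSPA_HUMAN_V1", 1, 29, 248),    # 1 - 248 of PSPA_HUMAN_V1 -> 29 - 276
    ],
}


def locate(variant: str, position: int):
    """Map a 1-based variant position to (segment label, known-protein position).

    The known-protein position is None for a segment that is unique to the
    variant, such as a head.
    """
    for label, source_start, variant_start, length in SEGMENTS[variant]:
        offset = position - variant_start
        if 0 <= offset < length:
            if source_start is None:
                return (label, None)
            return (label, source_start + offset)
    raise ValueError(f"position {position} is outside {variant}")


if __name__ == "__main__":
    print(locate("S69686_P2", 30))
    print(locate("S69686_P2", 31))
    print(locate("S69686_P6", 276))
```

Because the segments are contiguous and in sequential order, the last position of each variant (278 for S69686_P2, 276 for S69686_P6) maps to amino acid 248 of PSPA_HUMAN_V1.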
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of S69686_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRLPYPLTWRQRPKQLEALCVATGPRA of S69686_P6. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S69686_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least
90% and most preferably at least 95% homologous to a polypeptide having the sequence MLRLPYPLTWRQRPKQLEALCVGAATGPRA corresponding to amino acids 1 - 30 of S69686_P7, a second amino acid sequence being at least 90 % homologous to MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGLKGDPGPPG PMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPP corresponding to amino acids 1 - 97 of PSPA_HUMAN_V1, which also corresponds to amino acids 31 - 127 of S69686_P7, and a third amino acid sequence being at least 90 % homologous to
ALSLQGSIMTVGEKVFSSNGQSITFDAIQEACARAGGRIAVPRNPEENEAIASFVKKYNT YAYVGLTEGPSPGDFRYSDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCL YSRLTICEF corresponding to amino acids 124 - 248 of PSPA_HUMAN_V1, which also corresponds to amino acids 128 - 252 of S69686_P7, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of S69686_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRLPYPLTWRQRPKQLEALCVGAATGPRA of S69686_P7. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of S69686_P7, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PA, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for S69686_P13, comprising a first amino acid sequence being at least 90 % homologous to MWLCPLALNLILMAASGAVCEVKDVCVGSP corresponding to amino acids 1 - 30 of PSPA_HUMAN_V1, which also corresponds to amino acids 1 - 30 of S69686_P13, and a second amino acid sequence being at least 90 % homologous
to GRGKEQCVEMYTDGQWNDRNCLYSRLTICEF corresponding to amino acids 218 - 248 of PSPA_HUMAN_V1, which also corresponds to amino acids 31 - 61 of S69686_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of S69686_P13, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 30-x to 30; and ending at any of amino acid numbers 31+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRNQAPGRPKGATFPPRRPTGSRAPPLAPELRAKQRPGERV corresponding to amino acids 1 - 41 of HUMTCXAAA_PEA_1_P6, a second amino acid sequence being at least 90 % homologous to MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQ PRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSAL SNSIMYFSHFVPVFLP corresponding to amino acids 1 - 134 of CD8A_HUMAN, which also corresponds to amino acids 42 - 175 of HUMTCXAAA_PEA_1_P6, a third amino acid sequence bridging amino acid sequence comprising of G, and a fourth amino acid sequence being at least 90 % homologous to NRRRVCKCPRPWKSGDKPSLSARYV corresponding to amino acids 210 - 235 of CD8A_HUMAN, which also corresponds to
amino acids 177 - 202 of HUMTCXAAA_PEA_1_P6, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HUMTCXAAA_PEA_1_P6, comprising a
polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRNQAPGRPKGATFPPRRPTGSRAPPLAPELRAKQRPGERV of HUMTCXAAA_PEA_1_P6. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMTCXAAA_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PGN having a structure as follows (numbering according to HUMTCXAAA_PEA_1_P6): a sequence starting from any of amino acid numbers 175-x to 175; and ending at any of amino acid numbers 177 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to
MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQ PRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSAL SNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAPTIASQPLSLRPEACRPAAGGA corresponding to amino acids 1 - 171 of CD8A_HUMAN, which also corresponds to amino acids 1 - 171 of HUMTCXAAA_PEA_1_P12, a second amino acid sequence bridging amino acid sequence comprising of G, and a third amino acid sequence being at least 90 % homologous to NRRRVCKCPRPWKSGDKPSLSARYV corresponding to amino acids 210 - 235 of CD8A_HUMAN, which also corresponds to amino acids 173 - 198 of HUMTCXAAA_PEA_1_P12, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMTCXAAA_PEA_1_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least
about 50 amino acids in length, wherein at least two amino acids comprise AGN having a structure as follows (numbering according to HUMTCXAAA_PEA_1_P12): a sequence starting from any of amino acid numbers 171-x to 171; and ending at any of amino acid numbers 173 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P13, comprising a first amino acid sequence being at least 90 % homologous to
MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQ PRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSAL SNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAPTIASQPLSLRPEACRPAAGGAVHTRGLDF ACDIYIWAPLAGTCGVLLLSLVITLYCNH corresponding to amino acids 1 - 208 of CD8A_HUMAN, which also corresponds to amino acids 1 - 208 of
HUMTCXAAA_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SKSRGIAAGRSRPRSCPWLC corresponding to amino acids 209 - 228 of HUMTCXAAA_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMTCXAAA_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SKSRGIAAGRSRPRSCPWLC in HUMTCXAAA_PEA_1_P13. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P14, comprising a first amino acid sequence being at least 90 % homologous to
MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQ PRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSAL SNSIMYFSH corresponding to amino acids 1 - 127 of CD8A_HUMAN, which also corresponds to amino acids 1 - 127 of HUMTCXAAA_PEA_1_P14, and a second amino acid sequence being at least 90 % homologous to
FACDIYIWAPLAGTCGVLLLSLVITLYCNHRNRRRVCKCPRPWKSGDKPSLSARYV
corresponding to amino acids 179 - 235 of CD8A_HUMAN, which also corresponds to amino acids 128 - 184 of HUMTCXAAA_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMTCXAAA_PEA_1_P14, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HF, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P15, comprising a first amino acid sequence being at least 90 % homologous to MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQ PRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSAL SNSIMYFSHFVPVFLP corresponding to amino acids 1 - 134 of CD8A_HUMAN, which also corresponds to amino acids 1 - 134 of HUMTCXAAA_PEA_1_P15, a second amino acid sequence bridging amino acid sequence comprising of G, and a third amino acid sequence being at least 90 % homologous to NRRRVCKCPRPWKSGDKPSLSARYV corresponding to amino acids 210 - 235 of CD8A_HUMAN, which also corresponds to amino acids 136 - 161 of HUMTCXAAA_PEA_1_P15, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for an edge portion of HUMTCXAAA_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PGN having a structure as follows (numbering according to HUMTCXAAA_PEA_1_P15): a sequence starting
from any of amino acid numbers 134-x to 134; and ending at any of amino acid numbers 136 + ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPPI_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRRE AEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P6, and a second amino acid sequence being at least 90 % homologous to GSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN corresponding to amino acids 75 - 110 of INS_HUMAN, which also corresponds to amino acids 63 - 98 of
HSPPI_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSPPI_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise QG, having a structure as follows: a sequence starting from any of amino acid numbers 62-x to 62; and ending at any of amino acid numbers 63+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPPI_PEA_1_P9, comprising a first amino acid sequence being at least 90 % homologous to MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRRE AEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEPTAHCCPWPPPATPCSWRSHPAWAEGGRRLPPSRGSGALF corresponding to amino acids 63 - 104 of HSPPI_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
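The edge-portion definition recited above for HSPPI_PEA_1_P6 (a sequence of length n starting at any of positions 62-x to 62 and ending at 63 + ((n-2) - x), with x from 0 to n-2) can be sketched as a window enumeration. The two segments are the ones recited for amino acids 1 - 62 and 63 - 98 of HSPPI_PEA_1_P6; the function name `edge_portions` is hypothetical, and skipping windows that would run past either end of the polypeptide is an assumption of this sketch.

```python
# Illustrative sketch of the edge-portion windows around a splice junction.
# Positions are 1-based and inclusive, as in the text.

FIRST = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ"  # 1 - 62
SECOND = "GSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"                            # 63 - 98
HSPPI_P6 = FIRST + SECOND
JUNCTION = len(FIRST)  # last residue before the junction (62)


def edge_portions(sequence: str, junction: int, n: int) -> list:
    """Return (start, end, subsequence) for every length-n window that
    contains both residues flanking the junction."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    portions = []
    for x in range(0, n - 1):                    # x varies from 0 to n-2
        start = junction - x                     # any of junction-x .. junction
        end = (junction + 1) + ((n - 2) - x)     # junction+1 + ((n-2) - x)
        if start < 1 or end > len(sequence):
            continue  # assumption: windows falling off either end are skipped
        portions.append((start, end, sequence[start - 1:end]))
    return portions


if __name__ == "__main__":
    for start, end, fragment in edge_portions(HSPPI_P6, JUNCTION, 10):
        print(start, end, fragment)
```

Every window has length n and contains the junction pair QG, so for n = 10 the sketch lists nine candidate fragments, from positions 62 - 71 down to positions 54 - 63.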
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPPI_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEPTAHCCPWPPPATPCSWRSHPAWAEGGRRLPPSRGSGALF in HSPPI_PEA_1_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPPI_PEA_1_P10, comprising a first amino acid sequence being at least 90 % homologous to MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRRE AEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGAGRGPWCRQPAALGPGGVPAEAWHCGTMLYQHLLPLPAGELLQLDAARRQPHTR RLLHRERWNKALEPA corresponding to amino acids 63 - 133 of HSPPI_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPPI_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGAGRGPWCRQPAALGPGGVPAEAWHCGTMLYQHLLPLPAGELLQLDAARRQPHTR RLLHRERWNKALEPA in HSPPI_PEA_1_P10. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPPI_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRRE AEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
AGELLQLDAARRQPHTRRLLHRERWNKALEPA corresponding to amino acids 63 - 94 of HSPPI_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPPI_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AGELLQLDAARRQPHTRRLLHRERWNKALEPA in HSPPI_PEA_1_P12. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPPI_PEA_1_P14, comprising a first amino acid sequence being at least 90 % homologous to
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRRE AEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGELLQLDAARRQPHTRRLLHRERWNKALEPALLCRLCVLGALGQAPLPGTWSPSQL SPRSLGAHRCQRRPGPACSGSPQSGHACRLPAAPTLWLRVQYGSCGGL corresponding to amino acids 63 - 168 of HSPPI_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPPI_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AGELLQLDAARRQPHTRRLLHRERWNKALEPALLCRLCVLGALGQAPLPGTWSPSQL SPRSLGAHRCQRRPGPACSGSPQSGHACRLPAAPTLWLRVQYGSCGGL in HSPPI_PEA_1_P14. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPPI_PEA_1_P15, comprising a first amino acid sequence being at least 90 % homologous to
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRRE
AEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGAGRGPWCRQPAALGPGGVPAEAWHCGTMLYQHLLPLPAGELLQLDAARRQPHTR RLLHRERWNKALEPALLCRLCVLGALGQAPLPGTWSPSQLSPRSLGAHRCQRRPGPA CSGSPQSGHACRLPAAPTLWLRVQYGSCGGL corresponding to amino acids 63 - 207 of HSPPI_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPPI_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGAGRGPWCRQPAALGPGGVPAEAWHCGTMLYQHLLPLPAGELLQLDAARRQPHTR RLLHRERWNKALEPALLCRLCVLGALGQAPLPGTWSPSQLSPRSLGAHRCQRRPGPA CSGSPQSGHACRLPAAPTLWLRVQYGSCGGL in HSPPI_PEA_1_P15.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D11581_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIFFAQFVQEATYK EVSKMVKDALTAIEKPTGDEQSSGCLENQ corresponding to amino acids 1 - 90 of FETA_HUMAN, which also corresponds to amino acids 1 - 90 of D11581_PEA_1_P6, and a second amino acid sequence being at least 90 % homologous to YGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEAYEEDRETFMNKFIYEIA RRHPFLYAPTILLWAARYDKIIPSCCKAENAVECFQTKAATVTKELRESSLLNQHACAV MKNFGTRTFQAITVTKLSQKFTKVNFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKI MSYICSQQDTLSNKITECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSS GEKNIFLASFVHEYSRRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEELQK YIQESQALAKRSCGLFQKLGEYYLQNAFLVAYTKKAPQLTSSELMAITRKMAATAATC CQLSEDKLLACGEGAADIIIGHLCIRHEMTPVNPGVGQCCTSSYANRRPCFSSLVVDETY VPPAFSDDKFIFHKDLCQAQGVALQTMKQEFLINLVKQKPQITEEQLEAVIADFSGLLEK
CCQGQEQEVCFAEEGQKLISKTRAALGV corresponding to amino acids 108 - 609 of FETA_HUMAN, which also corresponds to amino acids 91 - 592 of D11581_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of D11581_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise QY, having a structure as follows: a sequence starting from any of amino acid numbers 90-x to 90; and ending at any of amino acid numbers 91+ ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D11581_PEA_1_P10, comprising a first amino acid sequence being at least 90 % homologous to
MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIFFAQFVQEATYK EVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCHEKEILEKYGHSDCCSQSEEG RHNCFLAHKKPTPASIPLFQVPEPVTSCEAYEEDRETFMNKFIYEIARRHPFLYAPTILLW AARYDKIIPSCCKAENAVECFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITV TKLSQKFTKVNFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKI TECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIFLASFVHEYS RRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEELQKYIQESQALAKRSCGL FQKLGEYYLQNAFLVAYTKKAPQLTSSELMAITRKMAATAATCCQLSEDKLLACGEGA ADIIIGHLCIRHEMTPVNPGVGQCCTSSYANRRPCFSSLVVDETYVPPAFSDDKFIFHKDL CQAQGVALQTMKQE corresponding to amino acids 1 - 551 of FETA_HUMAN, which also corresponds to amino acids 1 - 551 of D11581_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D11581_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIFFAQFVQEATYK EVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCHEKEILEKYGHSDCCSQSEEG
RHNCFLAHKKPTPASIPLFQVPEPVTSCEAYEEDRETFMNKFIYEIARRHPFLYAPTILLW AARYDKIIPSCCKAENAVECFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITV TKLSQKFTKVNFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKI TECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIFLA corresponding to amino acids 1 - 352 of FETA_HUMAN, which also corresponds to amino acids 1 - 352 of D11581_PEA_1_P12, and a second amino acid sequence being at least 90 % homologous to
SLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQEFLINLVKQKPQITEEQLEAVI ADFSGLLEKCCQGQEQEVCFAEEGQKLISKTRAALGV corresponding to amino acids 514 - 609 of FETA_HUMAN, which also corresponds to amino acids 353 - 448 of
D11581_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of D11581_PEA_1_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AS, having a structure as follows: a sequence starting from any of amino acid numbers 352-x to 352; and ending at any of amino acid numbers 353+ ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for D11581_PEA_1_P16, comprising a first amino acid sequence being at least 90 % homologous to MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIFFAQFVQEATYK EVSKMVKDALTAIEKPTGDEQSSGCLENQ corresponding to amino acids 1 - 90 of
FETA_HUMAN, which also corresponds to amino acids 1 - 90 of D11581_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NFAMRKKFWRSTDIQTAAAKVKREDITVFLHTKSPLQHRSHFSKFQNLSQAVKHMKKT
GRHS corresponding to amino acids 91 - 152 of D11581_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of D11581_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NFAMRKKTWRSTDIQTAAAKVKREDITVFLHTKSPLQHRSHFSKFQNLSQAVKHMKKT GRHS in D11581_PEA_1_P16.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for Z25227_P10, comprising a first amino acid sequence being at least 90 % homologous to
MQQQAATAQAAAAAQAAAVAGN GPGSVGGIAPAISLSAAAGIGVDDLRRLCILRMS FVKGWGPDYPRQSIKETPCWIEIHLHRALQLLDEVLHTMPIADPQPLD corresponding to amino acids 447 - 552 of SMA4_HUMAN, which also corresponds to amino acids 1 - 106 of Z25227_P10.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T87719_P2, comprising a first amino acid sequence being at least 90 % homologous to MRCALALSALLLLLSTPPLLPS corresponding to amino acids 1 - 22 of PODX_HUMAN_V1, which also corresponds to amino acids 1 - 22 of T87719_P2, a second amino acid sequence being at least 90 % homologous to
SPSPSPSPSQNATQTTTDSSNKTAPTPASSVTIMATDTAQQSTVPTSKANEILASVKATTL GVSSDSPGTTTLAQQVSGPVNTTVARGGGSGNPTTTIESPKSTKSADTTTVATSTATAKP NTTSSQNGAEDTTNSGGKSSHSVTTDLTSTKAEHLTTPHPTSPLSPRQPTSTHPVATPTSS GHDHLMKISSSSSTVAIPGYTFTSPGMTTTLPSSVISQRTQQTSSQMPASSTAPSSQETVQ PTSPATALRTPTLPETMSSSPTAASTTHRYPKTPSPTVAHESNW corresponding to amino acids 25 - 311 of PODX_HUMAN_V1, which also corresponds to amino acids 23 - 309 of T87719_P2, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VTPAGVGQVGEPRLG corresponding to amino acids 310 - 324 of T87719_P2, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
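The residue ranges stated for T87719_P2 above can be checked mechanically. The following Python sketch is illustrative only and is not part of the described embodiments: the `Segment` class and the `check_segments` function are hypothetical names introduced here, and the only data used are the ranges given above (amino acids 1 - 22 and 25 - 311 of the known protein corresponding to amino acids 1 - 22 and 23 - 309 of the variant, followed by a variant-specific portion at amino acids 310 - 324). It tests that each mapped segment has the same length in both numbering schemes and that the segments are contiguous in the variant.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One piece of a variant, in 1-based inclusive residue numbering."""
    variant_start: int
    variant_end: int
    known_start: Optional[int] = None  # None for a variant-specific segment
    known_end: Optional[int] = None


def check_segments(segments: List[Segment]) -> List[str]:
    """Return a list of inconsistencies; an empty list means the ranges agree."""
    problems = []
    expected_start = 1
    for index, seg in enumerate(segments, start=1):
        # Segments must follow one another without gaps or overlaps.
        if seg.variant_start != expected_start:
            problems.append(
                f"segment {index}: starts at {seg.variant_start}, "
                f"expected {expected_start}"
            )
        # A segment mapped to a known protein must have equal length there.
        if seg.known_start is not None and seg.known_end is not None:
            variant_len = seg.variant_end - seg.variant_start + 1
            known_len = seg.known_end - seg.known_start + 1
            if variant_len != known_len:
                problems.append(
                    f"segment {index}: {variant_len} residues in the variant "
                    f"but {known_len} in the known protein"
                )
        expected_start = seg.variant_end + 1
    return problems


# Ranges as stated in the text for T87719_P2.
T87719_P2 = [
    Segment(1, 22, 1, 22),
    Segment(23, 309, 25, 311),
    Segment(310, 324),  # variant-specific portion, no known counterpart
]

print(check_segments(T87719_P2))  # prints []
```

An empty list indicates that the stated ranges are internally consistent; the same check could be applied to the ranges stated for the other variants.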
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T87719_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SS, having a structure as follows: a sequence starting from any of amino acid numbers 22-x to 23; and ending at any of amino acid numbers 23+ ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T87719_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VTPAGVGQVGEPRLG in T87719_P2.
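The edge-portion definition above amounts to enumerating every window of length n that contains both residues flanking the junction. The following Python sketch is one possible reading, offered for illustration only: the function name `edge_portion_windows` is hypothetical, and it assumes that for each x the window starts at residue (junction - x) and ends at residue (junction + 1) + ((n - 2) - x), with x running from 0 to n-2. The test sequence is limited to the first 22 residues of each of the first and second amino acid sequences given for T87719_P2, where the junction lies between residues 22 and 23.

```python
from typing import List, Tuple


def edge_portion_windows(sequence: str, junction: int, n: int) -> List[Tuple[int, int, str]]:
    """Enumerate length-n windows spanning residues `junction` and `junction + 1`.

    Positions are 1-based and inclusive, as in the text.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(n - 1):  # x varies from 0 to n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        if start < 1 or end > len(sequence):
            continue  # window would run past the available sequence
        windows.append((start, end, sequence[start - 1:end]))
    return windows


# First 22 residues of the first and of the second amino acid sequence of T87719_P2.
prefix = "MRCALALSALLLLLSTPPLLPS" + "SPSPSPSPSQNATQTTTDSSNK"

for start, end, peptide in edge_portion_windows(prefix, junction=22, n=10):
    print(start, end, peptide)
```

Under this reading there are n-1 such windows for a given n, each of which contains the two junction residues (SS in this case).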
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T87719_P8, comprising a first amino acid sequence being at least 90 % homologous to MRCALALSALLLLLSTPPLLPS corresponding to amino acids 1 - 22 of PODX_HUMAN, which also corresponds to amino acids 1 - 22 of T87719_P8, a second amino acid sequence being at least 90 % homologous to SPSPSPSPSQNATQTTTDSSNKTAPTPASSVTIMATDTAQQSTVPTSKANEILASVKATTL GVSSDSPGTTTLAQQVSGPVNTTVARGGGSGNPTTTIESPKSTKSADTTTVATSTATAKP NTTSSQNGAEDTTNSGGKSSHSVTTDLTSTKAE corresponding to amino acids 25 - 178 of PODX_HUMAN, which also corresponds to amino acids 23 - 176 of T87719_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RARVKL corresponding to amino acids 177 - 182 of T87719_P8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of T87719_P8, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more
preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SS, having a structure as follows: a sequence starting from any of amino acid numbers 22-x to 23; and ending at any of amino acid numbers 23+ ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T87719_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RARVKL in T87719_P8.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSTIR_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to
MPMGSLQPLATLYLLGMLVASCLGRLSWYDPDFQARLTRSNSKCQGQLEVYLKDGW HMVCSQSWGRSSKQWEDPSQASKVCQRLNCGVPLSLGPFLVTYTPQSSIICYGQLGSFS NCSHSRNDMCHSLGLTCLEPQKTTPPTTRPPPTTTPEPTAPPRLQLVAQSGGQHCAGW EFYSGSLGGTISYEAQDKTQDLENFLCNNLQCGSFLKHLPETEAGRAQDPGEPREHQPL PIQWKIQNSSCTSLEHCFRKIKPQKSGRVLALLCSGFQPKVQSRLVGGSSICEGTVEVRQ GAQWAALCDSSSARSSLRWEEVCREQQCGSVNSYRVLDAGDPTSRGLFCPHQKLSQC HELWERNSYCKKVFVT corresponding to amino acids 1 - 366 of CD5_HUMAN, which also corresponds to amino acids 1 - 366 of HSTIR_PEA_1_P6, and a second amino acid sequence being at least 90 % homologous to
FRQKKQRQWIGPTGMNQNMSFHRNHTATVRSHAENPTASHVDNEYSQPPRNSRLSAY PALEGVLHRSSMQPDNSSDSDYDLHGAQRL corresponding to amino acids 409 - 495 of CD5_HUMAN, which also corresponds to amino acids 367 - 453 of HSTIR_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSTIR_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino
acids in length, wherein at least two amino acids comprise TF, having a structure as follows: a sequence starting from any of amino acid numbers 366-x to 366; and ending at any of amino acid numbers 367+ ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSALK1A_PEA_1_P14, comprising a first amino acid sequence being at least 90 % homologous to
MTLGSPRKGLLMLLMALVTQGDPVKPSRGPLVTCTCESPHCKGPTCRGAWCTWLVR EEGRHPQEHRGCGNLHRELCRGRPTEFVNHYCCDSHLCNHNVSLVLE corresponding to amino acids 1 - 104 of KIR3_HUMAN, which also corresponds to amino acids 1 - 104 of HSALK1A_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GTSSCPSTPSPSSWPLPSLPSFPLMLWPIKGLGAGERVGRTLGSNWQSGLARGGGS corresponding to amino acids 105 - 160 of HSALK1A_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSALK1A_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTSSCPSTPSPSSWPLPSLPSFPLMLWPIKGLGAGERVGRTLGSNWQSGLARGGGS in HSALK1A_PEA_1_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LQWGRKNLGAMFAF corresponding to amino acids 1 - 14 of HSCDIA_PEA_1_P5, a bridging amino acid T corresponding to amino acid 18 of HSCDIA_PEA_1_P5, and a second amino acid sequence being at least 90 % homologous to GLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIVFLWPWSRGNFSN EEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGS DFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKA
HLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRGDI LPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSVGFIILAVIVPL LLLIGLALWFRKRCFC corresponding to amino acids 20 - 327 of CD1A_HUMAN_V1, which also corresponds to amino acids 19 - 326 of HSCDIA_PEA_1_P5, wherein said first amino acid sequence, bridging amino acid and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a head of HSCDIA_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LQWGRKNLGAMFAF of HSCDIA_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTW DSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGG CELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDIT HNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPV WVMWMRGEQEQQGTQRGDILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQD IVLYWEHHSSVGFIILAVIVPLLLLIGLALWFRKR corresponding to amino acids 1 - 324 of CD1A_HUMAN_V1, which also corresponds to amino acids 1 - 324 of HSCDIA_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence W corresponding to amino acids 325 - 325 of HSCDIA_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P7, comprising a first amino acid sequence being at least 90 % homologous to MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTW DSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGG
CELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDIT HNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPV WVMWMRGEQEQQGTQRGDILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQD IVLYW corresponding to amino acids 1 - 294 of CD1A_HUMAN_V1, which also corresponds to amino acids 1 - 294 of HSCDIA_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKKLRPRLEMPGSGPQA corresponding to amino acids 295 - 312 of HSCDIA_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSCDIA_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKKLRPRLEMPGSGPQA in HSCDIA_PEA_1_P7.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P8, comprising a first amino acid sequence being at least 90 % homologous to MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTW DSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGG CELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDIT HNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPV WVMWMRGEQEQQGTQRGDILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQD IVLYW corresponding to amino acids 1 - 294 of CD1A_HUMAN_V1, which also corresponds to amino acids 1 - 294 of HSCDIA_PEA_1_P8, and a second amino acid sequence being at least 90 % homologous to GLALWFRKRCFC corresponding to amino acids 316 - 327 of CD1A_HUMAN_V1, which also corresponds to amino acids 295 - 306 of HSCDIA_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSCDIA_PEA_1_P8, comprising
a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise WG, having a structure as follows: a sequence starting from any of amino acid numbers 294-x to 294; and ending at any of amino acid numbers 295+ ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P9, comprising a first amino acid sequence being at least 90 % homologous to MLFLLLPLLAVLPGDGNAD corresponding to amino acids 1 - 19 of CD1A_HUMAN, which also corresponds to amino acids 1 - 19 of
HSCDIA_PEA_1_P9, and a second amino acid sequence being at least 90 % homologous to
GWLSDLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQ FEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKV LNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLV CHVSGFYPKPVWVMWMRGEQEQQGTQRGDILPSADGTWYLRATLEVAAGEAADLSC
RVKHSSLEGQDIVLYWEHHSSVGFIILAVIVPLLLLIGLALWFRKRCFC corresponding to amino acids 47 - 327 of CD1A_HUMAN, which also corresponds to amino acids 20 - 300 of HSCDIA_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSCDIA_PEA_1_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise DG, having a structure as follows: a sequence starting from any of amino acid numbers 19-x to 19; and ending at any of amino acid numbers 20+ ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P11, comprising a first amino acid sequence being at least 90 % homologous to
MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTW
DSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGG CELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDIT HNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPV WVMWMR corresponding to amino acids 1 - 239 of CD1A_HUMAN_V1, which also corresponds to amino acids 1 - 239 of HSCDIA_PEA_1_P11, and a second amino acid sequence being at least 90 % homologous to EHHSSVGFIILAVIVPLLLLIGLALWFRKRCFC corresponding to amino acids 295 - 327 of CD1A_HUMAN_V1, which also corresponds to amino acids 240 - 272 of HSCDIA_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSCDIA_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RE, having a structure as follows: a sequence starting from any of amino acid numbers 239-x to 239; and ending at any of amino acid numbers 240+ ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HUMEDF_PEA_2_T5
HUMEDF_PEA_2_T10
HUMEDF_PEA_2_T11
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HUMEDF_PEA_2_node_6
HUMEDF_PEA_2_node_11
HUMEDF_PEA_2_node_18
HUMEDF_PEA_2_node_19
HUMEDF_PEA_2_node_22
HUMEDF_PEA_2_node_2
HUMEDF_PEA_2_node_8
HUMEDF_PEA_2_node_20
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
Transcript Name
HUMINHA_PEA_1_T2
HUMINHA_PEA_1_T4
HUMINHA_PEA_1_T5
HUMINHA_PEA_1_T6
a nucleic acid sequence comprising a sequence in the table below:
Segment Name
HUMINHA_PEA_1_node_2
HUMINHA_PEA_1_node_3
HUMINHA_PEA_1_node_4
HUMINHA_PEA_1_node_7
HUMINHA_PEA_1_node_9
HUMINHA_PEA_1_node_10
HUMINHA_PEA_1_node_16
HUMINHA_PEA_1_node_5
HUMINHA_PEA_1_node_6
HUMINHA_PEA_1_node_8
HUMINHA_PEA_1_node_11
HUMINHA_PEA_1_node_12
HUMINHA_PEA_1_node_14
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
Protein Name
HUMINHA_PEA_1_P4
HUMINHA_PEA_1_P5
HUMINHA_PEA_1_P8
HUMINHA_PEA_1_P10
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMINHA_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREGGDPGVRRL PRRHALGGFTHRGSEPEEEEDVSQAILFPAT corresponding to amino acids 1 - 89 of IHA_HUMAN, which also corresponds to amino acids 1 - 89 of HUMINHA_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
GSAPQRPVAMTTAQRDSLLWKLAGLLRESGDWLSGCSTLSLLTPTLQQLNHVFELHL GPWGPGQTGFV corresponding to amino acids 90 - 158 of HUMINHA_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMINHA_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GSAPQRPVAMTTAQRDSLLWKLAGLLRESGDWLSGCSTLSLLTPTLQQLNHVFELHL GPWGPGQTGFV in HUMINHA_PEA_1_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMINHA_PEA_1_P10, comprising a first amino acid sequence being at least 90 % homologous to MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREGGDPGVRRL PRRHALGGFTHRGSEPEEEEDVSQAILFPATDASCEDKSAARGLAQEAEEGLFRYMFRP SQHTR corresponding to amino acids 1 - 122 of IHA_HUMAN, which also corresponds to amino acids 1 - 122 of HUMINHA_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NHPVEGREPDAQLP corresponding to amino acids 123 - 136 of HUMINHA_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMINHA_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NHPVEGREPDAQLP in HUMINHA_PEA_1_P10.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P5, comprising a first amino acid sequence being at least 90 % homologous to
MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVK KHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELM
EQTSEIITFAESGT corresponding to amino acids 1 - 131 of IHBA_HUMAN, which also corresponds to amino acids 1 - 131 of HUMEDF_PEA_2_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKS corresponding to amino acids 132 - 134 of HUMEDF_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKS in HUMEDF_PEA_2_P5.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P6, comprising a first amino acid sequence being at least 90 % homologous to
MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVK KHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELM EQTSEIITFAESG corresponding to amino acids 1 - 130 of IHBA_HUMAN, which also corresponds to amino acids 1 - 130 of HUMEDF_PEA_2_P6, and a second amino acid
sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence HSEA corresponding to amino acids 131 - 134 of HUMEDF_PEA_2_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HSEA in HUMEDF_PEA_2_P6.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P8, comprising a first amino acid sequence being at least 90 % homologous to
MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVK KHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELM EQTSEIITFAESGT corresponding to amino acids 1 - 131 of IHBA_HUMAN, which also corresponds to amino acids 1 - 131 of HUMEDF_PEA_2_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKS corresponding to amino acids 132 - 134 of HUMEDF_PEA_2_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKS in HUMEDF_PEA_2_P8.
According to preferred embodiments of the present invention, there is provided an antibody capable of specifically binding to an epitope of an amino acid sequence as described herein. Optionally the amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion as described herein.
Optionally the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein.
According to preferred embodiments of the present invention, there is provided a kit for detecting an inhibin variant-detectable disease, comprising a kit detecting specific expression of a splice variant as described herein. Optionally the kit comprises a NAT-based technology. Optionally the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally the kit comprises an antibody as described herein. Optionally the kit further comprises at least one reagent for performing an ELISA or a Western blot.
There is also provided a method for detecting an inhibin variant-detectable disease, comprising detecting specific expression of a splice variant as described herein. Optionally said detecting specific expression is performed with a NAT-based technology. Optionally detecting specific expression is performed with an immunoassay. Optionally the immunoassay comprises an antibody as described herein.
According to preferred embodiments of the present invention, there is provided a biomarker capable of detecting inhibin variant-detectable disease, comprising any of the above nucleic acid sequences or a fragment thereof, or any of the above amino acid sequences or a fragment thereof.
According to preferred embodiments of the present invention, there is provided a method for screening for an inhibin variant-detectable disease, comprising detecting cells affected by an inhibin variant-detectable disease with a biomarker or an antibody or a method or assay as described herein.
According to preferred embodiments of the present invention, there is provided a method for diagnosing an inhibin variant-detectable disease, comprising detecting cells affected by inhibin variant-detectable disease with a biomarker or an antibody or a method or assay as described herein.
According to preferred embodiments of the present invention, there is provided a method for monitoring disease progression and/or treatment efficacy and/or relapse of inhibin variant-detectable disease, comprising detecting cells affected by inhibin variant-detectable disease with a biomarker or an antibody or a method or assay as described herein.
According to preferred embodiments of the present invention, there is provided a method of selecting a therapy for inhibin variant-detectable disease, comprising detecting cells affected by an inhibin variant-detectable disease with a biomarker or an antibody or a method or assay as described herein and selecting a therapy according to said detection.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSGROW1_PEA_1_PEA_1_T5
HSGROW1_PEA_1_PEA_1_T8
HSGROW1_PEA_1_PEA_1_T10
HSGROW1_PEA_1_PEA_1_T11
HSGROW1_PEA_1_PEA_1_T16
a nucleic acid sequence comprising a sequence in the table below:
HSGROW1_PEA_1_PEA_1_node_2
HSGROW1_PEA_1_PEA_1_node_4
HSGROW1_PEA_1_PEA_1_node_15
HSGROW1_PEA_1_PEA_1_node_18
HSGROW1_PEA_1_PEA_1_node_0
HSGROW1_PEA_1_PEA_1_node_3
HSGROW1_PEA_1_PEA_1_node_5
HSGROW1_PEA_1_PEA_1_node_6
HSGROW1_PEA_1_PEA_1_node_7
HSGROW1_PEA_1_PEA_1_node_8
HSGROW1_PEA_1_PEA_1_node_9
HSGROW1_PEA_1_PEA_1_node_11
HSGROW1_PEA_1_PEA_1_node_12
HSGROW1_PEA_1_PEA_1_node_13
HSGROW1_PEA_1_PEA_1_node_14
HSGROW1_PEA_1_PEA_1_node_16
HSGROW1_PEA_1_PEA_1_node_17
HSGROW1_PEA_1_PEA_1_node_19
HSGROW1_PEA_1_PEA_1_node_20
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSGROW1_PEA_1_PEA_1_P17
HSGROW1_PEA_1_PEA_1_P18
HSGROW1_PEA_1_PEA_1_P9
HSGROW1_PEA_1_PEA_1_P10
HSGROW1_PEA_1_PEA_1_P15
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
T05709_PEA_1_T2
T05709_PEA_1_T3
T05709_PEA_1_T5
T05709_PEA_1_T7
T05709_PEA_1_T8
a nucleic acid sequence comprising a sequence in the table below:
T05709_PEA_1_node_45
T05709_PEA_1_node_46
T05709_PEA_1_node_48
T05709_PEA_1_node_5
T05709_PEA_1_node_7
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
T05709_PEA_1_P3
T05709_PEA_1_P8
T05709_PEA_1_P9
T05709_PEA_1_P13
T05709_PEA_1_P14
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to
MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDL VYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVH PIGYYDAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTNEVT RIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAWHEIVRSFGTLKKEGWRP RRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYINADSSIEGNYTLRVDCTPLMY SLVHNLTKELKSPDEGFEGKSLYESWTKKSPSPEFSGMPRISKLGSGNDFEVFFQRLGIA SGRARYTKNWETNKFSGYPLYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELA NSIVLPFDCRDYAWLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSE RLQDFDKS conesponding to amino acids 1 - 656 of Q8TAY3, which also conesponds to amino acids 1 - 656 of T05709_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK conesponding to amino acids
657 - 695 of T05709_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T05709_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK in T05709_PEA_1_P3.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to
MWNLLITETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDL VYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKS YPDGWNLPGGGVQRGNILNLNGAGDPLTPG YPANEYA YRRGIAEAVGLPSIPVH PIGYYDAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTNEVT RIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAWHEIVRSFGTLKKEGWRP RRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYINADSSIEGNYTLRVDCTPLMY SLVHNLTKELKSPDEGFEGKSLYESWTKKSPSPEFSGMPRISKLGSGNDFEVFFQRLGIA SGRARYTKNWETNKFSGYPLYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELA NSIVLPFDCRDYAWLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSE RLQDFDKS conesponding to amino acids 1 - 656 of FOHl_HUMAN, which also corresponds to amino acids 1 - 656 of T05709_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK corresponding to amino acids 657 - 695 of T05709_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T05709_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK in T05709_PEA_1_P3.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to
MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPANEYA YRRGIAEAVGLPSIPVH PIGYYDAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTNEVT RIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAWHEIVRSFGTLKKEGWRP RRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYINADSSIEGNYTLRVDCTPLMY SLVHNLTKE corresponding to amino acids 1 - 480 of Q8TAY3, which also conesponds to amino acids 1 - 480 of T05709_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VL conesponding to amino acids 481 - 482 of T05709_PEA_1_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments ofthe present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P8, comprising a first amino acid sequence being at least 90 % homologous to
MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDL VYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPANEYA YRRGIAEAVGLPSIPVH PIGYYDAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTNEVT RIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAWHEIVRSFGTLKKEGWRP RRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYINADSSIEGNYTLRVDCTPLMY SLVHNLTKE conesponding to amino acids 1 - 480 of FOHl_HUMAN, which also
corresponds to amino acids 1 - 480 of T05709_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VL corresponding to amino acids 481 - 482 of T05709_PEA_1_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFL YNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKLNCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPAN conesponding to amino acids 1
- 275 of Q8TAY3, which also corresponds to amino acids 1 - 275 of T05709_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GE corresponding to amino acids 276 - 277 of T05709_PEA_1_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKTNCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPAN conesponding to amino acids 1
- 275 of FOH1_HUMAN, which also corresponds to amino acids 1 - 275 of T05709_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GE corresponding to amino acids 276 - 277 of T05709_PEA_1_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to
MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM K ^LDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKTNCSGKΓVIARYGKVFRGNK corresponding to amino acids 1 - 213 of Q8TAY3, which also conesponds to amino acids 1 - 213 of T05709_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NMLIGVELQRLLVFQVFLFIQLDTMMHRSS conesponding to amino acids 214 - 243 of T05709_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T05709_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NMLIGVELQRLLVFQVFLFIQLDTMMHRSS in T05709_PEA_1_P13. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P13, comprising a first amino acid sequence being at least 90 % homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFL YNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDL VYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNK conesponding to amino acids 1 - 213 of FOHl_HUMAN, which also conesponds to amino acids 1 - 213 of T05709_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NMLIGVELQRLLVFQVFLFIQLDTMMHRSS conesponding to amino
acids 214 - 243 of T05709_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T05709_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NMLIGVELQRLLVFQVFLFIQLDTMMHRSS in T05709_PEA_1_P13.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to
MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF corresponding to amino acids 1 - 39 of Q8TAY3, which also corresponds to amino acids 1 - 39 of T05709_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKVGKRN corresponding to amino acids 40 - 46 of T05709_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T05709_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKVGKRN in T05709_PEA_1_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for T05709_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF corresponding to amino acids 1 - 39 of FOH1_HUMAN, which also corresponds to amino acids 1 - 39 of T05709_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKVGKRN corresponding to amino acids 40 - 46 of T05709_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
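By way of illustration only, the head-plus-tail arrangement recited above for T05709_PEA_1_P14 can be checked with a short Python sketch. The helper name `assemble_chimera` is hypothetical and is not part of the disclosure; the two sequences are copied from the text above. The sketch simply joins the first and second amino acid sequences contiguously and reports the 1-based residue range each one occupies.

```python
def assemble_chimera(head: str, tail: str):
    """Join head and tail contiguously, in sequential order.

    Returns the full sequence plus 1-based inclusive (start, end)
    coordinates for each segment. Hypothetical helper for illustration.
    """
    full = head + tail
    head_range = (1, len(head))
    tail_range = (len(head) + 1, len(full))
    return full, head_range, tail_range


# Sequences as recited in the text for T05709_PEA_1_P14.
HEAD = "MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF"
TAIL = "VKVGKRN"

full, head_range, tail_range = assemble_chimera(HEAD, TAIL)
print(head_range, tail_range)
```

The computed ranges can be compared against the residue numbering stated in the text (amino acids 1 - 39 for the first sequence and 40 - 46 for the tail).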
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of T05709_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKVGKRN in T05709_PEA_1_P14.
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSU13680_PEA_1_T6
HSU13680_PEA_1_T8
HSU13680_PEA_1_T10
HSU13680_PEA_1_T11
HSU13680_PEA_1_T13
HSU13680_PEA_1_T14
a nucleic acid sequence comprising a sequence in the table below:
HSU13680_PEA_1_node_3
HSU13680_PEA_1_node_8
HSU13680_PEA_1_node_10
HSU13680_PEA_1_node_14
HSU13680_PEA_1_node_16
HSU13680_PEA_1_node_0
HSU13680_PEA_1_node_1
HSU13680_PEA_1_node_5
HSU13680_PEA_1_node_6
HSU13680_PEA_1_node_12
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSPROSAP_PEA_1_T3
HSPROSAP_PEA_1_T15
HSPROSAP_PEA_1_T19
HSPROSAP_PEA_1_T20
HSPROSAP_PEA_1_T23
HSPROSAP_PEA_1_T24
HSPROSAP_PEA_1_T25
a nucleic acid sequence comprising a sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HUMPGCA_PEA_1_T0
HUMPGCA_PEA_1_T1
HUMPGCA_PEA_1_T5
a nucleic acid sequence comprising a sequence in the table below:
HUMPGCA_PEA_1_node_0
HUMPGCA_PEA_1_node_2
HUMPGCA_PEA_1_node_14
HUMPGCA_PEA_1_node_16
HUMPGCA_PEA_1_node_17
HUMPGCA_PEA_1_node_19
HUMPGCA_PEA_1_node_28
HUMPGCA_PEA_1_node_4
HUMPGCA_PEA_1_node_5
HUMPGCA_PEA_1_node_6
HUMPGCA_PEA_1_node_9
HUMPGCA_PEA_1_node_10
HUMPGCA_PEA_1_node_11
HUMPGCA_PEA_1_node_15
HUMPGCA_PEA_1_node_22
HUMPGCA_PEA_1_node_26
HUMPGCA_PEA_1_node_27
HUMPGCA_PEA_1_node_29
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HUMPGCA_PEA_1_P12
HUMPGCA_PEA_1_P14
HUMPGCA_PEA_1_P15
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HUMFBRB_PEA_1_T14
HUMFBRB_PEA_1_T16
HUMFBRB_PEA_1_T19
HUMFBRB_PEA_1_T20
HUMFBRB_PEA_1_T25
HUMFBRB_PEA_1_T44
HUMFBRB_PEA_1_T52
HUMFBRB_PEA_1_T8
a nucleic acid sequence comprising a sequence in the table below:
HUMFBRB_PEA_1_node_0
HUMFBRB_PEA_1_node_28
HUMFBRB_PEA_1_node_39
HUMFBRB_PEA_1_node_47
HUMFBRB_PEA_1_node_51
HUMFBRB_PEA_1_node_55
HUMFBRB_PEA_1_node_56
HUMFBRB_PEA_1_node_64
HUMFBRB_PEA_1_node_69
HUMFBRB_PEA_1_node_71
HUMFBRB_PEA_1_node_74
HUMFBRB_PEA_1_node_75
HUMFBRB_PEA_1_node_12
HUMFBRB_PEA_1_node_13
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HUMFBRB_PEA_1_P4
HUMFBRB_PEA_1_P9
HUMFBRB_PEA_1_P11
HUMFBRB_PEA_1_P13
HUMFBRB_PEA_1_P17
HUMFBRB_PEA_1_P26
According to preferred embodiments of the present invention, there is provided an isolated polynucleotide comprising a nucleic acid sequence in the table below and/or:
HSMRACP5_PEA_1_T11
HSMRACP5_PEA_1_T14
HSMRACP5_PEA_1_T20
a nucleic acid sequence comprising a sequence in the table below:
HSMRACP5_PEA_1_node_0
HSMRACP5_PEA_1_node_12
HSMRACP5_PEA_1_node_13
HSMRACP5_PEA_1_node_19
HSMRACP5_PEA_1_node_24
HSMRACP5_PEA_1_node_25
HSMRACP5_PEA_1_node_28
HSMRACP5_PEA_1_node_11
HSMRACP5_PEA_1_node_14
HSMRACP5_PEA_1_node_15
HSMRACP5_PEA_1_node_16
HSMRACP5_PEA_1_node_17
HSMRACP5_PEA_1_node_20
HSMRACP5_PEA_1_node_23
HSMRACP5_PEA_1_node_26
HSMRACP5_PEA_1_node_27
HSMRACP5_PEA_1_node_3
HSMRACP5_PEA_1_node_8
According to preferred embodiments of the present invention, there is provided an isolated polypeptide comprising an amino acid sequence in the table below:
HSMRACP5_PEA_1_P11
HSMRACP5_PEA_1_P12
HSMRACP5_PEA_1_P14
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to
MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIART VQILGADFILSLGDNFYFTGVQDINDKRFQ corresponding to amino acids 1 - 87 of AAH25414, which also corresponds to amino acids 1 - 87 of HSMRACP5_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR corresponding to amino acids 88 - 119 of HSMRACP5_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR in HSMRACP5_PEA_1_P11.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIART VQILGADFILSLGDNFYFTGVQDINDKRFQ corresponding to amino acids 1 - 87 of PPA5_HUMAN, which also corresponds to amino acids 1 - 87 of HSMRACP5_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR corresponding to amino acids 88 - 119 of HSMRACP5_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR in HSMRACP5_PEA_1_P11.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to
MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIART VQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLRKVP corresponding to amino acids 1 - 103 of AAH25414, which also corresponds to amino acids 1 - 103 of HSMRACP5_PEA_1_P12, and a second amino acid sequence being at least 90% homologous to
WNFPSPFYRLHFKIPQTNVSVAIFMLDTVTLCGNSDDFLSQQPERPRDVKLARTQLSWL KKQLAAAREDYVLVAGHYPVWSIAEHGPTHCLVKQLRPLLATYGVTAYLCGHDHNLQ YLQDENGVGYVLSGAGNFMDPSKRHQRKVPNGYLRFHYGTEDSLGGFAYVEISSKEM TVTYIEASGKSLFKTRLPRRARP corresponding to amino acids 130 - 325 of AAH25414, which also corresponds to amino acids 104 - 299 of HSMRACP5_PEA_1_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HSMRACP5_PEA_1_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PW, having a structure as follows: a sequence starting from any of amino acid numbers 103-x to 104; and ending at any of amino acid numbers 104 + ((n-2) - x), in which x varies from 0 to n-2.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to
MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIART VQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLRK corresponding to amino acids 1 - 101 of AAH25414, which also corresponds to amino acids 1 - 101 of HSMRACP5_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EGETQLMNCGAT corresponding to amino acids 102 - 113 of HSMRACP5_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EGETQLMNCGAT in HSMRACP5_PEA_1_P14.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to
MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIART VQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLRK corresponding to amino acids 1 - 101 of PPA5_HUMAN, which also corresponds to amino acids 1 - 101 of HSMRACP5_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EGETQLMNCGAT corresponding to amino acids 102 - 113 of HSMRACP5_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EGETQLMNCGAT in HSMRACP5_PEA_1_P14.
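As a non-authoritative illustration, the edge-portion structure recited above for HSMRACP5_PEA_1_P12 (a window of length n spanning the junction between amino acid 103, P, and amino acid 104, W) can be enumerated with the Python sketch below. The function name `edge_portion_windows` is hypothetical, and the sketch assumes one reading of the recited formula: the window starts at 103 - x and ends at 104 + ((n - 2) - x), with x running from 0 to n - 2.

```python
def edge_portion_windows(n: int, last_head: int = 103, first_tail: int = 104):
    """Enumerate 1-based inclusive (start, end) windows of length n that
    span the junction between residues last_head and first_tail.

    Assumed reading of the recited structure:
    start = last_head - x, end = first_tail + ((n - 2) - x), x = 0 .. n-2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):  # x takes the values 0, 1, ..., n-2
        start = last_head - x
        end = first_tail + ((n - 2) - x)
        windows.append((start, end))
    return windows


# Print each window of length 10 together with its length.
for start, end in edge_portion_windows(10):
    print(start, end, end - start + 1)
```

Under this reading every window has exactly n residues and always contains both junction residues, which is what distinguishes an edge portion from an arbitrary fragment of either parent segment.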
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to
MDM TALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIART VQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLRK conesponding to amino acids 1 - 101 of PPA5_HUMAN, which also conesponds to amino acids 1 - 101 of HSMRACP5_PEA_1_P14, and a second amino acid sequence being at least 70%, optbnally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EGETQLMNCGAT conesponding to amino acids 102 - 113 of HSMRACP5_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EGETQLMNCGAT in HSMRACP5_PEA_1_P14. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIR SVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PWSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDF GRKWDPYKQGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMEDWK GDKVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIHNGMF FSTYDRDNDGW conesponding to amino acids 1 - 415 of FIBB_HUMAN, which also conesponds to amino acids 1 - 415 of HUMFBRB_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
YVWHSLLLL corresponding to amino acids 416 - 424 of HUMFBRB_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMFBRB_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YVWHSLLLL in HUMFBRB_PEA_1_P4.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MKJ MVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENWNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PWSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDF GRKWDPYKQGFGNVATNTDGKNYCGLPG conesponding to amino acids 1 - 320 of FIBB_HUMAN, which also conesponds to amino acids 1 - 320 of HUMFBRB_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NEQACKIKSFYLKWDFF conesponding to amino acids 321 - 337 of HUMFBRB_PEA_1_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to prefened embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMFBRB_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NEQACKIKSFYLKWDFF in HUMFBRB_PEA_1_P9. According to prefened embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P11, comprising a first amino acid sequence being at least 90 % homologous to
MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP
SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENWNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PWSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGG corresponding to amino acids 1 - 278 of FIBB_HUMAN, which also corresponds to amino acids 1 - 278 of HUMFBRB_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KLSTWDLLICNYLDTVKCQETRPGWAHTCNSSTLGGQSGLIA corresponding to amino acids 279 - 322 of HUMFBRB_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMFBRB_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KLSTWDLLICNYLDTVKCQETRPGWAHTCNSSTLGGQSGLIA in HUMFBRB_PEA_1_P11. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P13, comprising a first amino acid sequence being at least 90 % homologous to
MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PWSGK corresponding to amino acids 1 - 239 of FIBB_HUMAN, which also corresponds to amino acids 1 - 239 of HUMFBRB_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GN corresponding to amino acids 240 - 241 of HUMFBRB_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P17, comprising a first amino acid sequence being at least 90 % homologous to
MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PWSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENG corresponding to amino acids 1 - 277 of FIBB_HUMAN, which also corresponds to amino acids 1 - 277 of HUMFBRB_PEA_1_P17, and a second amino acid sequence being at least 90 % homologous to GEYWLGNDKISQLTRMGPTELLIEMEDWKGDKVKAHYGGFTVQNEANKYQISVNKYR GTAGNALMDGASQLMGENRTMTIHNGMFFSTYDRDNDGWLTSDPRKQCSKEDGGGW WYNRCHAANPNGRYYWGGQYTWDMAKHGTDDGWWMNWKGSWYSMRKMSMKI RPFFPQQ corresponding to amino acids 320 - 491 of FIBB_HUMAN, which also corresponds to amino acids 278 - 449 of HUMFBRB_PEA_1_P17, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMFBRB_PEA_1_P17, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise GG, having a structure as follows: a sequence starting from any of amino acid numbers 277-x to 277; and ending at any of amino acid numbers 278+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P26, comprising a first amino acid sequence being at least 90 % homologous to
MKRMVSWSFHKIKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKRE corresponding to amino acids 1 - 54 of FIBB_HUMAN, which also corresponds to amino acids 1 - 54 of HUMFBRB_PEA_1_P26, and a second amino acid sequence being at least 90 % homologous to
EALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVN EYSSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSC NIPWSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSV DFGRKWDPYKQGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMED WKGDKVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIHN GMFFSTYDRDNDGWLTSDPRKQCSKEDGGGWWYNRCHAANPNGRYYWGGQYTWD MAKHGTDDGWWMNWKGSWYSMRKMSMKIRPFFPQQ corresponding to amino acids 114 - 491 of FIBB_HUMAN, which also corresponds to amino acids 55 - 432 of HUMFBRB_PEA_1_P26, wherein said first and second amino acid sequences are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMFBRB_PEA_1_P26, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EE, having a structure as follows: a sequence starting from any of amino acid numbers 54-x to 54; and ending at any of amino acid numbers 55+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPGCA_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to
MKWMVWLVCLQLLEAAWKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYRF GDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSS corresponding to amino acids 1 - 95 of Q8IUM8, which also corresponds to amino acids 1 - 95 of HUMPGCA_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NLWVPSVYCQSQACTSHSRFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQV PNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVY LSNQQGSSGGAWFGGVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGC QAIVDTGTSLLTVPQQYMSALLQATGAQEDEYGQFLVNCNSIQNLPSLTFIINGVEFPLP
PSSYILSNNGYCTVGVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA corresponding to amino acids 96 - 388 of HUMPGCA_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HUMPGCA_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
NLWVPSVYCQSQACTSHSRFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQV PNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVY LSNQQGSSGGAWFGGVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGC QAIVDTGTSLLTVPQQYMSALLQATGAQEDEYGQFLVNCNSIQNLPSLTFIINGVEFPLP PSSYILSNNGYCTVGVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA in HUMPGCA_PEA_1_P12. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPGCA_PEA_1_P14, comprising a first amino acid sequence being at least 90 % homologous to
MKWMVWLVCLQLLEAAWKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYRF GDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHSRF NPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFVYAQ FDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVYLS corresponding to amino acids 1 - 215 of PEPC_HUMAN, which also corresponds to amino acids 1 - 215 of HUMPGCA_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence K corresponding to amino acids 216 - 216 of HUMPGCA_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HUMPGCA_PEA_1_P15, comprising a first amino acid sequence being at least 90 % homologous to MKWMWVLVCLQLLEAAWKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYRF GDLSVTYEPMAYMD corresponding to amino acids 1 - 70 of PEPC_HUMAN, which also
corresponds to amino acids 1 - 70 of HUMPGCA_PEA_1_P15, and a second amino acid sequence being at least 90 % homologous to
VQSIQVPNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTS PVFSVYLSNQQGSSGGAWFGGVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGW CSEGCQAIVDTGTSLLTVPQQYMSALLQATGAQEDEYGQFLVNCNSIQNLPSLTFIINGV EFPLPPSSYILSNNGYCTVGVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFA TAA corresponding to amino acids 150 - 388 of PEPC_HUMAN, which also corresponds to amino acids 71 - 309 of HUMPGCA_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of HUMPGCA_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise DV, having a structure as follows: a sequence starting from any of amino acid numbers 70-x to 70; and ending at any of amino acid numbers 71+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P3, comprising a first amino acid sequence being at least 90 % homologous to
MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQLLYLPFRNCPRFQELESETLKSEEFQK RLHPYKDFIATLGKLSGLHGQDLFGIWSKVYDPLYCESVHNFTLPSWATEDTMTKLREL SELSLLSLYGIHKQKEKSRLQGGVLVNEILNHMKRATQIPSYKKLIMYSA corresponding to amino acids 1 - 288 of PPAP_HUMAN, which also corresponds to amino acids 1 - 288 of HSPROSAP_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SLWAYGKFN corresponding to amino acids 289 - 297 of HSPROSAP_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SLWAYGKFN in HSPROSAP_PEA_1_P3. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P9, comprising a first amino acid sequence being at least 90 % homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQLLYLPFRNCPRFQELESETLKSEEFQK RLHPYKDFIATLGKLSGLHGQDLFGIWSKVYDPLYCESVHNFTLPSWATEDTMTKLREL SELSLLSLYGIHKQKEKSRLQGGVLVNEILNHMKRATQIPSYKKLIMYSAHDTTVSGLQ MALDVYNGLLPPYASCHLTELYFEKGEYFVEMYYRNETQHEPYPLMLPGCSPSCPLER FAELVGPVIPQDWSTECMTTNSHQG corresponding to amino acids 1 - 380 of
PPAP_HUMAN, which also corresponds to amino acids 1 - 380 of HSPROSAP_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PAETAHSARRNHDIALPCGRSTCLENTVLYYHYG corresponding to amino acids 381 - 414 of HSPROSAP_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PAETAHSARRNHDIALPCGRSTCLENTVLYYHYG in HSPROSAP_PEA_1_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P11, comprising a first amino acid sequence being at least 90 % homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT
NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQLLYLPFRNCPRFQELESETLKSEEFQK RLHPYKDFIATLGKLSGLHGQDLFGIWSKVYDPLYCE corresponding to amino acids 1 - 216 of PPAP_HUMAN, which also corresponds to amino acids 1 - 216 of HSPROSAP_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKEKKITG corresponding to amino acids 217 - 224 of HSPROSAP_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKEKKITG in HSPROSAP_PEA_1_P11. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to
MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQLLYLPFRNCPRFQELESETLKSEEFQK RLHPYK corresponding to amino acids 1 - 185 of PPAP_HUMAN, which also corresponds to amino acids 1 - 185 of HSPROSAP_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKS corresponding to amino acids 186 - 188 of HSPROSAP_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P13, comprising a first amino acid sequence being at least 90 % homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQ corresponding to amino acids 1 - 152 of
PPAP_HUMAN, which also corresponds to amino acids 1 - 152 of HSPROSAP_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSILGKPGDFRWT corresponding to amino acids 153 - 165 of HSPROSAP_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSILGKPGDFRWT in HSPROSAP_PEA_1_P13. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P14, comprising a first amino acid sequence being at least 90 % homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQ corresponding to amino acids 1 - 152 of PPAP_HUMAN, which also corresponds to amino acids 1 - 152 of HSPROSAP_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
SHHRHDHRISLWLKLSLTAGPRLLPSDLWGRLLSSLSCQYP corresponding to amino acids 153 - 193 of HSPROSAP_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHHRHDHRISLWLKLSLTAGPRLLPSDLWGRLLSSLSCQYP in HSPROSAP_PEA_1_P14. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P23, comprising a first amino
acid sequence being at least 90 % homologous to
MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLV corresponding to amino acids 1 - 41 of PPAP_HUMAN, which also corresponds to amino acids 1 - 41 of HSPROSAP_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SRLFSLLFP corresponding to amino acids 42 - 50 of HSPROSAP_PEA_1_P23, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SRLFSLLFP in HSPROSAP_PEA_1_P23. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSU13680_PEA_1_P18, comprising a first amino acid sequence being at least 90 % homologous to
MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKDLADELALVDVALDKLK GEMMDLQHGSLFFSTSKITSGKDYSVSANSRIVIVTAGARQQEGETRLALVQRNVAIMK SIIPAIVHYSPDCKILWSNPVDILTYIVWKISGLPVTRVIGSGCNLDSARFRYLIGEKLGV HPTSCHGWIIGEHGDSS corresponding to amino acids 1 - 197 of LDHC_HUMAN_V1, which also corresponds to amino acids 1 - 197 of HSU13680_PEA_1_P18, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GIIWNKRRTLSQYPLCLGAEWCLRCCEN corresponding to amino acids 198 - 225 of HSU13680_PEA_1_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSU13680_PEA_1_P18, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GIIWNKRRTLSQYPLCLGAEWCLRCCEN in HSU13680_PEA_1_P18.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSU13680_PEA_1_P19, comprising a first amino acid sequence being at least 90 % homologous to
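As a rough illustration of the tiered homology thresholds recited above (70%, 80%, 85%, 90% and 95%), the following Python sketch scores a candidate peptide against the tail of HSU13680_PEA_1_P18. It assumes, purely for illustration, that homology can be approximated by ungapped percent identity between equal-length sequences; this section does not specify the alignment method actually used, and the function names and the candidate peptide are hypothetical.

```python
# Illustrative sketch only: ungapped percent identity is an assumed stand-in
# for the homology measure, which this section does not define.
THRESHOLDS = (70, 80, 85, 90, 95)  # tiers recited in the text

# Tail of HSU13680_PEA_1_P18 (amino acids 198 - 225), as given in the text.
P18_TAIL = "GIIWNKRRTLSQYPLCLGAEWCLRCCEN"


def percent_identity(a: str, b: str) -> float:
    """Return the percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100 * matches / len(a)


def highest_tier_met(identity: float):
    """Return the highest recited threshold met by `identity`, or None if below 70%."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    # Hypothetical candidate: the tail with its first and last residues substituted.
    candidate = "A" + P18_TAIL[1:-1] + "A"
    identity = percent_identity(P18_TAIL, candidate)
    print(round(identity, 1), highest_tier_met(identity))
```

A gapped alignment tool would generally be needed for sequences of unequal length; the position-by-position comparison here is only meant to make the threshold tiers concrete.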
MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKDLADELALVDVALDKLK GEMMDLQHGSLFFSTSKITSGKDYSVSANSRIVIVTAGARQQEGETRLALVQRNVAIMK SIIPAIVHYSPDCKILVVSNPVDILTYIVWKISGLPVTRVIGSGCNLDSARFRYLIGEKLGV
HPTSCHGWIIGEHGDSSVP corresponding to amino acids 1 - 199 of LDHC_HUMAN_V1, which also corresponds to amino acids 1 - 199 of HSU13680_PEA_1_P19, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKLSS corresponding to amino acids 200 - 204 of HSU13680_PEA_1_P19, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSU13680_PEA_1_P19, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKLSS in HSU13680_PEA_1_P19. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSU13680_PEA_1_P15, comprising a first amino acid sequence being at least 90 % homologous to
MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLK corresponding to amino acids 1 - 42 of LDHC_HUMAN_V2, which also corresponds to amino acids 1 - 42 of HSU13680_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least
95% homologous to a polypeptide having the sequence NFCIF corresponding to amino acids 43 - 47 of HSU13680_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSU13680_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably
at least about 90% and most preferably at least about 95% homologous to the sequence NFCIF in HSU13680_PEA_1_P15. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P17, comprising a first amino acid sequence being at least 90 % homologous to
MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF corresponding to amino acids 1 - 57 of SOMA_HUMAN, which also corresponds to amino acids 1 - 57 of HSGROW1_PEA_1_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
VSSWGMGAHQGWQEGVTFPRWEIRGGD corresponding to amino acids 58 - 84 of HSGROW1_PEA_1_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSGROW1_PEA_1_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSSWGMGAHQGWQEGVTFPRWEIRGGD in HSGROW1_PEA_1_PEA_1_P17. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P18, comprising a first amino acid sequence being at least 90 % homologous to
MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEFEE AYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQ corresponding to amino acids 1 - 95 of SOMA_HUMAN, which also corresponds to amino acids 1 - 95 of HSGROW1_PEA_1_PEA_1_P18, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence T corresponding to amino acids 96 - 96 of HSGROW1_PEA_1_PEA_1_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P9, comprising a first
amino acid sequence being at least 90 % homologous to
MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEFEE AYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQKSNLELLRISLLLIQSWLEPVQFLRS VFANSLVYGASDSNVYDLLKDLEEGIQTLMG corresponding to amino acids 1 - 152 of SOMA_HUMAN, which also corresponds to amino acids 1 - 152 of
HSGROW1_PEA_1_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRVAPGVPNPGAPLTLRAVLEKHCCPLFSSQALTQENSPYSSFPLVNPPGLSLHPEGEGG K corresponding to amino acids 153 - 213 of HSGROW1_PEA_1_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated polypeptide encoding for a tail of HSGROW1_PEA_1_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
VRVAPGVPNPGAPLTLRAVLEKHCCPLFSSQALTQENSPYSSFPLVNPPGLSLHPEGEGG K in HSGROW1_PEA_1_PEA_1_P9. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P10, comprising a first amino acid sequence being at least 90 % homologous to
MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF corresponding to amino acids 1 - 57 of SOMA_HUMAN, which also corresponds to amino acids 1 - 57 of HSGROW1_PEA_1_PEA_1_P10, and a second amino acid sequence being at least 90 % homologous to
LVYGASDSNVYDLLKDLEEGIQTLMGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLK NYGLLYCFRKDMDKVETFLRIVQCRSVEGSCGF corresponding to amino acids 127 - 217 of SOMA_HUMAN, which also corresponds to amino acids 58 - 148 of HSGROW1_PEA_1_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of
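The coordinate correspondence recited above (amino acids 127 - 217 of SOMA_HUMAN becoming amino acids 58 - 148 of the variant) follows from placing the retained segments end to end. The Python sketch below reproduces that arithmetic for any ordered list of retained segments; the function name and data layout are hypothetical and serve only to illustrate the numbering.

```python
# Illustrative sketch only; function and variable names are hypothetical.
def map_segments(known_segments):
    """Map contiguous segments of a known protein onto variant coordinates.

    known_segments: ordered list of (start, end) pairs, 1-based and inclusive,
    giving the stretches of the known protein retained in the variant.
    Returns a list of ((known_start, known_end), (variant_start, variant_end)).
    """
    mapping = []
    next_variant_pos = 1
    for known_start, known_end in known_segments:
        if known_end < known_start:
            raise ValueError("segment end precedes segment start")
        length = known_end - known_start + 1
        variant_range = (next_variant_pos, next_variant_pos + length - 1)
        mapping.append(((known_start, known_end), variant_range))
        next_variant_pos += length
    return mapping


if __name__ == "__main__":
    # Segments of SOMA_HUMAN recited for HSGROW1_PEA_1_PEA_1_P10.
    for known, variant in map_segments([(1, 57), (127, 217)]):
        print(known, "->", variant)
```

The same calculation applies to the other skipping variants in this section, for example segments 1 - 277 and 320 - 491 of FIBB_HUMAN for HUMFBRB_PEA_1_P17.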
HSGROW1_PEA_1_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FL, having a structure as follows: a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 58+ ((n-2) - x), in which x varies from 0 to n-2. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P15, comprising a first amino acid sequence being at least 90 % homologous to
MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF corresponding to amino acids 1 - 57 of SOMA_HUMAN, which also corresponds to amino acids 1 - 57 of HSGROW1_PEA_1_PEA_1_P15, and a second amino acid sequence being at least 90 % homologous to
RLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVETFLRIVQCRS VEGSCGF corresponding to amino acids 153 - 217 of SOMA_HUMAN, which also corresponds to amino acids 58 - 122 of HSGROW1_PEA_1_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. According to preferred embodiments of the present invention, there is provided an isolated chimeric polypeptide encoding for an edge portion of
HSGROW1_PEA_1_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FR, having a structure as follows: a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 58+ ((n-2) - x), in which x varies from 0 to n-2.
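The edge portion definition above describes every window of length n that spans the junction between amino acids 57 and 58: for each x from 0 to n-2, the window starts at 57-x and ends at 58+((n-2)-x), so its length is always n. The Python sketch below enumerates these windows for HSGROW1_PEA_1_PEA_1_P15, using the two segments as given in the text; the function name is hypothetical, and the choice of n = 10 is just the smallest length recited.

```python
# Illustrative sketch only; names are hypothetical. Segment sequences are
# copied from the text (with line-wrap spaces removed).
P15_FIRST = "MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF"  # aa 1 - 57
P15_SECOND = ("RLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVETFLRIVQCRS"
              "VEGSCGF")  # aa 58 - 122
P15 = P15_FIRST + P15_SECOND


def edge_windows(seq, junction, n):
    """Yield (start, end, peptide) for each length-n window spanning the junction.

    Positions are 1-based and inclusive. `junction` is the last residue of the
    first segment, so every window contains residues `junction` and `junction + 1`.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = junction + 1 + (n - 2) - x
        if start < 1 or end > len(seq):
            continue  # window would run off the end of the polypeptide
        yield start, end, seq[start - 1:end]


if __name__ == "__main__":
    for start, end, peptide in edge_windows(P15, junction=len(P15_FIRST), n=10):
        print(start, end, peptide)
```

Each peptide produced this way contains the junction dipeptide (FR for this variant), which is the property the edge portion definition relies on.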
According to preferred embodiments of the present invention, there is provided an antibody capable of specifically binding to an epitope of an amino acid sequence as described herein. Optionally the amino acid sequence corresponds to a bridge, edge portion, tail, head or insertion as in any of the previous claims. Optionally the antibody is capable of differentiating between a splice variant having said epitope and a corresponding known protein. According to preferred embodiments of the present invention, there is provided a kit for detecting a Marker-detectable disease, comprising a kit detecting specific expression of a splice variant as described herein. Optionally the kit comprises a NAT-based technology. Optionally the kit further comprises at least one primer pair capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally the kit further comprises at least one oligonucleotide capable of selectively hybridizing to a nucleic acid sequence as described herein. Optionally the kit comprises an antibody as described herein. Optionally the kit further comprises at least one reagent for performing an ELISA or a Western blot. According to preferred embodiments of the present invention, there is provided a method for detecting a Marker-detectable disease, comprising detecting specific expression of a splice variant according to any of the above claims. Optionally detecting specific expression is performed with a NAT-based technology. Optionally detecting specific expression is performed with an immunoassay. Optionally the immunoassay comprises an antibody as described herein. According to preferred embodiments of the present invention, there is provided a biomarker capable of detecting Marker-detectable disease, comprising any nucleic acid sequence described herein or a fragment thereof, or any amino acid sequence described herein or a fragment thereof.
According to preferred embodiments of the present invention, there is provided a method for screening for variant-detectable disease, comprising detecting cells affected by a
Marker-detectable disease with a biomarker or an antibody or a method or assay according to any of the above claims. According to preferred embodiments of the present invention, there is provided a method for diagnosing a marker-detectable disease, comprising detecting cells affected by Marker-detectable disease with a biomarker or an antibody or a method or assay according to any of the above claims. According to preferred embodiments of the present invention, there is provided a method for monitoring disease progression and/or treatment efficacy and/or relapse of Marker-detectable disease, comprising detecting cells affected by Marker-detectable disease with a biomarker or an antibody or a method or assay according to any of the above claims. According to preferred embodiments of the present invention, there is provided a method of selecting a therapy for a marker-detectable disease, comprising detecting cells affected by a marker-detectable disease with a biomarker or an antibody or a method or assay according to any of the above claims and selecting a therapy according to said detection. With regard to markers suitable for detecting cardiac disease (including but not limited to HSCREACT), according to preferred embodiments of the present invention, cardiac disease and/or pathology and/or condition and/or disorder may comprise one or more of myocardial infarct, acute coronary syndrome, angina pectoris (stable and unstable), cardiomyopathy, myocarditis, congestive heart failure or any type of heart failure, the detection of reinfarction, the detection of success of thrombolytic therapy after myocardial infarct, myocardial infarct after surgery, assessing the size of infarct in myocardial infarct, the differential diagnosis of heart related conditions from lung related conditions (such as pulmonary embolism), the differential diagnosis of dyspnea, and cardiac valve related conditions.
For these embodiments, there are provided novel markers for cardiac disease that are both sensitive and accurate. Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease. These markers are specifically released to the bloodstream under conditions of cardiac disease and/or cardiac pathology, including but not limited to cardiac damage, and/or are otherwise expressed at a much higher level and/or specifically expressed in heart. The method
of the present invention identifies clusters (genes) which are characterized in that the transcripts are differentially expressed in heart muscle tissue compared with other normal tissues, preferably in comparison to skeletal muscle tissue. In acute conditions under which heart muscle tissue experiences hypoxia (with or without necrosis), intracellular proteins that are not normally secreted can leak through the cell membrane to the extracellular space. Therefore, proteins that are differentially expressed in heart muscle tissue, as determined through analysis of EST expression, are potential acute heart damage markers. Leakage of intracellular content can also occur in chronic damage to the heart muscle; therefore, proteins selected according to this method are also potential markers for chronic heart conditions. When a protein that is differentially expressed in heart muscle is secreted, it is even more likely to be useful as a chronic heart damage marker, since secretion implies that the protein has a physiological role exterior to the cell, and therefore may be used by the heart muscle to respond to the chronic damage. This rationale is empirically supported by the non-limiting examples of the proteins BNP (brain natriuretic peptide) and ANF (atrial natriuretic factor), which are differentially expressed heart muscle proteins that are secreted and which were shown to be markers for congestive heart failure. In addition, BNP and ANF are not only differentially expressed in heart tissue; they are also dramatically overexpressed (hundreds of times greater expression) when heart failure occurs. Other heart-specific secreted proteins might present similar overexpression in chronic damage. Optionally and preferably, the markers described herein are overexpressed in heart as opposed to muscle, as described in greater detail below.
The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of cardiac disease and/or cardiac pathology, including but not limited to cardiac damage. The present invention therefore also relates to diagnostic assays for cardiac disease and/or cardiac pathology, including but not limited to cardiac damage, and methods of use of such markers for detection of cardiac disease and/or cardiac pathology, including but not limited to cardiac damage (alone or in combination), optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample.
According to preferred embodiments of the present invention, preferably any of the above nucleic acid and/or amino acid sequences further comprises any sequence having at least about 70%, preferably at least about 80%, more preferably at least about 90%, most preferably at least about 95% homology thereto.
Unless otherwise noted, all experimental data relates to variants of the present invention, named according to the segment being tested (as expression was tested through RT-PCR as described). All nucleic acid sequences and/or amino acid sequences shown herein as embodiments of the present invention relate to their isolated form, as isolated polynucleotides (including for all transcripts), oligonucleotides (including for all segments, amplicons and primers), peptides (including for all tails, bridges, insertions or heads, optionally including other antibody epitopes as described herein) and/or polypeptides (including for all proteins). It should be noted that oligonucleotide and polynucleotide, or peptide and polypeptide, may optionally be used interchangeably.
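The tiered homology thresholds above (about 70%, 80%, 90% and 95%) can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned to equal length with `-` as the gap character, and it uses simple percent identity as a stand-in for homology; in practice an alignment tool would produce the alignment, and the function names and tier labels here are hypothetical.

```python
# Minimal sketch: percent identity over a pre-aligned pair of sequences,
# mapped onto the 70/80/90/95% tiers named in the text.
# Assumption: identity over alignment columns is used as a proxy for "homology".

TIERS = [
    (95.0, "most preferred (>= 95%)"),
    (90.0, "more preferred (>= 90%)"),
    (80.0, "preferred (>= 80%)"),
    (70.0, "included (>= 70%)"),
]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity over the columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A gap opposite a residue counts as a mismatch.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def homology_tier(pct: float):
    """Return the highest tier label met by pct, or None if below 70%."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    return None


if __name__ == "__main__":
    pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pct, homology_tier(pct))
```

The word "about" in the text is treated here as a hard cutoff, which is a simplification made only for the sketch.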
Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The
Harper Collins Dictionary of Biology (1991). All of these are hereby incorporated by reference as if fully set forth herein. As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
BRIEF DESCRIPTION OF DRAWINGS
Figure 1 is a schematic summary of the cancer biomarkers selection engine and the wet validation stages.
Figure 2 is a schematic illustration depicting grouping of transcripts of a given cluster based on presence or absence of unique sequence regions.
Figure 3 is a schematic presentation of the oligonucleotide based microarray fabrication.
Figure 4 is a schematic summary of the oligonucleotide based microarray experimental flow.
Figure 5 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HSPROSAP, demonstrating overexpression in a mixture of malignant tumors from different tissues.
Figure 6 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster DI 1581, demonstrating overexpression in epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Figure 7 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster T87096, demonstrating overexpression in kidney malignant tumors and pancreas carcinoma.
Figure 8 is a histogram showing selective expression of cluster S42303 in heart tissue, calculated based on the number of heart-specific clones in libraries/sequences.
Figures 9-10 are histograms demonstrating the expression of cluster S42303, as measured by the actual expression of oligonucleotides in various tissues, including heart.
Figure 11 is a histogram showing cancer and cell-line vs. normal tissue expression for Cluster HSMUCIA, demonstrating overexpression in a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Figure 12 is a schematic summary of quantitative real-time PCR analysis.
Figure 13 is a histogram showing down regulation of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2, in cancerous ovary samples relative to the normal samples.
Figure 14 is a histogram showing down regulation of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2, in cancerous colon samples relative to the normal samples.
Figure 15 is a histogram showing overexpression of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2, in cancerous lung samples relative to the normal samples.
Figure 16 is a histogram showing down regulation of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2, in cancerous lung samples relative to the normal samples.
Figure 17 is a histogram showing overexpression of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2, in cancerous prostate samples relative to the normal samples.
Figure 18 is a histogram showing down regulation of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2, in cancerous breast samples relative to the normal samples.
Figure 19 is a histogram demonstrating the expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2, in different normal tissues.
Figure 20 is a histogram showing down regulation of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2, in cancerous colon samples relative to the normal samples.
Figure 21 is a histogram showing down regulation of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2, in cancerous breast samples relative to the normal samples.
Figure 22 is a histogram showing down regulation of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2, in cancerous lung samples relative to the normal samples.
Figure 23 is a histogram showing down regulation of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2, in cancerous ovary samples relative to the normal samples.
Figure 24 is a histogram demonstrating the expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2, in different normal tissues.
Figure 25 is a histogram showing selective expression of cluster HUMCKMA in heart tissue, calculated based on the number of heart-specific clones in libraries/sequences.
Figure 26 is a histogram demonstrating the expression of cluster HUMCKMA, as measured by the actual expression of oligonucleotides in various tissues, including heart.
Figure 27 is a schematic presentation of the wild type and new variants of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) HSAPHOL mRNA and protein structures.
Orange boxes indicate the regions representing exons. Arrows represent the introns. Yellow boxes indicate the amino acid coding regions. Green boxes represent the unique amino acids encoded by the new variants; the number of the unique amino acids in each variant is indicated within each box. The known mRNA and protein is indicated by "WT". The new variants are marked as T10, T4, T6, T5, and T8, respectively. The location of the GPI-anchor and the location of the CGEN-oligo are indicated.
Figures 28-29 are histograms showing on two different scales the expression of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) HSAPHOL transcripts, which are detectable by amplicon as depicted in sequence name HSAPHOL junc2-13, in different normal tissues.
Figure 30 is a histogram showing the expression of the Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) HSAPHOL transcripts, which are detectable by amplicon as depicted in sequence name HSAPHOL seg26F2R2, in different normal tissues.
Figure 31 is a histogram showing the expression of the Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) HSAPHOL transcripts, which are detectable by amplicon as depicted in sequence name HSAPHOL seg38, in different normal tissues.
Figures 32-33 are histograms showing on two different scales the expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts, which are detectable by amplicon as depicted in sequence name HSCREACT junc11-53F2R2, in different normal tissues.
Figures 34-35 are histograms showing on two different scales the expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts, which are detectable by amplicon as depicted in sequence name HSCREACT junc12-30F2R2, in different normal tissues.
Figures 36-37 are histograms showing on two different scales the expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts, which are detectable by amplicon as depicted in sequence name HSCREACT junc12-53F2R2, in different normal tissues.
Figure 38 is a histogram showing the expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts, which are detectable by amplicon as depicted in sequence name HSCREACT junc24-47F2R2, in different normal tissues.
Figure 39 is a histogram showing the expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts, which are detectable by amplicon as depicted in sequence name HSCREACT seg8-11, in different normal tissues.
Figure 40 is a histogram showing the expression of the Mast/stem cell growth factor receptor SCFR (HSKITCR) transcripts, which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2, in cancerous prostate samples relative to the normal samples.
DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention provides variants, which may optionally be used as diagnostic markers. Preferably these variants are useful as diagnostic markers for marker-detectable (also referred to herein as "variant-detectable") diseases as described herein. Differential variant markers are collectively described as "variant disease markers".
The markers of the present invention, alone or in combination, can be used for prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of a marker-detectable disease. For example, optionally and preferably, these markers may be used for staging the disease in a patient (for example if the disease features cancer) and/or monitoring the progression of the disease. Furthermore, the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than the originating tissue, again in the example of cancer. Also, one or more of the markers may optionally be used in combination with one or more other disease markers (other than those described herein).
Biomolecular sequences (amino acid and/or nucleic acid sequences) uncovered using the methodology of the present invention and described herein can be efficiently utilized as tissue or pathological markers and/or as drugs or drug targets for treating or preventing a disease. These markers are specifically released to the bloodstream under conditions of a particular disease, and/or are otherwise expressed at a much higher level and/or specifically expressed in tissue or cells afflicted with or demonstrating the disease. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of a particular disease and/or a condition that is indicative of a higher risk for a particular disease.
The present invention therefore also relates to diagnostic assays for marker-detectable disease and/or an indicative condition, and methods of use of such markers for detection of marker-detectable disease and/or an indicative condition, optionally and preferably in a sample taken from a subject (patient), which is more preferably some type of blood sample.
In another embodiment, the present invention relates to bridges, tails, heads and/or insertions, and/or analogs, homologs and derivatives of such peptides. Such bridges, tails, heads and/or insertions are described in greater detail below with regard to the Examples.
As used herein a "tail" refers to a peptide sequence at the end of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a tail may optionally be considered as a chimera, in that at least a first portion of the splice variant is typically highly homologous (often 100% identical) to a portion of the corresponding known protein, while at least a second portion of the variant comprises the tail.
As used herein a "head" refers to a peptide sequence at the beginning of an amino acid sequence that is unique to a splice variant according to the present invention. Therefore, a splice variant having such a head may optionally be considered as a chimera, in that at least a first portion of the splice variant comprises the head, while at least a second portion is typically highly homologous (often 100% identical) to a portion of the corresponding known protein.
As used herein "an edge portion" refers to a connection between two portions of a splice variant according to the present invention that were not joined in the wild type or known protein. An edge may optionally arise due to a join between the above "known protein" portion of a variant and the tail, for example, and/or may occur if an internal portion of the wild type sequence is no longer present, such that two portions of the sequence are now contiguous in the splice variant that were not contiguous in the known protein.
A "bridge" may optionally be an edge portion as described above, but may also include a join between a head and a "known protein" portion of a variant, or a join between a tail and a "known protein" portion of a variant, or a join between an insertion and a "known protein" portion of a variant. Optionally and preferably, a bridge between a tail or a head or a unique insertion, and a "known protein" portion of a variant, comprises at least about 10 amino acids, more preferably at least about 20 amino acids, most preferably at least about 30 amino acids, and even more preferably at least about 40 amino acids, in which at least one amino acid is from the tail/head/insertion and at least one amino acid is from the "known protein" portion of a variant. Also optionally, the bridge may comprise any number of amino acids from about 10 to about 40 amino acids (for example, 10, 11, 12, 13...37, 38, 39, 40 amino acids in length, or any number in between).
It should be noted that a bridge cannot be extended beyond the length of the sequence in either direction, and it should be assumed that every bridge description is to be read in such manner that the bridge length does not extend beyond the sequence itself.
Furthermore, bridges are described with regard to a sliding window in certain contexts below. For example, certain descriptions of the bridges feature the following format: a bridge between two edges (in which a portion of the known protein is not present in the variant) may optionally be described as follows: a bridge portion of CONTIG-NAME_P1 (representing the name of the protein), comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise XX (2 amino acids in the center of the bridge, one from each end of the edge), having a structure as follows (numbering according to the sequence of CONTIG-NAME_P1): a sequence starting from any of amino acid numbers 49-x to 49 (for example); and ending at any of amino acid numbers 50 + ((n-2) - x) (for example), in which x varies from 0 to n-2. In this example, it should also be read as including bridges in which n is any number of amino acids between 10-50 amino acids in length. Furthermore, the bridge polypeptide cannot extend beyond the sequence, so it should be read such that 49-x (for example) is not less than 1, nor 50 + ((n-2) - x) (for example) greater than the total sequence length.
In another embodiment, this invention provides antibodies specifically recognizing the splice variants and polypeptide fragments thereof of this invention. Preferably such antibodies differentially recognize splice variants of the present invention but do not recognize a corresponding known protein (such known proteins are discussed with regard to their splice variants in the Examples below).
In another embodiment, this invention provides an isolated nucleic acid molecule encoding for a splice variant according to the present invention, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto.
In another embodiment, this invention provides an isolated nucleic acid molecule, having a nucleotide sequence as set forth in any one of the sequences listed herein, or a sequence complementary thereto.
In another embodiment, this invention provides an oligonucleotide of at least about 12 nucleotides, specifically hybridizable with the nucleic acid molecules of this invention.
In another embodiment, this invention provides vectors, cells, liposomes and compositions comprising the isolated nucleic acids of this invention.
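The sliding-window reading of a bridge given above (a window of length n that always contains the two junction residues, for example positions 49 and 50, with x running from 0 to n-2 and the window never leaving the sequence) can be sketched as follows. This is only an illustration under stated assumptions: positions are 1-based as in the text, and the function names and the example sequence are hypothetical.

```python
# Minimal sketch of the sliding-window bridge enumeration.
# Assumptions: 1-based inclusive positions; "left" and "right" are the two
# adjacent junction residues (e.g. 49 and 50); names are illustrative only.

def bridge_windows(left: int, right: int, n: int, seq_len: int):
    """Return (start, end) pairs of every length-n window spanning the junction."""
    if right != left + 1:
        raise ValueError("junction residues must be adjacent")
    if n < 2:
        raise ValueError("a bridge needs at least the two junction residues")
    windows = []
    for x in range(0, n - 1):          # x varies from 0 to n-2
        start = left - x               # "49 - x"
        end = right + (n - 2) - x      # "50 + ((n-2) - x)"
        # The bridge may not extend beyond the sequence in either direction.
        if start >= 1 and end <= seq_len:
            windows.append((start, end))
    return windows


def bridge_peptides(sequence: str, left: int, n: int):
    """Return the peptide strings for every valid window of length n."""
    return [sequence[start - 1:end]
            for start, end in bridge_windows(left, left + 1, n, len(sequence))]


if __name__ == "__main__":
    for start, end in bridge_windows(49, 50, 10, 300):
        print(start, end, end - start + 1)
```

Each window has length n by construction, since end - start + 1 = (right - left) + (n - 2) + 1 = n when the junction residues are adjacent.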
In another embodiment, this invention provides a method for detecting a splice variant according to the present invention in a biological sample, comprising: contacting a biological sample with an antibody specifically recognizing a splice variant according to the present invention under conditions whereby the antibody specifically interacts with the splice variant in the biological sample but does not recognize known corresponding proteins (wherein the known protein is discussed with regard to its splice variant(s) in the Examples below), and detecting said interaction; wherein the presence of an interaction correlates with the presence of a splice variant in the biological sample.
In another embodiment, this invention provides a method for detecting a splice variant nucleic acid sequence in a biological sample, comprising: hybridizing the isolated nucleic acid molecules or oligonucleotide fragments of at least about a minimum length to a nucleic acid material of a biological sample and detecting a hybridization complex; wherein the presence of a hybridization complex correlates with the presence of a splice variant nucleic acid sequence in the biological sample.
According to the present invention, the splice variants described herein are non-limiting examples of markers for diagnosing marker-detectable disease and/or an indicative condition. Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, determination of progression, therapy selection and treatment monitoring of marker-detectable disease and/or an indicative condition, including a transition from an indicative condition to marker-detectable disease.
According to optional but preferred embodiments of the present invention, any marker according to the present invention may optionally be used alone or in combination. Such a combination may optionally comprise a plurality of markers described herein, optionally including any subcombination of markers, and/or a combination featuring at least one other marker, for example a known marker. Furthermore, such a combination may optionally and preferably be used as described above with regard to determining a ratio between a quantitative or semi-quantitative measurement of any marker described herein to any other marker described herein, and/or any other known marker, and/or any other marker.
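As an illustration of the marker-to-marker ratio described above, the following minimal sketch computes every pairwise ratio within a panel of quantitative measurements. The marker names, the measurement values and the function name are hypothetical placeholders and do not come from the present description; pairs whose denominator is zero are skipped because the ratio is undefined for them.

```python
# Minimal sketch: pairwise ratios between marker measurements in one sample.
# Assumptions: measurements are non-negative numbers on a common scale;
# marker names and values below are placeholders for illustration only.
from itertools import permutations


def pairwise_ratios(measurements: dict) -> dict:
    """Return {(numerator, denominator): ratio} for every ordered marker pair."""
    for name, value in measurements.items():
        if value < 0:
            raise ValueError(f"negative measurement for {name}")
    ratios = {}
    for num, den in permutations(measurements, 2):
        if measurements[den] == 0:
            continue  # ratio undefined when the denominator marker is absent
        ratios[(num, den)] = measurements[num] / measurements[den]
    return ratios


if __name__ == "__main__":
    sample = {"splice_variant": 6.0, "known_protein": 2.0}  # hypothetical values
    for pair, ratio in pairwise_ratios(sample).items():
        print(pair, ratio)
```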
With regard to such a ratio between any marker described herein (or a combination thereof) and a known marker, more preferably the known marker comprises the "known protein" as described in greater detail below with regard to each cluster or gene.
According to other preferred embodiments of the present invention, a splice variant protein or a fragment thereof, or a splice variant nucleic acid sequence or a fragment thereof, may be featured as a biomarker for detecting marker-detectable disease and/or an indicative condition, such that a biomarker may optionally comprise any of the above.
According to still other preferred embodiments, the present invention optionally and preferably encompasses any amino acid sequence or fragment thereof encoded by a nucleic acid sequence corresponding to a splice variant protein as described herein. Any oligopeptide or peptide relating to such an amino acid sequence or fragment thereof may optionally also (additionally or alternatively) be used as a biomarker, including but not limited to the unique amino acid sequences of these proteins that are depicted as tails, heads, insertions, edges or bridges. The present invention also optionally encompasses antibodies capable of recognizing, and/or being elicited by, such oligopeptides or peptides.
The present invention also optionally and preferably encompasses any nucleic acid sequence or fragment thereof, or amino acid sequence or fragment thereof, corresponding to a splice variant of the present invention as described above, optionally for any application.
Non-limiting examples of methods or assays are described below. The present invention also relates to kits based upon such diagnostic methods or assays.
Nucleic acid sequences and Oligonucleotides
Various embodiments of the present invention encompass nucleic acid sequences described hereinabove; fragments thereof, sequences hybridizable therewith, sequences homologous thereto, sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or artificially induced, either randomly or in a targeted fashion.
The present invention encompasses nucleic acid sequences described herein; fragments thereof, sequences hybridizable therewith, sequences homologous thereto [e.g., at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 95% or more, say 100%, identical to the nucleic acid sequences set forth below], sequences encoding similar polypeptides with different codon usage, altered sequences characterized by mutations, such as deletion, insertion or substitution of one or more nucleotides, either naturally occurring or man-induced, either randomly or in a targeted fashion. The present invention also encompasses homologous nucleic acid sequences (i.e., which form a part of a polynucleotide sequence of the present invention) which include sequence regions unique to the polynucleotides of the present invention.
In cases where the polynucleotide sequences of the present invention encode previously unidentified polypeptides, the present invention also encompasses novel polypeptides or portions thereof, which are encoded by the isolated polynucleotide and respective nucleic acid fragments thereof described hereinabove.
The terms "nucleic acid fragment", "oligonucleotide" and "polynucleotide" are used herein interchangeably to refer to a polymer of nucleic acids. A polynucleotide sequence of the present invention refers to a single or double stranded nucleic acid sequence which is isolated and provided in the form of an RNA sequence, a complementary polynucleotide sequence (cDNA), a genomic polynucleotide sequence and/or a composite polynucleotide sequence (e.g., a combination of the above).
As used herein the phrase "complementary polynucleotide sequence" refers to a sequence which results from reverse transcription of messenger RNA using a reverse transcriptase or any other RNA dependent DNA polymerase. Such a sequence can be subsequently amplified in vivo or in vitro using a DNA dependent DNA polymerase.
As used herein the phrase "genomic polynucleotide sequence" refers to a sequence derived (isolated) from a chromosome and thus it represents a contiguous portion of a chromosome.
As used herein the phrase "composite polynucleotide sequence" refers to a sequence which is composed of genomic and cDNA sequences. A composite sequence can include some exonal sequences required to encode the polypeptide of the present invention, as well as some intronic sequences interposing therebetween. The intronic sequences can be of any source, including of other genes, and typically will include conserved splicing signal sequences. Such intronic sequences may further include cis acting expression regulatory elements.
Preferred embodiments of the present invention encompass oligonucleotide probes.
An example of an oligonucleotide probe which can be utilized by the present invention is a single stranded polynucleotide which includes a sequence complementary to the unique sequence region of any variant according to the present invention, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein).
Alternatively, an oligonucleotide probe of the present invention can be designed to hybridize with a nucleic acid sequence encompassed by any of the above nucleic acid sequences, particularly the portions specified above, including but not limited to a nucleotide sequence coding for an amino acid sequence of a bridge, tail, head and/or insertion according to the present invention, and/or the equivalent portions of any nucleotide sequence given herein (including but not limited to a nucleotide sequence of a node, segment or amplicon described herein).
Oligonucleotides designed according to the teachings of the present invention can be generated according to any oligonucleotide synthesis method known in the art such as enzymatic synthesis or solid phase synthesis. Equipment and reagents for executing solid-phase synthesis are commercially available from, for example, Applied Biosystems. Any other means for such synthesis may also be employed; the actual synthesis of the oligonucleotides is well within the capabilities of one skilled in the art and can be accomplished via established methodologies as detailed in, for example, "Molecular Cloning: A Laboratory Manual" Sambrook et al., (1989); "Current Protocols in Molecular Biology" Volumes I-III Ausubel, R. M., ed. (1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Maryland (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988) and "Oligonucleotide Synthesis" Gait, M. J., ed. (1984) utilizing solid phase chemistry, e.g. cyanoethyl phosphoramidite followed by deprotection, desalting and purification by, for example, an automated trityl-on method or HPLC.
Oligonucleotides used according to this aspect of the present invention are those having a length selected from a range of about 10 to about 200 bases, preferably about 15 to about 150 bases, more preferably about 20 to about 100 bases, most preferably about 20 to about 50 bases.
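The idea of a single stranded probe complementary to a unique sequence region, together with the length ranges given above, can be sketched as follows. This is a simplified illustration, not a probe design procedure: it assumes a DNA target written 5' to 3' with only A, C, G and T, treats the "about" length bounds as hard cutoffs, and uses hypothetical function names and a made-up example sequence.

```python
# Minimal sketch: reverse complement of a unique region and a length-range check.
# Assumptions: DNA alphabet only (A, C, G, T); "about" bounds treated as exact;
# the example sequence is invented for illustration.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Ranges from the text, checked from narrowest to widest.
LENGTH_TIERS = [
    (20, 50, "most preferred (20-50 bases)"),
    (20, 100, "more preferred (20-100 bases)"),
    (15, 150, "preferred (15-150 bases)"),
    (10, 200, "allowed (10-200 bases)"),
]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    return seq.translate(_COMPLEMENT)[::-1]


def length_tier(length: int):
    """Return the narrowest length range containing length, or None."""
    for low, high, label in LENGTH_TIERS:
        if low <= length <= high:
            return label
    return None


def probe_for_region(unique_region: str):
    """Return (probe, tier) for a probe complementary to unique_region."""
    probe = reverse_complement(unique_region)
    return probe, length_tier(len(probe))


if __name__ == "__main__":
    print(probe_for_region("ATGGCCTTAGCTAGGCTAACGTAC"))
```

A real design would additionally consider melting temperature, secondary structure and cross-hybridization, none of which is modeled here.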
Preferably, the oligonucleotide of the present invention features at least 17, at least 18, at least 19, at least 20, at least 22, at least 25, at least 30 or at least 40, bases specifically hybridizable with the biomarkers of the present invention. The oligonucleotides of the present invention may comprise heterocylic nucleosides consisting of purines and the pyrimidines bases, bonded in a 3' to 5' phosphodiester linkage. Preferably used oligonucleotides are those modified at one or more of the backbone, intemucleoside linkages or bases, as is broadly described hereinunder. Specific examples of prefened oligonucleotides useful according to this aspect of the present invention include oligonucleotides containing modified backbones or non-natural intemucleoside linkages. Oligonucleotides having modified backbones include those that retain a phosphorus atom in the backbone, as disclosed in U.S. Pat. NOs: 4,469,863; 4,476,301;
5,023,243; 5,177,196; 5,188,897; 5,264,423; 5,276,019; 5,278,302; 5,286,717; 5,321,131; 5,399,676; 5,405,939; 5,453,496; 5,455,233; 5,466,677; 5,476,925; 5,519,126; 5,536,821; 5,541,306; 5,550,111; 5,563,253; 5,571,799; 5,587,361; and 5,625,050. Preferred modified oligonucleotide backbones include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkyl phosphotriesters, methyl and other alkyl phosphonates including 3'-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3'-amino phosphoramidate and aminoalkylphosphoramidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, and boranophosphates having normal 3'-5' linkages, 2'-5' linked analogs of these, and those having inverted polarity wherein the adjacent pairs of nucleoside units are linked 3'-5' to 5'-3' or 2'-5' to 5'-2'. Various salts, mixed salts and free acid forms can also be used. Alternatively, modified oligonucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH2 component parts, as disclosed in U.S. Pat. Nos. 
5,034,506; 5,166,315; 5,185,444; 5,214,134; 5,216,141; 5,235,033; 5,264,562; 5,264,564; 5,405,938; 5,434,257; 5,466,677; 5,470,967; 5,489,677; 5,541,307; 5,561,225; 5,596,086; 5,602,240; 5,610,289; 5,602,240; 5,608,046; 5,610,289; 5,618,704; 5,623,070; 5,663,312; 5,633,360; 5,677,437; and 5,677,439. Other oligonucleotides which can be used according to the present invention are those modified in both the sugar and the internucleoside linkage, i.e., the backbone of the nucleotide units is replaced with novel groups. The base units are maintained for complementation with the appropriate polynucleotide target. An example of such an oligonucleotide mimetic is peptide nucleic acid (PNA). United States patents that teach the preparation of PNA compounds include, but are not limited to, U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, each of
which is herein incorporated by reference. Other backbone modifications which can be used in the present invention are disclosed in U.S. Pat. No. 6,303,374. Oligonucleotides of the present invention may also include base modifications or substitutions. As used herein, "unmodified" or "natural" bases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified bases include but are not limited to other synthetic and natural bases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl uracil and cytosine, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo, particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Further bases particularly useful for increasing the binding affinity of the oligomeric compounds of the invention include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2 °C and are presently preferred base substitutions, even more particularly when combined with 2'-O-methoxyethyl sugar modifications. Another modification of the oligonucleotides of the invention involves chemically linking to the oligonucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. 
Such moieties include but are not limited to lipid moieties such as a cholesterol moiety, cholic acid, a thioether, e.g., hexyl-S-tritylthiol, a thiocholesterol, an aliphatic chain, e.g., dodecandiol or undecyl residues, a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate, a polyamine or a polyethylene glycol chain, or adamantane acetic acid, a palmityl moiety, or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety, as disclosed in U.S. Pat. No. 6,303,374.
It is not necessary for all positions in a given oligonucleotide molecule to be uniformly modified, and in fact more than one of the aforementioned modifications may be incorporated in a single compound or even at a single nucleoside within an oligonucleotide. It will be appreciated that oligonucleotides of the present invention may include further modifications for more efficient use as diagnostic agents and/or to increase bioavailability and therapeutic efficacy and to reduce cytotoxicity. To enable cellular expression of the polynucleotides of the present invention, a nucleic acid construct according to the present invention may be used, which includes at least a coding region of one of the above nucleic acid sequences, and further includes at least one cis acting regulatory element. As used herein, the phrase "cis acting regulatory element" refers to a polynucleotide sequence, preferably a promoter, which binds a trans acting regulator and regulates the transcription of a coding sequence located downstream thereto. Any suitable promoter sequence can be used by the nucleic acid construct of the present invention. Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include promoters such as albumin that is liver specific, lymphoid specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275], in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740], neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477], pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916] or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). 
The nucleic acid construct of the present invention can further include an enhancer, which can be adjacent or distant to the promoter sequence and can function in up regulating the transcription therefrom. The nucleic acid construct of the present invention preferably further includes an appropriate selectable marker and/or an origin of replication. Preferably, the nucleic acid construct utilized is a shuttle vector, which can propagate both in E. coli (wherein the construct comprises an appropriate selectable marker and origin of replication) and be compatible for propagation in cells, or integration in a gene and a tissue of choice. The construct according to
the present invention can be, for example, a plasmid, a bacmid, a phagemid, a cosmid, a phage, a virus or an artificial chromosome. Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, PzeoSV2 (+/-), pDisplay, pEF/myc/cyto, pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter. Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct. In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. 
Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention. Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers.
Hybridization assays
Detection of a nucleic acid of interest in a biological sample may optionally be effected by hybridization-based assays using an oligonucleotide probe (non-limiting examples of probes according to the present invention were previously described). Traditional hybridization assays include PCR, RT-PCR, Real-time PCR, RNase protection, in-situ hybridization, primer extension, Southern blots (DNA detection), dot or slot blots (DNA, RNA), and Northern blots (RNA detection) (NAT type assays are described in greater detail below). More recently, PNAs have been described (Nielsen et al. 1999, Current Opin. Biotechnol. 10:71-75). Other detection methods include kits containing probes on a dipstick setup and the like. Hybridization based assays which allow the detection of a variant of interest (i.e., DNA or RNA) in a biological sample rely on the use of oligonucleotides which can be 10, 15, 20, or 30 to 100 nucleotides long, preferably from 10 to 50, more preferably from 40 to 50 nucleotides long. Thus, the isolated polynucleotides (oligonucleotides) of the present invention are preferably hybridizable with any of the herein described nucleic acid sequences under moderate to stringent hybridization conditions. Moderate to stringent hybridization conditions are characterized by a hybridization solution such as one containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P labeled probe, at 65 °C, with a final wash solution of 0.2 x SSC and 0.1 % SDS and final wash at 65 °C, whereas moderate hybridization is effected using a hybridization solution containing 10 % dextran sulfate, 1 M NaCl, 1 % SDS and 5 x 10^6 cpm 32P labeled probe, at 65 °C, with a final wash solution of 1 x SSC and 0.1 % SDS and final wash at 50 °C. More generally, hybridization of short nucleic acids (below 200 bp in length, e.g. 
17-40 bp in length) can be effected using the following exemplary hybridization protocols, which can be modified according to the desired stringency: (i) hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 1 - 1.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1 - 1.5 °C below the Tm; (ii) hybridization solution of 6 x SSC and 0.1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA
(pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature of 2 - 2.5 °C below the Tm, final wash solution of 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS at 1 - 1.5 °C below the Tm, final wash solution of 6 x SSC, and final wash at 22 °C; (iii) hybridization solution of 6 x SSC and 1 % SDS or 3 M TMACl, 0.01 M sodium phosphate (pH 6.8), 1 mM EDTA (pH 7.6), 0.5 % SDS, 100 μg/ml denatured salmon sperm DNA and 0.1 % nonfat dried milk, hybridization temperature. The detection of hybrid duplexes can be carried out by a number of methods. Typically, hybridization duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then detected. Such labels refer to radioactive, fluorescent, biological or enzymatic tags or labels of standard use in the art. A label can be conjugated to either the oligonucleotide probes or the nucleic acids derived from the biological sample. Probes can be labeled according to numerous well known methods. Non-limiting examples of radioactive labels include 3H, 14C, 32P, and 35S. Non-limiting examples of detectable markers include ligands, fluorophores, chemiluminescent agents, enzymes, and antibodies. Other detectable markers for use with probes, which can enable an increase in sensitivity of the method of the invention, include biotin and radio-nucleotides. It will become evident to the person of ordinary skill that the choice of a particular label dictates the manner in which it is bound to the probe. For example, oligonucleotides of the present invention can be labeled subsequent to synthesis, by incorporating biotinylated dNTPs or rNTP, or some similar means (e.g., photo-cross-linking a psoralen derivative of biotin to RNAs), followed by addition of labeled streptavidin (e.g., phycoerythrin-conjugated streptavidin) or the equivalent. 
Alternatively, when fluorescently labeled oligonucleotide probes are used, fluorescein, lissamine, phycoerythrin, rhodamine (Perkin Elmer Cetus), Cy2, Cy3, Cy3.5, Cy5, Cy5.5, Cy7, FluorX (Amersham) and others [e.g., Kricka et al. (1992), Academic Press San Diego, Calif.] can be attached to the oligonucleotides. Those skilled in the art will appreciate that wash steps may be employed to wash away excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay formats are suitable for detecting the hybrids using the labels present on the oligonucleotide primers and probes.
It will be appreciated that a variety of controls may be usefully employed to improve accuracy of hybridization assays. For instance, samples may be hybridized to an irrelevant probe and treated with RNAse A prior to hybridization, to assess false hybridization. Although the present invention is not specifically dependent on the use of a label for the detection of a particular nucleic acid sequence, such a label might be beneficial, by increasing the sensitivity of the detection. Furthermore, it enables automation. As commonly known, radioactive nucleotides can be incorporated into probes of the invention by several methods. Probes of the invention can be utilized with naturally occurring sugar-phosphate backbones as well as modified backbones including phosphorothioates, dithionates, alkyl phosphonates and α-nucleotides and the like. Probes of the invention can be constructed of either ribonucleic acid (RNA) or deoxyribonucleic acid (DNA), and preferably of DNA.
NAT Assays
Detection of a nucleic acid of interest in a biological sample may also optionally be effected by NAT-based assays, which involve nucleic acid amplification technology, such as PCR (or variations thereof such as real-time PCR). As used herein, a "primer" defines an oligonucleotide which is capable of annealing to (hybridizing with) a target sequence, thereby creating a double stranded region which can serve as an initiation point for DNA synthesis under suitable conditions. Amplification of a selected, or target, nucleic acid sequence may be carried out by a number of suitable methods. See generally Kwoh et al., 1990, Am. Biotechnol. Lab. 8:14.
Numerous amplification techniques have been described and can be readily adapted to suit
particular needs of a person of ordinary skill. Non-limiting examples of amplification techniques include polymerase chain reaction (PCR), ligase chain reaction (LCR), strand displacement amplification (SDA), transcription-based amplification, the Qβ replicase system and NASBA (Kwoh et al., 1989, Proc. Natl. Acad. Sci. USA 86, 1173-1177; Lizardi et al., 1988, BioTechnology 6:1197-1202; Malek et al., 1994, Methods Mol. Biol., 28:253-260; and Sambrook et al., 1989, supra). The terminology "amplification pair" (or "primer pair") refers herein to a pair of oligonucleotides (oligos) of the present invention, which are selected to be used together in amplifying a selected nucleic acid sequence by one of a number of types of amplification processes, preferably a polymerase chain reaction. Other types of amplification processes include ligase chain reaction, strand displacement amplification, or nucleic acid sequence-based amplification, as explained in greater detail below. As commonly known in the art, the oligos are designed to bind to a complementary sequence under selected conditions. In one particular embodiment, a nucleic acid sample from a patient is amplified under conditions which favor the amplification of the most abundant differentially expressed nucleic acid. In one preferred embodiment, RT-PCR is carried out on an mRNA sample from a patient under conditions which favor the amplification of the most abundant mRNA. In another preferred embodiment, the amplification of the differentially expressed nucleic acids is carried out simultaneously. It will be realized by a person skilled in the art that such methods could be adapted for the detection of differentially expressed proteins instead of differentially expressed nucleic acid sequences. The nucleic acid (i.e. DNA or RNA) for practicing the present invention may be obtained according to well known methods. 
Oligonucleotide primers of the present invention may be of any suitable length, depending on the particular assay format and the particular needs and targeted genomes employed. Optionally, the oligonucleotide primers are at least 12 nucleotides in length, preferably between 15 and 24 nucleotides, and they may be adapted to be especially suited to a chosen nucleic acid amplification system. As commonly known in the art, the oligonucleotide primers can be designed by taking into consideration the melting point of hybridization thereof with its targeted sequence (Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual,
2nd Edition, CSH Laboratories; Ausubel et al., 1989, in Current Protocols in Molecular Biology, John Wiley & Sons Inc., N.Y.). It will be appreciated that antisense oligonucleotides may be employed to quantify expression of a splice isoform of interest. Such detection is effected at the pre-mRNA level. Essentially the ability to quantitate transcription from a splice site of interest can be effected based on splice site accessibility. Oligonucleotides may compete with splicing factors for the splice site sequences. Thus, low activity of the antisense oligonucleotide is indicative of splicing activity. The polymerase chain reaction and other nucleic acid amplification reactions are well known in the art (various non-limiting examples of these reactions are described in greater detail below). The pair of oligonucleotides according to this aspect of the present invention are preferably selected to have compatible melting temperatures (Tm), e.g., melting temperatures which differ by less than 7 °C, preferably less than 5 °C, more preferably less than 4 °C, most preferably less than 3 °C, ideally between 3 °C and 0 °C. Polymerase Chain Reaction (PCR): The polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195 and 4,683,202 to Mullis and Mullis et al., is a method of increasing the concentration of a segment of target sequence in a mixture of genomic DNA without cloning or purification. This technology provides one approach to the problems of low target sequence concentration. PCR can be used to directly increase the concentration of the target to an easily detectable level. This process for amplifying the target sequence involves the introduction of a molar excess of two oligonucleotide primers, which are complementary to their respective strands of the double-stranded target sequence, to the DNA mixture containing the desired target sequence. The mixture is denatured and then allowed to hybridize. 
Following hybridization, the primers are extended with polymerase so as to form complementary strands. The steps of denaturation, hybridization (annealing), and polymerase extension (elongation) can be repeated as often as needed, in order to obtain relatively high concentrations of a segment of the desired target sequence. The length of the segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and, therefore, this length is a controllable parameter. Because the desired segments of the target sequence become the dominant sequences (in terms of concentration) in the mixture, they are said to be "PCR-amplified."
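The melting-temperature compatibility criterion given above for a primer pair can be illustrated with a short sketch. It assumes the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is only a rough estimate for short oligonucleotides and is not a method specified in this description; the function names and the threshold parameter are hypothetical and chosen for illustration only.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm estimate in °C using the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: the Wallace rule is used only as an illustrative estimate;
    the description does not prescribe any particular Tm formula.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_compatible(forward: str, reverse: str, max_delta: float = 7.0) -> bool:
    """Return True if the estimated Tm values differ by less than max_delta °C.

    The default of 7 °C mirrors the broadest threshold named in the text;
    5, 4 or 3 °C can be passed for the progressively preferred ranges.
    """
    return abs(wallace_tm(forward) - wallace_tm(reverse)) < max_delta
```

Calling `tm_compatible(fwd, rev, max_delta=3.0)` applies the most preferred threshold. In practice a nearest-neighbor model with salt correction would normally replace the rule of thumb used here.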
Ligase Chain Reaction (LCR or LAR): The ligase chain reaction [LCR; sometimes referred to as "Ligase Amplification Reaction" (LAR)] has developed into a well-recognized alternative method of amplifying nucleic acids. In LCR, four oligonucleotides (two adjacent oligonucleotides which uniquely hybridize to one strand of target DNA, and a complementary set of adjacent oligonucleotides which hybridize to the opposite strand) are mixed and DNA ligase is added to the mixture. Provided that there is complete complementarity at the junction, ligase will covalently link each set of hybridized molecules. Importantly, in LCR, two probes are ligated together only when they base-pair with sequences in the target sample, without gaps or mismatches. Repeated cycles of denaturation and ligation amplify a short segment of DNA. LCR has also been used in combination with PCR to achieve enhanced detection of single-base changes: see for example Segev, PCT Publication No. WO9001069 A1 (1990). However, because the four oligonucleotides used in this assay can pair to form two short ligatable fragments, there is the potential for the generation of target-independent background signal. The use of LCR for mutant screening is limited to the examination of specific nucleic acid positions. Self-Sustained Synthetic Reaction (3SR/NASBA): The self-sustained sequence replication reaction (3SR) is a transcription-based in vitro amplification system that can exponentially amplify RNA sequences at a uniform temperature. The amplified RNA can then be utilized for mutation detection. In this method, an oligonucleotide primer is used to add a phage RNA polymerase promoter to the 5' end of the sequence of interest. 
In a cocktail of enzymes and substrates that includes a second primer, reverse transcriptase, RNase H, RNA polymerase and ribo- and deoxyribonucleoside triphosphates, the target sequence undergoes repeated rounds of transcription, cDNA synthesis and second-strand synthesis to amplify the area of interest. The use of 3SR to detect mutations is kinetically limited to screening small segments of DNA (e.g., 200-300 base pairs). Q-Beta (Qβ) Replicase: In this method, a probe which recognizes the sequence of interest is attached to the replicatable RNA template for Qβ replicase. A previously identified major problem with false positives resulting from the replication of unhybridized probes has been addressed through use of a sequence-specific ligation step. However, available thermostable DNA ligases are not effective on this RNA substrate, so the ligation must be performed by T4 DNA ligase at low temperatures (37 °C). This prevents the use of high
temperature as a means of achieving specificity as in the LCR; the ligation event can be used to detect a mutation at the junction site, but not elsewhere. A successful diagnostic method must be very specific. A straightforward method of controlling the specificity of nucleic acid hybridization is by controlling the temperature of the reaction. While the 3SR/NASBA and Qβ systems are all able to generate a large quantity of signal, one or more of the enzymes involved in each cannot be used at high temperature (i.e., > 55 °C). Therefore the reaction temperatures cannot be raised to prevent non-specific hybridization of the probes. If probes are shortened in order to make them melt more easily at low temperatures, the likelihood of having more than one perfect match in a complex genome increases. For these reasons, PCR and LCR currently dominate the research field in detection technologies. The basis of the amplification procedure in the PCR and LCR is the fact that the products of one cycle become usable templates in all subsequent cycles, consequently doubling the population with each cycle. The final yield of any such doubling system can be expressed as: (1 + X)^n = y, where "X" is the mean efficiency (percent copied in each cycle), "n" is the number of cycles, and "y" is the overall efficiency, or yield of the reaction. If every copy of a target DNA is utilized as a template in every cycle of a polymerase chain reaction, then the mean efficiency is
100 %. If 20 cycles of PCR are performed, then the yield will be 2^20, or 1,048,576 copies of the starting material. If the reaction conditions reduce the mean efficiency to 85 %, then the yield in those 20 cycles will be only 1.85^20, or 220,513 copies of the starting material. In other words, a PCR running at 85 % efficiency will yield only 21 % as much final product, compared to a reaction running at 100 % efficiency. A reaction that is reduced to 50 % mean efficiency will yield less than 1 % of the possible product. In practice, routine polymerase chain reactions rarely achieve the theoretical maximum yield, and PCRs are usually run for more than 20 cycles to compensate for the lower yield. At 50 % mean efficiency, it would take 34 cycles to achieve the million-fold amplification theoretically possible in 20, and at lower efficiencies, the number of cycles required becomes prohibitive. In addition, any background products that amplify with a better mean efficiency than the intended target will become the dominant products.
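The yield arithmetic above can be checked with a few lines of code. The sketch below simply evaluates the yield expression, y equal to (1 + X) raised to the power n, for the efficiencies discussed in the text; the function names are hypothetical and the efficiency is passed as a fraction (1.0 for 100 %).

```python
import math


def pcr_yield(mean_efficiency: float, cycles: int) -> float:
    """Fold amplification y = (1 + X)^n for mean efficiency X and n cycles."""
    return (1.0 + mean_efficiency) ** cycles


def cycles_for_fold(mean_efficiency: float, fold: float) -> float:
    """Number of cycles n (possibly fractional) solving (1 + X)^n = fold."""
    return math.log(fold) / math.log(1.0 + mean_efficiency)


if __name__ == "__main__":
    ideal = pcr_yield(1.0, 20)  # 2^20 copies per starting copy
    for x in (1.0, 0.85, 0.5):
        y = pcr_yield(x, 20)
        print(f"X = {x:.2f}: {y:,.0f} copies ({100 * y / ideal:.2f} % of ideal)")
    # Cycles needed at 50 % efficiency to match 20 ideal cycles
    print(f"{cycles_for_fold(0.5, ideal):.1f} cycles")
```

The last line evaluates to a little over 34 cycles, consistent with the figure of 34 cycles quoted for 50 % mean efficiency.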
Also, many variables can influence the mean efficiency of PCR, including target DNA length and secondary structure, primer length and design, primer and dNTP concentrations, and buffer composition, to name but a few. Contamination of the reaction with exogenous DNA (e.g., DNA spilled onto lab surfaces) or cross-contamination is also a major consideration. Reaction conditions must be carefully optimized for each different primer pair and target sequence, and the process can take days, even for an experienced investigator. The laboriousness of this process, including numerous technical considerations and other factors, presents a significant drawback to using PCR in the clinical setting. Indeed, PCR has yet to penetrate the clinical market in a significant way. The same concerns arise with LCR, as LCR must also be optimized to use different oligonucleotide sequences for each target sequence. In addition, both methods require expensive equipment, capable of precise temperature cycling. Many applications of nucleic acid detection technologies, such as in studies of allelic variation, involve not only detection of a specific sequence in a complex background, but also the discrimination between sequences with few, or single, nucleotide differences. One method of the detection of allele-specific variants by PCR is based upon the fact that it is difficult for Taq polymerase to synthesize a DNA strand when there is a mismatch between the template strand and the 3' end of the primer. An allele-specific variant may be detected by the use of a primer that is perfectly matched with only one of the possible alleles; the mismatch to the other allele acts to prevent the extension of the primer, thereby preventing the amplification of that sequence. This method has a substantial limitation in that the base composition of the mismatch influences the ability to prevent extension across the mismatch, and certain mismatches do not prevent extension or have only a minimal effect. 
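The allele-specific priming principle described above can be sketched as a simple sequence check. This is an idealized, all-or-nothing model under stated assumptions: the target is given as the strand identical in sequence to the primer, the primer's 3'-terminal base is placed on the variant position, and any 3' mismatch is treated as blocking extension (which, as noted above, is not true for every mismatch). The function name and the example sequences are hypothetical.

```python
def three_prime_match(primer: str, sense_strand: str, variant_index: int) -> bool:
    """Return True if the primer's 3'-terminal base matches the allele.

    sense_strand is the target strand with the same sequence as the primer
    (5'->3'); variant_index is the 0-based position of the variant base,
    on which the primer's 3' end is placed.
    """
    primer = primer.upper()
    sense = sense_strand.upper()
    start = variant_index - len(primer) + 1
    if not primer or start < 0 or variant_index >= len(sense):
        raise ValueError("primer does not fit on the target at this position")
    footprint = sense[start:variant_index + 1]
    # Everything except the 3'-terminal base must match both alleles.
    if primer[:-1] != footprint[:-1]:
        raise ValueError("primer body does not match the target")
    return primer[-1] == footprint[-1]


# Hypothetical alleles differing only at index 12 (A versus G).
allele_1 = "ACGTTGCAGGTCA"
allele_2 = "ACGTTGCAGGTCG"
primer_for_allele_1 = "TGCAGGTCA"

extends_on_1 = three_prime_match(primer_for_allele_1, allele_1, 12)  # matched 3' end
extends_on_2 = three_prime_match(primer_for_allele_1, allele_2, 12)  # 3' mismatch
```

A more realistic model would weight each mismatch type by how strongly it inhibits extension rather than returning a strict yes or no.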
A similar 3'-mismatch strategy is used with greater effect to prevent ligation in the LCR. Any mismatch effectively blocks the action of the thermostable ligase, but LCR still has the drawback of target-independent background ligation products initiating the amplification. Moreover, the combination of PCR with subsequent LCR to identify the nucleotides at individual positions is also a clearly cumbersome proposition for the clinical laboratory. The direct detection method according to various preferred embodiments of the present invention may be, for example, a cycling probe reaction (CPR) or a branched DNA analysis. When a sufficient amount of a nucleic acid to be detected is available, there are advantages to detecting that sequence directly, instead of making more copies of that target,
(e.g., as in PCR and LCR). Most notably, a method that does not amplify the signal exponentially is more amenable to quantitative analysis. Even if the signal is enhanced by attaching multiple dyes to a single oligonucleotide, the correlation between the final signal intensity and amount of target is direct. Such a system has an additional advantage that the products of the reaction will not themselves promote further reaction, so contamination of lab surfaces by the products is not as much of a concern. Recently devised techniques have sought to eliminate the use of radioactivity and/or improve the sensitivity in automatable formats. Two examples are the "Cycling Probe Reaction" (CPR), and "Branched DNA" (bDNA). Cycling probe reaction (CPR): The cycling probe reaction (CPR) uses a long chimeric oligonucleotide in which a central portion is made of RNA while the two termini are made of DNA. Hybridization of the probe to a target DNA and exposure to a thermostable RNase H causes the RNA portion to be digested. This destabilizes the remaining DNA portions of the duplex, releasing the remainder of the probe from the target DNA and allowing another probe molecule to repeat the process. The signal, in the form of cleaved probe molecules, accumulates at a linear rate. While the repeating process increases the signal, the RNA portion of the oligonucleotide is vulnerable to RNases that may be carried through sample preparation. Branched DNA: Branched DNA (bDNA) involves oligonucleotides with branched structures that allow each individual oligonucleotide to carry 35 to 40 labels (e.g., alkaline phosphatase enzymes). While this enhances the signal from a hybridization event, signal from non-specific binding is similarly increased. 
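One way to see why linear signal accumulation is easier to quantify than exponential amplification is to compare how each responds to a small change in per-cycle efficiency. The following is a toy model only, with hypothetical function names and parameter values: the exponential case uses the (1 + X)^n yield expression given earlier, and the linear case assumes a fixed number of probe molecules cleaved per target per cycle.

```python
def exponential_signal(target_copies: float, mean_efficiency: float, cycles: int) -> float:
    """Signal from exponential amplification: target * (1 + X)^n."""
    return target_copies * (1.0 + mean_efficiency) ** cycles


def linear_signal(target_copies: float, cleaved_per_cycle: float, cycles: int) -> float:
    """Signal from linear accumulation: target * rate * n (toy CPR-like model)."""
    return target_copies * cleaved_per_cycle * cycles


# Same target amount, 30 cycles, per-cycle efficiency reduced by 5 % (1.00 -> 0.95).
exp_ratio = exponential_signal(100, 0.95, 30) / exponential_signal(100, 1.0, 30)
lin_ratio = linear_signal(100, 0.95, 30) / linear_signal(100, 1.0, 30)
```

Under these assumptions the linear signal drops by the same 5 %, while the exponential signal falls to less than half, so a small run-to-run variation distorts an exponential readout far more than a linear one.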
The detection of at least one sequence change according to various preferred embodiments of the present invention may be accomplished by, for example, restriction fragment length polymorphism (RFLP) analysis, allele specific oligonucleotide (ASO) analysis, Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE), Single-Strand Conformation Polymorphism (SSCP) analysis or Dideoxy fingerprinting (ddF). The demand for tests which allow the detection of specific nucleic acid sequences and sequence changes is growing rapidly in clinical diagnostics. As nucleic acid sequence data for genes from humans and pathogenic organisms accumulates, the demand for fast, cost-effective, and easy-to-use tests for as yet unknown mutations within specific sequences is rapidly increasing. A handful of methods have been devised to scan nucleic acid segments for mutations.
One option is to determine the entire gene sequence of each test sample (e.g., a bacterial isolate).
For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest. However, specialized equipment and highly trained personnel are required, and the method is too labor-intensive and expensive to be practical and effective in the clinical setting. In view of the difficulties associated with sequencing, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel. A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map. The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs. Restriction fragment length polymorphism (RFLP): For detection of single-base differences between like sequences, the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis). Single point mutations have also been detected by the creation or destruction of RFLPs. Mutations are detected and localized by the presence and size of the RNA fragments generated by cleavage at the mismatches.
Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the "Mismatch Chemical Cleavage" (MCC). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals which are not suited for use in a clinical laboratory. RFLP analysis suffers from low sensitivity and requires a large amount of sample. When RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair
recognition sequences, and cleave too frequently for many large-scale DNA manipulations. Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites. A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered. Recently, endonucleases encoded by group I introns have been discovered that might have greater than 12 base-pair specificity, but again, these are few in number. Allele specific oligonucleotide (ASO): If the change is not in a recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the mutated nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled allele-specific oligonucleotides (ASOs) has also been applied to the detection of specific point mutations. The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide. Stringent hybridization and washing conditions can differentiate between mutant and wild-type alleles. The ASO approach applied to PCR products has also been extensively utilized by various researchers to detect and characterize point mutations in ras genes and gsp/gip oncogenes. Because of the presence of various nucleotide changes in multiple positions, the ASO method requires the use of many oligonucleotides to cover all possible oncogenic mutations. With either of the techniques described above (i.e., RFLP and ASO), the precise location of the suspected mutation must be known in advance of the test. That is to say, they are inapplicable when one needs to detect the presence of a mutation at an unknown position within a gene or sequence of interest.
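As a non-limiting illustration of the RFLP principle discussed above, the following Python sketch computes the fragment lengths expected from digestion of a linear sequence and shows how a single base change inside a recognition sequence alters the pattern. The helper names and the example sequences are hypothetical and were made up for this sketch. The recognition sequence GAATTC, cut after the first G, is that of EcoRI. Only the given strand is searched, which is adequate here because the site is palindromic.

```python
def find_sites(sequence, site):
    """Return the 0-based start positions of every occurrence of site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence, site, cut_offset):
    """Lengths of the fragments produced by complete digestion of a linear molecule.

    cut_offset is the number of bases from the start of the recognition
    sequence to the cut position on the given strand.
    """
    cuts = [pos + cut_offset for pos in find_sites(sequence, site)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    ECORI_SITE, ECORI_OFFSET = "GAATTC", 1   # G^AATTC

    wild_type = "AAAAGAATTCAAAAAA"   # hypothetical sequence containing one site
    mutant = "AAAAGAGTTCAAAAAA"      # single A-to-G change destroys the site

    print(fragment_lengths(wild_type, ECORI_SITE, ECORI_OFFSET))  # two fragments
    print(fragment_lengths(mutant, ECORI_SITE, ECORI_OFFSET))     # uncut molecule
```

The sketch also makes the stated limitation concrete: a base change outside any recognition sequence leaves the fragment pattern unchanged and would go undetected by this approach.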
Denaturing/Temperature Gradient Gel Electrophoresis (DGGE/TGGE): Two other methods rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed "Denaturing Gradient Gel Electrophoresis" (DGGE), is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can detect the presence of mutations in the target sequences
because of the corresponding changes in their electrophoretic mobilities. The fragments to be analyzed, usually PCR products, are "clamped" at one end by a long stretch of GC base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC "clamp" to the DNA fragments increases the fraction of mutations that can be recognized by DGGE. Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature. Modifications of the technique have been developed, using temperature gradients, and the method can also be applied to RNA:RNA duplexes. Limitations on the utility of DGGE include the requirement that the denaturing conditions must be optimized for each type of DNA to be tested. Furthermore, the method requires specialized equipment to prepare the gels and maintain the needed high temperatures during electrophoresis. The expense associated with the synthesis of the clamping tail on one oligonucleotide for each sequence to be tested is also a major consideration. In addition, long running times are required for DGGE. The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE). CDGE requires that gels be run under different denaturant conditions in order to reach high efficiency for the detection of mutations. A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient. TGGE requires the use of specialized equipment which can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations only in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.
Single-Strand Conformation Polymorphism (SSCP): Another common method, called "Single-Strand Conformation Polymorphism" (SSCP), was developed by Hayashi, Sekiya and colleagues and is based on the observation that single strands of nucleic acid can take on characteristic conformations in non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that one strand may be resolved from the other. Changes in sequences within the fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations.
The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions. Dideoxy fingerprinting (ddF): Dideoxy fingerprinting (ddF) is another technique developed to scan genes for the presence of mutations. The ddF technique combines components of Sanger dideoxy sequencing with SSCP. A dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis. While ddF is an improvement over SSCP in terms of increased sensitivity, ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations). In addition to the above limitations, all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed. For the direct sequencing approach, sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment. SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments. Although SSCP is reportedly able to detect 90% of single-base substitutions within a 200 base-pair fragment, the detection drops to less than 50% for 400 base-pair fragments.
Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base-pairs. The ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened. According to a presently preferred embodiment of the present invention, the step of searching for any of the nucleic acid sequences described here, in tumor cells or in cells derived from a cancer patient, is effected by any suitable technique, including, but not limited to, nucleic acid sequencing, polymerase chain reaction, ligase chain reaction, self-sustained synthetic reaction, Qβ-Replicase, cycling probe reaction, branched DNA, restriction fragment length
polymorphism analysis, mismatch chemical cleavage, heteroduplex analysis, allele-specific oligonucleotides, denaturing gradient gel electrophoresis, constant denaturant gel electrophoresis, temperature gradient gel electrophoresis and dideoxy fingerprinting. Detection may also optionally be performed with a chip or other such device. The nucleic acid sample which includes the candidate region to be analyzed is preferably isolated, amplified and labeled with a reporter group. This reporter group can be a fluorescent group such as phycoerythrin. The labeled nucleic acid is then incubated with the probes immobilized on the chip using a fluidics station; the fabrication of fluidics devices, and particularly microcapillary devices, in silicon and glass substrates has been described in the art. Once the reaction is completed, the chip is inserted into a scanner and patterns of hybridization are detected. The hybridization data is collected as a signal emitted from the reporter groups already incorporated into the nucleic acid, which is now bound to the probes attached to the chip. Since the sequence and position of each probe immobilized on the chip is known, the identity of the nucleic acid hybridized to a given probe can be determined. It will be appreciated that when utilized along with automated equipment, the above described detection methods can be used to screen multiple samples for a disease and/or pathological condition both rapidly and easily.
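The final step of chip-based detection described above, in which a known probe layout is used to identify the hybridized nucleic acid, can be sketched as follows. This is an illustrative sketch only: the probe layout, the signal values, the threshold and the function names are hypothetical, and a real scanner pipeline would additionally involve background correction and normalization that are not modeled here.

```python
# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def call_hybridized_targets(probe_layout, signals, threshold):
    """Infer which target sequences are bound on the chip.

    probe_layout maps a (row, column) position to the probe sequence
    immobilized there; signals maps positions to measured reporter
    intensity. A position whose signal meets the threshold is taken to
    hold a target complementary to its probe.
    """
    calls = {}
    for position, probe in probe_layout.items():
        if signals.get(position, 0.0) >= threshold:
            calls[position] = reverse_complement(probe)
    return calls


if __name__ == "__main__":
    layout = {(0, 0): "AAGC", (0, 1): "GGTA"}    # hypothetical probes
    scan = {(0, 0): 950.0, (0, 1): 40.0}         # hypothetical intensities
    print(call_hybridized_targets(layout, scan, threshold=500.0))
```

Because each position on the chip is tied to one probe sequence, a simple lookup of this kind is sufficient to convert a pattern of signals into sequence identities, which is what makes the format amenable to automated screening of multiple samples.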
Amino acid sequences and peptides The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an analog or mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can be modified, e.g., by the addition of carbohydrate residues to form glycoproteins. The terms "polypeptide," "peptide" and "protein" include glycoproteins, as well as non-glycoproteins. Polypeptide products can be biochemically synthesized such as by employing standard solid phase techniques. Such methods include, but are not limited to, exclusive solid phase synthesis, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry.
Solid phase polypeptide synthesis procedures are well known in the art and further described by John Morrow Stewart and Janis Dillaha Young, Solid Phase Peptide Syntheses (2nd Ed., Pierce Chemical Company, 1984). Synthetic polypeptides can optionally be purified by preparative high performance liquid chromatography [Creighton T. (1983) Proteins, structures and molecular principles. WH Freeman and Co. N.Y.], after which their composition can be confirmed via amino acid sequencing. In cases where large amounts of a polypeptide are desired, it can be generated using recombinant techniques such as described by Bitter et al. (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990) Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680, Broglie et al. (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463. The present invention also encompasses polypeptides encoded by the polynucleotide sequences of the present invention, as well as polypeptides according to the amino acid sequences described herein.
The present invention also encompasses homologues of these polypeptides. Such homologues can be at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 95% or more, say 100%, homologous to the amino acid sequences set forth below, as can be determined using BlastP software of the National Center for Biotechnology Information (NCBI) using default parameters, optionally and preferably including the following: filtering on (this option filters repetitive or low-complexity sequences from the query using the Seg (protein) program), scoring matrix is BLOSUM62 for proteins, word size is 3, E value is 10, gap costs are 11, 1 (initialization and extension), and number of alignments shown is 50. Preferably, nucleic acid sequence homology/identity is determined by using BlastN software of the National Center for Biotechnology Information (NCBI) using default parameters, which preferably include using the DUST filter program, and also preferably include having an E value of 10, filtering low complexity sequences and a word size of 11. Finally, the present invention also encompasses fragments of the above described polypeptides and polypeptides having mutations, such as
deletions, insertions or substitutions of one or more amino acids, either naturally occurring or artificially induced, either randomly or in a targeted fashion. It will be appreciated that peptides identified according to the present invention may be degradation products, synthetic peptides or recombinant peptides as well as peptidomimetics, typically, synthetic peptides and peptoids and semipeptoids which are peptide analogs, which may have, for example, modifications rendering the peptides more stable while in a body or more capable of penetrating into cells. Such modifications include, but are not limited to, N terminus modification, C terminus modification, peptide bond modification, including, but not limited to, CH2-NH, CH2-S, CH2-S=O, O=C-NH, CH2-O, CH2-CH2, S=C-NH, CH=CH or CF=CH, backbone modifications, and residue modification. Methods for preparing peptidomimetic compounds are well known in the art and are specified. Further details in this respect are provided hereinunder. Peptide bonds (-CO-NH-) within the peptide may be substituted, for example, by N-methylated bonds (-N(CH3)-CO-), ester bonds (-C(R)H-C-O-O-C(R)-N-), ketomethylene bonds (-CO-CH2-), α-aza bonds (-NH-N(R)-CO-), wherein R is any alkyl, e.g., methyl, carba bonds (-CH2-NH-), hydroxyethylene bonds (-CH(OH)-CH2-), thioamide bonds (-CS-NH-), olefinic double bonds (-CH=CH-), retro amide bonds (-NH-CO-), peptide derivatives (-N(R)-CH2-CO-), wherein R is the "normal" side chain, naturally presented on the carbon atom. These modifications can occur at any of the bonds along the peptide chain and even at several (2-3) at the same time. Natural aromatic amino acids, Trp, Tyr and Phe, may be substituted by synthetic non-natural acids such as phenylglycine, TIC, naphthylalanine (Nol), ring-methylated derivatives of Phe, halogenated derivatives of Phe or o-methyl-Tyr.
In addition to the above, the peptides of the present invention may also include one or more modified amino acids or one or more non-amino acid monomers (e.g. fatty acids, complex carbohydrates etc). As used herein in the specification and in the claims section below the term "amino acid" or "amino acids" is understood to include the 20 naturally occurring amino acids; those amino acids often modified post-translationally in vivo, including, for example, hydroxyproline, phosphoserine and phosphothreonine; and other unusual amino acids including, but not limited
to, 2-aminoadipic acid, hydroxylysine, isodesmosine, nor-valine, nor-leucine and ornithine. Furthermore, the term "amino acid" includes both D- and L-amino acids. Table 1 lists non-conventional or modified amino acids which can be used with the present invention.
Table 1
Since the peptides of the present invention are preferably utilized in diagnostics which require the peptides to be in soluble form, the peptides of the present invention preferably include one or more non-natural or natural polar amino acids, including but not limited to serine and threonine, which are capable of increasing peptide solubility due to their hydroxyl-containing side chain. The peptides of the present invention are preferably utilized in a linear form, although it will be appreciated that in cases where cyclization does not severely interfere with peptide characteristics, cyclic forms of the peptide can also be utilized. The peptides of the present invention can be biochemically synthesized such as by using standard solid phase techniques. These methods include exclusive solid phase synthesis well known in the art, partial solid phase synthesis methods, fragment condensation and classical solution synthesis. These methods are preferably used when the peptide is relatively short (i.e., 10 kDa) and/or when it cannot be produced by recombinant techniques (i.e., not encoded by a nucleic acid sequence) and therefore involves different chemistry. Synthetic peptides can be purified by preparative high performance liquid chromatography, and their composition can be confirmed via amino acid sequencing. In cases where large amounts of the peptides of the present invention are desired, the peptides of the present invention can be generated using recombinant techniques such as described by Bitter et al. (1987) Methods in Enzymol. 153:516-544, Studier et al. (1990)
Methods in Enzymol. 185:60-89, Brisson et al. (1984) Nature 310:511-514, Takamatsu et al. (1987) EMBO J. 6:307-311, Coruzzi et al. (1984) EMBO J. 3:1671-1680, Broglie et al. (1984) Science 224:838-843, Gurley et al. (1986) Mol. Cell. Biol. 6:559-565 and Weissbach & Weissbach, 1988, Methods for Plant Molecular Biology, Academic Press, NY, Section VIII, pp 421-463, and also as described above.
Antibodies "Antibody" refers to a polypeptide ligand that is preferably substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof, which specifically binds and recognizes an epitope (e.g., an antigen). The recognized immunoglobulin genes include the kappa and lambda light chain constant region genes, the alpha, gamma, delta, epsilon and mu heavy chain constant region genes, and the myriad immunoglobulin variable region genes. Antibodies exist, e.g., as intact immunoglobulins or as a number of well characterized fragments produced by digestion with various peptidases. This includes, e.g., Fab' and F(ab')2 fragments. The term "antibody," as used herein, also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies. It also includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, or single chain antibodies. "Fc" portion of an antibody refers to that portion of an immunoglobulin heavy chain that comprises one or more heavy chain constant region domains, CH1, CH2 and CH3, but does not include the heavy chain variable region.
The functional fragments of antibodies, such as Fab, F(ab')2, and Fv that are capable of binding to macrophages, are described as follows: (1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule, can be produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain; (2) Fab', the fragment of an antibody molecule that can be obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule; (3) F(ab')2, the fragment of the antibody that can be obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; F(ab')2 is a dimer of two Fab' fragments held together by two disulfide
bonds; (4) Fv, defined as a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and (5) Single chain antibody ("SCA"), a genetically engineered molecule containing the variable region of the light chain and the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule. Methods of producing polyclonal and monoclonal antibodies as well as fragments thereof are well known in the art (see, for example, Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York, 1988, incorporated herein by reference). Antibody fragments according to the present invention can be prepared by proteolytic hydrolysis of the antibody or by expression in E. coli or mammalian cells (e.g., Chinese hamster ovary cell culture or other protein expression systems) of DNA encoding the fragment. Antibody fragments can be obtained by pepsin or papain digestion of whole antibodies by conventional methods. For example, antibody fragments can be produced by enzymatic cleavage of antibodies with pepsin to provide a 5S fragment denoted F(ab')2. This fragment can be further cleaved using a thiol reducing agent, and optionally a blocking group for the sulfhydryl groups resulting from cleavage of disulfide linkages, to produce 3.5S Fab' monovalent fragments. Alternatively, an enzymatic cleavage using papain produces two monovalent Fab fragments and an Fc fragment directly. These methods are described, for example, by Goldenberg, U.S. Pat. Nos. 4,036,945 and 4,331,647, and references contained therein, which patents are hereby incorporated by reference in their entirety. See also Porter, R. R. [Biochem. J. 73: 119-126 (1959)].
Other methods of cleaving antibodies, such as separation of heavy chains to form monovalent light-heavy chain fragments, further cleavage of fragments, or other enzymatic, chemical, or genetic techniques may also be used, so long as the fragments bind to the antigen that is recognized by the intact antibody. Fv fragments comprise an association of VH and VL chains. This association may be noncovalent, as described in Inbar et al. [Proc. Nat'l Acad. Sci. USA 69:2659-62 (1972)]. Alternatively, the variable chains can be linked by an intermolecular disulfide bond or cross-linked by chemicals such as glutaraldehyde. Preferably, the Fv fragments comprise VH and VL chains connected by a peptide linker. These single-chain antigen binding proteins (sFv) are prepared by constructing a structural gene comprising DNA sequences encoding the VH and VL domains connected by an oligonucleotide. The structural gene is inserted into an expression
vector, which is subsequently introduced into a host cell such as E. coli. The recombinant host cells synthesize a single polypeptide chain with a linker peptide bridging the two V domains. Methods for producing sFvs are described, for example, by Whitlow and Filpula, Methods 2: 97-105 (1991); Bird et al., Science 242:423-426 (1988); Pack et al., Bio/Technology 11:1271-77 (1993); and U.S. Pat. No. 4,946,778, which is hereby incorporated by reference in its entirety. Another form of an antibody fragment is a peptide coding for a single complementarity-determining region (CDR). CDR peptides ("minimal recognition units") can be obtained by constructing genes encoding the CDR of an antibody of interest. Such genes are prepared, for example, by using the polymerase chain reaction to synthesize the variable region from RNA of antibody-producing cells. See, for example, Larrick and Fry [Methods, 2: 106-10 (1991)]. Humanized forms of non-human (e.g., murine) antibodies are chimeric molecules of immunoglobulins, immunoglobulin chains or fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other antigen-binding subsequences of antibodies) which contain minimal sequence derived from non-human immunoglobulin. Humanized antibodies include human immunoglobulins (recipient antibody) in which residues from a complementarity determining region (CDR) of the recipient are replaced by residues from a CDR of a non-human species (donor antibody) such as mouse, rat or rabbit having the desired specificity, affinity and capacity. In some instances, Fv framework residues of the human immunoglobulin are replaced by corresponding non-human residues. Humanized antibodies may also comprise residues which are found neither in the recipient antibody nor in the imported CDR or framework sequences.
In general, the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the CDR regions correspond to those of a non-human immunoglobulin and all or substantially all of the FR regions are those of a human immunoglobulin consensus sequence. The humanized antibody optimally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature, 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol., 2:593-596 (1992)]. Methods for humanizing non-human antibodies are well known in the art. Generally, a humanized antibody has one or more amino acid residues introduced into it from a source which is non-human. These non-human amino acid residues are often referred to as import residues, which are typically taken from an import variable domain. Humanization can be essentially
performed following the method of Winter and co-workers [Jones et al., Nature, 321:522-525 (1986); Riechmann et al., Nature 332:323-327 (1988); Verhoeyen et al., Science, 239:1534-1536 (1988)], by substituting rodent CDRs or CDR sequences for the corresponding sequences of a human antibody. Accordingly, such humanized antibodies are chimeric antibodies (U.S. Pat. No. 4,816,567), wherein substantially less than an intact human variable domain has been substituted by the corresponding sequence from a non-human species. In practice, humanized antibodies are typically human antibodies in which some CDR residues and possibly some FR residues are substituted by residues from analogous sites in rodent antibodies. Human antibodies can also be produced using various techniques known in the art, including phage display libraries [Hoogenboom and Winter, J. Mol. Biol., 227:381 (1991); Marks et al., J. Mol. Biol., 222:581 (1991)]. The techniques of Cole et al. and Boerner et al. are also available for the preparation of human monoclonal antibodies [Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, p. 77 (1985) and Boerner et al., J. Immunol., 147(1):86-95 (1991)]. Similarly, human antibodies can be made by introduction of human immunoglobulin loci into transgenic animals, e.g., mice in which the endogenous immunoglobulin genes have been partially or completely inactivated. Upon challenge, human antibody production is observed, which closely resembles that seen in humans in all respects, including gene rearrangement, assembly, and antibody repertoire. This approach is described, for example, in U.S. Pat. Nos.
5,545,807; 5,545,806; 5,569,825; 5,625,126; 5,633,425; 5,661,016, and in the following scientific publications: Marks et al., Bio/Technology 10: 779-783 (1992); Lonberg et al., Nature 368: 856-859 (1994); Morrison, Nature 368: 812-13 (1994); Fishwild et al., Nature Biotechnology 14: 845-51 (1996); Neuberger, Nature Biotechnology 14: 826 (1996); and Lonberg and Huszar, Intern. Rev. Immunol. 13: 65-93 (1995). Preferably, the antibody of this aspect of the present invention specifically binds at least one epitope of the polypeptide variants of the present invention. As used herein, the term "epitope" refers to any antigenic determinant on an antigen to which the paratope of an antibody binds. Epitopic determinants usually consist of chemically active surface groupings of molecules such as amino acids or carbohydrate side chains and usually have specific three dimensional structural characteristics, as well as specific charge characteristics.
Optionally, a unique epitope may be created in a variant due to a change in one or more post-translational modifications, including but not limited to glycosylation and/or phosphorylation, as described below. Such a change may also cause a new epitope to be created, for example through removal of glycosylation at a particular site. An epitope according to the present invention may also optionally comprise part or all of a unique sequence portion of a variant according to the present invention in combination with at least one other portion of the variant which is not contiguous to the unique sequence portion in the linear polypeptide itself, yet which are able to form an epitope in combination. One or more unique sequence portions may optionally combine with one or more other non-contiguous portions of the variant (including a portion which may have high homology to a portion of the known protein) to form an epitope.
Immunoassays
In another embodiment of the present invention, an immunoassay can be used to qualitatively or quantitatively detect and analyze markers in a sample. This method comprises: providing an antibody that specifically binds to a marker; contacting a sample with the antibody; and detecting the presence of a complex of the antibody bound to the marker in the sample. To prepare an antibody that specifically binds to a marker, purified protein markers can be used. Antibodies that specifically bind to a protein marker can be prepared using any suitable methods known in the art. After the antibody is provided, a marker can be detected and/or quantified using any of a number of well recognized immunological binding assays. Useful assays include, for example, an enzyme immunoassay (EIA) such as enzyme-linked immunosorbent assay (ELISA), a radioimmunoassay (RIA), a Western blot assay, or a slot blot assay (see, e.g., U.S. Pat. Nos. 4,366,241; 4,376,110; 4,517,288; and 4,837,168). Generally, a sample obtained from a subject can be contacted with the antibody that specifically binds the marker. Optionally, the antibody can be fixed to a solid support, prior to contacting the antibody with a sample, to facilitate washing and subsequent isolation of the complex. Examples of solid supports include but are not limited to glass or plastic in the form of, e.g., a microtiter plate, a stick, a bead, or a microbead.
After incubating the sample with antibodies, the mixture is washed and the antibody-marker complex formed can be detected. This can be accomplished by incubating the washed mixture with a detection reagent. Alternatively, the marker in the sample can be detected using an indirect assay, wherein, for example, a second, labeled antibody is used to detect bound marker-specific antibody, and/or in a competition or inhibition assay wherein, for example, a monoclonal antibody which binds to a distinct epitope of the marker is incubated simultaneously with the mixture. Throughout the assays, incubation and/or washing steps may be required after each combination of reagents. Incubation steps can vary from about 5 seconds to several hours, preferably from about 5 minutes to about 24 hours. However, the incubation time will depend upon the assay format, marker, volume of solution, concentrations and the like. Usually the assays will be carried out at ambient temperature, although they can be conducted over a range of temperatures, such as 10 °C to 40 °C. The immunoassay can be used to determine a test amount of a marker in a sample from a subject. First, a test amount of a marker in a sample can be detected using the immunoassay methods described above. If a marker is present in the sample, it will form an antibody-marker complex with an antibody that specifically binds the marker under suitable incubation conditions described above. The amount of an antibody-marker complex can optionally be determined by comparing to a standard. As noted above, the test amount of marker need not be measured in absolute units, as long as the unit of measurement can be compared to a control amount and/or signal. Preferably used are antibodies which specifically interact with the polypeptides of the present invention and not with wild type proteins or other isoforms thereof, for example.
Such antibodies are directed, for example, to the unique sequence portions of the polypeptide variants of the present invention, including but not limited to bridges, heads, tails and insertions described in greater detail below. Preferred embodiments of antibodies according to the present invention are described in greater detail with regard to the section entitled "Antibodies". Radio-immunoassay (RIA): In one version, this method involves precipitation of the desired substrate and in the methods detailed hereinbelow, with a specific antibody and radiolabelled antibody binding protein (e.g., protein A labeled with I-125) immobilized on a
precipitable carrier such as agarose beads. The number of counts in the precipitated pellet is proportional to the amount of substrate. In an alternate version of the RIA, a labeled substrate and an unlabelled antibody binding protein are employed. A sample containing an unknown amount of substrate is added in varying amounts. The decrease in precipitated counts from the labeled substrate is proportional to the amount of substrate in the added sample. Enzyme linked immunosorbent assay (ELISA): This method involves fixation of a sample (e.g., fixed cells or a proteinaceous solution) containing a protein substrate to a surface such as a well of a microtiter plate. A substrate specific antibody coupled to an enzyme is applied and allowed to bind to the substrate. Presence of the antibody is then detected and quantitated by a colorimetric reaction employing the enzyme coupled to the antibody. Enzymes commonly employed in this method include horseradish peroxidase and alkaline phosphatase. If well calibrated and within the linear range of response, the amount of substrate present in the sample is proportional to the amount of color produced. A substrate standard is generally employed to improve quantitative accuracy. Western blot: This method involves separation of a substrate from other protein by means of an acrylamide gel followed by transfer of the substrate to a membrane (e.g., nylon or PVDF). Presence of the substrate is then detected by antibodies specific to the substrate, which are in turn detected by antibody binding reagents. Antibody binding reagents may be, for example, protein A, or other antibodies. Antibody binding reagents may be radiolabelled or enzyme linked as described hereinabove. Detection may be by autoradiography, colorimetric reaction or chemiluminescence. 
This method allows both quantitation of an amount of substrate and determination of its identity by a relative position on the membrane which is indicative of a migration distance in the acrylamide gel during electrophoresis. Immunohistochemical analysis: This method involves detection of a substrate in situ in fixed cells by substrate specific antibodies. The substrate specific antibodies may be enzyme linked or linked to fluorophores. Detection is by microscopy and subjective evaluation. If enzyme linked antibodies are employed, a colorimetric reaction may be required. Fluorescence activated cell sorting (FACS): This method involves detection of a substrate in situ in cells by substrate specific antibodies. The substrate specific antibodies are linked to fluorophores. Detection is by means of a cell sorting machine which reads the
wavelength of light emitted from each cell as it passes through a light beam. This method may employ two or more antibodies simultaneously.
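The ELISA description above notes that, within the linear range of response, the amount of substrate is proportional to the color produced, and that a substrate standard is generally employed. A minimal sketch of that calibration step follows, assuming a straight-line standard curve fitted by ordinary least squares. The function names and the standard values are hypothetical illustrations and are not taken from the present description.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal, standards):
    """Estimate an unknown amount from (amount, signal) standard pairs.

    Assumption: the standards span the linear range, so readings outside
    the calibrated signal range are rejected rather than extrapolated.
    """
    amounts = [a for a, _ in standards]
    signals = [s for _, s in standards]
    if not min(signals) <= signal <= max(signals):
        raise ValueError("signal outside calibrated range")
    slope, intercept = fit_line(amounts, signals)
    return (signal - intercept) / slope


# Hypothetical standards: (amount in arbitrary units, absorbance)
STANDARDS = [(0, 0.05), (10, 0.25), (20, 0.45), (40, 0.85)]
estimate = amount_from_signal(0.65, STANDARDS)
```

Rejecting out-of-range readings reflects the "within the linear range" condition; a real assay would typically dilute and re-measure such samples.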
Radio-imaging Methods
These methods include, but are not limited to, positron emission tomography (PET) and single photon emission computed tomography (SPECT). Both of these techniques are non-invasive, and can be used to detect and/or measure a wide variety of tissue events and/or functions, such as detecting cancerous cells for example. Unlike PET, SPECT can optionally be used with two labels simultaneously. SPECT has some other advantages as well, for example with regard to cost and the types of labels that can be used. For example, US Patent No. 6,696,686 describes the use of SPECT for detection of breast cancer, and is hereby incorporated by reference as if fully set forth herein.
Display Libraries
According to still another aspect of the present invention there is provided a display library comprising a plurality of display vehicles (such as phages, viruses or bacteria) each displaying at least 6, at least 7, at least 8, at least 9, at least 10, 10-15, 12-17, 15-20, 15-30 or 20-50 consecutive amino acids derived from the polypeptide sequences of the present invention. Methods of constructing such display libraries are well known in the art. Such methods are described in, for example, Young AC, et al., "The three-dimensional structures of a polysaccharide binding antibody to Cryptococcus neoformans and its complex with a peptide from a phage display library: implications for the identification of peptide mimotopes" J Mol Biol 1997 Dec 12;274(4):622-34; Giebel LB et al. "Screening of cyclic peptide phage libraries identifies ligands that bind streptavidin with high affinities" Biochemistry 1995 Nov 28;34(47):15430-5; Davies EL et al., "Selection of specific phage-display antibodies using libraries derived from chicken immunoglobulin genes" J Immunol Methods 1995 Oct 12;186(1):125-35; Jones C et al. "Current trends in molecular recognition and bioseparation" J Chromatogr A 1995 Jul 14;707(1):3-22; Deng SJ et al. "Basis for selection of improved carbohydrate-binding single-chain antibodies from synthetic gene libraries" Proc Natl Acad Sci U S A 1995 May 23;92(11):4992-6; and Deng SJ et al. "Selection of antibody single-chain variable fragments with improved carbohydrate binding by phage display" J Biol Chem 1994 Apr 1;269(13):9533-8, which are incorporated herein by reference.
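The stretches of consecutive amino acids displayed by such a library can be enumerated with a simple sliding window. The following is a minimal sketch only; the function name and the ten-residue input string are hypothetical placeholders and do not correspond to any sequence of the present invention.

```python
def consecutive_peptides(sequence, length):
    """Return every window of `length` consecutive residues, in order."""
    if length < 1:
        raise ValueError("length must be positive")
    # An empty list results when the sequence is shorter than the window.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Placeholder sequence (hypothetical), windowed at the shortest stated length.
windows = consecutive_peptides("ACDEFGHIKL", 6)
```

Each returned window is one candidate insert for a display vehicle; longer windows (for example 10-15 residues) are obtained by changing `length`.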
The following sections relate to Candidate Marker Examples. It should be noted that Table numbering is restarted within each Example, which starts with the words "Description for Cluster".
CANDIDATE MARKER EXAMPLES SECTION
This Section relates to Examples of sequences according to the present invention, including illustrative methods of selection thereof with regard to cancer; other markers were selected as described below for the individual markers.
Description of the methodology undertaken to uncover the biomolecular sequences of the present invention
Human ESTs and cDNAs were obtained from GenBank version 136 (June 15, 2003 ftp.ncbi.nih.gov/genbank/release.notes/gb136.release.notes); NCBI genome assembly of April 2003; RefSeq sequences from June 2003; GenBank version 139 (December 2003); Human Genome from NCBI (Build 34) (from Oct 2003); and RefSeq sequences from December 2003. With regard to GenBank sequences, the human EST sequences from the EST (GBEST) section and the human mRNA sequences from the primate (GBPRI) section were used; also the human nucleotide RefSeq mRNA sequences were used (see for example www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html and for a reference to the EST section, see www.ncbi.nlm.nih.gov/dbEST/; a general reference to dbEST, the EST database in GenBank, may be found in Boguski et al., Nat Genet. 1993 Aug;4(4):332-3; all of which are hereby incorporated by reference as if fully set forth herein). Novel splice variants were predicted using the LEADS clustering and assembly system as described in Sorek, R., Ast, G. & Graur, D. Alu-containing exons are alternatively spliced. Genome Res 12, 1060-7 (2002); US patent No: 6,625,545; and U.S. Pat. Appl. No. 10/426,002, published as US20040101876 on May 27 2004; all of which are hereby incorporated by reference as if fully set forth herein. Briefly, the software cleans the expressed sequences from repeats, vectors and immunoglobulins. It then aligns the expressed sequences to the genome
taking alternative splicing into account and clusters overlapping expressed sequences into "clusters" that represent genes or partial genes. These were annotated using the GeneCarta (Compugen, Tel-Aviv, Israel) platform. The GeneCarta platform includes a rich pool of annotations, sequence information (particularly of spliced sequences), chromosomal information, alignments, and additional information such as SNPs, gene ontology terms, expression profiles, functional analyses, detailed domain structures, known and predicted proteins and detailed homology reports. A brief explanation is provided with regard to the method of selecting the candidates. However, it should be noted that this explanation is provided for descriptive purposes only, and is not intended to be limiting in any way. The potential markers were identified by a computational process that was designed to find genes and/or their splice variants that are specifically expressed in cardiac tissue, as opposed to other types of tissues and also particularly as opposed to muscle tissue, by using databases of expressed sequences. Various parameters related to the information in the EST libraries, determined according to classification by library annotation, were used to assist in locating genes and/or splice variants thereof that are specifically and/or differentially expressed in heart tissues. The detailed description of the selection method and of these parameters is presented in Example 1 below.
PART I - Cardiac disease markers
EXAMPLE 1 Identification of differentially expressed gene products - Algorithm In order to distinguish between differentially expressed gene products and constitutively expressed genes (i.e., house keeping genes), an algorithm based on an analysis of frequencies was configured. A specific algorithm for identification of transcripts specifically expressed in heart tissue is described hereinbelow.
EST analysis
ESTs were taken from the following main sources: libraries contained in GenBank version 136 (June 15, 2003 ftp.ncbi.nih.gov/genbank/release.notes/gb136.release.notes) and GenBank version 139 (December 2003); and from the LifeSeq library of Incyte Corporation (ESTs only; Wilmington, DE, USA). With regard to GenBank sequences, the human EST sequences from the EST (GBEST) section were used.
Library annotation - EST libraries were manually classified according to:
1. Tissue origin
2. Biological source - Examples of frequently used biological sources for construction of EST libraries include cancer cell-lines; normal tissues; cancer tissues; foetal tissues; and others such as normal cell lines and pools of normal cell-lines, cancer cell-lines and combinations thereof. A specific description of abbreviations used below with regard to these tissues/cell lines etc is given above.
3. Protocol of library construction - various methods are known in the art for library construction including normalized library construction; non-normalized library construction; subtracted libraries; ORESTES and others (described in the annotation available in GenBank). It will be appreciated that at times the protocol of library construction is not indicated in the information available about that library.
The following rules were followed: EST libraries originating from identical biological samples were considered as a single library. EST libraries which included above-average levels of contamination, such as DNA contamination for example, were eliminated. The presence of such contamination was determined as follows. For each library, the number of unspliced ESTs that are not fully contained within other spliced sequences was counted. If the percentage of such sequences (as compared to all other sequences) was at least 4 standard deviations above the average for all libraries being analyzed, this library was tagged as being contaminated and was eliminated from further consideration in the below analysis (see also Sorek, R. & Safer, H.M. A novel algorithm for computational identification of contaminated EST libraries. Nucleic Acids Res 31, 1067-74 (2003) for further details).
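The contamination rule above (a library is eliminated when its percentage of unspliced, non-contained ESTs is at least 4 standard deviations above the average for all libraries) can be sketched as follows. This is an illustrative sketch only: the function name and library identifiers are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def contaminated_libraries(unspliced_pct, n_sd=4.0):
    """Return names of libraries at or above mean + n_sd * SD.

    unspliced_pct maps a library name to its percentage of unspliced ESTs
    that are not fully contained within other spliced sequences.
    """
    values = list(unspliced_pct.values())
    sd = pstdev(values)  # assumption: population SD over analyzed libraries
    if sd == 0:
        return []  # identical percentages: no library stands out
    cutoff = mean(values) + n_sd * sd
    return sorted(name for name, pct in unspliced_pct.items() if pct >= cutoff)


# Hypothetical data: 29 typical libraries and one heavily contaminated one.
example = {f"lib{i}": 5.0 for i in range(29)}
example["lib_bad"] = 60.0
flagged = contaminated_libraries(example)
```

With a single outlier among n libraries, the outlier can lie at most about sqrt(n - 1) population standard deviations above the mean, so a 4 SD cutoff is only meaningful when many libraries are analyzed together.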
Clusters (genes) having at least five sequences including at least two sequences from the tissue of interest were analyzed. Splice variants were identified by using the LEADS software package as described above.
EXAMPLE 2
Identification of heart tissue specific genes
For detection of heart tissue specific clusters, heart tissue libraries/sequences were compared to the total number of libraries/sequences in the cluster and in GenBank, and to the relevant numbers for muscle tissue libraries/sequences. Statistical tools were employed to identify clusters that were heart tissue specific, both as compared to all other tissues and also in comparison to muscle tissue. The algorithm - for each tested tissue T and for each tested cluster the following were examined:
1. Each cluster includes at least 2 libraries from the tissue T. At least 3 clones (weighted - as described above) from tissue T in the cluster;
2. The following equation was then used to determine heart tissue-specific expression as compared to expression in all tissue types for a particular cluster: $\frac{t/T}{(n-t)/(N-T)}$, in which n is the total number of ESTs available for a cluster, while N is the total number of ESTs available in all of the libraries considered in the analysis (effectively all ESTs in GenBank, except for those that were rejected as belonging to contaminated libraries); t and T are as defined in item 3 below. This ratio was preferably set to be at least about 8, although optionally the ratio could be set to be at least about 5.
3. The following equation was then used to determine heart tissue-specific expression vs. expression in skeletal muscle tissue for a particular cluster: $\frac{t/T}{m/M}$, in which t represents the number of heart tissue-specific ESTs for the cluster, while T is the number of all heart tissue-specific ESTs in the analysis; m is the number of skeletal muscle tissue-specific ESTs for the cluster, while M is the number of all skeletal muscle tissue-specific ESTs in the analysis. This ratio was preferably set to be at least about 4, although optionally the ratio could be set to be at least about 2.
4. Fisher exact test P-values were computed for weighted clone counts to check that the counts are statistically significant according to the following function: F(t,T,n,N) which is the probability of a cluster actually being overexpressed in heart tissue, as compared to its overall level of expression. The P-value was preferably set to be less than about 1e-5, although optionally it could be set to be less than about 1e-3.
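The selection steps above can be illustrated with a short sketch using the symbols t, T, m, M, n and N as defined in the text. Several points are assumptions made for illustration only: the ratio of step 2 is taken as (t/T)/((n-t)/(N-T)) and that of step 3 as (t/T)/(m/M); F(t,T,n,N) is taken to be a one-sided Fisher exact (hypergeometric tail) probability; and integer counts are used, whereas the text refers to weighted clone counts. All function names and the example counts are hypothetical.

```python
from math import comb


def enrichment_ratio(t, T, other, OTHER):
    """(t / T) / (other / OTHER); infinite when the comparison count is 0."""
    if other == 0:
        return float("inf")
    return (t / T) / (other / OTHER)


def fisher_one_sided(t, T, n, N):
    """P(X >= t) when n ESTs are drawn from N, of which T are heart ESTs."""
    upper = min(n, T)
    favourable = sum(comb(T, k) * comb(N - T, n - k) for k in range(t, upper + 1))
    return favourable / comb(N, n)


def is_heart_specific(t, T, m, M, n, N, min_all=8, min_muscle=4, max_p=1e-5):
    """Apply the two ratio thresholds and the P-value threshold together."""
    vs_all = enrichment_ratio(t, T, n - t, N - T)   # assumed form of step 2
    vs_muscle = enrichment_ratio(t, T, m, M)        # assumed form of step 3
    return (vs_all >= min_all
            and vs_muscle >= min_muscle
            and fisher_one_sided(t, T, n, N) < max_p)


# Hypothetical counts: 20 of the 25 ESTs in the cluster come from heart.
specific = is_heart_specific(t=20, T=1000, m=1, M=2000, n=25, N=100000)
# Hypothetical evenly expressed cluster: heart ESTs at the background rate.
housekeeping = is_heart_specific(t=1, T=1000, m=2, M=2000, n=100, N=100000)
```

The default thresholds (8, 4 and 1e-5) are the preferred values stated above; the optional looser values (5, 2 and 1e-3) can be passed as arguments.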
SELECTING CANDIDATES WITH REGARD TO CANCER
A brief explanation is provided with regard to a non-limiting method of selecting the candidates for cancer diagnostics. However, it should be noted that this explanation is provided for descriptive purposes only, and is not intended to be limiting in any way. The potential markers were identified by a computational process that was designed to find genes and/or their splice variants that are over-expressed in tumor tissues, by using databases of expressed sequences. Various parameters related to the information in the EST libraries, determined according to a manual classification process, were used to assist in locating genes and/or splice variants thereof that are over-expressed in cancerous tissues. The detailed description of the selection method is presented in Example 1 below. The cancer biomarkers selection engine and the following wet validation stages are schematically summarized in Figure 1.
PART II - Cancer markers
EXAMPLE 1
Identification of differentially expressed gene products - Algorithm
In order to distinguish between differentially expressed gene products and constitutively expressed genes (i.e., house keeping genes) an algorithm based on an analysis of frequencies was configured. A specific algorithm for identification of transcripts over expressed in cancer is described hereinbelow.
Dry analysis
Library annotation - EST libraries are manually classified according to:
(i) Tissue origin
(ii) Biological source - Examples of frequently used biological sources for construction of EST libraries include cancer cell-lines; normal tissues; cancer tissues; fetal tissues; and others such as normal cell lines and pools of normal cell-lines, cancer cell-lines and combinations thereof. A specific description of abbreviations used below with regard to these tissues/cell lines etc is given above.
(iii) Protocol of library construction - various methods are known in the art for library construction including normalized library construction; non-normalized library construction; subtracted libraries; ORESTES and others. It will be appreciated that at times the protocol of library construction is not indicated.
The following rules are followed: EST libraries originating from identical biological samples are considered as a single library. EST libraries which include above-average levels of DNA contamination are eliminated.
Dry computation - development of engines which are capable of identifying genes and splice variants that are temporally and spatially expressed. Clusters (genes) having at least five sequences including at least two sequences from the tissue of interest are analyzed.
EXAMPLE 2
Identification of genes over expressed in cancer.
Two different scoring algorithms were developed.
Libraries score - candidate sequences which are supported by a number of cancer libraries are more likely to serve as specific and effective diagnostic markers. The basic algorithm - for each cluster the number of cancer and normal libraries contributing sequences to the cluster was counted. Fisher exact test was used to check if cancer libraries are significantly over-represented in the cluster as compared to the total number of cancer and normal libraries. Library counting: Small libraries (e.g., less than 1000 sequences) were excluded from consideration unless they participate in the cluster. For this reason, the total number of libraries is actually adjusted for each cluster.
Clones no. score - Generally, when the number of ESTs is much higher in the cancer libraries relative to the normal libraries it might indicate actual over-expression. The algorithm - Clone counting: For counting EST clones each library protocol class was given a weight based on our belief of how much the protocol reflects actual expression levels:
(i) non-normalized: 1
(ii) normalized: 0.2
(iii) all other classes: 0.1
Clones number score - The total weighted number of EST clones from cancer libraries was compared to the EST clones from normal libraries. To avoid cases where one library contributes to the majority of the score, the contribution of the library that gives most clones for a given cluster was limited to 2 clones. The score was computed as
where:
c - weighted number of "cancer" clones in the cluster.
C - weighted number of clones in all "cancer" libraries.
n - weighted number of "normal" clones in the cluster.
N - weighted number of clones in all "normal" libraries.
Clones number score significance - Fisher exact test was used to check if EST clones from cancer libraries are significantly over-represented in the cluster as compared to the total number of EST clones from cancer and normal libraries. Two search approaches were used to find either general cancer-specific candidates or tumor specific candidates.
• Libraries/sequences originating from tumor tissues are counted as well as libraries originating from cancer cell-lines ("normal" cell-lines were ignored).
• Only libraries/sequences originating from tumor tissues are counted
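The weighted clone counting described above can be sketched as follows, using the stated protocol weights (1, 0.2 and 0.1) and the limit of 2 clones for the library contributing the most clones. This is a sketch under assumptions: the limit is applied here to the weighted contribution of the single largest library, which is one reading of the text, and the function name and example library data are hypothetical. The score itself is not computed here.

```python
PROTOCOL_WEIGHTS = {"non-normalized": 1.0, "normalized": 0.2}
OTHER_WEIGHT = 0.1  # weight for all other library protocol classes


def weighted_clone_count(libraries, cap=2.0):
    """Weighted clone count for one cluster.

    libraries is a list of (protocol, clone_count) pairs, one per library.
    Assumption: the largest weighted contribution is limited to `cap`.
    """
    contributions = [PROTOCOL_WEIGHTS.get(protocol, OTHER_WEIGHT) * count
                     for protocol, count in libraries]
    if not contributions:
        return 0.0
    largest = max(contributions)
    return sum(contributions) - largest + min(largest, cap)


# Hypothetical cluster: one dominant non-normalized library and two others.
cancer_libraries = [("non-normalized", 10), ("normalized", 5), ("subtracted", 10)]
c = weighted_clone_count(cancer_libraries)
```

In this example the dominant library would contribute 10 weighted clones but is limited to 2, so it no longer accounts for the majority of the count.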
EXAMPLE 3
Identification of tissue specific genes
For detection of tissue specific clusters, tissue libraries/sequences were compared to the total number of libraries/sequences in the cluster. Statistical tools similar to those described above were employed to identify tissue specific genes. Tissue abbreviations are the same as for cancerous tissues, but are indicated with the header "normal tissue". The algorithm - for each tested tissue T and for each tested cluster the following were examined:
1. Each cluster includes at least 2 libraries from the tissue T. At least 3 clones (weighted - as described above) from tissue T in the cluster; and
2. Clones from the tissue T are at least 40% of all the clones participating in the tested cluster.
Fisher exact test P-values were computed both for library and weighted clone counts to check that the counts are statistically significant.
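The two threshold criteria above can be expressed as a single predicate. This is a minimal sketch with a hypothetical function name and hypothetical example counts; the Fisher exact test significance check is not included.

```python
def passes_tissue_filters(tissue_libraries, tissue_clones, cluster_clones,
                          min_libraries=2, min_clones=3, min_fraction=0.40):
    """Check the library, clone and clone-fraction thresholds for tissue T.

    tissue_clones and cluster_clones are weighted clone counts.
    """
    if cluster_clones <= 0:
        return False
    return (tissue_libraries >= min_libraries
            and tissue_clones >= min_clones
            and tissue_clones / cluster_clones >= min_fraction)


# Hypothetical cluster: 5 of 10 weighted clones from tissue T, 2 libraries.
candidate = passes_tissue_filters(2, 5.0, 10.0)
```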
EXAMPLE 4
Identification of splice variants over expressed in cancer of clusters which are not over expressed in cancer
Cancer-specific splice variants containing a unique region were identified.
Identification of unique sequence regions in splice variants
A "region" is defined as a group of adjacent exons that always appear or do not appear together in each splice variant. A "segment" (sometimes also referred to as "seg" or "node") is defined as the shortest contiguous transcribed region without known splicing inside. Only reliable ESTs were considered for region and segment analysis. An EST was defined as unreliable if:
(i) Unspliced;
(ii) Not covered by RNA;
(iii) Not covered by spliced ESTs; and
(iv) Alignment to the genome ends in proximity of long poly-A stretch or starts in proximity of long poly-T stretch.
Only reliable regions were selected for further scoring. Unique sequence regions were considered reliable if:
(i) Aligned to the genome; and
(ii) Regions supported by more than 2 ESTs.
The algorithm
Each unique sequence region divides the set of transcripts into 2 groups:
(i) Transcripts containing this region (group TA).
(ii) Transcripts not containing this region (group TB).
The set of EST clones of every cluster is divided into 3 groups:
(i) Supporting (originating from) transcripts of group TA (S1).
(ii) Supporting transcripts of group TB (S2).
(iii) Supporting transcripts from both groups (S3).
Library and clones number scores described above were given to the S1 group. Fisher Exact Test P-values were used to check if: S1 is significantly enriched by cancer EST clones compared to S2; and S1 is significantly enriched by cancer EST clones compared to cluster background (S1+S2+S3). Identification of unique sequence regions and division of the group of transcripts accordingly is illustrated in Figure 2. Each of these unique sequence regions corresponds to a segment, also termed herein a "node".
Region 1: common to all transcripts, thus it is not considered;
Region 2: specific to Transcript 1: T_1 unique regions (2+6) against T_2+3 unique regions (3+4);
Region 3: specific to Transcripts 2+3: T_2+3 unique regions (3+4) against T_1 unique regions (2+6);
Region 4: specific to Transcript 3: T_3 unique regions (4) against T_1+2 unique regions (2+5+6);
Region 5: specific to Transcripts 1+2: T_1+2 unique regions (2+5+6) against T_3 unique regions (4);
Region 6: specific to Transcript 1: same as region 2.
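The division of transcripts into groups TA and TB, and of EST clones into S1, S2 and S3, can be sketched as follows. The transcript-to-region assignments are derived from the region list above (Transcript 1 contains regions 1, 2, 5 and 6; Transcript 2 contains regions 1, 3 and 5; Transcript 3 contains regions 1, 3 and 4). The function name and the EST clone identifiers are hypothetical and are used for illustration only.

```python
def partition_clones(transcripts, clone_support, region):
    """Split clones into S1, S2 and S3 for one unique sequence region.

    transcripts maps a transcript name to the set of regions it contains.
    clone_support maps a clone id to the set of transcripts it supports.
    """
    ta = {name for name, regions in transcripts.items() if region in regions}
    groups = {"S1": set(), "S2": set(), "S3": set()}
    for clone, supported in clone_support.items():
        in_ta = bool(supported & ta)   # supports a transcript with the region
        in_tb = bool(supported - ta)   # supports a transcript without it
        if in_ta and in_tb:
            groups["S3"].add(clone)
        elif in_ta:
            groups["S1"].add(clone)
        elif in_tb:
            groups["S2"].add(clone)
    return groups


TRANSCRIPTS = {"T_1": {1, 2, 5, 6}, "T_2": {1, 3, 5}, "T_3": {1, 3, 4}}
# Hypothetical clones and the transcripts each one is compatible with.
CLONES = {"est_a": {"T_3"}, "est_b": {"T_1", "T_2"}, "est_c": {"T_1", "T_2", "T_3"}}
groups_region4 = partition_clones(TRANSCRIPTS, CLONES, region=4)
```

For region 4, group TA contains only Transcript 3, so a clone supporting Transcript 3 alone falls in S1, while a clone compatible with all three transcripts falls in S3.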
EXAMPLE 5
Identification of cancer specific splice variants of genes over expressed in cancer
A search was performed for EST supported (no mRNA) regions for genes of:
(i) Known cancer markers;
(ii) Genes shown to be over-expressed in cancer in published micro-array experiments.
Reliable EST supported regions were defined as supported by a minimum of one of the following:
(i) 3 spliced ESTs; or
(ii) 2 spliced ESTs from 2 libraries; or
(iii) 10 unspliced ESTs from 2 libraries; or
(iv) 3 libraries.
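The four alternative support criteria above can be combined into one predicate. This is a sketch under assumptions: "from 2 libraries" is read as at least two distinct libraries, each count is read as a minimum, and criterion (iv) is read as ESTs of any kind originating from at least three distinct libraries. The function name and library identifiers are hypothetical.

```python
def is_reliable_region(ests):
    """Return True when a region meets at least one support criterion.

    ests is an iterable of (library_id, is_spliced) pairs, one per EST
    supporting the region.
    """
    ests = list(ests)
    spliced = [lib for lib, is_spliced in ests if is_spliced]
    unspliced = [lib for lib, is_spliced in ests if not is_spliced]
    return (
        len(spliced) >= 3                                       # (i)
        or (len(spliced) >= 2 and len(set(spliced)) >= 2)       # (ii)
        or (len(unspliced) >= 10 and len(set(unspliced)) >= 2)  # (iii)
        or len({lib for lib, _ in ests}) >= 3                   # (iv)
    )


# Hypothetical support: two spliced ESTs from two different libraries.
reliable = is_reliable_region([("libA", True), ("libB", True)])
```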
Oligonucleotide-based micro-array experiment protocol-
Microarray fabrication
Microarrays (chips) were printed by pin deposition using the MicroGrid II MGII 600 robot from BioRobotics Limited (Cambridge, UK). 50-mer oligonucleotide target sequences were designed by Compugen Ltd (Tel-Aviv, IL) as described by A. Shoshan et al., "Optical technologies and informatics", Proceedings of SPIE. Vol 4266, pp. 86-95 (2001). The designed oligonucleotides were synthesized and purified by desalting with the Sigma-Genosys system (The Woodlands, TX, US) and all of the oligonucleotides were joined to a C6 amino-modified linker at the 5' end, or were attached directly to CodeLink slides (Cat #25-6700-01, Amersham Bioscience, Piscataway, NJ, US). The 50-mer oligonucleotides, forming the target sequences, were first suspended in Ultra-pure DDW (Cat # 01-866- 1 A Kibbutz Beit-Haemek, Israel) to a concentration of 50μM. Before printing the slides, the oligonucleotides were resuspended in 300mM sodium phosphate (pH 8.5) to final concentration of 150mM and printed at 35-40% relative humidity at 21°C. Each slide contained a total of 9792 features in 32 subarrays. Of these features, 4224 features were sequences of interest according to the present invention and negative controls that were printed in duplicate. An additional 288 features (96 target sequences printed in triplicate) contained housekeeping genes from Human Evaluation Library2, Compugen Ltd, Israel.
Another 384 features are E. coli spikes 1-6, which are oligos to E. coli genes which are commercially available in the Array Control product (Array control - sense oligo spots, Ambion Inc. Austin, TX. Cat #1781, Lot #112K06).
Post-coupling processing of printed slides
After the spotting of the oligonucleotides to the glass (CodeLink) slides, the slides were incubated for 24 hours in a sealed saturated NaCl humidification chamber (relative humidity 70-75%). Slides were treated for blocking of the residual reactive groups by incubating them in blocking solution at 50°C for 15 minutes (10 ml/slide of buffer containing 0.1M Tris, 50mM ethanolamine, 0.1% SDS). The slides were then rinsed twice with Ultra-pure DDW (double distilled water). The slides were then washed with wash solution (10 ml/slide; 4X SSC, 0.1% SDS) at 50°C for 30 minutes on the shaker. The slides were then rinsed twice with Ultra-pure DDW, followed by drying by centrifugation for 3 minutes at 800 rpm. Next, in order to assist in automatic operation of the hybridization protocol, the slides were treated with Ventana Discovery hybridization station barcode adhesives. The printed slides were loaded on a Bio-Optica (Milan, Italy) hematology staining device and were incubated for 10 minutes in 50ml of 3-Aminopropyl Triethoxysilane (Sigma A3648 lot #122K589). Excess fluid was dried and slides were then incubated for three hours in 20 mmHg in a dark vacuum desiccator (Pelco 2251, Ted Pella, Inc. Redding CA).
The following protocol was then followed with the Genisphere 900-RP (random primer), with mini elute columns on the Ventana Discovery HybStation™, to perform the microarray experiments. Briefly, the protocol was performed as described with regard to the instructions and information provided with the device itself. The protocol included cDNA synthesis and labeling. cDNA concentration was measured with the TBS-380 (Turner Biosystems, Sunnyvale, CA) PicoFluor, which is used with the OliGreen ssDNA Quantitation reagent and kit. Hybridization was performed with the Ventana Hybridization device, according to the provided protocols (Discovery Hybridization Station, Tucson AZ).
The slides were then scanned with GenePix 4000B dual laser scanner from Axon Instruments Inc, and analyzed by GenePix Pro 5.0 software. Schematic summary of the oligonucleotide based microarray fabrication and the experimental flow is presented in Figures 3 and 4. Briefly, as shown in Figure 3, DNA oligonucleotides at 25uM were deposited (printed) onto Amersham 'CodeLink' glass slides generating a well defined 'spot'. These slides are covered with a long-chain, hydrophilic polymer chemistry that creates an active 3-D surface that covalently binds the DNA oligonucleotides' 5'-end via the C6-amine modification. This binding ensures that the full length of the DNA oligonucleotides is available for hybridization to the cDNA and also allows lower background, high sensitivity and reproducibility.
Figure 4 shows a schematic method for performing the microarray experiments. It should be noted that stages on the left-hand or right-hand side may optionally be performed in any order, including in parallel, until stage 4 (hybridization). Briefly, on the left-hand side, the target oligonucleotides are being spotted on a glass microscope slide (although optionally other materials could be used) to form a spotted slide (stage 1). On the right hand side, control sample RNA and cancer sample RNA are Cy3 and Cy5 labeled, respectively (stage 2), to form labeled probes. It should be noted that the control and cancer samples come from corresponding tissues (for example, normal prostate tissue and cancerous prostate tissue). Furthermore, the tissue from which the RNA was taken is indicated below in the specific examples of data for particular clusters, with regard to overexpression of an oligonucleotide from a "chip" (microarray), as for example "prostate" for chips in which prostate cancerous tissue and normal tissue were tested as described above. In stage 3, the probes are mixed. In stage 4, hybridization is performed to form a processed slide. In stage 5, the slide is washed and scanned to form an image file, followed by data analysis in stage 6.
SECTION 1: VARIANTS OF KNOWN SERUM MARKERS
DESCRIPTION FOR CLUSTER HSGROW1
Cluster HSGROW1 features 5 transcript(s) and 19 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
HSGROW1_PEA_1_PEA_1_T5
HSGROW1_PEA_1_PEA_1_T8
HSGROW1_PEA_1_PEA_1_T10
HSGROW1_PEA_1_PEA_1_T11
HSGROW1_PEA_1_PEA_1_T16
Table 2 - Segments of interest
HSGROW1_PEA_1_PEA_1_node_2
HSGROW1_PEA_1_PEA_1_node_4
HSGROW1_PEA_1_PEA_1_node_15
HSGROW1_PEA_1_PEA_1_node_18
HSGROW1_PEA_1_PEA_1_node_0     10
HSGROW1_PEA_1_PEA_1_node_3     11
HSGROW1_PEA_1_PEA_1_node_5     12
HSGROW1_PEA_1_PEA_1_node_6     13
HSGROW1_PEA_1_PEA_1_node_7     14
HSGROW1_PEA_1_PEA_1_node_8     15
HSGROW1_PEA_1_PEA_1_node_9     16
HSGROW1_PEA_1_PEA_1_node_11    17
HSGROW1_PEA_1_PEA_1_node_12
HSGROW1_PEA_1_PEA_1_node_13    19
HSGROW1_PEA_1_PEA_1_node_14    20
HSGROW1_PEA_1_PEA_1_node_16    21
HSGROW1_PEA_1_PEA_1_node_17    22
HSGROW1_PEA_1_PEA_1_node_19    23
HSGROW1_PEA_1_PEA_1_node_20    24
Table 3 - Proteins of interest
HSGROW1_PEA_1_PEA_1_P17    26    HSGROW1_PEA_1_PEA_1_T5
HSGROW1_PEA_1_PEA_1_P18    27    HSGROW1_PEA_1_PEA_1_T8
HSGROW1_PEA_1_PEA_1_P9     28    HSGROW1_PEA_1_PEA_1_T10
HSGROW1_PEA_1_PEA_1_P10    29    HSGROW1_PEA_1_PEA_1_T11
HSGROW1_PEA_1_PEA_1_P15    30    HSGROW1_PEA_1_PEA_1_T16
These sequences are variants of the known protein Somatotropin precursor (SwissProt accession identifier SOMA_HUMAN; known also according to the synonyms Growth hormone; GH; GH-N; Pituitary growth hormone; Growth hormone 1), SEQ ID NO: 25, referred to herein as the previously known protein. Protein Somatotropin precursor is known or believed to have the following function(s): plays an important role in growth control. Its major role in stimulating body growth is to stimulate the liver and other tissues to secrete IGF-1. It stimulates both the differentiation and proliferation of myoblasts. It also stimulates amino acid uptake and protein synthesis in muscle and other tissues. Optional but preferred tests to be performed with this protein comprise GH (Hormones) Growth Hormone and Endocrine syndromes. Growth hormone (GH) tests are used to identify diseases and conditions caused by deficiencies and overproduction of GH, to evaluate pituitary function, and to monitor the effectiveness of GH treatment. GH testing is usually ordered on those with symptoms of growth hormone abnormalities or as a follow-up to other abnormal test results. GH tests may be ordered to help evaluate pituitary function (usually GH stimulation tests are used to screen for hypopituitarism and GH suppression tests are used to screen for hyperpituitarism). GH testing may be used to evaluate the long-term effects of chemotherapy on pituitary function in children who undergo such treatment.
The sequence for protein Somatotropin precursor is given at the end of the application, as "Somatotropin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
SNP position(s) on amino acid sequence    Comment
Protein Somatotropin precursor localization is believed to be Secreted.
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Diabetes, Type II; Acromegaly; Sex-chromosome abnormality, Turner's syndrome; Growth hormone deficiency; Dwarfism; Burns; Cachexia; Osteoporosis; Uraemia; Short-bowel syndrome; Lipodystrophy; Infertility, female; Regeneration, bone; Wound healing. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Growth factor agonist; Growth hormone releasing factor agonist; Growth hormone modulator. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Antidiabetic; Symptomatic antidiabetic; Urological; Somatostatin; Anticancer; Ophthalmological; Growth hormone; Reproductive/gonadal, general; Musculoskeletal; Gene therapy; GI inflammatory/bowel disorders; Hypolipaemic/Antiatherosclerosis; Anabolic; Fertility enhancer; Vulnerary; Releasing hormone; Alimentary/Metabolic; Anorectic/Antiobesity.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction, which are annotation(s) related to Biological Process; and hormone; peptide hormone, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSGROW1 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Somatotropin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSGROW1_PEA_1_PEA_1_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSGROW1_PEA_1_PEA_1_T5. An alignment is given to the known protein (Somatotropin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSGROW1_PEA_1_PEA_1_P17 and SOMA_HUMAN:
1. An isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P17, comprising a first amino acid sequence being at least 90 % homologous to MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF corresponding to amino acids 1 - 57 of SOMA_HUMAN, which also corresponds to amino acids 1 - 57 of HSGROW1_PEA_1_PEA_1_P17, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSSWGMGAHQGWQEGVTFPRWEIRGGD corresponding to amino acids 58 - 84 of HSGROW1_PEA_1_PEA_1_P17, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSGROW1_PEA_1_PEA_1_P17, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSSWGMGAHQGWQEGVTFPRWEIRGGD in HSGROW1_PEA_1_PEA_1_P17.
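The head-plus-tail structure described in the comparison report can be illustrated with a minimal sketch. The two sequences are copied from the report above; the helper name `region` and the variable names are hypothetical and are used only for illustration.

```python
# Sequences copied from the comparison report above.
HEAD = "MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF"  # amino acids 1-57
TAIL = "VSSWGMGAHQGWQEGVTFPRWEIRGGD"  # amino acids 58-84


def region(seq: str, start: int, end: int) -> str:
    """Return amino acids start..end of seq (1-based, inclusive). Hypothetical helper."""
    return seq[start - 1:end]


# The two parts are contiguous and in sequential order, so the variant is head + tail.
p17 = HEAD + TAIL

print(len(HEAD), len(TAIL), len(p17))
print(region(p17, 1, 57) == HEAD, region(p17, 58, 84) == TAIL)
```

With the sequences as printed in the report, the head is 57 amino acids long and the tail 27, which agrees with the stated ranges 1 - 57 and 58 - 84.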
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSGROW1_PEA_1_PEA_1_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
SNP position(s) on amino acid sequence    Alternative amino acid(s)    Previously known SNP?
                                          T -> A                       Yes
The phosphorylation sites of variant protein HSGROW1_PEA_1_PEA_1_P17, as compared to the known protein Somatotropin precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 6 - Phosphorylation site(s)
Variant protein HSGROW1_PEA_1_PEA_1_P17 is encoded by the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSGROW1_PEA_1_PEA_1_T5 is shown in bold; this coding portion starts at position 109 and ends at position 360. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HSGROW1_PEA_1_PEA_1_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSGROW1_PEA_1_PEA_1_T8. An alignment is given to the known protein (Somatotropin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSGROW1_PEA_1_PEA_1_P18 and SOMA_HUMAN:
1. An isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P18, comprising a first amino acid sequence being at least 90 % homologous to MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQ corresponding to amino acids 1 - 95 of SOMA_HUMAN, which also corresponds to amino acids 1 - 95 of HSGROW1_PEA_1_PEA_1_P18, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence T corresponding to amino acids 96 - 96 of HSGROW1_PEA_1_PEA_1_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSGROW1_PEA_1_PEA_1_P18 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
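The reasoning given for the "secreted" localization (both signal-peptide predictors positive, neither trans-membrane predictor positive) can be written as a small decision rule. The sketch below is an illustration only: the function name, the boolean inputs and the "undetermined" fallback are assumptions, since the text does not describe how the prediction programs are invoked or how other outcomes are labeled.

```python
from typing import Sequence


def predict_localization(signal_peptide_calls: Sequence[bool],
                         transmembrane_calls: Sequence[bool]) -> str:
    """Apply the rule stated in the text (hypothetical helper).

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    transmembrane_calls:  one boolean per trans-membrane prediction program.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        # No predictions available, so the rule cannot be applied.
        return "undetermined"
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # The text does not say how other combinations are labeled (assumption).
    return "undetermined"


# Both signal-peptide programs agree, neither trans-membrane program fires.
print(predict_localization([True, True], [False, False]))
```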
The phosphorylation sites of variant protein HSGROW1_PEA_1_PEA_1_P18, as compared to the known protein Somatotropin precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Phosphorylation site(s)
Position(s) on known amino acid sequence    Present in variant protein?
178                                         no
163                                         no
Variant protein HSGROW1_PEA_1_PEA_1_P18 is encoded by the following transcript(s): HSGROW1_PEA_1_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSGROW1_PEA_1_PEA_1_T8 is shown in bold; this coding portion starts at position 109 and ends at position 396. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HSGROW1_PEA_1_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSGROW1_PEA_1_PEA_1_T10. An alignment is given to the known protein (Somatotropin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSGROW1_PEA_1_PEA_1_P9 and SOMA_HUMAN:
1. An isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P9, comprising a first amino acid sequence being at least 90 % homologous to MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQKSNLELLRISLLLIQSWLEPVQFLRSVFANSLVYGASDSNVYDLLKDLEEGIQTLMG corresponding to amino acids 1 - 152 of SOMA_HUMAN, which also corresponds to amino acids 1 - 152 of HSGROW1_PEA_1_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRVAPGVPNPGAPLTLRAVLEKHCCPLFSSQALTQENSPYSSFPLVNPPGLSLHPEGEGGK corresponding to amino acids 153 - 213 of HSGROW1_PEA_1_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSGROW1_PEA_1_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRVAPGVPNPGAPLTLRAVLEKHCCPLFSSQALTQENSPYSSFPLVNPPGLSLHPEGEGGK in HSGROW1_PEA_1_PEA_1_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSGROW1_PEA_1_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
The phosphorylation sites of variant protein HSGROW1_PEA_1_PEA_1_P9, as compared to the known protein Somatotropin precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 12 - Phosphorylation site(s)
Variant protein HSGROW1_PEA_1_PEA_1_P9 is encoded by the following transcript(s): HSGROW1_PEA_1_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSGROW1_PEA_1_PEA_1_T10 is shown in bold; this coding portion starts at position 109 and ends at position 747. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Variant protein HSGROW1_PEA_1_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSGROW1_PEA_1_PEA_1_T11. An alignment is given to the known protein (Somatotropin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSGROW1_PEA_1_PEA_1_P10 and SOMA_HUMAN:
1. An isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P10, comprising a first amino acid sequence being at least 90 % homologous to MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF corresponding to amino acids 1 - 57 of SOMA_HUMAN, which also corresponds to amino acids 1 - 57 of HSGROW1_PEA_1_PEA_1_P10, and a second amino acid sequence being at least 90 % homologous to LVYGASDSNVYDLLKDLEEGIQTLMGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVETFLRIVQCRSVEGSCGF corresponding to amino acids 127 - 217 of SOMA_HUMAN, which also corresponds to amino acids 58 - 148 of HSGROW1_PEA_1_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSGROW1_PEA_1_PEA_1_P10, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FL, having a structure as follows: a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 58+ ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSGROW1_PEA_1_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
The phosphorylation sites of variant protein HSGROW1_PEA_1_PEA_1_P10, as compared to the known protein Somatotropin precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 15 - Phosphorylation site(s)
Variant protein HSGROW1_PEA_1_PEA_1_P10 is encoded by the following transcript(s): HSGROW1_PEA_1_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSGROW1_PEA_1_PEA_1_T11 is shown in bold; this coding portion starts at position 109 and ends at position 552. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Variant protein HSGROW1_PEA_1_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSGROW1_PEA_1_PEA_1_T16. An alignment is given to the known protein (Somatotropin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSGROW1_PEA_1_PEA_1_P15 and SOMA_HUMAN:
1. An isolated chimeric polypeptide encoding for HSGROW1_PEA_1_PEA_1_P15, comprising a first amino acid sequence being at least 90 % homologous to MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF corresponding to amino acids 1 - 57 of SOMA_HUMAN, which also corresponds to amino acids 1 - 57 of HSGROW1_PEA_1_PEA_1_P15, and a second amino acid sequence being at least 90 % homologous to RLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVETFLRIVQCRSVEGSCGF corresponding to amino acids 153 - 217 of SOMA_HUMAN, which also corresponds to amino acids 58 - 122 of HSGROW1_PEA_1_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSGROW1_PEA_1_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise FR, having a structure as follows: a sequence starting from any of amino acid numbers 57-x to 57; and ending at any of amino acid numbers 58+ ((n-2) - x), in which x varies from 0 to n-2.
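The edge-portion definition (start at 57-x, end at 58+((n-2)-x), with x from 0 to n-2) describes a window of length n sliding across the junction between amino acids 57 and 58. The following minimal sketch enumerates those windows for the P15 variant, using the two sequences from the comparison report above. The function name `edge_windows` and the choice n = 10 are assumptions made for illustration.

```python
# Sequences copied from the comparison report above.
HEAD = "MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLAFDTYQEF"  # amino acids 1-57
TAIL = "RLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVETFLRIVQCRSVEGSCGF"  # 58-122
P15 = HEAD + TAIL
JUNCTION = 57  # last amino acid before the junction (1-based)


def edge_windows(seq: str, junction: int, n: int):
    """List (start, end, peptide) for every x in 0..n-2, 1-based inclusive.

    Hypothetical helper: start = junction - x, end = junction + 1 + (n - 2) - x.
    Windows that would run past either end of the sequence are skipped.
    """
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = junction + 1 + (n - 2) - x
        if start < 1 or end > len(seq):
            continue
        windows.append((start, end, seq[start - 1:end]))
    return windows


windows = edge_windows(P15, JUNCTION, n=10)
for start, end, peptide in windows:
    print(start, end, peptide)
```

Every window has length n and contains the junction pair (FR for this variant): at x = 0 the window starts with it, and at x = n-2 the window ends with it.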
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSGROW1_PEA_1_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
The phosphorylation sites of variant protein HSGROW1_PEA_1_PEA_1_P15, as compared to the known protein Somatotropin precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 18 - Phosphorylation site(s)
Variant protein HSGROW1_PEA_1_PEA_1_P15 is encoded by the following transcript(s): HSGROW1_PEA_1_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSGROW1_PEA_1_PEA_1_T16 is shown in bold; this coding portion starts at position 109 and ends at position 474. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSGROW1_PEA_1_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Nucleic acid SNPs
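The coding-portion coordinates stated for the five transcripts can be cross-checked against the variant protein lengths implied by the comparison reports. The sketch below assumes that the start and end positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; both are assumptions, although they are consistent with the numbers in the text. The dictionary and function names are hypothetical.

```python
# (coding start, coding end, protein length from the comparison reports)
CODING_PORTIONS = {
    "HSGROW1_PEA_1_PEA_1_T5": (109, 360, 84),    # P17: amino acids 1-84
    "HSGROW1_PEA_1_PEA_1_T8": (109, 396, 96),    # P18: amino acids 1-96
    "HSGROW1_PEA_1_PEA_1_T10": (109, 747, 213),  # P9:  amino acids 1-213
    "HSGROW1_PEA_1_PEA_1_T11": (109, 552, 148),  # P10: amino acids 1-148
    "HSGROW1_PEA_1_PEA_1_T16": (109, 474, 122),  # P15: amino acids 1-122
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in a 1-based, inclusive coding span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a multiple of 3")
    return length // 3


for name, (start, end, protein_len) in CODING_PORTIONS.items():
    codons = codon_count(start, end)
    print(name, codons, codons == protein_len)
```

Under these assumptions each coding span contains exactly as many codons as the corresponding variant protein has amino acids.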
As noted above, cluster HSGROW1 features 19 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSGROW1_PEA_1_PEA_1_node_2 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10, HSGROW1_PEA_1_PEA_1_T11 and HSGROW1_PEA_1_PEA_1_T16. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_4 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_15 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T10. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_18 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10, HSGROW1_PEA_1_PEA_1_T11 and HSGROW1_PEA_1_PEA_1_T16. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSGROW1_PEA_1_PEA_1_node_0 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10, HSGROW1_PEA_1_PEA_1_T11 and HSGROW1_PEA_1_PEA_1_T16. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_3 according to the present invention can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10, HSGROW1_PEA_1_PEA_1_T11 and HSGROW1_PEA_1_PEA_1_T16. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_5 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8 and HSGROW1_PEA_1_PEA_1_T10. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_6 according to the present invention can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8 and HSGROW1_PEA_1_PEA_1_T10. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_7 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8 and HSGROW1_PEA_1_PEA_1_T10. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_8 according to the present invention can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8 and HSGROW1_PEA_1_PEA_1_T10. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_9 according to the present invention can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5 and HSGROW1_PEA_1_PEA_1_T10. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_11 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8 and HSGROW1_PEA_1_PEA_1_T10. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_12 according to the present invention can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8 and HSGROW1_PEA_1_PEA_1_T10. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_13 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8 and HSGROW1_PEA_1_PEA_1_T10. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_14 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10 and HSGROW1_PEA_1_PEA_1_T11. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10, HSGROW1_PEA_1_PEA_1_T11 and HSGROW1_PEA_1_PEA_1_T16. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10, HSGROW1_PEA_1_PEA_1_T11 and HSGROW1_PEA_1_PEA_1_T16. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HSGROW1_PEA_1_PEA_1_node_19 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10, HSGROW1_PEA_1_PEA_1_T11 and HSGROW1_PEA_1_PEA_1_T16. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Transcript name             Segment starting position    Segment ending position
HSGROW1_PEA_1_PEA_1_T5      968                          1022
HSGROW1_PEA_1_PEA_1_T8      752                          806
HSGROW1_PEA_1_PEA_1_T10     1012                         1066
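As an illustrative check on the Table 37 rows that are recoverable here, the following minimal Python sketch computes the segment length on each transcript and confirms that the same segment spans the same number of positions everywhere. It assumes that the starting and ending positions are 1-based and inclusive, which the text does not state explicitly; the names `segment_rows` and `segment_length` are hypothetical and introduced only for this example.

```python
# Rows recoverable from Table 37: (transcript name, start position, end position).
segment_rows = [
    ("HSGROW1_PEA_1_PEA_1_T5", 968, 1022),
    ("HSGROW1_PEA_1_PEA_1_T8", 752, 806),
    ("HSGROW1_PEA_1_PEA_1_T10", 1012, 1066),
]


def segment_length(start, end):
    """Length of a segment, assuming 1-based inclusive coordinates."""
    return end - start + 1


# Length of the segment as placed on each transcript.
lengths = {name: segment_length(start, end) for name, start, end in segment_rows}

# A single segment should have one length regardless of the transcript it sits on.
assert len(set(lengths.values())) == 1, lengths

for name, length in lengths.items():
    print(f"{name}: {length} positions")
```

Under the stated assumption, a mismatch between transcripts would point to a transcription error in the table rather than to a property of the segment.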
Segment cluster HSGROW1_PEA_1_PEA_1_node_20 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSGROW1_PEA_1_PEA_1_T5, HSGROW1_PEA_1_PEA_1_T8, HSGROW1_PEA_1_PEA_1_T10, HSGROW1_PEA_1_PEA_1_T11 and HSGROW1_PEA_1_PEA_1_T16. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: SOMA_HUMAN
Sequence documentation: Alignment of: HSGROW1_PEA_1_PEA_1_P17 x SOMA_HUMAN
Alignment segment 1/1:
Quality: 559.00 Escore: 0 Matching length: 57 Total length: 57
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50

51 FDTYQEF 57
51 FDTYQEF 57
Sequence name: SOMA_HUMAN
Sequence documentation: Alignment of: HSGROW1_PEA_1_PEA_1_P18 x SOMA_HUMAN
Alignment segment 1/1:
Quality: 946.00 Escore: 0 Matching length: 95 Total length: 95
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50

51 FDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQ 95
51 FDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQ 95
Sequence name: SOMA_HUMAN
Sequence documentation: Alignment of: HSGROW1_PEA_1_PEA_1_P9 x SOMA_HUMAN
Alignment segment 1/1:
Quality: 1484.00 Escore: 0 Matching length: 152 Total length: 152
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50

51 FDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQKSNLE 100
51 FDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQKSNLE 100

101 LLRISLLLIQSWLEPVQFLRSVFANSLVYGASDSNVYDLLKDLEEGIQTL 150
101 LLRISLLLIQSWLEPVQFLRSVFANSLVYGASDSNVYDLLKDLEEGIQTL 150

151 MG 152
151 MG 152
Sequence name: SOMA_HUMAN
Sequence documentation: Alignment of: HSGROW1_PEA_1_PEA_1_P10 x SOMA_HUMAN
Alignment segment 1/1:
Quality: 1351.00 Escore: 0 Matching length: 148 Total length: 217
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 68.20 Total Percent Identity: 68.20
Gaps: 1
Alignment:
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50

51 FDTYQEF 57
51 FDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQKSNLE 100

58 LVYGASDSNVYDLLKDLEEGIQTL 81
101 LLRISLLLIQSWLEPVQFLRSVFANSLVYGASDSNVYDLLKDLEEGIQTL 150

82 MGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVE 131
151 MGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVE 200

132 TFLRIVQCRSVEGSCGF 148
201 TFLRIVQCRSVEGSCGF 217
Sequence name: SOMA_HUMAN
Sequence documentation: Alignment of: HSGROW1_PEA_1_PEA_1_P15 x SOMA_HUMAN
Alignment segment 1/1:
Quality: 1107.00 Escore: 0 Matching length: 122 Total length: 217
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 56.22 Total Percent Identity: 56.22
Gaps: 1
Alignment:
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50
  ||||||||||||||||||||||||||||||||||||||||||||||||||
1 MATGSRTSLLLAFGLLCLPWLQEGSAFPTIPLSRLFDNAMLRAHRLHQLA 50

51 FDTYQEF 57
51 FDTYQEFEEAYIPKEQKYSFLQNPQTSLCFSESIPTPSNREETQQKSNLE 100

57 57
101 LLRISLLLIQSWLEPVQFLRSVFANSLVYGASDSNVYDLLKDLEEGIQTL 150

58 ..RLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVE 105
151 MGRLEDGSPRTGQIFKQTYSKFDTNSHNDDALLKNYGLLYCFRKDMDKVE 200

106 TFLRIVQCRSVEGSCGF 122
201 TFLRIVQCRSVEGSCGF 217
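The "Total Percent Identity" values reported in the alignment headers above are consistent with the matching length divided by the total length, expressed as a percentage. The short Python sketch below reproduces that arithmetic for the five HSGROW1 variant alignments. It is an interpretation inferred from the reported numbers, not a description of the alignment software itself; `total_percent` and `reported` are hypothetical names used only here.

```python
def total_percent(matching_length, total_length):
    """Matching length as a percentage of total length, rounded to two decimals."""
    return round(100.0 * matching_length / total_length, 2)


# (matching length, total length, reported Total Percent Identity) per variant,
# copied from the alignment headers against SOMA_HUMAN.
reported = {
    "HSGROW1_PEA_1_PEA_1_P17": (57, 57, 100.00),
    "HSGROW1_PEA_1_PEA_1_P18": (95, 95, 100.00),
    "HSGROW1_PEA_1_PEA_1_P9": (152, 152, 100.00),
    "HSGROW1_PEA_1_PEA_1_P10": (148, 217, 68.20),
    "HSGROW1_PEA_1_PEA_1_P15": (122, 217, 56.22),
}

for name, (matching, total, stated) in reported.items():
    computed = total_percent(matching, total)
    # The computed value should agree with the figure printed in the header.
    assert abs(computed - stated) < 0.005, (name, computed, stated)
    print(f"{name}: {matching}/{total} = {computed:.2f}%")
```

Read this way, the variants with a single gap (P10 and P15) match the known protein perfectly over the aligned residues, and the lower total percentages reflect only the portion of the 217-residue known protein that the variant does not cover.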
DESCRIPTION FOR CLUSTER T05709
Cluster T05709 features 5 transcript(s) and 24 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Glutamate carboxypeptidase II (SwissProt accession identifier FOH1_HUMAN; known also according to the synonyms EC 3.4.17.21; Membrane glutamate carboxypeptidase; PSMA; PSM; mGCP; N-acetylated-alpha-linked acidic dipeptidase I; NAALADase I; Pteroylpoly-gamma-glutamate carboxypeptidase; Folylpoly-gamma-glutamate carboxypeptidase; FGCP; Folate hydrolase 1; Prostate-specific membrane antigen), SEQ ID NO:60, referred to herein as the previously known protein. Protein Glutamate carboxypeptidase II is known or believed to have the following function(s): has both folate hydrolase and N-acetylated-alpha-linked-acidic dipeptidase (NAALADase) activity. Has a preference for tri-alpha-glutamate peptides. In the intestine, required for the uptake of folate. In the brain, modulates excitatory neurotransmission through the hydrolysis of the neuropeptide N-acetylaspartylglutamate (NAAG), thereby releasing glutamate. Stable at pH greater than 6.5. Isoforms PSM-4 and PSM-5 would appear to be physiologically irrelevant. Involved in prostate tumor progression. Also exhibits a dipeptidyl-peptidase IV type activity. In vitro, cleaves Gly-Pro-AMC.
Variants according to the present invention may optionally be used for the following test: PSMA (Cancer). Prostate-specific membrane antigen, or PSMA, is a type II transmembrane protein that is an important marker associated with prostate cancer. A monoclonal antibody directed against PSMA is the basis of a molecular imaging technique, and this test showed specific value in the detection of recurrences after local therapy for prostate cancer. PSMA overexpression is detected by immunohistochemistry in high-grade prostatic intraepithelial neoplasia and is associated with a higher Gleason score of prostate cancer. PSMA has also been found to be present in the new blood vessels formed in association with a variety of other major solid tumors.
The sequence for protein Glutamate carboxypeptidase II is given at the end of the application, as "Glutamate carboxypeptidase II amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
The localization of protein Glutamate carboxypeptidase II is believed to be as follows: Type II membrane protein; plasma membrane. The PSMA' isoform is cytoplasmic.
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: T cell stimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Immunoconjugate; Imaging agent; Anticancer; Immunostimulant. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis, which are annotation(s) related to
Biological Process; carboxypeptidase; peptidase; metallopeptidase; dipeptidase, which are annotation(s) related to Molecular Function; and membrane fraction; integral plasma membrane protein; membrane, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster T05709 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Glutamate carboxypeptidase II. A description of each variant protein according to the present invention is now provided.
Variant protein T05709_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05709_PEA_1_T2. An alignment is given to the known protein (Glutamate carboxypeptidase II) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05709_PEA_1_P3 and Q8TAY3 (SEQ ID NO: 1424):
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVH PIGYYDAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTNEVT RIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVRSFGTLKKEGWRP RRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYINADSSIEGNYTLRVDCTPLMY SLVHNLTKELKSPDEGFEGKSLYESWTKKSPSPEFSGMPRISKLGSGNDFEVFFQRLGIA SGRARYTKNWETNKFSGYPLYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELA NSIVLPFDCRDYAVVLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSE RLQDFDKS corresponding to amino acids 1 - 656 of Q8TAY3, which also corresponds to amino acids 1 - 656 of T05709_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK corresponding to amino acids 657 - 695 of T05709_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05709_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK in T05709_PEA_1_P3.
Comparison report between T05709_PEA_1_P3 and FOH1_HUMAN (SEQ ID NO:60):
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVH PIGYYDAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTNEVT RIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVRSFGTLKKEGWRP RRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYINADSSIEGNYTLRVDCTPLMY SLVHNLTKELKSPDEGFEGKSLYESWTKKSPSPEFSGMPRISKLGSGNDFEVFFQRLGIA SGRARYTKNWETNKFSGYPLYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELA NSIVLPFDCRDYAVVLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSE RLQDFDKS corresponding to amino acids 1 - 656 of FOH1_HUMAN, which also corresponds to amino acids 1 - 656 of T05709_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK corresponding to amino acids 657 - 695 of T05709_PEA_1_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05709_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK in T05709_PEA_1_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because the Signalp_hmm software predicts that this protein has a signal anchor region. Variant protein T05709_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein T05709_PEA_1_P3 is encoded by the following transcript(s): T05709_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05709_PEA_1_T2 is shown in bold; this coding portion starts at position 262 and ends at position 2346. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Variant protein T05709_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05709_PEA_1_T7. An alignment is given to the known protein (Glutamate carboxypeptidase II) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05709_PEA_1_P8 and Q8TAY3:
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVH PIGYYDAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTNEVT RIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVRSFGTLKKEGWRP RRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYINADSSIEGNYTLRVDCTPLMY SLVHNLTKE corresponding to amino acids 1 - 480 of Q8TAY3, which also corresponds to amino acids 1 - 480 of T05709_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VL corresponding to amino acids 481 - 482 of T05709_PEA_1_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between T05709_PEA_1_P8 and FOH1_HUMAN:
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVH PIGYYDAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTNEVT RIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVRSFGTLKKEGWRP RRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYINADSSIEGNYTLRVDCTPLMY SLVHNLTKE corresponding to amino acids 1 - 480 of FOH1_HUMAN, which also corresponds to amino acids 1 - 480 of T05709_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VL corresponding to amino acids 481 - 482 of T05709_PEA_1_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because the Signalp_hmm software predicts that this protein has a signal anchor region. Variant protein T05709_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein T05709_PEA_1_P8 is encoded by the following transcript(s): T05709_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05709_PEA_1_T7 is shown in bold; this coding portion starts at position 262 and ends at position 1707. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein T05709_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05709_PEA_1_T8. An alignment is given to the known protein (Glutamate carboxypeptidase II) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05709_PEA_1_P9 and Q8TAY3:
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPAN corresponding to amino acids 1 - 275 of Q8TAY3, which also corresponds to amino acids 1 - 275 of T05709_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GE corresponding to amino acids 276 - 277 of T05709_PEA_1_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
Comparison report between T05709_PEA_1_P9 and FOH1_HUMAN:
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAP GVKSYPDGWNLPGGGVQRGNILNLNGAGDPLTPGYPAN corresponding to amino acids 1 - 275 of FOH1_HUMAN, which also corresponds to amino acids 1 - 275 of T05709_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GE corresponding to amino acids 276 - 277 of T05709_PEA_1_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because the Signalp_hmm software predicts that this protein has a signal anchor region. Variant protein T05709_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein T05709_PEA_1_P9 is encoded by the following transcript(s): T05709_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05709_PEA_1_T8 is shown in bold; this coding portion starts at position 262 and ends at position 1092. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein T05709_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05709_PEA_1_T5. An alignment is given to the known protein (Glutamate carboxypeptidase II) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05709_PEA_1_P13 and Q8TAY3:
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNK corresponding to amino acids 1 - 213 of Q8TAY3, which also corresponds to amino acids 1 - 213 of T05709_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NMLIGVELQRLLVFQVFLFIQLDTMMHRSS corresponding to amino acids 214 - 243 of T05709_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05709_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NMLIGVELQRLLVFQVFLFIQLDTMMHRSS in T05709_PEA_1_P13.
Comparison report between T05709_PEA_1_P13 and FOH1_HUMAN:
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEATNITPKHNM KAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQWKEFGLDSVELAHYDVLL SYPNKTHPNYISIINEDGNEIFNTSLFEPPPPGYENVSDIVPPFSAFSPQGMPEGDLVYVNY ARTEDFFKLERDMKINCSGKIVIARYGKVFRGNK corresponding to amino acids 1 - 213 of FOH1_HUMAN, which also corresponds to amino acids 1 - 213 of T05709_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NMLIGVELQRLLVFQVFLFIQLDTMMHRSS corresponding to amino acids 214 - 243 of T05709_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05709_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NMLIGVELQRLLVFQVFLFIQLDTMMHRSS in T05709_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because the Signalp_hmm software predicts that this protein has a signal anchor region. Variant protein T05709_PEA_1_P13 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein T05709_PEA_1_P13 is encoded by the following transcript(s): T05709_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05709_PEA_1_T5 is shown in bold; this coding portion starts at position 262 and ends at position 990. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein T05709_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05709_PEA_1_T3. An alignment is given to the known protein (Glutamate carboxypeptidase II) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05709_PEA_1_P14 and Q8TAY3:
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF corresponding to amino acids 1 - 39 of Q8TAY3, which also corresponds to amino acids 1 - 39 of T05709_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKVGKRN corresponding to amino acids 40 - 46 of T05709_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05709_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKVGKRN in T05709_PEA_1_P14.
Comparison report between T05709_PEA_1_P14 and FOH1_HUMAN:
1. An isolated chimeric polypeptide encoding for T05709_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF corresponding to amino acids 1 - 39 of FOH1_HUMAN, which also corresponds to amino acids 1 - 39 of T05709_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKVGKRN corresponding to amino acids 40 - 46 of T05709_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05709_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKVGKRN in T05709_PEA_1_P14.
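Each comparison report above describes a variant as a shared first sequence (amino acids 1 through some position of the known protein) followed by a unique tail. The minimal Python sketch below checks that the tail positions stated for the five T05709 variants follow from the length of the first sequence and the length of the tail. It assumes 1-based inclusive amino acid numbering; `chimeras` and `tail_range` are hypothetical names introduced only for illustration.

```python
# (variant, length of shared first sequence, unique tail, stated last position)
chimeras = [
    ("T05709_PEA_1_P3", 656, "MSSMLQAATTSMQGSHSQEFMMLCLILKAKWTLPRPGEK", 695),
    ("T05709_PEA_1_P8", 480, "VL", 482),
    ("T05709_PEA_1_P9", 275, "GE", 277),
    ("T05709_PEA_1_P13", 213, "NMLIGVELQRLLVFQVFLFIQLDTMMHRSS", 243),
    ("T05709_PEA_1_P14", 39, "VKVGKRN", 46),
]


def tail_range(head_length, tail):
    """1-based inclusive (start, end) positions occupied by the tail."""
    return head_length + 1, head_length + len(tail)


tail_ranges = {name: tail_range(head, tail) for name, head, tail, _ in chimeras}

for name, head, tail, stated_end in chimeras:
    start, end = tail_ranges[name]
    # The computed end must equal the last position quoted in the report.
    assert end == stated_end, (name, end, stated_end)
    print(f"{name}: head 1 - {head}, tail {start} - {end} ({len(tail)} residues)")
```

The check is purely arithmetic: it confirms that the quoted ranges are internally consistent and says nothing about homology between any candidate polypeptide and these sequences.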
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because the Signalp_hmm software predicts that this protein has a signal anchor region.
Variant protein T05709_PEA_1_P14 is encoded by the following transcript(s): T05709_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05709_PEA_1_T3 is shown in bold; this coding portion starts at position 262 and ends at position 399. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05709_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T05709_PEA_1_node_1 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5, T05709_PEA_1_T7 and T05709_PEA_1_T8. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster T05709_PEA_1_node_15 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5, T05709_PEA_1_T7 and T05709_PEA_1_T8. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    775    900
T05709_PEA_1_T3    828    953
T05709_PEA_1_T5    775    900
T05709_PEA_1_T7    775    900
T05709_PEA_1_T8    775    900
Segment cluster T05709_PEA_1_node_17 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T7 and T05709_PEA_1_T8. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    901    1087
T05709_PEA_1_T3    954    1140
T05709_PEA_1_T7    901    1087
T05709_PEA_1_T8    901    1087
Segment cluster T05709_PEA_1_node_18 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T8. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster T05709_PEA_1_node_29 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5 and T05709_PEA_1_T7. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    1367    1486
T05709_PEA_1_T3    1420    1539
T05709_PEA_1_T5    1180    1299
T05709_PEA_1_T7    1367    1486
Segment cluster T05709_PEA_1_node_37 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T7. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster T05709_PEA_1_node_43 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3 and T05709_PEA_1_T5. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T5    1698    1962
Segment cluster T05709_PEA_1_node_50 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3 and T05709_PEA_1_T5. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    2228    2538
T05709_PEA_1_T3    2378    2688
T05709_PEA_1_T5    2138    2448
Segment cluster T05709_PEA_1_node_9 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5, T05709_PEA_1_T7 and T05709_PEA_1_T8. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    486    672
T05709_PEA_1_T3    539    725
T05709_PEA_1_T5    486    672
T05709_PEA_1_T7    486    672
T05709_PEA_1_T8    486    672
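The positions in Table 22 can be used to confirm that the segment has the same length on every transcript and to see how far each transcript's coordinates are shifted relative to the others. The following is a minimal sketch, not part of the original disclosure; it assumes 1-based inclusive positions, and the names `TABLE_22` and `segment_length` are hypothetical.

```python
# Start and end positions of segment T05709_PEA_1_node_9, copied from Table 22.
TABLE_22 = {
    "T05709_PEA_1_T2": (486, 672),
    "T05709_PEA_1_T3": (539, 725),
    "T05709_PEA_1_T5": (486, 672),
    "T05709_PEA_1_T7": (486, 672),
    "T05709_PEA_1_T8": (486, 672),
}


def segment_length(start, end):
    """Length of a segment given 1-based inclusive positions (assumed convention)."""
    return end - start + 1


# Length of the segment on each transcript.
lengths = {name: segment_length(*pos) for name, pos in TABLE_22.items()}

# Shift of each transcript's start position relative to T05709_PEA_1_T2.
reference_start = TABLE_22["T05709_PEA_1_T2"][0]
offsets = {name: pos[0] - reference_start for name, pos in TABLE_22.items()}

print(lengths)
print(offsets)
```

Under these assumptions the segment is 187 nucleotides long on every transcript, and only T05709_PEA_1_T3 is shifted, by 53 positions. That shift equals the length of segment T05709_PEA_1_node_5 (positions 380 to 432 in Table 36), which is listed for T05709_PEA_1_T3 only.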
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T05709_PEA_1_node_0 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5, T05709_PEA_1_T7 and T05709_PEA_1_T8. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
T05709_PEA_1_T2    113
T05709_PEA_1_T3    113
T05709_PEA_1_T5    113
T05709_PEA_1_T7    113
T05709_PEA_1_T8    113
Segment cluster T05709_PEA_1_node_12 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5, T05709_PEA_1_T7 and T05709_PEA_1_T8. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    673    774
T05709_PEA_1_T3    726    827
T05709_PEA_1_T5    673    774
T05709_PEA_1_T7    673    774
T05709_PEA_1_T8    673    774
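The segment tables can also be cross-checked against one another: on a given transcript, consecutive segments should abut without gaps or overlaps. The following is a minimal sketch, not part of the original disclosure, using the T05709_PEA_1_T2 positions from Tables 22, 24, 15 and 16. The function name `is_contiguous` is hypothetical and positions are assumed to be 1-based and inclusive.

```python
# (segment name, start, end) on transcript T05709_PEA_1_T2,
# copied from Tables 22, 24, 15 and 16 respectively.
T2_SEGMENTS = [
    ("T05709_PEA_1_node_9", 486, 672),
    ("T05709_PEA_1_node_12", 673, 774),
    ("T05709_PEA_1_node_15", 775, 900),
    ("T05709_PEA_1_node_17", 901, 1087),
]


def is_contiguous(segments):
    """Return True if, after sorting by start position, each segment
    starts exactly one position after the previous segment ends."""
    ordered = sorted(segments, key=lambda seg: seg[1])
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(ordered, ordered[1:]))


print(is_contiguous(T2_SEGMENTS))
```

A result of `True` indicates that these four segments tile positions 486 to 1087 of the transcript without gaps; a `False` result on other data would point to a missing segment or a transcription error in a table.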
Segment cluster T05709_PEA_1_node_20 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5 and T05709_PEA_1_T7. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster T05709_PEA_1_node_23 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5 and T05709_PEA_1_T7. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster T05709_PEA_1_node_26 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5 and T05709_PEA_1_T7. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster T05709_PEA_1_node_31 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5 and T05709_PEA_1_T7. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    1487    1569
T05709_PEA_1_T3    1540    1622
T05709_PEA_1_T5    1300    1382
T05709_PEA_1_T7    1487    1569
Segment cluster T05709_PEA_1_node_33 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5 and T05709_PEA_1_T7. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    1570    1633
T05709_PEA_1_T3    1623    1686
T05709_PEA_1_T5    1383    1446
T05709_PEA_1_T7    1570    1633
Segment cluster T05709_PEA_1_node_35 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5 and T05709_PEA_1_T7. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T7    1634    1701
Segment cluster T05709_PEA_1_node_39 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3 and T05709_PEA_1_T5. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    1702    1793
T05709_PEA_1_T3    1755    1846
T05709_PEA_1_T5    1515    1606
Segment cluster T05709_PEA_1_node_41 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3 and T05709_PEA_1_T5. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T2    1794    1884
T05709_PEA_1_T3    1847    1937
T05709_PEA_1_T5    1607    1697
Segment cluster T05709_PEA_1_node_45 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3 and T05709_PEA_1_T5. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster T05709_PEA_1_node_46 according to the present invention can be found in the following transcript(s): T05709_PEA_1_T3 and T05709_PEA_1_T5. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T3    2281    2284
T05709_PEA_1_T5    2041    2044
Segment cluster T05709_PEA_1_node_48 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T3 and T05709_PEA_1_T5. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T3    2285    2377
T05709_PEA_1_T5    2045    2137
Segment cluster T05709_PEA_1_node_5 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T3. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
T05709_PEA_1_T3    380    432
Segment cluster T05709_PEA_1_node_7 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05709_PEA_1_T2, T05709_PEA_1_T3, T05709_PEA_1_T5, T05709_PEA_1_T7 and T05709_PEA_1_T8. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/heUryof5qR/nBsmuEWk8F:Q8TAY3
Sequence documentation:
Alignment of: T05709_PEA_1_P3 x Q8TAY3
Alignment segment 1/1:
Quality: 6485.00 Escore: 0
Matching length: 656 Total length: 656
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
251 GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY 300
301 DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN 350
351 EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR 400
401 SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI 450
451 NADSSIEGNYTLRVDCTPLMYSLVHNLTKELKSPDEGFEGKSLYESWTKK 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 NADSSIEGNYTLRVDCTPLMYSLVHNLTKELKSPDEGFEGKSLYESWTKK 500
501 SPSPEFSGMPRISKLGSGNDFEVFFQRLGIASGRARYTKNWETNKFSGYP 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 SPSPEFSGMPRISKLGSGNDFEVFFQRLGIASGRARYTKNWETNKFSGYP 550
551 LYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELANSIVLPFDCRDY 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 LYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELANSIVLPFDCRDY 600
601 AVVLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSERL 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 AVVLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSERL 650
651 QDFDKS 656
    ||||||
651 QDFDKS 656
Sequence name: /tmp/heUryof5qR/nBsmuEWk8F:FOH1_HUMAN
Sequence documentation:
Alignment of: T05709_PEA_1_P3 x FOH1_HUMAN
Alignment segment 1/1:
Quality: 6485.00 Escore: 0
Matching length: 656 Total length: 656
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
251 GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY 300
301 DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN 350
351 EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR 400
401 SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI 450
451 NADSSIEGNYTLRVDCTPLMYSLVHNLTKELKSPDEGFEGKSLYESWTKK 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 NADSSIEGNYTLRVDCTPLMYSLVHNLTKELKSPDEGFEGKSLYESWTKK 500
501 SPSPEFSGMPRISKLGSGNDFEVFFQRLGIASGRARYTKNWETNKFSGYP 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 SPSPEFSGMPRISKLGSGNDFEVFFQRLGIASGRARYTKNWETNKFSGYP 550
551 LYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELANSIVLPFDCRDY 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 LYHSVYETYELVEKFYDPMFKYHLTVAQVRGGMVFELANSIVLPFDCRDY 600
601 AVVLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSERL 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 AVVLRKYADKIYSISMKHPQEMKTYSVSFDSLFSAVKNFTEIASKFSERL 650
651 QDFDKS 656
    ||||||
651 QDFDKS 656
Sequence name: /tmp/HoqzzPjViM/l4pKDcOMKb:Q8TAY3
Sequence documentation:
Alignment of: T05709_PEA_1_P8 x Q8TAY3
Alignment segment 1/1:
Quality: 4736.00 Escore: 0
Matching length: 480 Total length: 480
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
251 GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY 300
301 DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN 350
351 EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR 400
401 SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI 450
451 NADSSIEGNYTLRVDCTPLMYSLVHNLTKE 480
    ||||||||||||||||||||||||||||||
451 NADSSIEGNYTLRVDCTPLMYSLVHNLTKE 480
Sequence name: /tmp/HoqzzPjViM/l4pKDcOMKb:FOH1_HUMAN
Sequence documentation:
Alignment of: T05709_PEA_1_P8 x FOH1_HUMAN
Alignment segment 1/1:
Quality: 4736.00 Escore: 0
Matching length: 480 Total length: 480
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
251 GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 GGVQRGNILNLNGAGDPLTPGYPANEYAYRRGIAEAVGLPSIPVHPIGYY 300
301 DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 DAQKLLEKMGGSAPPDSSWRGSLKVPYNVGPGFTGNFSTQKVKMHIHSTN 350
351 EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EVTRIYNVIGTLRGAVEPDRYVILGGHRDSWVFGGIDPQSGAAVVHEIVR 400
401 SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 SFGTLKKEGWRPRRTILFASWDAEEFGLLGSTEWAEENSRLLQERGVAYI 450
451 NADSSIEGNYTLRVDCTPLMYSLVHNLTKE 480
    ||||||||||||||||||||||||||||||
451 NADSSIEGNYTLRVDCTPLMYSLVHNLTKE 480
Sequence name: /tmp/FpbltDn2d6/peQYyYktzV:Q8TAY3
Sequence documentation:
Alignment of: T05709_PEA_1_P9 x Q8TAY3
Alignment segment 1/1:
Quality: 2726.00 Escore: 0
Matching length: 275 Total length: 275
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
251 GGVQRGNILNLNGAGDPLTPGYPAN 275
    |||||||||||||||||||||||||
251 GGVQRGNILNLNGAGDPLTPGYPAN 275
Sequence name: /tmp/FpbltDn2d6/peQYyYktzV:FOH1_HUMAN
Sequence documentation:
Alignment of: T05709_PEA_1_P9 x FOH1_HUMAN
Alignment segment 1/1:
Quality: 2726.00 Escore: 0
Matching length: 275 Total length: 275
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 VIARYGKVFRGNKVKNAQLAGAKGVILYSDPADYFAPGVKSYPDGWNLPG 250
251 GGVQRGNILNLNGAGDPLTPGYPAN 275
    |||||||||||||||||||||||||
251 GGVQRGNILNLNGAGDPLTPGYPAN 275
Sequence name: /tmp/nuVBbGnPVQ/9jLdECOmo7:Q8TAY3
Sequence documentation:
Alignment of: T05709_PEA_1_P13 x Q8TAY3
Alignment segment 1/1:
Quality: 2120.00 Escore: 0
Matching length: 213 Total length: 213
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
201 VIARYGKVFRGNK 213
    |||||||||||||
201 VIARYGKVFRGNK 213
Sequence name: /tmp/nuVBbGnPVQ/9jLdECOmo7:FOH1_HUMAN
Sequence documentation:
Alignment of: T05709_PEA_1_P13 x FOH1_HUMAN
Alignment segment 1/1:
Quality: 2120.00 Escore: 0
Matching length: 213 Total length: 213
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLFGWFIKSSNEAT 50
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 NITPKHNMKAFLDELKAENIKKFLYNFTQIPHLAGTEQNFQLAKQIQSQW 100
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KEFGLDSVELAHYDVLLSYPNKTHPNYISIINEDGNEIFNTSLFEPPPPG 150
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YENVSDIVPPFSAFSPQGMPEGDLVYVNYARTEDFFKLERDMKINCSGKI 200
201 VIARYGKVFRGNK 213
    |||||||||||||
201 VIARYGKVFRGNK 213
Sequence name: /tmp/hUnUR0hyKU/tCJT2KGL9I:Q8TAY3
Sequence documentation:
Alignment of: T05709_PEA_1_P14 x Q8TAY3
Alignment segment 1/1:
Quality: 381.00 Escore: 0
Matching length: 39 Total length: 39
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF 39
    |||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF 39
Sequence name: /tmp/hUnUR0hyKU/tCJT2KGL9I:FOH1_HUMAN
Sequence documentation:
Alignment of: T05709_PEA_1_P14 x FOH1_HUMAN
Alignment segment 1/1:
Quality: 381.00 Escore: 0
Matching length: 39 Total length: 39
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF 39
    |||||||||||||||||||||||||||||||||||||||
  1 MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF 39
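The identity figures reported for these alignments can be illustrated with a simple column-by-column comparison. The following is a minimal sketch, not part of the original disclosure. The function `percent_identity` is hypothetical, and it uses one common definition (identical columns divided by all alignment columns, with gap characters counted as mismatches), which may differ from the definition used by the program that produced the reports above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding identical residues.

    Both inputs must be aligned strings of equal, non-zero length.
    Columns containing a gap character count as mismatches.
    Assumed definition; the alignment program may compute it differently.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must have equal, non-zero length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# The two rows of the T05709_PEA_1_P14 alignment, copied from the report.
variant_row = "MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF"
known_row = "MWNLLHETDSAVATARRPRWLCAGALVLAGGFFLLGFLF"

print(len(variant_row))                          # alignment length
print(percent_identity(variant_row, known_row))  # identity over that length
```

For the 39-residue ungapped alignment shown here, this definition gives complete identity, in line with the reported values.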
DESCRIPTION FOR CLUSTER HSU13680
Cluster HSU13680 features 6 transcript(s) and 10 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
HSU13680_PEA_1_node_3    72
HSU13680_PEA_1_node_8    73
HSU13680_PEA_1_node_10    74
HSU13680_PEA_1_node_14    75
HSU13680_PEA_1_node_16    76
HSU13680_PEA_1_node_0    77
HSU13680_PEA_1_node_1    78
HSU13680_PEA_1_node_5    79
HSU13680_PEA_1_node_6    80
HSU13680_PEA_1_node_12    81
Table 3 - Proteins of interest
Protein Name    Corresponding Transcript(s)
HSU13680_PEA_1_P18 85    HSU13680_PEA_1_T8; HSU13680_PEA_1_T11
HSU13680_PEA_1_P19 86    HSU13680_PEA_1_T10; HSU13680_PEA_1_T13
HSU13680_PEA_1_P15 87    HSU13680_PEA_1_T6; HSU13680_PEA_1_T14
These sequences are variants of the known protein L-lactate dehydrogenase C chain (SwissProt accession identifier LDHC_HUMAN; known also according to the synonyms EC 1.1.1.27; LDH-C; LDH testis subunit; LDH-X), SEQ ID NO: 82, referred to herein as the previously known protein. The sequence for protein L-lactate dehydrogenase C chain is given at the end of the application, as "L-lactate dehydrogenase C chain amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein L-lactate dehydrogenase C chain localization is believed to be cytoplasmic. This protein (and hence its variants) is useful for the following test: LDH (Enzymes) Lactate Dehydrogenase, which is used for myocardial infarction diagnosis and neoplastic syndromes assessment. Lactate dehydrogenase (LDH) is a cytosolic protein, but a small amount of it is usually detectable in the blood. When cells are damaged or destroyed they release LDH into the bloodstream, causing blood levels to rise. It is a nonspecific marker for a range of conditions. In particular, LDH C chain has been shown to be testis specific.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: L-lactate dehydrogenase, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSU13680 features 6 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein L-lactate dehydrogenase C chain. A description of each variant protein according to the present invention is now provided.
Variant protein HSU13680_PEA_1_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSU13680_PEA_1_T8. An alignment is given to the known protein (L-lactate dehydrogenase C chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSU13680_PEA_1_P18 and LDHC_HUMAN_V1 (SEQ ID NO:83):
1. An isolated chimeric polypeptide encoding for HSU13680_PEA_1_P18, comprising a first amino acid sequence being at least 90% homologous to MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKDLADELALVDVALDKLKGEMMDLQHGSLFFSTSKITSGKDYSVSANSRIVIVTAGARQQEGETRLALVQRNVAIMKSIIPAIVHYSPDCKILVVSNPVDILTYIVWKISGLPVTRVIGSGCNLDSARFRYLIGEKLGVHPTSCHGWIIGEHGDSS corresponding to amino acids 1 - 197 of LDHC_HUMAN_V1, which also corresponds to amino acids 1 - 197 of HSU13680_PEA_1_P18, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GIIWNKRRTLSQYPLCLGAEWCLRCCEN corresponding to amino acids 198 - 225 of HSU13680_PEA_1_P18, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSU13680_PEA_1_P18, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GIIWNKRRTLSQYPLCLGAEWCLRCCEN in HSU13680_PEA_1_P18.
It should be noted that the known protein sequence (LDHC_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for LDHC_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 5 - Changes to LDHC_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSU13680_PEA_1_P18 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU13680_PEA_1_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein HSU13680_PEA_1_P18 is encoded by the following transcript(s): HSU13680_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSU13680_PEA_1_T8 is shown in bold; this coding portion starts at position 123 and ends at position 797. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU13680_PEA_1_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HSU13680_PEA_1_P19 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSU13680_PEA_1_T10. An alignment is given to the known protein (L-lactate dehydrogenase C chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSU13680_PEA_1_P19 and LDHC_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for HSU13680_PEA_1_P19, comprising a first amino acid sequence being at least 90% homologous to MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKDLADELALVDVALDKLK GEMMDLQHGSLFFSTSKITSGKDYSVSANSRIVIVTAGARQQEGETRLALVQRNVAIMK SIIPAIVHYSPDCKILVVSNPVDILTYIVWKISGLPVTRVIGSGCNLDSARFRYLIGEKLGV HPTSCHGWIIGEHGDSSVP corresponding to amino acids 1 - 199 of LDHC_HUMAN_V1, which also corresponds to amino acids 1 - 199 of HSU13680_PEA_1_P19, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKLSS corresponding to amino acids 200 - 204 of HSU13680_PEA_1_P19,
wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSU13680_PEA_1_P19, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKLSS in HSU13680_PEA_1_P19.
It should be noted that the known protein sequence (LDHC_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for LDHC_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 8 - Changes to LDHC_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSU13680_PEA_1_P19 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU13680_PEA_1_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Variant protein HSU13680_PEA_1_P19 is encoded by the following transcript(s): HSU13680_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSU13680_PEA_1_T10 is shown in bold; this coding portion starts at position 123 and ends at position 734. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU13680_PEA_1_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSU13680_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSU13680_PEA_1_T6 and HSU13680_PEA_1_T14. An alignment is given to the known protein (L-lactate dehydrogenase C chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSU13680_PEA_1_P15 and LDHC_HUMAN_V2 (SEQ ID NO:84): 1. An isolated chimeric polypeptide encoding for HSU13680_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLK corresponding to amino acids 1 - 42 of LDHC_HUMAN_V2, which also corresponds to amino acids 1 - 42 of HSU13680_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NFCIF corresponding to amino acids 43 - 47 of HSU13680_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSU13680_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NFCIF in HSU13680_PEA_1_P15.
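The head-plus-tail structure recited in the comparison report above can be illustrated with a short sketch. It assumes only the residues quoted in this document: the 43-residue prefix of LDHC_HUMAN_V2 as shown in the alignment for HSU13680_PEA_1_P15, and the tail NFCIF. The function name `assemble_chimera` and the constant names are hypothetical and are introduced purely for illustration.

```python
def assemble_chimera(known: str, head_end: int, tail: str) -> str:
    """Join residues 1..head_end (1-based, inclusive) of the known protein
    to a variant-specific tail, contiguously and in sequential order."""
    if not 0 < head_end <= len(known):
        raise ValueError("head_end must lie within the known sequence")
    return known[:head_end] + tail


# First 43 residues of LDHC_HUMAN_V2 as quoted in the alignment in this document.
LDHC_V2_PREFIX = "MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKD"
# Tail of HSU13680_PEA_1_P15 as quoted in the comparison report.
P15_TAIL = "NFCIF"

p15 = assemble_chimera(LDHC_V2_PREFIX, 42, P15_TAIL)
tail_start = p15.index(P15_TAIL) + 1  # 1-based position of the first tail residue

print(len(p15), tail_start)
```

Under these assumptions the assembled sequence is 47 residues long and the tail begins at residue 43, which agrees with the ranges 1 - 42 and 43 - 47 stated in the comparison report.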
It should be noted that the known protein sequence (LDHC_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for LDHC_HUMAN_V2. These changes were previously known to occur and are listed in the table below. Table 11 - Changes to LDHC_HUMAN_V2
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein.
In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein.
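The localization reasoning stated above, together with the complementary rule stated elsewhere in this application for secreted variants, can be sketched as a simple decision function. This is a minimal illustration only: the function name `predict_localization` is hypothetical, the individual prediction programs are represented merely as boolean calls, and any combination of calls not covered by the text is reported as "undetermined" rather than guessed.

```python
from typing import Sequence


def predict_localization(signal_peptide_calls: Sequence[bool],
                         transmembrane_calls: Sequence[bool]) -> str:
    """Combine predictor calls into a localization label.

    signal_peptide_calls: one boolean per signal-peptide program (True = signal peptide found)
    transmembrane_calls:  one boolean per trans-membrane program (True = region found)
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call from each kind of predictor is required")
    if any(transmembrane_calls):
        # The text gives no rule for this case; it is left open here.
        return "undetermined"
    if all(signal_peptide_calls):
        # Every signal-peptide program agrees and no trans-membrane region is predicted.
        return "secreted"
    if not any(signal_peptide_calls):
        # No signal peptide and no trans-membrane region predicted.
        return "intracellular"
    return "undetermined"  # the signal-peptide programs disagree


print(predict_localization([False, False], [False, False]))
```

With two programs of each kind, as described in the text, the call shown corresponds to the situation reported for variant protein HSU13680_PEA_1_P15.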
Variant protein HSU13680_PEA_1_P15 is encoded by the following transcript(s): HSU13680_PEA_1_T6 and HSU13680_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSU13680_PEA_1_T6 is shown in bold; this coding portion starts at position 176 and ends at position 316. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU13680_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
The coding portion of transcript HSU13680_PEA_1_T14 is shown in bold; this coding portion starts at position 176 and ends at position 316. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSU13680_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
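The coding-portion coordinates quoted above for the HSU13680 transcripts can be checked against the lengths of the variant proteins with simple arithmetic. The sketch below assumes that the start and end positions are 1-based and inclusive; the protein lengths (225, 204 and 47 residues) are taken from the amino acid ranges stated in the comparison reports. The names `codon_count` and `SPANS` are hypothetical.

```python
def codon_count(start: int, end: int) -> int:
    """Number of complete codons in a coding span with 1-based inclusive coordinates."""
    length_nt = end - start + 1
    if length_nt <= 0 or length_nt % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return length_nt // 3


# transcript -> (coding start, coding end, length in residues of the encoded variant)
SPANS = {
    "HSU13680_PEA_1_T8": (123, 797, 225),   # HSU13680_PEA_1_P18
    "HSU13680_PEA_1_T10": (123, 734, 204),  # HSU13680_PEA_1_P19
    "HSU13680_PEA_1_T6": (176, 316, 47),    # HSU13680_PEA_1_P15
}

for name, (start, end, protein_len) in SPANS.items():
    print(name, codon_count(start, end), protein_len)
```

For each transcript listed, the number of codons in the quoted span equals the number of residues in the corresponding variant protein, which suggests (as an observation from the arithmetic only) that the quoted coding portion does not include a stop codon.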
As noted above, cluster HSU13680 features 10 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSU13680_PEA_1_node_3 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6, HSU13680_PEA_1_T8, HSU13680_PEA_1_T10, HSU13680_PEA_1_T11, HSU13680_PEA_1_T13 and HSU13680_PEA_1_T14. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster HSU13680_PEA_1_node_8 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6, HSU13680_PEA_1_T8, HSU13680_PEA_1_T10, HSU13680_PEA_1_T11, HSU13680_PEA_1_T13 and HSU13680_PEA_1_T14. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster HSU13680_PEA_1_node_10 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6, HSU13680_PEA_1_T8, HSU13680_PEA_1_T10, HSU13680_PEA_1_T11, HSU13680_PEA_1_T13 and HSU13680_PEA_1_T14. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster HSU13680_PEA_1_node_14 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6, HSU13680_PEA_1_T10 and HSU13680_PEA_1_T13. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster HSU13680_PEA_1_node_16 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6, HSU13680_PEA_1_T8, HSU13680_PEA_1_T10, HSU13680_PEA_1_T11, HSU13680_PEA_1_T13 and HSU13680_PEA_1_T14. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSU13680_PEA_1_node_0 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6, HSU13680_PEA_1_T8, HSU13680_PEA_1_T10, HSU13680_PEA_1_T11, HSU13680_PEA_1_T13 and HSU13680_PEA_1_T14. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster HSU13680_PEA_1_node_1 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6, HSU13680_PEA_1_T11, HSU13680_PEA_1_T13 and HSU13680_PEA_1_T14. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster HSU13680_PEA_1_node_5 according to the present invention can be found in the following transcript(s): HSU13680_PEA_1_T6 and HSU13680_PEA_1_T14. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster HSU13680_PEA_1_node_6 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6, HSU13680_PEA_1_T8, HSU13680_PEA_1_T10, HSU13680_PEA_1_T11, HSU13680_PEA_1_T13 and HSU13680_PEA_1_T14. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster HSU13680_PEA_1_node_12 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSU13680_PEA_1_T6 and HSU13680_PEA_1_T14. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: LDHC_HUMAN_V1
Sequence documentation:
Alignment of: HSU13680_PEA_1_P18 x LDHC_HUMAN_V1
Alignment segment 1/1: Quality: 1878.00
Escore : 0 Matching length: 201 Total length: 201 Matching Percent Similarity: 99.00 Matching Percent Identity: 98.51 Total Percent Similarity: 99.00 Total Percent Identity: 98.51 Gaps : 0
Alignment:
  1 MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKDLADELAL 50
  1 MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKDLADELAL 50

 51 VDVALDKLKGEMMDLQHGSLFFSTSKITSGKDYSVSANSRIVIVTAGARQ 100
 51 VDVALDKLKGEMMDLQHGSLFFSTSKITSGKDYSVSANSRIVIVTAGARQ 100

101 QEGETRLALVQRNVAIMKSIIPAIVHYSPDCKILVVSNPVDILTYIVWKI 150
101 QEGETRLALVQRNVAIMKSIIPAIVHYSPDCKILVVSNPVDILTYIVWKI 150

151 SGLPVTRVIGSGCNLDSARFRYLIGEKLGVHPTSCHGWIIGEHGDSSGII 200
151 SGLPVTRVIGSGCNLDSARFRYLIGEKLGVHPTSCHGWIIGEHGDSSVPL 200

201 W 201
201 W 201
Sequence name : LDHC_HUMAN_V1
Sequence documentation:
Alignment of: HSU13680_PEA_1_P19 x LDHC_HUMAN_V1
Alignment segment 1/1: Quality: 1897.00
Escore: 0 Matching length: 200 Total length: 200 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.50 Total Percent Similarity: 100.00 Total Percent Identity: 99.50 Gaps : 0
Alignment:
  1 MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKDLADELAL 50
  1 MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKDLADELAL 50

 51 VDVALDKLKGEMMDLQHGSLFFSTSKITSGKDYSVSANSRIVIVTAGARQ 100
 51 VDVALDKLKGEMMDLQHGSLFFSTSKITSGKDYSVSANSRIVIVTAGARQ 100

101 QEGETRLALVQRNVAIMKSIIPAIVHYSPDCKILVVSNPVDILTYIVWKI 150
101 QEGETRLALVQRNVAIMKSIIPAIVHYSPDCKILVVSNPVDILTYIVWKI 150

151 SGLPVTRVIGSGCNLDSARFRYLIGEKLGVHPTSCHGWIIGEHGDSSVPM 200
151 SGLPVTRVIGSGCNLDSARFRYLIGEKLGVHPTSCHGWIIGEHGDSSVPL 200
Sequence name: LDHC_HUMAN_V2 Sequence documentation:
Alignment of: HSU13680_PEA_1_P15 x LDHC_HUMAN_V2 Alignment segment 1/1: Quality: 395.00 Escore : 0 Matching length: 43 Total length: 43 Matching Percent Similarity: 100.00 Matching Percent Identity: 97.67 Total Percent Similarity: 100.00 Total Percent Identity: 97.67 Gaps : 0
Alignment:
  1 MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKN 43
  1 MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKD 43
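The percent identity figures reported with the alignments above can be reproduced, for gap-free alignments, by counting identical positions. The sketch below is an assumption about how such a figure may be computed; the alignment program used for this application is not named in this section, and the function name `percent_identity` is hypothetical. The two sequences are the aligned rows shown for HSU13680_PEA_1_P15 and LDHC_HUMAN_V2.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of positions with identical residues in a gap-free alignment."""
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("aligned rows must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(row_a, row_b))
    return 100.0 * matches / len(row_a)


# Aligned rows for HSU13680_PEA_1_P15 (top) and LDHC_HUMAN_V2 (bottom), 43 positions, no gaps.
P15_ROW = "MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKN"
LDHC_V2_ROW = "MSTVKEQLIEKLIEDDENSQCKITIVGTGAVGMACAISILLKD"

print(round(percent_identity(P15_ROW, LDHC_V2_ROW), 2))
```

Here 42 of 43 positions are identical, which rounds to the 97.67 reported for this alignment. The same arithmetic gives 98.51 for 198 of 201 identical positions and 99.50 for 199 of 200, matching the values reported for the other two alignments.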
DESCRIPTION FOR CLUSTER HSPROSAP Cluster HSPROSAP features 7 transcript(s) and 32 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Transcript Name        Sequence ID No.
HSPROSAP_PEA_1_T3
HSPROSAP_PEA_1_T15     89
HSPROSAP_PEA_1_T19     90
HSPROSAP_PEA_1_T20     91
HSPROSAP_PEA_1_T23     92
HSPROSAP_PEA_1_T24     93
HSPROSAP_PEA_1_T25     94
Table 2 - Segments of interest
Segment Name             Sequence ID No.
HSPROSAP_PEA_1_node_0    95
Table 3 - Proteins of interest
These sequences are variants of the known protein Prostatic acid phosphatase precursor (SwissProt accession identifier PPAP_HUMAN; known also according to the synonyms EC 3.1.3.2), SEQ ID NO: 127, referred to herein as the previously known protein. This protein (and hence its variants) is suitable for the following diagnostic tests: Acid Phosphatase (Enzymes), used to differentiate multiple myeloma from other monoclonal gammopathies of uncertain significance; and Prostatic acid phosphatase, "Prostate marker, also rectal carcinoids". Acid phosphatase is an enzyme found throughout the body, but primarily in the prostate gland. Prostate Acid Phosphatase (PAP) is relatively prostate specific but not disease specific. Its level in blood rises in a range of prostate disorders from cancer to prostatitis. It is not a good screening test for prostate cancer, and major rises in serum levels usually occur once prostate cancer has metastasized. Elevated PAP levels are also associated with a range of non-prostate diseases: testicular cancer, leukemia, non-Hodgkin's lymphoma, Gaucher's disease, Paget's disease, osteoporosis, cirrhosis of the liver, pulmonary embolism, and hyperparathyroidism. Immunohistochemistry of PAP is used to differentiate normal from cancerous prostate tissue.
The sequence for protein Prostatic acid phosphatase precursor is given at the end of the application, as "Prostatic acid phosphatase precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
383 D -> N
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell cycle control, which are annotation(s) related to Biological Process; acid phosphatase; protein tyrosine phosphatase; hydrolase, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. Cluster HSPROSAP can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
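The "parts per million" ratio described above can be illustrated with a minimal sketch. It assumes a plain, unweighted ratio of EST counts; the weighting referred to in the text is defined elsewhere in the application and is not reproduced here. The function name `parts_per_million` and the example counts are hypothetical and are not data from this application.

```python
def parts_per_million(cluster_ests: int, total_ests: int) -> float:
    """ESTs of one cluster in a category, per million ESTs of all clusters in that category."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    if cluster_ests < 0 or cluster_ests > total_ests:
        raise ValueError("cluster_ests must lie between 0 and total_ests")
    return 1_000_000 * cluster_ests / total_ests


# Hypothetical counts for illustration only: 5 cluster ESTs among 10,000 ESTs in a tissue category.
print(parts_per_million(5, 10_000))
```

Expressing the ratio per million ESTs allows categories with very different total EST counts to be compared on one scale, as in the normal tissue distribution table that follows.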
Overall, the following results were obtained as shown with regard to the histograms in Figure 5 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues.
Table 5 - Normal tissue distribution
Tissue        Number
brain
colon
epithelial    417
general       143
kidney        67
ovary
prostate      5137
stomach
Table 6 - P values and ratios for expression in cancerous tissue
As noted above, cluster HSPROSAP features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Prostatic acid phosphatase precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSPROSAP_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPROSAP_PEA_1_T3. An alignment is given to the known protein (Prostatic acid phosphatase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPROSAP_PEA_1_P3 and PPAP_HUMAN: 1. An isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQLLYLPFRNCPRFQELESETLKSEEFQK RLHPYKDFIATLGKLSGLHGQDLFGIWSKVYDPLYCESVHNFTLPSWATEDTMTKLREL SELSLLSLYGIHKQKEKSRLQGGVLVNEILNHMKRATQIPSYKKLIMYSA corresponding to amino acids 1 - 288 of PPAP_HUMAN, which also corresponds to amino acids 1 - 288 of HSPROSAP_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SLWAYGKFN corresponding to amino
acids 289 - 297 of HSPROSAP_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SLWAYGKFN in HSPROSAP_PEA_1_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPROSAP_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
The glycosylation sites of variant protein HSPROSAP_PEA_1_P3, as compared to the known protein Prostatic acid phosphatase precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 8 - Glycosylation site(s)
Variant protein HSPROSAP_PEA_1_P3 is encoded by the following transcript(s): HSPROSAP_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPROSAP_PEA_1_T3 is shown in bold; this coding portion starts at position 105 and ends at position 995. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein HSPROSAP_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPROSAP_PEA_1_T15. An alignment is given to the known protein (Prostatic acid phosphatase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPROSAP_PEA_1_P9 and PPAP_HUMAN: 1. An isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQLLYLPFRNCPRFQELESETLKSEEFQK RLHPYKDFIATLGKLSGLHGQDLFGIWSKVYDPLYCESVHNFTLPSWATEDTMTKLREL SELSLLSLYGIHKQKEKSRLQGGVLVNEILNHMKRATQIPSYKKLIMYSAHDTTVSGLQ MALDVYNGLLPPYASCHLTELYFEKGEYFVEMYYRNETQHEPYPLMLPGCSPSCPLER FAELVGPVIPQDWSTECMTTNSHQG corresponding to amino acids 1 - 380 of PPAP_HUMAN, which also corresponds to amino acids 1 - 380 of HSPROSAP_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PAETAHSARRNHDIALPCGRSTCLENTVLYYHYG corresponding to amino acids 381 - 414 of HSPROSAP_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%,
more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PAETAHSARRNHDIALPCGRSTCLENTVLYYHYG in HSPROSAP_PEA_1_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPROSAP_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
The glycosylation sites of variant protein HSPROSAP_PEA_1_P9, as compared to the known protein Prostatic acid phosphatase precursor, are described in Table 11 (given according
to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 11 - Glycosylation site(s)
Variant protein HSPROSAP_PEA_1_P9 is encoded by the following transcript(s): HSPROSAP_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPROSAP_PEA_1_T15 is shown in bold; this coding portion starts at position 105 and ends at position 1346. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HSPROSAP_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPROSAP_PEA_1_T19. An alignment is given to the known protein (Prostatic acid phosphatase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPROSAP_PEA_1_P11 and PPAP_HUMAN: 1. An isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQLLYLPFRNCPRFQELESETLKSEEFQK RLHPYKDFIATLGKLSGLHGQDLFGIWSKVYDPLYCE corresponding to amino acids 1 - 216 of PPAP_HUMAN, which also corresponds to amino acids 1 - 216 of
HSPROSAP_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKEKKITG corresponding to amino acids 217 - 224 of HSPROSAP_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKEKKITG in HSPROSAP_PEA_1_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPROSAP_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
186 D -> No
The glycosylation sites of variant protein HSPROSAP_PEA_1_P11, as compared to the known protein Prostatic acid phosphatase precursor, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 14 - Glycosylation site(s)
Variant protein HSPROSAP_PEA_1_P11 is encoded by the following transcript(s): HSPROSAP_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPROSAP_PEA_1_T19 is shown in bold; this coding portion starts at position 105 and ends at position 776. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
Variant protein HSPROSAP_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPROSAP_PEA_1_T20. An alignment is given to the known protein (Prostatic acid phosphatase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPROSAP_PEA_1_P12 and PPAP_HUMAN: 1. An isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKE SSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMT NLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQLLYLPFRNCPRFQELESETLKSEEFQK RLHPYK corresponding to amino acids 1 - 185 of PPAP_HUMAN, which also corresponds to amino acids 1 - 185 of HSPROSAP_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKS corresponding to amino acids 186 - 188 of HSPROSAP_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPROSAP_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Amino acid mutations
The glycosylation sites of variant protein HSPROSAP_PEA_1_P12, as compared to the known protein Prostatic acid phosphatase precursor, are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 17 - Glycosylation site(s)
Variant protein HSPROSAP_PEA_1_P12 is encoded by the following transcript(s): HSPROSAP_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPROSAP_PEA_1_T20 is shown in bold; this coding portion starts at position 105 and ends at position 668. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
Variant protein HSPROSAP_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPROSAP_PEA_1_T23. An alignment is given to the known protein (Prostatic acid phosphatase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPROSAP_PEA_1_P13 and PPAP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQ corresponding to amino acids 1 - 152 of PPAP_HUMAN, which also corresponds to amino acids 1 - 152 of HSPROSAP_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSILGKPGDFRWT corresponding to amino acids 153 - 165 of HSPROSAP_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSILGKPGDFRWT in HSPROSAP_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPROSAP_PEA_1_P13 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 19 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Amino acid mutations
The glycosylation sites of variant protein HSPROSAP_PEA_1_P13, as compared to the known protein Prostatic acid phosphatase precursor, are described in Table 20 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 20 - Glycosylation site(s)
Variant protein HSPROSAP_PEA_1_P13 is encoded by the following transcript(s): HSPROSAP_PEA_1_T23, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPROSAP_PEA_1_T23 is shown in bold; this coding portion starts at position 105 and ends at position 599. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 21 - Nucleic acid SNPs
Variant protein HSPROSAP_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPROSAP_PEA_1_T24. An alignment is given to the known protein (Prostatic acid phosphatase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPROSAP_PEA_1_P14 and PPAP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPIDTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHEQVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSEDQ corresponding to amino acids 1 - 152 of PPAP_HUMAN, which also corresponds to amino acids 1 - 152 of HSPROSAP_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SHHRHDHRISLWLKLSLTAGPRLLPSDLWGRLLSSLSCQYP corresponding to amino acids 153 - 193 of HSPROSAP_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SHHRHDHRISLWLKLSLTAGPRLLPSDLWGRLLSSLSCQYP in HSPROSAP_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPROSAP_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 22 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Amino acid mutations
The glycosylation sites of variant protein HSPROSAP_PEA_1_P14, as compared to the known protein Prostatic acid phosphatase precursor, are described in Table 23 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 23 - Glycosylation site(s)
Variant protein HSPROSAP_PEA_1_P14 is encoded by the following transcript(s): HSPROSAP_PEA_1_T24, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPROSAP_PEA_1_T24 is shown in bold; this coding portion starts at position 105 and ends at position 683. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 24 - Nucleic acid SNPs
Variant protein HSPROSAP_PEA_1_P23 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPROSAP_PEA_1_T25. An alignment is given to the known protein (Prostatic acid phosphatase precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPROSAP_PEA_1_P23 and PPAP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPROSAP_PEA_1_P23, comprising a first amino acid sequence being at least 90% homologous to MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLV corresponding to amino acids 1 - 41 of PPAP_HUMAN, which also corresponds to amino acids 1 - 41 of HSPROSAP_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SRLFSLLFP corresponding to amino acids 42 - 50 of HSPROSAP_PEA_1_P23, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPROSAP_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SRLFSLLFP in HSPROSAP_PEA_1_P23.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSPROSAP_PEA_1_P23 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 25 - Amino acid mutations
The glycosylation sites of variant protein HSPROSAP_PEA_1_P23, as compared to the known protein Prostatic acid phosphatase precursor, are described in Table 26 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 26 - Glycosylation site(s)
Variant protein HSPROSAP_PEA_1_P23 is encoded by the following transcript(s): HSPROSAP_PEA_1_T25, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPROSAP_PEA_1_T25 is shown in bold; this coding portion starts at position 105 and ends at position 254. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPROSAP_PEA_1_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 27 - Nucleic acid SNPs
1840 C -> T Yes
1860 A -> G Yes
As noted above, cluster HSPROSAP features 32 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSPROSAP_PEA_1_node_0 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19, HSPROSAP_PEA_1_T20, HSPROSAP_PEA_1_T23, HSPROSAP_PEA_1_T24 and HSPROSAP_PEA_1_T25. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_1 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T25. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_9 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19, HSPROSAP_PEA_1_T20, HSPROSAP_PEA_1_T23 and HSPROSAP_PEA_1_T24. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_10 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T23. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_11 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T23 and HSPROSAP_PEA_1_T24. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_18 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T20. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_19 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T20. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_21 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T20. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_25 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T19. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_46 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_47 according to the present invention is supported by 91 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_48 according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_50 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_53 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
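As an illustration of how the start and end positions in the segment location tables relate to the length criterion above, the following minimal sketch computes a segment length and applies the cutoff. It is not part of the described method: the function names are hypothetical, positions are assumed to be 1-based and inclusive, and "up to about 120 bp" is treated as a fixed inclusive limit of 120 for simplicity. The example values 321 and 407 are the positions listed in this description for transcript HSPROSAP_PEA_1_T24 under Table 45.

```python
# Assumption: "up to about 120 bp" is modeled as a hard inclusive limit.
SHORT_SEGMENT_MAX_BP = 120


def segment_length(start, end):
    """Length in bp of a segment with 1-based inclusive start and end."""
    if end < start:
        raise ValueError("end position must not precede start position")
    return end - start + 1


def is_short_segment(start, end, max_bp=SHORT_SEGMENT_MAX_BP):
    """True if the segment length does not exceed the cutoff."""
    return segment_length(start, end) <= max_bp


# Positions listed for transcript HSPROSAP_PEA_1_T24 under Table 45.
print(segment_length(321, 407))    # 87
print(is_short_segment(321, 407))  # True
```

With inclusive coordinates the segment spanning positions 321 to 407 is 87 bp long, which falls within the short-segment group.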
Segment cluster HSPROSAP_PEA_1_node_3 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19, HSPROSAP_PEA_1_T20, HSPROSAP_PEA_1_T23 and HSPROSAP_PEA_1_T24. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_4 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19, HSPROSAP_PEA_1_T20, HSPROSAP_PEA_1_T23 and HSPROSAP_PEA_1_T24. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_5 according to the present invention can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19, HSPROSAP_PEA_1_T20, HSPROSAP_PEA_1_T23 and HSPROSAP_PEA_1_T24. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_7 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19, HSPROSAP_PEA_1_T20, HSPROSAP_PEA_1_T23 and HSPROSAP_PEA_1_T24. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
HSPROSAP_PEA_1_T24 321 407
Segment cluster HSPROSAP_PEA_1_node_15 according to the present invention can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19 and HSPROSAP_PEA_1_T20. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19 and HSPROSAP_PEA_1_T20. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_17 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15, HSPROSAP_PEA_1_T19 and HSPROSAP_PEA_1_T20. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_20 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T20. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_24 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3, HSPROSAP_PEA_1_T15 and HSPROSAP_PEA_1_T19. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_27 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_28 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_35 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_39 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_41 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_42 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_49 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_51 according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster HSPROSAP_PEA_1_node_52 according to the present invention can be found in the following transcript(s): HSPROSAP_PEA_1_T3 and HSPROSAP_PEA_1_T15. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: PPAP_HUMAN
Sequence documentation:
Alignment of: HSPROSAP_PEA_1_P3 x PPAP_HUMAN
Alignment segment 1/1:
Quality: 2849.00
Escore: 0
Matching length: 288
Total length: 288
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
151 DQLLYLPFRNCPRFQELESETLKSEEFQKRLHPYKDFIATLGKLSGLHGQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DQLLYLPFRNCPRFQELESETLKSEEFQKRLHPYKDFIATLGKLSGLHGQ 200
201 DLFGIWSKVYDPLYCESVHNFTLPSWATEDTMTKLRELSELSLLSLYGIH 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 DLFGIWSKVYDPLYCESVHNFTLPSWATEDTMTKLRELSELSLLSLYGIH 250
251 KQKEKSRLQGGVLVNEILNHMKRATQIPSYKKLIMYSA 288
    ||||||||||||||||||||||||||||||||||||||
251 KQKEKSRLQGGVLVNEILNHMKRATQIPSYKKLIMYSA 288
Sequence name: PPAP_HUMAN
Sequence documentation:
Alignment of: HSPROSAP_PEA_1_P9 x PPAP_HUMAN
Alignment segment 1/1:
Quality: 3797.00
Escore: 0
Matching length: 380
Total length: 380
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
151 DQLLYLPFRNCPRFQELESETLKSEEFQKRLHPYKDFIATLGKLSGLHGQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DQLLYLPFRNCPRFQELESETLKSEEFQKRLHPYKDFIATLGKLSGLHGQ 200
201 DLFGIWSKVYDPLYCESVHNFTLPSWATEDTMTKLRELSELSLLSLYGIH 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 DLFGIWSKVYDPLYCESVHNFTLPSWATEDTMTKLRELSELSLLSLYGIH 250
251 KQKEKSRLQGGVLVNEILNHMKRATQIPSYKKLIMYSAHDTTVSGLQMAL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 KQKEKSRLQGGVLVNEILNHMKRATQIPSYKKLIMYSAHDTTVSGLQMAL 300
301 DVYNGLLPPYASCHLTELYFEKGEYFVEMYYRNETQHEPYPLMLPGCSPS 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 DVYNGLLPPYASCHLTELYFEKGEYFVEMYYRNETQHEPYPLMLPGCSPS 350
351 CPLERFAELVGPVIPQDWSTECMTTNSHQG 380
    ||||||||||||||||||||||||||||||
351 CPLERFAELVGPVIPQDWSTECMTTNSHQG 380
Sequence name: PPAP_HUMAN
Sequence documentation:
Alignment of: HSPROSAP_PEA_1_P11 x PPAP_HUMAN
Alignment segment 1/1:
Quality: 2148.00
Escore: 0
Matching length: 216
Total length: 216
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
151 DQLLYLPFRNCPRFQELESETLKSEEFQKRLHPYKDFIATLGKLSGLHGQ 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 DQLLYLPFRNCPRFQELESETLKSEEFQKRLHPYKDFIATLGKLSGLHGQ 200
201 DLFGIWSKVYDPLYCE 216
    ||||||||||||||||
201 DLFGIWSKVYDPLYCE 216
Sequence name: PPAP_HUMAN
Sequence documentation:
Alignment of: HSPROSAP_PEA_1_P12 x PPAP_HUMAN
Alignment segment 1/1:
Quality: 1840.00
Escore: 0
Matching length: 185
Total length: 185
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
151 DQLLYLPFRNCPRFQELESETLKSEEFQKRLHPYK 185
    |||||||||||||||||||||||||||||||||||
151 DQLLYLPFRNCPRFQELESETLKSEEFQKRLHPYK 185
Sequence name: PPAP_HUMAN
Sequence documentation:
Alignment of: HSPROSAP_PEA_1_P13 x PPAP_HUMAN
Alignment segment 1/1:
Quality: 1502.00
Escore: 0
Matching length: 152
Total length: 152
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
151 DQ 152
    ||
151 DQ 152
Sequence name: PPAP_HUMAN
Sequence documentation:
Alignment of: HSPROSAP_PEA_1_P14 x PPAP_HUMAN
Alignment segment 1/1:
Quality: 1502.00 Escore: 0 Matching length: 152 Total length: 152
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFRHGDRSPI 50
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
 51 DTFPTDPIKESSWPQGFGQLTQLGMEQHYELGEYIRKRYRKFLNESYKHE 100
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
101 QVYIRSTDVDRTLMSAMTNLAALFPPEGVSIWNPILLWQPIPVHTVPLSE 150
151 DQ 152
151 DQ 152
Sequence name : PPAP_HUMAN
Sequence documentation:
Alignment of: HSPROSAP_PEA_1_P23 x PPAP_HUMAN
Alignment segment 1/1:
Quality: 384.00 Escore: 0 Matching length: 43 Total length: 43
Matching Percent Similarity: 97.67 Matching Percent Identity: 97.67
Total Percent Similarity: 97.67 Total Percent Identity: 97.67 Gaps: 0
Alignment:
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVSR 43
  1 MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFR 43
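The identity figure reported for an ungapped alignment such as the one above can be reproduced from the two aligned sequences. The following is a minimal sketch, assuming that percent identity is the number of identical positions divided by the aligned length; the function name `percent_identity` is illustrative and is not part of the analysis software used in the application.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length, ungapped aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions at which both sequences carry the same residue.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return round(100.0 * matches / len(seq_a), 2)


# Aligned residues 1 - 43 of HSPROSAP_PEA_1_P23 and PPAP_HUMAN (from the text).
variant = "MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVSR"
known = "MRAAPLLLARAASLSLGFLFLLFFWLDRSVLAKELKFVTLVFR"

identity = percent_identity(variant, known)
print(identity)
```

Under this assumption the single mismatch at position 42 (S in the variant, F in PPAP_HUMAN) gives 42 identical positions out of 43, in line with the reported 97.67.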
DESCRIPTION FOR CLUSTER HUMPGCA
Cluster HUMPGCA features 3 transcript(s) and 18 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
HUMPGCA_PEA_1_node_0     138
HUMPGCA_PEA_1_node_2     139
HUMPGCA_PEA_1_node_14    140
HUMPGCA_PEA_1_node_16    141
HUMPGCA_PEA_1_node_17    142
HUMPGCA_PEA_1_node_19    143
HUMPGCA_PEA_1_node_28    144
HUMPGCA_PEA_1_node_4     145
HUMPGCA_PEA_1_node_5     146
HUMPGCA_PEA_1_node_6     147
HUMPGCA_PEA_1_node_9     148
Table 3 - Proteins of interest
HUMPGCA_PEA_1_P12    158    HUMPGCA_PEA_1_T0
HUMPGCA_PEA_1_P14    159    HUMPGCA_PEA_1_T1
HUMPGCA_PEA_1_P15    160    HUMPGCA_PEA_1_T5
These sequences are variants of the known protein Gastricsin precursor (SwissProt accession identifier PEPC_HUMAN; known also according to the synonyms EC 3.4.23.3; Pepsinogen C), SEQ ID NO: 156, referred to herein as the previously known protein. The sequence for protein Gastricsin precursor is given at the end of the application, as "Gastricsin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
This protein (and hence its variants) may optionally be used for the following tests: Pepsinogen (Enzymes) - (in the stomach), high in gastritis, low in pernicious anemia. Recent evidence suggests that serum pepsinogen C might serve as a serum marker for gastric carcinoma. Specifically, it has been shown that, together with pepsinogen A, the ratio of serum PGA/PGC can be considered a biomarker for precancerous lesions of the stomach, and may be useful as a screening test (Br J Cancer. 2003 Apr 22;88(8):1239-47).
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis; digestion, which are annotation(s) related to Biological Process; aspartic-type endopeptidase; pepsin A; hydrolase, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMPGCA features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Gastricsin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMPGCA_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPGCA_PEA_1_T0. An alignment is given to the known protein (Gastricsin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPGCA_PEA_1_P12 and Q8IUM8 (SEQ ID NO: 157):
1. An isolated chimeric polypeptide encoding for HUMPGCA_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYRF GDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSS corresponding to amino acids 1 - 95 of Q8IUM8, which also corresponds to amino acids 1 - 95 of HUMPGCA_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
NLWVPSVYCQSQACTSHSRFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQV
PNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVY LSNQQGSSGGAVVFGGVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGC QAIVDTGTSLLTVPQQYMSALLQATGAQEDEYGQFLVNCNSIQNLPSLTFIINGVEFPLP PSSYILSNNGYCTVGVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA corresponding to amino acids 96 - 388 of HUMPGCA_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMPGCA_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence
NLWVPSVYCQSQACTSHSRFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQV PNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVY LSNQQGSSGGAVVFGGVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGWCSEGC QAIVDTGTSLLTVPQQYMSALLQATGAQEDEYGQFLVNCNSIQNLPSLTFIINGVEFPLP PSSYILSNNGYCTVGVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA in HUMPGCA_PEA_1_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPGCA_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPGCA_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein HUMPGCA_PEA_1_P12 is encoded by the following transcript(s): HUMPGCA_PEA_1_T0, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPGCA_PEA_1_T0 is shown in bold; this coding portion starts at position 94 and ends at position 1253. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPGCA_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein HUMPGCA_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPGCA_PEA_1_T1. An alignment is given to the known protein (Gastricsin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPGCA_PEA_1_P14 and PEPC_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMPGCA_PEA_1_P14, comprising a first amino acid sequence being at least 90 % homologous to MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYRF GDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVPSVYCQSQACTSHSRF NPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTVQSIQVPNQEFGLSENEPGTNFVYAQ FDGIMGLAYPALSVDEATTAMQGMVQEGALTSPVFSVYLS corresponding to amino acids 1 - 215 of PEPC_HUMAN, which also corresponds to amino acids 1 - 215 of HUMPGCA_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence K corresponding to amino acids 216 - 216 of HUMPGCA_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPGCA_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPGCA_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein HUMPGCA_PEA_1_P14 is encoded by the following transcript(s): HUMPGCA_PEA_1_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPGCA_PEA_1_T1 is shown in bold; this coding portion starts at position 93 and ends at position 740. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPGCA_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein HUMPGCA_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMPGCA_PEA_1_T5. An alignment is given to the known protein (Gastricsin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMPGCA_PEA_1_P15 and PEPC_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMPGCA_PEA_1_P15, comprising a first amino acid sequence being at least 90 % homologous to
MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDPAWKYRF GDLSVTYEPMAYMD corresponding to amino acids 1 - 70 of PEPC_HUMAN, which also corresponds to amino acids 1 - 70 of HUMPGCA_PEA_1_P15, and a second amino acid sequence being at least 90 % homologous to VQSIQVPNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMVQEGALTS PVFSVYLSNQQGSSGGAVVFGGVDSSLYTGQIYWAPVTQELYWQIGIEEFLIGGQASGW CSEGCQAIVDTGTSLLTVPQQYMSALLQATGAQEDEYGQFLVNCNSIQNLPSLTFIINGV EFPLPPSSYILSNNGYCTVGVEPTYLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFA TAA corresponding to amino acids 150 - 388 of PEPC_HUMAN, which also corresponds to amino acids 71 - 309 of HUMPGCA_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMPGCA_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise DV, having a structure as follows: a sequence starting from any of amino acid numbers 70-x to 70; and ending at any of amino acid numbers 71 + ((n-2) - x), in which x varies from 0 to n-2.
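The edge-portion definition above describes a family of windows of length n that each span the junction between amino acids 70 and 71 of HUMPGCA_PEA_1_P15. The following is a minimal sketch that enumerates those windows, assuming 1-based inclusive amino acid numbering; the function name `edge_windows` and its `junction` parameter are illustrative only.

```python
from typing import List, Tuple


def edge_windows(n: int, junction: int = 70) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 1-based inclusive, for every window of
    length n that contains both residue `junction` and `junction + 1`.

    Follows the rule in the text: start = junction - x and
    end = (junction + 1) + ((n - 2) - x), with x running from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


windows_10 = edge_windows(10)
for start, end in windows_10:
    print(start, end)
```

For n = 10 this yields nine windows, from positions 70 - 79 (x = 0) to positions 62 - 71 (x = 8); every window has length n and includes both junction residues.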
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMPGCA_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein
HUMPGCA_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Variant protein HUMPGCA_PEA_1_P15 is encoded by the following transcript(s): HUMPGCA_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMPGCA_PEA_1_T5 is shown in bold; this coding portion starts at position 93 and ends at position 1019. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMPGCA_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
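The coding-portion coordinates given for a transcript can be related to the length of the encoded variant protein. The following is a minimal sketch, assuming that the start and end positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; both are assumptions rather than statements from the application, and the function name `coding_codons` is illustrative.

```python
def coding_codons(start: int, end: int) -> int:
    """Number of whole codons in a coding portion given by 1-based,
    inclusive nucleotide positions."""
    if end < start:
        raise ValueError("end must not precede start")
    length = end - start + 1  # inclusive span in nucleotides
    if length % 3 != 0:
        raise ValueError("coding span is not a whole number of codons")
    return length // 3


# Coordinates stated in the text for transcripts T1 and T5.
codons_t1 = coding_codons(93, 740)
codons_t5 = coding_codons(93, 1019)
print(codons_t1, codons_t5)
```

Under these assumptions, positions 93 - 740 of HUMPGCA_PEA_1_T1 span 216 codons and positions 93 - 1019 of HUMPGCA_PEA_1_T5 span 309 codons, which agrees with the amino acid ranges 1 - 216 and 1 - 309 given for HUMPGCA_PEA_1_P14 and HUMPGCA_PEA_1_P15.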
As noted above, cluster HUMPGCA features 18 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMPGCA_PEA_1_node_0 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_2 according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_14 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_16 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T1. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_17 according to the present invention is supported by 80 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_19 according to the present invention is supported by 92 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_28 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMPGCA_PEA_1_node_4 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0 and HUMPGCA_PEA_1_T1. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_5 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0 and HUMPGCA_PEA_1_T1. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_6 according to the present invention can be found in the following transcript(s): HUMPGCA_PEA_1_T0 and HUMPGCA_PEA_1_T1. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_9 according to the present invention can be found in the following transcript(s): HUMPGCA_PEA_1_T0 and HUMPGCA_PEA_1_T1. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_10 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0 and HUMPGCA_PEA_1_T1. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_11 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0 and HUMPGCA_PEA_1_T1. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_15 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_22 according to the present invention is supported by 93 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_26 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_27 according to the present invention can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HUMPGCA_PEA_1_node_29 according to the present invention is supported by 78 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMPGCA_PEA_1_T0, HUMPGCA_PEA_1_T1 and HUMPGCA_PEA_1_T5. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Transcript nucleic acid sequences:
Variant protein alignment to the previously known protein:
Sequence name: Q8IUM8
Sequence documentation:
Alignment of: HUMPGCA_PEA_1_P12 x Q8IUM8
Alignment segment 1/1:
Quality: 939.00 Escore: 0 Matching length: 97 Total length: 97
Matching Percent Similarity: 98.97 Matching Percent Identity: 98.97
Total Percent Similarity: 98.97 Total Percent Identity: 98.97 Gaps: 0
Alignment:
  1 MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDP 50
  1 MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDP 50
 51 AWKYRFGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNL 97
 51 AWKYRFGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSPL 97
Sequence name: PEPC_HUMAN
Sequence documentation:
Alignment of: HUMPGCA_PEA_1_P14 x PEPC_HUMAN
Alignment segment 1/1:
Quality: 2117.00 Escore: 0 Matching length: 215 Total length: 215
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDP 50
  1 MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDP 50
 51 AWKYRFGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVP 100
 51 AWKYRFGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVP 100
101 SVYCQSQACTSHSRFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTV 150
101 SVYCQSQACTSHSRFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTV 150
151 QSIQVPNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMV 200
151 QSIQVPNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMV 200
201 QEGALTSPVFSVYLS 215
201 QEGALTSPVFSVYLS 215
Sequence name: PEPC_HUMAN
Sequence documentation:
Alignment of: HUMPGCA_PEA_1_P15 x PEPC_HUMAN
Alignment segment 1/1:
Quality: 2932.00 Escore: 0 Matching length: 309 Total length: 388
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 79.64 Total Percent Identity: 79.64 Gaps: 1
Alignment:
  1 MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDP 50
  1 MKWMVVVLVCLQLLEAAVVKVPLKKFKSIRETMKEKGLLGEFLRTHKYDP 50
 51 AWKYRFGDLSVTYEPMAYMD.............................. 70
 51 AWKYRFGDLSVTYEPMAYMDAAYFGEISIGTPPQNFLVLFDTGSSNLWVP 100
 71 .................................................V 71
101 SVYCQSQACTSHSRFNPSESSTYSTNGQTFSLQYGSGSLTGFFGYDTLTV 150
 72 QSIQVPNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMV 121
151 QSIQVPNQEFGLSENEPGTNFVYAQFDGIMGLAYPALSVDEATTAMQGMV 200
122 QEGALTSPVFSVYLSNQQGSSGGAVVFGGVDSSLYTGQIYWAPVTQELYW 171
201 QEGALTSPVFSVYLSNQQGSSGGAVVFGGVDSSLYTGQIYWAPVTQELYW 250
172 QIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQQYMSALLQATGAQE 221
251 QIGIEEFLIGGQASGWCSEGCQAIVDTGTSLLTVPQQYMSALLQATGAQE 300
222 DEYGQFLVNCNSIQNLPSLTFIINGVEFPLPPSSYILSNNGYCTVGVEPT 271
301 DEYGQFLVNCNSIQNLPSLTFIINGVEFPLPPSSYILSNNGYCTVGVEPT 350
272 YLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA 309
351 YLSSQNGQPLWILGDVFLRSYYSVYDLGNNRVGFATAA 388
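The difference between the "Matching" and "Total" percentages reported for a gapped alignment such as HUMPGCA_PEA_1_P15 x PEPC_HUMAN can be illustrated numerically. The following is a minimal sketch, assuming that the "Matching" figure is computed over the aligned (non-gap) positions and the "Total" figure over the full alignment length; this interpretation and the function name `alignment_percentages` are assumptions, not definitions taken from the application.

```python
from typing import Tuple


def alignment_percentages(identical: int,
                          matching_length: int,
                          total_length: int) -> Tuple[float, float]:
    """Return (matching percent identity, total percent identity)."""
    if not 0 < matching_length <= total_length:
        raise ValueError("require 0 < matching_length <= total_length")
    if not 0 <= identical <= matching_length:
        raise ValueError("identical must lie in 0..matching_length")
    matching_pct = round(100.0 * identical / matching_length, 2)
    total_pct = round(100.0 * identical / total_length, 2)
    return matching_pct, total_pct


# HUMPGCA_PEA_1_P15 x PEPC_HUMAN: 309 aligned residues, all identical,
# over a total alignment length of 388 (one gap of 79 positions).
matching_pct, total_pct = alignment_percentages(309, 309, 388)
print(matching_pct, total_pct)
```

With 309 identical residues over 309 aligned positions and a total length of 388, this reproduces the reported values of 100.00 and 79.64.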
DESCRIPTION FOR CLUSTER HUMFBRB Cluster HUMFBRB features 8 transcript(s) and 55 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in table 3.
Table 1 - Transcripts of interest
Transcript Name
HUMFBRB_PEA_1_T14    161
HUMFBRB_PEA_1_T16    162
HUMFBRB_PEA_1_T19    163
HUMFBRB_PEA_1_T20    164
HUMFBRB_PEA_1_T25    165
HUMFBRB_PEA_1_T44    166
HUMFBRB_PEA_1_T52    167
HUMFBRB_PEA_1_T8     168
Table 3 - Proteins of interest
These sequences are variants of the known protein Fibrinogen beta chain precursor [Contains: Fibrinopeptide B] (SwissProt accession identifier FIBB_HUMAN), SEQ ID NO: 224, referred to herein as the previously known protein. This protein (and hence its variants) may optionally be used for the following diagnostic tests: Fibrinogen (Blood Clotting) - Clotting disorders. Fibrinogen has a double function: yielding monomers that polymerize into fibrin and acting as a cofactor in platelet aggregation. It is a hexamer containing 2 sets of 3 nonidentical chains (alpha, beta and gamma), linked to each other by disulfide bonds. Thrombin sequentially cleaves fibrinopeptides A and B from the Aalpha and Bbeta chains of fibrinogen to produce fibrin monomer, which then polymerizes to form a fibrin clot. Serum fibrinogen is measured to:
1. Assess clotting (usually after PT, PTT and platelet counts have been performed)
2. Help diagnose disseminated intravascular coagulation
3. Monitor the status of a progressive liver disease
4. Determine a patient's overall risk of developing cardiovascular disease (as a marker of inflammation).
Protein Fibrinogen beta chain precursor [Contains: Fibrinopeptide B] is known or believed to have the following function(s): Fibrinogen has a double function: yielding monomers that polymerize into fibrin and acting as a cofactor in platelet aggregation. The sequence for protein Fibrinogen beta chain precursor [Contains: Fibrinopeptide B] is given at the end of the application, as "Fibrinogen beta chain precursor [Contains: Fibrinopeptide B] amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Haemorrhage; Wound healing; Surgery adjunct; Traumatic shock; Atherosclerosis. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Angiogenesis stimulant; Fibroblast growth factor 2 agonist; Immunostimulant; Thrombin inhibitor; Fibrinogen modulator. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticoagulant; Antithrombotic; Anticancer; Hypolipaemic/Antiatherosclerosis; Septic shock treatment; Neuroprotective; Cardiovascular; Vulnerary; Fibrinolytic; Immunoconjugate; antibody; Haemostatic; Blood fraction; Musculoskeletal; Imaging agent. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: blood coagulation; blood pressure regulation; positive control of cell proliferation, which are annotation(s) related to Biological Process; and fibrinogen; soluble fraction, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMFBRB features 8 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Fibrinogen beta chain precursor [Contains: Fibrinopeptide B]. A description of each variant protein according to the present invention is now provided.
Variant protein HUMFBRB_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s)
HUMFBRB_PEA_1_T8. An alignment is given to the known protein (Fibrinogen beta chain precursor [Contains: Fibrinopeptide B]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMFBRB_PEA_1_P4 and FIBB_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to
MKRMVSWSFFIKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PWSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDF GRKWDPYKQGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMEDWK GDKVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIHNGMF FSTYDRDNDGW corresponding to amino acids 1 - 415 of FIBB_HUMAN, which also corresponds to amino acids 1 - 415 of HUMFBRB_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YVWHSLLLL corresponding to amino acids 416 - 424 of HUMFBRB_PEA_1_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFBRB_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95 % homologous to the sequence YVWHSLLLL in HUMFBRB_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMFBRB_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein HUMFBRB_PEA_1_P4 is encoded by the following transcript(s): HUMFBRB_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFBRB_PEA_1_T8 is shown in bold; this coding portion starts at position 65 and ends at position 1336. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein HUMFBRB_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFBRB_PEA_1_T14. An alignment is given to the known protein (Fibrinogen beta chain precursor [Contains: Fibrinopeptide B]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFBRB_PEA_1_P9 and FIBB_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PVVSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDF GRKWDPYKQGFGNVATNTDGKNYCGLPG corresponding to amino acids 1 - 320 of FIBB_HUMAN, which also corresponds to amino acids 1 - 320 of HUMFBRB_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NEQACKIKSFYLKWDFF corresponding to amino acids 321 - 337 of HUMFBRB_PEA_1_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFBRB_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NEQACKIKSFYLKWDFF in HUMFBRB_PEA_1_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMFBRB_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein HUMFBRB_PEA_1_P9 is encoded by the following transcript(s): HUMFBRB_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFBRB_PEA_1_T14 is shown in bold; this coding portion starts at position 65 and ends at position 1075. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein HUMFBRB_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFBRB_PEA_1_T16. An alignment is given to the known protein (Fibrinogen beta chain precursor [Contains: Fibrinopeptide B]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFBRB_PEA_1_P11 and FIBB_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PVVSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGG corresponding to amino acids 1 - 278 of FIBB_HUMAN, which also corresponds to amino acids 1 - 278 of HUMFBRB_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KLSTWDLLICNYLDTVKCQETRPGWAHTCNSSTLGGQSGLIA corresponding to amino acids 279 - 322 of HUMFBRB_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFBRB_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KLSTWDLLICNYLDTVKCQETRPGWAHTCNSSTLGGQSGLIA in HUMFBRB_PEA_1_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMFBRB_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein HUMFBRB_PEA_1_P11 is encoded by the following transcript(s): HUMFBRB_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFBRB_PEA_1_T16 is shown in bold; this coding portion starts at position 65 and ends at position 1030. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HUMFBRB_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFBRB_PEA_1_T19. An alignment is given to the known protein (Fibrinogen beta chain precursor [Contains: Fibrinopeptide B]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFBRB_PEA_1_P13 and FIBB_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PVVSGK corresponding to amino acids 1 - 239 of FIBB_HUMAN, which also corresponds to amino acids 1 - 239 of HUMFBRB_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GN corresponding to amino acids 240 - 241 of HUMFBRB_PEA_1_P13, wherein said first and second amino acid sequences are contiguous and in a sequential order.
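The head-plus-tail layouts stated in the comparison reports above can be checked with simple arithmetic. The following is a minimal sketch, assuming only the residue ranges quoted in those reports (1-based, inclusive); the names `LAYOUTS`, `span_length`, `is_contiguous` and `summarize` are illustrative and do not appear in the application.

```python
# Residue ranges (1-based, inclusive) as quoted in the comparison reports.
# Each entry maps a variant to (head shared with FIBB_HUMAN, unique tail).
LAYOUTS = {
    "HUMFBRB_PEA_1_P4": ((1, 415), (416, 424)),
    "HUMFBRB_PEA_1_P9": ((1, 320), (321, 337)),
    "HUMFBRB_PEA_1_P11": ((1, 278), (279, 322)),
    "HUMFBRB_PEA_1_P13": ((1, 239), (240, 241)),
}


def span_length(span):
    """Number of residues in an inclusive (start, end) range."""
    start, end = span
    return end - start + 1


def is_contiguous(head, tail):
    """True when the tail starts on the residue right after the head."""
    return tail[0] == head[1] + 1


def summarize(layouts):
    """Return contiguity, tail length and total length for each variant."""
    return {
        name: {
            "contiguous": is_contiguous(head, tail),
            "tail_length": span_length(tail),
            "total_length": span_length(head) + span_length(tail),
        }
        for name, (head, tail) in layouts.items()
    }


if __name__ == "__main__":
    for name, row in summarize(LAYOUTS).items():
        print(name, row)
```

The tail length computed from a stated range can also be compared with the printed tail sequence; for example, the 9-residue range 416 - 424 matches the length of YVWHSLLLL.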
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMFBRB_PEA_1_P13 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein HUMFBRB_PEA_1_P13 is encoded by the following transcript(s): HUMFBRB_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFBRB_PEA_1_T19 is shown in bold; this coding portion starts at position 65 and ends at position 787. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein HUMFBRB_PEA_1_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFBRB_PEA_1_T25. An alignment is given to the known protein (Fibrinogen beta chain precursor [Contains: Fibrinopeptide B]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFBRB_PEA_1_P17 and FIBB_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P17, comprising a first amino acid sequence being at least 90% homologous to MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKREEAP SLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADPDLGVLCPTGCQLQEA LLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVNEY SSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSCNI PVVSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENG corresponding to amino acids 1 - 277 of FIBB_HUMAN, which also corresponds to amino acids 1 - 277 of HUMFBRB_PEA_1_P17, and a second amino acid sequence being at least 90% homologous to GEYWLGNDKISQLTRMGPTELLIEMEDWKGDKVKAHYGGFTVQNEANKYQISVNKYR GTAGNALMDGASQLMGENRTMTIHNGMFFSTYDRDNDGWLTSDPRKQCSKEDGGGW WYNRCHAANPNGRYYWGGQYTWDMAKHGTDDGVVWMNWKGSWYSMRKMSMKI RPFFPQQ corresponding to amino acids 320 - 491 of FIBB_HUMAN, which also corresponds to amino acids 278 - 449 of HUMFBRB_PEA_1_P17, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMFBRB_PEA_1_P17, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise GG, having a structure as follows: a sequence starting from any of amino acid numbers 277-x to 277; and ending at any of amino acid numbers 278 + ((n-2) - x), in which x varies from 0 to n-2.
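The edge-portion formula above can be made concrete by enumerating the windows it describes. This is a minimal sketch under one reading of the claim language, namely that for each x the window starts at residue 277 - x and ends at residue 278 + ((n-2) - x); the function name `edge_windows` and its parameters are illustrative assumptions, not terms from the application.

```python
def edge_windows(n, left=277, right=278):
    """List (start, end) residue windows of length n spanning the junction.

    Assumed reading of the claim: start = left - x and
    end = right + ((n - 2) - x), for x = 0 .. n - 2 (1-based, inclusive).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    return [(left - x, right + (n - 2) - x) for x in range(n - 1)]


def covers_junction(window, left=277, right=278):
    """True when the window contains both residues flanking the junction."""
    start, end = window
    return start <= left and end >= right


if __name__ == "__main__":
    for start, end in edge_windows(10):
        # Each window has exactly n residues and includes residues 277 and 278.
        print(start, end, end - start + 1)
```

Under this reading every window has exactly n residues, and there are n - 1 distinct windows for a given n. The same enumeration applies to any other junction by changing `left` and `right`.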
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMFBRB_PEA_1_P17 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Amino acid mutations
Variant protein HUMFBRB_PEA_1_P17 is encoded by the following transcript(s): HUMFBRB_PEA_1_T25, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFBRB_PEA_1_T25 is shown in bold; this coding portion starts at position 65 and ends at position 1411. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
Variant protein HUMFBRB_PEA_1_P26 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFBRB_PEA_1_T44. An alignment is given to the known protein (Fibrinogen beta chain precursor [Contains: Fibrinopeptide B]) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFBRB_PEA_1_P26 and FIBB_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMFBRB_PEA_1_P26, comprising a first amino acid sequence being at least 90% homologous to MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLDKKRE corresponding to amino acids 1 - 54 of FIBB_HUMAN, which also corresponds to amino acids 1 - 54 of HUMFBRB_PEA_1_P26, and a second amino acid sequence being at least 90% homologous to
EALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYLLKDLWQKRQKQVKDNENVVN EYSSELEKHQLYIDETVNSNIPTNLRVLRSILENLRSKIQKLESDVSAQMEYCRTPCTVSC NIPVVSGKECEEIIRKGGETSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSV DFGRKWDPYKQGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMED WKGDKVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIHN GMFFSTYDRDNDGWLTSDPRKQCSKEDGGGWWYNRCHAANPNGRYYWGGQYTWD MAKHGTDDGVVWMNWKGSWYSMRKMSMKIRPFFPQQ corresponding to amino acids 114 - 491 of FIBB_HUMAN, which also corresponds to amino acids 55 - 432 of HUMFBRB_PEA_1_P26, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMFBRB_PEA_1_P26, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EE, having a structure as follows: a sequence starting from any of amino acid numbers 54-x to 54; and ending at any of amino acid numbers 55 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMFBRB_PEA_1_P26 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
180 K -> Q No
Variant protein HUMFBRB_PEA_1_P26 is encoded by the following transcript(s): HUMFBRB_PEA_1_T44, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFBRB_PEA_1_T44 is shown in bold; this coding portion starts at position 65 and ends at position 1360. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFBRB_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
As noted above, cluster HUMFBRB features 55 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
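Before turning to the individual segments, the coding-portion coordinates quoted above for each transcript can be cross-checked against the variant protein lengths given in the comparison reports. The following is a minimal sketch, assuming nucleotide positions are 1-based and inclusive; the names `CODING_PORTIONS`, `codon_count` and `check_all` are illustrative. That each span equals exactly three nucleotides per residue suggests the quoted end position does not include a stop codon, but this is an inference from the arithmetic and is not stated in the application.

```python
# (start, end, protein length in amino acids) for each transcript, copied
# from the coding-portion statements and comparison reports in this section.
CODING_PORTIONS = {
    "HUMFBRB_PEA_1_T8": (65, 1336, 424),   # encodes HUMFBRB_PEA_1_P4
    "HUMFBRB_PEA_1_T14": (65, 1075, 337),  # encodes HUMFBRB_PEA_1_P9
    "HUMFBRB_PEA_1_T16": (65, 1030, 322),  # encodes HUMFBRB_PEA_1_P11
    "HUMFBRB_PEA_1_T19": (65, 787, 241),   # encodes HUMFBRB_PEA_1_P13
    "HUMFBRB_PEA_1_T25": (65, 1411, 449),  # encodes HUMFBRB_PEA_1_P17
    "HUMFBRB_PEA_1_T44": (65, 1360, 432),  # encodes HUMFBRB_PEA_1_P26
}


def codon_count(start, end):
    """Number of whole codons in an inclusive nucleotide span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a multiple of 3")
    return length // 3


def check_all(portions):
    """Map each transcript to True when its codon count equals the protein length."""
    return {
        name: codon_count(start, end) == protein_length
        for name, (start, end, protein_length) in portions.items()
    }


if __name__ == "__main__":
    for name, ok in check_all(CODING_PORTIONS).items():
        print(name, "consistent" if ok else "MISMATCH")
```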
Segment cluster HUMFBRB_PEA_1_node_0 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_28 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
HUMFBRB_PEA_1_T14 407 554
HUMFBRB_PEA_1_T16 407 554
HUMFBRB_PEA_1_T19 407 554
HUMFBRB_PEA_1_T20 407 554
HUMFBRB_PEA_1_T25 407 554
HUMFBRB_PEA_1_T44 230 377
HUMFBRB_PEA_1_T52 457 604
HUMFBRB_PEA_1_T8 407 554
Segment cluster HUMFBRB_PEA_1_node_39 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T19. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_47 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T16. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_51 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14 and HUMFBRB_PEA_1_T20. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
HUMFBRB_PEA_1_T14 1023 1228
HUMFBRB_PEA_1_T20 1023 1228
Segment cluster HUMFBRB_PEA_1_node_55 according to the present invention is supported by 82 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_56 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T8. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_64 according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_69 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
HUMFBRB_PEA_1_T14 2054 2799
HUMFBRB_PEA_1_T16 2531 3276
HUMFBRB_PEA_1_T19 2408 3153
HUMFBRB_PEA_1_T25 1722 2467
HUMFBRB_PEA_1_T44 1671 1798
HUMFBRB_PEA_1_T52 1898 2025
HUMFBRB_PEA_1_T8 2468 3213
Segment cluster HUMFBRB_PEA_1_node_71 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25 and HUMFBRB_PEA_1_T8. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_74 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25 and HUMFBRB_PEA_1_T8. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
HUMFBRB_PEA_1_T14 3444 3598
HUMFBRB_PEA_1_T16 3921 4075
HUMFBRB_PEA_1_T19 3798 3952
HUMFBRB_PEA_1_T25 3112 3266
HUMFBRB_PEA_1_T8 3858 4012
Segment cluster HUMFBRB_PEA_1_node_75 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25 and HUMFBRB_PEA_1_T8. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
HUMFBRB_PEA_1_T14 3599 3911
HUMFBRB_PEA_1_T16 4076 4388
HUMFBRB_PEA_1_T19 3953 4265
HUMFBRB_PEA_1_T25 3267 3579
HUMFBRB_PEA_1_T8 4013 4325
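The positions in Tables 27 and 28 can be used to confirm that a segment has the same length on every transcript and that node_74 is immediately followed by node_75. The following is a minimal sketch using only the values printed in those two tables, assuming positions are 1-based and inclusive; the names `NODE_74`, `NODE_75`, `segment_length` and `is_adjacent` are illustrative.

```python
# (start, end) of each segment on each transcript, copied from Tables 27 and 28.
NODE_74 = {
    "HUMFBRB_PEA_1_T14": (3444, 3598),
    "HUMFBRB_PEA_1_T16": (3921, 4075),
    "HUMFBRB_PEA_1_T19": (3798, 3952),
    "HUMFBRB_PEA_1_T25": (3112, 3266),
    "HUMFBRB_PEA_1_T8": (3858, 4012),
}
NODE_75 = {
    "HUMFBRB_PEA_1_T14": (3599, 3911),
    "HUMFBRB_PEA_1_T16": (4076, 4388),
    "HUMFBRB_PEA_1_T19": (3953, 4265),
    "HUMFBRB_PEA_1_T25": (3267, 3579),
    "HUMFBRB_PEA_1_T8": (4013, 4325),
}


def segment_length(span):
    """Length in nucleotides of an inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def is_adjacent(upstream, downstream):
    """True when the downstream span begins right after the upstream span ends."""
    return downstream[0] == upstream[1] + 1


if __name__ == "__main__":
    for transcript in NODE_74:
        a, b = NODE_74[transcript], NODE_75[transcript]
        print(transcript, segment_length(a), segment_length(b), is_adjacent(a, b))
```

A segment whose length differs between transcripts, or a gap between two consecutive nodes, would point to a transcription error in the table rather than to a property of the sequences themselves.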
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMFBRB_PEA_1_node_12 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_13 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
HUMFBRB_PEA_1_T14 230 238
HUMFBRB_PEA_1_T16 230 238
HUMFBRB_PEA_1_T19 230 238
HUMFBRB_PEA_1_T20 230 238
HUMFBRB_PEA_1_T25 230 238
HUMFBRB_PEA_1_T52 280 288
HUMFBRB_PEA_1_T8 230 238
Segment cluster HUMFBRB_PEA_1_node_14 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
HUMFBRB_PEA_1_T14 239 250
HUMFBRB_PEA_1_T16 239 250
HUMFBRB_PEA_1_T19 239 250
HUMFBRB_PEA_1_T20 239 250
HUMFBRB_PEA_1_T25 239 250
HUMFBRB_PEA_1_T52 289 300
HUMFBRB_PEA_1_T8 239 250
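The short-segment tables lend themselves to two quick checks: that each segment stays within the roughly 120 bp limit stated for short segments, and that a transcript whose coordinates are shifted (here HUMFBRB_PEA_1_T52) is shifted by the same amount in every table. The following is a minimal sketch using the values printed in Tables 30 and 31, assuming 1-based inclusive positions; the names `NODE_13`, `NODE_14`, `lengths`, `offsets` and the choice of HUMFBRB_PEA_1_T14 as the reference transcript are illustrative assumptions.

```python
SHORT_SEGMENT_LIMIT = 120  # "up to about 120 bp", as stated for short segments


def build_table(default_span, t52_span):
    """Build a transcript -> (start, end) table; only T52 differs in these tables."""
    names = ["T14", "T16", "T19", "T20", "T25", "T8"]
    table = {f"HUMFBRB_PEA_1_{n}": default_span for n in names}
    table["HUMFBRB_PEA_1_T52"] = t52_span
    return table


NODE_13 = build_table((230, 238), (280, 288))  # values from Table 30
NODE_14 = build_table((239, 250), (289, 300))  # values from Table 31


def lengths(table):
    """Set of distinct segment lengths across all transcripts in a table."""
    return {end - start + 1 for start, end in table.values()}


def offsets(table, reference="HUMFBRB_PEA_1_T14"):
    """Start-position shift of each transcript relative to the reference."""
    ref_start = table[reference][0]
    return {name: start - ref_start for name, (start, _end) in table.items()}


if __name__ == "__main__":
    for label, table in (("node_13", NODE_13), ("node_14", NODE_14)):
        within_limit = max(lengths(table)) <= SHORT_SEGMENT_LIMIT
        print(label, lengths(table), within_limit, offsets(table)["HUMFBRB_PEA_1_T52"])
```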
Segment cluster HUMFBRB_PEA_1_node_15 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
HUMFBRB_PEA_1_T14 251 312
HUMFBRB_PEA_1_T16 251 312
HUMFBRB_PEA_1_T19 251 312
HUMFBRB_PEA_1_T20 251 312
HUMFBRB_PEA_1_T25 251 312
HUMFBRB_PEA_1_T52 301 362
HUMFBRB_PEA_1_T8 251 312
Segment cluster HUMFBRB_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
HUMFBRB_PEA_1_T14 313 316
HUMFBRB_PEA_1_T16 313 316
HUMFBRB_PEA_1_T19 313 316
HUMFBRB_PEA_1_T20 313 316
HUMFBRB_PEA_1_T25 313 316
HUMFBRB_PEA_1_T52 363 366
HUMFBRB_PEA_1_T8 313 316
Segment cluster HUMFBRB_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
HUMFBRB_PEA_1_T14 317 324
HUMFBRB_PEA_1_T16 317 324
HUMFBRB_PEA_1_T19 317 324
HUMFBRB_PEA_1_T20 317 324
HUMFBRB_PEA_1_T25 317 324
HUMFBRB_PEA_1_T52 367 374
HUMFBRB_PEA_1_T8 317 324
Segment cluster HUMFBRB_PEA_1_node_18 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_19 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
HUMFBRB_PEA_1_T14 333 370
HUMFBRB_PEA_1_T16 333 370
HUMFBRB_PEA_1_T19 333 370
HUMFBRB_PEA_1_T20 333 370
HUMFBRB_PEA_1_T25 333 370
HUMFBRB_PEA_1_T52 383 420
HUMFBRB_PEA_1_T8 333 370
Segment cluster HUMFBRB_PEA_1_node_26 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_27 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
HUMFBRB_PEA_1_T14 379 406
HUMFBRB_PEA_1_T16 379 406
HUMFBRB_PEA_1_T19 379 406
HUMFBRB_PEA_1_T20 379 406
HUMFBRB_PEA_1_T25 379 406
HUMFBRB_PEA_1_T52 429 456
HUMFBRB_PEA_1_T8 379 406
Segment cluster HUMFBRB_PEA_1_node_32 according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_33 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
HUMFBRB_PEA_1_T14 590 646
HUMFBRB_PEA_1_T16 590 646
HUMFBRB_PEA_1_T19 590 646
HUMFBRB_PEA_1_T20 590 646
HUMFBRB_PEA_1_T25 590 646
HUMFBRB_PEA_1_T44 413 469
HUMFBRB_PEA_1_T52 640 696
HUMFBRB_PEA_1_T8 590 646
Segment cluster HUMFBRB_PEA_1_node_34 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_35 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_36 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_37 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_38 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_40 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
HUMFBRB_PEA_1_T14 783 801
HUMFBRB_PEA_1_T16 783 801
HUMFBRB_PEA_1_T19 1343 1361
HUMFBRB_PEA_1_T20 783 801
HUMFBRB_PEA_1_T25 783 801
HUMFBRB_PEA_1_T44 606 624
HUMFBRB_PEA_1_T52 833 851
HUMFBRB_PEA_1_T8 783 801
Segment cluster HUMFBRB_PEA_1_node_41 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_42 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_43 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Transcript name  Segment starting position  Segment ending position
HUMFBRB_PEA_1_T14 882 886
HUMFBRB_PEA_1_T16 882 886
HUMFBRB_PEA_1_T19 1442 1446
HUMFBRB_PEA_1_T20 882 886
HUMFBRB_PEA_1_T25 882 886
Segment cluster HUMFBRB_PEA_1_node_44 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
HUMFBRB_PEA_1_T14 887 890
HUMFBRB_PEA_1_T16 887 890
HUMFBRB_PEA_1_T19 1447 1450
HUMFBRB_PEA_1_T20 887 890
HUMFBRB_PEA_1_T25 887 890
HUMFBRB_PEA_1_T44 710 713
HUMFBRB_PEA_1_T52 937 940
HUMFBRB_PEA_1_T8 887 890
Segment cluster HUMFBRB_PEA_1_node_46 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_48 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
HUMFBRB_PEA_1_T14 897 940
HUMFBRB_PEA_1_T16 1580 1623
HUMFBRB_PEA_1_T19 1457 1500
HUMFBRB_PEA_1_T20 897 940
HUMFBRB_PEA_1_T44 720 763
HUMFBRB_PEA_1_T52 947 990
HUMFBRB_PEA_1_T8 897 940
Segment cluster HUMFBRB_PEA_1_node_49 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_50 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
HUMFBRB_PEA_1_T14 946 1022
HUMFBRB_PEA_1_T16 1629 1705
HUMFBRB_PEA_1_T19 1506 1582
HUMFBRB_PEA_1_T20 946 1022
HUMFBRB_PEA_1_T44 769 845
HUMFBRB_PEA_1_T52 996 1072
HUMFBRB_PEA_1_T8 946 1022
Segment cluster HUMFBRB_PEA_1_node_52 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_53 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
HUMFBRB_PEA_1_T14 1252 1305
HUMFBRB_PEA_1_T16 1729 1782
HUMFBRB_PEA_1_T19 1606 1659
HUMFBRB_PEA_1_T20 1252 1305
HUMFBRB_PEA_1_T25 920 973
HUMFBRB_PEA_1_T44 869 922
HUMFBRB_PEA_1_T52 1096 1149
HUMFBRB_PEA_1_T8 1046 1099
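As an illustrative cross-check, not part of the original analysis, the rows of Table 56 can be parsed to confirm that the segment spans the same number of nucleotides on every transcript. The sketch assumes that positions are 1-based and inclusive, so that length = end - start + 1; the function name `segment_lengths` and the whitespace-separated row format are hypothetical conveniences.

```python
# Rows of Table 56 as "transcript start end".
# Assumption: positions are 1-based and inclusive.
TABLE_56 = """\
HUMFBRB_PEA_1_T14 1252 1305
HUMFBRB_PEA_1_T16 1729 1782
HUMFBRB_PEA_1_T19 1606 1659
HUMFBRB_PEA_1_T20 1252 1305
HUMFBRB_PEA_1_T25 920 973
HUMFBRB_PEA_1_T44 869 922
HUMFBRB_PEA_1_T52 1096 1149
HUMFBRB_PEA_1_T8 1046 1099
"""


def segment_lengths(table_text):
    """Return {transcript: segment length} for name/start/end rows."""
    lengths = {}
    for line in table_text.strip().splitlines():
        name, start, end = line.split()
        lengths[name] = int(end) - int(start) + 1
    return lengths


lengths = segment_lengths(TABLE_56)
# A single distinct value means the segment has the same length everywhere.
print(sorted(set(lengths.values())))
```

The same check could be applied to any of the other segment location tables; a table yielding more than one distinct length would suggest a transcription error in one of its rows.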
Segment cluster HUMFBRB_PEA_1_node_54 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Transcript name  Segment starting position  Segment ending position
HUMFBRB_PEA_1_T14 1306 1391
HUMFBRB_PEA_1_T16 1783 1868
HUMFBRB_PEA_1_T19 1660 1745
HUMFBRB_PEA_1_T20 1306 1391
HUMFBRB_PEA_1_T25 974 1059
HUMFBRB_PEA_1_T44 923 1008
HUMFBRB_PEA_1_T52 1150 1235
HUMFBRB_PEA_1_T8 1100 1185
Segment cluster HUMFBRB_PEA_1_node_57 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
HUMFBRB_PEA_1_T14 1515 1534
HUMFBRB_PEA_1_T16 1992 2011
HUMFBRB_PEA_1_T19 1869 1888
HUMFBRB_PEA_1_T20 1515 1534
HUMFBRB_PEA_1_T25 1183 1202
HUMFBRB_PEA_1_T44 1132 1151
HUMFBRB_PEA_1_T52 1359 1378
HUMFBRB_PEA_1_T8 1929 1948
Segment cluster HUMFBRB_PEA_1_node_58 according to the present invention is supported by 82 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
HUMFBRB_PEA_1_T14 1535 1567
HUMFBRB_PEA_1_T16 2012 2044
HUMFBRB_PEA_1_T19 1889 1921
HUMFBRB_PEA_1_T20 1535 1567
HUMFBRB_PEA_1_T25 1203 1235
HUMFBRB_PEA_1_T44 1152 1184
HUMFBRB_PEA_1_T52 1379 1411
HUMFBRB_PEA_1_T8 1949 1981
Segment cluster HUMFBRB_PEA_1_node_59 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
HUMFBRB_PEA_1_T14 1568 1588
HUMFBRB_PEA_1_T16 2045 2065
HUMFBRB_PEA_1_T19 1922 1942
HUMFBRB_PEA_1_T20 1568 1588
HUMFBRB_PEA_1_T25 1236 1256
HUMFBRB_PEA_1_T44 1185 1205
HUMFBRB_PEA_1_T52 1412 1432
HUMFBRB_PEA_1_T8 1982 2002
Segment cluster HUMFBRB_PEA_1_node_6 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T52. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_61 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Transcript name  Segment starting position  Segment ending position
HUMFBRB_PEA_1_T14 1589 1694
HUMFBRB_PEA_1_T16 2066 2171
HUMFBRB_PEA_1_T19 1943 2048
HUMFBRB_PEA_1_T20 1589 1694
HUMFBRB_PEA_1_T25 1257 1362
HUMFBRB_PEA_1_T44 1206 1311
HUMFBRB_PEA_1_T52 1433 1538
HUMFBRB_PEA_1_T8 2003 2108
Segment cluster HUMFBRB_PEA_1_node_62 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
HUMFBRB_PEA_1_T14 1695 1711
HUMFBRB_PEA_1_T16 2172 2188
HUMFBRB_PEA_1_T19 2049 2065
HUMFBRB_PEA_1_T20 1695 1711
HUMFBRB_PEA_1_T25 1363 1379
HUMFBRB_PEA_1_T44 1312 1328
HUMFBRB_PEA_1_T52 1539 1555
HUMFBRB_PEA_1_T8 2109 2125
Segment cluster HUMFBRB_PEA_1_node_63 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T20, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
Transcript name  Segment starting position  Segment ending position
HUMFBRB_PEA_1_T14 1712 1765
HUMFBRB_PEA_1_T16 2189 2242
HUMFBRB_PEA_1_T19 2066 2119
HUMFBRB_PEA_1_T20 1712 1765
HUMFBRB_PEA_1_T25 1380 1433
HUMFBRB_PEA_1_T44 1329 1382
HUMFBRB_PEA_1_T52 1556 1609
HUMFBRB_PEA_1_T8 2126 2179
Segment cluster HUMFBRB_PEA_1_node_65 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 65 below describes the starting and ending position of this segment on each transcript.
Table 65 - Segment location on transcripts
HUMFBRB_PEA_1_T14 1959 1978
HUMFBRB_PEA_1_T16 2436 2455
HUMFBRB_PEA_1_T19 2313 2332
HUMFBRB_PEA_1_T25 1627 1646
HUMFBRB_PEA_1_T44 1576 1595
HUMFBRB_PEA_1_T52 1803 1822
HUMFBRB_PEA_1_T8 2373 2392
Segment cluster HUMFBRB_PEA_1_node_66 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Segment cluster HUMFBRB_PEA_1_node_67 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
HUMFBRB_PEA_1_T14 2012 2031
HUMFBRB_PEA_1_T16 2489 2508
HUMFBRB_PEA_1_T19 2366 2385
HUMFBRB_PEA_1_T25 1680 1699
HUMFBRB_PEA_1_T44 1629 1648
HUMFBRB_PEA_1_T52 1856 1875
HUMFBRB_PEA_1_T8 2426 2445
Segment cluster HUMFBRB_PEA_1_node_68 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25, HUMFBRB_PEA_1_T44, HUMFBRB_PEA_1_T52 and HUMFBRB_PEA_1_T8. Table 68 below describes the starting and ending position of this segment on each transcript.
Table 68 - Segment location on transcripts
HUMFBRB_PEA_1_T8 2446 2467
Segment cluster HUMFBRB_PEA_1_node_70 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25 and HUMFBRB_PEA_1_T8. Table 69 below describes the starting and ending position of this segment on each transcript.
Table 69 - Segment location on transcripts
HUMFBRB_PEA_1_T14 2800 2853
HUMFBRB_PEA_1_T16 3277 3330
HUMFBRB_PEA_1_T19 3154 3207
HUMFBRB_PEA_1_T25 2468 2521
HUMFBRB_PEA_1_T8 3214 3267
Segment cluster HUMFBRB_PEA_1_node_72 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25 and HUMFBRB_PEA_1_T8. Table 70 below describes the starting and ending position of this segment on each transcript.
Table 70 - Segment location on transcripts
Transcript name  Segment starting position  Segment ending position
HUMFBRB_PEA_1_T14 3367 3420
HUMFBRB_PEA_1_T16 3844 3897
HUMFBRB_PEA_1_T19 3721 3774
HUMFBRB_PEA_1_T25 3035 3088
HUMFBRB_PEA_1_T8 3781 3834
Segment cluster HUMFBRB_PEA_1_node_73 according to the present invention can be found in the following transcript(s): HUMFBRB_PEA_1_T14, HUMFBRB_PEA_1_T16, HUMFBRB_PEA_1_T19, HUMFBRB_PEA_1_T25 and HUMFBRB_PEA_1_T8. Table 71 below describes the starting and ending position of this segment on each transcript.
Table 71 - Segment location on transcripts
Transcript nucleic acid sequences:
Variant protein alignment to the previously known protein:
Sequence name: /tmp/qNGpGAnwXR/gFvQD2xRv4 :FIBB_HUMAN
Sequence documentation:
Alignment of: HUMFBRB_PEA_1_P4 x FIBB_HUMAN
Alignment segment 1/1:
Quality: 4142.00 Escore: 0
Matching length: 415 Total length: 415
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
251 TSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYK 300
251 TSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYK 300
301 QGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMEDWKGD 350
301 QGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMEDWKGD 350
351 KVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIH 400
351 KVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIH 400
401 NGMFFSTYDRDNDGW 415
401 NGMFFSTYDRDNDGW 415
Sequence name: /tmp/UTIN4aye3n/eDNov8vJNW: FIBB_HUMAN
Sequence documentation: Alignment of: HUMFBRB_PEA_1_P9 x FIBB_HUMAN
Alignment segment 1/1:
Quality: 3184.00 Escore: 0
Matching length: 320 Total length: 320
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
251 TSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYK 300
251 TSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYK 300
301 QGFGNVATNTDGKNYCGLPG 320
301 QGFGNVATNTDGKNYCGLPG 320
Sequence name: /tmp/exMoDWf9BJ/Exdzdf7JZk : FIBB_HUMAN
Sequence documentation:
Alignment of: HUMFBRB_PEA_1_P11 x FIBB_HUMAN
Alignment segment 1/1:
Quality: 2752.00 Escore: 0
Matching length: 278 Total length: 278
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
251 TSEMYLIQPDSSVKPYRVYCDMNTENGG 278
251 TSEMYLIQPDSSVKPYRVYCDMNTENGG 278
Sequence name: /tmp/vfgqx3Vaw6/6hNObHLqSn: FIBB_HUMAN
Sequence documentation:
Alignment of: HUMFBRB_PEA_1_P13 x FIBB_HUMAN
Alignment segment 1/1:
Quality: 2358.00 Escore: 0
Matching length: 239 Total length: 239
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGK 239
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGK 239
Sequence name: /tmp/GXBQp6tVci/43mhGWIBuS : FIBB_HUMAN
Sequence documentation:
Alignment of: HUMFBRB_PEA_1_P17 x FIBB_HUMAN
Alignment segment 1/1:
Quality: 4440.00 Escore: 0
Matching length: 449 Total length: 491
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 91.45 Total Percent Identity: 91.45
Gaps: 1
Alignment:
  1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
  1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
 51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
 51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
251 TSEMYLIQPDSSVKPYRVYCDMNTENG....................... 277
251 TSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYK 300
278 ...................GEYWLGNDKISQLTRMGPTELLIEMEDWKGD 308
301 QGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMEDWKGD 350
309 KVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIH 358
351 KVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIH 400
359 NGMFFSTYDRDNDGWLTSDPRKQCSKEDGGGWWYNRCHAANPNGRYYWGG 408
401 NGMFFSTYDRDNDGWLTSDPRKQCSKEDGGGWWYNRCHAANPNGRYYWGG 450
409 QYTWDMAKHGTDDGVVWMNWKGSWYSMRKMSMKIRPFFPQQ 449
451 QYTWDMAKHGTDDGVVWMNWKGSWYSMRKMSMKIRPFFPQQ 491
Sequence name: /tmp/Hdp04IQYhr/g891AlNcFf : FIBB_HUMAN
Sequence documentation:
Alignment of: HUMFBRB_PEA_1_P26 x FIBB_HUMAN
Alignment segment 1/1:
Quality: 4291.00 Escore: 0
Matching length: 432 Total length: 491
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 87.98 Total Percent Identity: 87.98
Gaps: 1
Alignment:
  1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
  1 MKRMVSWSFHKLKTMKHLLLLLLCVFLVKSQGVNDNEEGFFSARGHRPLD 50
 51 KKRE.............................................. 54
 51 KKREEAPSLRPAPPPISGGGYRARPAKAAATQKKVERKAPDAGGCLHADP 100
 55 .............EALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 91
101 DLGVLCPTGCQLQEALLQQERPIRNSVDELNNNVEAVSQTSSSSFQYMYL 150
 92 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 141
151 LKDLWQKRQKQVKDNENVVNEYSSELEKHQLYIDETVNSNIPTNLRVLRS 200
142 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 191
201 ILENLRSKIQKLESDVSAQMEYCRTPCTVSCNIPVVSGKECEEIIRKGGE 250
192 TSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYK 241
251 TSEMYLIQPDSSVKPYRVYCDMNTENGGWTVIQNRQDGSVDFGRKWDPYK 300
242 QGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMEDWKGD 291
301 QGFGNVATNTDGKNYCGLPGEYWLGNDKISQLTRMGPTELLIEMEDWKGD 350
292 KVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIH 341
351 KVKAHYGGFTVQNEANKYQISVNKYRGTAGNALMDGASQLMGENRTMTIH 400
342 NGMFFSTYDRDNDGWLTSDPRKQCSKEDGGGWWYNRCHAANPNGRYYWGG 391
401 NGMFFSTYDRDNDGWLTSDPRKQCSKEDGGGWWYNRCHAANPNGRYYWGG 450
392 QYTWDMAKHGTDDGVVWMNWKGSWYSMRKMSMKIRPFFPQQ 432
451 QYTWDMAKHGTDDGVVWMNWKGSWYSMRKMSMKIRPFFPQQ 491
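The "Total Percent Identity" figures in the alignment reports above appear consistent with dividing the matching length by the total alignment length. This is an inference from the reported numbers, not a documented description of the alignment software, and the function name `total_percent` is hypothetical. A minimal sketch of that arithmetic:

```python
def total_percent(matching_length, total_length):
    """Matching length as a percentage of total length, to two decimals.

    Assumption: this ratio is what the report labels 'Total Percent
    Identity'; the source does not state the formula explicitly.
    """
    return round(100.0 * matching_length / total_length, 2)


# (matching length, total length) as reported for three of the variants.
reports = {
    "HUMFBRB_PEA_1_P4": (415, 415),
    "HUMFBRB_PEA_1_P17": (449, 491),
    "HUMFBRB_PEA_1_P26": (432, 491),
}

for name, (matching, total) in reports.items():
    print(name, total_percent(matching, total))
```

Under this reading, the single gap in each of the P17 and P26 alignments accounts for the whole difference between total length and matching length (42 and 59 positions, respectively).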
DESCRIPTION FOR CLUSTER HSMRACP5
Cluster HSMRACP5 features 3 transcript(s) and 18 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
HSMRACP5_PEA_1_T11 231
HSMRACP5_PEA_1_T14 232
HSMRACP5_PEA_1_T20 233
Table 2 - Segments of interest
Segment Name  Sequence ID No.
HSMRACP5_PEA_1_node_0 234
HSMRACP5_PEA_1_node_12 235
HSMRACP5_PEA_1_node_13 236
Table 3 - Proteins of interest
HSMRACP5_PEA_1_P11 253
HSMRACP5_PEA_1_P12 254
HSMRACP5_PEA_1_P14 255
These sequences are variants of the known protein Tartrate-resistant acid phosphatase type 5 precursor (SwissProt accession identifier PPA5_HUMAN; known also according to the synonyms EC 3.1.3.2; TR-AP; Tartrate-resistant acid ATPase; TrATPase), SEQ ID NO: 252, referred to herein as the previously known protein. This protein (and hence its variants) may optionally be used for the following test: Acid Phosphatase (Enzymes) - Used to differentiate multiple myeloma from other monoclonal gammopathies of uncertain significance. Tartrate-resistant acid phosphatase (TRAP) is a basic iron-binding protein. It is detected in human alveolar macrophages, osteoclasts, spleen and liver. This relatively minor intracellular isozyme of acid phosphatase can become the dominant isozyme in certain pathological states (Gaucher's and Hodgkin's diseases, and the hairy-cell, B-cell and T-cell leukemias). It is used in the immunohistochemical diagnosis of hairy-cell leukemia.
The sequence for protein Tartrate-resistant acid phosphatase type 5 precursor is given at the end of the application, as "Tartrate-resistant acid phosphatase type 5 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Tartrate-resistant acid phosphatase type 5 precursor localization is believed to be lysosomal. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: acid phosphatase, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSMRACP5 features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Tartrate-resistant acid phosphatase type 5 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSMRACP5_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMRACP5_PEA_1_T11. An alignment is given to the known protein (Tartrate-resistant acid phosphatase type 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSMRACP5_PEA_1_P11 and AAH25414 (SEQ ID NO: 1425):
1. An isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQ corresponding to amino acids 1 - 87 of AAH25414, which also corresponds to amino acids 1 - 87 of HSMRACP5_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR corresponding to amino acids 88 - 119 of HSMRACP5_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR in HSMRACP5_PEA_1_P11.
Comparison report between HSMRACP5_PEA_1_P11 and PPA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQ corresponding to amino acids 1 - 87 of PPA5_HUMAN, which also corresponds to amino acids 1 - 87 of HSMRACP5_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR corresponding to amino acids 88 - 119 of HSMRACP5_PEA_1_P11, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR in HSMRACP5_PEA_1_P11.
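By way of non-limiting illustration only, the head-and-tail structure recited in the comparison reports above can be checked with a short script. The following is a minimal sketch and not part of the described methods; the names HEAD, TAIL, join_chimera and span are hypothetical and were chosen for this example. It concatenates the two recited sequences and confirms that the 1-based positions quoted above (1 - 87 and 88 - 119) agree with the sequence lengths.

```python
# Head and tail sequences as recited in the comparison report for
# HSMRACP5_PEA_1_P11 (names below are hypothetical, for illustration only).
HEAD = (
    "MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN"
    "AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQ"
)
TAIL = "VCAQQSGAGGGGGQWGEAALPSDLPLVRAEGR"


def join_chimera(head: str, tail: str) -> str:
    """Join head and tail so that they are contiguous and in sequential order."""
    return head + tail


def span(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    return sequence[start - 1:end]


variant = join_chimera(HEAD, TAIL)

if __name__ == "__main__":
    print("head length:", len(HEAD))
    print("tail length:", len(TAIL))
    print("variant length:", len(variant))
    print("residues 88-119:", span(variant, 88, 119))
```

The 1-based, inclusive convention used in span matches the way amino acid positions are quoted throughout the comparison reports.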
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMRACP5_PEA_1_P11 is encoded by the following transcript(s): HSMRACP5_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMRACP5_PEA_1_T11 is shown in bold; this coding portion starts at position 231 and ends at position 587. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMRACP5_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Nucleic acid SNPs
1069 G -> No
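The coding positions stated in this section can be related to the lengths of the variant proteins by simple arithmetic. The sketch below is illustrative only; the function names and the CODING_SPANS and PROTEIN_LENGTHS dictionaries are hypothetical, while the positions and protein lengths are taken from the transcript descriptions and comparison reports of this cluster. It assumes 1-based, inclusive nucleotide positions.

```python
# Coding-portion start/end positions as stated for each transcript
# (1-based, inclusive positions are assumed for this illustration).
CODING_SPANS = {
    "HSMRACP5_PEA_1_T11": (231, 587),
    "HSMRACP5_PEA_1_T14": (304, 1200),
    "HSMRACP5_PEA_1_T20": (304, 642),
}

# Variant protein lengths implied by the comparison reports
# (last amino acid position quoted for each variant).
PROTEIN_LENGTHS = {
    "HSMRACP5_PEA_1_T11": 119,  # HSMRACP5_PEA_1_P11
    "HSMRACP5_PEA_1_T14": 299,  # HSMRACP5_PEA_1_P12
    "HSMRACP5_PEA_1_T20": 113,  # HSMRACP5_PEA_1_P14
}


def coding_length(start: int, end: int) -> int:
    """Number of nucleotides from start to end, both inclusive."""
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the span; raises if not a multiple of 3."""
    length = coding_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a whole number of codons")
    return length // 3


if __name__ == "__main__":
    for name, (start, end) in CODING_SPANS.items():
        print(name, coding_length(start, end), "nt,", codon_count(start, end), "codons")
```

For each transcript the codon count equals the stated protein length, which is consistent with the quoted end position not including a stop codon; the source does not state this explicitly, so it should be read as an inference.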
Variant protein HSMRACP5_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMRACP5_PEA_1_T14. An alignment is given to the known protein (Tartrate-resistant acid phosphatase type 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSMRACP5_PEA_1_P12 and AAH25414:
1. An isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLRKVP corresponding to amino acids 1 - 103 of AAH25414, which also corresponds to amino acids 1 - 103 of HSMRACP5_PEA_1_P12, and a second amino acid sequence being at least 90% homologous to WNFPSPFYRLHFKIPQTNVSVAIFMLDTVTLCGNSDDFLSQQPERPRDVKLARTQLSWLKKQLAAAREDYVLVAGHYPVWSIAEHGPTHCLVKQLRPLLATYGVTAYLCGHDHNLQYLQDENGVGYVLSGAGNFMDPSKRHQRKVPNGYLRFHYGTEDSLGGFAYVEISSKEMTVTYIEASGKSLFKTRLPRRARP corresponding to amino acids 130 - 325 of AAH25414, which also corresponds to amino acids 104 - 299 of HSMRACP5_PEA_1_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSMRACP5_PEA_1_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PW, having a structure as follows: a sequence starting from any of amino acid numbers 103-x to 103; and ending at any of amino acid numbers 104 + ((n-2) - x), in which x varies from 0 to n-2.
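The edge-portion definition above can be made concrete by enumerating the start and end positions it permits. The following is a minimal sketch under the assumption that positions are 1-based and inclusive; the function name edge_windows and its parameters are hypothetical. For a junction between amino acids 103 and 104 and a chosen length n, it lists every window that starts at 103 - x and ends at 104 + ((n-2) - x) for x from 0 to n-2.

```python
def edge_windows(junction: int, n: int) -> list:
    """List (start, end) pairs of length-n windows spanning a junction.

    `junction` is the last residue before the junction (103 in the text);
    the residue after it is junction + 1 (104 in the text). Positions are
    1-based and inclusive. x runs from 0 to n-2 as in the definition.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = junction - x
        end = (junction + 1) + ((n - 2) - x)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    for start, end in edge_windows(junction=103, n=10):
        print(start, end, "length", end - start + 1)
```

Every window produced has length n and contains both residue 103 and residue 104, so each one includes the two amino acids (PW) that flank the junction.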
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMRACP5_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMRACP5_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein HSMRACP5_PEA_1_P12 is encoded by the following transcript(s): HSMRACP5_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMRACP5_PEA_1_T14 is shown in bold; this coding portion starts at position 304 and ends at position 1200. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMRACP5_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HSMRACP5_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMRACP5_PEA_1_T20. An alignment is given to the known protein (Tartrate-resistant acid phosphatase type 5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSMRACP5_PEA_1_P14 and AAH25414:
1. An isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLRK corresponding to amino acids 1 - 101 of AAH25414, which also corresponds to amino acids 1 - 101 of HSMRACP5_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EGETQLMNCGAT corresponding to amino acids 102 - 113 of HSMRACP5_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EGETQLMNCGAT in HSMRACP5_PEA_1_P14.
Comparison report between HSMRACP5_PEA_1_P14 and PPA5_HUMAN:
1. An isolated chimeric polypeptide encoding for HSMRACP5_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMANAKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLRK corresponding to amino acids 1 - 101 of PPA5_HUMAN, which also corresponds to amino acids 1 - 101 of HSMRACP5_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EGETQLMNCGAT corresponding to amino acids 102 - 113 of HSMRACP5_PEA_1_P14, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSMRACP5_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EGETQLMNCGAT in HSMRACP5_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMRACP5_PEA_1_P14 is encoded by the following transcript(s): HSMRACP5_PEA_1_T20, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMRACP5_PEA_1_T20 is shown in bold; this coding portion starts at position 304 and ends at position 642. The transcript also has the following SNPs as
listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMRACP5_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSMRACP5_PEA_1_node_0 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
HSMRACP5_PEA_1_T11 1 143
HSMRACP5_PEA_1_T14 143
HSMRACP5_PEA_1_T20 143
Segment cluster HSMRACP5_PEA_1_node_12 according to the present invention is supported by 122 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster HSMRACP5_PEA_1_node_13 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster HSMRACP5_PEA_1_node_19 according to the present invention is supported by 124 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11 and HSMRACP5_PEA_1_T14. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
HSMRACP5_PEA_1_T11 833 1143
HSMRACP5_PEA_1_T14 615 925
Segment cluster HSMRACP5_PEA_1_node_24 according to the present invention is supported by 115 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11 and HSMRACP5_PEA_1_T14. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster HSMRACP5_PEA_1_node_25 according to the present invention is supported by 104 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11 and HSMRACP5_PEA_1_T14. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
HSMRACP5_PEA_1_T11 1397 1537
HSMRACP5_PEA_1_T14 1179 1319
Segment cluster HSMRACP5_PEA_1_node_28 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
HSMRACP5_PEA_1_T11 1661 1770
HSMRACP5_PEA_1_T14 1443 1552
HSMRACP5_PEA_1_T20 730 954
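The segment positions given in Tables 12, 14 and 15 can be compared across transcripts with a few lines of code. The sketch below is illustrative only; the SEGMENTS dictionary and the helper names are hypothetical, the transcript names are abbreviated to their final suffix, and the positions are those printed in the tables. Positions are assumed to be 1-based and inclusive.

```python
# Start/end positions as printed in Tables 12, 14 and 15
# (keys abbreviate HSMRACP5_PEA_1_node_N and HSMRACP5_PEA_1_Txx).
SEGMENTS = {
    "node_19": {"T11": (833, 1143), "T14": (615, 925)},
    "node_25": {"T11": (1397, 1537), "T14": (1179, 1319)},
    "node_28": {"T11": (1661, 1770), "T14": (1443, 1552), "T20": (730, 954)},
}


def segment_length(span: tuple) -> int:
    """Length of a segment given (start, end), both inclusive."""
    start, end = span
    return end - start + 1


def lengths(node: str) -> dict:
    """Length of one segment on every transcript that contains it."""
    return {t: segment_length(s) for t, s in SEGMENTS[node].items()}


def start_offset(node: str, a: str, b: str) -> int:
    """Difference between the segment start on transcript a and on transcript b."""
    return SEGMENTS[node][a][0] - SEGMENTS[node][b][0]


if __name__ == "__main__":
    for node in SEGMENTS:
        print(node, lengths(node), "T11-T14 offset:", start_offset(node, "T11", "T14"))
```

With the printed values, each segment has the same length on HSMRACP5_PEA_1_T11 and HSMRACP5_PEA_1_T14, and the start positions on those two transcripts differ by a constant 218 nucleotides. The span printed for HSMRACP5_PEA_1_T20 in Table 15 yields a different length; the sketch reports this as printed and draws no conclusion from it.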
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSMRACP5_PEA_1_node_11 according to the present invention is supported by 104 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
HSMRACP5_PEA_1_T11 231 285
HSMRACP5_PEA_1_T14 304 358
HSMRACP5_PEA_1_T20 304 358
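As an illustration of how segment coordinates relate to other positions stated for the same transcripts, the sketch below checks where the start of each coding portion falls relative to the segment of Table 16. It is a minimal example only; the dictionary and function names are hypothetical, and the values are the Table 16 positions and the coding start positions given in the transcript descriptions of this cluster, assumed to be 1-based and inclusive.

```python
# Table 16 positions of segment HSMRACP5_PEA_1_node_11 on each transcript.
NODE_11 = {
    "HSMRACP5_PEA_1_T11": (231, 285),
    "HSMRACP5_PEA_1_T14": (304, 358),
    "HSMRACP5_PEA_1_T20": (304, 358),
}

# Coding-portion start positions stated for the same transcripts.
CODING_START = {
    "HSMRACP5_PEA_1_T11": 231,
    "HSMRACP5_PEA_1_T14": 304,
    "HSMRACP5_PEA_1_T20": 304,
}


def contains(span: tuple, position: int) -> bool:
    """True if position lies within (start, end), both inclusive."""
    start, end = span
    return start <= position <= end


def offset_in_segment(span: tuple, position: int) -> int:
    """Zero-based offset of position from the first base of the segment."""
    return position - span[0]


if __name__ == "__main__":
    for name, seg in NODE_11.items():
        pos = CODING_START[name]
        print(name, contains(seg, pos), offset_in_segment(seg, pos), seg[1] - seg[0] + 1)
```

On all three transcripts the stated coding start coincides with the first base of this segment, and the segment is 55 bp long, which is within the "up to about 120 bp" range used for short segments.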
Segment cluster HSMRACP5_PEA_1_node_14 according to the present invention is supported by 93 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HSMRACP5_PEA_1_node_15 according to the present invention can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HSMRACP5_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HSMRACP5_PEA_1_T11 and HSMRACP5_PEA_1_T14. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
HSMRACP5_PEA_1_T11 747 754
HSMRACP5_PEA_1_T14 607 614
Segment cluster HSMRACP5_PEA_1_node_17 according to the present invention is supported by 94 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HSMRACP5_PEA_1_node_20 according to the present invention is supported by 82 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11 and HSMRACP5_PEA_1_T14. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
HSMRACP5_PEA_1_T11 1144 1178
HSMRACP5_PEA_1_T14 926 960
Segment cluster HSMRACP5_PEA_1_node_23 according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11 and HSMRACP5_PEA_1_T14. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
HSMRACP5_PEA_1_T11 1179 1225
HSMRACP5_PEA_1_T14 961 1007
Segment cluster HSMRACP5_PEA_1_node_26 according to the present invention is supported by 93 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HSMRACP5_PEA_1_node_27 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HSMRACP5_PEA_1_node_3 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
HSMRACP5_PEA_1_T14 144 216
HSMRACP5_PEA_1_T20 144 216
Segment cluster HSMRACP5_PEA_1_node_8 according to the present invention is supported by 103 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMRACP5_PEA_1_T11, HSMRACP5_PEA_1_T14 and HSMRACP5_PEA_1_T20. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/hM40faUQsE/hjbvzMzhSz:AAH25414
Sequence documentation:
Alignment of: HSMRACP5_PEA_1_P11 x AAH25414
Alignment segment 1/1:
Quality: 837.00
Escore: 0
Matching length: 87
Total length: 87
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50

 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQ 87
    |||||||||||||||||||||||||||||||||||||
 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQ 87
Sequence name: /tmp/hM40faUQsE/hjbvzMzhSz:PPA5_HUMAN
Sequence documentation:
Alignment of: HSMRACP5_PEA_1_P11 x PPA5_HUMAN
Alignment segment 1/1:
Quality: 837.00
Escore: 0
Matching length: 87
Total length: 87
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50

 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQ 87
    |||||||||||||||||||||||||||||||||||||
 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQ 87
Sequence name: /tmp/lkmXZ9hxcz/CRlg0hpvtx:AAH25414
Sequence documentation:
Alignment of: HSMRACP5_PEA_1_P12 x AAH25414
Alignment segment 1/1:
Quality: 2842.00
Escore: 0
Matching length: 299
Total length: 325
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 92.00
Total Percent Identity: 92.00
Gaps: 1
Alignment:
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50

 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLR 100

101 KVP..........................WNFPSPFYRLHFKIPQTNVSV 124
    |||                          |||||||||||||||||||||
101 KVPWYVLAGNHDHLGNVSAQIAYSKISKRWNFPSPFYRLHFKIPQTNVSV 150

125 AIFMLDTVTLCGNSDDFLSQQPERPRDVKLARTQLSWLKKQLAAAREDYV 174
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 AIFMLDTVTLCGNSDDFLSQQPERPRDVKLARTQLSWLKKQLAAAREDYV 200

175 LVAGHYPVWSIAEHGPTHCLVKQLRPLLATYGVTAYLCGHDHNLQYLQDE 224
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LVAGHYPVWSIAEHGPTHCLVKQLRPLLATYGVTAYLCGHDHNLQYLQDE 250

225 NGVGYVLSGAGNFMDPSKRHQRKVPNGYLRFHYGTEDSLGGFAYVEISSK 274
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 NGVGYVLSGAGNFMDPSKRHQRKVPNGYLRFHYGTEDSLGGFAYVEISSK 300

275 EMTVTYIEASGKSLFKTRLPRRARP 299
    |||||||||||||||||||||||||
301 EMTVTYIEASGKSLFKTRLPRRARP 325
Sequence name: /tmp/lkmXZ9hxcz/CRlg0hpvtx:PPA5_HUMAN
Sequence documentation:
Alignment of: HSMRACP5_PEA_1_P12 x PPA5_HUMAN
Alignment segment 1/1:
Quality: 2697.00
Escore: 0
Matching length: 297
Total length: 325
Matching Percent Similarity: 99.33
Matching Percent Identity: 99.33
Total Percent Similarity: 90.77
Total Percent Identity: 90.77
Gaps: 2
Alignment:
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50

 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLR 100

101 KVP..........................WNFPSPFYRLHFKIPQTNVSV 124
    |||                          |||||||||||||||||||||
101 KVPWYVLAGNHDHLGNVSAQIAYSKISKRWNFPSPFYRLHFKIPQTNVSV 150

125 AIFMLDTVTLCGNSDDFLSQQPERPRDVKLARTQLSWLKKQLAAAREDYV 174
    ||||||||||||||||||||||||||    ||||||||||||||||||||
151 AIFMLDTVTLCGNSDDFLSQQPERPRLT..ARTQLSWLKKQLAAAREDYV 198

175 LVAGHYPVWSIAEHGPTHCLVKQLRPLLATYGVTAYLCGHDHNLQYLQDE 224
    ||||||||||||||||||||||||||||||||||||||||||||||||||
199 LVAGHYPVWSIAEHGPTHCLVKQLRPLLATYGVTAYLCGHDHNLQYLQDE 248

225 NGVGYVLSGAGNFMDPSKRHQRKVPNGYLRFHYGTEDSLGGFAYVEISSK 274
    ||||||||||||||||||||||||||||||||||||||||||||||||||
249 NGVGYVLSGAGNFMDPSKRHQRKVPNGYLRFHYGTEDSLGGFAYVEISSK 298

275 EMTVTYIEASGKSLFKTRLPRRARP 299
    |||||||||||||||||||||||||
299 EMTVTYIEASGKSLFKTRLPRRARP 323
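The summary figures printed with the two HSMRACP5_PEA_1_P12 alignments above are mutually consistent under one simple reading, sketched below. This is an inference from the printed numbers and not a documented property of the alignment program; the function names are hypothetical. The assumption is that "Matching Percent Identity" is the share of identical residues among the aligned (non-gap) columns, and "Total Percent Identity" is the same count of identical residues divided by the total length.

```python
def identical_residues(matching_length: int, matching_pct_identity: float) -> int:
    """Recover the number of identical residues from the matching statistics."""
    return round(matching_length * matching_pct_identity / 100)


def total_pct_identity(identical: int, total_length: int) -> float:
    """Identical residues as a percentage of total length, to two decimals."""
    return round(100 * identical / total_length, 2)


def matching_pct(identical: int, matching_length: int) -> float:
    """Identical residues as a percentage of matching length, to two decimals."""
    return round(100 * identical / matching_length, 2)


if __name__ == "__main__":
    # HSMRACP5_PEA_1_P12 x AAH25414: matching length 299, total length 325.
    n_a = identical_residues(299, 100.00)
    print(n_a, total_pct_identity(n_a, 325))

    # HSMRACP5_PEA_1_P12 x PPA5_HUMAN: matching length 297, total length 325.
    n_b = identical_residues(297, 99.33)
    print(n_b, matching_pct(n_b, 297), total_pct_identity(n_b, 325))
```

Under this reading the alignment to AAH25414 has 299 identical residues and the alignment to PPA5_HUMAN has 295, and the recomputed percentages reproduce the printed values of 92.00 and 90.77.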
Sequence name: /tmp/406disdoks/FAkOMludDl:AAH25414
Sequence documentation:
Alignment of: HSMRACP5_PEA_1_P14 x AAH25414
Alignment segment 1/1:
Quality: 972.00
Escore: 0
Matching length: 101
Total length: 101
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50

 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLR 100

101 K 101
    |
101 K 101
Sequence name: /tmp/406disdoks/FAkOMludDl:PPA5_HUMAN
Sequence documentation:
Alignment of: HSMRACP5_PEA_1_P14 x PPA5_HUMAN
Alignment segment 1/1:
Quality: 972.00
Escore: 0
Matching length: 101
Total length: 101
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MDMWTALLILQALLLPSLADGATPALRFVAVGDWGGVPNAPFHTAREMAN 50

 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKEIARTVQILGADFILSLGDNFYFTGVQDINDKRFQETFEDVFSDRSLR 100

101 K 101
    |
101 K 101
SECTION 2: VARIANTS OF KNOWN IHC MARKERS
DESCRIPTION FOR CLUSTER Z25227
Cluster Z25227 features 2 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Z25227_node_34 258
Z25227_node_39 259
Z25227_node_40 260
Z25227_node_46 261
Z25227_node_49 262
Z25227_node_51 263
Z25227_node_53 264
Z25227_node_35 265
Z25227_node_36 266
Z25227_node_47 267
Z25227_node_50 268
Z25227_node_52 269
Table 3 - Proteins of interest
Protein Name Sequence ID No.
These sequences are variants of the known protein Mothers against decapentaplegic homolog 4 (SwissProt accession identifier SMA4_HUMAN; known also according to the synonyms SMAD 4; Mothers against DPP homolog 4; Deletion target in pancreatic carcinoma 4; hSMAD4), SEQ ID NO: 270, referred to herein as the previously known protein. Protein Mothers against decapentaplegic homolog 4 is known or believed to have the following function(s): COMMON MEDIATOR OF SIGNAL TRANSDUCTION BY TGF-BETA (TRANSFORMING GROWTH FACTOR) SUPERFAMILY; SMAD4 IS THE COMMON SMAD (CO-SMAD). PROMOTES BINDING OF THE SMAD2/SMAD4/FAST-1 COMPLEX TO DNA AND PROVIDES AN ACTIVATION FUNCTION REQUIRED FOR SMAD1 OR SMAD2 TO STIMULATE TRANSCRIPTION. MAY ACT AS A TUMOR SUPPRESSOR. The sequence for protein Mothers against decapentaplegic homolog 4 is given at the end of the application, as "Mothers against decapentaplegic homolog 4 amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. The known protein Mothers against decapentaplegic homolog 4 (Smad4) is an intracellular protein and a mediator of signal transduction by the TGF-beta superfamily. Changes in Smad4 expression are associated with many types of malignant processes, and are actively used for the diagnosis of some of them. These include, but are not limited to, pancreatic carcinoma, gastric cancer, ovarian cancer, prostate cancer, colorectal cancer and endometrial carcinoma. In addition, mutations in Smad4 are also associated with cancers/malignancies of the gastrointestinal tract, and in particular juvenile intestinal polyposis, colorectal cancer and esophageal cancer. An example of a preferred test is detection of ovarian mucinous carcinoma versus pancreatic carcinoma.
Table 4 - Amino acid mutations for Known Protein
Protein Mothers against decapentaplegic homolog 4 localization is believed to be in the cytoplasm in the absence of ligand, with migration to the nucleus when complexed with R-SMAD. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: transcription regulation; SMAD protein heteromerization, which are annotation(s) related to Biological Process; transcription factor; transcription co-factor, which are annotation(s) related to Molecular Function; and nucleus; cytoplasm, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein Knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster Z25227 features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mothers against decapentaplegic homolog 4. A description of each variant protein according to the present invention is now provided.
Variant protein Z25227_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z25227_T18 and Z25227_T19. An alignment is given to the known protein (Mothers against decapentaplegic homolog 4) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z25227_P10 and SMA4_HUMAN:
1. An isolated chimeric polypeptide encoding for Z25227_P10, comprising a first amino acid sequence being at least 90% homologous to MQQQAATAQAAAAAQAAAVAGNIPGPGSVGGIAPAISLSAAAGIGVDDLRRLCILRMSFVKGWGPDYPRQSIKETPCWIEIHLHRALQLLDEVLHTMPIADPQPLD corresponding to amino acids 447 - 552 of SMA4_HUMAN, which also corresponds to amino acids 1 - 106 of Z25227_P10.
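The correspondence stated above between positions on the known protein and positions on the variant is a constant shift, which the following sketch makes explicit. It is an illustration only; the function name to_variant_position and its parameters are hypothetical, and positions are 1-based and inclusive as in the comparison report.

```python
def to_variant_position(known_pos: int, known_start: int, known_end: int,
                        variant_start: int) -> int:
    """Map a position on the known protein to the matching variant position.

    The known protein range known_start..known_end is taken to correspond,
    residue for residue, to the variant range beginning at variant_start.
    """
    if not known_start <= known_pos <= known_end:
        raise ValueError("position lies outside the corresponding range")
    return known_pos - known_start + variant_start


if __name__ == "__main__":
    # Amino acids 447 - 552 of SMA4_HUMAN correspond to 1 - 106 of Z25227_P10.
    print(to_variant_position(447, 447, 552, 1))
    print(to_variant_position(552, 447, 552, 1))
```

Mapping the first and last positions of the stated range returns 1 and 106 respectively, in agreement with the 106-residue span recited for Z25227_P10.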
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
Variant protein Z25227_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25227_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein Z25227_P10 is encoded by the following transcript(s): Z25227_T18 and Z25227_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z25227_T18 is shown in bold; this coding portion starts at position 364 and ends at position 681. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25227_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
The coding portion of transcript Z25227_T19 is shown in bold; this coding portion starts at position 825 and ends at position 1142. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z25227_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
As noted above, cluster Z25227 features 12 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z25227_node_34 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18. Table 8 below describes the starting and ending position of this segment on each transcript.
Table 8 - Segment location on transcripts
Segment cluster Z25227_node_39 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T19. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Segment cluster Z25227_node_40 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18 and Z25227_T19. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster Z25227_node_46 according to the present invention is supported by 218 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18 and Z25227_T19. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster Z25227_node_49 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18 and Z25227_T19. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster Z25227_node_51 according to the present invention is supported by 78 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18 and Z25227_T19. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster Z25227_node_53 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18 and Z25227_T19. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z25227_node_35 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster Z25227_node_36 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
Z25227_T18         256                          333
Segment cluster Z25227_node_47 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z25227_T18 and Z25227_T19. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
Z25227_T18         2577                         2613
Z25227_T19         3038                         3074
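As a quick consistency check on the segment location tables, the following minimal Python sketch computes the length of segment Z25227_node_47 on each transcript from the Table 17 rows. It assumes the positions are 1-based and inclusive, which the text does not state explicitly; the helper name `segment_length` is illustrative only.

```python
# Rows transcribed from Table 17: (transcript, start, end).
# Assumption: positions are 1-based and inclusive.
TABLE_17 = [
    ("Z25227_T18", 2577, 2613),
    ("Z25227_T19", 3038, 3074),
]

def segment_length(start, end):
    """Length in bp of a segment given inclusive start/end positions."""
    return end - start + 1

lengths = {name: segment_length(start, end) for name, start, end in TABLE_17}

# The same segment should have the same length on every transcript.
consistent = len(set(lengths.values())) == 1

# Short segments are described as being up to about 120 bp long.
is_short = all(n <= 120 for n in lengths.values())

print(lengths, consistent, is_short)
```

Under that assumption the segment is 37 bp long on both transcripts, which is consistent with its listing among the short segments.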
Segment cluster Z25227_node_50 according to the present invention can be found in the following transcript(s): Z25227_T18 and Z25227_T19. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster Z25227_node_52 according to the present invention can be found in the following transcript(s): Z25227_T18 and Z25227_T19. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: SMA4_HUMAN
Sequence documentation: Alignment of: Z25227_P10 x SMA4_HUMAN
Alignment segment 1/1:
Quality: 1024.00 Escore: 0
Matching length: 106 Total length: 106 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MQQQAATAQAAAAAQAAAVAGNIPGPGSVGGIAPAISLSAAAGIGVDDLR 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
447 MQQQAATAQAAAAAQAAAVAGNIPGPGSVGGIAPAISLSAAAGIGVDDLR 496
 51 RLCILRMSFVKGWGPDYPRQSIKETPCWIEIHLHRALQLLDEVLHTMPIA 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
497 RLCILRMSFVKGWGPDYPRQSIKETPCWIEIHLHRALQLLDEVLHTMPIA 546
101 DPQPLD 106
    ||||||
547 DPQPLD 552
DESCRIPTION FOR CLUSTER T87719
Cluster T87719 features 2 transcript(s) and 19 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Podocalyxin-like protein 1 precursor (SwissProt accession identifier PODX_HUMAN, also called Lymphocyte antigen 75; Gp200), SEQ ID NO: 293, referred to herein as the previously known protein. The known protein Podocalyxin-like protein 1 precursor is a type I membrane protein and is used in immunohistochemistry diagnosis of renal cell carcinoma. Reportedly, gp200 is expressed by 93% of primary and 84% of metastatic renal cell carcinomas. Down-regulation of gp200 is seen in breast cancer. An example of a preferred test is diagnosis of renal cell carcinoma. Protein Podocalyxin-like protein 1 precursor is known or believed to have the following function(s): Functions as an antiadhesin that maintains an open filtration pathway between neighboring foot processes in the podocyte by charge repulsion. The sequence for protein Podocalyxin-like protein 1 precursor is given at the end of the application, as "Podocalyxin-like protein 1 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Podocalyxin-like protein 1 precursor localization is believed to be Type I membrane protein (Potential). The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster T87719 features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Podocalyxin-like protein 1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T87719_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T87719_T1. An alignment is given to the known protein (Podocalyxin-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T87719_P2 and PODX_HUMAN_V1 (SEQ ID NO: 294):
1. An isolated chimeric polypeptide encoding for T87719_P2, comprising a first amino acid sequence being at least 90% homologous to MRCALALSALLLLLSTPPLLPS corresponding to amino acids 1 - 22 of PODX_HUMAN_V1, which also corresponds to amino acids 1 - 22 of T87719_P2, a second amino acid sequence being at least 90% homologous to SPSPSPSPSQNATQTTTDSSNKTAPTPASSVTIMATDTAQQSTVPTSKANEILASVKATTLGVSSDSPGTTTLAQQVSGPVNTTVARGGGSGNPTTTIESPKSTKSADTTTVATSTATAKPNTTSSQNGAEDTTNSGGKSSHSVTTDLTSTKAEHLTTPHPTSPLSPRQPTSTHPVATPTSSGHDHLMKISSSSSTVAIPGYTFTSPGMTTTLPSSVISQRTQQTSSQMPASSTAPSSQETVQPTSPATALRTPTLPETMSSSPTAASTTHRYPKTPSPTVAHESNW corresponding to amino acids 25 - 311 of PODX_HUMAN_V1, which also corresponds to amino acids 23 - 309 of T87719_P2, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VTPAGVGQVGEPRLG corresponding to amino acids 310 - 324 of T87719_P2, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T87719_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SS, having a structure as follows: a sequence starting from any of amino acid numbers 22-x to 22; and ending at any of amino acid numbers 23 + ((n-2) - x), in which x varies from 0 to n-2.
3. An isolated polypeptide encoding for a tail of T87719_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VTPAGVGQVGEPRLG in T87719_P2.
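The edge-portion definition in item 2 can be read as a sliding window of length n that always spans the junction between amino acids 22 and 23 (the "SS" pair). The following Python sketch enumerates those windows under that reading; the function name `edge_windows` is hypothetical, the coordinates are assumed to be 1-based and inclusive, and the sequence used is only the first 48 residues of T87719_P2 as they appear in the alignment given in this section.

```python
# First 48 residues of T87719_P2, as shown in the alignment in this section.
T87719_P2_PREFIX = "MRCALALSALLLLLSTPPLLPSSPSPSPSPSQNATQTTTDSSNKTAPT"

def edge_windows(seq, n, left=22):
    """Return (start, end, peptide) for each length-n window that spans
    residues `left` and `left + 1` (1-based, inclusive coordinates)."""
    windows = []
    for x in range(n - 1):               # x varies from 0 to n-2
        start = left - x                 # starts at any of 22-x .. 22
        end = (left + 1) + (n - 2) - x   # ends at 23 + ((n-2) - x)
        if start >= 1 and end <= len(seq):
            windows.append((start, end, seq[start - 1:end]))
    return windows

for start, end, peptide in edge_windows(T87719_P2_PREFIX, n=10):
    print(start, end, peptide)
```

For n = 10 this yields nine candidate peptides, each ten residues long and each containing residues 22 and 23. Larger values of n would need the full variant sequence given at the end of the application rather than this 48-residue prefix.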
It should be noted that the known protein sequence (PODX_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PODX_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 5 - Changes to PODX_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T87719_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87719_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein T87719_P2 is encoded by the following transcript(s): T87719_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T87719_T1 is shown in bold; this coding portion starts at position 260 and ends at position 1231. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87719_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein T87719_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T87719_T9. An alignment is given to the known protein (Podocalyxin-like protein 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T87719_P8 and PODX_HUMAN:
1. An isolated chimeric polypeptide encoding for T87719_P8, comprising a first amino acid sequence being at least 90% homologous to MRCALALSALLLLLSTPPLLPS corresponding to amino acids 1 - 22 of PODX_HUMAN, which also corresponds to amino acids 1 - 22 of T87719_P8, a second amino acid sequence being at least 90% homologous to SPSPSPSPSQNATQTTTDSSNKTAPTPASSVTIMATDTAQQSTVPTSKANEILASVKATTLGVSSDSPGTTTLAQQVSGPVNTTVARGGGSGNPTTTIESPKSTKSADTTTVATSTATAKPNTTSSQNGAEDTTNSGGKSSHSVTTDLTSTKAE corresponding to amino acids 25 - 178 of PODX_HUMAN, which also corresponds to amino acids 23 - 176 of T87719_P8, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence RARVKL corresponding to amino acids 177 - 182 of T87719_P8, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of T87719_P8, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise SS, having a structure as follows: a sequence starting from any of amino acid numbers 22-x to 22; and ending at any of amino acid numbers 23 + ((n-2) - x), in which x varies from 0 to n-2.
3. An isolated polypeptide encoding for a tail of T87719_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence RARVKL in T87719_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein T87719_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87719_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
The glycosylation sites of variant protein T87719_P8, as compared to the known protein Podocalyxin-like protein 1 precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 9 - Glycosylation site(s)
Variant protein T87719_P8 is encoded by the following transcript(s): T87719_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T87719_T9 is shown in bold; this coding portion starts at position 260 and ends at position 805.
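The coding-portion coordinates quoted in this section can be cross-checked against the stated protein lengths. The Python sketch below assumes the start and end positions are 1-based and inclusive and that the quoted range excludes the stop codon; neither convention is stated in the text, but both are consistent with the numbers given here. The helper name `codon_count` is illustrative only.

```python
def codon_count(start, end):
    """Number of codons in a coding portion with inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a multiple of 3")
    return length // 3

# (transcript, coding start, coding end, protein length implied by the text)
CODING_PORTIONS = [
    ("Z25227_T19", 825, 1142, 106),  # Z25227_P10 aligns over residues 1-106
    ("T87719_T1", 260, 1231, 324),   # T87719_P2 tail ends at amino acid 324
    ("T87719_T9", 260, 805, 182),    # T87719_P8 tail ends at amino acid 182
]

for name, start, end, expected in CODING_PORTIONS:
    codons = codon_count(start, end)
    print(name, codons, codons == expected)
```

In all three cases the codon count equals the protein length implied by the comparison reports and alignments, which supports the inclusive-coordinate reading.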
The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87719_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
As noted above, cluster T87719 features 19 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T87719_node_0 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1 and T87719_T9. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster T87719_node_4 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1 and T87719_T9. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster T87719_node_5 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster T87719_node_10 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster T87719_node_11 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster T87719_node_15 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster T87719_node_19 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster T87719_node_21 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster T87719_node_23 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster T87719_node_25 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster T87719_node_26 according to the present invention is supported by 111 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster T87719_node_27 according to the present invention is supported by 251 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster T87719_node_28 according to the present invention is supported by 148 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1 and T87719_T9. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T87719_node_1 according to the present invention can be found in the following transcript(s): T87719_T1 and T87719_T9. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster T87719_node_2 according to the present invention can be found in the following transcript(s): T87719_T1 and T87719_T9. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster T87719_node_12 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster T87719_node_17 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster T87719_node_22 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster T87719_node_24 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87719_T1. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: PODX_HUMAN_V1
Sequence documentation:
Alignment of: T87719_P2 x PODX_HUMAN_V1
Alignment segment 1/1:
Quality: 2866.00 Escore : 0 Matching length: 309 Total length: 311 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 99.36 Total Percent Identity: 99.36 Gaps : 1
Alignment :
  1 MRCALALSALLLLLSTPPLLPS..SPSPSPSPSQNATQTTTDSSNKTAPT 48
    ||||||||||||||||||||||  ||||||||||||||||||||||||||
  1 MRCALALSALLLLLSTPPLLPSSPSPSPSPSPSQNATQTTTDSSNKTAPT 50
 49 PASSVTIMATDTAQQSTVPTSKANEILASVKATTLGVSSDSPGTTTLAQQ 98
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PASSVTIMATDTAQQSTVPTSKANEILASVKATTLGVSSDSPGTTTLAQQ 100
 99 VSGPVNTTVARGGGSGNPTTTIESPKSTKSADTTTVATSTATAKPNTTSS 148
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 VSGPVNTTVARGGGSGNPTTTIESPKSTKSADTTTVATSTATAKPNTTSS 150
149 QNGAEDTTNSGGKSSHSVTTDLTSTKAEHLTTPHPTSPLSPRQPTSTHPV 198
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 QNGAEDTTNSGGKSSHSVTTDLTSTKAEHLTTPHPTSPLSPRQPTSTHPV 200
199 ATPTSSGHDHLMKISSSSSTVAIPGYTFTSPGMTTTLPSSVISQRTQQTS 248
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ATPTSSGHDHLMKISSSSSTVAIPGYTFTSPGMTTTLPSSVISQRTQQTS 250
249 SQMPASSTAPSSQETVQPTSPATALRTPTLPETMSSSPTAASTTHRYPKT 298
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 SQMPASSTAPSSQETVQPTSPATALRTPTLPETMSSSPTAASTTHRYPKT 300
299 PSPTVAHESNW 309
    |||||||||||
301 PSPTVAHESNW 311
Sequence name : PODX_HUMAN
Sequence documentation:
Alignment of: T87719_P8 x PODX_HUMAN
Alignment segment 1/1:
Quality: 1550.00 Escore : 0. Matching length: 176 Total length: 178 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 98.88 Total Percent Identity: 98.88 Gaps : 1
Alignment :
  1 MRCALALSALLLLLSTPPLLPS..SPSPSPSPSQNATQTTTDSSNKTAPT 48
    ||||||||||||||||||||||  ||||||||||||||||||||||||||
  1 MRCALALSALLLLLSTPPLLPSSPSPSPSPSPSQNATQTTTDSSNKTAPT 50
 49 PASSVTIMATDTAQQSTVPTSKANEILASVKATTLGVSSDSPGTTTLAQQ 98
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PASSVTIMATDTAQQSTVPTSKANEILASVKATTLGVSSDSPGTTTLAQQ 100
 99 VSGPVNTTVARGGGSGNPTTTIESPKSTKSADTTTVATSTATAKPNTTSS 148
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 VSGPVNTTVARGGGSGNPTTTIESPKSTKSADTTTVATSTATAKPNTTSS 150
149 QNGAEDTTNSGGKSSHSVTTDLTSTKAE 176
    ||||||||||||||||||||||||||||
151 QNGAEDTTNSGGKSSHSVTTDLTSTKAE 178
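The "Total Percent Identity" values reported for the alignments above appear to follow from the matching length divided by the total alignment length. The Python sketch below reproduces the reported figures under that assumption; the alignment program's actual formula is not given in the text, and the function name `total_percent` is illustrative only.

```python
def total_percent(matching_length, total_length):
    """Matching positions as a percentage of the total alignment length,
    rounded to two decimals as in the alignment headers."""
    return round(100.0 * matching_length / total_length, 2)

# (alignment, matching length, total length, reported total percent identity)
ALIGNMENTS = [
    ("Z25227_P10 x SMA4_HUMAN", 106, 106, 100.00),
    ("T87719_P2 x PODX_HUMAN_V1", 309, 311, 99.36),
    ("T87719_P8 x PODX_HUMAN", 176, 178, 98.88),
]

for name, matching, total, reported in ALIGNMENTS:
    print(name, total_percent(matching, total), reported)
```

In both T87719 alignments the difference between total and matching length is two positions, which corresponds to the single two-residue gap after amino acid 22 of the variant.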
DESCRIPTION FOR CLUSTER HSCAMPAT1
Cluster HSCAMPAT1 features 2 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein CAMPATH-1 antigen precursor (SwissProt accession identifier CD52_HUMAN; known also according to the synonyms CD52 antigen; CDW52; Cambridge pathology 1 antigen; Epididymal secretory protein E5), SEQ ID NO: 307, referred to herein as the previously known protein. The known protein CAMPATH-1 antigen precursor (CD52 antigen) is associated with several types of cancer, including lymphomas, hairy-cell leukemia and B-cell chronic lymphocytic leukemia, and it is used as an immunohistochemistry target for their diagnosis. CAMPATH(R) (alemtuzumab) is a monoclonal antibody specific to CD52 and is used in the treatment of B-cell chronic lymphocytic leukemia. CD52 can be used for diagnosis not only with immunohistochemistry but also with flow cytometry of body fluids. An example of a preferred test for CD52 (Campath-1) is lymphoma typing. Protein CAMPATH-1 antigen precursor is known or believed to have the following function(s): may play a role in carrying and orienting carbohydrate, as well as having a more specific role. The sequence for protein CAMPATH-1 antigen precursor is given at the end of the application, as "CAMPATH-1 antigen precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein CAMPATH-1 antigen precursor localization is believed to be: attached to the membrane by a GPI-anchor.
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Lymphocyte inhibitor. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer; Immunosuppressant; Multiple sclerosis treatment; Monoclonal antibody, humanized. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: membrane fraction; integral plasma membrane protein; membrane, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSCAMPAT1 features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein CAMPATH-1 antigen precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSCAMPAT1_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCAMPAT1_PEA_1_T1 and HSCAMPAT1_PEA_1_T2. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
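The localization reasoning stated above can be written as a simple decision rule. The Python sketch below is only an illustration of that stated rule: the function name `infer_localization` and the boolean inputs are hypothetical, the prediction programs themselves are not modelled, and any combination of predictions that the text does not cover is reported as "undetermined" rather than guessed.

```python
def infer_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the rule stated in the text: 'secreted' when every
    signal-peptide predictor reports a signal peptide and no
    trans-membrane predictor reports a trans-membrane region.

    Both arguments are lists of booleans, one entry per prediction
    program (hypothetical inputs, not real program output)."""
    if not signal_peptide_calls or not transmembrane_calls:
        return "undetermined"  # no predictions available
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"      # outcome not described in the text

# Two signal-peptide programs agree; neither trans-membrane program fires.
print(infer_localization([True, True], [False, False]))
```

The two-programs-per-category input mirrors the wording "both" and "neither" in the text; the rule itself does not depend on the number of programs.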
Variant protein HSCAMPAT1_PEA_1_P2 is encoded by the following transcript(s): HSCAMPAT1_PEA_1_T1 and HSCAMPAT1_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCAMPAT1_PEA_1_T1 is shown in bold; this coding portion starts at position 113 and ends at position 268. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCAMPAT1_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Nucleic acid SNPs
The coding portion of transcript HSCAMPAT1_PEA_1_T2 is shown in bold; this coding portion starts at position 113 and ends at position 268. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCAMPAT1_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
1133    A -> G    Yes
1276    A -> G    Yes
As noted above, cluster HSCAMPAT1 features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSCAMPAT1_PEA_1_node_0 according to the present invention is supported by 149 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCAMPAT1_PEA_1_T1 and HSCAMPAT1_PEA_1_T2. Table 7 below describes the starting and ending position of this segment on each transcript.
Table 7 - Segment location on transcripts
Segment cluster HSCAMPAT1_PEA_1_node_3 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCAMPAT1_PEA_1_T2. Table 8 below describes the starting and ending position of this segment on each transcript.
Table 8 - Segment location on transcripts
Segment cluster HSCAMPAT1_PEA_1_node_4 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCAMPAT1_PEA_1_T2. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Segment cluster HSCAMPAT1_PEA_1_node_5 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCAMPAT1_PEA_1_T2. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster HSCAMPAT1_PEA_1_node_8 according to the present invention is supported by 176 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCAMPAT1_PEA_1_T1. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSCAMPAT1_PEA_1_node_2 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCAMPAT1_PEA_1_T1 and HSCAMPAT1_PEA_1_T2. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster HSCAMPAT1_PEA_1_node_7 according to the present invention is supported by 161 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCAMPAT1_PEA_1_T1. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster HSCAMPAT1_PEA_1_node_9 according to the present invention is supported by 114 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCAMPAT1_PEA_1_T1. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
DESCRIPTION FOR CLUSTER HSTIR
Cluster HSTIR features 2 transcript(s) and 11 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein T-cell surface glycoprotein CD5 precursor (SwissProt accession identifier CD5_HUMAN; known also according to the synonyms Lymphocyte glycoprotein T1/Leu-1; Lymphocyte antigen CD5), SEQ ID NO: 322, referred to herein as the previously known protein. The known protein T-cell surface glycoprotein CD5 precursor (CD5) is a 67 kDa human T-lymphocyte single-chain transmembrane glycoprotein. It is present on all mature T-lymphocytes, on most thymocytes and on many T-cell leukemias and lymphomas. It is used for immunohistochemistry diagnosis of these diseases and in particular thymic carcinoma. Protein T-cell surface glycoprotein CD5 precursor is known or believed to have the following function(s): May act as a receptor in regulating T-cell proliferation. CD5 interacts with CD72/LYB-2. The sequence for protein T-cell surface glycoprotein CD5 precursor is given at the end of the application, as "T-cell surface glycoprotein CD5 precursor amino acid sequence". Protein T-cell surface glycoprotein CD5 precursor localization is believed to be Type I membrane protein.
The previously known protein has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Protein synthesis antagonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Immunomodulator, anti-infective; Anticancer; Immunosuppressant; Immunotoxin; Antidiabetic; Antipsoriasis; Antiarthritic, immunological.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell recognition; cell proliferation, which are annotation(s) related to Biological Process; transmembrane receptor; scavenger receptor, which are annotation(s) related to Molecular Function; and integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSTIR features 2 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein T-cell surface glycoprotein CD5 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSTIR_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSTIR_PEA_1_T3. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the transmembrane region prediction programs predicted a transmembrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSTIR_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 4 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSTIR_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 4 - Amino acid mutations
Variant protein HSTIR_PEA_1_P4 is encoded by the following transcript(s): HSTIR_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSTIR_PEA_1_T3 is shown in bold; this coding portion starts at position 158 and ends at position 682. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSTIR_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Nucleic acid SNPs
Variant protein HSTIR_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSTIR_PEA_1_T2. An alignment is given to the known protein (T-cell surface glycoprotein CD5 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSTIR_PEA_1_P6 and CD5_HUMAN:
1. An isolated chimeric polypeptide encoding for HSTIR_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MPMGSLQPLATLYLLGMLVASCLGRLSWYDPDFQARLTRSNSKCQGQLEVYLKDGWHMVCSQSWGRSSKQWEDPSQASKVCQRLNCGVPLSLGPFLVTYTPQSSIICYGQLGSFSNCSHSRNDMCHSLGLTCLEPQKTTPPTTRPPPTTTPEPTAPPRLQLVAQSGGQHCAGWEFYSGSLGGTISYEAQDKTQDLENFLCNNLQCGSFLKHLPETEAGRAQDPGEPREHQPLPIQWKIQNSSCTSLEHCFRKIKPQKSGRVLALLCSGFQPKVQSRLVGGSSICEGTVEVRQGAQWAALCDSSSARSSLRWEEVCREQQCGSVNSYRVLDAGDPTSRGLFCPHQKLSQCHELWERNSYCKKVFVT corresponding to amino acids 1 - 366 of CD5_HUMAN, which also corresponds to amino acids 1 - 366 of HSTIR_PEA_1_P6, and a second amino acid sequence being at least 90 % homologous to FRQKKQRQWIGPTGMNQNMSFHRNHTATVRSHAENPTASHVDNEYSQPPRNSRLSAYPALEGVLHRSSMQPDNSSDSDYDLHGAQRL corresponding to amino acids 409 - 495 of CD5_HUMAN, which also corresponds to amino acids 367 - 453 of HSTIR_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSTIR_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise TF, having a structure as follows: a sequence starting from any of amino acid numbers 366-x to 366; and ending at any of amino acid numbers 367 + ((n-2) - x), in which x varies from 0 to n-2.
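By way of illustration only, the edge-portion definition in item 2 can be read as a window of length n that always covers both residues of the junction (amino acids 366 and 367 of HSTIR_PEA_1_P6). The following Python sketch enumerates these windows. The function name edge_windows and its parameter names are hypothetical and chosen only for this example; positions are assumed to be 1-based and inclusive, and no check is made that a window stays within the 453-residue variant protein.

```python
def edge_windows(n, left=366, right=367):
    """Yield (start, end) positions, 1-based and inclusive, of every
    window of length n that spans the junction between `left` and `right`.

    Follows the stated rule: start = left - x, end = right + ((n - 2) - x),
    with x running from 0 to n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = left - x
        end = right + (n - 2) - x
        yield start, end


# Example: all windows of length 10 across the 366/367 junction.
windows = list(edge_windows(10))
for start, end in windows:
    print(start, end)
```

Each window produced this way has exactly n residues, because end - start + 1 = n for every value of x.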
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region. Variant protein HSTIR_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSTIR_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
The glycosylation sites of variant protein HSTIR_PEA_1_P6, as compared to the known protein T-cell surface glycoprotein CD5 precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 7 - Glycosylation site(s)
Variant protein HSTIR_PEA_1_P6 is encoded by the following transcript(s): HSTIR_PEA_1_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSTIR_PEA_1_T2 is shown in bold; this coding portion starts at position 186 and ends at position 1544. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSTIR_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
As noted above, cluster HSTIR features 11 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSTIR_PEA_1_node_0 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster HSTIR_PEA_1_node_4 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster HSTIR_PEA_1_node_8 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster HSTIR_PEA_1_node_10 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster HSTIR_PEA_1_node_17 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2 and HSTIR_PEA_1_T3. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster HSTIR_PEA_1_node_21 according to the present invention is supported by 64 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2 and HSTIR_PEA_1_T3. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSTIR_PEA_1_node_2 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Segment cluster HSTIR_PEA_1_node_6 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster HSTIR_PEA_1_node_14 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T3. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster HSTIR_PEA_1_node_15 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2 and HSTIR_PEA_1_T3. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSTIR_PEA_1_T2     1285                         1338
HSTIR_PEA_1_T3     65                           118
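By way of illustration only, the positions given in Table 18 can be checked for internal consistency: the same segment should have the same length on every transcript on which it appears, and a short segment should not exceed about 120 bp. The Python sketch below assumes that starting and ending positions are 1-based and inclusive; the helper name segment_length and the variable names are hypothetical and used only for this example.

```python
# Rows of Table 18: (transcript name, segment start, segment end).
table_18 = [
    ("HSTIR_PEA_1_T2", 1285, 1338),
    ("HSTIR_PEA_1_T3", 65, 118),
]

SHORT_SEGMENT_MAX_BP = 120  # "up to about 120 bp" per the description


def segment_length(start, end):
    """Length in bp of a segment given 1-based inclusive coordinates."""
    return end - start + 1


lengths = {name: segment_length(start, end) for name, start, end in table_18}

# The segment has one length regardless of the transcript it is mapped onto.
same_length_everywhere = len(set(lengths.values())) == 1
is_short_segment = all(bp <= SHORT_SEGMENT_MAX_BP for bp in lengths.values())

print(lengths, same_length_everywhere, is_short_segment)
```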
Segment cluster HSTIR_PEA_1_node_19 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSTIR_PEA_1_T2 and HSTIR_PEA_1_T3. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: CD5_HUMAN
Sequence documentation:
Alignment of: HSTIR_PEA_1_P6 x CD5_HUMAN
Alignment segment 1/1:
Quality: 4494.00
Escore: 0
Matching length: 453
Total length: 495
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 91.52
Total Percent Identity: 91.52
Gaps: 1

Alignment:

  1 MPMGSLQPLATLYLLGMLVASCLGRLSWYDPDFQARLTRSNSKCQGQLEV 50
  1 MPMGSLQPLATLYLLGMLVASCLGRLSWYDPDFQARLTRSNSKCQGQLEV 50

 51 YLKDGWHMVCSQSWGRSSKQWEDPSQASKVCQRLNCGVPLSLGPFLVTYT 100
 51 YLKDGWHMVCSQSWGRSSKQWEDPSQASKVCQRLNCGVPLSLGPFLVTYT 100

101 PQSSIICYGQLGSFSNCSHSRNDMCHSLGLTCLEPQKTTPPTTRPPPTTT 150
101 PQSSIICYGQLGSFSNCSHSRNDMCHSLGLTCLEPQKTTPPTTRPPPTTT 150

151 PEPTAPPRLQLVAQSGGQHCAGWEFYSGSLGGTISYEAQDKTQDLENFL 200
151 PEPTAPPRLQLVAQSGGQHCAGWEFYSGSLGGTISYEAQDKTQDLENFL 200

201 CNNLQCGSFLKHLPETEAGRAQDPGEPREHQPLPIQWKIQNSSCTSLEHC 250
201 CNNLQCGSFLKHLPETEAGRAQDPGEPREHQPLPIQWKIQNSSCTSLEHC 250

251 FRKIKPQKSGRVLALLCSGFQPKVQSRLVGGSSICEGTVEVRQGAQWAAL 300
251 FRKIKPQKSGRVLALLCSGFQPKVQSRLVGGSSICEGTVEVRQGAQWAAL 300

301 CDSSSARSSLRWEEVCREQQCGSVNSYRVLDAGDPTSRGLFCPHQKLSQC 350
301 CDSSSARSSLRWEEVCREQQCGSVNSYRVLDAGDPTSRGLFCPHQKLSQC 350

351 HELWERNSYCKKVFVT 366
351 HELWERNSYCKKVFVTCQDPNPAGLAAGTVASIILALVLLWLLWCGPL 400

367         FRQKKQRQWIGPTGMNQNMSFHRNHTATVRSHAENPTASHVD 408
            ||||||||||||||||||||||||||||||||||||||||||
401 AYKKLVKKFRQKKQRQWIGPTGMNQNMSFHRNHTATVRSHAENPTASHVD 450

409 NEYSQPPRNSRLSAYPALEGVLHRSSMQPDNSSDSDYDLHGAQRL 453
451 NEYSQPPRNSRLSAYPALEGVLHRSSMQPDNSSDSDYDLHGAQRL 495
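By way of illustration only, the summary figures reported for the HSTIR_PEA_1_P6 x CD5_HUMAN alignment are mutually consistent if the "Total" percentages are taken as the "Matching" percentages scaled by the ratio of matching length to total length. This interpretation is an assumption made for this example and is not stated in the application; the function name total_percent is likewise hypothetical.

```python
def total_percent(matching_percent, matching_length, total_length):
    """Scale a percentage computed over the matched region to the full
    alignment length (assumed relationship; see the note above)."""
    return round(matching_percent * matching_length / total_length, 2)


# Figures reported for HSTIR_PEA_1_P6 x CD5_HUMAN.
matching_length = 453
total_length = 495
total_identity = total_percent(100.00, matching_length, total_length)

# Residues of CD5_HUMAN with no counterpart in the variant (the single gap).
skipped_residues = total_length - matching_length

print(total_identity, skipped_residues)
```

The 42 skipped residues agree with the comparison report, in which amino acids 367 to 408 of CD5_HUMAN are absent from the variant protein.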
DESCRIPTION FOR CLUSTER HSALK1A
Cluster HSALK1A features 1 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Serine/threonine-protein kinase receptor R3 precursor (SwissProt accession identifier KIR3_HUMAN; known also according to the synonyms EC 2.7.1.37; SKR3; Activin receptor-like kinase 1; ALK-1; TGF-B superfamily receptor type I; TSR-I), SEQ ID NO: 331, referred to herein as the previously known protein. The known protein Serine/threonine-protein kinase receptor R3 precursor (ALK-1; TGF-B superfamily receptor type I) is a type I membrane protein which, together with other TGF-beta receptors, forms a heteromeric complex after binding TGF-beta at the cell surface and acts as a signal transducer. It is used optionally in the immunohistochemistry diagnosis of lymphoma. Protein Serine/threonine-protein kinase receptor R3 precursor is known or believed to have the following function(s): Type I/type II TGF-beta receptors form a heteromeric complex after binding TGF-beta at the cell surface and act as signal transducers. May bind activin as well. The sequence for protein Serine/threonine-protein kinase receptor R3 precursor is given at the end of the application, as "Serine/threonine-protein kinase receptor R3 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Serine/threonine-protein kinase receptor R3 precursor localization is believed to be Type I membrane protein. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: TGFbeta receptor signaling pathway; circulation, which are annotation(s) related to Biological Process; transmembrane receptor protein serine/threonine kinase, which are annotation(s) related to Molecular Function; and integral plasma membrane protein, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSALK1A features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Serine/threonine-protein kinase receptor R3 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSALK1A_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSALK1A_PEA_1_T21. An alignment is given to the known protein (Serine/threonine-protein kinase receptor R3 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSALK1A_PEA_1_P14 and KIR3_HUMAN:
1. An isolated chimeric polypeptide encoding for HSALK1A_PEA_1_P14, comprising a first amino acid sequence being at least 90 % homologous to MTLGSPRKGLLMLLMALVTQGDPVKPSRGPLVTCTCESPHCKGPTCRGAWCTVVLVREEGRHPQEHRGCGNLHRELCRGRPTEFVNHYCCDSHLCNHNVSLVLE corresponding to amino acids 1 - 104 of KIR3_HUMAN, which also corresponds to amino acids 1 - 104 of HSALK1A_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GTSSCPSTPSPSSWPLPSLPSFPLMLWPIKGLGAGERVGRTLGSNWQSGLARGGGS corresponding to amino acids 105 - 160 of HSALK1A_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSALK1A_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GTSSCPSTPSPSSWPLPSLPSFPLMLWPIKGLGAGERVGRTLGSNWQSGLARGGGS in HSALK1A_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region. Variant protein HSALK1A_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSALK1A_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
The glycosylation sites of variant protein HSALK1A_PEA_1_P14, as compared to the known protein Serine/threonine-protein kinase receptor R3 precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 6 - Glycosylation site(s)
Variant protein HSALK1A_PEA_1_P14 is encoded by the following transcript(s): HSALK1A_PEA_1_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSALK1A_PEA_1_T21 is shown in bold; this coding portion starts at position 378 and ends at position 857. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSALK1A_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
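By way of illustration only, the coding-portion coordinates stated for transcript HSALK1A_PEA_1_T21 (positions 378 to 857) can be checked against the 160-residue length of variant protein HSALK1A_PEA_1_P14 given in the comparison report. The Python sketch below assumes that the positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; both are assumptions made for this example, and the function name codon_count is hypothetical.

```python
def codon_count(start, end):
    """Number of whole codons in a coding portion with 1-based
    inclusive nucleotide coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding portion length is not a multiple of 3")
    return length // 3


# Coding portion of HSALK1A_PEA_1_T21 as stated in the description.
codons = codon_count(378, 857)
expected_protein_length = 160  # amino acids 1 - 160 of HSALK1A_PEA_1_P14

print(codons, codons == expected_protein_length)
```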
As noted above, cluster HSALK1A features 5 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSALK1A_PEA_1_node_0 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSALK1A_PEA_1_T21. Table 8 below describes the starting and ending position of this segment on each transcript. Table 8 - Segment location on transcripts
Segment cluster HSALK1A_PEA_1_node_8 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSALK1A_PEA_1_T21. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster HSALK1A_PEA_1_node_9 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSALK1A_PEA_1_T21. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSALK1A_PEA_1_node_5 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSALK1A_PEA_1_T21. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster HSALK1A_PEA_1_node_7 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSALK1A_PEA_1_T21. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name : KIR3_HUMAN
Sequence documentation:
Alignment of: HSALK1A_PEA_1_P14 x KIR3_HUMAN
Alignment segment 1/1:
Quality: 1077.00
Escore: 0
Matching length: 106
Total length: 106
Matching Percent Similarity: 99.06
Matching Percent Identity: 99.06
Total Percent Similarity: 99.06
Total Percent Identity: 99.06
Gaps: 0

Alignment:

  1 MTLGSPRKGLLMLLMALVTQGDPVKPSRGPLVTCTCESPHCKGPTCRGAW 50
  1 MTLGSPRKGLLMLLMALVTQGDPVKPSRGPLVTCTCESPHCKGPTCRGAW 50

 51 CTVVLVREEGRHPQEHRGCGNLHRELCRGRPTEFVNHYCCDSHLCNHNVS 100
 51 CTVVLVREEGRHPQEHRGCGNLHRELCRGRPTEFVNHYCCDSHLCNHNVS 100

101 LVLEGT 106
    |||| |
101 LVLEAT 106
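By way of illustration only, the match line of an alignment block and the reported percent identity can be derived from the two aligned rows. The Python sketch below rebuilds the match line for the final block of the HSALK1A_PEA_1_P14 x KIR3_HUMAN alignment and recomputes the identity figure. The function names are hypothetical; the count of 105 matching positions out of 106 is an inference from the single G/A difference visible in the alignment, not a figure stated in the application.

```python
def match_line(top, bottom):
    """Return a line with '|' where the two aligned rows carry the same
    residue and a space where they differ."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return "".join("|" if a == b else " " for a, b in zip(top, bottom))


def percent_identity(matches, length):
    """Percentage of identical positions, rounded to two decimals."""
    return round(100.0 * matches / length, 2)


# Final alignment block: variant residues 101-106 over KIR3_HUMAN 101-106.
last_block = match_line("LVLEGT", "LVLEAT")

# One mismatch over the 106 aligned positions (inferred, see note above).
identity = percent_identity(105, 106)

print(repr(last_block), identity)
```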
DESCRIPTION FOR CLUSTER HSCDIA
Cluster HSCDIA features 7 transcript(s) and 16 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein T-cell surface glycoprotein CD1a precursor (SwissProt accession identifier CD1A_HUMAN; known also according to the synonyms CD1a antigen; T-cell surface antigen T6/Leu-6; hTa1 thymocyte antigen), SEQ ID NO: 356, referred to herein as the previously known protein. The known protein T-cell surface glycoprotein CD1a precursor is a type I membrane protein of 43 to 49 kD expressed on dendritic cells and cortical thymocytes. CD1a antigen expression has been shown to be useful in differentiating Langerhans cells, powerful antigen presenting cells present in skin and epithelia, from interdigitating cells. It is used in the immunohistochemistry diagnosis of atopic dermatitis and other dermatological conditions. An example of a preferred test is detection of thymic T-cells, thymoma, Langerhans cells.
Protein T-cell surface glycoprotein CD1a precursor is known or believed to have the following function(s): Not known. The sequence for protein T-cell surface glycoprotein CD1a precursor is given at the end of the application, as "T-cell surface glycoprotein CD1a precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein T-cell surface glycoprotein CD1a precursor localization is believed to be Type I membrane protein. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: immune response, which are annotation(s) related to Biological Process; and integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSCDIA features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein T-cell surface glycoprotein CD1a precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSCDIA_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCDIA_PEA_1_T4. An alignment is given to the known protein (T-cell surface glycoprotein CD1a precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSCDIA_PEA_1_P5 and CD1A_HUMAN_V1 (SEQ ID NO: 357):
1. An isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LQWGRKNLGAMFAF corresponding to amino acids 1 - 14 of HSCDIA_PEA_1_P5, a bridging amino acid T corresponding to amino acid 18 of HSCDIA_PEA_1_P5, and a second amino acid sequence being at least 90 % homologous to GLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIVFLWPWSRGNFSN EEWKELETLFRXRTIRSFEGIRRYAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGS DFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKA HLQRQVKPEAWLSHGPSPGPGHLQLVC HVSGFYPKPVWVMWMRGEQEQQGTQRGDI LPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSVGFIILAVIVPL LLLIGLALWFRKRCFC corresponding to amino acids 20 - 327 of CD1A_HUMAN_V1, which also corresponds to amino acids 19 - 326 of HSCDIA_PEA_1_P5, wherein said first amino acid sequence, bridging amino acid and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HSCDIA_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LQWGRKNLGAMFAF of HSCDIA_PEA_1_P5.
It should be noted that the known protein sequence (CD1A_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CD1A_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 5 - Changes to CD1A_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
Variant protein HSCDIA_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein HSCDIA_PEA_1_P5 is encoded by the following transcript(s): HSCDIA_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCDIA_PEA_1_T4 is shown in bold; this coding portion starts at position 1 and ends at position 978. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HSCDIA_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCDIA_PEA_1_T5. An alignment is given to the known protein (T-cell surface glycoprotein CD1a precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCDIA_PEA_1_P6 and CD1A_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRGDILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSVGFIILAVIVPLLLLIGLALWFRKR corresponding to amino acids 1 - 324 of CD1A_HUMAN_V1, which also corresponds to amino acids 1 - 324 of HSCDIA_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence W corresponding to amino acids 325 - 325 of HSCDIA_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
It should be noted that the known protein sequence (CD1A_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CD1A_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 8 - Changes to CD1A_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
Variant protein HSCDIA_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein HSCDIA_PEA_1_P6 is encoded by the following transcript(s): HSCDIA_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCDIA_PEA_1_T5 is shown in bold; this coding portion starts at position 537 and ends at position 1511. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HSCDIA_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCDIA_PEA_1_T7. An alignment is given to the known protein (T-cell surface glycoprotein CD1a precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCDIA_PEA_1_P7 and CD1A_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRGDILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYW corresponding to amino acids 1 - 294 of CD1A_HUMAN_V1, which also corresponds to amino acids 1 - 294 of HSCDIA_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKKLRPRLEMPGSGPQA corresponding to amino acids 295 - 312 of HSCDIA_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCDIA_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKKLRPRLEMPGSGPQA in HSCDIA_PEA_1_P7.
It should be noted that the known protein sequence (CD1A_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CD1A_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 11 - Changes to CD1A_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCDIA_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Variant protein HSCDIA_PEA_1_P7 is encoded by the following transcript(s): HSCDIA_PEA_1_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCDIA_PEA_1_T7 is shown in bold; this coding portion starts at position 537 and ends at position 1472. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Variant protein HSCDIA_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCDIA_PEA_1_T8. An alignment is given to the known protein (T-cell surface glycoprotein CD1a precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCDIA_PEA_1_P8 and CD1A_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRGDILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYW corresponding to amino acids 1 - 294 of CD1A_HUMAN_V1, which also corresponds to amino acids 1 - 294 of HSCDIA_PEA_1_P8, and a second amino acid sequence being at least 90% homologous to GLALWFRKRCFC corresponding to amino acids 316 - 327 of CD1A_HUMAN_V1, which also corresponds to amino acids 295 - 306 of HSCDIA_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSCDIA_PEA_1_P8, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise WG, having a structure as follows: a sequence starting from any of amino acid numbers 294-x to 294; and ending at any of amino acid numbers 295+ ((n-2) - x), in which x varies from 0 to n-2.
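The edge-portion definition above can be read as a sliding window of length n that always spans the junction between amino acids 294 and 295. The following is a minimal sketch under that reading, taking the start as 294-x and the end as 295+((n-2)-x) for a single value of x. The function name `edge_windows` and the bounds check against the protein length are illustrative assumptions, not part of the application; the P8 sequence is assembled from the two aligned portions quoted in the comparison report.

```python
def edge_windows(junction_left, n, protein_length):
    """List (start, end) pairs, 1-based and inclusive, for every window of
    length n that contains both junction_left and junction_left + 1.

    Follows the formula in the text: start = junction_left - x and
    end = (junction_left + 1) + ((n - 2) - x), with x from 0 to n - 2.
    Windows running past either end of the protein are skipped
    (an assumption of this sketch; the text does not address that case).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction_left - x
        end = junction_left + 1 + (n - 2) - x
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


# Amino acids 1-294 shared with CD1A_HUMAN_V1, as quoted in the comparison report.
HEAD = (
    "MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS"
    "DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR"
    "YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP"
    "YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR"
    "QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG"
    "DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYW"
)
TAIL = "GLALWFRKRCFC"  # amino acids 295-306 of HSCDIA_PEA_1_P8
P8 = HEAD + TAIL

windows = edge_windows(junction_left=294, n=10, protein_length=len(P8))
for start, end in windows:
    peptide = P8[start - 1:end]  # convert 1-based inclusive to a Python slice
    print(start, end, peptide)
```

Every window produced this way has length n and contains the junction dipeptide WG, which is the property the edge-portion definition relies on.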
It should be noted that the known protein sequence (CD1A_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CD1A_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 14 - Changes to CD1A_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCDIA_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
Variant protein HSCDIA_PEA_1_P8 is encoded by the following transcript(s): HSCDIA_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCDIA_PEA_1_T8 is shown in bold; this coding portion starts at position 537 and ends at position 1454. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Variant protein HSCDIA_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCDIA_PEA_1_T9. An alignment is given to the known protein (T-cell surface glycoprotein CD1a precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCDIA_PEA_1_P9 and CD1A_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MLFLLLPLLAVLPGDGNAD corresponding to amino acids 1 - 19 of CD1A_HUMAN, which also corresponds to amino acids 1 - 19 of HSCDIA_PEA_1_P9, and a second amino acid sequence being at least 90% homologous to GWLSDLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRGDILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSVGFIILAVIVPLLLLIGLALWFRKRCFC corresponding to amino acids 47 - 327 of CD1A_HUMAN, which also corresponds to amino acids 20 - 300 of HSCDIA_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSCDIA_PEA_1_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise DG, having a structure as follows: a sequence starting from any of amino acid numbers 19-x to 19; and ending at any of amino acid numbers 20+ ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
Variant protein HSCDIA_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
The glycosylation sites of variant protein HSCDIA_PEA_1_P9, as compared to the known protein T-cell surface glycoprotein CD1a precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 18 - Glycosylation site(s)
Variant protein HSCDIA_PEA_1_P9 is encoded by the following transcript(s): HSCDIA_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCDIA_PEA_1_T9 is shown in bold; this coding portion starts at position 537 and ends at position 1436. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
Variant protein HSCDIA_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCDIA_PEA_1_T11. An alignment is given to the known protein (T-cell surface glycoprotein CD1a precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCDIA_PEA_1_P11 and CD1A_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HSCDIA_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMR corresponding to amino acids 1 - 239 of CD1A_HUMAN_V1, which also corresponds to amino acids 1 - 239 of HSCDIA_PEA_1_P11, and a second amino acid sequence being at least 90% homologous to EHHSSVGFIILAVIVPLLLLIGLALWFRKRCFC corresponding to amino acids 295 - 327 of CD1A_HUMAN_V1, which also corresponds to amino acids 240 - 272 of HSCDIA_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSCDIA_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise RE, having a structure as follows: a sequence starting from any of amino acid numbers 239-x to 239; and ending at any of amino acid numbers 240+ ((n-2) - x), in which x varies from 0 to n-2.
It should be noted that the known protein sequence (CD1A_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for CD1A_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 20 - Changes to CD1A_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
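The localization reasoning used for these variants reduces to a simple rule over the outputs of two signal-peptide predictors and two trans-membrane region predictors. The sketch below illustrates only the two cases stated in the text (membrane and secreted); the function name, the boolean inputs and the "undetermined" fallback are assumptions of this example, since the text does not say how other combinations were handled.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Combine predictor outputs into a localization call.

    signal_peptide_calls: iterable of booleans, one per signal-peptide program.
    transmembrane_calls: iterable of booleans, one per trans-membrane program
        (True means a trans-membrane region downstream of the signal peptide).
    """
    signal = list(signal_peptide_calls)
    tm = list(transmembrane_calls)
    if not signal or not tm:
        raise ValueError("at least one call of each kind is required")

    if all(signal) and all(tm):
        # Signal peptide plus a downstream trans-membrane region.
        return "membrane"
    if all(signal) and not any(tm):
        # Signal peptide and no trans-membrane region.
        return "secreted"
    # Combinations not covered by the text (assumption of this sketch).
    return "undetermined"


# Calls as described for the variants in the text.
p11_location = predict_localization([True, True], [True, True])
p7_location = predict_localization([True, True], [False, False])
print(p11_location, p7_location)
```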
Variant protein HSCDIA_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Amino acid mutations
Variant protein HSCDIA_PEA_1_P11 is encoded by the following transcript(s): HSCDIA_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCDIA_PEA_1_T11 is shown in bold; this coding portion starts at position 537 and ends at position 1352. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCDIA_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Nucleic acid SNPs
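The coding-portion coordinates quoted for the HSCDIA_PEA_1 transcripts can be cross-checked against the protein lengths implied by the comparison reports. The sketch below is an illustrative consistency check only: it assumes the coordinates are 1-based and inclusive, and takes each protein length from the last amino acid number mentioned for that variant in the text. The names `CODING_PORTIONS` and `codon_count` are hypothetical.

```python
# (transcript, coding start, coding end, protein length in amino acids),
# all values as quoted in the variant descriptions.
CODING_PORTIONS = [
    ("HSCDIA_PEA_1_T4", 1, 978, 326),     # HSCDIA_PEA_1_P5
    ("HSCDIA_PEA_1_T5", 537, 1511, 325),  # HSCDIA_PEA_1_P6
    ("HSCDIA_PEA_1_T7", 537, 1472, 312),  # HSCDIA_PEA_1_P7
    ("HSCDIA_PEA_1_T8", 537, 1454, 306),  # HSCDIA_PEA_1_P8
    ("HSCDIA_PEA_1_T9", 537, 1436, 300),  # HSCDIA_PEA_1_P9
    ("HSCDIA_PEA_1_T11", 537, 1352, 272), # HSCDIA_PEA_1_P11
]


def codon_count(start, end):
    """Number of whole codons in a 1-based, inclusive nucleotide range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


results = {
    name: codon_count(start, end) == protein_length
    for name, start, end, protein_length in CODING_PORTIONS
}
for name, consistent in results.items():
    print(name, "consistent" if consistent else "inconsistent")
```

Under these assumptions each quoted range covers exactly as many codons as the variant has amino acids, which suggests the stated end positions do not include a stop codon.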
As noted above, cluster HSCDIA features 16 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSCDIA_PEA_1_node_3 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9, HSCDIA_PEA_1_T10 and HSCDIA_PEA_1_T11. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_7 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9, HSCDIA_PEA_1_T10 and HSCDIA_PEA_1_T11. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_11 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9, HSCDIA_PEA_1_T10 and HSCDIA_PEA_1_T11. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_14 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9 and HSCDIA_PEA_1_T10. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_15 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T7. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_18 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T5 and HSCDIA_PEA_1_T7. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_20 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9 and HSCDIA_PEA_1_T11. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_21 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9 and HSCDIA_PEA_1_T11. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_24 according to the present invention is supported by 0 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T10. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSCDIA_PEA_1_node_1 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_6 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T10 and HSCDIA_PEA_1_T11. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_10 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9, HSCDIA_PEA_1_T10 and HSCDIA_PEA_1_T11. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_13 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9, HSCDIA_PEA_1_T10 and HSCDIA_PEA_1_T11. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_16 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T9 and HSCDIA_PEA_1_T11. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_17 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9 and HSCDIA_PEA_1_T11. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HSCDIA_PEA_1_node_19 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCDIA_PEA_1_T4, HSCDIA_PEA_1_T5, HSCDIA_PEA_1_T7, HSCDIA_PEA_1_T8, HSCDIA_PEA_1_T9 and HSCDIA_PEA_1_T11. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
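The segment descriptions above amount to a mapping from each segment to the transcripts that contain it. As a hedged illustration, the sketch below inverts that mapping for a small subset of the segments and picks out segments reported in only one transcript. The abbreviated keys (for example "node_15" for HSCDIA_PEA_1_node_15 and "T7" for HSCDIA_PEA_1_T7) and the variable names are conveniences of this example, not notation from the application.

```python
from collections import defaultdict

# Subset of the segment-to-transcript memberships stated in the text.
SEGMENT_TO_TRANSCRIPTS = {
    "node_1": ["T4"],
    "node_3": ["T5", "T7", "T8", "T9", "T10", "T11"],
    "node_15": ["T7"],
    "node_18": ["T5", "T7"],
    "node_24": ["T10"],
}

# Invert the mapping: which of these segments does each transcript contain?
transcript_to_segments = defaultdict(list)
for segment, transcripts in SEGMENT_TO_TRANSCRIPTS.items():
    for transcript in transcripts:
        transcript_to_segments[transcript].append(segment)

# Segments that appear in exactly one transcript within this subset.
unique_segments = {
    segment: transcripts[0]
    for segment, transcripts in SEGMENT_TO_TRANSCRIPTS.items()
    if len(transcripts) == 1
}

for transcript in sorted(transcript_to_segments):
    print(transcript, transcript_to_segments[transcript])
print(unique_segments)
```

A segment found in a single transcript is a natural candidate for a probe or primer intended to distinguish that transcript from the others in the cluster.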
Variant protein alignment to the previously known protein: Sequence name: CD1A_HUMAN_V1
Sequence documentation:
Alignment of: HSCDIA_PEA_1_P5 x CD1A_HUMAN_V1
Alignment segment 1/1:
Quality: 3125.00 Escore: 0 Matching length: 312 Total length: 312 Matching Percent Similarity: 99.36 Matching Percent Identity: 99.36 Total Percent Similarity: 99.36 Total Percent Identity: 99.36 Gaps : 0
Alignment
15 GGATGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIV 64
16 GNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIV 65
65 FLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQV 114
66 FLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRRYAHELQFEYPFEIQV 115
115 TGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVL 164
116 TGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLPYPVAGNMAKHFCKVL 165
165 NQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPG 214
166 NQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQRQVKPEAWLSHGPSPG 215
215 PGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRGDILPSADGTWYLRAT 264
216 PGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRGDILPSADGTWYLRAT 265
265 LEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSVGFIILAVIVPLLLLI 314
266 LEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSVGFIILAVIVPLLLLI 315
315 GLALWFRKRCFC 326
316 GLALWFRKRCFC 327
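The percent identity reported for this alignment can be reproduced from the aligned rows. The sketch below is an illustration, not the alignment program used in the application: `percent_identity` is a hypothetical helper that counts matching positions in two equal-length, gap-free strings. The two strings are the first 50-residue row of each sequence, written without gaps, and the overall figure assumes that the only mismatches in the 312-residue alignment are the two in that first row.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two gap-free, equal-length
    aligned sequences carry the same residue."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# First row of the alignment: HSCDIA_PEA_1_P5 residues 15-64
# against CD1A_HUMAN_V1 residues 16-65.
P5_ROW = "GGATGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIV"
CD1A_ROW = "GNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLSDLQTHTWDSNSSTIV"

row_identity = percent_identity(P5_ROW, CD1A_ROW)
row_mismatches = sum(a != b for a, b in zip(P5_ROW, CD1A_ROW))

# Overall identity over the reported matching length of 312, assuming the
# remaining rows are identical (as the alignment rows indicate).
MATCHING_LENGTH = 312
overall_identity = 100.0 * (MATCHING_LENGTH - row_mismatches) / MATCHING_LENGTH

print(round(row_identity, 2), row_mismatches, round(overall_identity, 2))
```

With two mismatches out of 312 aligned positions, the rounded overall value agrees with the 99.36 reported in the alignment header.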
Sequence name: CD1A_HUMAN_V1
Sequence documentation:
Alignment of: HSCDIA_PEA_1_P6 x CD1A_HUMAN_V1
Alignment segment 1/1:
Quality: 3251.00 Escore: 0 Matching length: 324 Total length: 324 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS 50
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS 50
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 250
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 250
251 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSV 300
251 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSV 300
301 GFIILAVIVPLLLLIGLALWFRKR 324
301 GFIILAVIVPLLLLIGLALWFRKR 324
Sequence name : CD1A_HUMAN_V1
Sequence documentation:
Alignment of: HSCDIA_PEA_1_P7 x CD1A_HUMAN_V1
Alignment segment 1/1:
Quality: 2970.00
Escore : 0 Matching length: 294 Total length: 294 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS 50
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS 50
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 250
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 250
251 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYW 294
251 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYW 294
Sequence name: CD1A_HUMAN_V1
Sequence documentation:
Alignment of: HSCDIA_PEA_1_P8 x CD1A_HUMAN_V1
Alignment segment 1/1:
Quality: 3000.00
Escore : 0 Matching length: 306 Total length: 327 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 93.58 Total Percent Identity: 93.58 Gaps : 1
Alignment:
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS 50
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS 50
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 250
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 250
251 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYW 294
251 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSV 300
295 GLALWFRKRCFC 306
301 GFIILAVIVPLLLLIGLALWFRKRCFC 327
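The summary statistics reported for each alignment appear to be related by simple arithmetic. The following is a minimal sketch, assuming (the application does not state this) that "Total Percent Identity" equals the number of identical aligned residues divided by the total alignment length, gaps included. The function name is hypothetical; the numbers are those reported above for HSCDIA_PEA_1_P8 x CD1A_HUMAN_V1.

```python
def total_percent_identity(matching_length, total_length,
                           matching_percent_identity=100.0):
    """Assumed relation: identical residues / total alignment length, in percent."""
    identical = matching_length * matching_percent_identity / 100.0
    return round(100.0 * identical / total_length, 2)


# Values reported for HSCDIA_PEA_1_P8 x CD1A_HUMAN_V1:
# matching length 306, total length 327, matching identity 100.00
p8_total = total_percent_identity(306, 327)
print(p8_total)  # 93.58, the reported Total Percent Identity
```

Under the same assumption, the gap of 21 positions (327 - 306) is what lowers the total identity below the 100.00 reported for the matching region.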
Sequence name : CD1A_HUMAN
Sequence documentation:
Alignment of: HSCDIA_PEA_1_P9 x CD1A_HUMAN
Alignment segment 1/1:
Quality: 2911.00 Escore : 0 Matching length: 300 Total length: 327 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 91.74 Total Percent Identity: 91.74 Gaps : 1
Alignment
1 MLFLLLPLLAVLPGDGNAD GWLS 23
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVIWIASFYNHSWKQNLVSGWLS 50
24 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 73
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
74 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 123
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
124 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 173
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
174 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 223
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 250
224 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSV 273
251 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSV 300
274 GFIILAVIVPLLLLIGLALWFRKRCFC 300
301 GFIILAVIVPLLLLIGLALWFRKRCFC 327
Sequence name : CD1A_HUMAN_V1
Sequence documentation:
Alignment of: HSCDIA_PEA_1_P11 x CD1A_HUMAN_V1
Alignment segment 1/1:
Quality: 2655.00 Escore : 0 Matching length: 272 Total length: 327 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 83.18 Total Percent Identity: 83.18 Gaps : 1
Alignment:
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS 50
1 MLFLLLPLLAVLPGDGNADGLKEPLSFHVTWIASFYNHSWKQNLVSGWLS 50
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
51 DLQTHTWDSNSSTIVFLWPWSRGNFSNEEWKELETLFRIRTIRSFEGIRR 100
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
101 YAHELQFEYPFEIQVTGGCELHSGKVSGSFLQLAYQGSDFVSFQNNSWLP 150
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
151 YPVAGNMAKHFCKVLNQNQHENDITHNLLSDTCPRFILGLLDAGKAHLQR 200
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMR 239
201 QVKPEAWLSHGPSPGPGHLQLVCHVSGFYPKPVWVMWMRGEQEQQGTQRG 250
240 EHHSSV 245
251 DILPSADGTWYLRATLEVAAGEAADLSCRVKHSSLEGQDIVLYWEHHSSV 300
246 GFIILAVIVPLLLLIGLALWFRKRCFC 272
301 GFIILAVIVPLLLLIGLALWFRKRCFC 327
DESCRIPTION FOR CLUSTER S69686
Cluster S69686 features 4 transcript(s) and 64 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Pulmonary surfactant-associated protein A precursor (SwissProt accession identifier PSPA_HUMAN; known also according to the synonyms SP-A; PSP-A; PSAP; Alveolar proteinosis protein; 35 kDa pulmonary surfactant-associated protein), SEQ ID NO: 432, referred to herein as the previously known protein.
Protein Pulmonary surfactant-associated protein A precursor is known or believed to have the following function(s): In the presence of calcium ions, PSAP binds to surfactant phospholipids and contributes to lowering the surface tension at the air-liquid interface in the alveoli of the mammalian lung, and is essential for normal respiration. Immunohistochemistry for this protein is used to distinguish primary from secondary carcinomas (adenocarcinoma metastatic to lung and pleura) in the lung.
The sequence for protein Pulmonary surfactant-associated protein A precursor is given at the end of the application, as "Pulmonary surfactant-associated protein A precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Pulmonary surfactant-associated protein A precursor localization is believed to be extracellular.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: lipid transporter; apolipoprotein, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster S69686 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Pulmonary surfactant-associated protein A precursor. A description of each variant protein according to the present invention is now provided.
Variant protein S69686_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S69686_T1. An alignment is given to the known protein (Pulmonary surfactant-associated protein A precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S69686_P2 and PSPA_HUMAN_V1 (SEQ ID NO: 433):
1. An isolated chimeric polypeptide encoding for S69686_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLRLPYPLTWRQRPKQLEALCVGAATGPRA corresponding to amino acids 1 - 30 of S69686_P2, and a second amino acid sequence being at least 90% homologous to MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGLKGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLPAHLDEELQATLHDFRHQILQTRGALSLQGSIMTVGEKVFSSNGQSITFDAIQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRYSDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF corresponding to amino acids 1 - 248 of PSPA_HUMAN_V1, which also corresponds to amino acids 31 - 278 of S69686_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of S69686_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRLPYPLTWRQRPKQLEALCVGAATGPRA of S69686_P2.
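The coordinate bookkeeping in the comparison report can be checked mechanically. The sketch below is illustrative only: the helper name `layout` is hypothetical, and it assumes 1-based inclusive numbering and strictly contiguous segments, as the report describes. The head sequence and the length of 248 are taken from the text above.

```python
HEAD = "MLRLPYPLTWRQRPKQLEALCVGAATGPRA"  # amino acids 1 - 30 of S69686_P2 (from the text)
KNOWN_LENGTH = 248                       # amino acids 1 - 248 of PSPA_HUMAN_V1 (from the text)


def layout(segment_lengths):
    """Return 1-based inclusive (start, end) ranges for contiguous segments."""
    ranges = []
    start = 1
    for length in segment_lengths:
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


p2_ranges = layout([len(HEAD), KNOWN_LENGTH])
print(p2_ranges)  # [(1, 30), (31, 278)]
```

The second range reproduces the statement that amino acids 1 - 248 of the known protein correspond to amino acids 31 - 278 of the variant.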
It should be noted that the known protein sequence (PSPA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PSPA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 5 - Changes to PSPA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
Variant protein S69686_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S69686_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein S69686_P2 is encoded by the following transcript(s): S69686_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S69686_T1 is shown in bold; this coding portion starts at position 44 and ends at position 877. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S69686_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
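The stated coding coordinates can be cross-checked against the protein length implied by the comparison report. This is a minimal sketch, assuming the positions are 1-based and inclusive and that the stated end position does not include a stop codon (neither is stated in the application); the function name is hypothetical.

```python
def coding_length(start, end):
    """Return (nucleotides, codons) for a 1-based inclusive coding portion."""
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return nucleotides, nucleotides // 3


# Coding portion of transcript S69686_T1: positions 44 to 877 (from the text)
nt, aa = coding_length(44, 877)
print(nt, aa)  # 834 278
```

Under these assumptions the 278 codons agree with the 30-residue head plus the 248 residues shared with PSPA_HUMAN_V1 described for S69686_P2.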
Variant protein S69686_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S69686_T33. An alignment is given to the known protein (Pulmonary surfactant-associated protein A precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S69686_P6 and PSPA_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for S69686_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLRLPYPLTWRQRPKQLEALCVATGPRA corresponding to amino acids 1 - 28 of S69686_P6, and a second amino acid sequence being at least 90% homologous to MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGLKGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLPAHLDEELQATLHDFRHQILQTRGALSLQGSIMTVGEKVFSSNGQSITFDAIQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRYSDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF corresponding to amino acids 1 - 248 of PSPA_HUMAN_V1, which also corresponds to amino acids 29 - 276 of S69686_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of S69686_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRLPYPLTWRQRPKQLEALCVATGPRA of S69686_P6.
It should be noted that the known protein sequence (PSPA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PSPA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 8 - Changes to PSPA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
Variant protein S69686_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S69686_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein S69686_P6 is encoded by the following transcript(s): S69686_T33, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S69686_T33 is shown in bold; this coding portion starts at position 44 and ends at position 871. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S69686_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein S69686_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S69686_T36. An alignment is given to the known protein (Pulmonary surfactant-associated protein A precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S69686_P7 and PSPA_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for S69686_P7, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MLRLPYPLTWRQRPKQLEALCVGAATGPRA corresponding to amino acids 1 - 30 of S69686_P7, a second amino acid sequence being at least 90% homologous to MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGLKGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPP corresponding to amino acids 1 - 97 of PSPA_HUMAN_V1, which also corresponds to amino acids 31 - 127 of S69686_P7, and a third amino acid sequence being at least 90% homologous to ALSLQGSIMTVGEKVFSSNGQSITFDAIQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRYSDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF corresponding to amino acids 124 - 248 of PSPA_HUMAN_V1, which also corresponds to amino acids 128 - 252 of S69686_P7, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of S69686_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MLRLPYPLTWRQRPKQLEALCVGAATGPRA of S69686_P7.
3. An isolated chimeric polypeptide encoding for an edge portion of S69686_P7, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PA, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128 + ((n-2) - x), in which x varies from 0 to n-2.
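The edge-portion definition describes every window of length n that contains both residues flanking the junction (here amino acids 127 and 128). The following sketch enumerates those windows from the stated formula; the function name is hypothetical, coordinates are assumed to be 1-based and inclusive, and no clipping to the protein ends is attempted.

```python
def edge_windows(left, n):
    """List (start, end) windows of length n spanning the junction left | left + 1.

    Implements: start = left - x, end = (left + 1) + ((n - 2) - x), x = 0 .. n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    return [(left - x, left + 1 + (n - 2) - x) for x in range(n - 1)]


# Junction between amino acids 127 and 128 of S69686_P7, with n = 10
windows = edge_windows(127, 10)
print(windows[0], windows[-1], len(windows))  # (127, 136) (119, 128) 9
```

Each window has exactly n residues, because end - start + 1 simplifies to n for every x, and each includes both position 127 and position 128.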
It should be noted that the known protein sequence (PSPA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PSPA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 11 - Changes to PSPA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide.
Variant protein S69686_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S69686_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Variant protein S69686_P7 is encoded by the following transcript(s): S69686_T36, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S69686_T36 is shown in bold; this coding portion starts at position 44 and ends at position 799. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S69686_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Variant protein S69686_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S69686_T48. An alignment is given to the known protein (Pulmonary surfactant-associated protein A precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S69686_P13 and PSPA_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for S69686_P13, comprising a first amino acid sequence being at least 90% homologous to MWLCPLALNLILMAASGAVCEVKDVCVGSP corresponding to amino acids 1 - 30 of PSPA_HUMAN_V1, which also corresponds to amino acids 1 - 30 of S69686_P13, and a second amino acid sequence being at least 90% homologous to GRGKEQCVEMYTDGQWNDRNCLYSRLTICEF corresponding to amino acids 218 - 248 of PSPA_HUMAN_V1, which also corresponds to amino acids 31 - 61 of S69686_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of S69686_P13, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 30-x to 30; and ending at any of amino acid numbers 31 + ((n-2) - x), in which x varies from 0 to n-2.
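The two fragments quoted in the comparison report can be joined to check the stated coordinates and the junction dipeptide. This is an illustration only: it assumes the fragments are used exactly as quoted (the report itself only requires at least 90% homology), and the authoritative S69686_P13 sequence is the one given at the end of the application. Variable names are hypothetical.

```python
FIRST = "MWLCPLALNLILMAASGAVCEVKDVCVGSP"    # amino acids 1 - 30 of PSPA_HUMAN_V1 (from the text)
SECOND = "GRGKEQCVEMYTDGQWNDRNCLYSRLTICEF"  # amino acids 218 - 248 of PSPA_HUMAN_V1 (from the text)

# Contiguous, sequential join as described in the comparison report
p13_sketch = FIRST + SECOND

# Residues 30 and 31 (1-based) flank the junction between the two fragments
junction = p13_sketch[29:31]
print(len(FIRST), len(SECOND), len(p13_sketch), junction)  # 30 31 61 PG
```

The junction dipeptide is the "PG" named in the edge-portion definition, and the total of 61 residues matches amino acids 1 - 61 of S69686_P13.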
It should be noted that the known protein sequence (PSPA_HUMAN) has one or more changes relative to the sequence given at the end of the application and named as being the amino acid sequence for PSPA_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 14 - Changes to PSPA_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein S69686_P13 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S69686_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Amino acid mutations
Variant protein S69686_P13 is encoded by the following transcript(s): S69686_T48, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S69686_T48 is shown in bold; this coding portion starts at position 205 and ends at position 387. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S69686_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
As noted above, cluster S69686 features 64 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster S69686_node_34 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster S69686_node_36 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster S69686_node_80 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster S69686_node_4 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster S69686_node_7 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T48. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster S69686_node_8 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T48. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster S69686_node_9 according to the present invention can be found in the following transcript(s): S69686_T48. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster S69686_node_10 according to the present invention can be found in the following transcript(s): S69686_T48. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster S69686_node_16 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T36 and S69686_T48. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster S69686_node_17 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster S69686_node_18 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster S69686_node_19 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster S69686_node_20 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster S69686_node_21 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster S69686_node_22 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster S69686_node_23 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster S69686_node_25 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster S69686_node_27 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster S69686_node_28 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster S69686_node_30 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1 and S69686_T33. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
S69686_T1          426                          503
S69686_T33         420                          497
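The segment coordinates in Table 36 can be used to derive the segment length on each transcript and the offset between transcripts. This is a minimal sketch, assuming 1-based inclusive positions; the dictionary and variable names are hypothetical, and the values are those given for S69686_node_30.

```python
# Start and end positions of S69686_node_30 on each transcript (Table 36)
segment_locations = {
    "S69686_T1": (426, 503),
    "S69686_T33": (420, 497),
}

# Inclusive length of the segment on each transcript
lengths = {name: end - start + 1 for name, (start, end) in segment_locations.items()}

# Shift of the segment start between the two transcripts, in nucleotides
offset = segment_locations["S69686_T1"][0] - segment_locations["S69686_T33"][0]
print(lengths, offset)  # {'S69686_T1': 78, 'S69686_T33': 78} 6
```

The segment has the same length on both transcripts. The 6-nucleotide offset equals two codons, which is consistent with (though not stated by the application as the cause of) the two-residue difference between the 30-residue head of S69686_P2 and the 28-residue head of S69686_P6.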
Segment cluster S69686_node_33 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33 and S69686_T36. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster S69686_node_35 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster S69686_node_37 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster S69686_node_38 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster S69686_node_39 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster S69686_node_40 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster S69686_node_41 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster S69686_node_42 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster S69686_node_43 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster S69686_node_44 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster S69686_node_45 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster S69686_node_46 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster S69686_node_47 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster S69686_node_48 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster S69686_node_49 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster S69686_node_51 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster S69686_node_52 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster S69686_node_53 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster S69686_node_54 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Segment cluster S69686_node_55 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster S69686_node_56 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster S69686_node_57 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster S69686_node_58 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Segment cluster S69686_node_59 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Segment cluster S69686_node_60 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Segment cluster S69686_node_61 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Segment cluster S69686_node_62 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster S69686_node_63 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster S69686_node_64 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Segment cluster S69686_node_65 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Segment cluster S69686_node_66 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 67 below describes the starting and ending position of this segment on each transcript. Table 67 - Segment location on transcripts
Segment cluster S69686_node_67 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Segment cluster S69686_node_68 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 69 below describes the starting and ending position of this segment on each transcript. Table 69 - Segment location on transcripts
Segment cluster S69686_node_69 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Segment cluster S69686_node_70 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Segment cluster S69686_node_71 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Segment cluster S69686_node_72 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Segment cluster S69686_node_73 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Segment cluster S69686_node_74 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 75 below describes the starting and ending position of this segment on each transcript. Table 75 - Segment location on transcripts
Segment cluster S69686_node_75 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Segment cluster S69686_node_76 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
Segment cluster S69686_node_77 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Segment cluster S69686_node_78 according to the present invention can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 79 below describes the starting and ending position of this segment on each transcript. Table 79 - Segment location on transcripts
Segment cluster S69686_node_79 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S69686_T1, S69686_T33, S69686_T36 and S69686_T48. Table 80 below describes the starting and ending position of this segment on each transcript. Table 80 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: PSPA_HUMAN_V1
Sequence documentation:
Alignment of: S69686_P2 x PSPA_HUMAN_V1
Alignment segment 1/1:
Quality: 2482.00
Escore: 0
Matching length: 248
Total length: 248
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 31 MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGL 80
  1 MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGL 50
 81 KGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLP 130
 51 KGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLP 100
131 AHLDEELQATLHDFRHQILQTRGALSLQGSIMTVGEKVFSSNGQSITFDA 180
101 AHLDEELQATLHDFRHQILQTRGALSLQGSIMTVGEKVFSSNGQSITFDA 150
181 IQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRY 230
151 IQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRY 200
231 SDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF 278
201 SDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF 248
Sequence name: PSPA_HUMAN_V1
Sequence documentation:
Alignment of: S69686_P6 x PSPA_HUMAN_V1
Alignment segment 1/1:
Quality: 2482.00
Escore: 0
Matching length: 248
Total length: 248
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
 29 MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGL 78
  1 MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGL 50
 79 KGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLP 128
 51 KGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLP 100
129 AHLDEELQATLHDFRHQILQTRGALSLQGSIMTVGEKVFSSNGQSITFDA 178
101 AHLDEELQATLHDFRHQILQTRGALSLQGSIMTVGEKVFSSNGQSITFDA 150
179 IQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRY 228
151 IQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRY 200
229 SDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF 276
201 SDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF 248
Sequence name: PSPA_HUMAN_V1
Sequence documentation:
Alignment of: S69686_P7 x PSPA_HUMAN_V1
Alignment segment 1/1:
Quality: 2127.00
Escore: 0
Matching length: 222
Total length: 248
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 89.52
Total Percent Identity: 89.52
Gaps: 1
Alignment:
 31 MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGL 80
  1 MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGL 50
 81 KGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPP... 127
 51 KGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLP 100
128 .......................ALSLQGSIMTVGEKVFSSNGQSITFDA 154
101 AHLDEELQATLHDFRHQILQTRGALSLQGSIMTVGEKVFSSNGQSITFDA 150
155 IQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRY 204
151 IQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRY 200
205 SDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF 252
201 SDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF 248
Sequence name: PSPA_HUMAN_V1
Sequence documentation:
Alignment of: S69686_P13 x PSPA_HUMAN_V1
Alignment segment 1/1:
Quality: 522.00
Escore: 0
Matching length: 61
Total length: 248
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 24.60
Total Percent Identity: 24.60
Gaps: 1
Alignment:
  1 MWLCPLALNLILMAASGAVCEVKDVCVGSP.................... 30
  1 MWLCPLALNLILMAASGAVCEVKDVCVGSPGIPGTPGSHGLPGRDGRDGL 50
 30 .................................................. 30
 51 KGDPGPPGPMGPPGEMPCPPGNDGLPGAPGIPGECGEKGEPGERGPPGLP 100
 30 .................................................. 30
101 AHLDEELQATLHDFRHQILQTRGALSLQGSIMTVGEKVFSSNGQSITFDA 150
 30 .................................................. 30
151 IQEACARAGGRIAVPRNPEENEAIASFVKKYNTYAYVGLTEGPSPGDFRY 200
 31 .................GRGKEQCVEMYTDGQWNDRNCLYSRLTICEF 61
201 SDGTPVNYTNWYRGEPAGRGKEQCVEMYTDGQWNDRNCLYSRLTICEF 248
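The summary figures reported for the four alignments above appear to be related by simple arithmetic: with 100% identity over the matched region, the total percent identity equals the matching length divided by the total length. The following is a minimal sketch of that check, not the alignment program's own method; the function name `total_percent` and the assumption that the total percentage is computed this way are illustrative only.

```python
def total_percent(matching_length: int, total_length: int,
                  matching_percent: float = 100.0) -> float:
    """Scale a percentage over the matched region to the full alignment length.

    Assumption: the reported total percentage is the matched-region percentage
    weighted by (matching length / total length), rounded to two decimals.
    """
    return round(matching_length * matching_percent / total_length, 2)


# (matching length, total length, reported total percent identity),
# copied from the alignment headers above.
reported = {
    "S69686_P2": (248, 248, 100.00),
    "S69686_P6": (248, 248, 100.00),
    "S69686_P7": (222, 248, 89.52),
    "S69686_P13": (61, 248, 24.60),
}

for name, (matching, total, reported_pct) in reported.items():
    computed = total_percent(matching, total)
    print(f"{name}: computed {computed:.2f}, reported {reported_pct:.2f}")
```

Under this assumption the computed values agree with the reported ones for all four variants, which also serves as a consistency check on the matching lengths (222 and 61 residues) read from the gapped alignments.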
DESCRIPTION FOR CLUSTER HUMTCXAAA
Cluster HUMTCXAAA features 5 transcript(s) and 17 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein T-cell surface glycoprotein CD8 alpha chain precursor (SwissProt accession identifier CD8A_HUMAN; known also according to the synonyms T-lymphocyte differentiation antigen T8/Leu-2), SEQ ID NO: 460, referred to herein as the previously known protein. Protein T-cell surface glycoprotein CD8 alpha chain precursor is known or believed to have the following function(s): identifies cytotoxic/suppressor T-cells that interact with MHC class I bearing targets. CD8 is thought to play a role in the process of T-cell mediated killing. CD8 alpha chains bind to the alpha-3 domains of class I MHC molecules. The sequence for protein T-cell surface glycoprotein CD8 alpha chain precursor is given at the end of the application, as "T-cell surface glycoprotein CD8 alpha chain precursor amino acid sequence". Protein T-cell surface glycoprotein CD8 alpha chain precursor localization is believed to be Type I membrane protein.
The CD8 molecule is composed of two chains (alpha and beta) and has a molecular weight of 30 to 32 kD. It is found on a T cell subset of normal cytotoxic/suppressor cells which make up approximately 20 to 35 per cent of human peripheral blood lymphocytes. The CD8 antigen is also detected on natural killer cells, 80 per cent of thymocytes, on a subpopulation of 30 per cent of peripheral blood null cells and 15 to 30 per cent of bone marrow cells. It is used in immunohistochemical typing of lymphoma and leukemia, and in particular in the identification of granular lymphocytic leukemia in the bone marrow and Sezary syndrome (an erythrodermic cutaneous T-cell lymphoma with a leukemic component).
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: CD4 agonist; Immunostimulant; Lymphocyte function-associated molecule inhibitor; CD8 modulator. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Immunostimulant, anti-AIDS; Monoclonal antibody, murine; Anticancer; Immunostimulant; Immunosuppressant; Monoclonal antibody, humanized; Antidiabetic; Antipsoriasis; Multiple sclerosis treatment; Antiarthritic, immunological.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: immune response; transmembrane receptor protein tyrosine kinase signaling pathway; cellular defense response (sensu Vertebrata), which are annotation(s) related to Biological Process; protein binding; coreceptor, which are annotation(s) related to Molecular Function; and integral plasma membrane protein; integral membrane protein, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMTCXAAA features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein T-cell surface glycoprotein CD8 alpha chain precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMTCXAAA_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTCXAAA_PEA_1_T6. An alignment is given to the known protein (T-cell surface glycoprotein CD8 alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMTCXAAA_PEA_1_P6 and CD8A_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MRNQAPGRPKGATFPPRRPTGSRAPPLAPELRAKQRPGERV corresponding to amino acids 1 - 41 of HUMTCXAAA_PEA_1_P6, a second amino acid sequence being at least 90% homologous to MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLP corresponding to amino acids 1 - 134 of CD8A_HUMAN, which also corresponds to amino acids 42 - 175 of HUMTCXAAA_PEA_1_P6, a third amino acid sequence, being a bridging amino acid sequence comprising G, and a fourth amino acid sequence being at least 90% homologous to NRRRVCKCPRPVVKSGDKPSLSARYV corresponding to amino acids 210 - 235 of CD8A_HUMAN, which also corresponds to amino acids 177 - 202 of HUMTCXAAA_PEA_1_P6, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HUMTCXAAA_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MRNQAPGRPKGATFPPRRPTGSRAPPLAPELRAKQRPGERV of HUMTCXAAA_PEA_1_P6.
3. An isolated polypeptide encoding for an edge portion of HUMTCXAAA_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PGN having a structure as follows (numbering according to HUMTCXAAA_PEA_1_P6): a sequence starting from any of amino acid numbers 175-x to 175; and ending at any of amino acid numbers 177 + ((n-2) - x), in which x varies from 0 to n-2.
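The start and end formula for the edge portion can be illustrated by enumerating the candidate windows for a given n. The sketch below is an illustration only, under stated assumptions: the function name `edge_windows` is hypothetical, positions are taken as 1-based and inclusive, and the protein length of 202 is inferred from the last position cited in the comparison report rather than stated as such in the text.

```python
def edge_windows(left: int, right: int, n: int, protein_length: int):
    """Enumerate (start, end) pairs for an edge portion spanning a junction.

    left, right: first and last junction positions (175 and 177 for
                 HUMTCXAAA_PEA_1_P6, per the text above).
    n:           the length parameter of the edge portion.
    Each x in 0..n-2 gives start = left - x and end = right + ((n - 2) - x).
    Windows that would fall outside the protein are skipped.
    """
    windows = []
    for x in range(0, n - 1):  # x varies from 0 to n-2 inclusive
        start = left - x
        end = right + ((n - 2) - x)
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


# Example: n = 10 on HUMTCXAAA_PEA_1_P6 (assumed length 202).
windows = edge_windows(left=175, right=177, n=10, protein_length=202)
for start, end in windows:
    print(start, end)
```

Every window produced this way contains positions 175 through 177, that is, the junction residues themselves. Read literally, each window for this variant covers n + 1 positions, because the junction spans three residues; for a two-residue junction with adjacent positions the same formula yields exactly n.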
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because one of the two signal-peptide prediction programs (HMM: Signal peptide, NN: NO) predicts that this protein has a signal peptide. Variant protein HUMTCXAAA_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 4 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTCXAAA_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 4 - Amino acid mutations
Variant protein HUMTCXAAA_PEA_1_P6 is encoded by the following transcript(s): HUMTCXAAA_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTCXAAA_PEA_1_T6 is shown in bold; this coding portion starts at position 1821 and ends at position 2426. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTCXAAA_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Nucleic acid SNPs
Variant protein HUMTCXAAA_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTCXAAA_PEA_1_T9. An alignment is given to the known protein (T-cell surface glycoprotein CD8 alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMTCXAAA_PEA_1_P12 and CD8A_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAPTIASQPLSLRPEACRPAAGGA corresponding to amino acids 1 - 171 of CD8A_HUMAN, which also corresponds to amino acids 1 - 171 of HUMTCXAAA_PEA_1_P12, a second amino acid sequence, being a bridging amino acid sequence comprising G, and a third amino acid sequence being at least 90% homologous to NRRRVCKCPRPVVKSGDKPSLSARYV corresponding to amino acids 210 - 235 of CD8A_HUMAN, which also corresponds to amino acids 173 - 198 of HUMTCXAAA_PEA_1_P12, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HUMTCXAAA_PEA_1_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AGN having a structure as follows (numbering according to HUMTCXAAA_PEA_1_P12): a sequence starting from any of amino acid numbers 171-x to 171; and ending at any of amino acid numbers 173 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMTCXAAA_PEA_1_P12 is encoded by the following transcript(s): HUMTCXAAA_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTCXAAA_PEA_1_T9 is shown in bold; this coding portion starts at position 1608 and ends at position 2201. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide
sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTCXAAA_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Variant protein HUMTCXAAA_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTCXAAA_PEA_1_T10. An alignment is given to the known protein (T-cell surface glycoprotein CD8 alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMTCXAAA_PEA_1_P13 and CD8A_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAPTIASQPLSLRPEACRPAAGGAVHTRGLDFACDIYIWAPLAGTCGVLLLSLVITLYCNH corresponding to amino acids 1 - 208 of CD8A_HUMAN, which also corresponds to amino acids 1 - 208 of HUMTCXAAA_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SKSRGIAAGRSRPRSCPWLC corresponding to amino acids 209 - 228 of HUMTCXAAA_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMTCXAAA_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SKSRGIAAGRSRPRSCPWLC in HUMTCXAAA_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein.
Variant protein HUMTCXAAA_PEA_1_P13 is encoded by the following transcript(s): HUMTCXAAA_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTCXAAA_PEA_1_T10 is shown in bold; this coding portion starts at position 1608 and ends at position 2291. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTCXAAA_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HUMTCXAAA_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTCXAAA_PEA_1_T11. An alignment is given to the known protein (T-cell surface glycoprotein CD8 alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMTCXAAA_PEA_1_P14 and CD8A_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSALSNSIMYFSH corresponding to amino acids 1 - 127 of CD8A_HUMAN, which also corresponds to amino acids 1 - 127 of HUMTCXAAA_PEA_1_P14, and a second amino acid sequence being at least 90% homologous to FACDIYIWAPLAGTCGVLLLSLVITLYCNHRNRRRVCKCPRPWKSGDKPSLSARYV corresponding to amino acids 179 - 235 of CD8A_HUMAN, which also corresponds to amino acids 128 - 184 of HUMTCXAAA_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMTCXAAA_PEA_1_P14, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise HF, having a structure as follows: a sequence starting from any of amino acid numbers 127-x to 127; and ending at any of amino acid numbers 128 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
Variant protein HUMTCXAAA_PEA_1_P14 is encoded by the following transcript(s): HUMTCXAAA_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTCXAAA_PEA_1_T11 is shown in bold; this coding portion starts at position 1608 and ends at position 2159. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTCXAAA_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein HUMTCXAAA_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMTCXAAA_PEA_1_T12. An alignment is given to the known protein (T-cell surface glycoprotein CD8 alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMTCXAAA_PEA_1_P15 and CD8A_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMTCXAAA_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNPTSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVLTLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLP corresponding to amino acids 1 - 134 of CD8A_HUMAN, which also corresponds to amino acids 1 - 134 of HUMTCXAAA_PEA_1_P15, a second, bridging amino acid sequence comprising G, and a third amino acid sequence being at least 90% homologous to NRRRVCKCPRPWKSGDKPSLSARYV corresponding to amino acids 210 - 235 of CD8A_HUMAN, which also corresponds to amino acids 136 - 161 of HUMTCXAAA_PEA_1_P15, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HUMTCXAAA_PEA_1_P15, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PGN, having a structure as follows (numbering according to HUMTCXAAA_PEA_1_P15): a sequence starting from any of amino acid numbers 134-x to 134; and ending at any of amino acid numbers 136 + ((n-2) - x), in which x varies from 0 to n-2.
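The coordinate bookkeeping in the comparison reports above can be checked with a short sketch. It is an illustration only, assuming 1-based inclusive residue numbering; the function name `map_to_variant` and the way the parts are represented are hypothetical and do not appear in the application.

```python
from typing import List, Tuple, Union

# A part is either a (start, end) range on the known protein (1-based,
# inclusive) or a string of residues that is not taken from the known protein.
Part = Union[Tuple[int, int], str]


def map_to_variant(parts: List[Part]) -> List[Tuple[int, int]]:
    """Return the (start, end) range each part occupies on the variant protein."""
    mapped = []
    position = 1  # next free residue number on the variant
    for part in parts:
        if isinstance(part, str):
            length = len(part)
        else:
            length = part[1] - part[0] + 1
        mapped.append((position, position + length - 1))
        position += length
    return mapped


# P14: residues 1-127 of CD8A_HUMAN joined directly to residues 179-235.
p14 = map_to_variant([(1, 127), (179, 235)])

# P15: residues 1-134, a single bridging G, then residues 210-235.
p15 = map_to_variant([(1, 134), "G", (210, 235)])

print(p14)
print(p15)
```

Under these assumptions the second segment of P14 lands on variant residues 128 - 184 and the third segment of P15 on residues 136 - 161, in agreement with the ranges stated in the comparison reports.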
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
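The localization rationales given for variants P13, P14 and P15 follow a common pattern, which can be sketched as a small decision rule. This is a hedged illustration, not the procedure actually used: the function `predict_localization`, its boolean inputs and the "undetermined" fallback are assumptions introduced here, and the text does not say how disagreements between programs were resolved.

```python
from typing import Tuple


def predict_localization(
    signal_peptide_votes: Tuple[bool, bool],
    transmembrane_votes: Tuple[bool, bool],
) -> str:
    """Combine two signal-peptide and two trans-membrane predictions.

    Each tuple holds one boolean per prediction program (hypothetical inputs).
    """
    if all(transmembrane_votes):
        # Both programs predict a trans-membrane region: membrane,
        # whether or not a signal peptide is also predicted.
        return "membrane"
    if all(signal_peptide_votes) and not any(transmembrane_votes):
        # Signal peptide predicted by both, trans-membrane region by neither.
        return "secreted"
    # Cases not covered by the rationales stated in the text.
    return "undetermined"


print(predict_localization((True, True), (True, True)))    # P14-style case
print(predict_localization((True, True), (False, False)))  # P15-style case
```

The fallback branch is deliberate: the source only describes unanimous outcomes, so the sketch does not guess at a tie-breaking rule.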
Variant protein HUMTCXAAA_PEA_1_P15 is encoded by the following transcript(s): HUMTCXAAA_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMTCXAAA_PEA_1_T12 is shown in bold; this coding portion starts at position 1608 and ends at position 2090. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMTCXAAA_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
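The coding-portion coordinates quoted above for transcripts T10, T11 and T12 can be related to the variant protein lengths with simple arithmetic. The sketch below assumes the positions are 1-based and inclusive and that the stated range does not include the stop codon; both are assumptions, and the helper name `codon_count` is hypothetical.

```python
def codon_count(start: int, end: int) -> int:
    """Number of whole codons between two 1-based, inclusive positions."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


# Coding portions as stated in the text for each transcript.
CODING_PORTIONS = {
    "HUMTCXAAA_PEA_1_T10": (1608, 2291),
    "HUMTCXAAA_PEA_1_T11": (1608, 2159),
    "HUMTCXAAA_PEA_1_T12": (1608, 2090),
}

for transcript, (start, end) in CODING_PORTIONS.items():
    print(transcript, codon_count(start, end))
```

Under these assumptions the codon counts are 228, 184 and 161, which match the last residue numbers given for P13, P14 and P15 in the comparison reports.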
Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately
because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMTCXAAA_PEA_1_node_0 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_1 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_2 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_4 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_6 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_8 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_17 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T10. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_20 according to the present invention is supported by 29 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_21 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_22 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMTCXAAA_PEA_1_node_9 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_10 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_11 according to the present invention can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10 and HUMTCXAAA_PEA_1_T12. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_13 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T9 and HUMTCXAAA_PEA_1_T10. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_15 according to the present invention can be found in the following transcript(s): HUMTCXAAA_PEA_1_T10. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_16 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T10 and HUMTCXAAA_PEA_1_T11. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HUMTCXAAA_PEA_1_node_18 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMTCXAAA_PEA_1_T6, HUMTCXAAA_PEA_1_T9, HUMTCXAAA_PEA_1_T10, HUMTCXAAA_PEA_1_T11 and HUMTCXAAA_PEA_1_T12. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: CD8A_HUMAN
Sequence documentation:
Alignment of: HUMTCXAAA_PEA_1_P6 x CD8A_HUMAN
Alignment segment 1/1:
Quality: 1472.00
Escore:
Matching length: 161
Total length: 235
Matching Percent Similarity: 99.38
Matching Percent Identity: 99.38
Total Percent Similarity: 68.09
Total Percent Identity: 68.09
Gaps: 1
Alignment:
 42 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 91
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50

 92 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 141
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100

142 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPG............... 176
    ||||||||||||||||||||||||||||||||||
101 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAP 150

176 .................................................. 176
151 TIASQPLSLRPEACRPAAGGAVHTRGLDFACDIYIWAPLAGTCGVLLLSL 200

177 .........NRRRVCKCPRPWKSGDKPSLSARYV 202
             |||||||||||||||||||||||||
201 VITLYCNHRNRRRVCKCPRPWKSGDKPSLSARYV 235
Sequence name: CD8A_HUMAN
Sequence documentation:
Alignment of: HUMTCXAAA_PEA_1_P12 x CD8A_HUMAN
Alignment segment 1/1:
Quality: 1831.00
Escore: 0
Matching length: 198
Total length: 235
Matching Percent Similarity: 99.49
Matching Percent Identity: 99.49
Total Percent Similarity: 83.83
Total Percent Identity: 83.83
Gaps: 1
Alignment:
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50

 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100

101 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAP 150

151 TIASQPLSLRPEACRPAAGGA............................. 171
    |||||||||||||||||||||
151 TIASQPLSLRPEACRPAAGGAVHTRGLDFACDIYIWAPLAGTCGVLLLSL 200

172 ........GNRRRVCKCPRPWKSGDKPSLSARYV 198
             |||||||||||||||||||||||||
201 VITLYCNHRNRRRVCKCPRPWKSGDKPSLSARYV 235
Sequence name: CD8A_HUMAN
Sequence documentation:
Alignment of: HUMTCXAAA_PEA_1_P13 x CD8A_HUMAN
Alignment segment 1/1:
Quality: 2043.00
Escore: 0
Matching length: 212
Total length: 212
Matching Percent Similarity: 98.58
Matching Percent Identity: 98.58
Total Percent Similarity: 98.58
Total Percent Identity: 98.58
Gaps: 0
Alignment:
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50

 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100

101 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAP 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAP 150

151 TIASQPLSLRPEACRPAAGGAVHTRGLDFACDIYIWAPLAGTCGVLLLSL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 TIASQPLSLRPEACRPAAGGAVHTRGLDFACDIYIWAPLAGTCGVLLLSL 200

201 VITLYCNHSKSR 212
    ||||||||   |
201 VITLYCNHRNRR 212
Sequence name: CD8A_HUMAN
Sequence documentation:
Alignment of: HUMTCXAAA_PEA_1_P14 x CD8A_HUMAN
Alignment segment 1/1:
Quality: 1710.00
Escore: 0
Matching length: 184
Total length: 235
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 78.30
Total Percent Identity: 78.30
Gaps: 1
Alignment:
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50

 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100

101 TLSDFRRENEGYYFCSALSNSIMYFSH....................... 127
    |||||||||||||||||||||||||||
101 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAP 150

128 ............................FACDIYIWAPLAGTCGVLLLSL 149
                                ||||||||||||||||||||||
151 TIASQPLSLRPEACRPAAGGAVHTRGLDFACDIYIWAPLAGTCGVLLLSL 200

150 VITLYCNHRNRRRVCKCPRPWKSGDKPSLSARYV 184
    ||||||||||||||||||||||||||||||||||
201 VITLYCNHRNRRRVCKCPRPWKSGDKPSLSARYV 235
Sequence name: CD8A_HUMAN
Sequence documentation:
Alignment of: HUMTCXAAA_PEA_1_P15 x CD8A_HUMAN
Alignment segment 1/1:
Quality: 1472.00
Escore: 0
Matching length: 161
Total length: 235
Matching Percent Similarity: 99.38
Matching Percent Identity: 99.38
Total Percent Similarity: 68.09
Total Percent Identity: 68.09
Gaps: 1
Alignment:
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALPVTALLLPLALLLHAARPSQFRVSPLDRTWNLGETVELKCQVLLSNP 50

 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 TSGCSWLFQPRGAAASPTFLLYLSQNKPKAAEGLDTQRFSGKRLGDTFVL 100

101 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPG............... 135
    ||||||||||||||||||||||||||||||||||
101 TLSDFRRENEGYYFCSALSNSIMYFSHFVPVFLPAKPTTTPAPRPPTPAP 150

135 .................................................. 135
151 TIASQPLSLRPEACRPAAGGAVHTRGLDFACDIYIWAPLAGTCGVLLLSL 200

136 .........NRRRVCKCPRPWKSGDKPSLSARYV 161
             |||||||||||||||||||||||||
201 VITLYCNHRNRRRVCKCPRPWKSGDKPSLSARYV 235
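The percentage figures reported with the alignments above can be reproduced with a short calculation. This is a sketch under an assumption: the percentages are taken to be identical positions divided by the matching length or the total length. The identical-position counts used below (160, 197, 209, 184) are inferred from the reported percentages and are not stated in the application; the function name `percent` is hypothetical.

```python
def percent(identical: int, length: int) -> str:
    """Format identical/length as a percentage with two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    return f"{100.0 * identical / length:.2f}"


# (variant, inferred identical positions, matching length, total length)
ALIGNMENTS = [
    ("HUMTCXAAA_PEA_1_P6", 160, 161, 235),
    ("HUMTCXAAA_PEA_1_P12", 197, 198, 235),
    ("HUMTCXAAA_PEA_1_P13", 209, 212, 212),
    ("HUMTCXAAA_PEA_1_P14", 184, 184, 235),
    ("HUMTCXAAA_PEA_1_P15", 160, 161, 235),
]

for name, identical, matching, total in ALIGNMENTS:
    print(name, percent(identical, matching), percent(identical, total))
```

With these inferred counts the sketch returns the same matching and total identity values as listed in the alignment headers, for example 99.38 and 68.09 for P6 and P15.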
DESCRIPTION FOR CLUSTER HSPPI
Cluster HSPPI features 7 transcript(s) and 14 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
These sequences are variants of the known protein Insulin precursor (SwissProt accession identifier INS_HUMAN), SEQ ID NO: 487, referred to herein as the previously known protein. Protein Insulin precursor is known or believed to have the following function(s): insulin decreases blood glucose concentration. It increases cell permeability to monosaccharides, amino acids and fatty acids. It accelerates glycolysis, the pentose phosphate cycle, and glycogen synthesis in liver. Protein Insulin precursor is used for the diagnosis and monitoring of many metabolic syndromes including diabetes, hypoglycemia and obesity. Immunohistochemistry staining is used in the diagnosis of pancreatic endocrine tumors. The sequence for protein Insulin precursor is given at the end of the application, as "Insulin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Insulin precursor localization is believed to be Secreted.
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Diabetes, Type I; Diabetes, Type II; Cardiomyopathy, diabetic; Diabetes; Wound healing. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Insulin agonist; Interleukin 10 agonist; Interleukin 4 agonist; Immunomodulator. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Antidiabetic; Insulin; Symptomatic antidiabetic; Cardiovascular; Growth hormone; Vulnerary. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: glucose metabolism; energy pathways; lipid metabolism; cell surface receptor linked signal transduction; cell-cell signaling; physiological processes, which are annotation(s) related to Biological Process; insulin receptor ligand; hormone, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSPPI features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Insulin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSPPI_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPPI_PEA_1_T12. An alignment is given to the known protein (Insulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPPI_PEA_1_P6 and INS_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPPI_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to GSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN corresponding to amino acids 75 - 110 of INS_HUMAN, which also corresponds to amino acids 63 - 98 of HSPPI_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSPPI_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise QG, having a structure as follows: a sequence starting from any of amino acid numbers 62-x to 62; and ending at any of amino acid numbers 63 + ((n-2) - x), in which x varies from 0 to n-2.
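The edge-portion definition above amounts to enumerating every window of length n that contains both residues of the junction (62 and 63 for HSPPI_PEA_1_P6). The following sketch illustrates that reading of the formula. The function name `edge_windows` is hypothetical, and discarding windows that run past residue 1 or past the last residue (98, per the comparison report) is an added assumption rather than something the text states.

```python
from typing import List, Tuple


def edge_windows(
    n: int, left: int = 62, right: int = 63, protein_length: int = 98
) -> List[Tuple[int, int]]:
    """List (start, end) windows of length n that span the left/right junction.

    Follows the stated formula: start = left - x, end = right + ((n - 2) - x),
    with x running from 0 to n - 2 (1-based, inclusive numbering).
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    windows = []
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = left - x
        end = right + ((n - 2) - x)
        # Assumption: keep only windows that fit inside the variant protein.
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


for start, end in edge_windows(10):
    print(start, end, end - start + 1)
```

For n = 10 this yields nine windows, from residues 62 - 71 down to residues 54 - 63, each ten residues long and each containing the QG junction.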
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSPPI_PEA_1_P6 is encoded by the following transcript(s): HSPPI_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPPI_PEA_1_T12 is shown in bold; this coding portion starts at position 128 and ends at position 421. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Nucleic acid SNPs
Variant protein HSPPI_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPPI_PEA_1_T3. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSPPI_PEA_1_P8 is encoded by the following transcript(s): HSPPI_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPPI_PEA_1_T3 is shown in bold; this coding portion starts at position 311 and ends at position 640. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein HSPPI_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPPI_PEA_1_T5. An alignment is given to the known protein (Insulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPPI_PEA_1_P9 and INS_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPPI_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEPTAHCCPWPPPATPCSWRSHPAWAEGGRRLPPSRGSGALF corresponding to amino acids 63 - 104 of HSPPI_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPPI_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEPTAHCCPWPPPATPCSWRSHPAWAEGGRRLPPSRGSGALF in HSPPI_PEA_1_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSPPI_PEA_1_P9 is encoded by the following transcript(s): HSPPI_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPPI_PEA_1_T5 is shown in bold; this coding portion starts at position 311 and ends at position 622. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HSPPI_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPPI_PEA_1_T6. An alignment is given to the known protein (Insulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPPI_PEA_1_P10 and INS_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPPI_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGAGRGPWCRQPAALGPGGVPAEAWHCGTMLYQHLLPLPAGELLQLDAARRQPHTRRLLHRERWNKALEPA corresponding to amino acids 63 - 133 of HSPPI_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPPI_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGAGRGPWCRQPAALGPGGVPAEAWHCGTMLYQHLLPLPAGELLQLDAARRQPHTRRLLHRERWNKALEPA in HSPPI_PEA_1_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPPI_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Variant protein HSPPI_PEA_1_P10 is encoded by the following transcript(s): HSPPI_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPPI_PEA_1_T6 is shown in bold; this coding portion starts at position 311 and ends at position 709. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
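The coding-portion coordinates quoted for each transcript can be cross-checked against the stated protein lengths. The following is a minimal Python sketch and is not part of the original disclosure; the helper name `coding_length_aa` is hypothetical. It assumes the quoted start and end positions are 1-based, inclusive, and exclude the stop codon, an assumption that appears consistent with the figures given here (positions 311 to 622 for HSPPI_PEA_1_T5 with a 104-residue variant, and positions 311 to 709 for HSPPI_PEA_1_T6 with a 133-residue variant).

```python
def coding_length_aa(start: int, end: int) -> int:
    """Return the number of codons in a 1-based, inclusive coding span.

    Assumption (not stated in the source): the span excludes the stop codon.
    """
    n_nt = end - start + 1
    if n_nt <= 0 or n_nt % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return n_nt // 3


# Coding spans quoted in the text for two transcripts.
spans = {"HSPPI_PEA_1_T5": (311, 622), "HSPPI_PEA_1_T6": (311, 709)}
for name, (start, end) in spans.items():
    print(name, coding_length_aa(start, end))
```

A span whose length is not a multiple of three raises an error, which makes a mistyped coordinate easy to spot.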
Variant protein HSPPI_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPPI_PEA_1_T13. An alignment is given to the known protein (Insulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPPI_PEA_1_P12 and INS_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPPI_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGELLQLDAARRQPHTRRLLHRERWNKALEPA corresponding to amino acids 63 - 94 of HSPPI_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPPI_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AGELLQLDAARRQPHTRRLLHRERWNKALEPA in HSPPI_PEA_1_P12.
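The head-plus-tail structure described for HSPPI_PEA_1_P12 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions and not part of the original disclosure: the function name `join_segments` is hypothetical, and the two strings are the head and tail sequences quoted above with line-break spaces removed.

```python
# Head: residues 1-62, shared with INS_HUMAN (as quoted in the text).
HEAD = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ"
# Tail: residues 63-94, unique to the variant (as quoted in the text).
TAIL = "AGELLQLDAARRQPHTRRLLHRERWNKALEPA"


def join_segments(*segments: str):
    """Concatenate contiguous segments in order.

    Returns the joined sequence and the 1-based, inclusive residue range
    occupied by each segment in the joined sequence.
    """
    ranges = []
    pos = 1
    for seg in segments:
        ranges.append((pos, pos + len(seg) - 1))
        pos += len(seg)
    return "".join(segments), ranges


variant, ranges = join_segments(HEAD, TAIL)
print(len(variant), ranges)
```

The computed ranges can be compared directly with the residue ranges stated in the comparison report (1 - 62 and 63 - 94).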
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPPI_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
Variant protein HSPPI_PEA_1_P12 is encoded by the following transcript(s): HSPPI_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPPI_PEA_1_T13 is shown in bold; this coding portion starts at position 311 and ends at position 592. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
Variant protein HSPPI_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPPI_PEA_1_T17. An alignment is given to the known protein (Insulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPPI_PEA_1_P14 and INS_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPPI_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGELLQLDAARRQPHTRRLLHRERWNKALEPALLCRLCVLGALGQAPLPGTWSPSQLSPRSLGAHRCQRRPGPACSGSPQSGHACRLPAAPTLWLRVQYGSCGGL corresponding to amino acids 63 - 168 of HSPPI_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPPI_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AGELLQLDAARRQPHTRRLLHRERWNKALEPALLCRLCVLGALGQAPLPGTWSPSQLSPRSLGAHRCQRRPGPACSGSPQSGHACRLPAAPTLWLRVQYGSCGGL in HSPPI_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPPI_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
Variant protein HSPPI_PEA_1_P14 is encoded by the following transcript(s): HSPPI_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPPI_PEA_1_T17 is shown in bold; this coding portion starts at position 128 and ends at position 631. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Variant protein HSPPI_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPPI_PEA_1_T18. An alignment is given to the known protein (Insulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSPPI_PEA_1_P15 and INS_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPPI_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ corresponding to amino acids 1 - 62 of INS_HUMAN, which also corresponds to amino acids 1 - 62 of HSPPI_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GGAGRGPWCRQPAALGPGGVPAEAWHCGTMLYQHLLPLPAGELLQLDAARRQPHTRRLLHRERWNKALEPALLCRLCVLGALGQAPLPGTWSPSQLSPRSLGAHRCQRRPGPACSGSPQSGHACRLPAAPTLWLRVQYGSCGGL corresponding to amino acids 63 - 207 of HSPPI_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPPI_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GGAGRGPWCRQPAALGPGGVPAEAWHCGTMLYQHLLPLPAGELLQLDAARRQPHTRRLLHRERWNKALEPALLCRLCVLGALGQAPLPGTWSPSQLSPRSLGAHRCQRRPGPACSGSPQSGHACRLPAAPTLWLRVQYGSCGGL in HSPPI_PEA_1_P15.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPPI_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Amino acid mutations
Variant protein HSPPI_PEA_1_P15 is encoded by the following transcript(s): HSPPI_PEA_1_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPPI_PEA_1_T18 is shown in bold; this coding portion starts at position 311 and ends at position 931. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPPI_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSPPI_PEA_1_node_2 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T13 and HSPPI_PEA_1_T18. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_7 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T5. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_13 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T12, HSPPI_PEA_1_T13, HSPPI_PEA_1_T17 and HSPPI_PEA_1_T18. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSPPI_PEA_1_node_0 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T12, HSPPI_PEA_1_T13, HSPPI_PEA_1_T17 and HSPPI_PEA_1_T18. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_1 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T13 and HSPPI_PEA_1_T18. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_3 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T12, HSPPI_PEA_1_T13, HSPPI_PEA_1_T17 and HSPPI_PEA_1_T18. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_4 according to the present invention can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T12, HSPPI_PEA_1_T13, HSPPI_PEA_1_T17 and HSPPI_PEA_1_T18. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_5 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T12, HSPPI_PEA_1_T13, HSPPI_PEA_1_T17 and HSPPI_PEA_1_T18. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_6 according to the present invention can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T12, HSPPI_PEA_1_T13, HSPPI_PEA_1_T17 and HSPPI_PEA_1_T18. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_8 according to the present invention can be found in the following transcript(s): HSPPI_PEA_1_T5. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_9 according to the present invention can be found in the following transcript(s): HSPPI_PEA_1_T5. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_10 according to the present invention can be found in the following transcript(s): HSPPI_PEA_1_T3 and HSPPI_PEA_1_T5. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_11 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6 and HSPPI_PEA_1_T18. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HSPPI_PEA_1_node_12 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPPI_PEA_1_T3, HSPPI_PEA_1_T5, HSPPI_PEA_1_T6, HSPPI_PEA_1_T12 and HSPPI_PEA_1_T18. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: INS_HUMAN
Sequence documentation:
Alignment of: HSPPI_PEA_1_P6 x INS_HUMAN
Alignment segment 1/1:
Quality: 875.00
Escore: 0
Matching length: 98
Total length: 110
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 89.09
Total Percent Identity: 89.09
Gaps:
Alignment:
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50

 51 TPKTRREAEDLQ............GSLQPLALEGSLQKRGIVEQCCTSIC 88
 51 TPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSIC 100

 89 SLYQLENYCN 98
101 SLYQLENYCN 110
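The percentage figures in these alignment reports appear to follow from simple ratios: a "matching" percentage divides by the matching length, and a "total" percentage divides by the total length. That reading is an assumption inferred from the reported numbers, not something the report states. The minimal Python sketch below, with the hypothetical helper name `percent`, reproduces the figures for the HSPPI_PEA_1_P6 x INS_HUMAN alignment (98 identical positions, matching length 98, total length 110).

```python
def percent(count: int, length: int) -> float:
    """Return count/length as a percentage rounded to two decimals."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * count / length, 2)


# Values taken from the HSPPI_PEA_1_P6 x INS_HUMAN report.
identical = 98
matching_length = 98
total_length = 110

print("Matching Percent Identity:", percent(identical, matching_length))
print("Total Percent Identity:", percent(identical, total_length))
```

Under the same assumption, 63 identical positions over a length of 64 gives 98.44, and 63 over 65 gives 96.92, which are the identity figures reported for the other variants aligned against INS_HUMAN.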
Sequence name: INS_HUMAN
Sequence documentation:
Alignment of: HSPPI_PEA_1_P9 x INS_HUMAN
Alignment segment 1/1:
Quality: 615.00
Escore: 0
Matching length: 62
Total length: 62
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps:
Alignment:
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50

 51 TPKTRREAEDLQ 62
 51 TPKTRREAEDLQ 62
Sequence name: INS_HUMAN
Sequence documentation:
Alignment of: HSPPI_PEA_1_P10 x INS_HUMAN
Alignment segment 1/1:
Quality: 616.00
Escore: 0
Matching length: 64
Total length: 64
Matching Percent Similarity: 98.44
Matching Percent Identity: 98.44
Total Percent Similarity: 98.44
Total Percent Identity: 98.44
Gaps: 0
Alignment:
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50

 51 TPKTRREAEDLQGG 64
 51 TPKTRREAEDLQVG 64
Sequence name: INS_HUMAN
Sequence documentation:
Alignment of: HSPPI_PEA_1_P12 x INS_HUMAN
Alignment segment 1/1:
Quality: 624.00
Escore: 0
Matching length: 65
Total length: 65
Matching Percent Similarity: 98.46
Matching Percent Identity: 96.92
Total Percent Similarity: 98.46
Total Percent Identity: 96.92
Gaps:
Alignment:
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50

 51 TPKTRREAEDLQAGE 65
 51 TPKTRREAEDLQVGQ 65
Sequence name: INS_HUMAN
Sequence documentation:
Alignment of: HSPPI_PEA_1_P14 x INS_HUMAN
Alignment segment 1/1:
Quality: 624.00
Escore: 0
Matching length: 65
Total length: 65
Matching Percent Similarity: 98.46
Matching Percent Identity: 96.92
Total Percent Similarity: 98.46
Total Percent Identity: 96.92
Gaps:
Alignment:
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50

 51 TPKTRREAEDLQAGE 65
 51 TPKTRREAEDLQVGQ 65
Sequence name: INS_HUMAN
Sequence documentation:
Alignment of: HSPPI_PEA_1_P15 x INS_HUMAN
Alignment segment 1/1:
Quality: 616.00
Escore: 0
Matching length: 64
Total length: 64
Matching Percent Similarity: 98.44
Matching Percent Identity: 98.44
Total Percent Similarity: 98.44
Total Percent Identity: 98.44
Gaps:
Alignment:
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50
  1 MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFY 50

 51 TPKTRREAEDLQGG 64
 51 TPKTRREAEDLQVG 64
DESCRIPTION FOR CLUSTER D11581
Cluster D11581 features 4 transcript(s) and 31 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
D11581_PEA_1_node_4 499
D11581_PEA_1_node_8 500
D11581_PEA_1_node_16 501
D11581_PEA_1_node_23 502
D11581_PEA_1_node_27 503
D11581_PEA_1_node_55 504
D11581_PEA_1_node_6 505
D11581_PEA_1_node_10 506
D11581_PEA_1_node_11 507
D11581_PEA_1_node_12 508
Table 3 - Proteins of interest
These sequences are variants of the known protein Alpha-fetoprotein precursor (SwissProt accession identifier FETA_HUMAN; known also according to the synonyms Alpha-fetoglobulin; Alpha-1-fetoprotein), SEQ ID NO: 530, referred to herein as the previously known protein. Alpha fetoprotein (AFP) is expressed in fetal tissues including the liver and is normally absent in adult tissues. It is prone to be expressed in neoplastic adult tissues corresponding to its site of fetal production. As such, alpha fetoprotein is produced by neoplasms of the liver, gonads (testes and ovaries), and some carcinomas of the bladder. Approximately 70% to 90% of hepatomas have been reported to be AFP positive. Between 5% and 50% of cells within AFP positive hepatomas will stain with antibodies directed against AFP. In the context of liver diseases, primary cholangiocarcinoma and most metastatic adenocarcinomas are AFP negative. In addition to malignancies, serum AFP is used for assessing the risk of fetal abnormalities (e.g. Down's syndrome) in the second trimester of pregnancy. An example of a preferred test is: Alpha Feto Protein is used in pregnancy for abnormalities screening and as a cancer marker. TEST: AFP (Cancer) "Pregnancy, testicular cancer and hepatocellular cancer". Protein Alpha-fetoprotein precursor is known or believed to have the following function(s): binds copper, nickel, and fatty acids as well as, and bilirubin less well than, serum albumin. Only a small percentage (less than 2%) of the human AFP shows estrogen-binding properties. The sequence for protein Alpha-fetoprotein precursor is given at the end of the application, as "Alpha-fetoprotein precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Alpha-fetoprotein precursor localization is believed to be Secreted.
A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Immunoconjugate; Imaging agent. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: transport; immune response, which are annotation(s) related to Biological Process; carrier, nickel binding, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster D11581 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
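The "parts per million" figure described above is, at its simplest, a ratio scaled by one million. The Python sketch below illustrates only that ratio; the function name `parts_per_million` and the EST counts are hypothetical, and the weighting scheme used in the application is described elsewhere and is not reproduced here.

```python
def parts_per_million(cluster_ests: int, category_ests: int) -> float:
    """Express the share of ESTs belonging to one cluster, within one
    tissue category, as parts per million (unweighted illustration)."""
    if category_ests <= 0:
        raise ValueError("category_ests must be positive")
    return 1e6 * cluster_ests / category_ests


# Hypothetical counts, for illustration only: 5 cluster ESTs
# out of 250,000 ESTs sequenced from the category.
print(parts_per_million(5, 250_000))
```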
Overall, the following results were obtained as shown with regard to the histograms in Figure 6 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: epithelial malignant tumors and a mixture of malignant tumors from different tissues.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
colon 6.5e-01 5.0e-01 1.1 7.7e-01 1.4
epithelial 5.4e-01 4.4e-01 1.4e-01 1.8 2.3e-18 4.4
general 3.4e-01 2.1e-01 3.7e-04 3.1 2.0e-33 6.3
liver 1.2e-01 6.2e-01 1.3e-02 4.7 3.8e-17 2.2
lung 5.0e-01 4.0e-01 4.1e-01 2.4 6.2e-01 1.7
lymph nodes 5.7e-01 1 1.0 5.8e-01 2.5
breast 8.2e-01 8.7e-01 6.9e-01 1.2 8.2e-01 1.0
prostate 1 7.8e-01 1 1.0 9.9e-03 1.7
As noted above, cluster D11581 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Alpha-fetoprotein precursor. A description of each variant protein according to the present invention is now provided.
Variant protein D11581_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11581_PEA_1_T5. An alignment is given to the known protein (Alpha-fetoprotein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between D11581_PEA_1_P6 and FETA_HUMAN:
1. An isolated chimeric polypeptide encoding for D11581_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIFFAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQ corresponding to amino acids 1 - 90 of FETA_HUMAN, which also corresponds to amino acids 1 - 90 of D11581_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to YGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEAYEEDRETFMNKFIYEIARRHPFLYAPTILLWAARYDKIIPSCCKAENAVECFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITVTKLSQKFTKVNFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKITECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIFLASFVHEYSRRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEELQKYIQESQALAKRSCGLFQKLGEYYLQNAFLVAYTKKAPQLTSSELMAITRKMAATAATCCQLSEDKLLACGEGAADIIIGHLCIRHEMTPVNPGVGQCCTSSYANRRPCFSSLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQEFLINLVKQKPQITEEQLEAVIADFSGLLEKCCQGQEQEVCFAEEGQKLISKTRAALGV corresponding to amino acids 108 - 609 of FETA_HUMAN, which also corresponds to amino acids 91 - 592 of D11581_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of D11581_PEA_1_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise QY, having a structure as follows: a sequence starting from any of amino acid numbers 90-x to 90; and ending at any of amino acid numbers 91+ ((n-2) - x), in which x varies from 0 to n-2.
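The edge-portion definition above can be read as a family of windows of length n that all span the junction between residue 90 (Q) and residue 91 (Y). The Python sketch below enumerates those windows under one reading of the wording, namely that each window starts at 90 - x and ends at 91 + ((n - 2) - x) for x from 0 to n - 2. The function name `edge_windows` is hypothetical, and this is an illustration of the arithmetic rather than a statement of claim scope.

```python
def edge_windows(junction: int, n: int):
    """List 1-based, inclusive (start, end) windows of length n that
    contain both residue `junction` and residue `junction + 1`.

    start = junction - x, end = (junction + 1) + ((n - 2) - x),
    for x = 0 .. n - 2.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    return [(junction - x, junction + 1 + (n - 2) - x) for x in range(n - 1)]


# Junction between residues 90 and 91, shortest length named in the text.
windows = edge_windows(90, 10)
print(windows[0], windows[-1], len(windows))
```

Every window has length n because the end minus the start plus one reduces to n regardless of x, so shifting x only slides the window across the junction.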
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein D11581_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11581_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
The glycosylation sites of variant protein D11581_PEA_1_P6, as compared to the known protein Alpha-fetoprotein precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 8 - Glycosylation site(s)
Variant protein D11581_PEA_1_P6 is encoded by the following transcript(s): D11581_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11581_PEA_1_T5 is shown in bold; this coding portion starts at position 105 and ends at position 1880. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11581_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein D11581_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11581_PEA_1_T12. An alignment is given to the known protein (Alpha-fetoprotein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between D11581_PEA_1_P10 and FETA_HUMAN:
1. An isolated chimeric polypeptide encoding for D11581_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIFFAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCHEKEILEKYGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEAYEEDRETFMNKFIYEIARRHPFLYAPTILLWAARYDKIIPSCCKAENAVECFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITVTKLSQKFTKVNFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKITECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIFLASFVHEYSRRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEELQKYIQESQALAKRSCGLFQKLGEYYLQNAFLVAYTKKAPQLTSSELMAITRKMAATAATCCQLSEDKLLACGEGAADIIIGHLCIRHEMTPVNPGVGQCCTSSYANRRPCFSSLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQE corresponding to amino acids 1 - 551 of FETA_HUMAN, which also corresponds to amino acids 1 - 551 of D11581_PEA_1_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein D11581_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11581_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
The glycosylation sites of variant protein D11581_PEA_1_P10, as compared to the known protein Alpha-fetoprotein precursor, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 11 - Glycosylation site(s)
Variant protein D11581_PEA_1_P10 is encoded by the following transcript(s): D11581_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11581_PEA_1_T12 is shown in bold; this coding portion starts at position 105 and ends at position 1757. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11581_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein D11581_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11581_PEA_1_T14. An alignment is given to the known protein (Alpha-fetoprotein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between D11581_PEA_1_P12 and FETA_HUMAN:
1. An isolated chimeric polypeptide encoding for D11581_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIFFAQFVQEATYK EVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCHEKEILEKYGHSDCCSQSEEG RHNCFLAHKKPTPASIPLFQVPEPVTSCEAYEEDRETFMNKFIYEIARRHPFLYAPTILLW AARYDKIIPSCCKAENAVECFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITV TKLSQKFTKVNFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKI TECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIFLA corresponding to amino acids 1 - 352 of FETA_HUMAN, which also corresponds to amino acids 1 - 352 of D11581_PEA_1_P12, and a second amino acid sequence being at least 90 % homologous to SLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQEFLINLVKQKPQITEEQLEAVI ADFSGLLEKCCQGQEQEVCFAEEGQKLISKTRAALGV corresponding to amino acids 514 - 609 of FETA_HUMAN, which also corresponds to amino acids 353 - 448 of D11581_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of D11581_PEA_1_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AS, having a structure as follows: a sequence starting from any of amino acid numbers 352-x to 352; and ending at any of amino acid numbers 353 + ((n-2) - x), in which x varies from 0 to n-2.
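The edge-portion definition above can be read as enumerating every window of length n that contains both residues flanking the junction (A at position 352 and S at position 353 of D11581_PEA_1_P12). The following is a minimal sketch of that reading, assuming 1-based inclusive positions; the function name `edge_windows` and the optional length check are illustrative assumptions and are not part of the application.

```python
from typing import List, Optional, Tuple


def edge_windows(junction_left: int, n: int,
                 protein_length: Optional[int] = None) -> List[Tuple[int, int]]:
    """Enumerate (start, end) windows of length n that span a junction.

    junction_left is the 1-based position of the last residue before the
    junction (352 for D11581_PEA_1_P12); the next residue is junction_left + 1.
    For each x in 0..n-2 the window starts at junction_left - x and ends at
    (junction_left + 1) + ((n - 2) - x), as in the edge-portion definition.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    windows = []
    for x in range(n - 1):  # x = 0 .. n-2
        start = junction_left - x
        end = junction_left + 1 + (n - 2) - x
        # Skip windows that would fall outside the protein (illustrative check).
        if start < 1:
            continue
        if protein_length is not None and end > protein_length:
            continue
        windows.append((start, end))
    return windows


# Example: n = 10 around the 352/353 junction of the 448-residue variant.
windows = edge_windows(352, 10, protein_length=448)
for start, end in windows:
    print(start, end)
```

Under this reading, every window has length n and there are n-1 of them, one for each value of x.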
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein D11581_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11581_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
The glycosylation sites of variant protein D11581_PEA_1_P12, as compared to the known protein Alpha-fetoprotein precursor, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 14 - Glycosylation site(s)
Variant protein D11581_PEA_1_P12 is encoded by the following transcript(s): D11581_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11581_PEA_1_T14 is shown in bold; this coding portion starts at position 105 and ends at position 1448. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11581_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
Variant protein D11581_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) D11581_PEA_1_T4. An alignment is given to the known protein (Alpha-fetoprotein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between D11581_PEA_1_P16 and FETA_HUMAN:
1. An isolated chimeric polypeptide encoding for D11581_PEA_1_P16, comprising a first amino acid sequence being at least 90 % homologous to MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIFFAQFVQEATYK EVSKMVKDALTAIEKPTGDEQSSGCLENQ corresponding to amino acids 1 - 90 of FETA_HUMAN, which also corresponds to amino acids 1 - 90 of D11581_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NFAMRKKFWRSTDIQTAAAKVKREDITVFLHTKSPLQHRSHFSKFQNLSQAVKHMKKT GRHS corresponding to amino acids 91 - 152 of D11581_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of D11581_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NFAMRKKFWRSTDIQTAAAKVKREDITVFLHTKSPLQHRSHFSKTQNLSQAVKHMKKT GRHS in D11581_PEA_1_P16.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein D11581_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11581_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Amino acid mutations
The glycosylation sites of variant protein D11581_PEA_1_P16, as compared to the known protein Alpha-fetoprotein precursor, are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 17 - Glycosylation site(s)
Variant protein D11581_PEA_1_P16 is encoded by the following transcript(s): D11581_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript D11581_PEA_1_T4 is shown in bold; this coding portion starts at position 105 and ends at position 560. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein D11581_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
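The coding-portion coordinates quoted for the four D11581 transcripts can be cross-checked against the lengths of the variant proteins they encode. The sketch below assumes 1-based inclusive nucleotide positions and takes the protein lengths from the comparison reports and alignments (592, 551, 448 and 152 residues for P6, P10, P12 and P16, respectively). The helper name `codon_count` is hypothetical. That the quoted range excludes the stop codon is an inference from the arithmetic, not something the text states.

```python
# (start, end, protein length) for each transcript, as quoted in the text.
CODING_PORTIONS = {
    "D11581_PEA_1_T5": (105, 1880, 592),   # encodes D11581_PEA_1_P6
    "D11581_PEA_1_T12": (105, 1757, 551),  # encodes D11581_PEA_1_P10
    "D11581_PEA_1_T14": (105, 1448, 448),  # encodes D11581_PEA_1_P12
    "D11581_PEA_1_T4": (105, 560, 152),    # encodes D11581_PEA_1_P16
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in a 1-based inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a multiple of 3 nt")
    return length // 3


for name, (start, end, protein_len) in CODING_PORTIONS.items():
    codons = codon_count(start, end)
    status = "consistent" if codons == protein_len else "MISMATCH"
    print(f"{name}: {end - start + 1} nt, {codons} codons, "
          f"{protein_len} residues -> {status}")
```

In each case the number of codons equals the protein length, which is consistent with the quoted coordinates.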
above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster D11581_PEA_1_node_4 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_8 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_16 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_23 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_27 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T12. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_55 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T14. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
d to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster D11581_PEA_1_node_6 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_10 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_11 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_12 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_13 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_14 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_18 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_20 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_21 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_29 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T12. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_30 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T12. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_33 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T12. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_34 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T12. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_36 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T12. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_38 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T12. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_39 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T12. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_40 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_41 according to the present invention is supported by 54 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_42 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_43 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_44 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5, D11581_PEA_1_T12 and D11581_PEA_1_T14. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_45 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T12. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_49 according to the present invention can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T14. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_50 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T14. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster D11581_PEA_1_node_52 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): D11581_PEA_1_T4, D11581_PEA_1_T5 and D11581_PEA_1_T14. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: FETA_HUMAN
Sequence documentation:
Alignment of: D11581_PEA_1_P6 x FETA_HUMAN
Alignment segment 1/1:
Quality: 5750.00 Escore: 0 Matching length: 592 Total length: 609 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 97.21 Total Percent Identity: 97.21 Gaps : 1
Alignment:
  1 MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIF 50
 51 FAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQ.......... 90
    ||||||||||||||||||||||||||||||||||||||||
 51 FAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCH 100
 91 .......YGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEA 133
           |||||||||||||||||||||||||||||||||||||||||||
101 EKEILEKYGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEA 150
134 YEEDRETFMNKFIYEIARRHPFLYAPTILLWAARYDKIIPSCCKAENAVE 183
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YEEDRETFMNKFIYEIARRHPFLYAPTILLWAARYDKIIPSCCKAENAVE 200
184 CFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITVTKLSQKFTKV 233
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITVTKLSQKFTKV 250
234 NFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKIT 283
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 NFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKIT 300
284 ECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIF 333
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 ECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIF 350
334 LASFVHEYSRRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEE 383
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 LASFVHEYSRRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEE 400
384 LQKYIQESQALAKRSCGLFQKLGEYYLQNAFLVAYTKKAPQLTSSELMAI 433
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 LQKYIQESQALAKRSCGLFQKLGEYYLQNAFLVAYTKKAPQLTSSELMAI 450
434 TRKMAATAATCCQLSEDKLLACGEGAADIIIGHLCIRHEMTPVNPGVGQC 483
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 TRKMAATAATCCQLSEDKLLACGEGAADIIIGHLCIRHEMTPVNPGVGQC 500
484 CTSSYANRRPCFSSLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQ 533
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 CTSSYANRRPCFSSLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQ 550
534 EFLINLVKQKPQITEEQLEAVIADFSGLLEKCCQGQEQEVCFAEEGQKLI 583
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 EFLINLVKQKPQITEEQLEAVIADFSGLLEKCCQGQEQEVCFAEEGQKLI 600
584 SKTRAALGV 592
    |||||||||
601 SKTRAALGV 609
Sequence name: FETA_HUMAN
Sequence documentation:
Alignment of: D11581_PEA_1_P10 x FETA_HUMAN
Alignment segment 1/1:
Quality: 5461.00 Escore: 0 Matching length: 551 Total length: 551 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment:
  1 MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIF 50
 51 FAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCH 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCH 100
101 EKEILEKYGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EKEILEKYGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEA 150
151 YEEDRETFMNKFIYEIARRHPFLYAPTILLWAARYDKIIPSCCKAENAVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YEEDRETFMNKFIYEIARRHPFLYAPTILLWAARYDKIIPSCCKAENAVE 200
201 CFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITVTKLSQKFTKV 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITVTKLSQKFTKV 250
251 NFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKIT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 NFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKIT 300
301 ECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIF 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 ECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIF 350
351 LASFVHEYSRRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEE 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 LASFVHEYSRRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEE 400
401 LQKYIQESQALAKRSCGLFQKLGEYYLQNAFLVAYTKKAPQLTSSELMAI 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 LQKYIQESQALAKRSCGLFQKLGEYYLQNAFLVAYTKKAPQLTSSELMAI 450
451 TRKMAATAATCCQLSEDKLLACGEGAADIIIGHLCIRHEMTPVNPGVGQC 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 TRKMAATAATCCQLSEDKLLACGEGAADIIIGHLCIRHEMTPVNPGVGQC 500
501 CTSSYANRRPCFSSLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQ 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 CTSSYANRRPCFSSLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQ 550
551 E 551
    |
551 E 551
Sequence name: FETA_HUMAN
Sequence documentation:
Alignment of: D11581_PEA_1_P12 x FETA_HUMAN
Alignment segment 1/1:
Quality: 4325.00 Escore: 0
Matching length: 448 Total length: 609 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 73.56 Total Percent Identity: 73.56 Gaps : 1
Alignment:
  1 MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIF 50
 51 FAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCH 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQLPAFLEELCH 100
101 EKEILEKYGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEA 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 EKEILEKYGHSDCCSQSEEGRHNCFLAHKKPTPASIPLFQVPEPVTSCEA 150
151 YEEDRETFMNKFIYEIARRHPFLYAPTILLWAARYDKIIPSCCKAENAVE 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 YEEDRETFMNKFIYEIARRHPFLYAPTILLWAARYDKIIPSCCKAENAVE 200
201 CFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITVTKLSQKFTKV 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CFQTKAATVTKELRESSLLNQHACAVMKNFGTRTFQAITVTKLSQKFTKV 250
251 NFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKIT 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 NFTEIQKLVLDVAHVHEHCCRGDVLDCLQDGEKIMSYICSQQDTLSNKIT 300
301 ECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIF 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 ECCKLTTLERGQCIIHAENDEKPEGLSPNLNRFLGDRDFNQFSSGEKNIF 350
351 LA................................................ 352
    ||
351 LASFVHEYSRRHPQLAVSVILRVAKGYQELLEKCFQTENPLECQDKGEEE 400
352 .................................................. 352

401 LQKYIQESQALAKRSCGLFQKLGEYYLQNAFLVAYTKKAPQLTSSELMAI 450
352 .................................................. 352

451 TRKMAATAATCCQLSEDKLLACGEGAADIIIGHLCIRHEMTPVNPGVGQC 500
353 .............SLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQ 389
                 |||||||||||||||||||||||||||||||||||||
501 CTSSYANRRPCFSSLVVDETYVPPAFSDDKFIFHKDLCQAQGVALQTMKQ 550
390 EFLINLVKQKPQITEEQLEAVIADFSGLLEKCCQGQEQEVCFAEEGQKLI 439
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 EFLINLVKQKPQITEEQLEAVIADFSGLLEKCCQGQEQEVCFAEEGQKLI 600
440 SKTRAALGV 448
    |||||||||
601 SKTRAALGV 609
Sequence name : FETA_HUMAN
Sequence documentation:
Alignment of: D11581_PEA_1_P16 x FETA_HUMAN
Alignment segment 1/1:
Quality: 870.00 Escore: 0 Matching length: 90 Total length: 90 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps :
Alignment:
  1 MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIF 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MKWVESIFLIFLLNFTESRTLHRNEYGIASILDSYQCTAEISLADLATIF 50
 51 FAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQ 90
    ||||||||||||||||||||||||||||||||||||||||
 51 FAQFVQEATYKEVSKMVKDALTAIEKPTGDEQSSGCLENQ 90
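The "Total Percent Identity" values reported for the four alignments above are consistent with dividing the matching length by the total length. The alignment reports do not state a formula, so this relationship is an inference from the quoted numbers; the helper name `total_percent` is hypothetical. A minimal sketch of the check:

```python
def total_percent(matching_length: int, total_length: int) -> float:
    """Matching length as a percentage of total length, rounded to 2 places."""
    return round(100.0 * matching_length / total_length, 2)


# (matching length, total length, reported total percent identity)
REPORTED = {
    "D11581_PEA_1_P6": (592, 609, 97.21),
    "D11581_PEA_1_P10": (551, 551, 100.00),
    "D11581_PEA_1_P12": (448, 609, 73.56),
    "D11581_PEA_1_P16": (90, 90, 100.00),
}

for name, (matching, total, reported) in REPORTED.items():
    computed = total_percent(matching, total)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

The same ratio reproduces the "Total Percent Similarity" figures, because every aligned position in these four reports is an exact match.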
DESCRIPTION FOR CLUSTER HSPRO204
Cluster HSPRO204 features 9 transcript(s) and 31 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively, the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript name        SEQ ID NO
HSPRO204_PEA_1_T3      535
HSPRO204_PEA_1_T4      536
HSPRO204_PEA_1_T5      537
HSPRO204_PEA_1_T6      538
HSPRO204_PEA_1_T11     539
HSPRO204_PEA_1_T12     540
HSPRO204_PEA_1_T17     541
HSPRO204_PEA_1_T18     542
HSPRO204_PEA_1_T22     543
Table 2 - Segments of interest
Table 3 - Proteins of interest
Protein name           SEQ ID NO    Corresponding transcript(s)
HSPRO204_PEA_1_P3      576          HSPRO204_PEA_1_T3
HSPRO204_PEA_1_P4      577          HSPRO204_PEA_1_T4
HSPRO204_PEA_1_P5      578          HSPRO204_PEA_1_T5
HSPRO204_PEA_1_P6      579          HSPRO204_PEA_1_T6
HSPRO204_PEA_1_P11     580          HSPRO204_PEA_1_T11
HSPRO204_PEA_1_P12     581          HSPRO204_PEA_1_T12
HSPRO204_PEA_1_P16     582          HSPRO204_PEA_1_T17
HSPRO204_PEA_1_P21     583          HSPRO204_PEA_1_T18
These sequences are variants of the known protein Prolactin precursor (SwissProt accession identifier PRL_HUMAN; known also according to the synonyms PRL), SEQ ID NO: 575, referred to herein as the previously known protein. Protein Prolactin precursor is known or believed to have the following function(s): prolactin acts primarily on the mammary gland by promoting lactation. Prolactin-secreting adenomas (prolactinomas) are the most prevalent form of pituitary tumors in humans. Prolactin staining by immunohistochemistry is a valuable tool in the diagnosis and differential diagnosis of pituitary tumors. Serum prolactin is used for the diagnosis of pituitary dysfunction (as a result of adenoma, for example) and is also used to evaluate amenorrhea, galactorrhea and infertility. The variants described herein may optionally be used for such diagnostic assays. The sequence for protein Prolactin precursor is given at the end of the application, as "Prolactin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Prolactin precursor localization is believed to be Secreted.
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer; Immunodeficiency; Vaccine adjunct. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Natural killer cell stimulant; T cell stimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer; Immunostimulant. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell surface receptor linked signal transduction; hemocyte development; pregnancy; lactation; cell proliferation, which are annotation(s) related to Biological Process; prolactin receptor ligand; hormone, which are annotation(s) related to Molecular Function; and extracellular space; soluble fraction, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSPRO204 features 9 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Prolactin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSPRO204_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPRO204_PEA_1_T3. An alignment is given to the known protein (Prolactin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPRO204_PEA_1_P3 and PRL_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P3, comprising a first amino acid sequence being at least 90% homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMN corresponding to amino acids 1 - 104 of PRL_HUMAN, which also corresponds to amino acids 1 - 104 of HSPRO204_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KTF corresponding to amino acids 105 - 107 of HSPRO204_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KTF in HSPRO204_PEA_1_P3.
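The head-plus-tail structure described in the comparison report can be illustrated with a minimal Python sketch. It assumes that the first amino acid sequence, once extraction artifacts are repaired, has exactly the stated 104 residues; the function names `assemble_chimera` and `tail_span` are hypothetical helpers introduced only for this illustration and are not part of the application.

```python
# Residues 1-104 shared by PRL_HUMAN and the variant (assumed repaired reading
# of the sequence quoted in the comparison report).
PRL_HEAD_1_104 = (
    "MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHN"
    "LSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMN"
)
P3_TAIL = "KTF"  # unique tail of HSPRO204_PEA_1_P3 as stated in the report


def assemble_chimera(head: str, tail: str) -> str:
    """Join the two sequences contiguously and in sequential order."""
    return head + tail


def tail_span(variant: str, tail: str) -> tuple[int, int]:
    """Return the 1-based inclusive positions of the tail at the C-terminus."""
    if not variant.endswith(tail):
        raise ValueError("tail is not at the C-terminus of the variant")
    start = len(variant) - len(tail) + 1
    return start, len(variant)


p3 = assemble_chimera(PRL_HEAD_1_104, P3_TAIL)
print(len(p3), tail_span(p3, P3_TAIL))
```

Under these assumptions the assembled sequence has 107 residues and the tail occupies positions 105 to 107, which agrees with the positions stated in the report.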
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPRO204_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
The glycosylation sites of variant protein HSPRO204_PEA_1_P3, as compared to the known protein Prolactin precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 6 - Glycosylation site(s)
Variant protein HSPRO204_PEA_1_P3 is encoded by the following transcript(s): HSPRO204_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPRO204_PEA_1_T3 is shown in bold; this coding portion starts at position 192 and ends at position 512. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HSPRO204_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPRO204_PEA_1_T4. An alignment is given to the known protein (Prolactin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPRO204_PEA_1_P4 and PRL_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P4, comprising a first amino acid sequence being at least 90% homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQTKRLLEGMELIVSQ corresponding to amino acids 1 - 164 of PRL_HUMAN, which also corresponds to amino acids 1 - 164 of HSPRO204_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LERTRTYKY corresponding to amino acids 165 - 173 of HSPRO204_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LERTRTYKY in HSPRO204_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPRO204_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
The glycosylation sites of variant protein HSPRO204_PEA_1_P4, as compared to the known protein Prolactin precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Glycosylation site(s)
Variant protein HSPRO204_PEA_1_P4 is encoded by the following transcript(s): HSPRO204_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPRO204_PEA_1_T4 is shown in bold; this coding portion starts at position 192 and ends at position 710. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HSPRO204_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPRO204_PEA_1_T5. An alignment is given to the known protein (Prolactin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPRO204_PEA_1_P5 and PRL_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQTKRLLEGMELIVSQ corresponding to amino acids 1 - 164 of PRL_HUMAN, which also corresponds to amino acids 1 - 164 of HSPRO204_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SLFFCVMRFILKPKKMRSTLSGRDFHPCRWLMKSLAFLLIITCSTAYAGIHIKSTIISSS corresponding to amino acids 165 - 224 of HSPRO204_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SLFFCVMRFILKPKKMRSTLSGRDFHPCRWLMKSLAFLLIITCSTAYAGIHIKSTIISSS in HSPRO204_PEA_1_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSPRO204_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
The glycosylation sites of variant protein HSPRO204_PEA_1_P5, as compared to the known protein Prolactin precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 12 - Glycosylation site(s)
Variant protein HSPRO204_PEA_1_P5 is encoded by the following transcript(s): HSPRO204_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPRO204_PEA_1_T5 is shown in bold; this coding portion starts at position 192 and ends at position 863. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Variant protein HSPRO204_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPRO204_PEA_1_T6. An alignment is given to the known protein (Prolactin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPRO204_PEA_1_P6 and PRL_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P6, comprising a first amino acid sequence being at least 90% homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQTKRLLEGMELIVSQV corresponding to amino acids 1 - 165 of PRL_HUMAN, which also corresponds to amino acids 1 - 165 of HSPRO204_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SSLLVLLCFSH corresponding to amino acids 166 - 176 of HSPRO204_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SSLLVLLCFSH in HSPRO204_PEA_1_P6.
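The percentage thresholds recited for the tail sequences (70%, 80%, 85%, 90%, 95%) can be made concrete with a small sketch. The application defines homology elsewhere, and that definition governs; the code below substitutes a much simpler measure, ungapped percent identity between two equal-length sequences, purely as an illustration. The function `percent_identity` and the candidate peptide are hypothetical.

```python
THRESHOLDS = (70, 80, 85, 90, 95)  # percentages recited for the tail sequences


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences, in percent.

    This is a simplified stand-in for the homology measure used in the
    application, not a reimplementation of it.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def thresholds_met(a: str, b: str) -> list[int]:
    """Return every recited threshold that the pair satisfies."""
    identity = percent_identity(a, b)
    return [t for t in THRESHOLDS if identity >= t]


P6_TAIL = "SSLLVLLCFSH"    # tail of HSPRO204_PEA_1_P6 (amino acids 166 - 176)
CANDIDATE = "SSLLVLLCFSA"  # hypothetical peptide differing at the last residue

print(round(percent_identity(P6_TAIL, CANDIDATE), 1), thresholds_met(P6_TAIL, CANDIDATE))
```

With an 11-residue tail, a single substitution leaves 10 of 11 positions identical, so short tails move between threshold bands in coarse steps.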
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPRO204_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
The glycosylation sites of variant protein HSPRO204_PEA_1_P6, as compared to the known protein Prolactin precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 15 - Glycosylation site(s)
Variant protein HSPRO204_PEA_1_P6 is encoded by the following transcript(s): HSPRO204_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPRO204_PEA_1_T6 is shown in bold; this coding portion starts at position 192 and ends at position 719. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Variant protein HSPRO204_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPRO204_PEA_1_T11. An alignment is given to the known protein (Prolactin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPRO204_PEA_1_P11 and PRL_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMN corresponding to amino acids 1 - 104 of PRL_HUMAN, which also corresponds to amino acids 1 - 104 of HSPRO204_PEA_1_P11, and a second amino acid sequence being at least 90% homologous to VHPETKENEIYPVWSGLPSLQMADEESRLSAYYNLLHCLRRDSHKXDNYLKLLKCRIIHNNNC corresponding to amino acids 165 - 227 of PRL_HUMAN, which also corresponds to amino acids 105 - 167 of HSPRO204_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSPRO204_PEA_1_P11, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise NV, having a structure as follows: a sequence starting from any of amino acid numbers 104-x to 104; and ending at any of amino acid numbers 105 + ((n-2) - x), in which x varies from 0 to n-2.
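The edge-portion definition above amounts to enumerating every window of length n that spans the junction between amino acids 104 and 105 of the variant. The following Python sketch shows that enumeration under stated assumptions: positions are 1-based and inclusive, the first segment is read with extraction artifacts repaired so that it has 104 residues, and the second segment is used exactly as printed (including the residue shown as X). The function `edge_windows` is a hypothetical helper, not part of the application.

```python
SEGMENT_1 = (  # amino acids 1 - 104 of PRL_HUMAN (assumed repaired reading)
    "MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHN"
    "LSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMN"
)
SEGMENT_2 = (  # amino acids 165 - 227 of PRL_HUMAN, as printed in the report
    "VHPETKENEIYPVWSGLPSLQMADEESRLSAYYNLLHCLRRDSHKXDNYLKLLKCRIIHNNNC"
)
P11 = SEGMENT_1 + SEGMENT_2  # the two segments joined contiguously
JUNCTION = len(SEGMENT_1)    # 104: last residue before the new edge


def edge_windows(seq: str, junction: int, n: int) -> list[tuple[int, int, str]]:
    """List (start, end, peptide) for each length-n window spanning the edge.

    start = junction - x and end = (junction + 1) + ((n - 2) - x),
    with x running from 0 to n - 2, as in the definition in the text.
    """
    windows = []
    for x in range(n - 1):
        start = junction - x
        end = junction + 1 + ((n - 2) - x)
        if start >= 1 and end <= len(seq):
            windows.append((start, end, seq[start - 1:end]))
    return windows


for start, end, peptide in edge_windows(P11, JUNCTION, 10):
    print(start, end, peptide)
```

For n = 10 this gives nine windows, from positions 104 - 113 down to 96 - 105, each containing the junction dipeptide NV.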
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPRO204_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
The glycosylation sites of variant protein HSPRO204_PEA_1_P11, as compared to the known protein Prolactin precursor, are described in Table 18 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 18 - Glycosylation site(s)
Variant protein HSPRO204_PEA_1_P11 is encoded by the following transcript(s): HSPRO204_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPRO204_PEA_1_T11 is shown in bold; this coding portion starts at position 192 and ends at position 692. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Nucleic acid SNPs
Variant protein HSPRO204_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPRO204_PEA_1_T12. An alignment is given to the known protein (Prolactin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPRO204_PEA_1_P12 and PRL_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRAVVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQAQQMN corresponding to amino acids 1 - 104 of PRL_HUMAN, which also corresponds to amino acids 1 - 104 of HSPRO204_PEA_1_P12, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AKRTDCSASSMGQAVV corresponding to amino acids 105 - 120 of HSPRO204_PEA_1_P12, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P12, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AKRTDCSASSMGQAVV in HSPRO204_PEA_1_P12.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPRO204_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 20 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Amino acid mutations
The glycosylation sites of variant protein HSPRO204_PEA_1_P12, as compared to the known protein Prolactin precursor, are described in Table 21 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 21 - Glycosylation site(s)
Variant protein HSPRO204_PEA_1_P12 is encoded by the following transcript(s): HSPRO204_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPRO204_PEA_1_T12 is shown in bold; this coding portion starts at position 192 and ends at position 551. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Nucleic acid SNPs
Variant protein HSPRO204_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPRO204_PEA_1_T17. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSPRO204_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Amino acid mutations
Variant protein HSPRO204_PEA_1_P16 is encoded by the following transcript(s): HSPRO204_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPRO204_PEA_1_T17 is shown in bold; this coding portion starts at position 168 and ends at position 671. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
Variant protein HSPRO204_PEA_1_P21 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSPRO204_PEA_1_T18. An alignment is given to the known protein (Prolactin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSPRO204_PEA_1_P21 and PRL_HUMAN:
1. An isolated chimeric polypeptide encoding for HSPRO204_PEA_1_P21, comprising a first amino acid sequence being at least 90% homologous to MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQ corresponding to amino acids 1 - 40 of PRL_HUMAN, which also corresponds to amino acids 1 - 40 of HSPRO204_PEA_1_P21, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPHFFPCHPRRQGASPTDESKRLSEPDSQHIAILE corresponding to amino acids 41 - 75 of HSPRO204_PEA_1_P21, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSPRO204_PEA_1_P21, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPHFFPCHPRRQGASPTDESKRLSEPDSQHIAILE in HSPRO204_PEA_1_P21.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSPRO204_PEA_1_P21 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Amino acid mutations
The glycosylation sites of variant protein HSPRO204_PEA_1_P21, as compared to the known protein Prolactin precursor, are described in Table 26 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 26 - Glycosylation site(s)
Variant protein HSPRO204_PEA_1_P21 is encoded by the following transcript(s): HSPRO204_PEA_1_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSPRO204_PEA_1_T18 is shown in bold; this coding portion starts at position 192 and ends at position 416. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSPRO204_PEA_1_P21 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Nucleic acid SNPs
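The coding-portion coordinates stated for each transcript of this cluster can be cross-checked against the protein lengths stated in the comparison reports. The sketch below assumes that positions are 1-based and inclusive and that the stated coding portion does not include the stop codon; both are assumptions that happen to be consistent with the figures in the text rather than statements from the application. No length is stated for HSPRO204_PEA_1_P16, so it is recorded as None and only the codon count is reported. The names `CODING_PORTIONS`, `codon_count` and `check_all` are hypothetical.

```python
# (transcript, coding start, coding end, protein, stated protein length or None)
CODING_PORTIONS = [
    ("HSPRO204_PEA_1_T3", 192, 512, "HSPRO204_PEA_1_P3", 107),
    ("HSPRO204_PEA_1_T4", 192, 710, "HSPRO204_PEA_1_P4", 173),
    ("HSPRO204_PEA_1_T5", 192, 863, "HSPRO204_PEA_1_P5", 224),
    ("HSPRO204_PEA_1_T6", 192, 719, "HSPRO204_PEA_1_P6", 176),
    ("HSPRO204_PEA_1_T11", 192, 692, "HSPRO204_PEA_1_P11", 167),
    ("HSPRO204_PEA_1_T12", 192, 551, "HSPRO204_PEA_1_P12", 120),
    ("HSPRO204_PEA_1_T17", 168, 671, "HSPRO204_PEA_1_P16", None),
    ("HSPRO204_PEA_1_T18", 192, 416, "HSPRO204_PEA_1_P21", 75),
]


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in a 1-based inclusive coding range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a multiple of 3")
    return length // 3


def check_all(rows):
    """Map each protein to (codon count, consistent-with-stated-length flag)."""
    results = {}
    for _transcript, start, end, protein, expected in rows:
        codons = codon_count(start, end)
        results[protein] = (codons, expected is None or codons == expected)
    return results


for protein, (codons, ok) in check_all(CODING_PORTIONS).items():
    print(protein, codons, "consistent" if ok else "MISMATCH")
```

Under these assumptions every stated range is a whole number of codons and matches the stated variant length; for example, positions 192 to 512 span 321 nucleotides, or 107 codons, the length given for HSPRO204_PEA_1_P3.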
As noted above, cluster HSPRO204 features 31 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSPRO204_PEA_l_node_2 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T22. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_6 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12 and HSPRO204_PEA_l_T18. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_20 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T17. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_28 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T12. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_35 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T6. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_40 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_41 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description. Segment cluster HSPRO204_PEA_l_node_0 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12, HSPRO204_PEA_l_T18 and HSPRO204_PEA_l_T22. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_9 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12 and HSPRO204_PEA_l_T18. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_10 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12 and HSPRO204_PEA_l_T18. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_11 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12 and HSPRO204_PEA_l_T18. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_12 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12 and HSPRO204_PEA_l_T18. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_13 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11 and HSPRO204_PEA_l_T12. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_14 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11 and HSPRO204_PEA_l_T12. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_15 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11 and
HSPRO204_PEA_l_T12. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_16 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11 and HSPRO204_PEA_l_T12. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_17 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11 and HSPRO204_PEA_l_T12. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_18 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11 and HSPRO204_PEA_l_T12. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_22 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12 and HSPRO204_PEA_l_T17. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_23 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12 and HSPRO204_PEA_l_T17. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_24 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_25 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11,
HSPRO204_PEA_l_T12, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_26 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T12, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_30 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_31 according to the present invention is supported by 67 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_32 according to the present invention is supported by 66 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_33 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_34 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T6, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_37 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This
segment can be found in the following transcript(s): HSPRO204_PEA_l_T4. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_38 according to the present invention can be found in the following transcript(s): HSPRO204_PEA_l_T4 and HSPRO204_PEA_l_T5. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster HSPRO204_PEA_l_node_39 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSPRO204_PEA_l_T3, HSPRO204_PEA_l_T4, HSPRO204_PEA_l_T5, HSPRO204_PEA_l_T11, HSPRO204_PEA_l_T17 and HSPRO204_PEA_l_T18. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name : PRL_HUMAN
Sequence documentation: Alignment of: HSPRO204_PEA_l_P3 x PRL_HUMAN Alignment segment 1/1:
Quality: 1027.00 Escore: 0 Matching length: 105 Total length: 105 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.05 Total Percent Similarity: 100.00 Total Percent Identity: 99.05 Gaps : 0
Alignment :
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
101 QQMNK 105
101 QQMNQ 105
Sequence name : PRL_HUMAN
Sequence documentation:
Alignment of: HSPRO204_PEA_l_P4 x PRL_HUMAN
Alignment segment 1/1:
Quality: 1598.00 Escore : 0 Matching length: 164 Total length: 164 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps : 0
Alignment :
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
101 QQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQ 150
101 QQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQ 150
151 TKRLLEGMELIVSQ 164
151 TKRLLEGMELIVSQ 164
Sequence name : PRL_HUMAN
Sequence documentation:
Alignment of: HSPRO204_PEA_l_P5 x PRL_HUMAN
Alignment segment 1/1:
Quality: 1598.00 Escore : 0 Matching length: 164 Total length: 164 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
101 QQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQ 150
101 QQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQ 150
151 TKRLLEGMELIVSQ 164
151 TKRLLEGMELIVSQ 164
Sequence name : PRL_HUMAN
Sequence documentation:
Alignment of: HSPRO204_PEA_l_P6 x PRL_HUMAN
Alignment segment 1/1:
Quality: 1606.00 Escore : 0 Matching length: 165 Total length: 165 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
101 QQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQ 150
101 QQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQ 150
151 TKRLLEGMELIVSQV 165
151 TKRLLEGMELIVSQV 165
Sequence name : PRL_HUMAN
Sequence documentation:
Alignment of: HSPRO204_PEA_l_P11 x PRL_HUMAN
Alignment segment 1/1:
Quality: 1568.00 Escore: 0 Matching length: 167 Total length: 227 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 73.57 Total Percent Identity: 73.57 Gaps : 1
Alignment :
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
101 QQMN 104
101 QQMNQKDFLSLIVSILRSWNEPLYHLVTEVRGMQEAPEAILSKAVEIEEQ 150
105 VHPETKENEIYPVWSGLPSLQMADEESRLSAYYNLL 140
151 TKRLLEGMELIVSQVHPETKENEIYPVWSGLPSLQMADEESRLSAYYNLL 200
141 HCLRRDSHKIDNYLKLLKCRIIHNNNC 167
201 HCLRRDSHKIDNYLKLLKCRIIHNNNC 227
Sequence name : PRL_HUMAN
Sequence documentation:
Alignment of: HSPRO204_PEA_1_P12 x PRL_HUMAN
Alignment segment 1/1:
Quality: 1033.00
Escore: 0 Matching length: 106 Total length: 106 Matching Percent Similarity: 99.06 Matching Percent Identity: 99.06 Total Percent Similarity: 99.06 Total Percent Identity: 99.06 Gaps : 0
Alignment :
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQVTLRDLFDRA 50
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
51 VVLSHYIHNLSSEMFSEFDKRYTHGRGFITKAINSCHTSSLATPEDKEQA 100
101 QQMNAK 106
101 QQMNQK 106
Sequence name : PRL_HUMAN
Sequence documentation:
Alignment of: HSPRO204_PEA_l_P21 x PRL_HUMAN
Alignment segment 1/1:
Quality: 393.00 Escore: 0 Matching length: 40 Total length: 40 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment :
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQ 40
1 MNIKGSPWKGSLLLLLVSNLLLCQSVAPLPICPGGAARCQ 40
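The percent values in the alignment summaries above are consistent with simple ratios of identical positions to the matching length or to the total length. The following Python sketch reproduces three of the reported figures under that assumption. The helper name `percent` is hypothetical, and rounding to two decimal places is inferred from the printed values rather than stated in the application.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimal places."""
    if whole <= 0:
        raise ValueError("whole must be a positive length")
    if part < 0 or part > whole:
        raise ValueError("part must lie between 0 and whole")
    return round(100.0 * part / whole, 2)


# P11 x PRL_HUMAN: 167 matching (all identical) positions over a total length of 227.
total_identity_p11 = percent(167, 227)

# P3 x PRL_HUMAN: 105 aligned positions with one K/Q difference at position 105.
matching_identity_p3 = percent(104, 105)

# P12 x PRL_HUMAN: 106 aligned positions with one A/Q difference at position 105.
matching_identity_p12 = percent(105, 106)

print(total_identity_p11, matching_identity_p3, matching_identity_p12)
```

Under these assumptions the three results agree with the Total Percent Identity of 73.57 and the Matching Percent Identity values of 99.05 and 99.06 given in the summaries.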
DESCRIPTION FOR CLUSTER T87096
Cluster T87096 features 4 transcript(s) and 47 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Cathepsin D precursor (SwissProt accession identifier CATD_HUMAN; known also according to the synonyms EC 3.4.23.5), SEQ ID NO:635, referred to herein as the previously known protein. Cathepsin D is a lysosomal acid protease, present in normal cells at a very low concentration, which is active in intracellular protein breakdown. Cathepsin D is first produced in precursor form, pro-cathepsin D (52kD), and then processed in the cell to an intermediate form of 48kD, then finally to the mature forms of 34kD and 14kD. It has been proposed that the presence of high levels of cathepsin D in breast cancer may signify a functional estrogen receptor apparatus indicating a likely response to endocrine therapy. However, the relationship of cathepsin D to prognosis in breast cancer remains controversial. In addition, there is some evidence for the involvement of Cathepsin D in late onset Alzheimer's disease, including elevated levels in the cerebrospinal fluid, but this evidence is inconclusive. Recently, it has been shown that cathepsin D may be a prognostic factor for squamous cell carcinomas of the skin. The variants according to the present invention are suitable for such diagnostic uses. Protein Cathepsin D precursor is known or believed to have the following function(s): acid protease active in intracellular protein breakdown. Involved in the pathogenesis of several diseases such as breast cancer and possibly Alzheimer's disease. The sequence for protein Cathepsin D precursor is given at the end of the application, as "Cathepsin D precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Cathepsin D precursor localization is believed to be Lysosomal. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis, which are annotation(s) related to Biological Process; cathepsin D; pepsin A; hydrolase, which are annotation(s) related to Molecular Function; and lysosome, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
Cluster T87096 can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods. The term "number" in the left hand column of the table and the numbers on the y-axis of the figure refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
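The "parts per million" figure described above is a ratio scaled by one million. A minimal Python sketch of that scaling follows. The function name and the EST counts are hypothetical and chosen only for illustration, and the sketch does not reproduce the weighting of ESTs, which is part of the previously described methods.

```python
def expression_ppm(cluster_ests: float, all_ests: float) -> float:
    """Scale the share of ESTs belonging to one cluster to parts per million.

    cluster_ests: (possibly weighted) EST count for the cluster in one tissue category.
    all_ests: (possibly weighted) EST count for all clusters in that category.
    """
    if all_ests <= 0:
        raise ValueError("all_ests must be positive")
    if cluster_ests < 0:
        raise ValueError("cluster_ests must not be negative")
    return 1_000_000 * cluster_ests / all_ests


# Hypothetical counts: 5 ESTs of the cluster among 20,000 ESTs in the category.
example_ppm = expression_ppm(5, 20_000)
print(example_ppm)
```

Expressing the ratio per million allows tissue categories with very different library sizes to be compared on one axis.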
Overall, the following results were obtained as shown with regard to the histograms in Figure 7 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: kidney malignant tumors and pancreas carcinoma.
Table 5 - Normal tissue distribution
Name of Tissue    Number
adrenal    689
bladder    656
Table 6 - P values and ratios for expression in cancerous tissue
above. These transcript(s) encode for protein(s) which are variant(s) of protein Cathepsin D precursor. A description of each variant protein according to the present invention is now provided.
Variant protein T87096_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T87096_PEA_1_T18. An alignment is given to the known protein (Cathepsin D precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T87096_PEA_1_P11 and CATD_HUMAN: 1. An isolated chimeric polypeptide encoding for T87096_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKGPVSKYSQAVP
AVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGSSNLWVPSIHCKLLDIACWIH HKYNSDKSSTYVKNGTSFDIHYGSGSLSGYLSQDTVSVPCQSASSASALGGVKVERQVF GEATKQPGITFIAAKFDGILGMAYPRISVNNVLPVFDNLMQQKLVDQNIFSFYLSRDPD AQPGGELMLGGTDSKYYKGSLSYLNVTRKAYWQVHLDQVEVASGLTLCKEGCEAIVD TGTSLMVGPVDEVRELQKAIGAVPLIQGE corresponding to amino acids 1 - 324 of
CATD_HUMAN, which also corresponds to amino acids 1 - 324 of T87096_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSAGGWGWGWGWQGEPQGHHYHPDTAVTPLST corresponding to amino acids 325 - 356 of T87096_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T87096_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSAGGWGWGWGWQGEPQGHHYHPDTAVTPLST in T87096_PEA_1_P11.
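The coordinates in the comparison report above can be checked for internal consistency: the tail sequence should have exactly as many residues as the range 325 - 356 spans, and the two contiguous parts should add up to the full variant length. The Python sketch below performs that check. The helper name `span_length` is hypothetical, and 1-based inclusive numbering of amino acid positions is an assumption.

```python
def span_length(first: int, last: int) -> int:
    """Number of residues in a 1-based, inclusive range of positions."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return last - first + 1


# Tail sequence of T87096_PEA_1_P11 as quoted in the comparison report.
TAIL_P11 = "VSAGGWGWGWGWQGEPQGHHYHPDTAVTPLST"

head_len = span_length(1, 324)    # portion shared with CATD_HUMAN
tail_len = span_length(325, 356)  # portion unique to the variant
variant_len = head_len + tail_len

# The quoted tail must fill the stated range exactly.
tail_matches_range = len(TAIL_P11) == tail_len
print(head_len, tail_len, variant_len, tail_matches_range)
```

The same check can be applied to any other comparison report by substituting its quoted tail sequence and ranges.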
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T87096_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87096_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
The glycosylation sites of variant protein T87096_PEA_1_P11, as compared to the known protein Cathepsin D precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 8 - Glycosylation site(s)
Variant protein T87096_PEA_1_P11 is encoded by the following transcript(s): T87096_PEA_1_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T87096_PEA_1_T18 is shown in bold; this coding portion starts at
position 134 and ends at position 1201. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87096_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
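The start and end positions of a coding portion determine the length of the encoded protein, which gives a quick consistency check against the amino acid ranges quoted for the variant. The Python sketch below assumes 1-based inclusive nucleotide positions and a coding portion that excludes the stop codon; both are assumptions, although they agree with the 356 amino acids reported for T87096_PEA_1_P11. The function name is hypothetical.

```python
def coding_length_aa(start: int, end: int) -> int:
    """Amino acids encoded by a coding portion given 1-based inclusive positions.

    Assumes the stop codon is not counted within start..end.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    nucleotides = end - start + 1
    if nucleotides % 3 != 0:
        raise ValueError(f"{nucleotides} nt is not a whole number of codons")
    return nucleotides // 3


# T87096_PEA_1_T18: coding portion from position 134 to position 1201.
aa_t87096_t18 = coding_length_aa(134, 1201)

# HSPRO204_PEA_l_T18: coding portion from position 192 to position 416.
aa_hspro204_t18 = coding_length_aa(192, 416)

print(aa_t87096_t18, aa_hspro204_t18)
```

A coordinate pair that does not span a whole number of codons raises an error, which makes transcription mistakes in the positions easy to detect.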
Variant protein T87096_PEA_1_P27 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T87096_PEA_1_T37. An alignment is given to the known protein (Cathepsin D precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T87096_PEA_1_P27 and CATD_HUMAN: 1. An isolated chimeric polypeptide encoding for T87096_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKGPVSKYSQAVP AVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGSSNLWVPSIHCKLLDIACWIH HKYNSDKSSTYVKNGTSFDIHYGSGSLSGYLSQDTVSVPCQSASSASALGGVKVERQVF GEATKQPGITFIAAKFDGILGMAYPRISVNNVLPVFDNLMQQKLVDQNIFSFYLSRDPD AQPGGELMLGGTDSKYYKGSLSYLNVTRKAYWQVHLDQV corresponding to amino acids 1 - 277 of CATD_HUMAN, which also corresponds to amino acids 1 - 277 of
T87096_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence WAAVG corresponding to amino acids 278 - 283 of T87096_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T87096_PEA_1_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence WAAVG in T87096_PEA_1_P27.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T87096_PEA_1_P27 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87096_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
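The reasoning used for the localization call (a signal peptide predicted by every signal-peptide program, and no trans-membrane region predicted by any trans-membrane program) can be written as a simple decision rule. The Python sketch below is illustrative only: the function name, the boolean inputs and the "undetermined" fallback are assumptions, and it does not call SignalP or any other prediction program.

```python
from typing import Sequence


def predicted_localization(
    signal_peptide_calls: Sequence[bool],
    transmembrane_calls: Sequence[bool],
) -> str:
    """Combine per-program predictions into a localization call.

    Returns "secreted" only when every signal-peptide program reports a
    signal peptide and no trans-membrane program reports a trans-membrane
    region. Any other combination is reported as "undetermined" (an
    assumption; the text above describes only the secreted case).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one prediction of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Two signal-peptide programs agree, and neither trans-membrane program fires.
call = predicted_localization([True, True], [False, False])
print(call)
```

Requiring agreement between independent programs makes the call conservative, since a single dissenting prediction prevents the "secreted" result.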
The glycosylation sites of variant protein T87096_PEA_1_P27, as compared to the known protein Cathepsin D precursor, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 11 - Glycosylation site(s)
Variant protein T87096_PEA_1_P27 is encoded by the following transcript(s): T87096_PEA_1_T37, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T87096_PEA_1_T37 is shown in bold; this coding portion starts at position 134 and ends at position 88889. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87096_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein T87096_PEA_1_P29 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T87096_PEA_1_T39. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both
signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T87096_PEA_1_P29 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87096_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein T87096_PEA_1_P29 is encoded by the following transcript(s): T87096_PEA_1_T39, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T87096_PEA_1_T39 is shown in bold; this coding portion starts at position 134 and ends at position 874. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87096_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein T87096_PEA_1_P39 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T87096_PEA_1_T14. An alignment is given to the known protein (Cathepsin D precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T87096_PEA_1_P39 and CATD_HUMAN: 1. An isolated chimeric polypeptide encoding for T87096_PEA_1_P39, comprising a first amino acid sequence being at least 90% homologous to MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKGPVSKYSQAVPAVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGSSNLWVPSIHCKLLDIAC corresponding to amino acids 1 - 117 of CATD_HUMAN, which also corresponds to amino acids 1 - 117 of T87096_PEA_1_P39, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CESRTLAPSPRSCPSGMSLQGCLRNHLGNAILLPLGPVSQASPPPCSSH corresponding to amino acids 118 - 166 of T87096_PEA_1_P39, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T87096_PEA_1_P39, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CESRTLAPSPRSCPSGMSLQGCLRNHLGNAILLPLGPVSQASPPPCSSH in T87096_PEA_1_P39.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein T87096_PEA_1_P39 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87096_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
The glycosylation sites of variant protein T87096_PEA_1_P39, as compared to the known protein Cathepsin D precursor, are described in Table 16 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 16 - Glycosylation site(s)
Variant protein T87096_PEA_1_P39 is encoded by the following transcript(s): T87096_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T87096_PEA_1_T14 is shown in bold; this coding portion starts at position 134 and ends at position 631. The transcript also has the following SNPs as listed in Table 17 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T87096_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Nucleic acid SNPs
As noted above, cluster T87096 features 47 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
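As a consistency check on the coding-portion coordinates quoted above for the transcripts of this cluster, the following sketch converts a start and end position into a nucleotide count and a codon count. It assumes that the positions are 1-based and inclusive and that the end position excludes the stop codon; neither convention is stated in the text, and the function name `coding_length` is illustrative only.

```python
def coding_length(start, end):
    """Return (nucleotides, codons, leftover) for a coding portion.

    Assumption: start and end are 1-based, inclusive transcript positions.
    """
    nucleotides = end - start + 1
    codons, leftover = divmod(nucleotides, 3)
    return nucleotides, codons, leftover


# Coordinates quoted in the text for two transcripts of cluster T87096.
for name, start, end in [("T87096_PEA_1_T39", 134, 874),
                         ("T87096_PEA_1_T14", 134, 631)]:
    nt, codons, leftover = coding_length(start, end)
    print(name, nt, codons, leftover)
```

Under these assumptions the coding portion of T87096_PEA_1_T14 spans 498 nucleotides, that is 166 codons, which agrees with the 166 amino acids described for T87096_PEA_1_P39 in the comparison report.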
Segment cluster T87096_PEA_1_node_2 according to the present invention is supported by 138 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18, T87096_PEA_1_T37 and T87096_PEA_1_T39. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_21 according to the present invention is supported by 195 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T37. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_22 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_27 according to the present invention is supported by 276 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T37. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_38 according to the present invention is supported by 272 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_55 according to the present invention is supported by 313 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster T87096_PEA_1_node_11 according to the present invention is supported by 172 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18, T87096_PEA_1_T37 and T87096_PEA_1_T39. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_12 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18, T87096_PEA_1_T37 and T87096_PEA_1_T39. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_13 according to the present invention is supported by 178 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18, T87096_PEA_1_T37 and T87096_PEA_1_T39. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_14 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18, T87096_PEA_1_T37 and T87096_PEA_1_T39. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_15 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18, T87096_PEA_1_T37 and T87096_PEA_1_T39. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_16 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18, T87096_PEA_1_T37 and T87096_PEA_1_T39. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_17 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18, T87096_PEA_1_T37 and T87096_PEA_1_T39. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_18 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T37. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_23 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_24 according to the present invention is supported by 239 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T37. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_26 according to the present invention is supported by 259 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T37. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_30 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T37. Table 35 below describes the starting and ending position of this segment on each transcript.
Segment cluster T87096_PEA_1_node_31 according to the present invention is supported by 292 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T37. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_32 according to the present invention is supported by 276 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T37. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_33 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T37. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_39 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_40 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T18. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_41 according to the present invention is supported by 253 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_42 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_43 according to the present invention is supported by 262 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_47 according to the present invention is supported by 248 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_48 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_49 according to the present invention is supported by 283 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_50 according to the present invention is supported by 276 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14 and T87096_PEA_1_T18. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_51 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_52 according to the present invention is supported by 301 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_53 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_54 according to the present invention is supported by 290 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_56 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_57 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_58 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_59 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_60 according to the present invention is supported by 190 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_61 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_62 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_63 according to the present invention can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_64 according to the present invention is supported by 167 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_65 according to the present invention is supported by 126 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_66 according to the present invention is supported by 125 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_67 according to the present invention is supported by 105 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster T87096_PEA_1_node_68 according to the present invention is supported by 71 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T87096_PEA_1_T14, T87096_PEA_1_T18 and T87096_PEA_1_T39. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: CATD_HUMAN
Sequence documentation:
Alignment of: T87096_PEA_1_P11 x CATD_HUMAN
Alignment segment 1/1:
Quality: 3143.00 Escore: 0 Matching length: 324 Total length: 324
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKG 50
 51 PVSKYSQAVPAVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PVSKYSQAVPAVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGS 100
101 SNLWVPSIHCKLLDIACWIHHKYNSDKSSTYVKNGTSFDIHYGSGSLSGY 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 SNLWVPSIHCKLLDIACWIHHKYNSDKSSTYVKNGTSFDIHYGSGSLSGY 150
151 LSQDTVSVPCQSASSASALGGVKVERQVFGEATKQPGITFIAAKFDGILG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LSQDTVSVPCQSASSASALGGVKVERQVFGEATKQPGITFIAAKFDGILG 200
201 MAYPRISVNNVLPVFDNLMQQKLVDQNIFSFYLSRDPDAQPGGELMLGGT 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 MAYPRISVNNVLPVFDNLMQQKLVDQNIFSFYLSRDPDAQPGGELMLGGT 250
251 DSKYYKGSLSYLNVTRKAYWQVHLDQVEVASGLTLCKEGCEAIVDTGTSL 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 DSKYYKGSLSYLNVTRKAYWQVHLDQVEVASGLTLCKEGCEAIVDTGTSL 300
301 MVGPVDEVRELQKAIGAVPLIQGE 324
    ||||||||||||||||||||||||
301 MVGPVDEVRELQKAIGAVPLIQGE 324
Sequence name: CATD_HUMAN
Sequence documentation:
Alignment of: T87096_PEA_1_P27 x CATD_HUMAN
Alignment segment 1/1:
Quality: 2716.00 Escore: 0 Matching length: 281 Total length: 281
Matching Percent Similarity: 99.64 Matching Percent Identity: 99.29
Total Percent Similarity: 99.64 Total Percent Identity: 99.29 Gaps: 0
Alignment:
  1 MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKG 50
  1 MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKG 50
 51 PVSKYSQAVPAVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGS 100
 51 PVSKYSQAVPAVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGS 100
101 SNLWVPSIHCKLLDIACWIHHKYNSDKSSTYVKNGTSFDIHYGSGSLSGY 150
101 SNLWVPSIHCKLLDIACWIHHKYNSDKSSTYVKNGTSFDIHYGSGSLSGY 150
151 LSQDTVSVPCQSASSASALGGVKVERQVFGEATKQPGITFIAAKFDGILG 200
151 LSQDTVSVPCQSASSASALGGVKVERQVFGEATKQPGITFIATKFDGILG 200
201 MAYPRISVNNVLPVFDNLMQQKLVDQNIFSFYLSRDPDAQPGGELMLGGT 250
201 MAYPRISVNNVLPVFDNLMQQKLVDQNIFSFYLSRDPDAQPGGELMLGGT 250
251 DSKYYKGSLSYLNVTRKAYWQVHLDQVWAA 281
251 DSKYYKGSLSYLNVTRKAYWQVHLDQVEVAS 281
Sequence name: CATD_HUMAN
Sequence documentation:
Alignment of: T87096_PEA_1_P39 x CATD_HUMAN
Alignment segment 1/1:
Quality: 1137.00 Escore: 0 Matching length: 117 Total length: 117
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKG 50
 51 PVSKYSQAVPAVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGS 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 PVSKYSQAVPAVTEGPIPEVLKNYMDAQYYGEIGIGTPPQCFTVVFDTGS 100
101 SNLWVPSIHCKLLDIAC 117
    |||||||||||||||||
101 SNLWVPSIHCKLLDIAC 117
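The "Percent Identity" figures reported with the alignments above can be illustrated with a minimal sketch for the gap-free case. The function name `percent_identity` and the single-residue substitution in the example are hypothetical, and the scoring used by the alignment software that produced the reports is not described in the text, so this shows only the simple ratio of identical positions to aligned length.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two gap-free aligned sequences.

    Assumption: both sequences are already aligned and have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# First 50 residues shared by the variants and CATD_HUMAN in the alignments.
head = "MQPSSLLPLALCLLAAPASALVRIPLHKFTSIRRTMSEVGGSVEDLIAKG"
# Hypothetical variant with one substituted residue, for illustration only.
mutated = "A" + head[1:]

print(percent_identity(head, head))
print(percent_identity(head, mutated))
```

By the same ratio, two mismatches over 281 aligned positions give 100 x 279 / 281, or 99.29 when rounded to two decimals, which is consistent with the identity reported for the T87096_PEA_1_P27 alignment.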
DESCRIPTION FOR CLUSTER S42303
Cluster S42303 features 6 transcript(s) and 27 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Neural-cadherin precursor (SwissProt accession identifier CAD2_HUMAN; known also according to the synonyms N-cadherin; Cadherin-2), SEQ ID NO: 673, referred to herein as the previously known protein. The N-cadherin type I membrane protein is a calcium dependent cell-cell adhesion glycoprotein comprised of five extracellular cadherin repeats, a transmembrane region and a highly conserved cytoplasmic tail. The protein functions during gastrulation and is required for establishment of left-right asymmetry. At certain central nervous system synapses, presynaptic to postsynaptic adhesion is mediated at least in part by this gene product. It has been shown to promote tumor invasiveness in breast and colon cancers. In addition, expression of N-cadherin was observed in the intercellular spaces between tumor cells in gastric carcinoma, and N-cadherin has been found to have utility in distinguishing pleural mesotheliomas from lung adenocarcinomas. The variants according to the present invention have these diagnostic utilities. Variants according to the present invention also have utility for lung and breast cancer diagnostics. Protein Neural-cadherin precursor is known or believed to have the following function(s): cadherins are calcium dependent cell adhesion proteins. They preferentially interact with themselves in a homophilic manner in connecting cells; cadherins may thus contribute to the sorting of heterogeneous cell types. N-cadherin may be involved in neuronal recognition mechanism. The sequence for protein Neural-cadherin precursor is given at the end of the application, as "Neural-cadherin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Neural-cadherin precursor localization is believed to be Type I membrane protein. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: cell adhesion; homophilic cell adhesion, which are annotation(s) related to Biological Process; calcium binding; protein binding, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
The heart-selective diagnostic marker prediction engine provided the following results with regard to cluster S42303. Predictions were made for selective expression of transcripts of this cluster in heart tissue, according to the previously described methods. The numbers on the y-axis of Figure 8 refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histogram in Figure 8, concerning the number of heart-specific clones in libraries/sequences; as well as with regard to the histogram in Figures 9 - 10, concerning the actual expression of oligonucleotides in various tissues, including heart.
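The "parts per million" weighting described above can be sketched as follows. The function names and all EST counts are hypothetical and chosen only to make the arithmetic visible; the actual counts behind Figure 8 are not given in the text.

```python
def weighted_expression_ppm(cluster_ests, total_ests):
    """Expression of a cluster in one tissue category, in parts per million.

    cluster_ests: ESTs of the cluster observed in the category.
    total_ests:   all ESTs observed in that category.
    """
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    return 1_000_000 * cluster_ests / total_ests


def expression_ratio(ppm_a, ppm_b):
    """Ratio of weighted expression between two categories."""
    if ppm_b == 0:
        raise ValueError("reference category has zero expression")
    return ppm_a / ppm_b


# Hypothetical counts, for illustration only.
heart_ppm = weighted_expression_ppm(cluster_ests=5, total_ests=100_000)
other_ppm = weighted_expression_ppm(cluster_ests=10, total_ests=1_000_000)
print(heart_ppm, other_ppm, expression_ratio(heart_ppm, other_ppm))
```

Normalizing by the total number of ESTs in each category before taking the ratio keeps a large library from dominating the comparison simply because it contains more sequences.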
This cluster was found to be selectively expressed in heart for the following reasons: the ratio of expression of the cluster in heart-specific ESTs to the overall expression of the cluster in non-heart ESTs was found to be 4.8; the ratio of expression of the cluster in heart-specific ESTs to the overall expression of the cluster in muscle-specific ESTs was found to be 120.3; and Fisher exact test P-values, computed both for library and weighted clone counts to check that the counts are statistically significant, were found to be 5.60E-06.
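A one-sided Fisher exact test on a 2x2 table of counts can be sketched with the standard library alone. The layout of the table (heart versus non-heart libraries, with versus without clones of the cluster) and the counts in the example are assumptions made for illustration; the text does not give the counts or state which tail was used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a count of at
    least `a` in the top-left cell, with all row and column totals fixed.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, col1)
    p_value = 0.0
    for x in range(a, min(row1, col1) + 1):
        # comb() returns 0 when the lower cell count would be impossible.
        p_value += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p_value


# Hypothetical table: rows = heart / non-heart libraries,
# columns = libraries with / without clones of the cluster.
print(fisher_exact_greater(8, 2, 10, 180))
```

A small P-value indicates that the enrichment of the cluster in heart libraries is unlikely to arise by chance under fixed marginal totals.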
One particularly important measure of specificity of expression of a cluster in heart tissue is the previously described comparison of the ratio of expression of the cluster in heart as opposed to muscle. This cluster was found to be specifically expressed in heart as opposed to non-heart ESTs as described above. However, many proteins have been shown to be generally expressed at a higher level in both heart and muscle, which is less desirable. For this cluster, as described above, the ratio of expression of the cluster in heart-specific ESTs to the overall expression of the cluster in muscle-specific ESTs was found to be 120.3, which clearly supports specific expression in heart tissue.
As noted above, cluster S42303 features 6 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Neural-cadherin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein S42303_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S42303_PEA_1_T4. An alignment is given to the known protein (Neural-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S42303_PEA_1_P2 and CAD2_HUMAN: 1. An isolated chimeric polypeptide encoding for S42303_PEA_1_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SRRNYGKWKLDGMFLLRRYVCIFTEKLKNQAELYVFLS corresponding to amino acids 1 - 38 of S42303_PEA_1_P2, and a second amino acid sequence being at least 90% homologous to VKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEK WQVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRG PFPQELVPVIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHLR AHAVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDA DDPNALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDM EGNPTYGLSNTATA VITVTD VNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDKDQPHT PAW AVYRISGGDPTGRFAIQTDPNSNDGLVTVVKPIDFETNRMFVLTVAAENQVPLA KGIQHPPQSTATVSVTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQ NIRYTKLSDPANWLKIDPVNGQITTIAVLDRESPNV NMYNATFLASDNGIPPMSGTGT LQIYLLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRN WTITRLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDSNGDCTDV DWVGAGLGTGAIIAILLCIIILLILVLMFVVWMKRRDKERQAKQLLIDPEDDVRDNILKY DEEGGGEEDQDYDLSQLQQPDTVEPDAIKPVGIPXRMDEPVPIHAEPQYPVRSAAPHPGDI GDFINEGLKAADNDPTAPPYDSLLVFDYEGSGSTAGSLSSLNSSSSGGEQDYDYLNDW GPRFKKLADMYGGGDD corresponding to amino acids 58 - 906 of CAD2_HUMAN, which also corresponds to amino acids 39 - 887 of S42303_PEA_1_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of S42303_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SRRNYGKWKLDGMFLLRRYVCIFTEKLKNQAELYVFLS of S42303_PEA_1_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein S42303_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
The glycosylation sites of variant protein S42303_PEA_1_P2, as compared to the known protein Neural-cadherin precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 6 - Glycosylation site(s)
Variant protein S42303_PEA_1_P2 is encoded by the following transcript(s): S42303_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S42303_PEA_1_T4 is shown in bold; this coding portion starts at position 1 and ends at position 2661. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein S42303_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S42303_PEA_1_T5. An alignment is given to the known protein (Neural-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S42303_PEA_1_P3 and CAD2_HUMAN: 1. An isolated chimeric polypeptide encoding for S42303_PEA_1_P3, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MCNTQRM corresponding to amino acids 1 - 7 of S42303_PEA_1_P3, and a second amino acid sequence being at least 90 % homologous to
KFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKW QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGP FPQELVRIRSDRDKNLSLRYS VTGPGADQPPTGIFIINPISGQLS VTKPLDREQIARFHLRA HAVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDAD DPNALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDME GNPTYGLSNTATA VITVTD VNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDKDQPHTP AWNAVYRISGGDPTGRFAIQTDPNSNDGLVTVVKPIDFETNRMFVLTVAAENQVPLAK GIQHPPQSTATVSVTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNI RYTKLSDPANWLKIDPVNGQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQ IYLLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTI TRLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDSNGDCTDVDRIV GAGLGTGAIIAILLCIIILLILVLMFVVWMKRRDKERQAKQLLIDPEDDVRDNILKYDEE GGGEEDQDYDLSQLQQPDTVEPDAIKPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDFI NEGLKAADNDPTAPPYDSLLVFDYEGSGSTAGSLSSLNSSSSGGEQDYDYLNDWGPRF
KKLADMYGGGDD corresponding to amino acids 59 - 906 of CAD2_HUMAN, which also corresponds to amino acids 8 - 855 of S42303_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of S42303_PEA_1_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MCNTQRM of S42303_PEA_1_P3.
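The coordinate ranges recited in such a comparison report can be cross-checked arithmetically. The sketch below is illustrative only; the helper names `span_length` and `check_chimera` are hypothetical. It confirms that the range on the known protein and the range on the variant cover the same number of residues, that the second portion begins immediately after the head ("contiguous and in a sequential order"), and it reports the constant offset between the two numbering schemes.

```python
# Illustrative consistency check on 1-based, inclusive residue ranges.

def span_length(start: int, end: int) -> int:
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def check_chimera(head, known, variant):
    """Return (same_length, contiguous, offset) for the recited ranges.

    head    -- range of the unique head on the variant, e.g. (1, 7)
    known   -- range on the known protein, e.g. (59, 906)
    variant -- range of the shared portion on the variant, e.g. (8, 855)
    """
    same_length = span_length(*known) == span_length(*variant)
    contiguous = variant[0] == head[1] + 1
    offset = known[0] - variant[0]  # known position = variant position + offset
    return same_length, contiguous, offset


# Ranges recited for S42303_PEA_1_P3 against CAD2_HUMAN.
print(check_chimera((1, 7), (59, 906), (8, 855)))
```

For the P3 ranges both portions span 848 residues, the shared portion starts at residue 8 directly after the 7-residue head, and a variant position maps to the known protein by adding 51.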
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein S42303_PEA_1_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
The glycosylation sites of variant protein S42303_PEA_1_P3, as compared to the known protein Neural-cadherin precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Glycosylation site(s)
Variant protein S42303_PEA_1_P3 is encoded by the following transcript(s): S42303_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S42303_PEA_1_T5 is shown in bold; this coding portion starts at position 174 and ends at position 2738. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
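The stated coding coordinates can be related to the protein length by simple arithmetic. In the sketch below the function name `codon_count` is hypothetical, and positions are assumed to be 1-based and inclusive. For S42303_PEA_1_T5 the count equals the 855 residues recited for S42303_PEA_1_P3, which suggests (as an inference, not a statement of the application) that the stated range does not include a stop codon.

```python
# Illustrative arithmetic only: positions are assumed 1-based and inclusive.

def codon_count(start: int, end: int) -> int:
    """Number of complete codons in the coding portion from start to end."""
    n_nucleotides = end - start + 1
    if n_nucleotides <= 0 or n_nucleotides % 3 != 0:
        raise ValueError("coding portion is not a positive multiple of 3 nucleotides")
    return n_nucleotides // 3


# Coding portion recited for transcript S42303_PEA_1_T5.
print(codon_count(174, 2738))
```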
Variant protein S42303_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S42303_PEA_1_T6. An alignment is given to the known protein (Neural-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S42303_PEA_1_P4 and CAD2_HUMAN: 1. An isolated chimeric polypeptide encoding for S42303_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to MVYAVRSFPLSSEHAKFLIYAQDKETQEKWQVAVKLSLKPTLTEESVKESAEVEEIVFP RQFSKHSGITLQRQKRDWVIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGPGAD QPPTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENPIDIVINVIDMNDNRP EFLHQVWNGTVPEGSKPGTYVMTVTAID ADDPNALNGMLRYRIVSQAPSTPSPNMFTI NNETGDIITVAAGLDREKVQQYTLIIQATDMEGNPTYGLSNTATA VITVTD VNDNPPEF
TAMTFYGEVPENRVDIIVANLTVTDKDQPHTPAWNAVYRISGGDPTGRFAIQTDPNSND GLVTVVKPIDFETNRMFVLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPN PKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPANWLKIDPVNGQITTIAVL DRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDINDNAPQVLPQEAETCETPDPN S ITALDYDIDPNAGPFAFDLPLSPVTIKTIN VTITRLNGDFAQLNLKIKFLEAGIYEVPIII TDSGNPPKSNISILRVKVCQCDSNGDCTDVDRIVGAGLGTGAIIAILLCIIILLILVLMFVV WMKRRDKERQAKQLLIDPEDDVRDNILKYDEEGGGEEDQDYDLSQLQQPDTVEPDAI KPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDFINEGLKAADNDPTAPPYDSLLVFDYE GSGSTAGSLSSLNSSSSGGEQDYDYLNDWGPRFKKLADMYGGGDD corresponding to amino acids 86 - 906 of CAD2_HUMAN, which also corresponds to amino acids 1 - 821 of S42303_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein S42303_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
The glycosylation sites of variant protein S42303_PEA_1_P4, as compared to the known protein Neural-cadherin precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 12 - Glycosylation site(s)
Variant protein S42303_PEA_1_P4 is encoded by the following transcript(s): S42303_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S42303_PEA_1_T6 is shown in bold; this coding portion starts at position 167 and ends at position 2629. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Variant protein S42303_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S42303_PEA_1_T8. An alignment is given to the known protein (Neural-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S42303_PEA_1_P5 and CAD2_HUMAN_V1 (SEQ ID NO: 674): 1. An isolated chimeric polypeptide encoding for S42303_PEA_1_P5, comprising a first amino acid sequence being at least 90 % homologous to
MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVHEGQPLLNVK FSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKWQ VAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGPFP QELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAH AVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDADDP NALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDMEGN PTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDKDQPHTPAW NAVYRISGGDPTGRFAIQTDPNSNDGLVTVVKPIDFETNRMFVLTVAAENQVPLAKGIQ HPPQSTATVSVTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRY TKLSDPANWLKTDPVNGQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIY LLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTIT RLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILR corresponding to amino acids 1 - 697 of CAD2_HUMAN_V1, which also corresponds to amino acids 1 - 697 of S42303_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SIC corresponding to amino acids 698 - 700 of S42303_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
It should be noted that the known protein sequence (CAD2_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for CAD2_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 14 - Changes to CAD2_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein S42303_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
Variant protein S42303_PEA_1_P5 is encoded by the following transcript(s): S42303_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S42303_PEA_1_T8 is shown in bold; this coding portion starts at position 666 and ends at position 2765. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Variant protein S42303_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S42303_PEA_1_T9. An alignment is given to the known protein (Neural-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S42303_PEA_1_P6 and CAD2_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for S42303_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MCIIIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVHEGQPLLNVK FSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKWQ VAVKISLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGPFP QELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAH AVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDADDP NALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDMEGN PTYGLSNTATA VITVTD VNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDKDQPHTP AW NAVYRISGGDPTGRFAIQTDPNSNDGLVTVVKPIDFETNRMFVLTVAAENQVPLAKGIQ HPPQSTATVSVTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRY TKLSDPANWLKIDPVNGQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIY LLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTIT RLNGDFAQLNLKTKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDSNGDCTDVDRIV GAGLGTGAIIAILLCIIILLILVLMFVVWMKRRDKERQAKQLLIDPEDDVRDNILKYDEE GGGEEDQDYDLSQLQQPDTVEPDAIKPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDFI
NE corresponding to amino acids 1 - 838 of CAD2_HUMAN_V1, which also corresponds to amino acids 1 - 838 of S42303_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KTWPIESLHL corresponding to amino acids 839 - 848 of S42303_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of S42303_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KTWPIESLHL in S42303_PEA_1_P6.
It should be noted that the known protein sequence (CAD2_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for CAD2_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 17 - Changes to CAD2_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide. Variant protein S42303_PEA_1_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 18 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Amino acid mutations
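The localization reasoning stated for the variants of this cluster can be summarized as a small decision rule. The sketch below is an illustration and not the software actually used; the function name `predict_location` and the "undetermined" outcome for disagreeing programs are assumptions, since the text only describes cases in which the two programs of each kind agree.

```python
# Illustrative decision rule; "undetermined" for mixed votes is an assumption.

def predict_location(signal_peptide_votes, tm_votes):
    """Combine boolean predictions from signal-peptide and trans-membrane programs.

    signal_peptide_votes -- one bool per signal-peptide prediction program
    tm_votes             -- one bool per trans-membrane region prediction program
    """
    if not signal_peptide_votes or not tm_votes:
        raise ValueError("at least one prediction of each kind is required")
    if all(tm_votes):
        # A predicted trans-membrane region gives "membrane", with or
        # without a predicted signal peptide upstream of it.
        return "membrane"
    if all(signal_peptide_votes) and not any(tm_votes):
        return "secreted"
    return "undetermined"


# Signal peptide predicted by both programs, trans-membrane region predicted by both.
print(predict_location([True, True], [True, True]))
```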
Variant protein S42303_PEA_1_P6 is encoded by the following transcript(s): S42303_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S42303_PEA_1_T9 is shown in bold; this coding portion starts at position 666 and ends at position 3209. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
Variant protein S42303_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S42303_PEA_1_T10. An alignment is given to the known protein (Neural-cadherin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S42303_PEA_1_P7 and CAD2_HUMAN_V1: 1. An isolated chimeric polypeptide encoding for S42303_PEA_1_P7, comprising a first amino acid sequence being at least 90 % homologous to MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVHEGQPLLNVK FSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQDKETQEKWQ VAVKLSLKPTLTEESVKES AEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGPFP QELVRIRSDRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFH corresponding to amino acids 1 - 234 of CAD2_HUMAN_V1, which also corresponds to amino acids 1 - 234 of S42303_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VRFQPADN corresponding to amino acids 235 - 242 of S42303_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of S42303_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VRFQPADN in S42303_PEA_1_P7.
It should be noted that the known protein sequence (CAD2_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for CAD2_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 20 - Changes to CAD2_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein S42303_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 21 - Amino acid mutations
Variant protein S42303_PEA_1_P7 is encoded by the following transcript(s): S42303_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S42303_PEA_1_T10 is shown in bold; this coding portion starts at position 666 and ends at position 1391. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S42303_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 22 - Nucleic acid SNPs
As noted above, cluster S42303 features 27 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster S42303_PEA_1_node_1 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T8, S42303_PEA_1_T9 and S42303_PEA_1_T10. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_2 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T8, S42303_PEA_1_T9 and S42303_PEA_1_T10. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_3 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T8, S42303_PEA_1_T9 and S42303_PEA_1_T10. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_10 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T5. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_14 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8, S42303_PEA_1_T9 and S42303_PEA_1_T10. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_17 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8, S42303_PEA_1_T9 and S42303_PEA_1_T10. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_20 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8, S42303_PEA_1_T9 and S42303_PEA_1_T10. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_23 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_25 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_27 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_29 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_31 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_33 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_35 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_41 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_44 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_46 according to the present invention is supported by 62 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6 and S42303_PEA_1_T8. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_48 according to the present invention is supported by 124 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6 and S42303_PEA_1_T8. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_50 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T9. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
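The segment-to-transcript relationships described above lend themselves to a simple lookup structure. The sketch below is illustrative only: it records a subset of the segments listed above, abbreviates transcript names to their suffix (for example "T9" for S42303_PEA_1_T9), and uses hypothetical helper names. A segment found in exactly one transcript could, under this view, serve to distinguish that transcript from the others.

```python
# Illustrative subset of the segment membership stated in the text.
# Transcript names are abbreviated to their suffix for readability.
SEGMENT_TRANSCRIPTS = {
    "S42303_PEA_1_node_1": {"T8", "T9", "T10"},
    "S42303_PEA_1_node_10": {"T5"},
    "S42303_PEA_1_node_14": {"T4", "T5", "T6", "T8", "T9", "T10"},
    "S42303_PEA_1_node_23": {"T4", "T5", "T6", "T8", "T9"},
    "S42303_PEA_1_node_46": {"T4", "T5", "T6", "T8"},
    "S42303_PEA_1_node_50": {"T9"},
}


def segments_in(transcript: str) -> list:
    """Segments (from the subset above) found in the given transcript."""
    return sorted(s for s, ts in SEGMENT_TRANSCRIPTS.items() if transcript in ts)


def segments_unique_to(transcript: str) -> list:
    """Segments (from the subset above) found in the given transcript only."""
    return sorted(s for s, ts in SEGMENT_TRANSCRIPTS.items() if ts == {transcript})


print(segments_unique_to("T9"))
```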
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster S42303_PEA_l_node_4 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following franscript(s): S42303_PEA_1_T8, S42303_PEA_1_T9 and S42303_PEA_1_T10. Table 42 below describes the starting and ending position of this segment on each franscnpt. Table 42 - Segment location on transcripts
Segment cluster S42303_PEA_l_node_6 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously descπbed. This segment can be found in the following franscript(s): S42303_PEA_1_T8, S42303_PEA_1_T9 and S42303_PEA_1_T10. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_8 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T6. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_12 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_21 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T10. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_37 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6, S42303_PEA_1_T8 and S42303_PEA_1_T9. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_38 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6 and S42303_PEA_1_T9. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster S42303_PEA_1_node_47 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S42303_PEA_1_T4, S42303_PEA_1_T5, S42303_PEA_1_T6 and S42303_PEA_1_T8. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
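The segment descriptions above amount to a many-to-many relation between segments and transcripts. As a minimal sketch, the memberships stated in the text for three of the segments can be inverted to list the segments each transcript contains. The dictionary layout and the function name `segments_by_transcript` are illustrative assumptions, not part of the described method, and the start and end positions are omitted because they appear only in the tables.

```python
from collections import defaultdict

# Segment -> transcripts, copied from the segment descriptions in the text.
# Only three segments are shown; the layout itself is an assumption.
SEGMENT_TO_TRANSCRIPTS = {
    "S42303_PEA_1_node_46": ["S42303_PEA_1_T4", "S42303_PEA_1_T5",
                             "S42303_PEA_1_T6", "S42303_PEA_1_T8"],
    "S42303_PEA_1_node_50": ["S42303_PEA_1_T9"],
    "S42303_PEA_1_node_4": ["S42303_PEA_1_T8", "S42303_PEA_1_T9",
                            "S42303_PEA_1_T10"],
}


def segments_by_transcript(segment_map):
    """Invert a segment -> transcripts mapping into transcript -> segments."""
    inverted = defaultdict(list)
    for segment, transcripts in segment_map.items():
        for transcript in transcripts:
            inverted[transcript].append(segment)
    return dict(inverted)


TRANSCRIPT_TO_SEGMENTS = segments_by_transcript(SEGMENT_TO_TRANSCRIPTS)
```

Looking up `TRANSCRIPT_TO_SEGMENTS["S42303_PEA_1_T9"]` then returns the two listed segments that occur on transcript T9.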
Variant protein alignment to the previously known protein:
Sequence name: CAD2_HUMAN
Sequence documentation:
Alignment of: S42303_PEA_1_P2 x CAD2_HUMAN
Alignment segment 1/1:
Quality: 8304.00
Escore: 0
Matching length: 851
Total length: 851
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.88
Total Percent Similarity: 100.00
Total Percent Identity: 99.88
Gaps: 0
Alignment : 37 LSVKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIY 86
56 LNVKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIY 105 87 AQDKETQEK QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQ 136 1111111111 II 11111111111111111111111111111111111111 106 AQDKETQEK QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQ 155 137 RQKRD VIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGPGADQP 186 156 RQKRD VIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGPGADQP 205 187 PTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENPIDIVI 236
206 PTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENPIDIVI 255 . . . . . 237 NVIDMNDNRPEFLHQV NGTVPEGSKPGTYVMTVTAIDADDPNALNGMLR 286
256 NVIDMNDNRPEFLHQV NGTVPEGSKPGTYVMTVTAIDADDPNALNGMLR 305 287 YRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDME 336
306 YRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDME 355 337 GNPTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTV 386 11111111111111111111111111111111111111111111111111 356 GNPTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTV 405
387 TDKDQPHTPAWNAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPIDFETN 436
406 TDKDQPHTPA NAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPIDFETN 455 . . . . .
437 RMFVLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPNPKIIR 486
456 RMFVLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPNPKIIR 505
487 QEEGLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPANWLKIDPVNGQITT 536
506 QEEGLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPANWLKIDPVNGQITT 555
537 IAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDINDNAPQV 586 llllllllllllllllllllllllllllllllllllllllllllllllll
556 IAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDINDNAPQV 605
587 LPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRN TIT 636
606 LPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTIT 655
637 RLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDS 686
656 RLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDS 705 . . . . .
687 NGDCTDVDRIVGAGLGTGAIIAILLCIIILLILVLMFW MKRRDKERQA 736
706 NGDCTDVDRIVGAGLGTGAIIAILLCIIILLILVLMFWWMKRRDKERQA 755
737 KQLLIDPEDDVRDNILKYDEEGGGEEDQDYDLSQLQQPDTVEPDAIKPVG 786
756 KQLLIDPEDDVRDNILKYDEEGGGEEDQDYDLSQLQQPDTVEPDAIKPVG 805
787 IRRMDERPIHAEPQYPVRSAAPHPGDIGDFINEGLKAADNDPTAPPYDSL 836
806 IRRMDERPIHAEPQYPVRSAAPHPGDIGDFINEGLKAADNDPTAPPYDSL 855 837 LVFDYEGSGSTAGSLSSLNSSSSGGEQDYDYLND GPRFKKLADMYGGGD 886
856 LVFDYEGSGSTAGSLSSLNSSSSGGEQDYDYLND GPRFKKLADMYGGGD 905 887 D 887
906 D 906
Sequence name: CAD2_HUMAN
Sequence documentation:
Alignment of: S42303_PEA_1_P3 x CAD2_HUMAN
Alignment segment 1/1:
Quality: 8288.00
Escore: 0
Matching length: 848
Total length: 848
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment :
8 KFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQD 57 111111 II 111111111111111111111111111111111111111111 59 KFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHAKFLIYAQD 108 58 KETQEK QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQK 107 109 KETQEK QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKHSGHLQRQK 158 108 RDWVIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGPGADQPPTG 157
159 RDWVIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGPGADQPPTG 208 . . . . . 158 IFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENPIDIVINVI 207
209 IFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENPIDIVINVI 258 208 DMNDNRPEFLHQV NGTVPEGSKPGTYVMTVTAIDADDPNALNGMLRYRI 257
259 DMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDADDPNALNGMLRYRI 308 258 VSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDMEGNP 307 11111111111111111111111111111111111111111111111111 309 VSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQATDMEGNP 358
308 TYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDK 357
359 TYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIVANLTVTDK 408 . . . . .
358 DQPHTPAWNAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPIDFETNRMF 407
409 DQPHTPA NAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPIDFETNRMF 458
408 VLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPNPKIIRQEE 457
459 VLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPNPKIIRQEE 508
458 GLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPAN LKIDPVNGQITTIAV 507 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
509 GLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPAN LKIDPVNGQITTIAV 558
508 LDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDINDNAPQVLPQ 557
559 LDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDINDNAPQVLPQ 608
558 EAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRNWTITRLN 607
609 EAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKRN TITRLN 658 . . . . .
608 GDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDSNGD 657
659 GDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKVCQCDSNGD 708
658 CTDVDRIVGAGLGTGAIIAILLCIIILLILVLMFW MKRRDKERQAKQL 707
709 CTDVDRIVGAGLGTGAIIAILLCIIILLILVLMFWMKRRDKERQAKQL 758
708 LIDPEDDVRDNILKYDEEGGGEEDQDYDLSQLQQPDTVEPDAIKPVGIRR 757
759 LIDPEDDVRDNILKYDEEGGGEEDQDYDLSQLQQPDTVEPDAIKPVGIRR 808 758 MDERPIHAEPQYPVRSAAPHPGDIGDFINEGLKAADNDPTAPPYDSLLVF 807
809 MDERPIHAEPQYPVRSAAPHPGDIGDFINEGLKAADNDPTAPPYDSLLVF 858 808 DYEGSGSTAGSLSSLNSSSSGGEQDYDYLND GPRFKKLADMYGGGDD 855
859 DYEGSGSTAGSLSSLNSSSSGGEQDYDYLND GPRFKKLADMYGGGDD 906
Sequence name: CAD2_HUMAN
Sequence documentation:
Alignment of: S42303_PEA_1_P4 x CAD2_HUMAN
Alignment segment 1/1:
Quality: 8016.00
Escore: 0
Matching length: 821
Total length: 821
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment
1 MVYAVRSFPLSSEHAKFLIYAQDKETQEK QVAVKLSLKPTLTEESVKES 50
86 MVYAVRSFPLSSEHAKFLIYAQDKETQEK QVAVKLSLKPTLTEESVKES 135
51 AEVEEIVFPRQFSKHSGHLQRQKRDWVIPPINLPENSRGPFPQELVRIRS 100
136 AEVEEIVFPRQFSKHSGHLQRQKRD VIPPINLPENSRGPFPQELVRIRS 185
101 DRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHL 150
186 DRDKNLSLRYSVTGPGADQPPTGIFIINPISGQLSVTKPLDREQIARFHL 235
151 RAHAVDINGNQVENPIDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTY 200
236 RAHAVDINGNQVENPIDIVINVIDMNDNRPEFLHQV NGTVPEGSKPGTY 285
201 VMTVTAIDADDPNALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAA 250
286 VMTVTAIDADDPNALNGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAA 335
251 GLDREKVQQYTLIIQATDMEGNPTYGLSNTATAVITVTDVNDNPPEFTAM 300
336 GLDREKVQQYTLIIQATDMEGNPTYGLSNTATAVITVTDVNDNPPEFTAM 385
301 TFYGEVPENRVDIIVANLTVTDKDQPHTPAWNAVYRISGGDPTGRFAIQT 350
386 TFYGEVPENRVDIIVANLTVTDKDQPHTPAWNAVYRISGGDPTGRFAIQT 435 . . . . .
351 DPNSNDGLVTWKPIDFETNRMFVLTVAAENQVPLAKGIQHPPQSTATVS 400
436 DPNSNDGLVTWKPIDFETNRMFVLTVAAENQVPLAKGIQHPPQSTATVS 485
401 VTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRYT 450
486 VTVIDVNENPYFAPNPKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRYT 535
451 KLSDPANWLKIDPVNGQITTIAVLDRESPNVKNNIYNATFLASDNGIPPM 500 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
536 KLSDPAN LKIDPVNGQITTIAVLDRESPNVICNNIYNATFLASDNGIPPM 585
501 SGTGTLQIYLLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAG 550
586 SGTGTLQIYLLDINDNAPQVLPQEAETCETPDPNSINITALDYDIDPNAG 635
551 PFAFDLPLSPVTIKRN TITRLNGDFAQLNLKIKFLEAGIYEVPIIITDS 600
636 PFAFDLPLSPVTIKRN TITRLNGDFAQLNLKIKFLEAGIYEVPIIITDS 685 . . . . .
601 GNPPKSNISILRVKVCQCDSNGDCTDVDRIVGAGLGTGAIIAILLCIIIL 650
686 GNPPKSNISILRVKVCQCDSNGDCTDVDRIVGAGLGTGAIIAILLCIIIL 735
651 LILVLMFW MKRRDKERQAKQLLIDPEDDVRDNILKYDEEGGGEEDQDY 700
736 LILVLMFW MKRRDKERQAKQLLIDPEDDVRDNILKYDEEGGGEEDQDY 785
701 DLSQLQQPDTVEPDAIKPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDF 750
786 DLSQLQQPDTVEPDAIKPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDF 835 751 INEGLKAADNDPTAPPYDSLLVFDYEGSGSTAGSLSSLNSSSSGGEQDYD 800
836 INEGLKAADNDPTAPPYDSLLVFDYEGSGSTAGSLSSLNSSSSGGEQDYD 885 801 YLND GPRFKKLADMYGGGDD 821
886 YLNDWGPRFKKLADMYGGGDD 906
Sequence name: CAD2_HUMAN_V1
Sequence documentation:
Alignment of: S42303_PEA_1_P5 x CAD2_HUMAN_V1
Alignment segment 1/1:
Quality: 6793.00
Escore: 0
Matching length: 697
Total length: 697
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment :
1 MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVH 50 11111111111111 II I II 1111111111111111111111111111111 1 MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVH 50 51 EGQPLLNVKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHA 100 51 EGQPLLNVKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHA 100
101 KFLIYAQDKETQEKWQVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKH 150
101 KFLIYAQDKETQEK QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKH 150 . . . . . 151 SGHLQRQKRDWVIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGP 200
151 SGHLQRQKRDWVIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGP 200 201 GADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENP 250
201 GADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENP 250 251 IDIVINVIDMNDNRPEFLHQV NGTVPEGSKPGTYVMTVTAIDADDPNAL 300 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 251 IDIVINVIDMNDNRPEFLHQV NGTVPEGSKPGTYVMTVTAIDADDPNAL 300
301 NGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQ 350
301 NGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQ 350 . . . . .
351 ATDMEGNPTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIV 400
351 ATDMEGNPTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIV 400
401 ANLTVTDKDQPHTPAWNAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPI 450
401 ANLTVTDKDQPHTPA NAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPI 450
451 DFETNRMFVLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPN 500 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
451 DFETNRMFVLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPN 500
501 PKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPANWLKIDPVN 550
501 PKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPANWLKIDPVN 550
551 GQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDIND 600
551 GQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDIND 600 . . . . .
601 NAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKR 650
601 NAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKR 650
651 N TITRLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILR 697
651 N TITRLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILR 697
Sequence name: CAD2_HUMAN_V1
Sequence documentation:
Alignment of: S42303_PEA_1_P6 x CAD2_HUMAN_V1
Alignment segment 1/1:
Quality: 8166.00
Escore: 0
Matching length: 838
Total length: 838
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment :
1 MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVH 50
1 MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVH 50
EGQPLLNVKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHA 100
EGQPLLNVKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHA 100
KFLIYAQDKETQEK QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKH 150
KFLIYAQDKETQEK QVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKH 150
SGHLQRQKRD VIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGP 200 11111111111111111111111111111111111111111111111111 SGHLQRQKRD VIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGP 200
GADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENP 250
GADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVDINGNQVENP 250
IDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDADDPNAL 300
IDIVINVIDMNDNRPEFLHQVWNGTVPEGSKPGTYVMTVTAIDADDPNAL 300 . . . . . NGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQ 350
NGMLRYRIVSQAPSTPSPNMFTINNETGDIITVAAGLDREKVQQYTLIIQ 350
ATDMEGNPTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIV 400
ATDMEGNPTYGLSNTATAVITVTDVNDNPPEFTAMTFYGEVPENRVDIIV 400
ANLTVTDKDQPHTPA NAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPI 450 I I I I I || I I I I I I I || I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I ANLTVTDKDQPHTPA NAVYRISGGDPTGRFAIQTDPNSNDGLVTWKPI 450
451 DFETNRMFVLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPN 500
451 DFETNRMFVLTVAAENQVPLAKGIQHPPQSTATVSVTVIDVNENPYFAPN 500 . . . . .
501 PKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPANWLKIDPVN 550
501 PKIIRQEEGLHAGTMLTTFTAQDPDRYMQQNIRYTKLSDPANWLKIDPVN 550
551 GQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDIND 600
551 GQITTIAVLDRESPNVKNNIYNATFLASDNGIPPMSGTGTLQIYLLDIND 600
601 NAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKR 650 1111111111111111111111111111111111111111 II 11111111
601 NAPQVLPQEAETCETPDPNSINITALDYDIDPNAGPFAFDLPLSPVTIKR 650
651 N TITRLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKV 700
651 NWTITRLNGDFAQLNLKIKFLEAGIYEVPIIITDSGNPPKSNISILRVKV 700
701 CQCDSNGDCTDVDRIVGAGLGTGAIIAILLCIIILLILVLMFWWMKRRD 750
701 CQCDSNGDCTDVDRIVGAGLGTGAIIAILLCIIILLILVLMFW MKRRD 750 . . . . .
751 KERQAKQLLIDPEDDVRDNILKYDEEGGGEEDQDYDLSQLQQPDTVEPDA 800
751 KERQAKQLLIDPEDDVRDNILKYDEEGGGEEDQDYDLSQLQQPDTVEPDA 800
801 IKPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDFINE 838
801 IKPVGIRRMDERPIHAEPQYPVRSAAPHPGDIGDFINE 838
Sequence name: CAD2_HUMAN_V1
Sequence documentation:
Alignment of: S42303_PEA_1_P7 x CAD2_HUMAN_V1
Alignment segment 1/1:
Quality: 2292.00
Escore: 0
Matching length: 241
Total length: 241
Matching Percent Similarity: 98.76
Matching Percent Identity: 97.93
Total Percent Similarity: 98.76
Total Percent Identity: 97.93
Gaps: 0
Alignment:
1 MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVH 50
1 MCRIAGALRTLLPLLAALLQASVEASGEIALCKTGFPEDVYSAVLSKDVH 50

51 EGQPLLNVKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHA 100
51 EGQPLLNVKFSNCNGKRKVQYESSEPADFKVDEDGMVYAVRSFPLSSEHA 100

101 KFLIYAQDKETQEKWQVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKH 150
101 KFLIYAQDKETQEKWQVAVKLSLKPTLTEESVKESAEVEEIVFPRQFSKH 150

151 SGHLQRQKRDWVIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGP 200
151 SGHLQRQKRDWVIPPINLPENSRGPFPQELVRIRSDRDKNLSLRYSVTGP 200

201 GADQPPTGIFIINPISGQLSVTKPLDREQIARFHVRFQPAD 241
201 GADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVD 241
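The percent identity figures reported for each alignment can be checked by counting identical positions in an ungapped alignment. The sketch below uses the conventional definition (identical positions divided by aligned length); the source does not state the exact formula used by the alignment program, so this is an assumption, as are the function and variable names. It is applied to the last row of the S42303_PEA_1_P7 alignment, the only row in which the variant and the known protein differ.

```python
def percent_identity(variant: str, known: str) -> float:
    """Percentage of aligned positions that carry the same residue.

    Assumes an ungapped alignment, so both rows have equal length.
    """
    if len(variant) != len(known):
        raise ValueError("aligned rows must have equal length")
    identical = sum(a == b for a, b in zip(variant, known))
    return round(100.0 * identical / len(variant), 2)


# Residues 201-241 of the P7 alignment (variant row, then known-protein row).
variant_tail = "GADQPPTGIFIINPISGQLSVTKPLDREQIARFHVRFQPAD"
known_tail = "GADQPPTGIFIINPISGQLSVTKPLDREQIARFHLRAHAVD"

mismatches = sum(a != b for a, b in zip(variant_tail, known_tail))

# Rows 1-200 are identical in the alignment as shown, so the identity over
# the full matching length of 241 follows from the mismatch count alone.
full_length_identity = round(100.0 * (241 - mismatches) / 241, 2)
```

Under these assumptions the full-length value agrees with the reported Matching Percent Identity of 97.93.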
DESCRIPTION FOR CLUSTER HSMUC1A
Cluster HSMUC1A features 14 transcript(s) and 22 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Mucin 1 precursor (SwissProt accession identifier MUC1_HUMAN; known also according to the synonyms MUC-1; Polymorphic epithelial mucin; PEM; PEMT; Episialin; Tumor-associated mucin; Carcinoma-associated mucin; Tumor-associated epithelial membrane antigen; EMA; H23AG; Peanut-reactive urinary mucin; PUM; Breast carcinoma-associated antigen DF3; CD227 antigen), SEQ ID NO: 717, referred to herein as the previously known protein. Protein Mucin 1 precursor is known or believed to have the following function(s): may play a role in adhesive functions and in cell-cell interactions, metastasis and signaling. May provide a protective layer on epithelial surfaces. Direct or indirect interaction with actin cytoskeleton. Isoform 7 behaves as a receptor and binds the secreted isoform 5. The binding induces the phosphorylation of isoform 7, alters cellular morphology and initiates cell signaling. Can bind to GRB2 adapter protein. The sequence for protein Mucin 1 precursor is given at the end of the application, as "Mucin 1 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Mucin 1 (MUC1) is present on the apical surface of normal epithelial cells. Its extracellular domain consists of a heavily O-linked glycosylated peptide core made up of a variable number of repeats of a 20 amino acid sequence, referred to as VNTR (Variable Number Tandem Repeat). This variability results in natural polymorphism of MUC1. Each VNTR has five potential O-linkage sites. Disease state alters the enzymes which glycosylate Mucin 1, and therefore the polysaccharide side chains of tumor-associated MUC1 are generally shorter than those on the normally expressed molecule. Both aberrant and up-regulated expression of MUC1 are features of malignancy, and MUC1-related markers are based on these features. Products of the Mucin 1 gene (including CA 15-3 and CA-27-29) are used both for immunohistochemistry and serum test diagnosis of multiple malignant cancers, particularly breast cancer. The variants of the present invention are suitable for these diagnostic uses. Table 4 - Amino acid mutations for Known Protein
Protein Mucin 1 precursor localization is believed to be Type I membrane protein. Two secreted forms (5 and 9) are also produced. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer, breast; Cancer, lung, non-small cell; Cancer, ovarian; Cancer, prostate. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: CD8 agonist; DNA antagonist; Immunostimulant; Interferon gamma agonist; MUC-1 inhibitor. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticancer; Monoclonal antibody, murine; Immunotoxin; Immunostimulant; Immunoconjugate. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: actin binding, which are annotation(s) related to Molecular Function; and cytoskeleton; integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. Cluster HSMUC1A can be used as a diagnostic marker according to overexpression of transcripts of this cluster in cancer. Expression of such transcripts in normal tissues is also given according to the previously described methods.
The term "number" in the left hand column of the table and the numbers on the y-axis of the figure refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million).
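The "parts per million" figure described above is a simple ratio scaled by one million. The sketch below assumes the EST counts passed in have already been weighted, since the weighting scheme is defined elsewhere in the application; the function name and the example counts are hypothetical and are not taken from Table 5.

```python
def parts_per_million(cluster_est_count: float, category_est_count: float) -> float:
    """Expression of a cluster's ESTs relative to all ESTs in a category, in ppm.

    Both arguments are assumed to be already-weighted EST counts.
    """
    if category_est_count <= 0:
        raise ValueError("category EST count must be positive")
    return 1_000_000 * cluster_est_count / category_est_count


# Hypothetical example: 25 cluster ESTs among 500,000 ESTs in a tissue category.
example_ppm = parts_per_million(25, 500_000)
```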
Overall, the following results were obtained as shown with regard to the histograms in Figure 11 and Table 5. This cluster is overexpressed (at least at a minimum level) in the following pathological conditions: a mixture of malignant tumors from different tissues, breast malignant tumors, pancreas carcinoma and prostate cancer.
Table 5 - Normal tissue distribution
Table 6 - P values and ratios for expression in cancerous tissue
For this cluster, at least one oligonucleotide was found to demonstrate overexpression of the cluster, although not of at least one transcript/segment as listed below, demonstrating utility for diagnosis of breast and ovarian cancer. Microarray (chip) data is also available for this cluster as follows. Various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer, as previously described. The following oligonucleotides were found to hit this cluster but not other segments/transcripts below, shown in Table 7. Table 7 - Oligonucleotides related to this cluster
above. These transcript(s) encode for protein(s) which are variant(s) of protein Mucin 1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSMUC1A_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T26. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P25 is encoded by the following transcript(s): HSMUC1A_PEA_1_T26, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T26 is shown in bold; this coding portion starts at position 507 and ends at position 1115. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
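The start and end positions of a coding portion determine its length and codon count. The sketch below assumes the positions are 1-based and inclusive, which the text does not state explicitly, and it does not assume whether the end position includes a stop codon. The function names and the toy transcript string are hypothetical.

```python
def coding_region(start: int, end: int) -> tuple[int, int]:
    """Return (nucleotide length, codon count) of a 1-based, inclusive span."""
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not precede start")
    if length % 3 != 0:
        raise ValueError("coding span is not a whole number of codons")
    return length, length // 3


def extract_cds(transcript: str, start: int, end: int) -> str:
    """Slice the coding portion out of a transcript using 1-based coordinates."""
    return transcript[start - 1:end]


# Coding portion of transcript HSMUC1A_PEA_1_T26 as given in the text.
t26_length, t26_codons = coding_region(507, 1115)

# Toy transcript (not a real sequence) to show the coordinate convention.
toy_cds = extract_cds("AAATGGCCTAAGG", 3, 11)
```

Under the inclusive-coordinate assumption, each of the coding portions stated in this section for the HSMUC1A transcripts spans a whole number of codons.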
Variant protein HSMUC1A_PEA_1_P29 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T33. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P29 is encoded by the following transcript(s): HSMUC1A_PEA_1_T33, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T33 is shown in bold; this coding portion starts at position 507 and ends at position 953. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T34. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P30 is encoded by the following transcript(s): HSMUC1A_PEA_1_T34, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T34 is shown in bold; this coding portion starts at position 507 and ends at position 1004. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P32 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T36. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein HSMUC1A_PEA_1_P32 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P32 is encoded by the following transcript(s): HSMUC1A_PEA_1_T36, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T36 is shown in bold; this coding portion starts at position 507 and ends at position 977. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P32 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P36 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T40. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P36 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P36 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P36 is encoded by the following transcript(s): HSMUC1A_PEA_1_T40, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T40 is shown in bold; this coding portion starts at position 507 and ends at position 983. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P36 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P39 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T43. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P39 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P39 is encoded by the following transcript(s): HSMUC1A_PEA_1_T43, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T43 is shown in bold; this coding portion starts at position 507 and ends at position 914. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P39 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P45 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T29. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
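By way of non-limiting illustration only, the localization rule stated above (secreted when both signal-peptide prediction programs predict a signal peptide and neither trans-membrane region prediction program predicts a trans-membrane region) may be sketched as a simple decision function. The function name, the boolean inputs and the "undetermined" fallback are assumptions introduced for this sketch; they do not describe the software programs actually used.

```python
from typing import Sequence


def infer_localization(signal_peptide_calls: Sequence[bool],
                       transmembrane_calls: Sequence[bool]) -> str:
    """Hypothetical helper mirroring the rule stated in the text.

    signal_peptide_calls: one boolean per signal-peptide prediction program
        (True if that program predicts a signal peptide).
    transmembrane_calls: one boolean per trans-membrane region prediction
        program (True if that program predicts a trans-membrane region).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        # Assumption: with no predictions available, no call is made.
        return "undetermined"
    # Rule from the text: all signal-peptide programs predict a signal
    # peptide, and no trans-membrane program predicts a trans-membrane region.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Assumption: the text does not say how other combinations are resolved.
    return "undetermined"


# Two programs of each kind, as in the description above.
print(infer_localization([True, True], [False, False]))
```

Any other combination of predictions is deliberately left as "undetermined" in this sketch, since the description gives the rule only for the secreted case.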
Variant protein HSMUC1A_PEA_1_P45 is encoded by the following transcript(s): HSMUC1A_PEA_1_T29, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T29 is shown in bold; this coding portion starts at position 507 and ends at position 746. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P45 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P49 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T12. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P49 is encoded by the following transcript(s): HSMUC1A_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T12 is shown in bold; this coding portion starts at position 507 and ends at position 884. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P49 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P52 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T30. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P52 is encoded by the following transcript(s): HSMUC1A_PEA_1_T30, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T30 is shown in bold; this coding portion starts at position 507 and ends at position 719. The transcript also has the following SNPs as listed in Table 21 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P52 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P53 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T31. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P53 is encoded by the following transcript(s): HSMUC1A_PEA_1_T31, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T31 is shown in bold; this coding portion starts at position 507 and ends at position 665. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P53 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P56 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T42. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P56 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P56 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P56 is encoded by the following transcript(s): HSMUC1A_PEA_1_T42, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T42 is shown in bold; this coding portion starts at position 507 and ends at position 890. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P56 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P58 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T35. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSMUC1A_PEA_1_P58 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P58 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Amino acid mutations
Variant protein HSMUC1A_PEA_1_P58 is encoded by the following transcript(s): HSMUC1A_PEA_1_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T35 is shown in bold; this coding portion starts at position 507 and ends at position 980. The transcript also has the following SNPs as listed in Table 26 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P58 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P59 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T28. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSMUC1A_PEA_1_P59 is encoded by the following transcript(s): HSMUC1A_PEA_1_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T28 is shown in bold; this coding portion starts at position 507 and ends at position 794. The transcript also has the following SNPs as listed in Table 27 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P59 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Nucleic acid SNPs
Variant protein HSMUC1A_PEA_1_P63 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSMUC1A_PEA_1_T47. An alignment is given to the known protein (Mucin 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSMUC1A_PEA_1_P63 and MUC1_HUMAN:
1. An isolated chimeric polypeptide encoding for HSMUC1A_PEA_1_P63, comprising a first amino acid sequence being at least 90% homologous to MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV corresponding to amino acids 1 - 45 of MUC1_HUMAN, which also corresponds to amino acids 1 - 45 of HSMUC1A_PEA_1_P63, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK corresponding to amino acids 46 - 85 of HSMUC1A_PEA_1_P63, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSMUC1A_PEA_1_P63, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK in HSMUC1A_PEA_1_P63.
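As a non-limiting illustration, the head-and-tail description above may be checked with a short sketch that joins the two stated sequences contiguously and in sequential order, and confirms that the stated residue ranges (1 - 45 and 46 - 85) agree with the sequence lengths. Only the two sequences and the coordinates are taken from the comparison report; the function and variable names are assumptions introduced for this sketch.

```python
# Sequences copied from the comparison report above.
HEAD = "MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSV"  # amino acids 1 - 45
TAIL = "EEEVSADQVSVGASGVLGSFKEARNAPSFLSWSFSMGPSK"       # amino acids 46 - 85


def assemble_chimera(head: str, tail: str) -> str:
    """Join head and tail contiguously, in sequential order (hypothetical helper)."""
    return head + tail


def region(sequence: str, first: int, last: int) -> str:
    """Return residues first..last using 1-based, inclusive numbering as in the text."""
    return sequence[first - 1:last]


p63 = assemble_chimera(HEAD, TAIL)

# Lengths of the head, the tail and the assembled polypeptide.
print(len(HEAD), len(TAIL), len(p63))
# The stated coordinates recover the stated head and tail.
print(region(p63, 1, 45) == HEAD, region(p63, 46, 85) == TAIL)
```

The 1-based, inclusive coordinate convention used in `region` is the one implied by the ranges given in the report; it differs from the 0-based, end-exclusive slicing native to Python.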
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. The glycosylation sites of variant protein HSMUC1A_PEA_1_P63, as compared to the known protein Mucin 1 precursor, are described in Table 28 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 28 - Glycosylation site(s)
Variant protein HSMUC1A_PEA_1_P63 is encoded by the following transcript(s): HSMUC1A_PEA_1_T47, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSMUC1A_PEA_1_T47 is shown in bold; this coding portion starts at position 507 and ends at position 761. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSMUC1A_PEA_1_P63 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Nucleic acid SNPs
As noted above, cluster HSMUC1A features 22 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSMUC1A_PEA_1_node_0 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_14 according to the present invention is supported by 55 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_24 according to the present invention is supported by 135 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_29 according to the present invention is supported by 156 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_35 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T47. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 35.
Table 35 - Oligonucleotides related to this segment
Segment cluster HSMUC1A_PEA_1_node_38 according to the present invention is supported by 140 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSMUC1A_PEA_1_node_3 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T40 and HSMUC1A_PEA_1_T43. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_4 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_5 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_6 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 41, thereby demonstrating utility of the segment, transcript(s) containing the segment and/or proteins encoded by a transcript containing the segment with regard to diagnosis of ovarian cancer.
Table 41 - Oligonucleotides related to this segment
Segment cluster HSMUC1A_PEA_1_node_7 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Microarray (chip) data is also available for this segment as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 43, thereby demonstrating utility of the segment, transcript(s) containing the segment and/or proteins encoded by a transcript containing the segment with regard to diagnosis of ovarian cancer.
Table 43 - Oligonucleotides related to this segment
HSMUC1A_0_37_0    ovarian carcinoma    OVA
Segment cluster HSMUC1A_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33 and HSMUC1A_PEA_1_T40. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_18 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T40 and HSMUC1A_PEA_1_T42. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T42. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_21 according to the present invention is supported by 97 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T42. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_23 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_26 according to the present invention is supported by 129 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30 and HSMUC1A_PEA_1_T31. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_27 according to the present invention is supported by 140 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35 and HSMUC1A_PEA_1_T36. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_31 according to the present invention can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42 and HSMUC1A_PEA_1_T43. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_34 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T47. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_36 according to the present invention is supported by 135 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Segment cluster HSMUC1A_PEA_1_node_37 according to the present invention is supported by 146 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSMUC1A_PEA_1_T12, HSMUC1A_PEA_1_T26, HSMUC1A_PEA_1_T28, HSMUC1A_PEA_1_T29, HSMUC1A_PEA_1_T30, HSMUC1A_PEA_1_T31, HSMUC1A_PEA_1_T33, HSMUC1A_PEA_1_T34, HSMUC1A_PEA_1_T35, HSMUC1A_PEA_1_T36, HSMUC1A_PEA_1_T40, HSMUC1A_PEA_1_T42, HSMUC1A_PEA_1_T43 and HSMUC1A_PEA_1_T47. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: MUC1_HUMAN
Sequence documentation:
Alignment of: HSMUC1A_PEA_1_P63 x MUC1_HUMAN
Alignment segment 1/1:
Quality: 429.00
Escore: 0
Matching length: 59
Total length: 59
Matching Percent Similarity: 86.44
Matching Percent Identity: 81.36
Total Percent Similarity: 86.44
Total Percent Identity: 81.36
Gaps: 0
Alignment:
 1 MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVEEEVS 50
 1 MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVPSSTE 50

51 ADQVSVGAS 59
51 KNAVSMTSS 59
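For illustration only, the identity figure reported for this alignment may be reproduced by a position-by-position comparison of the two gap-free 59-residue rows. The sketch below rests on stated assumptions: the function name is hypothetical; the rows are written with the "TVV" reading of residues 19 - 21, as given in the comparison report for amino acids 1 - 45; and only identity is computed, because the similarity figure depends on a scoring matrix that is not specified here.

```python
# Upper row: HSMUC1A_PEA_1_P63, residues 1 - 59.
VARIANT_ROW = "MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVEEEVSADQVSVGAS"
# Lower row: MUC1_HUMAN, residues 1 - 59.
KNOWN_ROW = "MTPGTQSPFFLLLLLTVLTVVTGSGHASSTPGGEKETSATQRSSVPSSTEKNAVSMTSS"


def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned positions carrying the same residue.

    Hypothetical helper; assumes two equal-length rows without gaps.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = sum(a == b for a, b in zip(row_a, row_b))
    return 100.0 * matches / len(row_a)


print(len(VARIANT_ROW), len(KNOWN_ROW))
print(round(percent_identity(VARIANT_ROW, KNOWN_ROW), 2))
```

Under these assumptions, 48 of the 59 aligned positions are identical, which agrees with the Total Percent Identity of 81.36 reported for the alignment.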
Section 3 - Individual markers
This section features a plurality of individual markers which are suitable for diagnosing various diseases and conditions, as described herein. Each new subsection relates to a particular marker and/or group of markers.
Subsection A - Inhibins
Inhibin and activin are members of the transforming growth factor beta (TGFbeta) family of cytokines produced by the gonads, with a recognised role in regulating pituitary FSH secretion. Inhibin consists of two homologous subunits, alpha and either betaA or betaB (inhibin A and B). Activins are hetero- or homodimers of the beta-subunits. Activin A is a dimer of betaA subunits. In young girls, the concentrations of inhibin A increase as puberty progresses. Therefore, the measurement of inhibin A could aid in determining gonadal maturity and diagnosing precocious puberty in girls. Once women reach reproductive age, inhibin A levels change with the menstrual cycle. Levels rise through the follicular phase to a maximum in the luteal phase with an intermediate peak at ovulation. In the early perimenopausal phase of the menopausal transition, the circulating follicular phase levels of inhibin decline. In postmenopausal women, inhibin A levels fall to <5 pg/mL. Normal men do not produce measurable levels of inhibin A. Genes from the inhibin family are strongly associated with ovarian cancer. Serum inhibin is an ovarian product which decreases to non-detectable levels after menopause; however, certain ovarian cancers (mucinous carcinomas and sex cord stromal tumours such as granulosa cell tumours) continue to produce inhibin, which provides a basis for a serum diagnostic test. Inhibin and free alpha subunit are known products of two ovarian tumours (granulosa cell tumours and mucinous carcinomas). This observation has provided the basis for the development of a serum diagnostic test to monitor the occurrence and treatment of these cancers. Transgenic mice with an inhibin alpha subunit gene deletion develop stromal/granulosa cell tumours, suggesting that the alpha subunit is a tumour suppressor gene.
The role of inhibin and activin has been shown both as a measure of proven clinical utility in diagnosis and management and also as a factor in the pathogenesis of these tumours. Available data show that inhibin assays which detect all inhibin forms, i.e. assays which detect the alpha subunit both as the free form and as an alpha-beta subunit dimer, provide the highest sensitivity/specificity characteristics as an ovarian cancer diagnostic test.
In addition, these genes were associated with a diagnostic value in other ovary-related syndromes. Controlled experimental studies have clarified the regulation and physiology of inhibin A and inhibin B, providing evidence for their use as markers of ovarian function. Ongoing work suggests alterations in inhibin and follistatin that may be linked to the pathophysiology of polycystic ovary syndrome. Inhibins and activins therefore might be used as part of the assessment of fertility and infertility in women in general. Another ovary-related use is to allow differential diagnosis between functional and organic cysts. Inhibins can be used for the assessment of infertility in men too, as low levels of inhibin B, which appears to be produced only in the testes, may indicate blockage or abnormalities in the seminiferous tubules. Inhibins and activins have been associated with malignant processes other than ovarian cancer (including changes in serum levels), including but not limited to, prostate cancer (especially the loss of the inhibin alpha subunit expression); testicular tumors (especially Sertoli cell tumors); breast cancer; adrenal tumors; pituitary tumors; pancreas cancer; placental tumors; endometrial tumors; kidney tumors; and liver tumors. During pregnancy, the fetoplacental unit produces relatively large amounts of inhibin A. Assessment of inhibin A concentration relative to gestational age of fetus has been applied extensively to prenatal screening for Down's syndrome and in predicting pre-eclampsia. Hereinafter, the term "inhibins" refers to variants of both inhibins and activins unless otherwise specified. The present invention provides inhibin variants, which may optionally be used as diagnostic markers.
Preferably these inhibin variants are useful as diagnostic markers for various diseases and conditions including but not limited to various cancers such as ovarian cancer (optionally including changes in semm levels, preferably including mucinous carcinomas and sex cord stromal tumours such as granulosa cell tumours), prostate cancer (preferably the loss of the inhibin alpha subunit expression), testicular tumors (preferably Sertoli cell tumors), breast cancer, adrenal tumors, pituitary tumors, pancreas cancer, placental tumors, endometrial tumors, kidney tumors and liver tumors; determining gonadal maturity and diagnosing precious puberty in girls; fertility or infertility in men or women, preferably through detection of polycystic ovary syndrome (in women); differential diagnosis between
functional and organic cysts; assessment of inhibin A concentration relative to gestational age of fetus for prenatal screening for Down's syndrome and in predicting pre-eclampsia.
DESCRIPTION FOR CLUSTER HUMEDF Cluster HUMEDF features 3 transcript(s) and 8 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Inhibin beta A chain precursor (SwissProt accession identifier IHBA_HUMAN; known also according to the synonyms Activin beta-A chain; Erythroid differentiation protein; EDF), SEQ ID NO:743, referred to herein as the previously known protein. Protein Inhibin beta A chain precursor is known or believed to have the following function(s): inhibins and activins inhibit and activate, respectively, the secretion of follitropin by the pituitary gland. Inhibins/activins are involved in regulating a number of
diverse functions such as hypothalamic and pituitary hormone secretion, gonadal hormone secretion, germ cell development and maturation, erythroid differentiation, insulin secretion, nerve cell survival, embryonic axial development or bone growth, depending on their subunit composition. Inhibins appear to oppose the functions of activins. The sequence for protein Inhibin beta A chain precursor is given at the end of the application, as "Inhibin beta A chain precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Cancer; Osteoporosis; Contraceptive, female; Contraceptive, male; Diagnosis, cancer. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Erythroid differentiation factor agonist; Follicle-stimulating hormone agonist; Growth factor agonist; Inhibin agonist; Interleukin 6 antagonist; Osteoblast stimulant. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Haematological; Female contraceptive; Male contraceptive; Antianaemic; Osteoporosis treatment; Fertility enhancer; Anticancer; Diagnostic; Antisickling; Neurological; Alimentary/Metabolic. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: skeletal development; ovarian follicle development; induction of apoptosis; defense response; cell cycle arrest; cell surface receptor linked signal transduction; cell-cell signaling; neurogenesis; mesoderm development; cell growth and/or maintenance; response to external stimulus; cell differentiation; erythrocyte differentiation; growth, which are annotation(s) related to Biological Process; defense/immunity protein; cytokine; transforming growth factor beta receptor ligand; hormone; protein binding; growth factor; activin inhibitor, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMEDF features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Inhibin beta A chain precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMEDF_PEA_2_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMEDF_PEA_2_T10. An alignment is given to the known protein (Inhibin beta A chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMEDF_PEA_2_P5 and IHBA_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P5, comprising a first amino acid sequence being at least 90% homologous to MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELMEQTSEIITFAESGT corresponding to amino acids 1 - 131 of IHBA_HUMAN, which also corresponds to amino acids 1 - 131 of HUMEDF_PEA_2_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKS corresponding to amino acids 132 - 134 of HUMEDF_PEA_2_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKS in HUMEDF_PEA_2_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
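The localization reasoning stated above can be written as a small decision rule. The following is only a sketch: the class and function names are hypothetical, the two-predictor tuples simply mirror the "both" and "neither" wording of the text, and the fallback label "undetermined" is an assumption, since the text does not say what is concluded when the predictors disagree.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Predictions:
    """Boolean calls from two signal-peptide and two trans-membrane predictors."""
    signal_peptide: Tuple[bool, bool]
    transmembrane: Tuple[bool, bool]


def call_localization(p: Predictions) -> str:
    # "secreted" requires BOTH signal-peptide programs to predict a signal
    # peptide and NEITHER trans-membrane program to predict a TM region.
    if all(p.signal_peptide) and not any(p.transmembrane):
        return "secreted"
    # Assumption: any other combination is left undecided in this sketch.
    return "undetermined"


print(call_localization(Predictions((True, True), (False, False))))
```

With the calls described for the variant proteins in this section (signal peptide predicted by both programs, no trans-membrane region predicted by either), the rule returns "secreted".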
The glycosylation sites of variant protein HUMEDF_PEA_2_P5, as compared to the known protein Inhibin beta A chain precursor, are described in Table 5 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 5 - Glycosylation site(s)
Variant protein HUMEDF_PEA_2_P5 is encoded by the following transcript(s): HUMEDF_PEA_2_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMEDF_PEA_2_T10 is shown in bold; this coding portion starts at position 246 and ends at position 647. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMEDF_PEA_2_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
362 G -> C No
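As a consistency check on the coordinates above, the length of the coding portion can be compared with the length of the encoded variant. The sketch below assumes that the stated start and end positions are 1-based and inclusive, which the text does not state explicitly; the function name is hypothetical.

```python
def coding_span(start: int, end: int):
    """Return (nucleotides, whole codons, leftover nucleotides) for a span.

    Assumes 1-based, inclusive transcript coordinates.
    """
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return length, codons, remainder


# Coding portion of HUMEDF_PEA_2_T10 as stated: positions 246 to 647.
print(coding_span(246, 647))  # (402, 134, 0)
```

Under that assumption the span covers 402 nucleotides, that is, 134 complete codons, which agrees with the 134 amino acids described for HUMEDF_PEA_2_P5 (amino acids 1 - 131 followed by VKS at 132 - 134).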
Variant protein HUMEDF_PEA_2_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMEDF_PEA_2_T11. An alignment is given to the known protein (Inhibin beta A chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMEDF_PEA_2_P6 and IHBA_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P6, comprising a first amino acid sequence being at least 90% homologous to MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELMEQTSEIITFAESG corresponding to amino acids 1 - 130 of IHBA_HUMAN, which also corresponds to amino acids 1 - 130 of HUMEDF_PEA_2_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence HSEA corresponding to amino acids 131 - 134 of HUMEDF_PEA_2_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence HSEA in HUMEDF_PEA_2_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
The glycosylation sites of variant protein HUMEDF_PEA_2_P6, as compared to the known protein Inhibin beta A chain precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 7 - Glycosylation site(s)
Variant protein HUMEDF_PEA_2_P6 is encoded by the following transcript(s): HUMEDF_PEA_2_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMEDF_PEA_2_T11 is shown in bold; this coding portion starts at position 246 and ends at position 647. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMEDF_PEA_2_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Variant protein HUMEDF_PEA_2_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMEDF_PEA_2_T5. An alignment is given to the known protein (Inhibin beta A chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMEDF_PEA_2_P8 and IHBA_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMEDF_PEA_2_P8, comprising a first amino acid sequence being at least 90% homologous to MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNSQPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGYVEIEDDIGRRAEMNELMEQTSEIITFAESGT corresponding to amino acids 1 - 131 of IHBA_HUMAN, which also corresponds to amino acids 1 - 131 of HUMEDF_PEA_2_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VKS corresponding to amino acids 132 - 134 of HUMEDF_PEA_2_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMEDF_PEA_2_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VKS in HUMEDF_PEA_2_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
The glycosylation sites of variant protein HUMEDF_PEA_2_P8, as compared to the known protein Inhibin beta A chain precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 9 - Glycosylation site(s)
Variant protein HUMEDF_PEA_2_P8 is encoded by the following transcript(s): HUMEDF_PEA_2_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMEDF_PEA_2_T5 is shown in bold; this coding portion starts at position 246 and ends at position 647. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMEDF_PEA_2_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
As noted above, cluster HUMEDF features 8 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMEDF_PEA_2_node_6 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5, HUMEDF_PEA_2_T10 and HUMEDF_PEA_2_T11. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_11 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T10 and HUMEDF_PEA_2_T11. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_18 according to the present invention is supported by 90 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5. Table 13 below describes the starting and ending position of this segment on each transcript. Table 13 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_19 according to the present invention is supported by 86 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_22 according to the present invention is supported by 89 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMEDF_PEA_2_node_2 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5, HUMEDF_PEA_2_T10 and HUMEDF_PEA_2_T11. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_8 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5 and HUMEDF_PEA_2_T10. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster HUMEDF_PEA_2_node_20 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMEDF_PEA_2_T5. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: IHBA_HUMAN
Sequence documentation:
Alignment of: HUMEDF_PEA_2_P5 x IHBA_HUMAN
Alignment segment 1/1:
Quality: 1285.00
Escore: 0
Matching length: 133
Total length: 133
Matching Percent Similarity: 99.25
Matching Percent Identity: 98.50
Total Percent Similarity: 99.25
Total Percent Identity: 98.50
Gaps: 0
Alignment:
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
101 VEIEDDIGRRAEMNELMEQTSEIITFAESGTVK 133
    ||||||||||||||||||||||||||||||| :
101 VEIEDDIGRRAEMNELMEQTSEIITFAESGTAR 133
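The identity and similarity percentages reported for the HUMEDF_PEA_2_P5 alignment can be reproduced from the aligned residues. The sketch below is only an illustration for an ungapped alignment of equal-length sequences; the function names are hypothetical, and treating the K/R pair as "similar" is an assumption read off the ":" marker in the alignment, not a statement about the scoring matrix actually used.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical residues (ungapped)."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def percent_similarity(a: str, b: str, similar_pairs) -> float:
    """Like percent_identity, but also counts listed conservative pairs."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    hits = sum(
        1 for x, y in zip(a, b)
        if x == y or frozenset((x, y)) in similar_pairs
    )
    return 100.0 * hits / len(a)


# Shared first 131 residues, copied from the alignment rows.
head = (
    "MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS"
    "QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY"
    "VEIEDDIGRRAEMNELMEQTSEIITFAESGT"
)
variant = head + "VK"  # HUMEDF_PEA_2_P5, positions 1 - 133
known = head + "AR"    # IHBA_HUMAN, positions 1 - 133

similar = {frozenset(("K", "R"))}  # assumption: K/R counted as similar
print(f"{percent_identity(variant, known):.2f}")
print(f"{percent_similarity(variant, known, similar):.2f}")
```

Over the 133 aligned positions, 131 are identical and one further position is a K/R pair, which corresponds to the 98.50 identity and 99.25 similarity figures given in the alignment header.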
Sequence name: IHBA_HUMAN
Sequence documentation:
Alignment of: HUMEDF_PEA_2_P6 x IHBA_HUMAN
Alignment segment 1/1:
Quality: 1275.00
Escore: 0
Matching length: 130
Total length: 130
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
101 VEIEDDIGRRAEMNELMEQTSEIITFAESG 130
    ||||||||||||||||||||||||||||||
101 VEIEDDIGRRAEMNELMEQTSEIITFAESG 130
Sequence name: IHBA_HUMAN
Sequence documentation:
Alignment of: HUMEDF_PEA_2_P8 x IHBA_HUMAN
Alignment segment 1/1:
Quality: 1285.00
Escore: 0
Matching length: 133
Total length: 133
Matching Percent Similarity: 99.25
Matching Percent Identity: 98.50
Total Percent Similarity: 99.25
Total Percent Identity: 98.50
Gaps: 0
Alignment:
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPLLWLRGFLLASCWIIVRSSPTPGSEGHSAAPDCPSCALAALPKDVPNS 50
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 QPEMVEAVKKHILNMLHLKKRPDVTQPVPKAALLNAIRKLHVGKVGENGY 100
101 VEIEDDIGRRAEMNELMEQTSEIITFAESGTVK 133
    ||||||||||||||||||||||||||||||| :
101 VEIEDDIGRRAEMNELMEQTSEIITFAESGTAR 133
DESCRIPTION FOR CLUSTER HUMINHA Cluster HUMINHA features 4 transcript(s) and 13 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
These sequences are variants of the known protein Inhibin alpha chain precursor (SwissProt accession identifier IHA_HUMAN), SEQ ID NO:764, referred to herein as the previously known protein. Protein Inhibin alpha chain precursor is known or believed to have the following function(s) as described with regard to the previous cluster. The sequence for protein Inhibin alpha chain precursor is given at the end of the application, as "Inhibin alpha chain precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Contraceptive, female; Contraceptive, male; Diagnosis, cancer; Diagnosis. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Inhibin agonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Female contraceptive; Male contraceptive; Diagnostic; Antisickling; Anticancer. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: skeletal development; ovarian follicle development; induction of apoptosis; cell cycle arrest; cell surface receptor linked signal transduction; cell-cell signaling; neurogenesis; cell growth and/or maintenance; response to external stimulus; cell differentiation; erythrocyte differentiation, which are annotation(s) related to Biological Process; defense/immunity protein; cytokine; hormone; protein binding; growth factor; activin inhibitor, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMINHA features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Inhibin alpha chain precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMINHA_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMINHA_PEA_1_T5. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMINHA_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMINHA_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
Variant protein HUMINHA_PEA_1_P4 is encoded by the following transcript(s): HUMINHA_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMINHA_PEA_1_T5 is shown in bold; this coding portion starts at position 157 and ends at position 912. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMINHA_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Variant protein HUMINHA_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMINHA_PEA_1_T6. An alignment is given to the known protein (Inhibin alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HUMINHA_PEA_1_P5 and IHA_HUMAN: 1. An isolated chimeric polypeptide encoding for HUMINHA_PEA_1_P5, comprising a first amino acid sequence being at least 90% homologous to MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREGGDPGVRRLPRRHALGGFTHRGSEPEEEEDVSQAILFPAT corresponding to amino acids 1 - 89 of IHA_HUMAN, which also corresponds to amino acids 1 - 89 of HUMINHA_PEA_1_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GSAPQRPVAMTTAQRDSLLWKLAGLLRESGDVVLSGCSTLSLLTPTLQQLNHVFELHLGPWGPGQTGFV corresponding to amino acids 90 - 158 of HUMINHA_PEA_1_P5, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HUMINHA_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GSAPQRPVAMTTAQRDSLLWKLAGLLRESGDVVLSGCSTLSLLTPTLQQLNHVFELHLGPWGPGQTGFV in HUMINHA_PEA_1_P5.
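The "contiguous and in a sequential order" arrangement of the two amino acid sequences can be illustrated by joining them and deriving the residue ranges each occupies. This is a minimal sketch only: the function name is hypothetical, the ranges are 1-based and inclusive to match the numbering used in the comparison report, and the sequences are the 100%-identical case copied from the report rather than any of the lower-homology alternatives.

```python
def assemble_chimera(head: str, tail: str):
    """Join head and tail contiguously; return the sequence and 1-based ranges."""
    sequence = head + tail
    head_range = (1, len(head))
    tail_range = (len(head) + 1, len(sequence))
    return sequence, head_range, tail_range


# First amino acid sequence (shared with IHA_HUMAN) for HUMINHA_PEA_1_P5.
head = (
    "MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREGGDPGVR"
    "RLPRRHALGGFTHRGSEPEEEEDVSQAILFPAT"
)
# Second amino acid sequence (the variant-specific tail).
tail = (
    "GSAPQRPVAMTTAQRDSLLWKLAGLLRESGDVVLSGCSTLSLLTPTLQQLNHVFEL"
    "HLGPWGPGQTGFV"
)

sequence, head_range, tail_range = assemble_chimera(head, tail)
print(head_range, tail_range, len(sequence))  # (1, 89) (90, 158) 158
```

The derived ranges agree with the positions given in the comparison report: amino acids 1 - 89 for the first sequence and 90 - 158 for the tail.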
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMINHA_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMINHA_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
The glycosylation sites of variant protein HUMINHA_PEA_1_P5, as compared to the known protein Inhibin alpha chain precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 8 - Glycosylation site(s)
Variant protein HUMINHA PEA 1 P5 is encoded by the following transcript(s): HUMINHA_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMI HA_PEA_1_T6 is shown in bold; this coding portion starts at position 157 and ends at position 88889. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMINHA_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein HUMINHA_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end ofthe application; it is encoded by transcript(s) HUMINHA_PEA_1_T2. The location ofthe variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal
peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HUMINHA_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymoφhisms) as listed in Table 10, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMINHA_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
Variant protein HUMINHA_PEA_1_P8 is encoded by the following franscript(s): HUMΓNHA_PEA_1_T2, for which the sequence(s) is/are given at the end ofthe application. The coding portion of transcript HUMI HA_PEA_1_T2 is shown in bold; this coding portion starts at position 157 and ends at position 696. The transcript also has the following SNPs as listed in Table 1 1 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMINHA PEA 1 P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Nucleic acid SNPs
Variant protein HUMINHA_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMINHA_PEA_1_T4. An alignment is given to the known protein (Inhibin alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMINHA_PEA_1_P10 and IHA_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMINHA_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREGGDPGVRRLPRRHALGGFTHRGSEPEEEEDVSQAILFPATDASCEDKSAARGLAQEAEEGLFRYMFRPSQHTR corresponding to amino acids 1 - 122 of IHA_HUMAN, which also corresponds to amino acids 1 - 122 of HUMINHA_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NHPVEGREPDAQLP corresponding to amino acids 123 - 136 of HUMINHA_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMINHA_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NHPVEGREPDAQLP in HUMINHA_PEA_1_P10.
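The coordinates in the comparison report above can be checked mechanically. The following is a minimal Python sketch, not part of the disclosed method: it joins the first (shared) sequence and the tail sequence quoted in the report and confirms that the lengths agree with the stated amino acid ranges (1 - 122 and 123 - 136). The names HEAD, TAIL and assemble_variant are hypothetical and chosen only for illustration.

```python
# Sequences copied from the comparison report for HUMINHA_PEA_1_P10.
# HEAD is the portion shared with IHA_HUMAN; TAIL is the variant-specific tail.
HEAD = (
    "MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREGGDPGVR"
    "RLPRRHALGGFTHRGSEPEEEEDVSQAILFPATDASCEDKSAARGLAQEAEEGLFRY"
    "MFRPSQHTR"
)
TAIL = "NHPVEGREPDAQLP"


def assemble_variant(head: str, tail: str) -> str:
    """Join head and tail contiguously and in sequential order."""
    return head + tail


variant = assemble_variant(HEAD, TAIL)

# 1-based, inclusive ranges as used in the report.
head_range = (1, len(HEAD))
tail_range = (len(HEAD) + 1, len(variant))

print(head_range)  # (1, 122)
print(tail_range)  # (123, 136)
```

The same check applies to any head/tail pair described in this format; only the two sequence constants need to change.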
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMINHA_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 12 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMINHA_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Amino acid mutations
The glycosylation sites of variant protein HUMINHA_PEA_1_P10, as compared to the known protein Inhibin alpha chain precursor, are described in Table 13 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 13 - Glycosylation site(s)
Variant protein HUMINHA_PEA_1_P10 is encoded by the following transcript(s): HUMINHA_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMINHA_PEA_1_T4 is shown in bold; this coding portion starts at position 157 and ends at position 564. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMINHA_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Nucleic acid SNPs
As noted above, cluster HUMINHA features 13 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMINHA_PEA_1_node_2 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2, HUMINHA_PEA_1_T4, HUMINHA_PEA_1_T5 and HUMINHA_PEA_1_T6. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_3 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2, HUMINHA_PEA_1_T4, HUMINHA_PEA_1_T5 and HUMINHA_PEA_1_T6. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_4 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_7 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_9 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2, HUMINHA_PEA_1_T4 and HUMINHA_PEA_1_T5. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_10 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2, HUMINHA_PEA_1_T4 and HUMINHA_PEA_1_T5. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_16 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T6. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMINHA_PEA_1_node_5 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2, HUMINHA_PEA_1_T4 and HUMINHA_PEA_1_T5. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_6 according to the present invention can be found in the following transcript(s): HUMINHA_PEA_1_T2 and HUMINHA_PEA_1_T5. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_8 according to the present invention can be found in the following transcript(s): HUMINHA_PEA_1_T2 and HUMINHA_PEA_1_T4. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_11 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2, HUMINHA_PEA_1_T4 and HUMINHA_PEA_1_T5. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_12 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T2, HUMINHA_PEA_1_T4 and HUMINHA_PEA_1_T5. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HUMINHA_PEA_1_node_14 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMINHA_PEA_1_T6. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: IHA_HUMAN
Sequence documentation:
Alignment of: HUMINHA_PEA_1_P5 x IHA_HUMAN
Alignment segment 1/1:
Quality: 848.00
Escore: 0
Matching length: 89
Total length: 89
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREG 50
 51 GDPGVRRLPRRHALGGFTHRGSEPEEEEDVSQAILFPAT 89
    |||||||||||||||||||||||||||||||||||||||
 51 GDPGVRRLPRRHALGGFTHRGSEPEEEEDVSQAILFPAT 89
Sequence name: IHA_HUMAN
Sequence documentation:
Alignment of: HUMINHA_PEA_1_P10 x IHA_HUMAN
Alignment segment 1/1:
Quality: 1178.00
Escore: 0
Matching length: 126
Total length: 126
Matching Percent Similarity: 98.41
Matching Percent Identity: 97.62
Total Percent Similarity: 98.41
Total Percent Identity: 97.62
Gaps: 0
Alignment:
  1 MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREG 50
 51 GDPGVRRLPRRHALGGFTHRGSEPEEEEDVSQAILFPATDASCEDKSAAR 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GDPGVRRLPRRHALGGFTHRGSEPEEEEDVSQAILFPATDASCEDKSAAR 100
101 GLAQEAEEGLFRYMFRPSQHTRNHPV 126
    ||||||||||||||||||||||:  |
101 GLAQEAEEGLFRYMFRPSQHTRSRQV 126
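The "Percent Identity" figure reported for the HUMINHA_PEA_1_P10 x IHA_HUMAN alignment can be reproduced from the two aligned sequences. The sketch below is an illustration only and assumes that identity is the number of identical aligned positions divided by the alignment length for a gap-free alignment; it does not attempt the similarity figure, which depends on a substitution matrix that is not specified here. The function name percent_identity is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions in a gap-free, equal-length alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * identical / len(seq_a)


# Residues 1-122 are shared by both sequences in the alignment.
SHARED = (
    "MVLHLLLFLLLTPQGGHSCQGLELARELVLAKVRALFLDALGPPAVTREG"
    "GDPGVRRLPRRHALGGFTHRGSEPEEEEDVSQAILFPATDASCEDKSAAR"
    "GLAQEAEEGLFRYMFRPSQHTR"
)
variant_1_126 = SHARED + "NHPV"  # HUMINHA_PEA_1_P10, residues 1-126
known_1_126 = SHARED + "SRQV"    # IHA_HUMAN, residues 1-126

identity = percent_identity(variant_1_126, known_1_126)
print(f"{identity:.2f}")  # 97.62
```

Here 123 of the 126 aligned positions are identical, which agrees with the reported value of 97.62.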
Subsection B: CD117 (KIT)
CD117 (also called KIT) is a 954 aa (108 kDa) type I membrane protein. It is expressed on numerous diverse fetal and adult cells, including hematopoietic cells, mast cells, melanocytes, germ cells, and the interstitial cells of Cajal (Adv Anat Pathol. 2002 Jan;9(1):65-9). CD117 is the receptor for stem cell factor (mast cell growth factor) and has tyrosine-protein kinase activity. Binding of the ligand leads to autophosphorylation of the receptor and its association with substrates such as phosphatidylinositol 3-kinase (PI3K). CD117 is used mostly as an immunohistochemistry marker for gastrointestinal stromal tumors, mast cell tumors, and seminomatous germ cell tumors. Of particular therapeutic importance is the fact that gastrointestinal stromal tumors have been recognized as a biologically distinctive tumor type, different from smooth muscle and neural tumors of the gastrointestinal tract. The finding of remarkable antitumor effects of the molecular inhibitor imatinib (Glivec) in metastatic and inoperable gastrointestinal stromal tumors has necessitated accurate diagnosis of these tumors and their distinction from other gastrointestinal mesenchymal tumors (Pathol Oncol Res. 2003;9(1):13-9). CD117 makes a major contribution to the definitive diagnosis of this tumor type. Recently, evidence has started to emerge for a role of CD117 in the diagnosis of lung cancer. Immunoreactivity for CD117 in small cell lung carcinoma has been well established. However, CD117 immunostaining in other lung tumors is still under research. Preliminary results are encouraging. Pelosi et al. showed that downregulation of CD117 by neoadjuvant chemotherapy was seen in large-cell neuroendocrine carcinomas but not in small-cell carcinomas. In addition, membrane CD117 immunoreactivity in 5% or more of tumor cells was documented in 77% of large-cell neuroendocrine carcinomas and 67% of small-cell carcinomas, but rarely in carcinoid tumors (Virchows Arch. 2004 Sep 16).
The same group also showed the importance of CD117 in the diagnosis of squamous cell carcinoma of the lung (Mod Pathol. 2004 Jun;17(6):711-21). According to the present invention, the splice variants described herein are non-limiting examples of markers for diagnosing CD117-detectable cancers. Each splice variant marker of the present invention can be used alone or in combination, for various uses, including but not limited to, prognosis, prediction, screening, early diagnosis, staging, therapy selection and treatment monitoring of CD117-detectable cancers. Examples of such CD117-detectable cancers (which are cancers detectable through a CD117 variant according to the present invention) include but are not limited to gastrointestinal stromal tumors, mast cell tumors, and seminomatous germ cell tumors. These markers are overexpressed in cancer specifically, as opposed to normal tissues. The measurement of these markers, alone or in combination, in patient samples provides information that the diagnostician can correlate with a probable diagnosis of cancer. The markers of the present invention, alone or in combination, show a high degree of differential detection between cancerous and normal states. For example, optionally and preferably, these markers may be used for staging CD117-detectable cancers and/or monitoring the progression of the disease. Furthermore, the markers of the present invention, alone or in combination, can be used for detection of the source of metastasis found in anatomical places other than gastrointestinal stroma, mast cell, seminomatous germ cell and lung tissues. Also, one or more of the markers may optionally be used in combination with one or more other cancer markers (including those described herein and other than those described herein). According to an optional embodiment of the present invention, such a combination may be used to differentiate between various types of gastrointestinal stromal tumors, mast cell tumors, seminomatous germ cell tumors and optionally lung cancers.
The markers of the present invention were tested with regard to their expression in various cancerous and non-cancerous tissue samples. A description of the samples used in the prostate cancer testing panel is provided in Table 1 below. A description of the samples used in the ovarian cancer testing panel is provided in Table 2 below. A description of the samples used in the colon cancer testing panel is provided in Table 3 below. A description of the samples used in the lung cancer testing panel is provided in Table 4 below. A description of the samples used in the breast cancer testing panel is provided in Table 5 below. A description of the samples used in the normal tissue panel is provided in Table 6 below. Tests were then performed as described in the "Materials and Experimental Procedures" section below.
Table 1: Tissue samples in prostate cancer testing panel
Table 2: Tissue samples in ovarian cancer testing panel
Table 3 : Tissue samples in colon cancer testing panel
Table 4: Tissue samples in lung cancer testing panel
Table 5: Tissue samples in breast cancer testing panel
Table 6: Tissue samples in normal panel:
Materials and Experimental Procedures
RNA preparation - RNA was obtained from Clontech (Franklin Lakes, NJ USA 07417, www.clontech.com), BioChain Inst. Inc. (Hayward, CA 94545 USA www.biochain.com), ABS (Wilmington, DE 19801, USA, www.absbioreagents.com) or Ambion (Austin, TX 78744 USA, www.ambion.com). Alternatively, RNA was generated from tissue samples using TRI-Reagent (Molecular Research Center), according to the manufacturer's instructions. Tissue and RNA samples were obtained from patients or postmortem. Total RNA samples were treated with DNase I (Ambion) and purified using RNeasy columns (Qiagen).
RT PCR - Purified RNA (1 μg) was mixed with 150 ng Random Hexamer primers (Invitrogen) and 500 μM dNTP in a total volume of 15.6 μl. The mixture was incubated for 5 min at 65 °C and then quickly chilled on ice. Thereafter, 5 μl of 5X SuperScript II first strand buffer (Invitrogen), 2.4 μl 0.1 M DTT and 40 units RNasin (Promega) were added, and the mixture was incubated for 10 min at 25 °C, followed by further incubation at 42 °C for 2 min. Then, 1 μl (200 units) of SuperScript II (Invitrogen) was added and the reaction (final volume of 25 μl) was incubated for 50 min at 42 °C and then inactivated at 70 °C for 15 min. The resulting cDNA was diluted 1:20 in TE buffer (10 mM Tris pH=8, 1 mM EDTA pH=8).
Real-Time RT-PCR analysis - cDNA (5 μl), prepared as described above, was used as a template in Real-Time PCR reactions using the SYBR Green I assay (PE Applied Biosystems) with specific primers and UNG Enzyme (Eurogentec or ABI or Roche). The amplification was effected as follows: 50 °C for 2 min, 95 °C for 10 min, and then 40 cycles of 95 °C for 15 sec, followed by 60 °C for 1 min. Detection was performed by using the PE Applied Biosystems SDS 7000. The cycle in which the reactions achieved a threshold level (Ct) of fluorescence was registered and was used to calculate the relative transcript quantity in the RT reactions. The relative quantity was calculated using the equation Q = efficiency^(-Ct). The efficiency of the PCR reaction was calculated from a standard curve, created by using serial dilutions of several reverse transcription (RT) reactions. To minimize inherent differences in the RT reaction, the resulting relative quantities were normalized to the geometric mean of the relative quantities of several housekeeping (HSKP) genes. A schematic summary of quantitative real-time PCR analysis is presented in Figure 12. As shown, the x-axis shows the cycle number. Ct is the Threshold Cycle point, which is the cycle at which the amplification curve crosses the fluorescence threshold that was set in the experiment. This point is a calculated cycle number at which the PCR product signal is above the background level (passive dye ROX) and still in the Geometric/Exponential phase (as shown, once the level of fluorescence crosses the measurement threshold, it has a geometrically increasing phase, during which measurements are most accurate, followed by a linear phase and a plateau phase; for quantitative measurements, the latter two phases do not provide accurate measurements). The y-axis shows the normalized reporter fluorescence. It should be noted that this type of analysis provides relative quantification.
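The relative quantification described above can be sketched in a few lines of Python. This is an illustrative sketch only, reading the relative-quantity equation as efficiency raised to the power of minus Ct and normalizing to the geometric mean of the housekeeping-gene quantities. The function names and the numeric values (efficiencies of 2.0 and the Ct values) are assumptions made up for the example, not data from the experiments.

```python
import math


def relative_quantity(efficiency: float, ct: float) -> float:
    """Relative transcript quantity: Q = efficiency ** (-Ct)."""
    return efficiency ** (-ct)


def geometric_mean(values):
    """Geometric mean, computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_quantity(target, housekeeping):
    """Normalize a target gene to the geometric mean of housekeeping genes.

    target: (efficiency, ct) for the gene of interest.
    housekeeping: list of (efficiency, ct) pairs, one per housekeeping gene.
    """
    target_q = relative_quantity(*target)
    hskp_q = [relative_quantity(eff, ct) for eff, ct in housekeeping]
    return target_q / geometric_mean(hskp_q)


# Hypothetical values: perfect doubling (efficiency 2.0) for every primer pair.
result = normalized_quantity(
    target=(2.0, 25.0),
    housekeeping=[(2.0, 20.0), (2.0, 22.0)],
)
print(result)  # approximately 0.0625, i.e. 2 ** -4
```

In practice each primer pair would have its own efficiency taken from its standard curve, which is why the efficiency is passed per gene rather than fixed globally.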
The sequences of the housekeeping genes measured in all the examples below on prostate panel were as follows:
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer (SEQ ID NO: 1367): TGGGAACAAGAGGGCATCTG
SDHA Reverse primer (SEQ ID NO: 1368): CCACCACTGCATCAAATTCATG
SDHA-amplicon (SEQ ID NO: 1369):
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG
PBGD (GenBank Accession No. BC019323)
PBGD Forward primer (SEQ ID NO: 1370): TGAGAGTGATTCGCGTGGG
PBGD Reverse primer (SEQ ID NO: 1371): CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon (SEQ ID NO: 1372):
TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG
HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer (SEQ ID NO: 1373): TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer (SEQ ID NO: 1374): GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon (SEQ ID NO: 1375):
TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
RPL19 (GenBank Accession No. NM_000981)
RPL19 Forward primer (SEQ ID NO: 1376): TGGCAAGAAGAAGGTCTGGTTAG
RPL19 Reverse primer (SEQ ID NO: 1377): TGATCAGCCCATCTTTGATGAG
RPL19-amplicon (SEQ ID NO: 1378):
TGGCAAGAAGAAGGTCTGGTTAGACCCCAATGAGACCAATGAAATCGCCAATGCCAACTCCCGTCAGCAGATCCGGAAGCTCATCAAAGATGGGCTGATCA
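A listed primer pair can be checked against its amplicon: the amplicon is expected to begin with the forward primer and to end with the reverse complement of the reverse primer. The Python sketch below shows this check for the SDHA entry given above. It is an illustration under that assumption, not a step of the described procedure, and the helper names reverse_complement and primers_flank_amplicon are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer (case-insensitive)."""
    amplicon = amplicon.upper()
    return amplicon.startswith(forward.upper()) and amplicon.endswith(
        reverse_complement(reverse.upper())
    )


# SDHA primers and amplicon as listed (SEQ ID NOs: 1367, 1368, 1369).
SDHA_FORWARD = "TGGGAACAAGAGGGCATCTG"
SDHA_REVERSE = "CCACCACTGCATCAAATTCATG"
SDHA_AMPLICON = (
    "TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCA"
    "GTAGTGGATCATGAATTTGATGCAGTGGTGG"
)

sdha_ok = primers_flank_amplicon(SDHA_FORWARD, SDHA_REVERSE, SDHA_AMPLICON)
print(sdha_ok)  # True
```

The comparison is case-insensitive so that entries listed in lower case can be checked with the same function.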
The sequences of the housekeeping genes measured in all the examples on ovarian cancer panel were as follows:
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer (SEQ ID NO: 1367): TGGGAACAAGAGGGCATCTG
SDHA Reverse primer (SEQ ID NO: 1368): CCACCACTGCATCAAATTCATG
SDHA-amplicon (SEQ ID NO: 1369):
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG
PBGD (GenBank Accession No. BC019323)
PBGD Forward primer (SEQ ID NO: 1370): TGAGAGTGATTCGCGTGGG
PBGD Reverse primer (SEQ ID NO: 1371): CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon (SEQ ID NO: 1372):
TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG
HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer (SEQ ID NO: 1373): TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer (SEQ ID NO: 1374): GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon (SEQ ID NO: 1375):
TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
GAPDH (GenBank Accession No. BC026907)
GAPDH Forward primer (SEQ ID NO: 1379): TGCACCACCAACTGCTTAGC
GAPDH Reverse primer (SEQ ID NO: 1380): CCATCACGCCACAGTTTCC
GAPDH-amplicon (SEQ ID NO: 1381):
TGCACCACCAACTGCTTAGCACCCCTGGCCAAGGTCATCCATGACAACTTTGGTATCGTGGAAGGACTCATGACCACAGTCCATGCCATCACTGCCACCCAGAAGACTGTGGATGG
The sequences of the housekeeping genes measured in all the examples on colon cancer tissue testing panel were as follows:
PBGD (GenBank Accession No. BC019323)
PBGD Forward primer (SEQ ID NO: 1370): TGAGAGTGATTCGCGTGGG
PBGD Reverse primer (SEQ ID NO: 1371): CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon (SEQ ID NO: 1372):
TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG
HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer (SEQ ID NO: 1373): TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer (SEQ ID NO: 1374): GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon (SEQ ID NO: 1375):
TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
G6PD (GenBank Accession No. NM_000402)
G6PD Forward primer (SEQ ID NO: 1382): gaggccgtcaccaagaacat
G6PD Reverse primer (SEQ ID NO: 1383): ggacagccggtcagagctc
G6PD-amplicon (SEQ ID NO: 1384):
gaggccgtcaccaagaacattcacgagtcctgcatgagccagataggctggaaccgcatcatcgtggagaagcccttcgggagggacctgcagagctctgaccggctgtcc
RPS27A (GenBank Accession No. NM_002954)
RPS27A Forward primer (SEQ ID NO: 1385): CTGGCAAGCAGCTGGAAGAT
RPS27A Reverse primer (SEQ ID NO: 1386): TTTCTTAGCACCACCACGAAGTC
RPS27A-amplicon (SEQ ID NO: 1387):
CTGGCAAGCAGCTGGAAGATGGACGTACTTTGTCTGACTACAATATTCAAAAGGAGTCTACTCTTCATCTTGTGTTGAGACTTCGTGGTGGTGCTAAGAAA
The sequences of the housekeeping genes measured in all the examples in testing panel were as follows:
Ubiquitin (GenBank Accession No. BC000449)
Ubiquitin Forward primer (SEQ ID NO: 1388): ATTTGGGTCGCGGTTCTTG
Ubiquitin Reverse primer (SEQ ID NO: 1389): TGCCTTGACATTCTCGATGGT
Ubiquitin-amplicon (SEQ ID NO: 1390):
ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGATCTTCGTGAAGACTCTGACTGGTAAGACCATCACCCTCGAGGTTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer (SEQ ID NO: 1367): TGGGAACAAGAGGGCATCTG
SDHA Reverse primer (SEQ ID NO: 1368): CCACCACTGCATCAAATTCATG
SDHA-amplicon (SEQ ID NO: 1369):
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG
PBGD (GenBank Accession No. BC019323)
PBGD Forward primer (SEQ ID NO: 1370): TGAGAGTGATTCGCGTGGG
PBGD Reverse primer (SEQ ID NO: 1371): CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon (SEQ ID NO: 1372):
TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG
HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer (SEQ ID NO: 1373): TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer (SEQ ID NO: 1374): GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon (SEQ ID NO: 1375):
TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
The sequences of the housekeeping genes measured in all the examples on breast cancer panel were as follows:
G6PD (GenBank Accession No. NM_000402)
G6PD Forward primer (SEQ ID NO: 1382): gaggccgtcaccaagaacat
G6PD Reverse primer (SEQ ID NO: 1383): ggacagccggtcagagctc
G6PD-amplicon (SEQ ID NO: 1384):
gaggccgtcaccaagaacattcacgagtcctgcatgagccagataggctggaaccgcatcatcgtggagaagcccttcgggagggacctgcagagctctgaccggctgtcc
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer (SEQ ID NO: 1367): TGGGAACAAGAGGGCATCTG
SDHA Reverse primer (SEQ ID NO: 1368): CCACCACTGCATCAAATTCATG
SDHA-amplicon (SEQ ID NO: 1369):
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG
PBGD (GenBank Accession No. BC019323)
PBGD Forward primer (SEQ ID NO: 1370): TGAGAGTGATTCGCGTGGG
PBGD Reverse primer (SEQ ID NO: 1371): CCAGGGTACGAGGCTTTCAAT
PBGD-amplicon (SEQ ID NO: 1372):
TGAGAGTGATTCGCGTGGGTACCCGCAAGAGCCAGCTTGCTCGCATACAGACGGACAGTGTGGTGGCAACATTGAAAGCCTCGTACCCTGG
HPRT1 (GenBank Accession No. NM_000194)
HPRT1 Forward primer (SEQ ID NO: 1373): TGACACTGGCAAAACAATGCA
HPRT1 Reverse primer (SEQ ID NO: 1374): GGTCCTTTTCACCAGCAAGCT
HPRT1-amplicon (SEQ ID NO: 1375):
TGACACTGGCAAAACAATGCAGACTTTGCTTTCCTTGGTCAGGCAGTATAATCCAAAGATGGTCAAGGTCGCAAGCTTGCTGGTGAAAAGGACC
The sequences of the housekeeping genes measured in all the examples on normal tissue samples panel were as follows:
RPL19 (GenBank Accession No. NM_000981)
RPL19 Forward primer (SEQ ID NO: 1376): TGGCAAGAAGAAGGTCTGGTTAG
RPL19 Reverse primer (SEQ ID NO: 1377): TGATCAGCCCATCTTTGATGAG
RPL19-amplicon (SEQ ID NO: 1378):
TGGCAAGAAGAAGGTCTGGTTAGACCCCAATGAGACCAATGAAATCGCCAATGCCAACTCCCGTCAGCAGATCCGGAAGCTCATCAAAGATGGGCTGATCA
TATA box (GenBank Accession No. NM_003194)
TATA box Forward primer (SEQ ID NO: 1391): CGGTTTGCTGCGGTAATCAT
TATA box Reverse primer (SEQ ID NO: 1392): TTTCTTGCTGCCAGTCTGGAC
TATA box-amplicon (SEQ ID NO: 1393):
CGGTTTGCTGCGGTAATCATGAGGATAAGAGAGCCACGAACCACGGCACTGATTTTCAGTTCTGGGAAAATGGTGTGCACAGGAGCCAAGAGTGAAGAACAGTCCAGACTGGCAGCAAGAAA
Ubiquitin (GenBank Accession No. BC000449)
Ubiquitin Forward primer (SEQ ID NO: 1388): ATTTGGGTCGCGGTTCTTG
Ubiquitin Reverse primer (SEQ ID NO: 1389): TGCCTTGACATTCTCGATGGT
Ubiquitin-amplicon (SEQ ID NO: 1390):
ATTTGGGTCGCGGTTCTTGTTTGTGGATCGCTGTGATCGTCACTTGACAATGCAGATCTTCGTGAAGACTCTGACTGGTAAGACCATCACCCTCGAGGTTGAGCCCAGTGACACCATCGAGAATGTCAAGGCA
SDHA (GenBank Accession No. NM_004168)
SDHA Forward primer (SEQ ID NO: 1367): TGGGAACAAGAGGGCATCTG
SDHA Reverse primer (SEQ ID NO: 1368): CCACCACTGCATCAAATTCATG
SDHA-amplicon (SEQ ID NO: 1369):
TGGGAACAAGAGGGCATCTGCTAAAGTTTCAGATTCCATTTCTGCTCAGTATCCAGTAGTGGATCATGAATTTGATGCAGTGGTGG
DESCRIPTION FOR CLUSTER HSKITCR
Cluster HSKITCR features 4 transcript(s) and 29 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Mast/stem cell growth factor receptor precursor (SwissProt accession identifier KIT_HUMAN; known also according to the synonyms EC 2.7.1.112; SCFR; Proto-oncogene tyrosine-protein kinase Kit; c-kit; CD117 antigen), SEQ ID NO: 802, referred to herein as the previously known protein. Protein Mast/stem cell growth factor receptor precursor is known or believed to have the following function(s): this is the receptor for stem cell factor (mast cell growth factor). It has a tyrosine-protein kinase activity. Binding of the ligands leads to the autophosphorylation of KIT and its association with substrates such as phosphatidylinositol 3-kinase (PI3K).
The sequence for protein Mast/stem cell growth factor receptor precursor is given at the end of the application, as "Mast/stem cell growth factor receptor precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Mast/stem cell growth factor receptor precursor localization is believed to be Type I membrane protein. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: protein amino acid phosphorylation; protein amino acid dephosphorylation; signal transduction; transmembrane receptor protein tyrosine kinase signaling pathway; cell growth and/or maintenance, which are annotation(s) related to Biological Process; receptor signaling protein tyrosine kinase; receptor; vascular endothelial growth factor receptor; ATP binding; transferase, which are annotation(s) related to Molecular Function; and integral membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSKITCR features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Mast/stem cell growth factor receptor precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSKITCR_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSKITCR_T2. An alignment is given to the known protein (Mast/stem cell growth factor receptor precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSKITCR_P2 and KIT_HUMAN:
1. An isolated chimeric polypeptide encoding for HSKITCR_P2, comprising a first amino acid sequence being at least 90% homologous to MAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKEGFRMLSPEHAPAEMYDIMKTCWDADPLKRPTFKQIVQLIEKQISESTNHIYSNLANCSPNRQKPVVDHSVRINSVGSTASSSQPLLVHDDV corresponding to amino acids 836 - 976 of KIT_HUMAN, which also corresponds to amino acids 1 - 141 of HSKITCR_P2.
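Because HSKITCR_P2 corresponds to a contiguous stretch of the known protein, a position on the variant can be converted to a position on KIT_HUMAN by a fixed offset. The following Python sketch is an illustration only: it checks that the quoted sequence has the length implied by the stated ranges (836 - 976 and 1 - 141) and maps 1-based variant positions to known-protein positions. The names P2_SEQUENCE and variant_to_known_position are hypothetical.

```python
# HSKITCR_P2 sequence as quoted in the comparison report.
P2_SEQUENCE = (
    "MAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKEGFRMLSP"
    "EHAPAEMYDIMKTCWDADPLKRPTFKQIVQLIEKQISESTNHIYSNLANCSPNRQKP"
    "VVDHSVRINSVGSTASSSQPLLVHDDV"
)

# 1-based, inclusive range on KIT_HUMAN stated in the report.
KNOWN_START = 836
KNOWN_END = 976


def variant_to_known_position(variant_pos: int) -> int:
    """Map a 1-based HSKITCR_P2 position to the matching KIT_HUMAN position."""
    if not 1 <= variant_pos <= len(P2_SEQUENCE):
        raise ValueError("position is outside the variant protein")
    return variant_pos + KNOWN_START - 1


print(len(P2_SEQUENCE))                 # 141
print(variant_to_known_position(1))     # 836
print(variant_to_known_position(141))   # 976
```

The length check guards against transcription errors in the sequence, since the range 836 - 976 spans exactly 141 residues.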
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HSKITCR_P2 is encoded by the following transcript(s): HSKITCR_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSKITCR_T2 is shown in bold; this coding portion starts at position 216 and ends at position 638. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSKITCR_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Nucleic acid SNPs
Variant protein HSKITCR_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSKITCR_T4. An alignment is given to the known protein (Mast/stem cell growth factor receptor precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSKITCR_P3 and KIT_HUMAN:
1. An isolated chimeric polypeptide encoding for HSKITCR_P3, comprising a first amino acid sequence being at least 90% homologous to MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNKHGLSNSIYVFVRDPAKLFLVDRSLYGKEDNDTLVRCPLTDPEVTNYSLKGCQGKPLPKDLRFIPDPKAGIMIKSVKRAYHRLCLHCSVDQEGKSVLSEKFILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKDVSSSVYSTWKRENSQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANNTFGSANVTTTLEVVDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIYMNRTFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVNAAIAFNVYVNTKPEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRCSASVLPVDVQTLNSSGPPFGKLVVQSSIDSSAFKHNGTVECKAYNDVGKTSAYFNFAFKGNNKEQIHPHTLFTPLLIGFVIVAGMMCIIVMILTYKYLQKPMYEVQWKVVEEINGNNYVYIDPTQLPYDHKWEFPRNRLSFGKTLGAGAFGKVVEATAYGLIKSDAAMTVAVKMLKPSAHLTEREALMSELKVLSYLGNHMNIVNLLGACTIGGPTLVITEYCCYGDLLNFLRRKRDSFICSKQEDHAEAALYKNLLHSKESSCSDSTNEYMDMKPGVSYVVPTKADKRRSVRIGSYIERDVTPAIMEDDELALDLEDLLSFSYQVAKGMAFLASKNCIHRDLAARNILLTHGRITKICDFGLARDIKNDSNYVVKGNARLPVKWMAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKEGFRMLSPEHAPAEMYDIMKTCWDADPLKRPTFKQIVQLIEKQISESTNHIYSNLANCSPNRQKPVV corresponding to amino acids 1 - 951 of KIT_HUMAN, which also corresponds to amino acids 1 - 951 of HSKITCR_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LQGHFIESFVLDILESLYFYNFFLHQMFLCSGLMFEIILWLFL corresponding to amino acids 952 - 994 of HSKITCR_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSKITCR_P3, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LQGHFIESFVLDILESLYFYNFFLHQMFLCSGLMFEIILWLFL in HSKITCR_P3.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both trans-membrane region prediction programs predicted a trans-membrane region for this protein.
Variant protein HSKITCR_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSKITCR_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Amino acid mutations
Variant protein HSKITCR_P3 is encoded by the following transcript(s): HSKITCR_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSKITCR_T4 is shown in bold; this coding portion starts at position 82 and ends at position 3063. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSKITCR_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HSKITCR_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSKITCR_T5. An alignment is given to the known protein (Mast/stem cell growth factor receptor precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSKITCR_P4 and KIT_HUMAN:
1. An isolated chimeric polypeptide encoding for HSKITCR_P4, comprising a first amino acid sequence being at least 90% homologous to MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNKHGLSNSIYVFVRDPAKLFLVDRSLYGKEDNDTLVRCPLTDPEVTNYSLKGCQGKPLPKDLRFIPDPKAGIMIKSVKRAYHRLCLHCSVDQEGKSVLSEKFILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKDVSSSVYSTWKRENSQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANNTFGSANVTTTLEVVDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIYMNRTFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVNAAIAFNVYVNTKPEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRCSASVLPVDVQTLNSSGPPFGKLVVQSSIDSSAFKHNGTVECKAYNDVGKTSAYFNFAFKGNNKEQIHPHTLFTPLLIGFVIVAGMMCIIVMILTYKYLQKPMYEVQWKVVEEINGNNYVYIDPTQLPYDHKWEFPRNRLSFGKTLGAGAFGKVVEATAYGLIKSDAAMTVAVKMLKPSAHLTEREALMSELKVLSYLGNHMNIVNLLGACTIGGPTLVITEYCCYGDLLNFLRRKRDSFICSKQEDHAEAALYKNLLHSKESSCSDSTNEYMDMKPGVSYVVPTKADKRRSVRIGSYIERDVTPAIMEDDELALDLEDLLSFSYQVAKGMAFLASKNCIHRDLAARNILLTHGRITKICDFGLARDIKNDSNYVVKGN corresponding to amino acids 1 - 828 of KIT_HUMAN, which also corresponds to amino acids 1 - 828 of HSKITCR_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSTHSLLDSPAKDF corresponding to amino acids 829 - 842 of HSKITCR_P4, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSKITCR_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSTHSLLDSPAKDF in HSKITCR_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
Variant protein HSKITCR_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSKITCR_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein HSKITCR_P4 is encoded by the following transcript(s): HSKITCR_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSKITCR_T5 is shown in bold; this coding portion starts at position 82 and ends at position 2607. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSKITCR_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein HSKITCR_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSKITCR_T6. An alignment is given to the known protein (Mast/stem cell growth factor receptor precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSKITCR_P5 and KIT_HUMAN:
1. An isolated chimeric polypeptide encoding for HSKITCR_P5, comprising a first amino acid sequence being at least 90% homologous to MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNKHGLSNSIYVFVR corresponding to amino acids 1 - 112 of KIT_HUMAN, which also corresponds to amino acids 1 - 112 of HSKITCR_P5, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GKCLAFCSAVLSRI corresponding to amino acids 113 - 126 of HSKITCR_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSKITCR_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GKCLAFCSAVLSRI in HSKITCR_P5.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSKITCR_P5 is encoded by the following transcript(s): HSKITCR_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSKITCR_T6 is shown in bold; this coding portion starts at position 82 and ends at position 459.
As noted above, cluster HSKITCR features 29 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSKITCR_node_0 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4, HSKITCR_T5 and HSKITCR_T6. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster HSKITCR_node_11 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T4         1007                         1196
HSKITCR_T5         1007                         1196
Segment cluster HSKITCR_node_17 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 12 below describes the starting and ending position of this segment on each transcript. Table 12 - Segment location on transcripts
Segment cluster HSKITCR_node_2 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4, HSKITCR_T5 and HSKITCR_T6. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster HSKITCR_node_21 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T4         1729                         1855
HSKITCR_T5         1729                         1855
Segment cluster HSKITCR_node_27 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T4         2072                         2222
HSKITCR_T5         2072                         2222
Segment cluster HSKITCR_node_3 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T6. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T6         419                          543
Segment cluster HSKITCR_node_31 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HSKITCR_node_33 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HSKITCR_node_34 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T5. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster HSKITCR_node_36 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HSKITCR_node_44 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T2         561                          2054
Segment cluster HSKITCR_node_46 according to the present invention is supported by 75 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster HSKITCR_node_5 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HSKITCR_node_50 according to the present invention is supported by 63 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2 and HSKITCR_T4. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster HSKITCR_node_7 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T4         701                          837
HSKITCR_T5         701                          837
Segment cluster HSKITCR_node_9 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSKITCR_node_13 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster HSKITCR_node_15 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T5         1313                         1427
Segment cluster HSKITCR_node_19 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T4         1622                         1728
HSKITCR_T5         1622                         1728
Segment cluster HSKITCR_node_23 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HSKITCR_node_25 according to the present invention is supported by 4 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HSKITCR_node_29 according to the present invention is supported by 11 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T4 and HSKITCR_T5. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T4         2223                         2314
HSKITCR_T5         2223                         2314
Segment cluster HSKITCR_node_37 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2 and HSKITCR_T4. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HSKITCR_node_39 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2 and HSKITCR_T4. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T2         307                          406
HSKITCR_T4         2678                         2777
Segment cluster HSKITCR_node_41 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2 and HSKITCR_T4. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T4         2778                         2883
Segment cluster HSKITCR_node_43 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2 and HSKITCR_T4. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster HSKITCR_node_47 according to the present invention is supported by 56 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2 and HSKITCR_T4. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T2         2457                         2492
HSKITCR_T4         2932                         2967
Segment cluster HSKITCR_node_48 according to the present invention is supported by 61 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSKITCR_T2 and HSKITCR_T4. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSKITCR_T2         2493                         2605
HSKITCR_T4         2968                         3080
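By way of non-limiting illustration only, the length of a segment may be derived from the starting and ending positions listed in the segment location tables and compared with the "up to about 120 bp" criterion for short segments. The following sketch assumes that positions are 1-based and inclusive and treats 120 bp as a hard cutoff; both are assumptions made for illustration and are not stated in the application. The function and variable names are hypothetical. The coordinates are those given for transcript HSKITCR_T4.

```python
# Illustrative sketch only. Assumptions (not stated in the application):
# positions are 1-based and inclusive, and "about 120 bp" is a hard cutoff.
SHORT_SEGMENT_MAX_BP = 120


def segment_length(start: int, end: int) -> int:
    """Length in bp of a segment spanning start..end (inclusive)."""
    if end < start:
        raise ValueError("ending position precedes starting position")
    return end - start + 1


def is_short_segment(start: int, end: int) -> bool:
    """True if the segment would fall under the short-segment description."""
    return segment_length(start, end) <= SHORT_SEGMENT_MAX_BP


# Start/end positions on HSKITCR_T4, as listed in the tables above.
segments_on_t4 = {
    "HSKITCR_node_11": (1007, 1196),
    "HSKITCR_node_21": (1729, 1855),
    "HSKITCR_node_19": (1622, 1728),
    "HSKITCR_node_47": (2932, 2967),
}

for name, (start, end) in segments_on_t4.items():
    kind = "short" if is_short_segment(start, end) else "long"
    print(f"{name}: {segment_length(start, end)} bp ({kind})")
```

Under these assumptions, node_11 and node_21 exceed 120 bp, whereas node_19 and node_47 do not, which agrees with the grouping of the segments in the description.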
Variant protein alignment to the previously known protein:
Sequence name: /tmp/FKrQWooslK/vZlqDwG01F:KIT_HUMAN
Sequence documentation:
Alignment of: HSKITCR_P2 x KIT_HUMAN
Alignment segment 1/1:
Quality: 1423.00
Escore: 0
Matching length: 141
Total length: 141
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKE 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
836 MAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKE 885
 51 GFRMLSPEHAPAEMYDIMKTCWDADPLKRPTFKQIVQLIEKQISESTNHI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
886 GFRMLSPEHAPAEMYDIMKTCWDADPLKRPTFKQIVQLIEKQISESTNHI 935
101 YSNLANCSPNRQKPVVDHSVRINSVGSTASSSQPLLVHDDV 141
    |||||||||||||||||||||||||||||||||||||||||
936 YSNLANCSPNRQKPVVDHSVRINSVGSTASSSQPLLVHDDV 976
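By way of non-limiting illustration only, a percent identity figure of the kind reported above may be computed by counting identical residues over the aligned columns. The application does not state how the alignment program derives its "Matching" and "Total" percentages, so the formula below (identical columns divided by alignment length) is an assumption, and the function name is hypothetical. The example uses the first 50-residue block of the HSKITCR_P2 x KIT_HUMAN alignment.

```python
# Illustrative sketch only. The formula (identical columns / alignment
# length) is an assumption; the application does not define how the
# reported percentages are calculated.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns holding the same non-gap residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Residues 1-50 of HSKITCR_P2 and residues 836-885 of KIT_HUMAN.
variant_block = "MAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKE"
known_block = "MAPESIFNCVYTFESDVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKE"

print(f"{percent_identity(variant_block, known_block):.2f}")  # prints 100.00
```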
Sequence name: /tmp/wBoB9oFSOF/Ih2hkBVNUl:KIT_HUMAN
Sequence documentation:
Alignment of: HSKITCR_P3 x KIT_HUMAN
Alignment segment 1/1:
Quality: 9409.00
Escore:
Matching length: 951
Total length: 951
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRV 50
 51 GDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNK 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNK 100
101 HGLSNSIYVFVRDPAKLFLVDRSLYGKEDNDTLVRCPLTDPEVTNYSLKG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 HGLSNSIYVFVRDPAKLFLVDRSLYGKEDNDTLVRCPLTDPEVTNYSLKG 150
151 CQGKPLPKDLRFIPDPKAGIMIKSVKRAYHRLCLHCSVDQEGKSVLSEKF 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 CQGKPLPKDLRFIPDPKAGIMIKSVKRAYHRLCLHCSVDQEGKSVLSEKF 200
201 ILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKDVSSSVYSTWKREN 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKDVSSSVYSTWKREN 250
251 SQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANNTFGSAN 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 SQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANNTFGSAN 300
301 VTTTLEVVDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIY 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 VTTTLEVVDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIY 350
351 MNRTFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 MNRTFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVN 400
401 AAIAFNVYVNTKPEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRC 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 AAIAFNVYVNTKPEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRC 450
451 SASVLPVDVQTLNSSGPPFGKLVVQSSIDSSAFKHNGTVECKAYNDVGKT 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 SASVLPVDVQTLNSSGPPFGKLVVQSSIDSSAFKHNGTVECKAYNDVGKT 500
501 SAYFNFAFKGNNKEQIHPHTLFTPLLIGFVIVAGMMCIIVMILTYKYLQK 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 SAYFNFAFKGNNKEQIHPHTLFTPLLIGFVIVAGMMCIIVMILTYKYLQK 550
551 PMYEVQWKVVEEINGNNYVYIDPTQLPYDHKWEFPRNRLSFGKTLGAGAF 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 PMYEVQWKVVEEINGNNYVYIDPTQLPYDHKWEFPRNRLSFGKTLGAGAF 600
601 GKVVEATAYGLIKSDAAMTVAVKMLKPSAHLTEREALMSELKVLSYLGNH 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 GKVVEATAYGLIKSDAAMTVAVKMLKPSAHLTEREALMSELKVLSYLGNH 650
651 MNIVNLLGACTIGGPTLVITEYCCYGDLLNFLRRKRDSFICSKQEDHAEA 700
    ||||||||||||||||||||||||||||||||||||||||||||||||||
651 MNIVNLLGACTIGGPTLVITEYCCYGDLLNFLRRKRDSFICSKQEDHAEA 700
701 ALYKNLLHSKESSCSDSTNEYMDMKPGVSYVVPTKADKRRSVRIGSYIER 750
    ||||||||||||||||||||||||||||||||||||||||||||||||||
701 ALYKNLLHSKESSCSDSTNEYMDMKPGVSYVVPTKADKRRSVRIGSYIER 750
751 DVTPAIMEDDELALDLEDLLSFSYQVAKGMAFLASKNCIHRDLAARNILL 800
    ||||||||||||||||||||||||||||||||||||||||||||||||||
751 DVTPAIMEDDELALDLEDLLSFSYQVAKGMAFLASKNCIHRDLAARNILL 800
801 THGRITKICDFGLARDIKNDSNYVVKGNARLPVKWMAPESIFNCVYTFES 850
    ||||||||||||||||||||||||||||||||||||||||||||||||||
801 THGRITKICDFGLARDIKNDSNYVVKGNARLPVKWMAPESIFNCVYTFES 850
851 DVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKEGFRMLSPEHAPAEMY 900
    ||||||||||||||||||||||||||||||||||||||||||||||||||
851 DVWSYGIFLWELFSLGSSPYPGMPVDSKFYKMIKEGFRMLSPEHAPAEMY 900
901 DIMKTCWDADPLKRPTFKQIVQLIEKQISESTNHIYSNLANCSPNRQKPV 950
    ||||||||||||||||||||||||||||||||||||||||||||||||||
901 DIMKTCWDADPLKRPTFKQIVQLIEKQISESTNHIYSNLANCSPNRQKPV 950
951 V 951
    |
951 V 951
Sequence name: /tmp/OJWoploanN/dyx5E9df8q:KIT_HUMAN
Sequence documentation:
Alignment of: HSKITCR_P4 x KIT_HUMAN
Alignment segment 1/1:
Quality: 8149.00
Escore: 0
Matching length: 828
Total length: 828
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRV 50
 51 GDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNK 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNK 100
101 HGLSNSIYVFVRDPAKLFLVDRSLYGKEDNDTLVRCPLTDPEVTNYSLKG 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 HGLSNSIYVFVRDPAKLFLVDRSLYGKEDNDTLVRCPLTDPEVTNYSLKG 150
151 CQGKPLPKDLRFIPDPKAGIMIKSVKRAYHRLCLHCSVDQEGKSVLSEKF 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 CQGKPLPKDLRFIPDPKAGIMIKSVKRAYHRLCLHCSVDQEGKSVLSEKF 200
201 ILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKDVSSSVYSTWKREN 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 ILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKDVSSSVYSTWKREN 250
251 SQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANNTFGSAN 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 SQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANNTFGSAN 300
301 VTTTLEVVDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIY 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 VTTTLEVVDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIY 350
351 MNRTFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVN 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 MNRTFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVN 400
401 AAIAFNVYVNTKPEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRC 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 AAIAFNVYVNTKPEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRC 450
451 SASVLPVDVQTLNSSGPPFGKLVVQSSIDSSAFKHNGTVECKAYNDVGKT 500
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 SASVLPVDVQTLNSSGPPFGKLVVQSSIDSSAFKHNGTVECKAYNDVGKT 500
501 SAYFNFAFKGNNKEQIHPHTLFTPLLIGFVIVAGMMCIIVMILTYKYLQK 550
    ||||||||||||||||||||||||||||||||||||||||||||||||||
501 SAYFNFAFKGNNKEQIHPHTLFTPLLIGFVIVAGMMCIIVMILTYKYLQK 550
551 PMYEVQWKVVEEINGNNYVYIDPTQLPYDHKWEFPRNRLSFGKTLGAGAF 600
    ||||||||||||||||||||||||||||||||||||||||||||||||||
551 PMYEVQWKVVEEINGNNYVYIDPTQLPYDHKWEFPRNRLSFGKTLGAGAF 600
601 GKVVEATAYGLIKSDAAMTVAVKMLKPSAHLTEREALMSELKVLSYLGNH 650
    ||||||||||||||||||||||||||||||||||||||||||||||||||
601 GKVVEATAYGLIKSDAAMTVAVKMLKPSAHLTEREALMSELKVLSYLGNH 650
651 MNIVNLLGACTIGGPTLVITEYCCYGDLLNFLRRKRDSFICSKQEDHAEA 700
    ||||||||||||||||||||||||||||||||||||||||||||||||||
651 MNIVNLLGACTIGGPTLVITEYCCYGDLLNFLRRKRDSFICSKQEDHAEA 700
701 ALYKNLLHSKESSCSDSTNEYMDMKPGVSYVVPTKADKRRSVRIGSYIER 750
    ||||||||||||||||||||||||||||||||||||||||||||||||||
701 ALYKNLLHSKESSCSDSTNEYMDMKPGVSYVVPTKADKRRSVRIGSYIER 750
751 DVTPAIMEDDELALDLEDLLSFSYQVAKGMAFLASKNCIHRDLAARNILL 800
    ||||||||||||||||||||||||||||||||||||||||||||||||||
751 DVTPAIMEDDELALDLEDLLSFSYQVAKGMAFLASKNCIHRDLAARNILL 800
801 THGRITKICDFGLARDIKNDSNYVVKGN 828
    ||||||||||||||||||||||||||||
801 THGRITKICDFGLARDIKNDSNYVVKGN 828
Sequence name: /tmp/a2m3JlAqNv/OAemZbGj r :KIT_HUMAN
Sequence documentation:
Alignment of: HSKITCR_P5 x KIT_HUMAN
Alignment segment 1/1:
Quality: 1107.00
Escore: 0
Matching length: 112
Total length: 112
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRV 50
 51 GDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNK 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 GDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNK 100
101 HGLSNSIYVFVR 112
    ||||||||||||
101 HGLSNSIYVFVR 112
Expression of Mast/stem cell growth factor receptor SCFR; Proto-oncogene tyrosine-protein kinase Kit (HSKITCR) transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2 in normal and cancerous ovary tissues
Expression of Mast/stem cell growth factor receptor SCFR; Proto-oncogene tyrosine-protein kinase Kit transcripts detectable by or according to seg3, HSKITCR seg3F2R2 amplicon (SEQ ID NO: 1394) and HSKITCR seg3F2 (SEQ ID NO: 1395) and HSKITCR seg3R2 (SEQ ID NO: 1396) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon, SEQ ID NO: 1381) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 2 above, "Tissue sample in ovarian cancer testing panel"). Then the reciprocal of this ratio was calculated, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.
Figure 13 is a histogram showing down regulation of the above-indicated Mast/stem cell growth factor receptor Kit transcripts in cancerous ovary samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 5 fold down regulation, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 13, the expression of Mast/stem cell growth factor receptor Kit transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 2, "Tissue sample in ovarian cancer testing panel"). Notably, down regulation of at least 5 fold was found in 22 out of 43 adenocarcinoma samples.
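The normalization described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, sample labels and all quantities are hypothetical and are not data from the experiments. It assumes that each RT sample has one quantity for the target amplicon and one quantity for each of the four housekeeping genes.

```python
from statistics import geometric_mean, median


def normalize_to_housekeeping(target_qty, housekeeping_qtys):
    """Normalize a target amplicon quantity to the geometric mean of the
    housekeeping-gene quantities measured in the same RT sample."""
    return target_qty / geometric_mean(housekeeping_qtys)


def fold_down_regulation(normalized, normal_ids):
    """Divide each normalized quantity by the median of the normal samples,
    then take the reciprocal to express it as fold down-regulation."""
    normal_median = median(normalized[s] for s in normal_ids)
    return {s: 1.0 / (q / normal_median) for s, q in normalized.items()}


# Hypothetical quantities for illustration only (not data from the experiments):
# sample -> (target amplicon quantity, [four housekeeping-gene quantities])
raw = {
    "normal_1": (8.0, [2.0, 8.0, 4.0, 4.0]),
    "normal_2": (4.0, [1.0, 1.0, 1.0, 1.0]),
    "normal_3": (6.0, [2.0, 2.0, 2.0, 2.0]),
    "tumor_1": (1.0, [2.0, 2.0, 2.0, 2.0]),
}

normalized = {s: normalize_to_housekeeping(t, hk) for s, (t, hk) in raw.items()}
fold = fold_down_regulation(normalized, ["normal_1", "normal_2", "normal_3"])

for sample, value in fold.items():
    print(f"{sample}: {value:.2f}-fold down-regulation")
```

Omitting the reciprocal step gives the ratio itself, which corresponds to the fold over-expression (or relative expression) used for other panels in this section.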
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg3F2 forward primer; and HSKITCR seg3R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg3F2R2.
Forward primer (SEQ ID NO: 1395): GTAAATGCTTGGCTTTCTGCAGT
Reverse primer (SEQ ID NO: 1396): AATATTTTATCTATGGCTCAGTCATCCAT
Amplicon (SEQ ID NO: 1394):
GTAAATGCTTGGCTTTCTGCAGTGCTGTGCTTTCAAGAATTTAATATCCTGCTCTT
AATTTTGGATGACATATGGATGACTGAGCCATAGATAAAATATT
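As a consistency check on the sequences listed above, the following Python sketch verifies that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The helper names are hypothetical, and the check assumes that both primers are written 5' to 3' and that the amplicon is given on the forward strand.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_flank_amplicon(forward, reverse, amplicon):
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


# Sequences as listed for SEQ ID NOs: 1395, 1396 and 1394.
forward = "GTAAATGCTTGGCTTTCTGCAGT"
reverse = "AATATTTTATCTATGGCTCAGTCATCCAT"
amplicon = (
    "GTAAATGCTTGGCTTTCTGCAGTGCTGTGCTTTCAAGAATTTAATATCCTGCTCTT"
    "AATTTTGGATGACATATGGATGACTGAGCCATAGATAAAATATT"
)

print(primers_flank_amplicon(forward, reverse, amplicon))
```

The same check can be applied to any other primer pair and amplicon, which helps catch transcription errors when the sequences are copied between sections.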
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2 in normal and cancerous colon tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to seg3, HSKITCR seg3F2R2 amplicon (SEQ ID NO: 1394), and HSKITCR seg3F2 (SEQ ID NO: 1395) and HSKITCR seg3R2 (SEQ ID NO: 1396) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 1384), and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1387) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3, above: "Tissue samples in colon cancer testing panel"), to obtain a value of fold differential expression for each sample relative to the median of the normal PM samples.
Figure 14 is a histogram showing down regulation of the above-indicated Mast/stem cell growth factor receptor Kit transcripts in cancerous colon samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.
As is evident from Figure 14, the expression of these transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3, "Tissue samples in colon cancer testing panel").
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg3F2 forward primer; and HSKITCR seg3R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg3F2R2.
Forward primer (SEQ ID NO: 1395): GTAAATGCTTGGCTTTCTGCAGT
Reverse primer (SEQ ID NO: 1396): AATATTTTATCTATGGCTCAGTCATCCAT
Amplicon (SEQ ID NO: 1394):
GTAAATGCTTGGCTTTCTGCAGTGCTGTGCTTTCAAGAATTTAATATCCTGCTCTT
AATTTTGGATGACATATGGATGACTGAGCCATAGATAAAATATT
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2 in normal and cancerous lung tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to seg3, HSKITCR seg3F2R2 amplicon (SEQ ID NO: 1394) and HSKITCR seg3F2 (SEQ ID NO: 1395) and HSKITCR seg3R2 (SEQ ID NO: 1396) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 4: "Tissue samples in lung cancer testing panel"). In order to estimate down regulation, the reciprocal of this ratio was also calculated, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.
Figures 15 and 16 are histograms showing over expression and down regulation, respectively, of the above-indicated Mast/stem cell growth factor receptor Kit transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 5 fold over expression or down regulation, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figures 15 and 16, the transcripts detectable by the above amplicon(s) were differentially expressed in cancer samples relative to the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 4: "Tissue samples in lung cancer testing panel"). Notably, over expression of at least 5 fold was found in 4 out of 15 adenocarcinoma samples, 3 out of 16 squamous cell carcinoma samples, and 3 out of 8 small cell carcinoma samples, and down regulation of at least 5 fold was found in 3 out of 15 adenocarcinoma samples, 6 out of 16 squamous cell carcinoma samples, and 2 out of 4 large cell carcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg3F2 forward primer; and HSKITCR seg3R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg3F2R2.
Forward primer (SEQ ID NO: 1395): GTAAATGCTTGGCTTTCTGCAGT
Reverse primer (SEQ ID NO: 1396): AATATTTTATCTATGGCTCAGTCATCCAT
Amplicon (SEQ ID NO: 1394):
GTAAATGCTTGGCTTTCTGCAGTGCTGTGCTTTCAAGAATTTAATATCCTGCTCTT
AATTTTGGATGACATATGGATGACTGAGCCATAGATAAAATATT
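The counts of samples showing at least 5 fold over expression or down regulation, as reported for the lung panel above, can be derived from per-sample ratios as in the following Python sketch. The function name and the ratio values are hypothetical and serve only as an illustration; each ratio is assumed to be the normalized quantity of a cancer sample divided by the median of the normal PM samples.

```python
def tally_fold_changes(ratios, threshold=5.0):
    """Count samples with at least `threshold`-fold over expression
    (ratio >= threshold) or down regulation (1 / ratio >= threshold)."""
    total = len(ratios)
    over = sum(1 for r in ratios if r >= threshold)
    down = sum(1 for r in ratios if r > 0 and 1.0 / r >= threshold)
    return {
        "total": total,
        "over": over,
        "down": down,
        "over_pct": 100.0 * over / total,
        "down_pct": 100.0 * down / total,
    }


# Hypothetical ratios for one histological subtype (illustration only).
example_ratios = [6.0, 0.1, 1.2, 0.15, 8.0, 0.5, 2.0, 1.0]
result = tally_fold_changes(example_ratios)
print(
    f"over expression: {result['over']} out of {result['total']} "
    f"({result['over_pct']:.0f}%); "
    f"down regulation: {result['down']} out of {result['total']} "
    f"({result['down_pct']:.0f}%)"
)
```

Running the tally separately for each subtype (adenocarcinoma, squamous cell carcinoma, and so on) yields counts of the "N out of M" form used in the text.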
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2 in normal and cancerous prostate tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to seg3, HSKITCR seg3F2R2 amplicon (SEQ ID NO: 1394) and HSKITCR seg3F2 (SEQ ID NO: 1395) and HSKITCR seg3R2 (SEQ ID NO: 1396) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369), and RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 42, 48-53, 59-63, Table 1, above, "Tissue samples in prostate cancer testing panel"), to obtain a value of fold upregulation for each sample relative to the median of the normal PM samples.
Figure 17 is a histogram showing over expression of the above-indicated Mast/stem cell growth factor receptor Kit transcripts in cancerous prostate samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 5 fold over-expression, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 17, the expression of these transcripts detectable by the above amplicon(s) in several cancer samples was higher than in the non-cancerous samples (Sample Nos. 42, 48-53, 59-63, Table 1: "Tissue samples in prostate cancer testing panel"). Notably, over-expression of at least 5 fold was found in 5 out of 19 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg3F2 forward primer; and HSKITCR seg3R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg3F2R2.
Forward primer (SEQ ID NO: 1395): GTAAATGCTTGGCTTTCTGCAGT
Reverse primer (SEQ ID NO: 1396): AATATTTTATCTATGGCTCAGTCATCCAT
Amplicon (SEQ ID NO: 1394):
GTAAATGCTTGGCTTTCTGCAGTGCTGTGCTTTCAAGAATTTAATATCCTGCTCTT
AATTTTGGATGACATATGGATGACTGAGCCATAGATAAAATATT
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2 in normal and cancerous breast tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to seg3, HSKITCR seg3F2R2 amplicon (SEQ ID NO: 1394) and HSKITCR seg3F2 (SEQ ID NO: 1395) and HSKITCR seg3R2 (SEQ ID NO: 1396) primers was measured by real time PCR. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369), and G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 1384) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 5, above: "Tissue samples in breast cancer testing panel"). Then the reciprocal of this ratio was calculated, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.
Figure 18 is a histogram showing down regulation of the above-indicated transcripts in cancerous breast samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained. The number and percentage of samples that exhibit at least 5 fold down regulation, out of the total number of samples tested, is indicated at the bottom.
As is evident from Figure 18, the expression of these transcripts detectable by the above amplicon(s) in several cancer samples was lower than in the non-cancerous samples (Sample Nos. 56-60, 63-67, Table 5: "Tissue samples in breast cancer testing panel"). Notably, down regulation of at least 5 fold was found in 14 out of 28 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg3F2 forward primer; and HSKITCR seg3R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg3F2R2.
Forward primer (SEQ ID NO: 1395): GTAAATGCTTGGCTTTCTGCAGT
Reverse primer (SEQ ID NO: 1396): AATATTTTATCTATGGCTCAGTCATCCAT
Amplicon (SEQ ID NO: 1394):
GTAAATGCTTGGCTTTCTGCAGTGCTGTGCTTTCAAGAATTTAATATCCTGCTCTT
AATTTTGGATGACATATGGATGACTGAGCCATAGATAAAATATT
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2 in different normal tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to HSKITCR seg3F2R2 amplicon (SEQ ID NO: 1394) and primers HSKITCR seg3F2 (SEQ ID NO: 1395) and HSKITCR seg3R2 (SEQ ID NO: 1396) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the prostate samples (Sample Nos. 36-38, Table 6, above: "Tissue samples in normal panel"), to obtain a value of relative expression of each sample relative to the median of the prostate samples.
Forward primer (SEQ ID NO: 1395): GTAAATGCTTGGCTTTCTGCAGT
Reverse primer (SEQ ID NO: 1396): AATATTTTATCTATGGCTCAGTCATCCAT
Amplicon (SEQ ID NO: 1394):
GTAAATGCTTGGCTTTCTGCAGTGCTGTGCTTTCAAGAATTTAATATCCTGCTCTT AATTTTGGATGACATATGGATGACTGAGCCATAGATAAAATATT
The results are presented in Figure 19, demonstrating the expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg3F2R2 in different normal tissues.
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2 in normal and cancerous colon tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to seg44, HSKITCR seg44F2R2 amplicon (SEQ ID NO: 1397) and HSKITCR seg44F2 (SEQ ID NO: 1398) and HSKITCR seg44R2 (SEQ ID NO: 1399) primers was measured by real time PCR. These transcripts are related to the previously known or WT (wild type) protein. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 1384), and RPS27A (GenBank Accession No. NM_002954; RPS27A amplicon, SEQ ID NO: 1387) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3, above). Then the reciprocal of this ratio was calculated, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.
Figure 20 is a histogram showing down regulation of the above-indicated Mast/stem cell growth factor receptor Kit transcripts in cancerous colon samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.
As is evident from Figure 20, the expression of Mast/stem cell growth factor receptor Kit transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 41, 52, 62-67, 69-71, Table 3). Notably, down regulation of at least 5 fold was found in 19 out of 36 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg44F2 forward primer; and HSKITCR seg44R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg44F2R2.
Forward primer: (SEQ ID NO: 1398) AGAATCAGTGTTTGGGTCACCC Reverse primer: (SEQ ID NO: 1399) CACTATCCTGGAGTTGGATGCA
Amplicon (SEQ ID NO: 1397):
AGAATCAGTGTTTGGGTCACCCCTCCAGGAATGATCTCTTCTTTTGGCTTCCATG
ATGGTTATTTTCTTTTCTTTCAACTTGCATCCAACTCCAGGATAGTG
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2 in normal and cancerous breast tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to seg44, HSKITCR seg44F2R2 amplicon (SEQ ID NO: 1397) and HSKITCR seg44F2 (SEQ ID NO: 1398) and HSKITCR seg44R2 (SEQ ID NO: 1399) primers was measured by real time PCR. These transcripts are related to the previously known or WT (wild type) protein. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369), and G6PD (GenBank Accession No. NM_000402; G6PD amplicon, SEQ ID NO: 1384) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 56-60, 63-67, Table 5, above). Then the reciprocal of this ratio was calculated, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.
Figure 21 is a histogram showing down regulation of the above-indicated Mast/stem cell growth factor receptor Kit transcripts in cancerous breast samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained (samples 16, 17 and 39 were not checked in duplicate).
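The plotted value and error bars for each sample can be sketched in Python as follows. The function name, sample labels and values are hypothetical and are for illustration only; the sketch assumes that a sample measured only once is plotted with its single value and error bars of zero length.

```python
from statistics import mean


def summarize_replicates(values):
    """Return (average, minimum, maximum) for one sample's replicate values.

    The average is the plotted bar height; the minimum and maximum give
    the lower and upper ends of the error bar.
    """
    if not values:
        raise ValueError("at least one measurement is required")
    return mean(values), min(values), max(values)


# Hypothetical fold-change values (illustration only).
measurements = {
    "sample_A": [2.0, 4.0],  # measured in duplicate
    "sample_B": [1.5],       # not checked in duplicate
}

summary = {s: summarize_replicates(v) for s, v in measurements.items()}
for sample, (avg, low, high) in summary.items():
    print(f"{sample}: average={avg}, min={low}, max={high}")
```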
As is evident from Figure 21, the expression of Mast/stem cell growth factor receptor Kit transcripts detectable by the above amplicon(s) in cancer samples was significantly lower than in the non-cancerous samples (Sample Nos. 56-60, 63-67 Table 5). Notably down regulation of at least 5 fold was found in 16 out of 28 adenocarcinoma samples. Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg44F2 forward primer; and HSKITCR seg44R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg44F2R2.
Forward primer: (SEQ ID NO: 1398) AGAATCAGTGTTTGGGTCACCC Reverse primer: (SEQ ID NO: 1399) CACTATCCTGGAGTTGGATGCA
Amplicon (SEQ ID NO: 1397):
AGAATCAGTGTTTGGGTCACCCCTCCAGGAATGATCTCTTCTTTTGGCTTCCATG ATGGTTATTTTCTTTTCTTTCAACTTGCATCCAACTCCAGGATAGTG
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2 in normal and cancerous lung tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to seg44, HSKITCR seg44F2R2 amplicon (SEQ ID NO: 1397) and HSKITCR seg44F2 (SEQ ID NO: 1398) and HSKITCR seg44R2 (SEQ ID NO: 1399) primers was measured by real time PCR. These transcripts are related to the previously known or WT (wild type) protein. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 47-50, 90-93, 96-99, Table 4, above). Then the reciprocal of this ratio was calculated, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.
Figure 22 is a histogram showing down regulation of the above-indicated Mast/stem cell growth factor receptor Kit transcripts in cancerous lung samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.
As is evident from Figure 22, the expression of Mast/stem cell growth factor receptor Kit transcripts detectable by the above amplicon(s) in cancer samples was lower than in the non-cancerous samples (Sample Nos. 47-50, 90-93, 96-99, Table 4). Notably, down regulation of at least 5 fold was found in 6 out of 16 squamous cell carcinoma samples and 2 out of 4 large cell carcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg44F2 forward primer; and HSKITCR seg44R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg44F2R2.
Forward primer: (SEQ ID NO: 1398) AGAATCAGTGTTTGGGTCACCC Reverse primer: (SEQ ID NO: 1399) CACTATCCTGGAGTTGGATGCA
Amplicon (SEQ ID NO: 1397):
AGAATCAGTGTTTGGGTCACCCCTCCAGGAATGATCTCTTCTTTTGGCTTCCATG ATGGTTATTTTCTTTTCTTTCAACTTGCATCCAACTCCAGGATAGTG
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2 in normal and cancerous ovary tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to seg44, HSKITCR seg44F2R2 amplicon (SEQ ID NO: 1397) and HSKITCR seg44F2 (SEQ ID NO: 1398) and HSKITCR seg44R2 (SEQ ID NO: 1399) primers was measured by real time PCR. These transcripts are related to the previously known or WT (wild type) protein. In parallel the expression of four housekeeping genes - PBGD (GenBank Accession No. BC019323; amplicon - PBGD-amplicon, SEQ ID NO: 1372), HPRT1 (GenBank Accession No. NM_000194; amplicon - HPRT1-amplicon, SEQ ID NO: 1375), SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369), and GAPDH (GenBank Accession No. BC026907; GAPDH amplicon, SEQ ID NO: 1381) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the normal post-mortem (PM) samples (Sample Nos. 45-48, 71, Table 2 above). Then the reciprocal of this ratio was calculated, to obtain a value of fold down-regulation for each sample relative to the median of the normal PM samples.
Figure 23 is a histogram showing down regulation of the above-indicated Mast/stem cell growth factor receptor Kit transcripts in cancerous ovary samples relative to the normal samples. Values represent the average of duplicate experiments. Error bars indicate the minimal and maximal values obtained.
As is evident from Figure 23, the expression of Mast/stem cell growth factor receptor Kit transcripts detectable by the above amplicon(s) in cancer samples was lower than in the non-cancerous samples (Sample Nos. 45-48, 71, Table 2). Notably, down regulation of at least 5 fold was found in 24 out of 43 adenocarcinoma samples.
Primer pairs are also optionally and preferably encompassed within the present invention; for example, for the above experiment, the following primer pair was used as a non-limiting illustrative example only of a suitable primer pair: HSKITCR seg44F2 forward primer; and HSKITCR seg44R2 reverse primer. The present invention also preferably encompasses any amplicon obtained through the use of any suitable primer pair; for example, for the above experiment, the following amplicon was obtained as a non-limiting illustrative example only of a suitable amplicon: HSKITCR seg44F2R2.
Forward primer: (SEQ ID NO: 1398) AGAATCAGTGTTTGGGTCACCC Reverse primer: (SEQ ID NO: 1399) CACTATCCTGGAGTTGGATGCA
Amplicon (SEQ ID NO: 1397):
AGAATCAGTGTTTGGGTCACCCCTCCAGGAATGATCTCTTCTTTTGGCTTCCATG ATGGTTATTTTCTTTTCTTTCAACTTGCATCCAACTCCAGGATAGTG
Expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2 in different normal tissues
Expression of Mast/stem cell growth factor receptor Kit transcripts detectable by or according to HSKITCR seg44F2R2 amplicon (SEQ ID NO: 1397) and primers HSKITCR seg44F2 (SEQ ID NO: 1398) and HSKITCR seg44R2 (SEQ ID NO: 1399) was measured by real time PCR. These transcripts are related to the previously known or WT (wild type) protein. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the prostate samples (Sample Nos. 34-36 above), to obtain a value of relative expression of each sample relative to the median of the prostate samples.
Forward primer: (SEQ ID NO: 1398) AGAATCAGTGTTTGGGTCACCC Reverse primer: (SEQ ID NO: 1399) CACTATCCTGGAGTTGGATGCA
Amplicon (SEQ ID NO: 1397):
AGAATCAGTGTTTGGGTCACCCCTCCAGGAATGATCTCTTCTTTTGGCTTCCATG
ATGGTTATTTTCTTTTCTTTCAACTTGCATCCAACTCCAGGATAGTG
The results are presented in Figure 24, demonstrating the expression of Mast/stem cell growth factor receptor Kit HSKITCR transcripts which are detectable by amplicon as depicted in sequence name HSKITCR seg44F2R2 in different normal tissues.
No differential expression was observed in one experiment carried out with HSKITCR seg44F2R2 amplicon (SEQ ID NO: 1397) on Prostate samples panel, as can be seen from Figure 40.
Overall, the expression pattern of the variant protein transcript is similar to the WT (known protein) transcript expression. However, in some cases (e.g. ovary, prostate and lung cancer) over expression of HSKITCR T6 seems to be higher.
Subsection C: Oxytocin-neurophysin 1
DESCRIPTION FOR CLUSTER HUMOTCB
Cluster HUMOTCB features 3 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Oxytocin-neurophysin 1 precursor (SwissProt accession identifier NEU1_HUMAN; known also according to the synonyms OT-NPI; Ocytocin), SEQ ID NO: 815, referred to herein as the previously known protein. Protein Oxytocin-neurophysin 1 precursor is known or believed to have the following function(s): Neurophysin 1 specifically binds oxytocin; oxytocin causes contraction of the smooth muscle of the uterus and of the mammary gland. Oxytocin (carried by neurophysin I) is a nonapeptide produced by the neurohypophysis. It differs from vasopressin (AVP, carried by neurophysin II) at positions 3 and 8. Oxytocin stimulates postpartum milk letdown in response to suckling. It may also help to initiate or facilitate labor. Paracrine oxytocin production stimulated by estrogen may be important in activating the uterus at term. Opiate withdrawal also elevates oxytocin levels. Oxytocin is degraded by the liver and kidney and by an N-terminal peptidase produced by the placenta. Plasma oxytocin measurement has been suggested for the early detection of oat cell carcinomas and possibly other neuroendocrine-derived carcinomas. Plasma oxytocin levels are lower in children with abdominal pain. Tumors such as breast and endometrial carcinomas, neuroblastomas, and glioblastomas express the oxytocin receptor (OTR). A radiolabeled ligand of the oxytocin receptor has been suggested as a tool for imaging and, possibly, therapy of OTR-positive tumors. The variants according to the present invention are believed to be useful diagnostics for these indications, optionally and preferably including diagnosis of endocrine syndromes related to lactation. Furthermore, these variants maintain oxytocin itself but feature variations in the accessory protein that is part of the precursor.
The sequence for protein Oxytocin-neurophysin 1 precursor is given at the end of the application, as "Oxytocin-neurophysin 1 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Oxytocin-neurophysin 1 precursor localization is believed to be Secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; parturition, which are annotation(s) related to Biological Process; hormone; neurohypophyseal hormone, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMOTCB features 3 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Oxytocin-neurophysin 1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMOTCB_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOTCB_T1. An alignment is given to the known protein (Oxytocin-neurophysin 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOTCB_P2 and NEU1_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOTCB_P2, comprising a first amino acid sequence being at least 90% homologous to
MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRK corresponding to amino acids 1 - 40 of NEU1_HUMAN, which also corresponds to amino acids 1 - 40 of HUMOTCB_P2, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSPQPWSRGAPGREGPAATGARPAPASPENSRS corresponding to amino acids 41 - 73 of HUMOTCB_P2, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMOTCB_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSPQPWSRGAPGREGPAATGARPAPASPENSRS in HUMOTCB_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMOTCB_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOTCB_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
The phosphorylation sites of variant protein HUMOTCB_P2, as compared to the known protein Oxytocin-neurophysin 1 precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 6 - Phosphorylation site(s)
Variant protein HUMOTCB_P2 is encoded by the following transcript(s): HUMOTCB_T1, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOTCB_T1 is shown in bold; this coding portion starts at position 29 and ends at position 247. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOTCB_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HUMOTCB_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOTCB_T2. An alignment is given to the known protein (Oxytocin-neurophysin 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOTCB_P3 and NEU1_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOTCB_P3, comprising a first amino acid sequence being at least 90% homologous to MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKCLPCGPGGKGRCFGPNICCAEELGCFVGTAEALRCQEENYLPSPCQSGQKACGSGGRCAVLGLCCS
corresponding to amino acids 1 - 106 of NEU1_HUMAN, which also corresponds to amino acids 1 - 106 of HUMOTCB_P3, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PGERGKALRGQGEAGGGAAGIPLTPPLP corresponding to amino acids 107 - 134 of HUMOTCB_P3, and a third amino acid sequence being at least 90% homologous to PDGCHADPACDAEATFSQR corresponding to amino acids 107 - 125 of NEU1_HUMAN, which also corresponds to amino acids 135 - 153 of HUMOTCB_P3, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HUMOTCB_P3, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for PGERGKALRGQGEAGGGAAGIPLTPPLP, corresponding to HUMOTCB_P3.
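The three-part structure recited in the comparison report above (residues 1 - 106 of the known protein, a 28-residue unique segment, then residues 107 - 125 of the known protein) can be illustrated as a simple insertion. The sketch below is for illustration only: the function and variable names are hypothetical, and the known-protein string is assembled from the residues quoted in this comparison report rather than from the sequence listing.

```python
# Residues 1-125 of NEU1_HUMAN as quoted in the comparison report above.
NEU1_HUMAN = (
    "MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKCLPCGPGGKG"
    "RCFGPNICCAEELGCFVGTAEALRCQEENYLPSPCQSGQKACGSGGRCAV"
    "LGLCCSPDGCHADPACDAEATFSQR"
)
# Unique segment of HUMOTCB_P3 (amino acids 107-134 of the variant).
P3_UNIQUE_SEGMENT = "PGERGKALRGQGEAGGGAAGIPLTPPLP"


def insert_after(known, last_shared_position, segment):
    """Insert `segment` after the 1-based position `last_shared_position`
    of `known` and return the resulting sequence."""
    return known[:last_shared_position] + segment + known[last_shared_position:]


HUMOTCB_P3 = insert_after(NEU1_HUMAN, 106, P3_UNIQUE_SEGMENT)

# 1-based, inclusive coordinates translate to Python slices as [start-1:end].
first_part = HUMOTCB_P3[0:106]     # amino acids 1-106
second_part = HUMOTCB_P3[106:134]  # amino acids 107-134
third_part = HUMOTCB_P3[134:153]   # amino acids 135-153
```

Under these assumptions the assembled sequence is 153 residues long, which agrees with the total length reported for the HUMOTCB_P3 alignment.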
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMOTCB_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOTCB_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
The phosphorylation sites of variant protein HUMOTCB_P3, as compared to the known protein Oxytocin-neurophysin 1 precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Phosphorylation site(s)
Variant protein HUMOTCB_P3 is encoded by the following transcript(s): HUMOTCB_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOTCB_T2 is shown in bold; this coding portion starts at position 29 and ends at position 487. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOTCB_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HUMOTCB_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMOTCB_T3. An alignment is given to the known protein (Oxytocin-neurophysin 1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMOTCB_P4 and NEU1_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMOTCB_P4, comprising a first amino acid sequence being at least 90% homologous to MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRK corresponding to amino acids 1 - 40 of NEU1_HUMAN, which also corresponds to amino acids 1 - 40 of HUMOTCB_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TAATPTLPATRKPPSPSAET corresponding to amino acids 41 - 60 of HUMOTCB_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMOTCB_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TAATPTLPATRKPPSPSAET in HUMOTCB_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HUMOTCB_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOTCB_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
The phosphorylation sites of variant protein HUMOTCB_P4, as compared to the known protein Oxytocin-neurophysin 1 precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 12 - Phosphorylation site(s)
Variant protein HUMOTCB_P4 is encoded by the following transcript(s): HUMOTCB_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMOTCB_T3 is shown in bold; this coding portion starts at position 29 and ends at position 208. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMOTCB_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
As noted above, cluster HUMOTCB features 5 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMOTCB_node_0 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOTCB_T1, HUMOTCB_T2 and HUMOTCB_T3. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
HUMOTCB_T3 1 148
Segment cluster HUMOTCB_node_1 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOTCB_T1. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster HUMOTCB_node_2 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOTCB_T1 and HUMOTCB_T2. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMOTCB_node_4 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOTCB_T1, HUMOTCB_T2 and HUMOTCB_T3. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMOTCB_node_3 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMOTCB_T1 and HUMOTCB_T2. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: NEU1_HUMAN
Sequence documentation:
Alignment of: HUMOTCB_P2 x NEU1_HUMAN
Alignment segment 1/1: Quality: 392.00 Escore: 0 Matching length: 43 Total length: 43 Matching Percent Similarity: 95.35 Matching Percent Identity: 95.35 Total Percent Similarity: 95.35 Total Percent Identity: 95.35 Gaps: 0
Alignment:
  1 MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKVSP 43
    ||||||||||||||||||||||||||||||||||||||||  |
  1 MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKCLP 43
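The percent identity reported for the HUMOTCB_P2 alignment can be reproduced from the two aligned 43-residue strings by counting identical positions. The sketch below is a minimal illustration for gap-free alignments of equal length; the function name is hypothetical, and the alignment program actually used to generate the report is not stated in this section.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of positions at which two equal-length, gap-free
    aligned sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Aligned strings as shown in the HUMOTCB_P2 x NEU1_HUMAN alignment.
P2_ALIGNED = "MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKVSP"
NEU1_ALIGNED = "MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKCLP"

p2_identity = percent_identity(P2_ALIGNED, NEU1_ALIGNED)
```

Of the 43 aligned positions, 41 are identical (positions 41 and 42 differ), which rounds to the 95.35 given in the report.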
Sequence name: NEU1_HUMAN
Sequence documentation:
Alignment of: HUMOTCB_P3 x NEU1_HUMAN
Alignment segment 1/1: Quality: 1167.00 Escore: 0 Matching length: 125 Total length: 153 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 81.70 Total Percent Identity: 81.70 Gaps : 1
Alignment:
  1 MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKCLPCGPGGKG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRKCLPCGPGGKG 50
 51 RCFGPNICCAEELGCFVGTAEALRCQEENYLPSPCQSGQKACGSGGRCAV 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 RCFGPNICCAEELGCFVGTAEALRCQEENYLPSPCQSGQKACGSGGRCAV 100
101 LGLCCSPGERGKALRGQGEAGGGAAGIPLTPPLPPDGCHADPACDAEATF 150
    ||||||                            ||||||||||||||||
101 LGLCCS............................PDGCHADPACDAEATF 122
151 SQR 153
    |||
123 SQR 125
Sequence name: NEU1_HUMAN
Sequence documentation:
Alignment of: HUMOTCB_P4 x NEU1_HUMAN
Alignment segment 1/1: Quality: 389.00 Escore: 0 Matching length: 40 Total length: 40 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRK 40
    ||||||||||||||||||||||||||||||||||||||||
  1 MAGPSLACCLLGLLALTSACYIQNCPLGGKRAAPDLDVRK 40
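The coding-portion coordinates given above for the three HUMOTCB transcripts can be cross-checked against the lengths of the variant proteins recited in the comparison reports. The sketch below assumes, as an interpretation not stated explicitly in the text, that the start and end positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; the function and dictionary names are hypothetical.

```python
# transcript: (coding start, coding end, variant protein length in residues)
# Positions and lengths are taken from the descriptions above.
HUMOTCB_CODING = {
    "HUMOTCB_T1": (29, 247, 73),   # encodes HUMOTCB_P2
    "HUMOTCB_T2": (29, 487, 153),  # encodes HUMOTCB_P3
    "HUMOTCB_T3": (29, 208, 60),   # encodes HUMOTCB_P4
}


def codons_in_span(start, end):
    """Number of whole codons in a 1-based, inclusive nucleotide span.

    Raises ValueError if the span length is not a multiple of three.
    """
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"span {start}-{end} is not a whole number of codons")
    return length // 3


coding_check = {
    name: codons_in_span(start, end) == protein_length
    for name, (start, end, protein_length) in HUMOTCB_CODING.items()
}
```

Under these assumptions each span corresponds to exactly as many codons as the variant protein has residues, which is consistent with the amino acid ranges quoted in the comparison reports.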
Subsection D: Endothelin-1
DESCRIPTION FOR CLUSTER S56805
Cluster S56805 features 1 transcript(s) and 15 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name    Sequence ID No.
S56805_T4          819
Table 2 - Segments of interest
Table 3 - Proteins of interest
Protein Name    Sequence ID No.    Corresponding Transcript(s)
S56805_P2       836                S56805_T4
These sequences are variants of the known protein Endothelin-1 precursor (SwissProt accession identifier ET1_HUMAN; known also according to the synonyms ET-1), SEQ ID NO: 835, referred to herein as the previously known protein. Protein Endothelin-1 precursor is known or believed to have the following function(s): Endothelins are endothelium-derived vasoconstrictor peptides. ET-1 induces a program of matrix synthesis in lung fibroblasts and may play a key role in connective tissue deposition during wound repair and in pulmonary fibrosis. Furthermore, ET-1 plasma levels were significantly increased in both patients with primary tumour and patients with metastases, compared to controls (P < 0.01, 3.9 +/- 1.4, 4.5 +/- 1.5, vs. 2.75 +/- 1.37 pg/ml, respectively). Therefore, the variants according to the present invention are believed to be useful for these diagnostic indications.
The sequence for protein Endothelin-1 precursor is given at the end of the application, as "Endothelin-1 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Endothelin-1 precursor localization is believed to be Secreted.
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Endothelin 1 antagonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Cardiovascular; Anti-inflammatory; Neurological.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: signal transduction; cell-cell signaling; blood pressure regulation; positive control of cell proliferation; pathogenesis; regulation of vasoconstriction, which are annotation(s) related to Biological Process; hormone, which are annotation(s) related to Molecular Function; and extracellular space; soluble fraction, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster S56805 features 1 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Endothelin-1 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein S56805_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S56805_T4. An alignment is given to the known protein (Endothelin-1 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between S56805_P2 and ET1_HUMAN:
1. An isolated chimeric polypeptide encoding for S56805_P2, comprising a first amino acid sequence being at least 90% homologous to MDYLLMIFSLLFVACQGAPETAVLGAELSAVGENGGEKPTPSPPWRLRRSKRCSCSSLMDKECVYFCHLDIIWVNTPE corresponding to amino acids 1 - 78 of ET1_HUMAN, which also corresponds to amino acids 1 - 78 of S56805_P2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein S56805_P2 is encoded by the following transcript(s): S56805_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S56805_T4 is shown in bold; this coding portion starts at position 730 and ends at position 963. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S56805_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Nucleic acid SNPs
As noted above, cluster S56805 features 15 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster S56805_node_4 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 6 below describes the starting and ending position of this segment on each transcript.
Table 6 - Segment location on transcripts
Segment cluster S56805_node_5 according to the present invention is supported by 53 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 7 below describes the starting and ending position of this segment on each transcript. Table 7 - Segment location on transcripts
Segment cluster S56805_node_12 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 8 below describes the starting and ending position of this segment on each transcript.
Table 8 - Segment location on transcripts
Segment cluster S56805_node_13 according to the present invention is supported by 13 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 9 below describes the starting and ending position of this segment on each transcript. Table 9 - Segment location on transcripts
Segment cluster S56805_node_14 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 10 below describes the starting and ending position of this segment on each transcript. Table 10 - Segment location on transcripts
Segment cluster S56805_node_17 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 11 below describes the starting and ending position of this segment on each transcript. Table 11 - Segment location on transcripts
Segment cluster S56805_node_20 according to the present invention is supported by 79 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster S56805_node_21 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster S56805_node_23 according to the present invention is supported by 72 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster S56805_node_24 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
S56805_T4 3460 3934
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster S56805_node_6 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster S56805_node_7 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster S56805_node_8 according to the present invention can be found in the following transcript(s): S56805_T4. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster S56805_node_16 according to the present invention can be found in the following transcript(s): S56805_T4. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster S56805_node_22 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S56805_T4. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Variant protein alignment to the previously known protein: Sequence name: ET1_HUMAN Sequence documentation: Alignment of: S56805_P2 x ET1_HUMAN Alignment segment 1/1: Quality: 786.00 Escore: 0 Matching length: 78 Total length: 78 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0 Alignment:
  1 MDYLLMIFSLLFVACQGAPETAVLGAELSAVGENGGEKPTPSPPWRLRRS 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MDYLLMIFSLLFVACQGAPETAVLGAELSAVGENGGEKPTPSPPWRLRRS 50
 51 KRCSCSSLMDKECVYFCHLDIIWVNTPE 78
    ||||||||||||||||||||||||||||
 51 KRCSCSSLMDKECVYFCHLDIIWVNTPE 78
Subsection E: Vitamin-K-dependent protein C precursor
DESCRIPTION FOR CLUSTER S50739
Cluster S50739 features 5 transcript(s) and 24 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
S50739_PEA_2_T4 837
S50739_PEA_2_T5 838
S50739_PEA_2_T6 839
S50739_PEA_2_T11 840
S50739_PEA_2_T13 841
Table 2 - Segments of interest
Segment Name    Sequence ID No.
S50739_PEA_2_node_8 842
S50739_PEA_2_node_18 843
S50739_PEA_2_node_19 844
S50739_PEA_2_node_31 845
S50739_PEA_2_node_33 846
S50739_PEA_2_node_0 847
S50739_PEA_2_node_1 848
S50739_PEA_2_node_3 849
S50739_PEA_2_node_4 850
S50739_PEA_2_node_5 851
S50739_PEA_2_node_7 852
S50739_PEA_2_node_11 853
S50739_PEA_2_node_14 854
S50739_PEA_2_node_15 855
S50739_PEA_2_node_16 856
S50739_PEA_2_node_17 857
S50739_PEA_2_node_22 858
Table 3 - Proteins of interest
These sequences are variants of the known protein Vitamin-K-dependent protein C precursor (SwissProt accession identifier PRTC_HUMAN; known also according to the synonyms EC 3.4.21.69; Autoprothrombin IIA; Anticoagulant protein C; Blood coagulation factor XIV), SEQ ID NO: 866, referred to herein as the previously known protein. Protein Vitamin-K-dependent protein C precursor is known or believed to have the following function(s): Protein C is a vitamin K-dependent serine protease that regulates blood coagulation by inactivating factors Va and VIIIa in the presence of calcium ions and phospholipids. The function of protein C is to inactivate factor Va and factor VIIIa. The first step in this process is the activation of thrombomodulin by thrombin. Subsequently, protein C combines with thrombomodulin in order to produce activated protein C. Activated protein C then combines with protein S on the surface of a platelet (platelets are the clotting cells that circulate in the blood and provide phospholipids to support the clotting process). Activated protein C can then degrade factor Va and factor VIIIa. Des-gamma-carboxy (prothrombin) protein C complex was suggested as a complementary marker in the diagnosis of hepatocellular carcinoma. Severe protein C deficiency (<40% of the level of protein C in pooled normal human plasma) and high interleukin-6 levels were associated with early death that resulted predominantly from refractory shock and multiple organ dysfunction. Protein C deficiency is present in approximately 0.2% of the general population. Protein C abnormalities may result in a variety of thrombotic phenomena, including deep vein thrombosis, purpura fulminans, disseminated intravascular coagulation, portal vein thrombosis and portal hypertension, and pregnancy loss. Interestingly, low protein C levels were also found in sickle cell anemia patients, possibly due to a combination of their abnormal hemostatic state and their liver damage. The variants of the present invention are useful for diagnosis of these diseases, optionally and preferably including diagnosis of inherited clotting disorders.
The sequence for protein Vitamin-K-dependent protein C precursor is given at the end of the application, as "Vitamin-K-dependent protein C precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Fibrinogen antagonist. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anticoagulant; Antithrombotic; Fibrinolytic; Immunoconjugate; Imaging agent; Anticancer; Septic shock treatment; Neuroprotective; Cardiovascular. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: proteolysis and peptidolysis; blood coagulation, which are annotation(s) related to Biological Process; protein C (activated); chymotrypsin; trypsin; calcium binding; serine-type peptidase; hydrolase, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. As noted above, cluster S50739 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Vitamin-K-dependent protein C precursor. A description of each variant protein according to the present invention is now provided.
Variant protein S50739_PEA_2_P17 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S50739_PEA_2_T4. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
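The localization reasoning stated above amounts to a simple decision rule over the outputs of the prediction programs. The following is a minimal sketch of that rule, assuming each program's result has already been reduced to a boolean; the function name, argument names and the "undetermined" fallback are hypothetical, since the text only states the condition under which a protein is called secreted.

```python
from typing import Sequence


def predict_localization(signal_peptide_calls: Sequence[bool],
                         transmembrane_calls: Sequence[bool]) -> str:
    """Apply the rule described in the text.

    A protein is called "secreted" when every signal-peptide program predicts
    a signal peptide and no trans-membrane program predicts a trans-membrane
    region. Any other combination is left as "undetermined" (an assumption;
    the text does not describe the other outcomes).
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    return "undetermined"


# Two signal-peptide programs agree, neither trans-membrane program fires.
p17_location = predict_localization([True, True], [False, False])
```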
Variant protein S50739_PEA_2_P17 is encoded by the following transcript(s): S50739_PEA_2_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S50739_PEA_2_T4 is shown in bold; this coding portion starts at position 98 and ends at position 253. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S50739_PEA_2_P17 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Nucleic acid SNPs
Variant protein S50739_PEA_2_P18 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S50739_PEA_2_T5 and S50739_PEA_2_T6. An alignment is given to the known protein (Vitamin-K-dependent protein C precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S50739_PEA_2_P18 and PRTC_HUMAN:
1. An isolated chimeric polypeptide encoding for S50739_PEA_2_P18, comprising a first amino acid sequence being at least 90% homologous to MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEELRHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHPCASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQR corresponding to amino acids 1 - 133 of PRTC_HUMAN, which also corresponds to amino acids 1 - 133 of S50739_PEA_2_P18, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEGERWMLAGGGAGLGPGWGRGTSTSCPRPPLPA corresponding to amino acids 134 - 167 of S50739_PEA_2_P18, and a third amino acid sequence being at least 90% homologous to EVSFLNCSLDNGGCTHYCLEEVGWRRCSCAPGYKLGDDLLQCHPAVKFPCGRPWKRMEKKRSHLKRDTEDQEDQVDPRLIDGKMTRRGDSPWQVVLLDSKKKLACGAVLIHPSWVLTAAHCMDESKKLLVRLGEYDLRRWEKWELDLDIKEVFVHPNYSKSTTDNDIALLHLAQPATLSQTIVPICLPDSGLAERELNQAGQETLVTGWGYHSSREKEAKRNRTFVLNFIKIPVVPHNECSEVMSNMVSENMLCAGILGDRQDACEGDSGGPMVASFHGTWFLVGLVSWGEGCGLLHNYGVYTKVSRYLDWIHGHIRDKEAPQKSWAP corresponding to amino acids 134 - 461 of PRTC_HUMAN, which also corresponds to amino acids 168 - 495 of S50739_PEA_2_P18, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of S50739_PEA_2_P18, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GEGERWMLAGGGAGLGPGWGRGTSTSCPRPPLPA, corresponding to S50739_PEA_2_P18.
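The segment coordinates in the comparison report above can be checked for internal consistency. The following sketch assumes 1-based inclusive coordinates, as the numbering in the report suggests; the `Segment` class and the helper names are hypothetical and serve only to illustrate the bookkeeping.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # 1-based, inclusive (assumed from the report's numbering)


@dataclass(frozen=True)
class Segment:
    variant: Range            # position on S50739_PEA_2_P18
    known: Optional[Range]    # matching position on PRTC_HUMAN, None if unique


def span(r: Range) -> int:
    """Number of residues covered by an inclusive range."""
    return r[1] - r[0] + 1


def check_segments(segments: List[Segment]) -> int:
    """Verify that segments are contiguous on the variant and that each mapped
    segment has the same length on the known protein; return total length."""
    expected_start = 1
    for seg in segments:
        if seg.variant[0] != expected_start:
            raise ValueError(f"segment {seg} is not contiguous")
        if seg.known is not None and span(seg.known) != span(seg.variant):
            raise ValueError(f"segment {seg} has mismatched lengths")
        expected_start = seg.variant[1] + 1
    return expected_start - 1


UNIQUE_INSERT = "GEGERWMLAGGGAGLGPGWGRGTSTSCPRPPLPA"

P18_SEGMENTS = [
    Segment(variant=(1, 133), known=(1, 133)),
    Segment(variant=(134, 167), known=None),      # unique insert
    Segment(variant=(168, 495), known=(134, 461)),
]

p18_length = check_segments(P18_SEGMENTS)
insert_length = span(P18_SEGMENTS[1].variant)
```

With the coordinates as quoted, the three segments tile positions 1 to 495 of the variant, and the unique segment spans 34 residues, matching the length of the quoted insert sequence.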
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide. Variant protein S50739_PEA_2_P18 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 6 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S50739_PEA_2_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Amino acid mutations
The glycosylation sites of variant protein S50739_PEA_2_P18, as compared to the known protein Vitamin-K-dependent protein C precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 7 - Glycosylation site(s)
The phosphorylation sites of variant protein S50739_PEA_2_P18, as compared to the known protein Vitamin-K-dependent protein C precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 8 - Phosphorylation site(s)
Variant protein S50739_PEA_2_P18 is encoded by the following transcript(s): S50739_PEA_2_T5 and S50739_PEA_2_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S50739_PEA_2_T5 is shown in bold; this coding portion starts at position 94 and ends at position 1578. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S50739_PEA_2_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
1624 G -> A No
The coding portion of transcript S50739_PEA_2_T6 is shown in bold; this coding portion starts at position 214 and ends at position 1698. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S50739_PEA_2_P18 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein S50739_PEA_2_P19 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S50739_PEA_2_T11. An alignment is given to the known protein (Vitamin-K-dependent protein C precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S50739_PEA_2_P19 and PRTC_HUMAN:
1. An isolated chimeric polypeptide encoding for S50739_PEA_2_P19, comprising a first amino acid sequence being at least 90% homologous to MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEELRHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHPCASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQREVSFLNCSLDNGGCTHYCLEEVGWRRCSCAPGYKLGDDLLQCHPA corresponding to amino acids 1 - 178 of PRTC_HUMAN, which also corresponds to amino acids 1 - 178 of S50739_PEA_2_P19, and a second amino acid sequence being at least 90% homologous to GEYDLRRWEKWELDLDIKEVFVHPNYSKSTTDNDIALLHLAQPATLSQTIVPICLPDSGLAERELNQAGQETLVTGWGYHSSREKEAKRNRTFVLNFIKIPVVPHNECSEVMSNMVSENMLCAGILGDRQDACEGDSGGPMVASFHGTWFLVGLVSWGEGCGLLHNYGVYTKVSRYLDWIHGHIRDKEAPQKSWAP corresponding to amino acids 266 - 461 of PRTC_HUMAN, which also corresponds to amino acids 179 - 374 of S50739_PEA_2_P19, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of S50739_PEA_2_P19, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AG, having a structure as follows: a sequence starting from any of amino acid numbers 178-x to 178; and ending at any of amino acid numbers 179 + ((n-2) - x), in which x varies from 0 to n-2.
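The edge-portion definition above describes a family of windows of length n that all span the junction between amino acids 178 (A) and 179 (G) of S50739_PEA_2_P19. The following sketch enumerates those windows under one reading of the definition, namely that for each x the window starts at 178 - x and ends at 179 + ((n - 2) - x). The function name is hypothetical, and the protein length of 374 is taken from the coordinates quoted in the comparison report.

```python
from typing import List, Tuple

JUNCTION_LEFT = 178    # last residue of the first segment (A)
JUNCTION_RIGHT = 179   # first residue of the second segment (G)
P19_LENGTH = 374       # from "amino acids 179 - 374 of S50739_PEA_2_P19"


def edge_windows(n: int, protein_length: int = P19_LENGTH) -> List[Tuple[int, int]]:
    """List (start, end) windows of length n, 1-based inclusive, that contain
    both junction residues; x runs from 0 to n-2 as in the definition."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(0, n - 1):                 # x = 0 .. n-2
        start = JUNCTION_LEFT - x
        end = JUNCTION_RIGHT + (n - 2) - x
        if start >= 1 and end <= protein_length:
            windows.append((start, end))
    return windows


windows_10 = edge_windows(10)
```

For n = 10 this reading gives nine windows, from (178, 187) for x = 0 down to (170, 179) for x = 8; each has length 10 and contains both residue 178 and residue 179.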
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein S50739_PEA_2_P19 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S50739_PEA_2_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
The glycosylation sites of variant protein S50739_PEA_2_P19, as compared to the known protein Vitamin-K-dependent protein C precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 12 - Glycosylation site(s)
The phosphorylation sites of variant protein S50739_PEA_2_P19, as compared to the known protein Vitamin-K-dependent protein C precursor, are described in Table 13 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 13 - Phosphorylation site(s)
Variant protein S50739_PEA_2_P19 is encoded by the following transcript(s): S50739_PEA_2_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S50739_PEA_2_T11 is shown in bold; this coding portion starts at position 94 and ends at position 1215. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S50739_PEA_2_P19 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein S50739_PEA_2_P20 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) S50739_PEA_2_T13. An alignment is given to the known protein (Vitamin-K-dependent protein C precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between S50739_PEA_2_P20 and PRTC_HUMAN:
1. An isolated chimeric polypeptide encoding for S50739_PEA_2_P20, comprising a first amino acid sequence being at least 90% homologous to MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEELRHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHPCASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQR corresponding to amino acids 1 - 133 of PRTC_HUMAN, which also corresponds to amino acids 1 - 133 of S50739_PEA_2_P20, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEGERWMLAGGGAGLGPGWGRGTSTSCPRPPLPA corresponding to amino acids 134 - 167 of S50739_PEA_2_P20, a third amino acid sequence being at least 90% homologous to EVSFLNCSLDNGGCTHYCLEEVGWRRCSCAPGYKLGDDLLQCHPA corresponding to amino acids 134 - 178 of PRTC_HUMAN, which also corresponds to amino acids 168 - 212 of S50739_PEA_2_P20, and a fourth amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GEKPPIHRPGITLGAGWAGPLTGRGAGGSGGFLGRERGTELSLGAAADAPQHRGHC corresponding to amino acids 213 - 268 of S50739_PEA_2_P20, wherein said first amino acid sequence, second amino acid sequence, third amino acid sequence and fourth amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of S50739_PEA_2_P20, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for GEGERWMLAGGGAGLGPGWGRGTSTSCPRPPLPA, corresponding to S50739_PEA_2_P20.
3. An isolated polypeptide encoding for a tail of S50739_PEA_2_P20, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GEKPPIHRPGITLGAGWAGPLTGRGAGGSGGFLGRERGTELSLGAAADAPQHRGHC in S50739_PEA_2_P20.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein S50739_PEA_2_P20 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S50739_PEA_2_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
The glycosylation sites of variant protein S50739_PEA_2_P20, as compared to the known protein Vitamin-K-dependent protein C precursor, are described in Table 16 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 16 - Glycosylation site(s)
The phosphorylation sites of variant protein S50739_PEA_2_P20, as compared to the known protein Vitamin-K-dependent protein C precursor, are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 17 - Phosphorylation site(s)
Variant protein S50739_PEA_2_P20 is encoded by the following transcript(s): S50739_PEA_2_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript S50739_PEA_2_T13 is shown in bold; this coding portion starts at position 94 and ends at position 897. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein S50739_PEA_2_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
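The coding-portion coordinates quoted for the transcripts of this cluster can be cross-checked against the variant protein lengths given in the comparison reports. The sketch below assumes the positions are 1-based and inclusive and that the quoted coding portion does not include the stop codon; both are assumptions, although the resulting codon counts are consistent with the protein lengths stated in the text (495, 374 and 268 amino acids). The function name is hypothetical.

```python
# (start, end) of the coding portion on each transcript, as quoted in the text.
CODING_PORTIONS = {
    "S50739_PEA_2_T4": (98, 253),
    "S50739_PEA_2_T5": (94, 1578),
    "S50739_PEA_2_T6": (214, 1698),
    "S50739_PEA_2_T11": (94, 1215),
    "S50739_PEA_2_T13": (94, 897),
}

# Variant protein lengths implied by the comparison reports in this section.
EXPECTED_PROTEIN_LENGTHS = {
    "S50739_PEA_2_T5": 495,   # S50739_PEA_2_P18
    "S50739_PEA_2_T6": 495,   # S50739_PEA_2_P18
    "S50739_PEA_2_T11": 374,  # S50739_PEA_2_P19
    "S50739_PEA_2_T13": 268,  # S50739_PEA_2_P20
}


def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive 1-based coding range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"coding range {start}-{end} is not a whole number of codons")
    return length // 3


codons = {name: codon_count(*rng) for name, rng in CODING_PORTIONS.items()}
mismatches = {
    name: (codons[name], expected)
    for name, expected in EXPECTED_PROTEIN_LENGTHS.items()
    if codons[name] != expected
}
```

Under these assumptions no mismatches are found, and the coding portion of S50739_PEA_2_T4 corresponds to 52 codons.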
As noted above, cluster S50739 features 24 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster S50739_PEA_2_node_8 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_18 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_19 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T13. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_31 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6 and S50739_PEA_2_T11. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_33 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6 and S50739_PEA_2_T11. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster S50739_PEA_2_node_0 according to the present invention can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
S50739_PEA_2_T13 1 24
Segment cluster S50739_PEA_2_node_1 according to the present invention is supported by 7 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_3 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T6. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_4 according to the present invention can be found in the following transcript(s): S50739_PEA_2_T4 and S50739_PEA_2_T6. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_5 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_7 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_11 according to the present invention can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_14 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_15 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_16 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T5,
S50739_PEA_2_T6 and S50739_PEA_2_T13. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_17 according to the present invention can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6, S50739_PEA_2_T11 and S50739_PEA_2_T13. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_22 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5 and S50739_PEA_2_T6. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_23 according to the present invention can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5 and S50739_PEA_2_T6. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_24 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5 and S50739_PEA_2_T6. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_26 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5 and S50739_PEA_2_T6. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_27 according to the present invention can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5 and S50739_PEA_2_T6. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_28 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5 and S50739_PEA_2_T6. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster S50739_PEA_2_node_30 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6 and S50739_PEA_2_T11. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
S50739 PEA 2 Ti l 629 656
Segment cluster S50739_PEA_2_node_32 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): S50739_PEA_2_T4, S50739_PEA_2_T5, S50739_PEA_2_T6 and S50739 PEA 2 T1 1. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: PRTC_HUMAN
Sequence documentation:
Alignment of: S50739_PEA_2_P18 x PRTC_HUMAN
Alignment segment 1/1: Quality: 4551.00 Escore: 0 Matching length: 461 Total length: 495 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 93.13 Total Percent Identity: 93.13 Gaps: 1
Alignment:
  1 MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEEL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEEL 50

 51 RHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 RHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHP 100

101 CASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQRGEGERWMLAGGGAGLGP 150
    |||||||||||||||||||||||||||||||||
101 CASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQR................. 133

151 GWGRGTSTSCPRPPLPAEVSFLNCSLDNGGCTHYCLEEVGWRRCSCAPGY 200
                     |||||||||||||||||||||||||||||||||
134 .................EVSFLNCSLDNGGCTHYCLEEVGWRRCSCAPGY 166

201 KLGDDLLQCHPAVKFPCGRPWKRMEKKRSHLKRDTEDQEDQVDPRLIDGK 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
167 KLGDDLLQCHPAVKFPCGRPWKRMEKKRSHLKRDTEDQEDQVDPRLIDGK 216

251 MTRRGDSPWQVVLLDSKKKLACGAVLIHPSWVLTAAHCMDESKKLLVRLG 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
217 MTRRGDSPWQVVLLDSKKKLACGAVLIHPSWVLTAAHCMDESKKLLVRLG 266

301 EYDLRRWEKWELDLDIKEVFVHPNYSKSTTDNDIALLHLAQPATLSQTIV 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
267 EYDLRRWEKWELDLDIKEVFVHPNYSKSTTDNDIALLHLAQPATLSQTIV 316

351 PICLPDSGLAERELNQAGQETLVTGWGYHSSREKEAKRNRTFVLNFIKIP 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
317 PICLPDSGLAERELNQAGQETLVTGWGYHSSREKEAKRNRTFVLNFIKIP 366

401 VVPHNECSEVMSNMVSENMLCAGILGDRQDACEGDSGGPMVASFHGTWFL 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
367 VVPHNECSEVMSNMVSENMLCAGILGDRQDACEGDSGGPMVASFHGTWFL 416

451 VGLVSWGEGCGLLHNYGVYTKVSRYLDWIHGHIRDKEAPQKSWAP 495
    |||||||||||||||||||||||||||||||||||||||||||||
417 VGLVSWGEGCGLLHNYGVYTKVSRYLDWIHGHIRDKEAPQKSWAP 461
Sequence name: PRTC_HUMAN
Sequence documentation:
Alignment of: S50739_PEA_2_P1 x PRTC_HUMAN Alignment segment 1/1: Quality: 3677.00 Escore: 0 Matching length: 374 Total length: 461 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 81.13 Total Percent Identity: 81.13 Gaps: 1
Alignment:
  1 MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEEL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEEL 50

 51 RHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 RHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHP 100

101 CASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQREVSFLNCSLDNGGCTHY 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQREVSFLNCSLDNGGCTHY 150

151 CLEEVGWRRCSCAPGYKLGDDLLQCHPA...................... 178
    ||||||||||||||||||||||||||||
151 CLEEVGWRRCSCAPGYKLGDDLLQCHPAVKFPCGRPWKRMEKKRSHLKRD 200

178 .................................................. 178
201 TEDQEDQVDPRLIDGKMTRRGDSPWQVVLLDSKKKLACGAVLIHPSWVLT 250

179 ...............GEYDLRRWEKWELDLDIKEVFVHPNYSKSTTDNDI 213
                   |||||||||||||||||||||||||||||||||||
251 AAHCMDESKKLLVRLGEYDLRRWEKWELDLDIKEVFVHPNYSKSTTDNDI 300

214 ALLHLAQPATLSQTIVPICLPDSGLAERELNQAGQETLVTGWGYHSSREK 263
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 ALLHLAQPATLSQTIVPICLPDSGLAERELNQAGQETLVTGWGYHSSREK 350

264 EAKRNRTFVLNFIKIPVVPHNECSEVMSNMVSENMLCAGILGDRQDACEG 313
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EAKRNRTFVLNFIKIPVVPHNECSEVMSNMVSENMLCAGILGDRQDACEG 400

314 DSGGPMVASFHGTWFLVGLVSWGEGCGLLHNYGVYTKVSRYLDWIHGHIR 363
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 DSGGPMVASFHGTWFLVGLVSWGEGCGLLHNYGVYTKVSRYLDWIHGHIR 450

364 DKEAPQKSWAP 374
    |||||||||||
451 DKEAPQKSWAP 461
Sequence name : PRTC_HUMAN
Sequence documentation:
Alignment of: S50739_PEA_2_P20 x PRTC_HUMAN
Alignment segment 1/1: Quality: 1712.00
Escore: 0
Matching length: 178 Total length: 212 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 83.96 Total Percent Identity: 83.96 Gaps : 1
Alignment:
  1 MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEEL 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MWQLTSLLLFVATWGISGTPAPLDSVFSSSERAHQVLRIRKRANSFLEEL 50

 51 RHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHP 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 RHSSLERECIEEICDFEEAKEIFQNVDDTLAFWSKHVDGDQCLVLPLEHP 100

101 CASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQRGEGERWMLAGGGAGLGP 150
    |||||||||||||||||||||||||||||||||
101 CASLCCGHGTCIDGIGSFSCDCRSGWEGRFCQR................. 133

151 GWGRGTSTSCPRPPLPAEVSFLNCSLDNGGCTHYCLEEVGWRRCSCAPGY 200
                     |||||||||||||||||||||||||||||||||
134 .................EVSFLNCSLDNGGCTHYCLEEVGWRRCSCAPGY 166

201 KLGDDLLQCHPA 212
    ||||||||||||
167 KLGDDLLQCHPA 178
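The summary figures reported with each alignment above are mutually consistent, and the relationship can be checked with a short calculation. The following Python sketch assumes (this is inferred from the reported numbers, not stated in the text) that Total Percent Identity is the matching length divided by the total alignment length, which holds when every matched position is identical. The function name is hypothetical.

```python
def total_percent_identity(matching_length: int, total_length: int) -> float:
    """Percent of alignment columns that are identical matches.

    Assumption: every matched position is identical (Matching Percent
    Identity of 100.00), so gap columns account for the whole shortfall.
    """
    if total_length <= 0 or not 0 <= matching_length <= total_length:
        raise ValueError("need 0 <= matching_length <= total_length and total_length > 0")
    return round(100.0 * matching_length / total_length, 2)


# (variant name, matching length, total length) as reported for each alignment
reported = [
    ("S50739_PEA_2_P18", 461, 495),
    ("S50739_PEA_2_P1", 374, 461),
    ("S50739_PEA_2_P20", 178, 212),
]

for name, matching, total in reported:
    print(name, total_percent_identity(matching, total))
```

Under this assumption the three alignments give 93.13, 81.13 and 83.96, which agree with the Total Percent Identity values listed in the alignment headers.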
Subsection F: Creatine kinase Variants
DESCRIPTION FOR CLUSTER T05088
Cluster T05088 features 11 transcript(s) and 71 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Transcript Name    SEQ ID NO
T05088_PEA_1_T5    871
T05088_PEA_1_T11    872
T05088_PEA_1_T14    873
T05088_PEA_1_T15    874
T05088_PEA_1_T16    875
T05088_PEA_1_T17    876
T05088_PEA_1_T18    877
T05088_PEA_1_T19    878
T05088_PEA_1_T21    879
T05088_PEA_1_T24    880
T05088_PEA_1_T36
Table 2 - Segments of interest
Segment Name    SEQ ID NO
T05088_PEA_1_node_18    882
T05088_PEA_1_node_35    883
T05088_PEA_1_node_45    884
T05088_PEA_1_node_46    885
T05088_PEA_1_node_48    886
T05088_PEA_1_node_55    887
T05088_PEA_1_node_77    888
T05088_PEA_1_node_3    889
T05088_PEA_1_node_4    890
T05088_PEA_1_node_7    891
T05088_PEA_1_node_8    892
T05088_PEA_1_node_10    893
T05088_PEA_1_node_11    894
T05088_PEA_1_node_13    895
T05088_PEA_1_node_14    896
T05088_PEA_1_node_15    897
T05088_PEA_1_node_16    898
T05088_PEA_1_node_17    899
T05088_PEA_1_node_19    900
T05088_PEA_1_node_20    901
T05088_PEA_1_node_21    902
T05088_PEA_1_node_22    903
T05088_PEA_1_node_23    904
T05088_PEA_1_node_24    905
T05088_PEA_1_node_25    906
Table 3 - Proteins of interest
These sequences are variants of the known protein Creatine kinase, B chain (SwissProt accession identifier KCRB_HUMAN; known also according to the synonyms EC 2.7.3.2; B-CK), SEQ ID NO: 953, referred to herein as the previously known protein. Protein Creatine kinase, B chain is known or believed to have the following function(s): reversibly catalyzes the transfer of phosphate between ATP and various phosphogens (e.g. creatine phosphate). Creatine kinase isoenzymes play a central role in energy transduction in tissues with large, fluctuating energy demands, such as skeletal muscle, heart, brain and spermatozoa. CPK-MB is not a secreted variant, yet, like many other heart damage markers, it is found at higher concentration in the serum following heart damage and cell necrosis (Clin Invest Med. 1984;7(4):187-91). CPK is composed of two subunits, M and B. While the CPK-MM isoenzyme is dominant in adult skeletal muscle and CPK-BB is dominant in brain, CPK-MB, composed of one M subunit and one B subunit, is relatively heart specific. However, CPK-MB does exist in skeletal muscle (at up to a third of the concentration found in heart muscle), and though its specificity is better than that of SGOT and LDH, it is still limited in both specificity and sensitivity, which reach only 67% when used together with an electrocardiogram (J Am Osteopath Assoc. 2000 Jan;100(1):29-32). In addition, cardiac surgery, myocarditis, and electrical cardioversion often result in elevated serum levels of the CPK-MB isoenzyme. Moreover, small infarcts with minor myocardial cell necrosis often do not increase serum CPK-MB to a detectable level. To overcome these disadvantages, a ratio (relative index) of CPK-MB mass to CPK activity is occasionally measured, and a value >2.5 suggests a myocardial rather than a skeletal muscle source for the CPK-MB elevation. This ratio is less useful when levels of total CPK are high owing to skeletal muscle injury or when the total CPK level is within the normal range but CPK-MB is elevated.
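The relative index described above is a simple ratio compared against a cutoff. The following Python sketch illustrates it under stated assumptions: the text gives neither the measurement units nor any scaling factor, so the `scale` parameter and both function names are hypothetical, and only the cutoff of 2.5 comes from the description.

```python
def relative_index(ck_mb_mass: float, total_ck_activity: float, scale: float = 1.0) -> float:
    """Ratio of CPK-MB mass to total CPK activity.

    Assumption: units and scaling are not specified in the text; `scale`
    is a hypothetical knob for laboratories that report the ratio on a
    different scale (for example as a percentage).
    """
    if total_ck_activity <= 0:
        raise ValueError("total CPK activity must be positive")
    return scale * ck_mb_mass / total_ck_activity


def suggests_myocardial_source(index: float, cutoff: float = 2.5) -> bool:
    """True when the relative index exceeds the cutoff quoted in the text (>2.5)."""
    return index > cutoff


# Illustrative values only; these are not patient data from the source.
index = relative_index(ck_mb_mass=30.0, total_ck_activity=10.0)
print(index, suggests_myocardial_source(index))
```

As the text notes, such a ratio is less informative when total CPK is high owing to skeletal muscle injury, or when total CPK is normal but CPK-MB is elevated, so the boolean result here is a screening hint rather than a diagnosis.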
Another way to try to avoid the limitations of CPK-MB is not to make the diagnosis of acute myocardial infarction (AMI) on the basis of a single measurement of CPK and CPK-MB, but to evaluate a series of measurements obtained over the first 24 h. Skeletal muscle release of CPK-MB typically produces a "plateau" pattern, whereas AMI produces a CPK-MB elevation that peaks approximately 20 h after the onset of coronary occlusion (Clin Biochem. 1984 Dec;17(6):356-61). With regard to diagnostic utilities, it should be noted that CPK is mostly used for diagnosis of muscle pathologies. The MB variant is heart specific and used in the diagnosis of myocardial infarction. The variants of the present invention are also preferably used for diagnosis of cardiac disease, more preferably myocardial infarction and/or other acute cardiac damage. The sequence for protein Creatine kinase, B chain is given at the end of the application, as "Creatine kinase, B chain amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Creatine kinase, B chain localization is believed to be Cytoplasmic. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: creatine kinase; transferase, transferring phosphorus-containing groups, which are annotation(s) related to Molecular Function; and cytoplasm, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster T05088 features 11 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Creatine kinase, B chain. A description of each variant protein according to the present invention is now provided.
Variant protein T05088_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T11. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T05088_PEA_1_P8 and KCRB_HUMAN: 1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P8, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVFTRFCTGLTQIETLFKSKDYEFMWNPHLGYILTCPSNLGTGLRAGVHIKLPNLGKHEKFSEVLKRLRLQKRGTG corresponding to amino acids 1 - 323 of KCRB_HUMAN, which also corresponds to amino acids 1 - 323 of T05088_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence EQGRCCGFPWPLGSPVSSALTCCLPRRCGHGCGGRGLRRLQR corresponding to amino acids 324 - 365 of T05088_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence EQGRCCGFPWPLGSPVSSALTCCLPRRCGHGCGGRGLRRLQR in T05088_PEA_1_P8.
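The comparison report above describes the variant as a head shared with the known protein followed by a novel tail, with the two parts "contiguous and in a sequential order". A minimal Python sketch of that bookkeeping follows. The `Segment` type and `is_consistent` function are hypothetical names introduced only for illustration; the coordinates and tail sequence are those quoted in the report for T05088_PEA_1_P8.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    """One part of a chimeric variant, in 1-based inclusive coordinates."""
    variant_start: int
    variant_end: int
    known_start: Optional[int] = None  # None for sequence absent from the known protein
    known_end: Optional[int] = None


def is_consistent(segments: Sequence[Segment], variant_length: int) -> bool:
    """Check that segments tile the variant with no gaps or overlaps, and
    that each segment mapped to the known protein has matching length."""
    expected_start = 1
    for seg in segments:
        if seg.variant_start != expected_start or seg.variant_end < seg.variant_start:
            return False
        if seg.known_start is not None and seg.known_end is not None:
            if seg.known_end - seg.known_start != seg.variant_end - seg.variant_start:
                return False
        expected_start = seg.variant_end + 1
    return expected_start == variant_length + 1


# Coordinates from the comparison report for T05088_PEA_1_P8
P8_SEGMENTS = [
    Segment(1, 323, known_start=1, known_end=323),  # head shared with KCRB_HUMAN
    Segment(324, 365),                              # novel tail
]
P8_TAIL = "EQGRCCGFPWPLGSPVSSALTCCLPRRCGHGCGGRGLRRLQR"

print(is_consistent(P8_SEGMENTS, variant_length=365))
print(len(P8_TAIL) == 365 - 324 + 1)
```

The same check can be applied to the other comparison reports in this cluster by substituting their coordinates.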
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
Variant protein T05088_PEA_1_P8 is encoded by the following transcript(s): T05088_PEA_1_T11, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T11 is shown in bold; this coding portion starts at position 100 and ends at position 1194. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
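The coding-portion coordinates quoted above can be cross-checked against the protein length given in the comparison report. The Python sketch below assumes that positions are 1-based and inclusive and that the quoted range does not include a stop codon; neither convention is stated in the text, but both agree with the reported figures. The function name is hypothetical.

```python
def encoded_length(coding_start: int, coding_end: int) -> int:
    """Number of amino acids encoded by a coding portion.

    Assumptions: 1-based inclusive nucleotide positions, and a range
    that ends on the last sense codon (no stop codon included).
    """
    n_nucleotides = coding_end - coding_start + 1
    if n_nucleotides <= 0 or n_nucleotides % 3 != 0:
        raise ValueError("coding portion must be a positive multiple of 3 nucleotides")
    return n_nucleotides // 3


# Transcript T05088_PEA_1_T11: coding portion 100..1194
print(encoded_length(100, 1194))
```

For T05088_PEA_1_T11 this gives 365 amino acids, which matches the end coordinate of the tail (amino acids 324 - 365) in the comparison report for T05088_PEA_1_P8.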
Variant protein T05088_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T14. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T05088_PEA_1_P11 and KCRB_HUMAN: 1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIW corresponding to amino acids 1 - 218 of KCRB_HUMAN, which also corresponds to amino acids 1 - 218 of T05088_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CVSLCALSRRPPSPLPPLSLSPPSRGWGPSRRGGGGGGGRGRPRSGSGFRAAPPPAPVTLAEQVR corresponding to amino acids 219 - 283 of T05088_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CVSLCALSRRPPSPLPPLSLSPPSRGWGPSRRGGGGGGGRGRPRSGSGFRAAPPPAPVTLAEQVR in T05088_PEA_1_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular on the basis of manual inspection of known protein localization and/or gene structure. Variant protein T05088_PEA_1_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Variant protein T05088_PEA_1_P11 is encoded by the following transcript(s): T05088_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T14 is shown in bold; this coding portion starts at position 100 and ends at position 948. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T15. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T05088_PEA_1_P12 and KCRB_HUMAN: 1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P12, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNL corresponding to amino acids 1 - 115 of KCRB_HUMAN, which also corresponds to amino acids 1 - 115 of T05088_PEA_1_P12, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence QVRGCGRAGRAGPGSSGAHSRLAS corresponding to amino acids 116 - 139 of T05088_PEA_1_P12, and a third amino acid sequence being at least 90% homologous to QGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVFTRFCTGLTQIETLFKSKDYEFMWNPHLGYILTCPSNLGTGLRAGVHIKLPNLGKHEKFSEVLKRLRLQKRGTGGVDTAAVGGVFDVSNADRLGFSEVELVQMVVDGVKLLIEMEQRLEQGQAIDDLMPAQK corresponding to amino acids 116 - 381 of KCRB_HUMAN, which also corresponds to amino acids 140 - 405 of T05088_PEA_1_P12, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for an edge portion of T05088_PEA_1_P12, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for QVRGCGRAGRAGPGSSGAHSRLAS, corresponding to T05088_PEA_1_P12.
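Unlike the tail variants, T05088_PEA_1_P12 carries an internal insertion, so positions of the known protein downstream of the insertion point are shifted in the variant. The Python sketch below illustrates that coordinate shift using the figures in the comparison report (insertion of 24 amino acids after position 115 of KCRB_HUMAN). The function name and its parameters are hypothetical and serve only as an illustration.

```python
EDGE_PORTION = "QVRGCGRAGRAGPGSSGAHSRLAS"  # inserted sequence quoted in the report


def known_to_variant(known_pos: int,
                     insert_after: int = 115,
                     insert_len: int = len(EDGE_PORTION),
                     known_len: int = 381) -> int:
    """Map a 1-based position in KCRB_HUMAN to its position in the variant.

    Positions up to the insertion point are unchanged; later positions
    are shifted right by the length of the inserted edge portion.
    """
    if not 1 <= known_pos <= known_len:
        raise ValueError("position lies outside the known protein")
    return known_pos if known_pos <= insert_after else known_pos + insert_len


for pos in (1, 115, 116, 381):
    print(pos, "->", known_to_variant(pos))
```

With these defaults, positions 116 and 381 of the known protein map to positions 140 and 405 of the variant, which agrees with the ranges stated for the third amino acid sequence.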
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Variant protein T05088_PEA_1_P12 is encoded by the following transcript(s): T05088_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T15 is shown in bold; this coding portion starts at position 100 and ends at position 1314. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P13 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T16. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T05088_PEA_1_P13 and KCRB_HUMAN: 1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P13, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVFTRFCTGLTQ corresponding to amino acids 1 - 259 of KCRB_HUMAN, which also corresponds to amino acids 1 - 259 of T05088_PEA_1_P13, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPGTGQAQTPGPQQGCGCPSISPPGGFPALGSLRACRGFRQAFSLIPSSPSAD corresponding to amino acids 260 - 312 of T05088_PEA_1_P13, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P13, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPGTGQAQTPGPQQGCGCPSISPPGGFPALGSLRACRGFRQAFSLIPSSPSAD in T05088_PEA_1_P13.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P13 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
250 F -> V No
Variant protein T05088_PEA_1_P13 is encoded by the following transcript(s): T05088_PEA_1_T16, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T16 is shown in bold; this coding portion starts at position 100 and ends at position 1035. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P13 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P14 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T17. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T05088_PEA_1_P14 and KCRB_HUMAN: 1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P14, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVFTRFCTGLTQIETLFKSKDYEFMWNPHLGYILTCPSNLGTGLRAGVHIKLPNLGKHEKFSEVLKRLRLQKRGTGG corresponding to amino acids 1 - 324 of KCRB_HUMAN, which also corresponds to amino acids 1 - 324 of T05088_PEA_1_P14, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AGADGGGRSEAAHRDGTAAGAGPGHRRPHACPEMKPGPHPTPALLLPNLLPGQCPPCTPDVRRLASP corresponding to amino acids 325 - 391 of T05088_PEA_1_P14, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P14, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AGADGGGRSEAAHRDGTAAGAGPGHRRPHACPEMKPGPHPTPALLLPNLLPGQCPPCTPDVRRLASP in T05088_PEA_1_P14.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P14 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein T05088_PEA_1_P14 is encoded by the following transcript(s): T05088_PEA_1_T17, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T17 is shown in bold; this coding portion starts at position 100 and ends at position 1272. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P14 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P15 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T18. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between T05088_PEA_1_P15 and KCRB_HUMAN: 1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P15, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVE corresponding to amino acids 1 - 160 of KCRB_HUMAN, which also corresponds to amino acids 1 - 160 of T05088_PEA_1_P15, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GRGRAGRGAAAAASPSRRGPRPLLFTSPGSGSRRRALICARPGSVSRRDRGTEAQPRAHSGLGPREGGSWRGVTAWDRRPGREDWTPADPGGWGPLTSPEVGHGGGRVRAAGWRGGRGSRAS corresponding to amino acids 161 - 282 of T05088_PEA_1_P15, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P15, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GRGRAGRGAAAAASPSRRGPRPLLFTSPGSGSRRRALICARPGSVSRRDRGTEAQPRAHSGLGPREGGSWRGVTAWDRRPGREDWTPADPGGWGPLTSPEVGHGGGRVRAAGWRGGRGSRAS in T05088_PEA_1_P15.
The location ofthe variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although it is a partial protein, because both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P15 also has the following non-silent SNPs (Single Nucleotide Polymoφhisms) as listed in Table 15, (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
Variant protein T05088_PEA_1_P15 is encoded by the following transcript(s): T05088_PEA_1_T18, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T18 is shown in bold; this coding portion starts at position 100 and ends at position 945. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P15 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T19. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05088_PEA_1_P16 and KCRB_HUMAN:
1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P16, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIW corresponding to amino acids 1 - 218 of KCRB_HUMAN, which also corresponds to amino acids 1 - 218 of T05088_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CVSLCALSRRPPSPLPPLSLSPPSRGWGPSRRGGGGGGGRGRPRSGSGFRAAPPPAPVTLAEQAQ corresponding to amino acids 219 - 283 of T05088_PEA_1_P16, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CVSLCALSRRPPSPLPPLSLSPPSRGWGPSRRGGGGGGGRGRPRSGSGFRAAPPPAPVTLAEQAQ in T05088_PEA_1_P16.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Amino acid mutations
Variant protein T05088_PEA_1_P16 is encoded by the following transcript(s): T05088_PEA_1_T19, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T19 is shown in bold; this coding portion starts at position 100 and ends at position 948. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P29 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T36. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P29 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 19 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 19 - Amino acid mutations
Variant protein T05088_PEA_1_P29 is encoded by the following transcript(s): T05088_PEA_1_T36, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T36 is shown in bold; this coding portion starts at position 100 and ends at position 1308. The transcript also has the following SNPs as listed in Table 20 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 20 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P52 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T5. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05088_PEA_1_P52 and KCRB_HUMAN:
1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P52, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPG corresponding to amino acids 1 - 65 of KCRB_HUMAN, which also corresponds to amino acids 1 - 65 of T05088_PEA_1_P52, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TRPLGAGVPAPPPPAQPQGPQQRARARQ corresponding to amino acids 66 - 93 of T05088_PEA_1_P52, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P52, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TRPLGAGVPAPPPPAQPQGPQQRARARQ in T05088_PEA_1_P52.
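The structure of the chimeric polypeptide described above, a first (head) sequence followed contiguously by a second (tail) sequence, can be checked against the recited amino acid positions. The following is a minimal sketch: the two sequences are those recited for T05088_PEA_1_P52, while the function name `assemble_chimera` and the 1-based inclusive numbering convention are assumptions of this example.

```python
from typing import Tuple

# Sequences as recited in the comparison report for T05088_PEA_1_P52.
HEAD = "MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPG"
TAIL = "TRPLGAGVPAPPPPAQPQGPQQRARARQ"

Range = Tuple[int, int]


def assemble_chimera(head: str, tail: str) -> Tuple[str, Range, Range]:
    """Join head and tail contiguously and report their positions.

    Positions are 1-based and inclusive (an assumption of this sketch),
    so the tail begins at the residue immediately after the head.
    """
    full = head + tail
    head_range = (1, len(head))
    tail_range = (len(head) + 1, len(full))
    return full, head_range, tail_range


full, head_range, tail_range = assemble_chimera(HEAD, TAIL)
print(len(full), head_range, tail_range)
```

Under these assumptions the head occupies positions 1 to 65 and the tail positions 66 to 93, in agreement with the positions recited in the comparison report.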
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because of manual inspection of known protein localization and/or gene structure.
Variant protein T05088_PEA_1_P52 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P52 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 21 - Amino acid mutations
Variant protein T05088_PEA_1_P52 is encoded by the following transcript(s): T05088_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T5 is shown in bold; this coding portion starts at position 100 and ends at position 378. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P52 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P53 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T21. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05088_PEA_1_P53 and KCRB_HUMAN:
1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P53, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNLQ corresponding to amino acids 1 - 116 of KCRB_HUMAN, which also corresponds to amino acids 1 - 116 of T05088_PEA_1_P53, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PCPAWTATWRADTTRSRA corresponding to amino acids 117 - 134 of T05088_PEA_1_P53, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P53, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PCPAWTATWRADTTRSRA in T05088_PEA_1_P53.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P53 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 23 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P53 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 23 - Amino acid mutations
Variant protein T05088_PEA_1_P53 is encoded by the following transcript(s): T05088_PEA_1_T21, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T21 is shown in bold; this coding portion starts at position 100 and ends at position 501. The transcript also has the following SNPs as listed in Table 24 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P53 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Nucleic acid SNPs
Variant protein T05088_PEA_1_P54 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) T05088_PEA_1_T24. An alignment is given to the known protein (Creatine kinase, B chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between T05088_PEA_1_P54 and KCRB_HUMAN:
1. An isolated chimeric polypeptide encoding for T05088_PEA_1_P54, comprising a first amino acid sequence being at least 90% homologous to MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSGFTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR corresponding to amino acids 1 - 130 of KCRB_HUMAN, which also corresponds to amino acids 1 - 130 of T05088_PEA_1_P54, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PCPAWTATWRADTTRSRA corresponding to amino acids 131 - 148 of T05088_PEA_1_P54, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of T05088_PEA_1_P54, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PCPAWTATWRADTTRSRA in T05088_PEA_1_P54.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein T05088_PEA_1_P54 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 25 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P54 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Amino acid mutations
Variant protein T05088_PEA_1_P54 is encoded by the following transcript(s): T05088_PEA_1_T24, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript T05088_PEA_1_T24 is shown in bold; this coding portion starts at position 100 and ends at position 543. The transcript also has the following SNPs as listed in Table 26 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein T05088_PEA_1_P54 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 26 - Nucleic acid SNPs
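The coding-portion positions given for the transcripts of this cluster can be cross-checked against the lengths of the variant proteins recited in the comparison reports. The following is a minimal sketch: it assumes that positions are 1-based and inclusive and that the stated end position does not include a stop codon. Both are assumptions of this example, and `protein_length` is a hypothetical helper name. The coordinates and protein lengths in the dictionary are taken from the text above.

```python
def protein_length(cds_start: int, cds_end: int) -> int:
    """Number of amino acids encoded by a coding portion.

    Assumes 1-based inclusive coordinates and no stop codon inside the
    stated range; raises if the range is not a whole number of codons.
    """
    n_nucleotides = cds_end - cds_start + 1
    if n_nucleotides <= 0 or n_nucleotides % 3 != 0:
        raise ValueError("coding portion is not a whole number of codons")
    return n_nucleotides // 3


# transcript -> (coding start, coding end, protein length from the
# comparison report of the corresponding variant protein)
CODING_PORTIONS = {
    "T05088_PEA_1_T18": (100, 945, 282),  # T05088_PEA_1_P15
    "T05088_PEA_1_T19": (100, 948, 283),  # T05088_PEA_1_P16
    "T05088_PEA_1_T5": (100, 378, 93),    # T05088_PEA_1_P52
    "T05088_PEA_1_T21": (100, 501, 134),  # T05088_PEA_1_P53
    "T05088_PEA_1_T24": (100, 543, 148),  # T05088_PEA_1_P54
}

for transcript, (start, end, expected) in CODING_PORTIONS.items():
    computed = protein_length(start, end)
    status = "consistent" if computed == expected else "MISMATCH"
    print(f"{transcript}: {computed} aa ({status})")
```

Under the stated assumptions, each coding portion corresponds exactly to the length of the variant protein it is said to encode.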
As noted above, cluster T05088 features 71 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster T05088_PEA_1_node_18 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_35 according to the present invention is supported by 32 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T18. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_45 according to the present invention is supported by 22 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T14, T05088_PEA_1_T19 and T05088_PEA_1_T36. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_46 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T14. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_48 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T14. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_55 according to the present invention is supported by 10 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T16. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_77 according to the present invention is supported by 163 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
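The length criterion for short segments can be illustrated from the starting and ending positions reported in the segment location tables. The following is a minimal sketch: it assumes that the positions are 1-based and inclusive, and it treats the "about 120 bp" limit as a fixed cutoff, which is a simplification made for this example. The names `segment_length`, `is_short_segment` and `SHORT_SEGMENT_MAX_BP` are hypothetical. The example values 73 and 87 are the positions listed in this section for transcript T05088_PEA_1_T36 after Table 35.

```python
SHORT_SEGMENT_MAX_BP = 120  # "up to about 120 bp"; exact cutoff is an assumption


def segment_length(start: int, end: int) -> int:
    """Length in bp of a segment given 1-based inclusive positions."""
    if end < start:
        raise ValueError("ending position precedes starting position")
    return end - start + 1


def is_short_segment(start: int, end: int,
                     max_bp: int = SHORT_SEGMENT_MAX_BP) -> bool:
    """True if the segment would fall in the short-segment description."""
    return segment_length(start, end) <= max_bp


# Positions listed for T05088_PEA_1_T36 (starting 73, ending 87).
print(segment_length(73, 87))
print(is_short_segment(73, 87))
```

Under these assumptions the listed segment is 15 bp long and is therefore described among the short segments.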
Segment cluster T05088_PEA_1_node_3 according to the present invention is supported by 58 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_4 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
T05088_PEA_1_T36 73 87
Segment cluster T05088_PEA_1_node_7 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_8 according to the present invention is supported by 68 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_10 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_11 according to the present invention is supported by 82 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_13 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_14 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_15 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_16 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_17 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_19 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_20 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_21 according to the present invention is supported by 81 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_22 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_23 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_24 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_25 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_26 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_27 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_28 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_29 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T15. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Segment cluster T05088_PEA_1_node_30 according to the present invention is supported by 83 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_31 according to the present invention can be found in the following transcπpt(s) T05088_PEA_1_T5, T05088_PEA_1_T1 1, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA _T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA _T24 and T05088 PEA 1 T36. Table 57 below describes the starting and ending position of this segment on each transcript Table 57 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_32 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 58 below describes the starting and ending position of this segment on each transcript. Table 58 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_33 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19 and T05088_PEA_1_T36. Table 59 below describes the starting and ending position of this segment on each transcript. Table 59 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_34 according to the present invention is supported by 84 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19 and T05088_PEA_1_T36. Table 60 below describes the starting and ending position of this segment on each transcript. Table 60 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_36 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 61 below describes the starting and ending position of this segment on each transcript. Table 61 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_37 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 62 below describes the starting and ending position of this segment on each transcript. Table 62 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_38 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_39 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 64 below describes the starting and ending position of this segment on each transcript. Table 64 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_40 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 65 below describes the starting and ending position of this segment on each transcript. Table 65 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_41 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 66 below describes the starting and ending position of this segment on each transcript. Table 66 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_42 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 67 below describes the starting and ending position of this segment on each transcript. Table 67 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_43 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 68 below describes the starting and ending position of this segment on each transcript. Table 68 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_44 according to the present invention is supported by 135 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 69 below describes the starting and ending position of this segment on each transcript. Table 69 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_47 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T14. Table 70 below describes the starting and ending position of this segment on each transcript. Table 70 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_49 according to the present invention is supported by 142 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21 and T05088_PEA_1_T24. Table 71 below describes the starting and ending position of this segment on each transcript. Table 71 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_50 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21 and T05088_PEA_1_T24. Table 72 below describes the starting and ending position of this segment on each transcript. Table 72 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_51 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21 and T05088_PEA_1_T24. Table 73 below describes the starting and ending position of this segment on each transcript. Table 73 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_52 according to the present invention is supported by 160 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21 and T05088_PEA_1_T24. Table 74 below describes the starting and ending position of this segment on each transcript. Table 74 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_53 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21 and T05088_PEA_1_T24. Table 75 below describes the starting and ending position of this segment on each transcript. Table 75 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_54 according to the present invention is supported by 6 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T16. Table 76 below describes the starting and ending position of this segment on each transcript. Table 76 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_56 according to the present invention is supported by 205 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 77 below describes the starting and ending position of this segment on each transcript. Table 77 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_57 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 78 below describes the starting and ending position of this segment on each transcript. Table 78 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_58 according to the present invention is supported by 228 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 79 below describes the starting and ending position of this segment on each transcript. Table 79 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_59 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 80 below describes the starting and ending position of this segment on each transcript. Table 80 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_60 according to the present invention is supported by 234 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 81 below describes the starting and ending position of this segment on each transcript. Table 81 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_61 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T11. Table 82 below describes the starting and ending position of this segment on each transcript. Table 82 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_62 according to the present invention is supported by 231 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 83 below describes the starting and ending position of this segment on each transcript. Table 83 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_63 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 84 below describes the starting and ending position of this segment on each transcript. Table 84 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_64 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 85 below describes the starting and ending position of this segment on each transcript. Table 85 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_65 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 86 below describes the starting and ending position of this segment on each transcript. Table 86 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_66 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 87 below describes the starting and ending position of this segment on each transcript. Table 87 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_67 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 88 below describes the starting and ending position of this segment on each transcript. Table 88 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_68 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 89 below describes the starting and ending position of this segment on each transcript. Table 89 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_69 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 90 below describes the starting and ending position of this segment on each transcript. Table 90 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_70 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 91 below describes the starting and ending position of this segment on each transcript. Table 91 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_71 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 92 below describes the starting and ending position of this segment on each transcript. Table 92 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_72 according to the present invention is supported by 235 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 93 below describes the starting and ending position of this segment on each transcript. Table 93 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_73 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 94 below describes the starting and ending position of this segment on each transcript. Table 94 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_74 according to the present invention is supported by 211 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 95 below describes the starting and ending position of this segment on each transcript. Table 95 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_75 according to the present invention is supported by 186 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 96 below describes the starting and ending position of this segment on each transcript. Table 96 - Segment location on transcripts
Segment cluster T05088_PEA_l_node_76 according to the present invention can be found in the following transcript(s): T05088_PEA_1_T5, T05088_PEA_1_T11, T05088_PEA_1_T14, T05088_PEA_1_T15, T05088_PEA_1_T16, T05088_PEA_1_T17, T05088_PEA_1_T18, T05088_PEA_1_T19, T05088_PEA_1_T21, T05088_PEA_1_T24 and T05088_PEA_1_T36. Table 97 below describes the starting and ending position of this segment on each transcript. Table 97 - Segment location on transcripts
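The segment descriptions above amount to a mapping from each segment to the transcripts that contain it. As a minimal sketch (not part of the described method), such a mapping can be queried for segments that occur in only one transcript. The short labels (`node_29`, `T15`, and so on) omit the cluster prefix for brevity, only a subset of the segments is shown, and the function name `transcript_specific_segments` is hypothetical.

```python
from collections import defaultdict

# Segment -> transcripts membership, copied from a subset of the segment
# descriptions above. The cluster prefix is omitted for brevity, and
# start/end positions are not included because the segment location
# tables are not reproduced here.
SEGMENT_TO_TRANSCRIPTS = {
    "node_29": ["T15"],
    "node_47": ["T14"],
    "node_54": ["T16"],
    "node_61": ["T11"],
    "node_49": ["T5", "T11", "T14", "T15", "T16",
                "T17", "T18", "T19", "T21", "T24"],
}


def transcript_specific_segments(segment_to_transcripts):
    """Return {transcript: [segments found only in that transcript]}."""
    unique = defaultdict(list)
    for segment, transcripts in segment_to_transcripts.items():
        if len(transcripts) == 1:
            unique[transcripts[0]].append(segment)
    return dict(unique)


specific = transcript_specific_segments(SEGMENT_TO_TRANSCRIPTS)
```

Under these assumptions, `specific` associates each of T15, T14, T16 and T11 with the single segment listed for it alone, while the widely shared `node_49` is not reported.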
Variant protein alignment to the previously known protein:

Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P8 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 3203.00
Escore: 0
Matching length: 323
Total length: 323
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150

151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200

201 LLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVF 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVF 250

251 TRFCTGLTQIETLFKSKDYEFMWNPHLGYILTCPSNLGTGLRAGVHIKLP 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TRFCTGLTQIETLFKSKDYEFMWNPHLGYILTCPSNLGTGLRAGVHIKLP 300

301 NLGKHEKFSEVLKRLRLQKRGTG 323
    |||||||||||||||||||||||
301 NLGKHEKFSEVLKRLRLQKRGTG 323
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P11 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 2154.00
Escore: 0
Matching length: 218
Total length: 218
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150

151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200

201 LLLASGMARDWPDARGIW 218
    ||||||||||||||||||
201 LLLASGMARDWPDARGIW 218
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P12 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 3646.00
Escore: 0
Matching length: 381
Total length: 405
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 94.07
Total Percent Identity: 94.07
Gaps: 1
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQVRGCGRAGRAGPGSSGAHSRLASQGGDDLDPNYV 150
    |||||||||||||||                        |||||||||||
101 KPSDEHKTDLNPDNL........................QGGDDLDPNYV 126

151 LSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVEALSSLDGDLAGRYYAL 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
127 LSSRVRTGRSIRGFCLPPHCSRGERRAIEKLAVEALSSLDGDLAGRYYAL 176

201 KSMTEAEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIWHNDNKTFL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
177 KSMTEAEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIWHNDNKTFL 226

251 VWVNEEDHLRVISMQKGGNMKEVFTRFCTGLTQIETLFKSKDYEFMWNPH 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
227 VWVNEEDHLRVISMQKGGNMKEVFTRFCTGLTQIETLFKSKDYEFMWNPH 276

301 LGYILTCPSNLGTGLRAGVHIKLPNLGKHEKFSEVLKRLRLQKRGTGGVD 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
277 LGYILTCPSNLGTGLRAGVHIKLPNLGKHEKFSEVLKRLRLQKRGTGGVD 326

351 TAAVGGVFDVSNADRLGFSEVELVQMVVDGVKLLIEMEQRLEQGQAIDDL 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
327 TAAVGGVFDVSNADRLGFSEVELVQMVVDGVKLLIEMEQRLEQGQAIDDL 376

401 MPAQK 405
    |||||
377 MPAQK 381
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P13 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 2574.00
Escore: 0
Matching length: 260
Total length: 260
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.62
Total Percent Similarity: 100.00
Total Percent Identity: 99.62
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150

151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200

201 LLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVF 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVF 250

251 TRFCTGLTQV 260
    |||||||||:
251 TRFCTGLTQI 260
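The percentages reported in the alignment headers above appear to follow from simple column counts. The sketch below assumes (this is an interpretation, not stated in the text) that "Matching length" counts the aligned columns without a gap, "Total length" counts all columns, and a gap is written as `.`; the function name `alignment_stats` is hypothetical. Under that reading, 381 identical columns out of 405 give the reported 94.07, and 259 out of 260 give 99.62.

```python
def alignment_stats(top, bottom, gap="."):
    """Compute identity statistics for two equal-length aligned strings.

    Assumption: a column "matches" when neither sequence has a gap in it,
    and percent identity is reported both over matching columns and over
    all columns of the alignment.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = len(top)
    pairs = list(zip(top, bottom))
    matching = sum(1 for a, b in pairs if a != gap and b != gap)
    identical = sum(1 for a, b in pairs if a == b and a != gap)
    return {
        "matching_length": matching,
        "total_length": total,
        "matching_percent_identity": round(100.0 * identical / matching, 2),
        "total_percent_identity": round(100.0 * identical / total, 2),
    }


# Third block of the P12 alignment: a 24-residue insertion in the variant.
p12_block = alignment_stats(
    "KPSDEHKTDLNPDNLQVRGCGRAGRAGPGSSGAHSRLASQGGDDLDPNYV",
    "KPSDEHKTDLNPDNL" + "." * 24 + "QGGDDLDPNYV",
)

# Last block of the P13 alignment: one substitution (V for I).
p13_block = alignment_stats("TRFCTGLTQV", "TRFCTGLTQI")
```

Similarity percentages are not reproduced here because they depend on the substitution scoring used by the alignment program, which this sketch does not model.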
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P14 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 3212.00
Escore: 0
Matching length: 324
Total length: 324
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150

151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200

201 LLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVF 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LLLASGMARDWPDARGIWHNDNKTFLVWVNEEDHLRVISMQKGGNMKEVF 250

251 TRFCTGLTQIETLFKSKDYEFMWNPHLGYILTCPSNLGTGLRAGVHIKLP 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 TRFCTGLTQIETLFKSKDYEFMWNPHLGYILTCPSNLGTGLRAGVHIKLP 300

301 NLGKHEKFSEVLKRLRLQKRGTGG 324
    ||||||||||||||||||||||||
301 NLGKHEKFSEVLKRLRLQKRGTGG 324
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P15 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 1586.00
Escore: 0
Matching length: 160
Total length: 160
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150

151 RRAIEKLAVE 160
    ||||||||||
151 RRAIEKLAVE 160
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P16 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 2154.00
Escore: 0
Matching length: 218
Total length: 218
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSRVRTGRSIRGFCLPPHCSRGE 150

151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RRAIEKLAVEALSSLDGDLAGRYYALKSMTEAEQQQLIDDHFLFDKPVSP 200

201 LLLASGMARDWPDARGIW 218
    ||||||||||||||||||
201 LLLASGMARDWPDARGIW 218
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P52 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 641.00
Escore: 0
Matching length: 65
Total length: 65
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPG 65
    |||||||||||||||
 51 FTLDDVIQTGVDNPG 65
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P53 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 1157.00
Escore: 0
Matching length: 116
Total length: 116
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQ 116
    ||||||||||||||||
101 KPSDEHKTDLNPDNLQ 116
Sequence name: KCRB_HUMAN
Sequence documentation:
Alignment of: T05088_PEA_1_P54 x KCRB_HUMAN
Alignment segment 1/1:
Quality: 1292.00
Escore: 0
Matching length: 130
Total length: 130
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:

  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFSNSHNALKLRFPAEDEFPDLSAHNNHMAKVLTPELYAELRAKSTPSG 50

 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTLDDVIQTGVDNPGHPYIMTVGCVAGDEESYEVFKDLFDPIIEDRHGGY 100

101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSR 130
    ||||||||||||||||||||||||||||||
101 KPSDEHKTDLNPDNLQGGDDLDPNYVLSSR 130

DESCRIPTION FOR CLUSTER HUMCKMA
Cluster HUMCKMA features 4 transcript(s) and 40 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Creatine kinase, M chain (SwissProt accession identifier KCRM_HUMAN; known also according to the synonyms EC 2.7.3.2; M-CK), SEQ ID NO: 1009, referred to herein as the previously known protein. The variants of the present invention have the diagnostic utilities previously described for the creatine kinase variants in the previous cluster.
The sequence for protein Creatine kinase, M chain is given at the end of the application, as "Creatine kinase, M chain amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Creatine kinase, M chain localization is believed to be cytoplasmic. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: creatine kinase; transferase, transferring phosphorus-containing groups, which are annotation(s) related to Molecular Function. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
The heart-selective diagnostic marker prediction engine provided the following results with regard to cluster HUMCKMA. Predictions were made for selective expression of transcripts of this cluster in heart tissue, according to the previously described methods. The numbers on the y-axis of the first figure refer to weighted expression of ESTs in each category, as "parts per million" (ratio of the expression of ESTs for a particular cluster to the expression of all ESTs in that category, according to parts per million). Overall, the following results were obtained as shown with regard to the histogram in Figure 25, concerning the number of heart-specific clones in libraries/sequences; as well as with regard to the histogram in Figure 26, concerning the actual expression of oligonucleotides in various tissues, including heart.
This cluster was found to be selectively expressed in heart for the following reasons: the ratio of expression of the cluster in heart-specific ESTs to the overall expression of the cluster in non-heart ESTs was found to be 10.1; the ratio of expression of the cluster in heart-specific ESTs to the overall expression of the cluster in muscle-specific ESTs was found to be 0.3; and Fisher exact test P-values, computed both for library and weighted clone counts to check that the counts are statistically significant, were found to be 3.80E-56.
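The two kinds of quantity quoted above, a ratio of parts-per-million expression and a Fisher exact test P-value, can be sketched as follows. This is a minimal illustration and not the prediction engine itself: the EST counts are hypothetical, the function names `ppm`, `expression_ratio` and `fisher_exact_greater` are introduced only for this example, and a one-sided test is assumed because the text does not state which alternative hypothesis was used.

```python
from math import comb


def ppm(cluster_ests: int, total_ests: int) -> float:
    """ESTs of the cluster per million ESTs sampled in one tissue category."""
    return 1e6 * cluster_ests / total_ests


def expression_ratio(cluster_a: int, total_a: int,
                     cluster_b: int, total_b: int) -> float:
    """Ratio of the cluster's ppm expression in category A to category B."""
    return ppm(cluster_a, total_a) / ppm(cluster_b, total_b)


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    a = cluster ESTs in heart,     b = all other ESTs in heart,
    c = cluster ESTs elsewhere,    d = all other ESTs elsewhere.
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return numerator / comb(n, col1)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
heart_cluster, heart_total = 12, 2_000
other_cluster, other_total = 6, 10_000

ratio = expression_ratio(heart_cluster, heart_total, other_cluster, other_total)
p_value = fisher_exact_greater(
    heart_cluster, heart_total - heart_cluster,
    other_cluster, other_total - other_cluster,
)
print(f"heart / non-heart ratio: {ratio:.1f}")
print(f"one-sided Fisher exact P-value: {p_value:.2e}")
```

Exact integer arithmetic via `math.comb` is used so that very small P-values are not distorted by intermediate rounding; for library-scale counts a log-space implementation would be the more practical choice.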
One particularly important measure of specificity of expression of a cluster in heart tissue is the previously described comparison of the ratio of expression of the cluster in heart as opposed to muscle. This cluster was found to be specifically expressed in heart as opposed to non-heart ESTs as described above. However, many proteins have been shown to be generally expressed at a higher level in both heart and muscle, which is less desirable. For this cluster, as described above, the ratio of expression of the cluster in heart-specific ESTs to the overall expression of the cluster in muscle-specific ESTs was found to be 10.1, which clearly supports specific expression in heart tissue.
As noted above, cluster HUMCKMA features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Creatine kinase, M chain. A description of each variant protein according to the present invention is now provided.
Variant protein HUMCKMA_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCKMA_PEA_1_T13. An alignment is given to the known protein (Creatine kinase, M chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCKMA_PEA_1_P9 and KCRM_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCKMA_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAK corresponding to amino acids 1 - 32 of KCRM_HUMAN, which also corresponds to amino acids 1 - 32 of HUMCKMA_PEA_1_P9, and a second amino acid sequence being at least 90% homologous to RRAVEKLSVEALNSLTGEFKGKYYPLKSMTEKEQQQLIDDHFLFDKPVSPLLLASGMARDWPDARGIWHNDNKSFLVWVNEEDHLRVISMEKGGNMKEVFRRFCVGLQKIEEIFKKAGHPFMWNQHLGYVLTCPSNLGTGLRGGVHVKLAHLSKHPKFEEILTRLRLQKRGTGGVDTAAVGSVFDVSNADRLGSSEVEQVQLVVDGVKLMVEMEKKLEKGQSIDDMIPAQK corresponding to amino acids 151 - 381 of KCRM_HUMAN, which also corresponds to amino acids 33 - 263 of HUMCKMA_PEA_1_P9, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMCKMA_PEA_1_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise KR, having a structure as follows: a sequence starting from any of amino acid numbers 32-x to 32; and ending at any of amino acid numbers 33 + ((n-2) - x), in which x varies from 0 to n-2.
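The edge-portion definition above describes every window of length n that contains both residues flanking the junction (amino acids 32 and 33 of HUMCKMA_PEA_1_P9). The following sketch enumerates those windows under stated assumptions: the function name `edge_windows` is hypothetical, positions are 1-based and inclusive, and windows that would run past either end of the 263-residue variant are skipped, a boundary rule the text does not spell out.

```python
def edge_windows(n: int, junction_left: int = 32, protein_length: int = 263):
    """Yield 1-based inclusive (start, end) pairs for length-n windows that
    span the junction between residues junction_left and junction_left + 1.

    start = junction_left - x and end = (junction_left + 1) + ((n - 2) - x),
    with x running from 0 to n - 2, as in the definition above.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to cover both junction residues")
    for x in range(0, n - 1):  # x = 0 .. n-2
        start = junction_left - x
        end = junction_left + 1 + (n - 2) - x
        # Assumption: discard windows that fall outside the protein.
        if start >= 1 and end <= protein_length:
            yield start, end


for start, end in edge_windows(10):
    print(start, end)
```

Every window produced has length exactly n, since end - start + 1 = n for any x, and each one includes positions 32 and 33, the K and R residues named in the definition.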
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HUMCKMA_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCKMA_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein HUMCKMA_PEA_1_P9 is encoded by the following transcript(s): HUMCKMA_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCKMA_PEA_1_T13 is shown in bold; this coding portion starts at position 187 and ends at position 975. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCKMA_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein HUMCKMA_PEA_1_P20 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCKMA_PEA_1_T5. An alignment is given to the known protein (Creatine kinase, M chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCKMA_PEA_1_P20 and KCRM_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCKMA_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSGFTVDDV corresponding to amino acids 1 - 56 of KCRM_HUMAN, which also corresponds to amino acids 1 - 56 of HUMCKMA_PEA_1_P20, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPSS corresponding to amino acids 57 - 60 of HUMCKMA_PEA_1_P20, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCKMA_PEA_1_P20, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPSS in HUMCKMA_PEA_1_P20.
Comparison report between HUMCKMA_PEA_1_P20 and AAP35439 (SEQ ID NO: 1426):
1. An isolated chimeric polypeptide encoding for HUMCKMA_PEA_1_P20, comprising a first amino acid sequence being at least 90% homologous to MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSGFTVDDV corresponding to amino acids 1 - 56 of AAP35439, which also corresponds to amino acids 1 - 56 of HUMCKMA_PEA_1_P20, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPSS corresponding to amino acids 57 - 60 of HUMCKMA_PEA_1_P20, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCKMA_PEA_1_P20, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPSS in HUMCKMA_PEA_1_P20.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HUMCKMA_PEA_1_P20 is encoded by the following transcript(s): HUMCKMA_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCKMA_PEA_1_T5 is shown in bold; this coding portion starts at position 187 and ends at position 366. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCKMA_PEA_1_P20 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HUMCKMA_PEA_1_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCKMA_PEA_1_T8. An alignment is given to the known protein (Creatine kinase, M chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCKMA_PEA_1_P22 and KCRM_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCKMA_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSGFTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGYKPTDKHKTDLNHENLKGGDDLDPNY corresponding to amino acids 1 - 125 of KCRM_HUMAN, which also corresponds to amino acids 1 - 125 of HUMCKMA_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LSTA corresponding to amino acids 126 - 129 of HUMCKMA_PEA_1_P22, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCKMA_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LSTA in HUMCKMA_PEA_1_P22.
Comparison report between HUMCKMA_PEA_1_P22 and AAP35439:
1. An isolated chimeric polypeptide encoding for HUMCKMA_PEA_1_P22, comprising a first amino acid sequence being at least 90% homologous to MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSGFTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGYKPTDKHKTDLNHENLKGGDDLDPNY corresponding to amino acids 1 - 125 of AAP35439, which also corresponds to amino acids 1 - 125 of HUMCKMA_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LSTA corresponding to amino acids 126 - 129 of HUMCKMA_PEA_1_P22, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCKMA_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LSTA in HUMCKMA_PEA_1_P22.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HUMCKMA_PEA_1_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCKMA_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
Variant protein HUMCKMA_PEA_1_P22 is encoded by the following transcript(s): HUMCKMA_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCKMA_PEA_1_T8 is shown in bold; this coding portion starts at position 187 and ends at position 573. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCKMA_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Nucleic acid SNPs
Variant protein HUMCKMA_PEA_1_P23 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMCKMA_PEA_1_T9. An alignment is given to the known protein (Creatine kinase, M chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMCKMA_PEA_1_P23 and KCRM_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMCKMA_PEA_1_P23, comprising a first amino acid sequence being at least 90% homologous to MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSGFTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGYKPTDKHKTDLNHENLK corresponding to amino acids 1 - 116 of KCRM_HUMAN, which also corresponds to amino acids 1 - 116 of HUMCKMA_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LSTA corresponding to amino acids 117 - 120 of HUMCKMA_PEA_1_P23, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCKMA_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LSTA in HUMCKMA_PEA_1_P23.
Comparison report between HUMCKMA_PEA_1_P23 and AAP35439:
1. An isolated chimeric polypeptide encoding for HUMCKMA_PEA_1_P23, comprising a first amino acid sequence being at least 90% homologous to MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSGFTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGYKPTDKHKTDLNHENLK corresponding to amino acids 1 - 116 of AAP35439, which also corresponds to amino acids 1 - 116 of HUMCKMA_PEA_1_P23, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LSTA corresponding to amino acids 117 - 120 of HUMCKMA_PEA_1_P23, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMCKMA_PEA_1_P23, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LSTA in HUMCKMA_PEA_1_P23.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: intracellularly. The protein localization is believed to be intracellular because neither of the trans-membrane region prediction programs predicted a trans-membrane region for this protein. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein.
Variant protein HUMCKMA_PEA_1_P23 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCKMA_PEA_1_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Amino acid mutations
Variant protein HUMCKMA_PEA_1_P23 is encoded by the following transcript(s): HUMCKMA_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMCKMA_PEA_1_T9 is shown in bold; this coding portion starts at position 187 and ends at position 546. The transcript also has the following SNPs as listed in Table 11 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMCKMA_PEA_1_P23 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Nucleic acid SNPs
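The coding-portion coordinates quoted for the four transcripts can be checked against the variant lengths given in the comparison reports (263, 60, 129 and 120 amino acids). The sketch below is only an arithmetic cross-check: the helper `codon_count` and the dictionary layout are hypothetical, and positions are assumed to be 1-based and inclusive. On these numbers each quoted span equals exactly three nucleotides per residue, which suggests the end position excludes the stop codon; that is an inference from the arithmetic, not a statement in the text.

```python
# (coding start, coding end, variant length in amino acids), as quoted above.
CODING_PORTIONS = {
    "HUMCKMA_PEA_1_T13": (187, 975, 263),  # encodes HUMCKMA_PEA_1_P9
    "HUMCKMA_PEA_1_T5": (187, 366, 60),    # encodes HUMCKMA_PEA_1_P20
    "HUMCKMA_PEA_1_T8": (187, 573, 129),   # encodes HUMCKMA_PEA_1_P22
    "HUMCKMA_PEA_1_T9": (187, 546, 120),   # encodes HUMCKMA_PEA_1_P23
}


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in a 1-based inclusive nucleotide span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a whole number of codons")
    return length // 3


for transcript, (start, end, protein_length) in CODING_PORTIONS.items():
    codons = codon_count(start, end)
    status = "consistent" if codons == protein_length else "MISMATCH"
    print(f"{transcript}: {end - start + 1} nt, {codons} codons, "
          f"{protein_length} aa, {status}")
```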
As noted above, cluster HUMCKMA features 40 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMCKMA_PEA_1_node_0 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_29 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_42 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_47 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HUMCKMA_PEA_1_T13    1218    1315
HUMCKMA_PEA_1_T5    1544    1641
HUMCKMA_PEA_1_T8    1466    1563
HUMCKMA_PEA_1_T9    1439    1536
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMCKMA_PEA_1_node_11 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_12 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_13 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HUMCKMA_PEA_1_T5    442    445
HUMCKMA_PEA_1_T8    470    473
HUMCKMA_PEA_1_T9    470    473
Segment cluster HUMCKMA_PEA_1_node_14 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HUMCKMA_PEA_1_T5    446    485
HUMCKMA_PEA_1_T8    474    513
HUMCKMA_PEA_1_T9    474    513
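The segment location tables give 1-based inclusive start and end positions, so a segment's length is end - start + 1, and consecutive nodes should abut on every transcript that carries both. The sketch below applies this to the rows of Tables 18 and 19; the data layout and the helpers `segment_length` and `is_contiguous` are hypothetical, and the 120 bp threshold is the approximate figure given for short segments.

```python
# Rows copied from Tables 18 and 19: transcript -> (start, end), 1-based inclusive.
SEGMENTS = {
    "HUMCKMA_PEA_1_node_13": {
        "HUMCKMA_PEA_1_T5": (442, 445),
        "HUMCKMA_PEA_1_T8": (470, 473),
        "HUMCKMA_PEA_1_T9": (470, 473),
    },
    "HUMCKMA_PEA_1_node_14": {
        "HUMCKMA_PEA_1_T5": (446, 485),
        "HUMCKMA_PEA_1_T8": (474, 513),
        "HUMCKMA_PEA_1_T9": (474, 513),
    },
}

SHORT_SEGMENT_MAX_BP = 120  # "up to about 120 bp" in the text


def segment_length(span: tuple[int, int]) -> int:
    """Length in nucleotides of a 1-based inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def is_contiguous(first: tuple[int, int], second: tuple[int, int]) -> bool:
    """True if the second span starts immediately after the first one ends."""
    return second[0] == first[1] + 1


node_13 = SEGMENTS["HUMCKMA_PEA_1_node_13"]
node_14 = SEGMENTS["HUMCKMA_PEA_1_node_14"]
for transcript in node_13:
    len_13 = segment_length(node_13[transcript])
    len_14 = segment_length(node_14[transcript])
    adjacent = is_contiguous(node_13[transcript], node_14[transcript])
    print(f"{transcript}: node_13 = {len_13} bp, node_14 = {len_14} bp, "
          f"adjacent = {adjacent}")
```

The same segment has the same length on every transcript even though its coordinates shift, which is a quick consistency check on a reconstructed table.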
Segment cluster HUMCKMA_PEA_1_node_15 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_18 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5 and HUMCKMA_PEA_1_T8. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_19 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5 and HUMCKMA_PEA_1_T8. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_20 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_21 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_22 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_23 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HUMCKMA_PEA_1_T5    581    604
Segment cluster HUMCKMA_PEA_1_node_24 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HUMCKMA_PEA_1_T5    605    608
Segment cluster HUMCKMA_PEA_1_node_25 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13 and HUMCKMA_PEA_1_T5. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_26 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13 and HUMCKMA_PEA_1_T5. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HUMCKMA_PEA_1_T13    291    313
HUMCKMA_PEA_1_T5    617    639
Segment cluster HUMCKMA_PEA_1_node_28 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_3 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HUMCKMA_PEA_1_T13    169    203
HUMCKMA_PEA_1_T5    169    203
HUMCKMA_PEA_1_T8    169    203
HUMCKMA_PEA_1_T9    169    203
Segment cluster HUMCKMA_PEA_1_node_31 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_32 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_33 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
Segment cluster HUMCKMA_PEA_1_node_34 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_35 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_37 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_38 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_39 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_4 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_41 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HUMCKMA_PEA_1_T13    800    895
HUMCKMA_PEA_1_T5    1126    1221
HUMCKMA_PEA_1_T8    1048    1143
HUMCKMA_PEA_1_T9    1021    1116
Segment cluster HUMCKMA_PEA_1_node_43 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_44 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
HUMCKMA_PEA_1_T9    1309    1361
Segment cluster HUMCKMA_PEA_1_node_45 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_46 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_5 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T13, HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_6 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_7 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T5, HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_8 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster HUMCKMA_PEA_1_node_9 according to the present invention can be found in the following transcript(s): HUMCKMA_PEA_1_T8 and HUMCKMA_PEA_1_T9. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/eUrHZQyT79/QyRrZvC5eX:KCRM_HUMAN
Sequence documentation:
Alignment of: HUMCKMA_PEA_1_P9 x KCRM_HUMAN
Alignment segment 1/1:
Quality: 2506.00
Escore: 0
Matching length: 263
Total length: 381
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 69.03
Total Percent Identity: 69.03
Gaps: 1
Alignment:
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAK.................. 32
    ||||||||||||||||||||||||||||||||
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50

 32 .................................................. 32

 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100

 32 .................................................. 32

101 KPTDKHKTDLNHENLKGGDDLDPNYVLSSRVRTGRSIKGYTLPPHCSRGE 150

 33 RRAVEKLSVEALNSLTGEFKGKYYPLKSMTEKEQQQLIDDHFLFDKPVSP 82
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RRAVEKLSVEALNSLTGEFKGKYYPLKSMTEKEQQQLIDDHFLFDKPVSP 200

 83 LLLASGMARDWPDARGIWHNDNKSFLVWVNEEDHLRVISMEKGGNMKEVF 132
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 LLLASGMARDWPDARGIWHNDNKSFLVWVNEEDHLRVISMEKGGNMKEVF 250

133 RRFCVGLQKIEEIFKKAGHPFMWNQHLGYVLTCPSNLGTGLRGGVHVKLA 182
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 RRFCVGLQKIEEIFKKAGHPFMWNQHLGYVLTCPSNLGTGLRGGVHVKLA 300

183 HLSKHPKFEEILTRLRLQKRGTGGVDTAAVGSVFDVSNADRLGSSEVEQV 232
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 HLSKHPKFEEILTRLRLQKRGTGGVDTAAVGSVFDVSNADRLGSSEVEQV 350

233 QLVVDGVKLMVEMEKKLEKGQSIDDMIPAQK 263
    |||||||||||||||||||||||||||||||
351 QLVVDGVKLMVEMEKKLEKGQSIDDMIPAQK 381
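The summary statistics of this alignment can be related to one another with simple arithmetic. The sketch below assumes that the total percent identity is the matching length divided by the total length, a formula the alignment report itself does not state. The function name `percent` is hypothetical.

```python
def percent(matching: int, total: int) -> float:
    """Share of aligned positions as a percentage, rounded to two decimals."""
    return round(100.0 * matching / total, 2)


matching_length = 263  # variant residues matched to KCRM_HUMAN
total_length = 381     # alignment length, including the single gap

# Assumed relationship: total percent identity = matching / total.
total_percent_identity = percent(matching_length, total_length)

# Positions of the alignment not covered by the variant.
gap_length = total_length - matching_length

# The gap spans residues 33 to 150 of KCRM_HUMAN in the alignment.
skipped_residues = 150 - 33 + 1

print(total_percent_identity)        # 69.03
print(gap_length, skipped_residues)  # 118 118
```

Under this assumption the computed value agrees with the reported total percent identity of 69.03, and the 118 unmatched positions correspond to the single gap over residues 33 to 150 of the known protein.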
Sequence name: /tmp/I17VB5XyRf/Dz iESpv4W:KCRM_HUMAN
Sequence documentation:
Alignment of: HUMCKMA_PEA_1_P20 x KCRM_HUMAN
Alignment segment 1/1:
Quality: 567.00
Escore: 0
Matching length: 56
Total length: 56
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50

 51 FTVDDV 56
    ||||||
 51 FTVDDV 56
Sequence name: /tmp/I17VB5XyRf/Dz iESpv4W:AAP35439
Sequence documentation:
Alignment of: HUMCKMA_PEA_1_P20 x AAP35439
Alignment segment 1/1:
Quality: 567.00
Escore: 0
Matching length: 56
Total length: 56
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50

 51 FTVDDV 56
    ||||||
 51 FTVDDV 56
Sequence name: /tmp/At6STwGeWP/7wnvdEPEWG:KCRM_HUMAN
Sequence documentation:
Alignment of: HUMCKMA_PEA_1_P22 x KCRM_HUMAN
Alignment segment 1/1:
Quality: 1259.00
Escore: 0
Matching length: 125
Total length: 125
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50

 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100

101 KPTDKHKTDLNHENLKGGDDLDPNY 125
    |||||||||||||||||||||||||
101 KPTDKHKTDLNHENLKGGDDLDPNY 125
Sequence name: /tmp/At6STwGeWP/7wnvdEPEWG:AAP35439
Sequence documentation:
Alignment of: HUMCKMA_PEA_1_P22 x AAP35439
Alignment segment 1/1:
Quality: 1259.00
Escore: 0
Matching length: 125
Total length: 125
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50

 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100

101 KPTDKHKTDLNHENLKGGDDLDPNY 125
    |||||||||||||||||||||||||
101 KPTDKHKTDLNHENLKGGDDLDPNY 125
Sequence name: /tmp/WgpkEotpHE/qraG5A20Mj:KCRM_HUMAN
Sequence documentation:
Alignment of: HUMCKMA_PEA_1_P23 x KCRM_HUMAN
Alignment segment 1/1:
Quality: 1168.00
Escore: 0
Matching length: 116
Total length: 116
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50

 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100

101 KPTDKHKTDLNHENLK 116
    ||||||||||||||||
101 KPTDKHKTDLNHENLK 116
Sequence name: /tmp/WgpkEotpHE/qraG5A20Mj:AAP35439
Sequence documentation:
Alignment of: HUMCKMA_PEA_1_P23 x AAP35439
Alignment segment 1/1:
Quality: 1168.00
Escore: 0
Matching length: 116
Total length: 116
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MPFGNTHNKFKLNYKPEEEYPDLSKHNNHMAKVLTLELYKKLRDKETPSG 50

 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTVDDVIQTGVDNPGHPFIMTVGCVAGDEESYEVFKELFDPIISDRHGGY 100

101 KPTDKHKTDLNHENLK 116
    ||||||||||||||||
101 KPTDKHKTDLNHENLK 116
Subsection G: Ferritin heavy chain
DESCRIPTION FOR CLUSTER HUMFERHA Cluster HUMFERHA features 15 transcript(s) and 59 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Transcript name       SEQ ID NO
HUMFERHA_PEA_1_T27    1014
HUMFERHA_PEA_1_T30    1015
HUMFERHA_PEA_1_T31    1016
HUMFERHA_PEA_1_T32    1017
HUMFERHA_PEA_1_T35    1018
HUMFERHA_PEA_1_T36    1019
HUMFERHA_PEA_1_T41    1020
HUMFERHA_PEA_1_T42    1021
HUMFERHA_PEA_1_T43    1022
HUMFERHA_PEA_1_T44    1023
HUMFERHA_PEA_1_T46    1024
HUMFERHA_PEA_1_T50    1025
HUMFERHA_PEA_1_T51    1026
HUMFERHA_PEA_1_T54    1027
HUMFERHA_PEA_1_T59    1028
Table 2 - Segments of interest
HUMFERHA_PEA_1_node_64    1087
Table 3 - Proteins of interest
These sequences are variants of the known protein Ferritin heavy chain (SwissProt accession identifier FRIH_HUMAN; also known by the synonym Ferritin H subunit), SEQ ID NO: 1088, referred to herein as the previously known protein. Protein Ferritin heavy chain is known or believed to have the following function(s): Ferritin is an intracellular molecule that stores iron in a soluble, nontoxic, readily available form. The functional molecule, which is composed of 24 chains, is roughly spherical and contains a central cavity into which the polymeric ferric iron core is deposited.
The sequence for protein Ferritin heavy chain is given at the end of the application, as "Ferritin heavy chain amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: iron transport; intracellular iron storage; immune response; cell proliferation; negative control of cell proliferation, which are annotation(s) related to Biological Process; ligand binding or carrier; iron binding; ferric iron binding, which are annotation(s) related to Molecular Function; and plasma membrane; ferritin, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HUMFERHA features 15 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) which are variant(s) of protein Ferritin heavy chain. The variants of the present invention for this cluster are optionally and preferably used for the following diagnostic utility: iron deficiency anemia. A description of each variant protein according to the present invention is now provided.
Variant protein HUMFERHA_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T35. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P6 and FRIH_HUMAN_V1 (SEQ ID NO: 1089):
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P6, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MTTASTSQ corresponding to amino acids 1 - 8 of HUMFERHA_PEA_1_P6, and a second amino acid sequence being at least 90% homologous to SYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGDSDNES corresponding to amino acids 39 - 183 of FRIH_HUMAN_V1, which also corresponds to amino acids 9 - 153 of HUMFERHA_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of HUMFERHA_PEA_1_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MTTASTSQ of HUMFERHA_PEA_1_P6.
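The relationship between the two parts of the chimeric polypeptide and the coding portion of its transcript can be checked with a short calculation. The following Python sketch joins the unique head to the sequence shared with FRIH_HUMAN_V1 and compares the resulting length with the coding portion stated for transcript HUMFERHA_PEA_1_T35 (positions 342 to 800). It assumes that those positions are inclusive and that the stated range does not count a stop codon, neither of which the text states. The variable names are illustrative only.

```python
# Unique head of HUMFERHA_PEA_1_P6 (amino acids 1 - 8 of the variant).
head = "MTTASTSQ"

# Portion shared with FRIH_HUMAN_V1 (amino acids 39 - 183 of the known
# protein, amino acids 9 - 153 of the variant), as quoted in the report.
shared = (
    "SYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDD"
    "WESGLNAMECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKEL"
    "GDHVTNLRKMGAPESGLAEYLFDKHTLGDSDNES"
)

# "Contiguous and in a sequential order": head first, then shared portion.
variant = head + shared

# Coding portion of HUMFERHA_PEA_1_T35; inclusive positions are assumed.
coding_start, coding_end = 342, 800
coding_nucleotides = coding_end - coding_start + 1
codons = coding_nucleotides // 3

print(len(head), len(shared), len(variant))  # 8 145 153
print(coding_nucleotides, codons)            # 459 153
```

Under these assumptions the 459-nucleotide coding portion corresponds to 153 codons, which matches the 153 amino acids obtained by joining the 8-residue head to the 145-residue shared portion.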
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 5 - Changes to FRIH_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.
Variant protein HUMFERHA_PEA_1_P6 is encoded by the following transcript(s): HUMFERHA_PEA_1_T35, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T35 is shown in bold; this coding portion starts at position 342 and ends at position 800. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T41. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.
Variant protein HUMFERHA_PEA_1_P7 is encoded by the following transcript(s): HUMFERHA_PEA_1_T41, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T41 is shown in bold; this coding portion starts at position 342 and ends at position 668. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P25 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T27. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P25 and FRIH_HUMAN_V1 (SEQ ID NO: 1089):
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P25, comprising a first amino acid sequence being at least 90% homologous to TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSM corresponding to amino acids 2 - 38 of FRIH_HUMAN_V1, which also corresponds to amino acids 1 - 37 of HUMFERHA_PEA_1_P25, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence IVTATCLWGSLV corresponding to amino acids 38 - 49 of HUMFERHA_PEA_1_P25, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFERHA_PEA_1_P25, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence IVTATCLWGSLV in HUMFERHA_PEA_1_P25.
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 8 - Changes to FRIH_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region.
Variant protein HUMFERHA_PEA_1_P25 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P25 is encoded by the following transcript(s): HUMFERHA_PEA_1_T27, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T27 is shown in bold; this coding portion starts at position 345 and ends at position 491. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P25 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P26 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T30 and HUMFERHA_PEA_1_T31. The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein HUMFERHA_PEA_1_P26 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P26 is encoded by the following transcript(s): HUMFERHA_PEA_1_T30 and HUMFERHA_PEA_1_T31, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T30 is shown in bold; this coding portion starts at position 345 and ends at position 668. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
The coding portion of transcript HUMFERHA_PEA_1_T31 is shown in bold; this coding portion starts at position 345 and ends at position 668. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P26 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P27 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T32. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P27 and FRIH_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P27, comprising a first amino acid sequence being at least 90% homologous to TTASTSQVRQNYHQDSEAAINRQINLELYASYVYL corresponding to amino acids 2 - 36 of FRIH_HUMAN_V1, which also corresponds to amino acids 1 - 35 of HUMFERHA_PEA_1_P27, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VPLSVLLL corresponding to amino acids 36 - 43 of HUMFERHA_PEA_1_P27, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFERHA_PEA_1_P27, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VPLSVLLL in HUMFERHA_PEA_1_P27.
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 14 - Changes to FRIH_HUMAN_V1
SNP position(s) on amino acid sequence    Type of change
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein HUMFERHA_PEA_1_P27 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P27 is encoded by the following transcript(s): HUMFERHA_PEA_1_T32, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T32 is shown in bold; this coding portion starts at position 345 and ends at position 473. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P27 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P29 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T43. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P29 and FRIH_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P29, comprising a first amino acid sequence being at least 90% homologous to TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIK corresponding to amino acids 2 - 87 of FRIH_HUMAN_V1, which also corresponds to amino acids 1 - 86 of HUMFERHA_PEA_1_P29, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VNKRS corresponding to amino acids 87 - 91 of HUMFERHA_PEA_1_P29, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFERHA_PEA_1_P29, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VNKRS in HUMFERHA_PEA_1_P29.
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 17 - Changes to FRIH_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein HUMFERHA_PEA_1_P29 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 18 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P29 is encoded by the following transcript(s): HUMFERHA_PEA_1_T43, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T43 is shown in bold; this coding portion starts at position 345 and ends at position 617. The transcript also has the following SNPs as listed in Table 19 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P29 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 19 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P30 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T44. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P30 and FRIH_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P30, comprising a first amino acid sequence being at least 90% homologous to TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIK corresponding to amino acids 2 - 87 of FRIH_HUMAN_V1, which also corresponds to amino acids 1 - 86 of HUMFERHA_PEA_1_P30, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TVMTGRAG corresponding to amino acids 87 - 94 of HUMFERHA_PEA_1_P30, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFERHA_PEA_1_P30, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TVMTGRAG in HUMFERHA_PEA_1_P30.
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below. Table 20 - Changes to FRIH_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein HUMFERHA_PEA_1_P30 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 21 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 21 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P30 is encoded by the following transcript(s): HUMFERHA_PEA_1_T44, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T44 is shown in bold; this coding portion starts at position 345 and ends at position 626. The transcript also has the following SNPs as listed in Table 22 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P30 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 22 - Nucleic acid SNPs
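By way of non-limiting illustration only, the coding-portion coordinates stated for transcript HUMFERHA_PEA_1_T44 can be checked against the length of variant protein HUMFERHA_PEA_1_P30. The sketch below assumes that nucleotide positions are 1-based and inclusive and that the stated coding portion does not include the stop codon; neither assumption is stated in the application, and the function name `codon_count` is hypothetical.

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of whole codons in a coding portion.

    Assumes 1-based, inclusive nucleotide positions. This is an assumption
    made for the illustration, not a statement from the application.
    """
    if end < start:
        raise ValueError("end position precedes start position")
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError(f"coding portion of {length_nt} nt is not a multiple of 3")
    return length_nt // 3


# Coding portion of HUMFERHA_PEA_1_T44 as stated in the text: 345..626.
T44_CODONS = codon_count(345, 626)

# HUMFERHA_PEA_1_P30 spans amino acids 1-86 (shared with FRIH_HUMAN_V1)
# plus the unique tail at amino acids 87-94, i.e. 94 residues in total.
P30_LENGTH = 94

print(T44_CODONS, T44_CODONS == P30_LENGTH)
```

Under these assumptions the 282-nucleotide coding portion corresponds to 94 codons, which agrees with the 94 amino acids described for the variant protein.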
Variant protein HUMFERHA_PEA_1_P31 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T46. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P31 and FRIH_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P31, comprising a first amino acid sequence being at least 90% homologous to TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSM corresponding to amino acids 2 - 38 of FRIH_HUMAN_V1, which also corresponds to amino acids 1 - 37 of HUMFERHA_PEA_1_P31, and a second amino acid sequence being at least 90% homologous to KPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGDSDNES corresponding to amino acids 88 - 183 of FRIH_HUMAN_V1, which also corresponds to amino acids 38 - 133 of HUMFERHA_PEA_1_P31, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMFERHA_PEA_1_P31, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MK, having a structure as follows: a sequence starting from any of amino acid numbers 37-x to 37; and ending at any of amino acid numbers 38 + ((n-2) - x), in which x varies from 0 to n-2.
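By way of non-limiting illustration only, the edge-portion structure described above for HUMFERHA_PEA_1_P31 can be enumerated as a set of windows of length n that each contain both junction residues (amino acids 37 and 38). The sketch below follows the stated rule (start at 37-x, end at 38 + ((n-2) - x), with x from 0 to n-2); the function name `edge_portion_windows` and the clipping of windows to the protein length of 133 amino acids are assumptions added for the illustration.

```python
from typing import Iterator, Tuple


def edge_portion_windows(
    junction_left: int, n: int, protein_length: int
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) positions, 1-based and inclusive, of every
    length-n window that spans the junction between residue
    `junction_left` and residue `junction_left + 1`.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to span both junction residues")
    junction_right = junction_left + 1
    for x in range(n - 1):  # x = 0, 1, ..., n-2
        start = junction_left - x
        end = junction_right + (n - 2 - x)
        # Assumption for this sketch: discard windows that would run past
        # either end of the variant protein.
        if start >= 1 and end <= protein_length:
            yield start, end


# HUMFERHA_PEA_1_P31: junction between amino acids 37 and 38, n = 10.
windows = list(edge_portion_windows(junction_left=37, n=10, protein_length=133))
print(windows[0], windows[-1], len(windows))
```

Each window has exactly n residues, since end - start + 1 = (38 + n - 2 - x) - (37 - x) + 1 = n for every value of x.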
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 23 - Changes to FRIH_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein HUMFERHA_PEA_1_P31 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 24 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 24 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P31 is encoded by the following transcript(s): HUMFERHA_PEA_1_T46, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T46 is shown in bold; this coding portion starts at position 345 and ends at position 743. The transcript also has the following SNPs as listed in Table 25 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P31 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 25 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P34 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T50 and HUMFERHA_PEA_1_T59. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P34 and FRIH_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P34, comprising a first amino acid sequence being at least 90% homologous to TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDKNDPH corresponding to amino acids 2 - 129 of FRIH_HUMAN_V1, which also corresponds to amino acids 1 - 128 of HUMFERHA_PEA_1_P34, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSIGTPGNKWRKSFALGIGKAAH corresponding to amino acids 129 - 151 of HUMFERHA_PEA_1_P34, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFERHA_PEA_1_P34, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSIGTPGNKWRKSFALGIGKAAH in HUMFERHA_PEA_1_P34.
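By way of non-limiting illustration only, the correspondence described above between HUMFERHA_PEA_1_P34 and FRIH_HUMAN_V1 can be expressed as a position mapping: amino acids 1 - 128 of the variant correspond to amino acids 2 - 129 of the known protein, while the tail at amino acids 129 - 151 has no counterpart. The names `P34_BLOCKS` and `to_known_position` are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from typing import List, Optional, Tuple

# Each block pairs a variant range with the corresponding known-protein range
# (1-based, inclusive), as stated in the comparison report.
Block = Tuple[Tuple[int, int], Tuple[int, int]]

P34_BLOCKS: List[Block] = [((1, 128), (2, 129))]
P34_LENGTH = 151  # amino acids 1-128 plus the unique tail at 129-151


def to_known_position(
    pos: int, blocks: List[Block], variant_length: int
) -> Optional[int]:
    """Map a variant-protein position to the known-protein position.

    Returns None for positions in a unique region (for example the tail),
    which has no corresponding position on the known protein.
    """
    if not 1 <= pos <= variant_length:
        raise ValueError(f"position {pos} lies outside 1..{variant_length}")
    for (v_start, v_end), (k_start, _k_end) in blocks:
        if v_start <= pos <= v_end:
            return k_start + (pos - v_start)
    return None


print(to_known_position(128, P34_BLOCKS, P34_LENGTH))  # last shared residue
print(to_known_position(129, P34_BLOCKS, P34_LENGTH))  # first tail residue
```

The same structure could hold more than one block for variants in which two separated regions of the known protein are joined.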
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 26 - Changes to FRIH_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein HUMFERHA_PEA_1_P34 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 27 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 27 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P34 is encoded by the following transcript(s): HUMFERHA_PEA_1_T50 and HUMFERHA_PEA_1_T59, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T50 is shown in bold; this coding portion starts at position 345 and ends at position 797. The transcript also has the following SNPs as listed in Table 28 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 28 - Nucleic acid SNPs
The coding portion of transcript HUMFERHA_PEA_1_T59 is shown in bold; this coding portion starts at position 316 and ends at position 768. The transcript also has the following SNPs as listed in Table 29 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P34 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 29 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P35 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T51. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P35 and FRIH_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P35, comprising a first amino acid sequence being at least 90% homologous to TTASTSQVRQNYHQDSE corresponding to amino acids 2 - 18 of FRIH_HUMAN_V1, which also corresponds to amino acids 1 - 17 of HUMFERHA_PEA_1_P35, and a second amino acid sequence being at least 90% homologous to KPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGDSDNES corresponding to amino acids 88 - 183 of FRIH_HUMAN_V1, which also corresponds to amino acids 18 - 113 of HUMFERHA_PEA_1_P35, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HUMFERHA_PEA_1_P35, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise EK, having a structure as follows: a sequence starting from any of amino acid numbers 17-x to 17; and ending at any of amino acid numbers 18 + ((n-2) - x), in which x varies from 0 to n-2.
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 30 - Changes to FRIH_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein HUMFERHA_PEA_1_P35 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 31 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 31 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P35 is encoded by the following transcript(s): HUMFERHA_PEA_1_T51, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T51 is shown in bold; this coding portion starts at position 345 and ends at position 683. The transcript also has the following SNPs as listed in Table 32 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P35 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 32 - Nucleic acid SNPs
Variant protein HUMFERHA_PEA_1_P37 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMFERHA_PEA_1_T54. An alignment is given to the known protein (Ferritin heavy chain) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMFERHA_PEA_1_P37 and FRIH_HUMAN_V1:
1. An isolated chimeric polypeptide encoding for HUMFERHA_PEA_1_P37, comprising a first amino acid sequence being at least 90% homologous to TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDKNDPH corresponding to amino acids 2 - 129 of FRIH_HUMAN_V1, which also corresponds to amino acids 1 - 128 of HUMFERHA_PEA_1_P37, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSESHQRIG corresponding to amino acids 129 - 137 of HUMFERHA_PEA_1_P37, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMFERHA_PEA_1_P37, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSESHQRIG in HUMFERHA_PEA_1_P37.
It should be noted that the known protein sequence (FRIH_HUMAN) differs by one or more changes from the sequence given at the end of the application and named as being the amino acid sequence for FRIH_HUMAN_V1. These changes were previously known to occur and are listed in the table below.
Table 33 - Changes to FRIH_HUMAN_V1
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region. Variant protein HUMFERHA_PEA_1_P37 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 34 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P37 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 34 - Amino acid mutations
Variant protein HUMFERHA_PEA_1_P37 is encoded by the following transcript(s): HUMFERHA_PEA_1_T54, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMFERHA_PEA_1_T54 is shown in bold; this coding portion starts at position 345 and ends at position 755. The transcript also has the following SNPs as listed in Table 35 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMFERHA_PEA_1_P37 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 35 - Nucleic acid SNPs
As noted above, cluster HUMFERHA features 59 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMFERHA_PEA_1_node_4 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_28 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T41 and HUMFERHA_PEA_1_T42. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_30 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T41 and HUMFERHA_PEA_1_T42. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_31 according to the present invention is supported by 115 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T41 and HUMFERHA_PEA_1_T42. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_43 according to the present invention is supported by 70 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T41 and HUMFERHA_PEA_1_T43. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Transcript name        Segment starting position    Segment ending position
HUMFERHA_PEA_1_T41     2399                         2654
HUMFERHA_PEA_1_T43     603                          858
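By way of non-limiting illustration only, the starting and ending positions given in Table 40 can be used to confirm that the segment has the same length on each transcript in which it appears. The sketch assumes 1-based, inclusive positions (an assumption not stated in the application); the names `NODE_43_LOCATIONS` and `segment_lengths` are hypothetical.

```python
from typing import Dict, Tuple

# Starting and ending positions of the node_43 segment on each transcript,
# as given in Table 40.
NODE_43_LOCATIONS: Dict[str, Tuple[int, int]] = {
    "HUMFERHA_PEA_1_T41": (2399, 2654),
    "HUMFERHA_PEA_1_T43": (603, 858),
}


def segment_lengths(locations: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Return the segment length in nucleotides on each transcript,
    assuming 1-based, inclusive start and end positions."""
    return {name: end - start + 1 for name, (start, end) in locations.items()}


lengths = segment_lengths(NODE_43_LOCATIONS)

# The same segment should have one consistent length wherever it occurs.
is_consistent = len(set(lengths.values())) == 1
print(lengths, is_consistent)
```

A mismatch between transcripts would suggest a transcription error in the table rather than a property of the segment itself.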
Segment cluster HUMFERHA_PEA_1_node_65 according to the present invention is supported by 74 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMFERHA_PEA_1_node_6 according to the present invention is supported by 111 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_7 according to the present invention is supported by 189 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_8 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_9 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_10 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_11 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_12 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_13 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_14 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_15 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_16 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51 and HUMFERHA_PEA_1_T54. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_18 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51 and HUMFERHA_PEA_1_T54. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_19 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51 and HUMFERHA_PEA_1_T54. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_20 according to the present invention is supported by 225 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_21 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_22 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_23 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_24 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_25 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_26 according to the present invention is supported by 231 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 62 below describes the starting and ending position of this segment on each transcript.
Table 62 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_l_node_27 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA PEA 1 T59. Table 63 below describes the starting and ending position of this segment on each transcript. Table 63 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_29 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T41 and HUMFERHA_PEA_1_T42. Table 64 below describes the starting and ending position of this segment on each transcript.
Table 64 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_32 according to the present invention is supported by 76 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T41 and HUMFERHA_PEA_1_T42. Table 65 below describes the starting and ending position of this segment on each transcript.
Table 65 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_33 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T41 and HUMFERHA_PEA_1_T42. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_34 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_35 according to the present invention is supported by 260 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 68 below describes the starting and ending position of this segment on each transcript.
Table 68 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_36 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 69 below describes the starting and ending position of this segment on each transcript.
Table 69 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_38 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 70 below describes the starting and ending position of this segment on each transcript.
Table 70 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_39 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 71 below describes the starting and ending position of this segment on each transcript.
Table 71 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_40 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 72 below describes the starting and ending position of this segment on each transcript.
Table 72 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_41 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 73 below describes the starting and ending position of this segment on each transcript.
Table 73 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_42 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 74 below describes the starting and ending position of this segment on each transcript.
Table 74 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_44 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 75 below describes the starting and ending position of this segment on each transcript.
Table 75 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_45 according to the present invention is supported by 279 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 76 below describes the starting and ending position of this segment on each transcript.
Table 76 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_46 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 77 below describes the starting and ending position of this segment on each transcript.
Table 77 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_47 according to the present invention is supported by 269 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 78 below describes the starting and ending position of this segment on each transcript.
Table 78 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_48 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 79 below describes the starting and ending position of this segment on each transcript.
Table 79 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_50 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 80 below describes the starting and ending position of this segment on each transcript.
Table 80 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_51 according to the present invention is supported by 57 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T50 and HUMFERHA_PEA_1_T59. Table 81 below describes the starting and ending position of this segment on each transcript.
Table 81 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_52 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T50 and HUMFERHA_PEA_1_T59. Table 82 below describes the starting and ending position of this segment on each transcript.
Table 82 - Segment location on transcripts
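The segment-to-transcript listings above can be inverted to ask which segments a given transcript contains. The following is a minimal sketch using only the three segments listed immediately above; the dictionary and function names are hypothetical and are not part of the application.

```python
# Segment -> transcripts, as listed in the text for nodes 50, 51 and 52.
SEGMENT_TO_TRANSCRIPTS = {
    "HUMFERHA_PEA_1_node_50": [
        "HUMFERHA_PEA_1_T50", "HUMFERHA_PEA_1_T54", "HUMFERHA_PEA_1_T59",
    ],
    "HUMFERHA_PEA_1_node_51": ["HUMFERHA_PEA_1_T50", "HUMFERHA_PEA_1_T59"],
    "HUMFERHA_PEA_1_node_52": ["HUMFERHA_PEA_1_T50", "HUMFERHA_PEA_1_T59"],
}


def segments_by_transcript(segment_to_transcripts):
    """Invert a segment -> transcripts mapping into transcript -> segments."""
    result = {}
    for segment, transcripts in segment_to_transcripts.items():
        for transcript in transcripts:
            result.setdefault(transcript, []).append(segment)
    return result


by_transcript = segments_by_transcript(SEGMENT_TO_TRANSCRIPTS)
for transcript, segments in sorted(by_transcript.items()):
    print(transcript, "->", ", ".join(segments))
```

In this sketch, transcript T54 carries only node_50 of the three segments, while T50 and T59 carry all three.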
Segment cluster HUMFERHA_PEA_1_node_53 according to the present invention is supported by 261 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51 and HUMFERHA_PEA_1_T59. Table 83 below describes the starting and ending position of this segment on each transcript.
Table 83 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_54 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 84 below describes the starting and ending position of this segment on each transcript.
Table 84 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_55 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 85 below describes the starting and ending position of this segment on each transcript.
Table 85 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_56 according to the present invention is supported by 238 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 86 below describes the starting and ending position of this segment on each transcript.
Table 86 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_57 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 87 below describes the starting and ending position of this segment on each transcript.
Table 87 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_58 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 88 below describes the starting and ending position of this segment on each transcript.
Table 88 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_59 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 89 below describes the starting and ending position of this segment on each transcript.
Table 89 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HUMFERHA_PEA_1_T27    986                         994
HUMFERHA_PEA_1_T30    1584                        1592
HUMFERHA_PEA_1_T31    1247                        1255
HUMFERHA_PEA_1_T32    931                         939
HUMFERHA_PEA_1_T35    834                         842
HUMFERHA_PEA_1_T36    805                         813
HUMFERHA_PEA_1_T41    2976                        2984
HUMFERHA_PEA_1_T42    2713                        2721
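The rows recovered above can be checked for internal consistency. The sketch below assumes that the positions are 1-based and inclusive (the text does not state the convention explicitly); under that assumption every listed row spans the same nine nucleotides. The helper names are hypothetical.

```python
# Transcript -> (start, end) as recovered from the table rows above.
# Assumption: positions are 1-based and inclusive.
SEGMENT_POSITIONS = {
    "HUMFERHA_PEA_1_T27": (986, 994),
    "HUMFERHA_PEA_1_T30": (1584, 1592),
    "HUMFERHA_PEA_1_T31": (1247, 1255),
    "HUMFERHA_PEA_1_T32": (931, 939),
    "HUMFERHA_PEA_1_T35": (834, 842),
    "HUMFERHA_PEA_1_T36": (805, 813),
    "HUMFERHA_PEA_1_T41": (2976, 2984),
    "HUMFERHA_PEA_1_T42": (2713, 2721),
}


def segment_length(start, end):
    """Length of a 1-based inclusive [start, end] span."""
    return end - start + 1


def extract_segment(transcript_seq, start, end):
    """Return the sub-sequence covered by a 1-based inclusive span."""
    if start < 1 or end > len(transcript_seq) or start > end:
        raise ValueError("span lies outside the transcript")
    return transcript_seq[start - 1:end]


lengths = {
    name: segment_length(start, end)
    for name, (start, end) in SEGMENT_POSITIONS.items()
}
print(lengths)
```

A segment shared by several transcripts is expected to have the same length on each of them, so a row whose length differs from the others would point to a transcription error in the table.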
Segment cluster HUMFERHA_PEA_1_node_60 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 90 below describes the starting and ending position of this segment on each transcript.
Table 90 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_61 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 91 below describes the starting and ending position of this segment on each transcript.
Table 91 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_62 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 92 below describes the starting and ending position of this segment on each transcript.
Table 92 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_63 according to the present invention is supported by 162 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 93 below describes the starting and ending position of this segment on each transcript.
Table 93 - Segment location on transcripts
Segment cluster HUMFERHA_PEA_1_node_64 according to the present invention can be found in the following transcript(s): HUMFERHA_PEA_1_T27, HUMFERHA_PEA_1_T30, HUMFERHA_PEA_1_T31, HUMFERHA_PEA_1_T32, HUMFERHA_PEA_1_T35, HUMFERHA_PEA_1_T36, HUMFERHA_PEA_1_T41, HUMFERHA_PEA_1_T42, HUMFERHA_PEA_1_T43, HUMFERHA_PEA_1_T44, HUMFERHA_PEA_1_T46, HUMFERHA_PEA_1_T50, HUMFERHA_PEA_1_T51, HUMFERHA_PEA_1_T54 and HUMFERHA_PEA_1_T59. Table 94 below describes the starting and ending position of this segment on each transcript.
Table 94 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P6 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 1456.00
Escore:
Matching length: 147
Total length: 147
Matching Percent Similarity: 99.32
Matching Percent Identity: 99.32
Total Percent Similarity: 99.32
Total Percent Identity: 99.32
Gaps: 0
Alignment:
  7 SQSYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDI 56
    | ||||||||||||||||||||||||||||||||||||||||||||||||
 37 SMSYYFDRDDVALKNFAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDI 86
 57 KKPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDKNDPHLCDFIET 106
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 87 KKPDCDDWESGLNAMECALHLEKNVNQSLLELHKLATDKNDPHLCDFIET 136
107 HYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGDSDNES 153
    |||||||||||||||||||||||||||||||||||||||||||||||
137 HYLNEQVKAIKELGDHVTNLRKMGAPESGLAEYLFDKHTLGDSDNES 183
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P25 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 360.00
Escore: 0
Matching length: 37
Total length: 37
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSM 37
    |||||||||||||||||||||||||||||||||||||
  2 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSM 38
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P27 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 339.00
Escore: 0
Matching length: 35
Total length: 35
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYL 35
    |||||||||||||||||||||||||||||||||||
  2 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYL 36
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P29 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 851.00
Escore: 0
Matching length: 86
Total length: 86
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  2 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 51
 51 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIK 86
    ||||||||||||||||||||||||||||||||||||
 52 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIK 87
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P30 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 851.00
Escore: 0
Matching length: 86
Total length: 86
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  2 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 51
 51 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIK 86
    ||||||||||||||||||||||||||||||||||||
 52 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIK 87
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P31 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 1218.00
Escore: 0
Matching length: 133
Total length: 182
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 73.08
Total Percent Identity: 73.08
Gaps: 1
Alignment:
  1 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSM 37
    |||||||||||||||||||||||||||||||||||||
  2 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 51
 38                                     KPDCDDWESGLNAM 51
                                        ||||||||||||||
 52 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAM 101
 52 ECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKELGD 101
    ||||||||||||||||||||||||||||||||||||||||||||||||||
102 ECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKELGD 151
102 HVTNLRKMGAPESGLAEYLFDKHTLGDSDNES 133
    ||||||||||||||||||||||||||||||||
152 HVTNLRKMGAPESGLAEYLFDKHTLGDSDNES 183
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P34 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 1280.00
Escore: 0
Matching length: 128
Total length: 128
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  2 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 51
 51 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 52 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAM 101
101 ECALHLEKNVNQSLLELHKLATDKNDPH 128
    ||||||||||||||||||||||||||||
102 ECALHLEKNVNQSLLELHKLATDKNDPH 129
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P35 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 1027.00
Escore: 0
Matching length: 113
Total length: 182
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 62.09
Total Percent Identity: 62.09
Gaps: 1
Alignment:
  1 TTASTSQVRQNYHQDSE 17
    |||||||||||||||||
  2 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 51
 18                                     KPDCDDWESGLNAM 31
                                        ||||||||||||||
 52 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAM 101
 32 ECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKELGD 81
    ||||||||||||||||||||||||||||||||||||||||||||||||||
102 ECALHLEKNVNQSLLELHKLATDKNDPHLCDFIETHYLNEQVKAIKELGD 151
 82 HVTNLRKMGAPESGLAEYLFDKHTLGDSDNES 113
    ||||||||||||||||||||||||||||||||
152 HVTNLRKMGAPESGLAEYLFDKHTLGDSDNES 183
Sequence name: FRIH_HUMAN_V1
Sequence documentation:
Alignment of: HUMFERHA_PEA_1_P37 x FRIH_HUMAN_V1
Alignment segment 1/1:
Quality: 1280.00
Escore: 0
Matching length: 128
Total length: 128
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 100.00
Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  2 TTASTSQVRQNYHQDSEAAINRQINLELYASYVYLSMSYYFDRDDVALKN 51
 51 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAM 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 52 FAKYFLHQSHEEREHAEKLMKLQNQRGGRIFLQDIKKPDCDDWESGLNAM 101
101 ECALHLEKNVNQSLLELHKLATDKNDPH 128
    ||||||||||||||||||||||||||||
102 ECALHLEKNVNQSLLELHKLATDKNDPH 129
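The percentage figures in the alignment reports above appear to follow from the residue counts. The sketch below assumes that "Total length" is the matching length plus the number of known-protein residues spanned by the gap, and that the "Total" percentages use that total as the denominator. This reading is inferred from the reported numbers and is not a documented definition of the alignment program's output. The gap lengths (49 residues for P31, 69 for P35) are read off the residue numbering of FRIH_HUMAN_V1 in the alignments; the function names are hypothetical.

```python
def percent(part, whole):
    """Percentage rounded to two decimals, as printed in the reports."""
    return round(100.0 * part / whole, 2)


def alignment_stats(identical, matching_length, gap_length):
    """Recompute report fields from residue counts.

    Assumption: total length = matching length + gapped residues.
    """
    total_length = matching_length + gap_length
    return {
        "total_length": total_length,
        "matching_percent_identity": percent(identical, matching_length),
        "total_percent_identity": percent(identical, total_length),
    }


# P31: variant residues 1-133 all match; known residues 39-87 are skipped.
p31 = alignment_stats(identical=133, matching_length=133, gap_length=49)
# P35: variant residues 1-113 all match; known residues 19-87 are skipped.
p35 = alignment_stats(identical=113, matching_length=113, gap_length=69)
# P6: 147 aligned positions with a single mismatch and no gap.
p6 = alignment_stats(identical=146, matching_length=147, gap_length=0)

for name, stats in (("P31", p31), ("P35", p35), ("P6", p6)):
    print(name, stats)
```

Under these assumptions the recomputed values agree with the reported totals of 182 for P31 and P35 and with the reported identities of 73.08, 62.09 and 99.32 percent.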
Subsection H: Beta-2-microglobulin precursor
DESCRIPTION FOR CLUSTER HSB2MMU
Cluster HSB2MMU features 5 transcript(s) and 44 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
HSB2MMU_T7    1101
HSB2MMU_T8    1102
HSB2MMU_T27   1103
HSB2MMU_T28   1104
HSB2MMU_T29   1105
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Beta-2-microglobulin precursor (SwissProt accession identifier B2MG_HUMAN; also known by the synonym HDCMA22P), SEQ ID NO: 1150, referred to herein as the previously known protein. Protein Beta-2-microglobulin precursor is known or believed to have the following function(s): Beta-2-microglobulin is the beta-chain of major histocompatibility complex class I molecules. Beta-2-microglobulin (B2M) has a predominantly beta-pleated sheet structure that may adopt the fibrillar configuration of amyloid in certain pathologic states. Beta-2-microglobulin is essential to expression of HLA. Hence, B2M gene mutation is one mechanism by which tumor cells (e.g. melanoma, colorectal carcinoma) may escape immune recognition by cytotoxic T cells. Progressive hepatic iron overload, indistinguishable from that observed in human hemochromatosis, was found only in mice homozygous for the mutated B2M gene. B2M is found in the urine and serum of normal individuals. Its levels are changed in many conditions related to renal dysfunction, and in various conditions related to lymphocyte pathophysiology. Urine B2M is found in elevated amounts in patients with Wilson disease, cadmium poisoning, and other conditions leading to renal tubular dysfunction. A protein identical to B2M in several characteristics accumulates in amyloid-laden tissue obtained from a chronic hemodialysis patient with carpal tunnel syndrome. Hemodialysis-related amyloidosis is a form of systemic amyloidosis with a predilection for the synovium and bone that occurs on long-term hemodialysis. The clinical features include carpal tunnel syndrome, erosive arthropathy, spondyloarthropathy, lytic bone lesions, and pathologic fractures. Severe renal insufficiency may also occur with B2M amyloidosis without any dialysis. B2M was also elevated in fetal serum in severe impairment of fetal renal function.
The CSF B2M level increases both in Human Immunodeficiency Virus type 1 (HIV-1) related and in opportunistic CNS syndromes (i.e. AIDS dementia complex, multifocal giant cells encephalitis, cryptococcal meningitis). B2M is an important serological marker for non-Hodgkin's lymphoma tumor load, used in disease staging. The B2M level was significantly higher in the CSF of patients with viral meningitis than in those without meningitis. The B2M level can be a good parameter to define cyclosporine A tubular toxicity. Significantly elevated serum levels of B2M were found in patients with lymphoproliferative disorders such as monoclonal gammopathies, malignant lymphoma, post-transplant lymphoproliferative disease and chronic lymphatic leukemia.
In cases of HIV infection, increasing levels of B2M exhibited an inverse correlation to the CD4+ T-lymphocyte count and indicated disease progression. In renal transplant patients, rejection of the graft was accompanied by a rise in the B2M serum level. On the other hand, B2M was the most suitable reference gene tested for quantification of target mRNAs in leukocytes of patients with various malignancies, as its variation between different sample origins and within distinct cell types was low. B2M was found to be an adequate marker of glomerular filtration, with a diagnostic accuracy very similar to that of creatinine. Glomerular hyperfiltration combined with an increased level of serum beta-2-microglobulin can be used as an early marker of autosomal dominant polycystic kidney disease. B2M in saliva was high in patients with the autoimmune disease Sjogren's syndrome. Elevated serum levels of B2M were found in patients with amyloidosis and in cancers including nasopharyngeal and ovarian carcinoma. The variants of the present invention are useful for diagnosis of these diseases and/or pathological conditions. The sequence for protein Beta-2-microglobulin precursor is given at the end of the application, as "Beta-2-microglobulin precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Beta-2-microglobulin precursor localization is believed to be Secreted. As noted above, cluster HSB2MMU features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Beta-2-microglobulin precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSB2MMU_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSB2MMU_T27. An alignment is given to the known protein (Beta-2-microglobulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSB2MMU_P9 and B2MG_HUMAN (SEQ ID NO:1151):
1. An isolated chimeric polypeptide encoding for HSB2MMU_P9, comprising a first amino acid sequence being at least 90% homologous to MSRSVALAVLALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEKDE corresponding to amino acids 1 - 97 of B2MG_HUMAN, which also corresponds to amino acids 1 - 97 of HSB2MMU_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence SRHVSSIMEV corresponding to amino acids 98 - 107 of HSB2MMU_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSB2MMU_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence SRHVSSIMEV in HSB2MMU_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSB2MMU_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
Variant protein HSB2MMU_P9 is encoded by the following transcript(s): HSB2MMU_T27, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSB2MMU_T27 is shown in bold; this coding portion starts at position 165 and ends at position 485. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
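As a quick consistency check on the coordinates stated above for transcript HSB2MMU_T27, the following minimal sketch converts a coding span into a codon count. It assumes that positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; the function name `codon_count` is illustrative only and is not part of the application.

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of whole codons in a 1-based, inclusive coding span."""
    length = end - start + 1  # inclusive coordinates (assumption)
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a whole number of codons")
    return length // 3


# Coding portion of HSB2MMU_T27 as stated in the text: positions 165 to 485.
p9_codons = codon_count(165, 485)
print(p9_codons)  # 107, matching amino acids 1 - 107 of HSB2MMU_P9
```

Under these assumptions the 321-nucleotide span corresponds to 107 codons, which agrees with the 107 residues described for HSB2MMU_P9 (97 shared residues plus the 10-residue tail).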
Variant protein HSB2MMU_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSB2MMU_T28. An alignment is given to the known protein (Beta-2-microglobulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSB2MMU_P10 and Q9UM88 (SEQ ID NO: 1152):
1. An isolated chimeric polypeptide encoding for HSB2MMU_P10, comprising a first amino acid sequence being at least 90% homologous to MSRSVALAVLALLSLSGLEAIQRTPKIQ corresponding to amino acids 1 - 28 of Q9UM88, which also corresponds to amino acids 1 - 28 of HSB2MMU_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TCLSARTGLSISCTTLNSPPLKKMSMPAV corresponding to amino acids 29 - 57 of HSB2MMU_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSB2MMU_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TCLSARTGLSISCTTLNSPPLKKMSMPAV in HSB2MMU_P10.
Comparison report between HSB2MMU_P10 and Q8NE94 (SEQ ID NO: 1153):
1. An isolated chimeric polypeptide encoding for HSB2MMU_P10, comprising a first amino acid sequence being at least 90% homologous to MSRSVALAVLALLSLSGLEAIQR corresponding to amino acids 11 - 33 of Q8NE94, which also corresponds to amino acids 1 - 23 of HSB2MMU_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TPKIQTCLSARTGLSISCTTLNSPPLKKMSMPAV corresponding to amino acids 24 - 57 of HSB2MMU_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSB2MMU_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TPKIQTCLSARTGLSISCTTLNSPPLKKMSMPAV in HSB2MMU_P10.
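The two comparison reports for HSB2MMU_P10 describe the same variant as a head segment followed by a contiguous tail, split at different points. The following minimal sketch joins the segments exactly as stated in the text and checks that both decompositions give the same 57-residue sequence. The helper name `assemble` and the variable names are illustrative assumptions, not part of the application.

```python
def assemble(head: str, tail: str) -> str:
    """Join a head and a tail that are contiguous and in sequential order."""
    return head + tail


# Segments as given in the two comparison reports for HSB2MMU_P10.
head_q9um88 = "MSRSVALAVLALLSLSGLEAIQRTPKIQ"        # aa 1 - 28 of the variant
tail_q9um88 = "TCLSARTGLSISCTTLNSPPLKKMSMPAV"       # aa 29 - 57 of the variant
head_q8ne94 = "MSRSVALAVLALLSLSGLEAIQR"             # aa 1 - 23 (aa 11 - 33 of Q8NE94)
tail_q8ne94 = "TPKIQTCLSARTGLSISCTTLNSPPLKKMSMPAV"  # aa 24 - 57 of the variant

p10_a = assemble(head_q9um88, tail_q9um88)
p10_b = assemble(head_q8ne94, tail_q8ne94)

# Both decompositions should describe the same 57-residue variant.
assert p10_a == p10_b
assert len(p10_a) == 57

# Offset between Q8NE94 numbering and variant numbering for the shared head.
offset = 11 - 1
print(len(p10_a), offset)
```

The offset of 10 reflects that residue 1 of the variant aligns with residue 11 of Q8NE94, as stated in the comparison report.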
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSB2MMU_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
Variant protein HSB2MMU_P10 is encoded by the following transcript(s): HSB2MMU_T28, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSB2MMU_T28 is shown in bold; this coding portion starts at position 165 and ends at position 335. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein HSB2MMU_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSB2MMU_T29. An alignment is given to the known protein (Beta-2-microglobulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSB2MMU_P11 and B2MG_HUMAN:
1. An isolated chimeric polypeptide encoding for HSB2MMU_P11, comprising a first amino acid sequence being at least 90% homologous to MSRSVALAVLALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEK corresponding to amino acids 1 - 68 of B2MG_HUMAN, which also corresponds to amino acids 1 - 68 of HSB2MMU_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSMPAV corresponding to amino acids 69 - 74 of HSB2MMU_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSB2MMU_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSMPAV in HSB2MMU_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSB2MMU_P11 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
Variant protein HSB2MMU_P11 is encoded by the following transcript(s): HSB2MMU_T29, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSB2MMU_T29 is shown in bold; this coding portion starts at position 165 and ends at position 386. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 10 - Nucleic acid SNPs
Variant protein HSB2MMU_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSB2MMU_T7 and HSB2MMU_T8. An alignment is given to the known protein (Beta-2-microglobulin precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSB2MMU_P22 and B2MG_HUMAN:
1. An isolated chimeric polypeptide encoding for HSB2MMU_P22, comprising a first amino acid sequence being at least 90% homologous to MSRSVALAV corresponding to amino acids 1 - 9 of B2MG_HUMAN, which also corresponds to amino acids 1 - 9 of HSB2MMU_P22, a bridging amino acid V corresponding to amino acid 10 of HSB2MMU_P22, a second amino acid sequence being at least 90% homologous to ALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEK corresponding to amino acids 11 - 95 of B2MG_HUMAN, which also corresponds to amino acids 11 - 95 of HSB2MMU_P22, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MSMPAV corresponding to amino acids 96 - 101 of HSB2MMU_P22, wherein said first amino acid sequence, bridging amino acid, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSB2MMU_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MSMPAV in HSB2MMU_P22.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSB2MMU_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
Variant protein HSB2MMU_P22 is encoded by the following transcript(s): HSB2MMU_T7 and HSB2MMU_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSB2MMU_T7 is shown in bold; this coding portion starts at position 165 and ends at position 468. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
The coding portion of transcript HSB2MMU_T8 is shown in bold; this coding portion starts at position 165 and ends at position 468. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSB2MMU_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
As noted above, cluster HSB2MMU features 44 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSB2MMU_node_0 according to the present invention is supported by 346 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 14 below describes the starting and ending position of this segment on each transcript. Table 14 - Segment location on transcripts
Segment cluster HSB2MMU_node_53 according to the present invention is supported by 190 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 15 below describes the starting and ending position of this segment on each transcript. Table 15 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSB2MMU_node_1 according to the present invention is supported by 440 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 16 below describes the starting and ending position of this segment on each transcript. Table 16 - Segment location on transcripts
Segment cluster HSB2MMU_node_5 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 17 below describes the starting and ending position of this segment on each transcript. Table 17 - Segment location on transcripts
Segment cluster HSB2MMU_node_6 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSB2MMU_T7         243                          248
HSB2MMU_T8         243                          248
HSB2MMU_T27        243                          248
HSB2MMU_T28        243                          248
HSB2MMU_T29        243                          248
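The starting and ending positions in Table 18 can be turned into a segment length as in the following minimal sketch. It assumes that the positions are 1-based and inclusive, and it treats the "up to about 120 bp" limit for short segments as an approximate threshold; the names `segment_length` and `SHORT_SEGMENT_MAX_BP` are illustrative assumptions, not part of the application.

```python
SHORT_SEGMENT_MAX_BP = 120  # "up to about 120 bp" in the text; approximate


def segment_length(start: int, end: int) -> int:
    """Length of a segment given 1-based, inclusive start and end positions."""
    return end - start + 1


# Positions of segment HSB2MMU_node_6 on each transcript, from Table 18.
node_6 = {
    "HSB2MMU_T7": (243, 248),
    "HSB2MMU_T8": (243, 248),
    "HSB2MMU_T27": (243, 248),
    "HSB2MMU_T28": (243, 248),
    "HSB2MMU_T29": (243, 248),
}

lengths = {name: segment_length(s, e) for name, (s, e) in node_6.items()}
is_short = all(n <= SHORT_SEGMENT_MAX_BP for n in lengths.values())
print(lengths["HSB2MMU_T7"], is_short)
```

Under the inclusive-coordinate assumption the segment spans 6 nucleotides on every listed transcript, which is consistent with its grouping among the short segments.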
Segment cluster HSB2MMU_node_7 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T29. Table 19 below describes the starting and ending position of this segment on each transcript. Table 19 - Segment location on transcripts
Segment cluster HSB2MMU_node_8 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T29. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster HSB2MMU_node_9 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T29. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster HSB2MMU_node_10 according to the present invention is supported by 521 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T29. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster HSB2MMU_node_11 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T29. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster HSB2MMU_node_12 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T29. Table 24 below describes the starting and ending position of this segment on each transcript. Table 24 - Segment location on transcripts
Segment cluster HSB2MMU_node_13 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T29. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster HSB2MMU_node_14 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8 and HSB2MMU_T27. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HSB2MMU_node_15 according to the present invention is supported by 535 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T28. Table 27 below describes the starting and ending position of this segment on each transcript. Table 27 - Segment location on transcripts
Segment cluster HSB2MMU_node_16 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T28. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HSB2MMU_node_17 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T28. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HSB2MMU_node_18 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27 and HSB2MMU_T28. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
Segment cluster HSB2MMU_node_19 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster HSB2MMU_node_20 according to the present invention is supported by 471 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T28 and HSB2MMU_T29. Table 32 below describes the starting and ending position of this segment on each transcript. Table 32 - Segment location on transcripts
Segment cluster HSB2MMU_node_25 according to the present invention is supported by 406 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HSB2MMU_node_29 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 34 below describes the starting and ending position of this segment on each transcript. Table 34 - Segment location on transcripts
Segment cluster HSB2MMU_node_30 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HSB2MMU_node_31 according to the present invention is supported by 349 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Segment cluster HSB2MMU_node_32 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 37 below describes the starting and ending position of this segment on each transcript. Table 37 - Segment location on transcripts
Segment cluster HSB2MMU_node_33 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 38 below describes the starting and ending position of this segment on each transcript. Table 38 - Segment location on transcripts
Segment cluster HSB2MMU_node_34 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 39 below describes the starting and ending position of this segment on each transcript. Table 39 - Segment location on transcripts
Segment cluster HSB2MMU_node_35 according to the present invention is supported by 330 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 40 below describes the starting and ending position of this segment on each transcript. Table 40 - Segment location on transcripts
Segment cluster HSB2MMU_node_36 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 41 below describes the starting and ending position of this segment on each transcript. Table 41 - Segment location on transcripts
Segment cluster HSB2MMU_node_37 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 42 below describes the starting and ending position of this segment on each transcript. Table 42 - Segment location on transcripts
Segment cluster HSB2MMU_node_38 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 43 below describes the starting and ending position of this segment on each transcript. Table 43 - Segment location on transcripts
Segment cluster HSB2MMU_node_39 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 44 below describes the starting and ending position of this segment on each transcript. Table 44 - Segment location on transcripts
Segment cluster HSB2MMU_node_40 according to the present invention is supported by 302 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 45 below describes the starting and ending position of this segment on each transcript. Table 45 - Segment location on transcripts
Segment cluster HSB2MMU_node_41 according to the present invention can be found in the following transcript(s): HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 46 below describes the starting and ending position of this segment on each transcript. Table 46 - Segment location on transcripts
HSB2MMU_T29 697 701
Segment cluster HSB2MMU_node_42 according to the present invention can be found in the following transcript(s): HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 47 below describes the starting and ending position of this segment on each transcript. Table 47 - Segment location on transcripts
Segment cluster HSB2MMU_node_43 according to the present invention can be found in the following transcript(s): HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 48 below describes the starting and ending position of this segment on each transcript. Table 48 - Segment location on transcripts
Segment cluster HSB2MMU_node_44 according to the present invention can be found in the following transcript(s): HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 49 below describes the starting and ending position of this segment on each transcript. Table 49 - Segment location on transcripts
Segment cluster HSB2MMU_node_45 according to the present invention can be found in the following transcript(s): HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 50 below describes the starting and ending position of this segment on each transcript. Table 50 - Segment location on transcripts
Segment cluster HSB2MMU_node_46 according to the present invention can be found in the following transcript(s): HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster HSB2MMU_node_47 according to the present invention can be found in the following transcript(s): HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster HSB2MMU_node_48 according to the present invention can be found in the following transcript(s): HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Segment cluster HSB2MMU_node_49 according to the present invention is supported by 239 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 54 below describes the starting and ending position of this segment on each transcript. Table 54 - Segment location on transcripts
Segment cluster HSB2MMU_node_50 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 55 below describes the starting and ending position of this segment on each transcript. Table 55 - Segment location on transcripts
Segment cluster HSB2MMU_node_51 according to the present invention can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 56 below describes the starting and ending position of this segment on each transcript. Table 56 - Segment location on transcripts
Segment cluster HSB2MMU_node_52 according to the present invention is supported by 198 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSB2MMU_T7, HSB2MMU_T8, HSB2MMU_T27, HSB2MMU_T28 and HSB2MMU_T29. Table 57 below describes the starting and ending position of this segment on each transcript. Table 57 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: B2MG_HUMAN
Sequence documentation:
Alignment of: HSB2MMU_P9 x B2MG_HUMAN
Alignment segment 1/1: Quality: 947.00
Escore: 0
Matching length: 97 Total length: 97
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MSRSVALAVLALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGF 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MSRSVALAVLALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGF 50
51 HPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEKDE 97
   |||||||||||||||||||||||||||||||||||||||||||||||
51 HPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEKDE 97
Sequence name: Q9UM88
Sequence documentation:
Alignment of: HSB2MMU_P10 x Q9UM88
Alignment segment 1/1: Quality: 252.00
Escore: 0
Matching length: 28 Total length: 28
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
   MSRSVALAVLALLSLSGLEAIQRTPKIQ
   ||||||||||||||||||||||||||||
   MSRSVALAVLALLSLSGLEAIQRTPKIQ
Sequence name: Q8NE94
Sequence documentation:
Alignment of: HSB2MMU_P10 x Q8NE94
Alignment segment 1/1: Quality: 202.00
Escore: 0
Matching length: 23 Total length: 23
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MSRSVALAVLALLSLSGLEAIQR 23
   |||||||||||||||||||||||
11 MSRSVALAVLALLSLSGLEAIQR 33
Sequence name: B2MG_HUMAN
Sequence documentation:
Alignment of: HSB2MMU_P11 x B2MG_HUMAN
Alignment segment 1/1: Quality: 651.00
Escore: 0
Matching length: 68 Total length: 68
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
 1 MSRSVALAVLALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGF 50
   ||||||||||||||||||||||||||||||||||||||||||||||||||
 1 MSRSVALAVLALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGF 50
51 HPSDIEVDLLKNGERIEK 68
   ||||||||||||||||||
51 HPSDIEVDLLKNGERIEK 68
Sequence name: B2MG_HUMAN
Sequence documentation:
Alignment of: HSB2MMU_P22 x B2MG_HUMAN
Alignment segment 1/1: Quality: 919.00
Escore: 0
Matching length: 95 Total length: 95
Matching Percent Similarity: 100.00 Matching Percent Identity: 98.95
Total Percent Similarity: 100.00 Total Percent Identity: 98.95
Gaps: 0
Alignment:
 1 MSRSVALAVVALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGF 50
   |||||||||:||||||||||||||||||||||||||||||||||||||||
 1 MSRSVALAVLALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGF 50
51 HPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEK 95
   |||||||||||||||||||||||||||||||||||||||||||||
51 HPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEK 95
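The identity figure reported for the HSB2MMU_P22 alignment can be reproduced with the following minimal sketch. It assumes that percent identity is simply the number of identical positions divided by the ungapped alignment length, which may differ from the exact formula used by the alignment program; the function name `percent_identity` is illustrative. The variant sequence is derived from the known sequence by placing V at position 10, following the bridging amino acid described in the comparison report.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in an ungapped, equal-length alignment."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return round(100.0 * matches / len(a), 2)


# Amino acids 1 - 95 of B2MG_HUMAN, as given in the alignment.
known = (
    "MSRSVALAVLALLSLSGLEAIQRTPKIQVYSRHPAENGKSNFLNCYVSGF"
    "HPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEK"
)
# HSB2MMU_P22 differs only at position 10 (V in place of L) over this range.
variant = known[:9] + "V" + known[10:]

print(percent_identity(variant, known))  # 98.95
```

One mismatch in 95 aligned positions gives 94/95, which rounds to the 98.95 shown in the alignment report.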
Subsection I: Interleukin-6 receptor alpha chain precursor
DESCRIPTION FOR CLUSTER HSI6REC
Cluster HSI6REC features 5 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
HSI6REC_T2 158
HSI6REC_T4 1159
HSI6REC_T5 1160
HSI6REC_T7 1161
HSI6REC_T8 1162
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Interleukin-6 receptor alpha chain precursor (SwissProt accession identifier IL6A_HUMAN; known also according to the synonyms IL-6R-alpha; IL-6R 1; CD126 antigen), SEQ ID NO: 1175, referred to herein as the previously known protein.
Protein Interleukin-6 receptor alpha chain precursor is known or believed to have the following function(s): part of the receptor for interleukin 6. Binds to IL-6 with low affinity, but does not transduce a signal. Signal activation necessitates an association with IL6ST. Activation may lead to the regulation of the immune response, acute-phase reactions and hematopoiesis. A low concentration of a soluble form of interleukin-6 receptor acts as an agonist of IL6 activity. The variants of the present invention are useful as a predictor of metastatic potential and recurrence, which may optionally be used independently. The sequence for protein Interleukin-6 receptor alpha chain precursor is given at the end of the application, as "Interleukin-6 receptor alpha chain precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Interleukin-6 receptor alpha chain precursor localization is believed to be Type I membrane protein (isoform 1). Secreted (isoform 2).
The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Hepatic dysfunction. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Interleukin 2 agonist; Interleukin 6 receptor antagonist; Interleukin 6 modulator. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Radio/chemoprotective; Cytokine; Anticancer; Anti-inflammatory; Monoclonal antibody, humanized; Antiarthritic, immunological; Antianaemic; Antiviral, interferon; GI inflammatory/bowel disorders; Immunosuppressant; Hepatoprotective; Haematological.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: skeletal development; immune response; cell surface receptor linked signal transduction; developmental processes; cell proliferation, which are annotation(s) related to Biological Process; receptor; hematopoietin/interferon-class (D200-domain) cytokine receptor; interleukin-6 receptor, which are annotation(s) related to Molecular Function; and plasma membrane; interleukin-6 receptor; integral membrane protein, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSI6REC features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Interleukin-6 receptor alpha chain precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSI6REC_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSI6REC_T2. An alignment is given to the known protein (Interleukin-6 receptor alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSI6REC_P4 and IL6A_HUMAN:
1. An isolated chimeric polypeptide encoding for HSI6REC_P4, comprising a first amino acid sequence being at least 90% homologous to MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGVEPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAGRPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLVRKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGSKFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYRLRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQGEWSEWSPEAMGTPW corresponding to amino acids 1 - 315 of IL6A_HUMAN, which also corresponds to amino acids 1 - 315 of HSI6REC_P4, a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TEMRSHYVTQAGFKLLASWDSPASVSQSAGI corresponding to amino acids 316 - 346 of HSI6REC_P4, and a third amino acid sequence being at least 90% homologous to TESRSPPAENEVSTPMQALTTNKDDDNILFRDSANATSLPVQDSSSVPLPTFLVAGGSLAFGTLLCIAIVLRFKKTWKLRALKEGKTSMHPPYSLGQLVPERPRPTPVLVPLISPPVSPSSLGSDNTSSHNRPDARDPRSPYDISNTDYFFPR corresponding to amino acids 316 - 468 of IL6A_HUMAN, which also corresponds to amino acids 347 - 499 of HSI6REC_P4, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HSI6REC_P4, comprising an amino acid sequence being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence encoding for TEMRSHYVTQAGFKLLASWDSPASVSQSAGI, corresponding to HSI6REC_P4.
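The coordinates in the comparison report above can be checked for internal consistency with a short script. The following is only an illustrative sketch: the names VARIANT_SEGMENTS, KNOWN_SEGMENTS, span_length, is_contiguous and total_length are hypothetical helpers introduced here, while the numeric ranges and the edge sequence are copied from the report.

```python
# Illustrative consistency check; helper names are hypothetical, the
# coordinates (1-based, inclusive) are those stated in the comparison report.
VARIANT_SEGMENTS = [
    ("first sequence (shared with IL6A_HUMAN 1-315)", 1, 315),
    ("second sequence (edge portion unique to HSI6REC_P4)", 316, 346),
    ("third sequence (shared with IL6A_HUMAN 316-468)", 347, 499),
]
KNOWN_SEGMENTS = [
    ("IL6A_HUMAN 1-315", 1, 315),
    ("IL6A_HUMAN 316-468", 316, 468),
]
EDGE_PORTION = "TEMRSHYVTQAGFKLLASWDSPASVSQSAGI"


def span_length(start, end):
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def is_contiguous(segments):
    """True when every segment starts right after the previous one ends."""
    return all(nxt[1] == prev[2] + 1 for prev, nxt in zip(segments, segments[1:]))


def total_length(segments):
    """Sum of the residue counts of all segments."""
    return sum(span_length(start, end) for _, start, end in segments)


if __name__ == "__main__":
    print("variant contiguous:", is_contiguous(VARIANT_SEGMENTS))
    print("variant length:", total_length(VARIANT_SEGMENTS))
    print("known length:", total_length(KNOWN_SEGMENTS))
    print("edge portion length:", len(EDGE_PORTION))
```

Under these stated coordinates, the inserted edge portion accounts for the whole difference in length between the variant and the known protein.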
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because although both signal-peptide prediction programs agree that this protein has a signal peptide, both trans-membrane region prediction programs predict that this protein has a trans-membrane region downstream of this signal peptide.
Variant protein HSI6REC_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
The glycosylation sites of variant protein HSI6REC_P4, as compared to the known protein Interleukin-6 receptor alpha chain precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 6 - Glycosylation site(s)
Variant protein HSI6REC_P4 is encoded by the following transcript(s): HSI6REC_T2, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSI6REC_T2 is shown in bold; this coding portion starts at position 438 and ends at position 1934. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein HSI6REC_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSI6REC_T4. An alignment is given to the known protein (Interleukin-6 receptor alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSI6REC_P6 and IL6A_HUMAN:
1. An isolated chimeric polypeptide encoding for HSI6REC_P6, comprising a first amino acid sequence being at least 90% homologous to MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGVEPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAGRPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLVRKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGSKFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYRLRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQGEWSEWSPEAMGTPWTESRSPPAENEVSTPMQ corresponding to amino acids 1 - 332 of IL6A_HUMAN, which also corresponds to amino acids 1 - 332 of HSI6REC_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VQEDVEAAGSEGRQDKHASAVLFGAAGPGEASTHPSACSSHLPTGVPQQPGV corresponding to amino acids 333 - 384 of HSI6REC_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSI6REC_P6, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VQEDVEAAGSEGRQDKHASAVLFGAAGPGEASTHPSACSSHLPTGVPQQPGV in HSI6REC_P6.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSI6REC_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 8 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Amino acid mutations
The glycosylation sites of variant protein HSI6REC_P6, as compared to the known protein Interleukin-6 receptor alpha chain precursor, are described in Table 9 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 9 - Glycosylation site(s)
Variant protein HSI6REC_P6 is encoded by the following transcript(s): HSI6REC_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSI6REC_T4 is shown in bold; this coding portion starts at position 438 and ends at position 1589. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
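The coding-portion coordinates stated for the HSI6REC transcripts can be related to the variant protein lengths given in the comparison reports. The sketch below is illustrative only: CODING_PORTIONS and codon_count are hypothetical names, and treating the stated positions as 1-based and inclusive is an assumption that happens to reproduce the stated protein lengths.

```python
# Illustrative sketch; names are hypothetical. Start/end positions are the
# coding-portion coordinates stated in this subsection, assumed to be
# 1-based and inclusive.
CODING_PORTIONS = {
    "HSI6REC_T2": (438, 1934),
    "HSI6REC_T4": (438, 1589),
    "HSI6REC_T5": (438, 1451),
    "HSI6REC_T7": (438, 1493),
    "HSI6REC_T8": (438, 1493),
}


def codon_count(start, end):
    """Number of whole codons in an inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding portion length is not a multiple of 3")
    return length // 3


if __name__ == "__main__":
    for transcript, (start, end) in CODING_PORTIONS.items():
        print(transcript, codon_count(start, end))
```

Under that assumption the codon counts equal the last amino acid positions given for HSI6REC_P4, HSI6REC_P6, HSI6REC_P7 and HSI6REC_P9 (499, 384, 338 and 352), which would be consistent with the stated ranges not including a stop codon, although the text does not say so explicitly.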
Table 10 - Nucleic acid SNPs
Variant protein HSI6REC_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSI6REC_T5. An alignment is given to the known protein (Interleukin-6 receptor alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSI6REC_P7 and IL6A_HUMAN:
1. An isolated chimeric polypeptide encoding for HSI6REC_P7, comprising a first amino acid sequence being at least 90% homologous to MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGVEPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAGRPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLVRKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGSKFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYRLRFELRYRAERSKTFTTWM corresponding to amino acids 1 - 269 of IL6A_HUMAN, which also corresponds to amino acids 1 - 269 of HSI6REC_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence NPGVLQLRTRCPPPCRHLLLIKTMIIFSSEILQMRQASQCKILLQYHCPHSWLLEGAWPSERSSALPLF corresponding to amino acids 270 - 338 of HSI6REC_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSI6REC_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence NPGVLQLRTRCPPPCRHLLLIKTMIIFSSEILQMRQASQCKILLQYHCPHSWLLEGAWPSERSSALPLF in HSI6REC_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSI6REC_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 11 - Amino acid mutations
The glycosylation sites of variant protein HSI6REC_P7, as compared to the known protein Interleukin-6 receptor alpha chain precursor, are described in Table 12 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 12 - Glycosylation site(s)
Variant protein HSI6REC_P7 is encoded by the following transcript(s): HSI6REC_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSI6REC_T5 is shown in bold; this coding portion starts at position 438 and ends at position 1451. The transcript also has the following SNPs as listed in Table 13 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 13 - Nucleic acid SNPs
Variant protein HSI6REC_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSI6REC_T7 and HSI6REC_T8. An alignment is given to the known protein (Interleukin-6 receptor alpha chain precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSI6REC_P9 and IL6A_HUMAN:
1. An isolated chimeric polypeptide encoding for HSI6REC_P9, comprising a first amino acid sequence being at least 90% homologous to MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGVEPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAGRPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLVRKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGSKFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYRLRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQGEWSEWSPEAMGTPWT corresponding to amino acids 1 - 316 of IL6A_HUMAN, which also corresponds to amino acids 1 - 316 of HSI6REC_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRLSPRCPGWSTAVQSQLTATSASWVQAILPPQPPK corresponding to amino acids 317 - 352 of HSI6REC_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSI6REC_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRLSPRCPGWSTAVQSQLTATSASWVQAILPPQPPK in HSI6REC_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSI6REC_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 14 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 14 - Amino acid mutations
The glycosylation sites of variant protein HSI6REC_P9, as compared to the known protein Interleukin-6 receptor alpha chain precursor, are described in Table 15 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 15 - Glycosylation site(s)
Variant protein HSI6REC_P9 is encoded by the following transcript(s): HSI6REC_T7 and HSI6REC_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSI6REC_T7 is shown in bold; this coding portion starts at position 438 and ends at position 1493. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 16 - Nucleic acid SNPs
The coding portion of transcript HSI6REC_T8 is shown in bold; this coding portion starts at position 438 and ends at position 1493. The transcript also has the following SNPs as listed in Table 17 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSI6REC_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 17 - Nucleic acid SNPs
As noted above, cluster HSI6REC features 12 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSI6REC_node_0 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2, HSI6REC_T4, HSI6REC_T5, HSI6REC_T7 and HSI6REC_T8. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster HSI6REC_node_2 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2, HSI6REC_T4, HSI6REC_T5, HSI6REC_T7 and HSI6REC_T8. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster HSI6REC_node_4 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2, HSI6REC_T4, HSI6REC_T5, HSI6REC_T7 and HSI6REC_T8. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HSI6REC_node_6 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2, HSI6REC_T4, HSI6REC_T5, HSI6REC_T7 and HSI6REC_T8. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster HSI6REC_node_8 according to the present invention is supported by 26 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2, HSI6REC_T4, HSI6REC_T5, HSI6REC_T7 and HSI6REC_T8. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HSI6REC_node_10 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2, HSI6REC_T4, HSI6REC_T7 and HSI6REC_T8. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HSI6REC_node_12 according to the present invention is supported by 5 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T7 and HSI6REC_T8. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HSI6REC_node_25 according to the present invention is supported by 210 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2, HSI6REC_T4 and HSI6REC_T5. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSI6REC_node_16 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HSI6REC_node_18 according to the present invention is supported by 21 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2, HSI6REC_T4 and HSI6REC_T5. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HSI6REC_node_20 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2 and HSI6REC_T5. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Transcript name   Segment starting position   Segment ending position
HSI6REC_T2        1527                        1596
HSI6REC_T5        1292                        1361
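Using the starting and ending positions listed for segment HSI6REC_node_20, the segment length on each transcript can be derived and compared against the stated limit of about 120 bp for short segments. This is a minimal sketch: the names NODE_20_LOCATIONS, SHORT_SEGMENT_MAX_BP, segment_length and is_short_segment are hypothetical, and reading "up to about 120 bp" as a hard cut-off of 120 is an assumption made only for illustration.

```python
# Illustrative sketch; names are hypothetical. Positions are taken from the
# segment location table for HSI6REC_node_20 and assumed to be inclusive.
SHORT_SEGMENT_MAX_BP = 120  # assumed reading of "up to about 120 bp"

NODE_20_LOCATIONS = {
    "HSI6REC_T2": (1527, 1596),
    "HSI6REC_T5": (1292, 1361),
}


def segment_length(start, end):
    """Length in bp of a segment given inclusive start and end positions."""
    return end - start + 1


def is_short_segment(start, end, limit=SHORT_SEGMENT_MAX_BP):
    """True when the segment is no longer than the assumed short-segment limit."""
    return segment_length(start, end) <= limit


if __name__ == "__main__":
    for transcript, (start, end) in NODE_20_LOCATIONS.items():
        print(transcript, segment_length(start, end), is_short_segment(start, end))
```

The same segment should have the same length on every transcript that contains it, so a mismatch between rows would point to a transcription error in the table.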
Segment cluster HSI6REC_node_23 according to the present invention is supported by 23 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSI6REC_T2 and HSI6REC_T5. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: IL6A_HUMAN
Sequence documentation:
Alignment of: HSI6REC_P4 x IL6A_HUMAN
Alignment segment 1/1:
Quality: 4560.00 Escore: 0
Matching length: 468 Total length: 499
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 93.79 Total Percent Identity: 93.79
Gaps: 1
Alignment:
  1 MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGV 50
 51 EPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 EPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAG 100
101 RPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 RPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLV 150
151 RKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGS 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGS 200
201 KFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 KFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYR 250
251 LRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 LRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQ 300
301 GEWSEWSPEAMGTPWTEMRSHYVTQAGFKLLASWDSPASVSQSAGITESR 350
    |||||||||||||||                               ||||
301 GEWSEWSPEAMGTPW...............................TESR 319
351 SPPAENEVSTPMQALTTNKDDDNILFRDSANATSLPVQDSSSVPLPTFLV 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
320 SPPAENEVSTPMQALTTNKDDDNILFRDSANATSLPVQDSSSVPLPTFLV 369
401 AGGSLAFGTLLCIAIVLRFKKTWKLRALKEGKTSMHPPYSLGQLVPERPR 450
    ||||||||||||||||||||||||||||||||||||||||||||||||||
370 AGGSLAFGTLLCIAIVLRFKKTWKLRALKEGKTSMHPPYSLGQLVPERPR 419
451 PTPVLVPLISPPVSPSSLGSDNTSSHNRPDARDPRSPYDISNTDYFFPR 499
    |||||||||||||||||||||||||||||||||||||||||||||||||
420 PTPVLVPLISPPVSPSSLGSDNTSSHNRPDARDPRSPYDISNTDYFFPR 468
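The summary statistics reported for the HSI6REC_P4 x IL6A_HUMAN alignment can be reproduced from the reported lengths. The sketch below assumes, without confirmation from the text, that "Matching Percent Identity" divides identical positions by the matching length and that "Total Percent Identity" divides them by the total length (which includes the gap); the function name percent is hypothetical.

```python
# Illustrative sketch under the stated assumption about how the alignment
# program derives its percentages; the function name is hypothetical.
def percent(numerator, denominator):
    """Percentage rounded to two decimals, as printed in the alignment header."""
    return round(100.0 * numerator / denominator, 2)


# Values reported for HSI6REC_P4 x IL6A_HUMAN.
MATCHING_LENGTH = 468   # aligned (non-gap) positions
TOTAL_LENGTH = 499      # aligned positions plus the 31-residue gap
IDENTICAL = 468         # every aligned position is identical

if __name__ == "__main__":
    print("Matching Percent Identity:", percent(IDENTICAL, MATCHING_LENGTH))
    print("Total Percent Identity:", percent(IDENTICAL, TOTAL_LENGTH))
```

Under this reading, the drop from 100.00 to 93.79 is explained entirely by the 31 inserted residues of the variant that have no counterpart in the known protein.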
Sequence name: IL6A_HUMAN
Sequence documentation:
Alignment of: HSI6REC_P6 x IL6A_HUMAN
Alignment segment 1/1:
Quality: 3334.00 Escore: 0
Matching length: 332 Total length: 332
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGV 50
 51 EPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 EPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAG 100
101 RPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 RPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLV 150
151 RKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGS 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGS 200
201 KFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 KFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYR 250
251 LRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 LRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQ 300
301 GEWSEWSPEAMGTPWTESRSPPAENEVSTPMQ 332
    ||||||||||||||||||||||||||||||||
301 GEWSEWSPEAMGTPWTESRSPPAENEVSTPMQ 332
Sequence name: IL6A_HUMAN
Sequence documentation:
Alignment of: HSI6REC_P7 x IL6A_HUMAN
Alignment segment 1/1:
Quality: 2678.00 Escore: 0
Matching length: 269 Total length: 269
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGV 50
 51 EPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 EPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAG 100
101 RPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 RPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLV 150
151 RKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGS 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGS 200
201 KFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 KFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYR 250
251 LRFELRYRAERSKTFTTWM 269
    |||||||||||||||||||
251 LRFELRYRAERSKTFTTWM 269
Sequence name: IL6A_HUMAN
Sequence documentation: Alignment of: HSI6REC_P9 x IL6A_HUMAN
Alignment segment 1/1: Quality: 3186.00 Escore: 0 Matching length: 321 Total length: 321 Matching Percent Similarity: 99.38 Matching Percent Identity: 99.07 Total Percent Similarity: 99.38 Total Percent Identity: 99.07 Gaps: 0
Alignment:
  1 MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLAVGCALLAALLAAPGAALAPRRCPAQEVARGVLTSLPGDSVTLTCPGV 50

 51 EPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAG 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 EPEDNATVHWVLRKPAAGSHPSRWAGMGRRLLLRSVQLHDSGNYSCYRAG 100

101 RPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLV 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 RPAGTVHLLVDVPPEEPQLSCFRKSPLSNVVCEWGPRSTPSLTTKAVLLV 150

151 RKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGS 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 RKFQNSPAEDFQEPCQYSQESQKFSCQLAVPEGDSSFYIVSMCVASSVGS 200

201 KFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYR 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 KFSKTQTFQGCGILQPDPPANITVTAVARNPRWLSVTWQDPHSWNSSFYR 250

251 LRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQ 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 LRFELRYRAERSKTFTTWMVKDLQHHCVIHDAWSGLRHVVQLRAQEEFGQ 300

301 GEWSEWSPEAMGTPWTDRLSP 321
    ||||||||||||||||:  ||
301 GEWSEWSPEAMGTPWTESRSP 321
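The percent identity and percent similarity reported for the HSI6REC_P9 x IL6A_HUMAN alignment can be checked by hand from the final row, which is the only row containing mismatches. The following is a minimal sketch, assuming a gap-free alignment in which residues 1 - 300 are all identical and the single D/E pair is the one scored as similar; the function names `count_identities` and `percent` are hypothetical and are not part of the alignment software used in the application.

```python
def count_identities(top: str, bottom: str) -> int:
    """Count positions at which two gap-free, equal-length rows agree."""
    if len(top) != len(bottom):
        raise ValueError("rows must have equal length")
    return sum(a == b for a, b in zip(top, bottom))


def percent(matches: int, length: int) -> float:
    """Express a match count as a percentage, rounded to two decimals."""
    return round(100.0 * matches / length, 2)


# Final row (residues 301-321) of the HSI6REC_P9 x IL6A_HUMAN alignment.
variant_row = "GEWSEWSPEAMGTPWTDRLSP"
known_row = "GEWSEWSPEAMGTPWTESRSP"

total_length = 321
identical_upstream = 300  # assumption: residues 1-300 are fully identical
identities = identical_upstream + count_identities(variant_row, known_row)
similar = identities + 1  # assumption: only the D/E pair counts as similar

print(percent(identities, total_length))  # percent identity
print(percent(similar, total_length))     # percent similarity
```

Under these assumptions the sketch gives 318 identical and 319 similar positions out of 321, which agrees with the reported 99.07 and 99.38.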
Subsection J: Vascular endothelial growth factor receptor 2 precursor
DESCRIPTION FOR CLUSTER HUMKDRZ
Cluster HUMKDRZ features 3 transcript(s) and 38 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Segment Name              Sequence ID No.
HUMKDRZ_PEA_1_node_0      1183
HUMKDRZ_PEA_1_node_4      1184
HUMKDRZ_PEA_1_node_6      1185
HUMKDRZ_PEA_1_node_8      1186
HUMKDRZ_PEA_1_node_10     1187
HUMKDRZ_PEA_1_node_12     1188
HUMKDRZ_PEA_1_node_14     1189
HUMKDRZ_PEA_1_node_18     1190
HUMKDRZ_PEA_1_node_19     1191
HUMKDRZ_PEA_1_node_21     1192
HUMKDRZ_PEA_1_node_23     1193
HUMKDRZ_PEA_1_node_27     1194
HUMKDRZ_PEA_1_node_30     1195
HUMKDRZ_PEA_1_node_32     1196
HUMKDRZ_PEA_1_node_36     1197
HUMKDRZ_PEA_1_node_45     1198
HUMKDRZ_PEA_1_node_49     1199
HUMKDRZ_PEA_1_node_58     1200
HUMKDRZ_PEA_1_node_65     1201
HUMKDRZ_PEA_1_node_67     1202
HUMKDRZ_PEA_1_node_68     1203
HUMKDRZ_PEA_1_node_69     1204
HUMKDRZ_PEA_1_node_70     1205
HUMKDRZ_PEA_1_node_71     1206
HUMKDRZ_PEA_1_node_2      1207
HUMKDRZ_PEA_1_node_16     1208
HUMKDRZ_PEA_1_node_25     1209
HUMKDRZ_PEA_1_node_34     1210
HUMKDRZ_PEA_1_node_38     1211
HUMKDRZ_PEA_1_node_40     1212
HUMKDRZ_PEA_1_node_43     1213
HUMKDRZ_PEA_1_node_47     1214
HUMKDRZ_PEA_1_node_51     1215
HUMKDRZ_PEA_1_node_53     1216
HUMKDRZ_PEA_1_node_55     1217
HUMKDRZ_PEA_1_node_61     1218
HUMKDRZ_PEA_1_node_63     1219
HUMKDRZ_PEA_1_node_66     1220
Table 3 - Proteins of interest
These sequences are variants of the known protein Vascular endothelial growth factor receptor 2 precursor (SwissProt accession identifier VGR2_HUMAN; known also according to the synonyms EC 2.7.1.112; VEGFR-2; Kinase insert domain receptor; Protein-tyrosine kinase receptor Flk-1), SEQ ID NO: 1221, referred to herein as the previously known protein. Protein Vascular endothelial growth factor receptor 2 precursor is known or believed to have the following function(s): receptor for VEGF or VEGF-C; has a tyrosine-protein kinase activity; the VEGF-kinase ligand/receptor signaling system plays a key role in vascular development and regulation of vascular permeability. Undetectable in normal human breast tissues, this protein was found to be overexpressed by the vast majority of human primary breast cancers examined. The variants of the present invention have this diagnostic utility.
The sequence for protein Vascular endothelial growth factor receptor 2 precursor is given at the end of the application, as "Vascular endothelial growth factor receptor 2 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Vascular endothelial growth factor receptor 2 precursor localization is believed to be Type I membrane protein.
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Endothelial growth factor receptor kinase inhibitor; Angiogenesis modulator; Endothelial growth factor modulator. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Cardiovascular; Vulnerary; Anticancer; Symptomatic antidiabetic; Monoclonal antibody, human. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: angiogenesis; protein amino acid phosphorylation; transmembrane receptor protein tyrosine kinase signaling pathway, which are annotation(s) related to Biological Process; receptor; vascular endothelial growth factor receptor; ATP binding; transferase, which are annotation(s) related to Molecular Function; and integral plasma membrane protein, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TremBl Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. As noted above, cluster HUMKDRZ features 3 transcript(s), which were listed in
Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Vascular endothelial growth factor receptor 2 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HUMKDRZ_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMKDRZ_PEA_1_T13. An alignment is given to the known protein (Vascular endothelial growth factor receptor 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMKDRZ_PEA_1_P9 and VGR2_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMKDRZ_PEA_1_P9, comprising a first amino acid sequence being at least 90% homologous to
MQSKVLLAVALWLCVETRAASVGLPSVSLDLPRLSIQKDILTIKANTTLQITCRGQR DLDWLWPNNQSGSEQRVEVTECSDGLFCKTLTIPKVIGNDTGAYKCFYRETDLASVI YVYVQDYRSPFIASVSDQHGVVYITENKNKTWIPCLGSISNLNVSLCARYPEKRFVP DGNRISWDSKKGFTIPSYMISYAGMVFCEAKINDESYQSIMYIVVVVGYRIYDVVLS PSHGIELSVGEKLVLNCTARTELNVGIDFNWEYPSSKHQHKKLVNRDLKTQSGSEM KKFLSTLTIDGVTRSDQGLYTCAASSGLMTKKNSTFVRVHEKPFVAFGSGMESLVE ATVGERVRIPAKYLGYPPPEIKWYKNGIPLESNHTIKAGHVLTIMEVSERDTGNYTVI LTNPISKEKQSH WSL WY corresponding to amino acids 1 - 418 of VGR2_HUMAN, which also corresponds to amino acids 1 - 418 of HUMKDRZ_PEA_1_P9, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence GESIQFSSLPKIYYDTLSSKSAKPPFLCLLLLHSYHGWACVQKSSGWKLK corresponding to amino acids 419 - 469 of HUMKDRZ_PEA_1_P9, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMKDRZ_PEA_1_P9, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence GESIQFSSLPKIYYDTLSSKSAKPPFLCLLLLHSYHGWACVQKSSGWKLK in HUMKDRZ_PEA_1_P9.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region.
Variant protein HUMKDRZ_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMKDRZ_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Amino acid mutations
The glycosylation sites of variant protein HUMKDRZ_PEA_1_P9, as compared to the known protein Vascular endothelial growth factor receptor 2 precursor, are described in Table 6 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 6 - Glycosylation site(s)
The phosphorylation sites of variant protein HUMKDRZ_PEA_1_P9, as compared to the known protein Vascular endothelial growth factor receptor 2 precursor, are described in Table 7 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 7 - Phosphorylation site(s)
Variant protein HUMKDRZ_PEA_1_P9 is encoded by the following transcript(s): HUMKDRZ_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMKDRZ_PEA_1_T13 is shown in bold; this coding portion starts at position 303 and ends at position 1709. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMKDRZ_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
Variant protein HUMKDRZ_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMKDRZ_PEA_1_T4. An alignment is given to the known protein (Vascular endothelial growth factor receptor 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMKDRZ_PEA_1_P10 and VGR2_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMKDRZ_PEA_1_P10, comprising a first amino acid sequence being at least 90% homologous to MQSKVLLAVALWLCVETRAASVGLPSVSLDLPRLSIQKDILTIKANTTLQITCR corresponding to amino acids 1 - 54 of VGR2_HUMAN, which also corresponds to amino acids 1 - 54 of HUMKDRZ_PEA_1_P10, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MKDRRPLGLT corresponding to amino acids 55 - 64 of HUMKDRZ_PEA_1_P10, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMKDRZ_PEA_1_P10, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MKDRRPLGLT in HUMKDRZ_PEA_1_P10.
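The comparison report describes the variant as a head shared with the known protein followed by a unique tail, joined contiguously. A minimal sketch of that bookkeeping, assuming 1-based inclusive residue numbering, is given here; the `Chimera` class and its method names are hypothetical and serve only to check that the stated coordinates follow from the two sequences.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Chimera:
    """A variant made of a head shared with the known protein plus a unique tail."""
    name: str
    head: str
    tail: str

    @property
    def sequence(self) -> str:
        # "contiguous and in a sequential order": head first, tail directly after
        return self.head + self.tail

    def head_range(self) -> Tuple[int, int]:
        """1-based inclusive residue range covered by the head."""
        return (1, len(self.head))

    def tail_range(self) -> Tuple[int, int]:
        """1-based inclusive residue range covered by the tail."""
        start = len(self.head) + 1
        return (start, start + len(self.tail) - 1)


p10 = Chimera(
    name="HUMKDRZ_PEA_1_P10",
    head="MQSKVLLAVALWLCVETRAASVGLPSVSLDLPRLSIQKDILTIKANTTLQITCR",
    tail="MKDRRPLGLT",
)

print(p10.head_range(), p10.tail_range(), len(p10.sequence))
```

For HUMKDRZ_PEA_1_P10 the head covers residues 1 - 54 and the tail residues 55 - 64, for a total length of 64, in agreement with the report.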
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region.
Variant protein HUMKDRZ_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMKDRZ_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 9 - Amino acid mutations
The glycosylation sites of variant protein HUMKDRZ_PEA_1_P10, as compared to the known protein Vascular endothelial growth factor receptor 2 precursor, are described in Table 10 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 10 - Glycosylation site(s)
The phosphorylation sites of variant protein HUMKDRZ_PEA_1_P10, as compared to the known protein Vascular endothelial growth factor receptor 2 precursor, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 11 - Phosphorylation site(s)
Variant protein HUMKDRZ_PEA_1_P10 is encoded by the following transcript(s): HUMKDRZ_PEA_1_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMKDRZ_PEA_1_T4 is shown in bold; this coding portion starts at position 303 and ends at position 494. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMKDRZ_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 12 - Nucleic acid SNPs
Variant protein HUMKDRZ_PEA_1_P11 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HUMKDRZ_PEA_1_T5. An alignment is given to the known protein (Vascular endothelial growth factor receptor 2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HUMKDRZ_PEA_1_P11 and VGR2_HUMAN:
1. An isolated chimeric polypeptide encoding for HUMKDRZ_PEA_1_P11, comprising a first amino acid sequence being at least 90% homologous to MQSKVLLAVALWLCVETRAASVG corresponding to amino acids 1 - 23 of VGR2_HUMAN, which also corresponds to amino acids 1 - 23 of HUMKDRZ_PEA_1_P11, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence DRGTWTGFGPIIRVAVSKGWR corresponding to amino acids 24 - 44 of HUMKDRZ_PEA_1_P11, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HUMKDRZ_PEA_1_P11, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence DRGTWTGFGPIIRVAVSKGWR in HUMKDRZ_PEA_1_P11.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a transmembrane region.
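The reasoning given for the "secreted" assignment can be expressed as a simple consensus rule over the predictor outputs. The sketch below is an illustration only: the function name `infer_localization` is hypothetical, the predictor outputs are represented as plain booleans rather than real program results, and the "undetermined" fallback is an assumption, since the text states only the condition under which a protein is believed to be secreted.

```python
from typing import Sequence


def infer_localization(
    signal_peptide_calls: Sequence[bool],
    transmembrane_calls: Sequence[bool],
) -> str:
    """Apply the consensus rule described in the text.

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    transmembrane_calls: one boolean per transmembrane-region prediction program.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call per predictor type is required")
    # Secreted: every signal-peptide program says yes, no transmembrane program does.
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Assumption: other outcomes are not described in the text.
    return "undetermined"


# Both signal-peptide programs positive, neither transmembrane program positive.
p11_localization = infer_localization([True, True], [False, False])
print(p11_localization)
```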
The glycosylation sites of variant protein HUMKDRZ_PEA_1_P11, as compared to the known protein Vascular endothelial growth factor receptor 2 precursor, are described in Table 13 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 13 - Glycosylation site(s)
The phosphorylation sites of variant protein HUMKDRZ_PEA_1_P11, as compared to the known protein Vascular endothelial growth factor receptor 2 precursor, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the phosphorylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein).
Table 14 - Phosphorylation site(s)
Variant protein HUMKDRZ_PEA_1_P11 is encoded by the following transcript(s): HUMKDRZ_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HUMKDRZ_PEA_1_T5 is shown in bold; this coding portion starts at position 303 and ends at position 434. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HUMKDRZ_PEA_1_P11 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 15 - Nucleic acid SNPs
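The coding-portion coordinates given for the three transcripts can be cross-checked against the variant protein lengths stated in the comparison reports (469, 64 and 44 amino acids). The following is a minimal sketch, assuming the positions are 1-based and inclusive; the helper name `codon_count` is hypothetical.

```python
def codon_count(start: int, end: int) -> int:
    """Number of complete codons in a 1-based, inclusive coding portion."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a multiple of 3")
    return length // 3


# transcript -> (coding start, coding end, protein length from the comparison report)
CODING_PORTIONS = {
    "HUMKDRZ_PEA_1_T13": (303, 1709, 469),  # encodes HUMKDRZ_PEA_1_P9
    "HUMKDRZ_PEA_1_T4": (303, 494, 64),     # encodes HUMKDRZ_PEA_1_P10
    "HUMKDRZ_PEA_1_T5": (303, 434, 44),     # encodes HUMKDRZ_PEA_1_P11
}

for transcript, (start, end, protein_length) in CODING_PORTIONS.items():
    codons = codon_count(start, end)
    print(transcript, codons, codons == protein_length)
```

Under the inclusive-coordinate assumption each range contains exactly as many codons as the variant has residues, which suggests that the stated coding portions do not include the stop codon.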
As noted above, cluster HUMKDRZ features 38 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HUMKDRZ_PEA_1_node_0 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4, HUMKDRZ_PEA_1_T5 and HUMKDRZ_PEA_1_T13. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_4 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_6 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4, HUMKDRZ_PEA_1_T5 and HUMKDRZ_PEA_1_T13. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_8 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4, HUMKDRZ_PEA_1_T5 and HUMKDRZ_PEA_1_T13. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_10 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4, HUMKDRZ_PEA_1_T5 and HUMKDRZ_PEA_1_T13. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_12 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4, HUMKDRZ_PEA_1_T5 and HUMKDRZ_PEA_1_T13. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_14 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4, HUMKDRZ_PEA_1_T5 and HUMKDRZ_PEA_1_T13. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_18 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4, HUMKDRZ_PEA_1_T5 and HUMKDRZ_PEA_1_T13. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_19 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T13. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_21 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_23 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_27 according to the present invention is supported by 20 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_30 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HUMKDRZ_PEA_1_T4      2444                        2590
HUMKDRZ_PEA_1_T5      2196                        2342
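The starting and ending positions in Table 28 can be used to confirm that the same segment has the same length on each transcript that contains it. A minimal sketch, assuming 1-based inclusive positions; the function name `segment_length` is hypothetical.

```python
def segment_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive segment."""
    if end < start:
        raise ValueError("end position precedes start position")
    return end - start + 1


# Positions of segment HUMKDRZ_PEA_1_node_30, taken from Table 28.
NODE_30_LOCATIONS = {
    "HUMKDRZ_PEA_1_T4": (2444, 2590),
    "HUMKDRZ_PEA_1_T5": (2196, 2342),
}

lengths = {
    transcript: segment_length(start, end)
    for transcript, (start, end) in NODE_30_LOCATIONS.items()
}
# Difference in the segment's start position between the two transcripts.
offset = NODE_30_LOCATIONS["HUMKDRZ_PEA_1_T4"][0] - NODE_30_LOCATIONS["HUMKDRZ_PEA_1_T5"][0]

print(lengths, offset)
```

Both entries give a length of 147 nucleotides; the segment simply begins 248 positions later on HUMKDRZ_PEA_1_T4 than on HUMKDRZ_PEA_1_T5.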
Segment cluster HUMKDRZ_PEA_1_node_32 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_36 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_45 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_49 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_58 according to the present invention is supported by 36 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_65 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_67 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_68 according to the present invention is supported by 49 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_69 according to the present invention is supported by 92 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_70 according to the present invention is supported by 85 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_71 according to the present invention is supported by 87 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HUMKDRZ_PEA_1_node_2 according to the present invention is supported by 12 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T13. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_16 according to the present invention is supported by 18 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4, HUMKDRZ_PEA_1_T5 and HUMKDRZ_PEA_1_T13. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_25 according to the present invention is supported by 15 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_34 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_38 according to the present invention is supported by 14 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_40 according to the present invention is supported by 17 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_43 according to the present invention is supported by 19 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_47 according to the present invention is supported by 25 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_51 according to the present invention is supported by 28 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_53 according to the present invention is supported by 31 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_1_node_55 according to the present invention is supported by 30 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_l_node_61 according to the present invention is supported by 35 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ PEA 1 T4 and HUMKDRZ PEA 1 T5. Table 51 below describes the starting and ending position of this segment on each transcript. Table 51 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_l_node_63 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 52 below describes the starting and ending position of this segment on each transcript. Table 52 - Segment location on transcripts
Segment cluster HUMKDRZ_PEA_l_node_66 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HUMKDRZ_PEA_1_T4 and HUMKDRZ_PEA_1_T5. Table 53 below describes the starting and ending position of this segment on each transcript. Table 53 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: VGR2_HUMAN
Sequence documentation:
Alignment of: HUMKDRZ_PEA_1_P9 x VGR2_HUMAN
Alignment segment 1/1: Quality: 4079.00 Escore: 0 Matching length: 418 Total length: 418 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MQSKVLLAVALWLCVETRAASVGLPSVSLDLPRLSIQKDILTIKANTTLQ 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MQSKVLLAVALWLCVETRAASVGLPSVSLDLPRLSIQKDILTIKANTTLQ 50
 51 ITCRGQRDLDWLWPNNQSGSEQRVEVTECSDGLFCKTLTIPKVIGNDTGA 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 ITCRGQRDLDWLWPNNQSGSEQRVEVTECSDGLFCKTLTIPKVIGNDTGA 100
101 YKCFYRETDLASVIYVYVQDYRSPFIASVSDQHGVVYITENKNKTVVIPC 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YKCFYRETDLASVIYVYVQDYRSPFIASVSDQHGVVYITENKNKTVVIPC 150
151 LGSISNLNVSLCARYPEKRFVPDGNRISWDSKKGFTIPSYMISYAGMVFC 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LGSISNLNVSLCARYPEKRFVPDGNRISWDSKKGFTIPSYMISYAGMVFC 200
201 EAKINDESYQSIMYIVVVVGYRIYDVVLSPSHGIELSVGEKLVLNCTART 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 EAKINDESYQSIMYIVVVVGYRIYDVVLSPSHGIELSVGEKLVLNCTART 250
251 ELNVGIDFNWEYPSSKHQHKKLVNRDLKTQSGSEMKKFLSTLTIDGVTRS 300
    ||||||||||||||||||||||||||||||||||||||||||||||||||
251 ELNVGIDFNWEYPSSKHQHKKLVNRDLKTQSGSEMKKFLSTLTIDGVTRS 300
301 DQGLYTCAASSGLMTKKNSTFVRVHEKPFVAFGSGMESLVEATVGERVRI 350
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 DQGLYTCAASSGLMTKKNSTFVRVHEKPFVAFGSGMESLVEATVGERVRI 350
351 PAKYLGYPPPEIKWYKNGIPLESNHTIKAGHVLTIMEVSERDTGNYTVIL 400
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 PAKYLGYPPPEIKWYKNGIPLESNHTIKAGHVLTIMEVSERDTGNYTVIL 400
401 TNPISKEKQSHVVSLVVY 418
    ||||||||||||||||||
401 TNPISKEKQSHVVSLVVY 418
Sequence name: VGR2_HUMAN
Sequence documentation:
Alignment of: HUMKDRZ_PEA_1_P10 x VGR2_HUMAN
Alignment segment 1/1: Quality: 506.00 Escore: 0 Matching length: 54 Total length: 54 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MQSKVLLAVALWLCVETRAASVGLPSVSLDLPRLSIQKDILTIKANTTLQ 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MQSKVLLAVALWLCVETRAASVGLPSVSLDLPRLSIQKDILTIKANTTLQ 50
 51 ITCR 54
    ||||
 51 ITCR 54
Sequence name: VGR2_HUMAN Sequence documentation:
Alignment of: HUMKDRZ_PEA_1_P11 x VGR2_HUMAN
Alignment segment 1/1: Quality: 216.00 Escore: 0 Matching length: 23 Total length: 23 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MQSKVLLAVALWLCVETRAASVG 23
    |||||||||||||||||||||||
  1 MQSKVLLAVALWLCVETRAASVG 23
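By way of non-limiting illustration only, the summary values reported with each alignment (total length, matching length, gaps and percent identity) may be recomputed from the two aligned rows. The following Python sketch is not the program that produced the alignments herein. The function name `alignment_stats` is hypothetical, gaps are assumed to be written as "-" and counted per column, matching length is assumed to mean columns without a gap, and percent identity is assumed to be identical columns divided by total alignment length. The Quality and Escore values depend on a scoring scheme that is not stated here and are not reproduced.

```python
def alignment_stats(row_a: str, row_b: str) -> dict:
    """Summarise a pairwise alignment given as two equal-length rows.

    Assumptions (not taken from the source): '-' marks a gap, gaps are
    counted per column, and identity is measured over the total length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = len(row_a)
    gaps = sum(1 for a, b in zip(row_a, row_b) if a == "-" or b == "-")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return {
        "total_length": total,
        "matching_length": total - gaps,
        "gaps": gaps,
        "percent_identity": 100.0 * identical / total if total else 0.0,
    }


# The 23-residue alignment of HUMKDRZ_PEA_1_P11 against VGR2_HUMAN
variant_row = "MQSKVLLAVALWLCVETRAASVG"
known_row = "MQSKVLLAVALWLCVETRAASVG"
stats = alignment_stats(variant_row, known_row)
print(stats)
```

Under these assumptions the sketch yields a total length of 23, no gaps and 100% identity for the rows shown, in agreement with the reported values.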
Subsection K: Group X secretory phospholipase A2 precursor
DESCRIPTION FOR CLUSTER N93958
Cluster N93958 features 5 transcript(s) and 5 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
N93958_PEA_1_node_4 1230
N93958_PEA_1_node_6 1231
N93958_PEA_1_node_15 1232
N93958_PEA_1_node_8 1233
N93958_PEA_1_node_12 1234
Table 3 - Proteins of interest
These sequences are variants of the known protein Group X secretory phospholipase A2 precursor (SwissProt accession identifier PA2X_HUMAN; known also according to the synonyms EC 3.1.1.4; Phosphatidylcholine 2-acylhydrolase GX; GX sPLA2; sPLA2-X), SEQ ID NO: 1235, referred to herein as the previously known protein. Protein Group X secretory phospholipase A2 precursor is known or believed to have the following function(s): phospholipase of the A2 type (PA2) catalyzes the calcium-dependent hydrolysis of the 2-acyl groups in 3-sn-phosphoglycerides. It has a powerful potency for releasing arachidonic acid from cell membrane phospholipids. This protein prefers phosphatidylethanolamine and phosphatidylcholine liposomes to those of phosphatidylserine. This protein is secreted and binds 1 calcium ion per subunit as a cofactor. Its reaction is: Phosphatidylcholine + H2O = 1-acylglycerophosphocholine + a carboxylate. This gene belongs to the Phospholipase A2 family and is specifically found in spleen, thymus, peripheral blood leukocytes, pancreas, lung, and colon. The mouse group X secretory phospholipase A2 induces a potent release of arachidonic acid from spleen cells and acts as a ligand for the phospholipase A2 receptor. The deposition of cholesterol ester within foam cells of the artery wall is fundamental to the pathogenesis of atherosclerosis. Modifications of low density lipoprotein (LDL), such as oxidation, are prerequisite events for the formation of foam cells. Group X secretory phospholipase A2 (sPLA2-X) may be involved in this process. sPLA2-X was found to induce potent hydrolysis of phosphatidylcholine in LDL, leading to the production of large amounts of unsaturated fatty acids and lysophosphatidylcholine (lyso-PC), which contrasted with little, if any, lipolytic modification of LDL by the classic types of group IB and IIA secretory PLA2s. Treatment with sPLA2-X caused an increase in the negative charge of LDL with little modification of apolipoprotein B (apoB), in contrast to the excessive aggregation and fragmentation of apoB in oxidized LDL. The sPLA2-X-modified LDL was efficiently incorporated into macrophages to induce the accumulation of cellular cholesterol ester and the formation of non-membrane-bound lipid droplets in the cytoplasm, whereas the extensive accumulation of multilayered structures was found in the cytoplasm in oxidized LDL-treated macrophages.
Immunohistochemical analysis revealed marked expression of sPLA2-X in foam cell lesions in the arterial intima of high fat-fed apolipoprotein E-deficient mice. These findings suggest that modification of LDL by sPLA2-X in the arterial vessels is one of the mechanisms responsible for the generation of atherogenic lipoprotein particles as well as the production of various lipid mediators, including unsaturated fatty acids and lyso-PC. The rationale for suggesting this gene as a marker candidate is that this exact mechanism is the basis for the PLAC™ test, which uses another secreted A2 phospholipase (Lp-PLA2) as a marker for LDL-independent coronary heart disease risk evaluation (by Diadexus). The variants of the present invention are useful for these diagnostic indications. The sequence for protein Group X secretory phospholipase A2 precursor is given at the end of the application, as "Group X secretory phospholipase A2 precursor amino acid sequence". Protein Group X secretory phospholipase A2 precursor localization is believed to be secreted. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: phospholipase A2, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. As noted above, cluster N93958 features 5 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Group X secretory phospholipase A2 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein N93958_PEA_1_P1 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) N93958_PEA_1_T0. An alignment is given to the known protein (Group X secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between N93958_PEA_1_P1 and PA2X_HUMAN:
1. An isolated chimeric polypeptide encoding for N93958_PEA_1_P1, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGPLPVCLPI corresponding to amino acids 1 - 10 of N93958_PEA_1_P1, and a second amino acid sequence being at least 90% homologous to MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPIAYMKYGCFCGLGGHGQPRDAIDWCCHGHDCCYTRAEEAGCSPKTERYSWQCVNQSVLCGPAENKCQELLCKCDQEIANCLAQTEYNLKYLFYPQFLCEPDSPKCD corresponding to amino acids 1 - 155 of PA2X_HUMAN, which also corresponds to amino acids 11 - 165 of N93958_PEA_1_P1, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of N93958_PEA_1_P1, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGPLPVCLPI of N93958_PEA_1_P1.
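By way of non-limiting illustration only, the residue numbering recited in the comparison report between N93958_PEA_1_P1 and PA2X_HUMAN can be checked by joining the head sequence to residues 1 - 155 of the known protein as quoted above. The following Python sketch assumes 1-based, inclusive residue numbering; the helper name `residues` is hypothetical, and the sketch illustrates the stated coordinates only, not any method of producing the polypeptide.

```python
# Head of the variant (amino acids 1 - 10), as recited in the text
HEAD = "MGPLPVCLPI"

# Amino acids 1 - 155 of PA2X_HUMAN, as quoted in the comparison report
KNOWN_1_155 = (
    "MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPIAYMKYGCF"
    "CGLGGHGQPRDAIDWCCHGHDCCYTRAEEAGCSPKTERYSWQCVNQSVLCGPAEN"
    "KCQELLCKCDQEIANCLAQTEYNLKYLFYPQFLCEPDSPKCD"
)


def residues(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based inclusive numbering."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("residue range outside the sequence")
    return seq[start - 1:end]


# First and second amino acid sequences, contiguous and in sequential order
variant = HEAD + residues(KNOWN_1_155, 1, 155)

print(len(variant))               # total length of the assembled variant
print(residues(variant, 1, 10))   # head
print(residues(variant, 11, 165) == KNOWN_1_155)
```

Under this numbering the assembled sequence is 165 residues long, with the head at positions 1 - 10 and the known-protein portion at positions 11 - 165, consistent with the ranges recited above.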
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein N93958_PEA_1_P1 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 4 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 4 - Amino acid mutations
The glycosylation sites of variant protein N93958_PEA_1_P1, as compared to the known protein Group X secretory phospholipase A2 precursor, are described in Table 5 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 5 - Glycosylation site(s)
Variant protein N93958_PEA_1_P1 is encoded by the following transcript(s): N93958_PEA_1_T0, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript N93958_PEA_1_T0 is shown in bold; this coding portion starts at position 441 and ends at position 935. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P1 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Variant protein N93958_PEA_1_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) N93958_PEA_1_T3. An alignment is given to the known protein (Group X secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between N93958_PEA_1_P2 and PA2X_HUMAN:
1. An isolated chimeric polypeptide encoding for N93958_PEA_1_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGPLPVCLPI corresponding to amino acids 1 - 10 of N93958_PEA_1_P2, a second amino acid sequence being at least 90% homologous to MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPIAYMKYGCFCGLGGHGQPRDAIDWCCHGHDCCYTRAEEAGCSPKTERYSWQCVNQSVLC corresponding to amino acids 1 - 108 of PA2X_HUMAN, which also corresponds to amino acids 11 - 118 of N93958_PEA_1_P2, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VLLCHPGWSAVV corresponding to amino acids 119 - 130 of N93958_PEA_1_P2, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of N93958_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGPLPVCLPI of N93958_PEA_1_P2.
3. An isolated polypeptide encoding for a tail of N93958_PEA_1_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VLLCHPGWSAVV in N93958_PEA_1_P2.
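By way of non-limiting illustration only, the graded homology thresholds recited above (70%, 80%, 85%, 90% and 95%) may be applied to a candidate tail sequence as sketched below in Python. This passage does not state how percent homology is computed, so the sketch makes a simplifying assumption: it uses percent identity over an ungapped, equal-length comparison as a stand-in. The names `percent_identity` and `highest_tier` and the mutated candidate sequences are hypothetical.

```python
# Thresholds recited in the text, highest first
TIERS = (95, 90, 85, 80, 70)

# Tail of N93958_PEA_1_P2 (amino acids 119 - 130), as recited in the text
TAIL = "VLLCHPGWSAVV"


def percent_identity(candidate: str, reference: str) -> float:
    """Percent of identical positions in an ungapped, equal-length comparison.

    Assumption: identity is used here in place of 'homology', which the
    passage does not define.
    """
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(candidate, reference))
    return 100.0 * matches / len(reference)


def highest_tier(candidate: str, reference: str):
    """Return the highest recited threshold met, or None if below 70%."""
    pid = percent_identity(candidate, reference)
    for tier in TIERS:
        if pid >= tier:
            return tier
    return None


print(highest_tier("VLLCHPGWSAVV", TAIL))  # identical candidate
print(highest_tier("VLLCHPGWSAVA", TAIL))  # hypothetical, one substitution
print(highest_tier("ALLCHPGWSAVA", TAIL))  # hypothetical, two substitutions
```

For a 12-residue tail, each substitution lowers identity by roughly 8.3 percentage points, so a single substitution already falls below the 95% tier under this assumption.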
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein N93958_PEA_1_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Amino acid mutations
The glycosylation sites of variant protein N93958_PEA_1_P2, as compared to the known protein Group X secretory phospholipase A2 precursor, are described in Table 8 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 8 - Glycosylation site(s)
Variant protein N93958_PEA_1_P2 is encoded by the following transcript(s): N93958_PEA_1_T3, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript N93958_PEA_1_T3 is shown in bold; this coding portion starts at position 441 and ends at position 830. The transcript also has the following SNPs as listed in Table 9 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Nucleic acid SNPs
Variant protein N93958_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) N93958_PEA_1_T8. An alignment is given to the known protein (Group X secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between N93958_PEA_1_P4 and PA2X_HUMAN:
1. An isolated chimeric polypeptide encoding for N93958_PEA_1_P4, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGPLPVCLPI corresponding to amino acids 1 - 10 of N93958_PEA_1_P4, a second amino acid sequence being at least 90% homologous to MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPIAYMKYGCFCGLGGHGQPRDAIDW corresponding to amino acids 1 - 73 of PA2X_HUMAN, which also corresponds to amino acids 11 - 83 of N93958_PEA_1_P4, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TGREQMPRTVVQV corresponding to amino acids 84 - 96 of N93958_PEA_1_P4, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of N93958_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGPLPVCLPI of N93958_PEA_1_P4.
3. An isolated polypeptide encoding for a tail of N93958_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TGREQMPRTVVQV in N93958_PEA_1_P4.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein N93958_PEA_1_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 10 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Amino acid mutations
The glycosylation sites of variant protein N93958_PEA_1_P4, as compared to the known protein Group X secretory phospholipase A2 precursor, are described in Table 11 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 11 - Glycosylation site(s)
Variant protein N93958_PEA_1_P4 is encoded by the following transcript(s): N93958_PEA_1_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript N93958_PEA_1_T8 is shown in bold; this coding portion starts at position 440 and ends at position 728. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein N93958_PEA_1_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) N93958_PEA_1_T10. An alignment is given to the known protein (Group X secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between N93958_PEA_1_P5 and PA2X_HUMAN:
1. An isolated chimeric polypeptide encoding for N93958_PEA_1_P5, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence MGPLPVCLPI corresponding to amino acids 1 - 10 of N93958_PEA_1_P5, a second amino acid sequence being at least 90% homologous to MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPIAYMKYGCFCGLGGHGQPRDAID corresponding to amino acids 1 - 72 of PA2X_HUMAN, which also corresponds to amino acids 11 - 82 of N93958_PEA_1_P5, and a third amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence CLALSPRLECSGVISAHFNLCLLGSSDPRTSAS corresponding to amino acids 83 - 115 of N93958_PEA_1_P5, wherein said first amino acid sequence, second amino acid sequence and third amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a head of N93958_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence MGPLPVCLPI of N93958_PEA_1_P5.
3. An isolated polypeptide encoding for a tail of N93958_PEA_1_P5, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence CLALSPRLECSGVISAHFNLCLLGSSDPRTSAS in N93958_PEA_1_P5.
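By way of non-limiting illustration only, the amino acid ranges recited for the head, known-protein portion and tail of N93958_PEA_1_P5 follow from the lengths of the three contiguous parts. The following Python sketch derives the ranges, assuming 1-based, inclusive numbering; the function name `part_ranges` is hypothetical.

```python
from itertools import accumulate


def part_ranges(parts):
    """Map ordered, contiguous parts to 1-based inclusive (start, end) ranges."""
    ends = list(accumulate(len(p) for p in parts))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return list(zip(starts, ends))


# The three parts of N93958_PEA_1_P5, as recited in the comparison report
HEAD = "MGPLPVCLPI"
KNOWN_1_72 = (
    "MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPIAYMKYGCF"
    "CGLGGHGQPRDAID"
)
TAIL = "CLALSPRLECSGVISAHFNLCLLGSSDPRTSAS"

ranges = part_ranges([HEAD, KNOWN_1_72, TAIL])
for name, (start, end) in zip(("head", "known portion", "tail"), ranges):
    print(f"{name}: amino acids {start} - {end}")
```

Under this numbering the three parts occupy amino acids 1 - 10, 11 - 82 and 83 - 115, matching the ranges recited above.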
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
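By way of non-limiting illustration only, the localization rationale stated above can be expressed as a simple decision rule: the protein is taken to be secreted when every signal-peptide prediction program reports a signal peptide and no transmembrane region prediction program reports a trans-membrane region. The following Python sketch is an assumption-laden simplification; the function name `predict_localization` is hypothetical, the programs' outputs are reduced to booleans, and any other combination is reported as "undetermined" because this passage does not state how such cases are resolved.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Apply the stated rule to boolean calls from the prediction programs.

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    transmembrane_calls: one boolean per transmembrane prediction program.
    """
    if not signal_peptide_calls or not transmembrane_calls:
        raise ValueError("at least one call of each kind is required")
    if all(signal_peptide_calls) and not any(transmembrane_calls):
        return "secreted"
    # Other combinations are not addressed in this passage (assumption)
    return "undetermined"


# Both signal-peptide programs positive, neither transmembrane program positive
print(predict_localization([True, True], [False, False]))
```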
Variant protein N93958_PEA_1_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
The glycosylation sites of variant protein N93958_PEA_1_P5, as compared to the known protein Group X secretory phospholipase A2 precursor, are described in Table 14 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 14 - Glycosylation site(s)
Variant protein N93958_PEA_1_P5 is encoded by the following transcript(s): N93958_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript N93958_PEA_1_T10 is shown in bold; this coding portion starts at position 440 and ends at position 785. The transcript also has the following SNPs as listed in Table 15 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Nucleic acid SNPs
Variant protein N93958_PEA_1_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) N93958_PEA_1_T9. An alignment is given to the known protein (Group X secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between N93958_PEA_1_P7 and PA2X_HUMAN:
1. An isolated chimeric polypeptide encoding for N93958_PEA_1_P7, comprising a first amino acid sequence being at least 90% homologous to MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPIAYMKYGCFCGLGGHGQPRDAIDW corresponding to amino acids 1 - 73 of PA2X_HUMAN, which also corresponds to amino acids 1 - 73 of N93958_PEA_1_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TGREQMPRTVVQV corresponding to amino acids 74 - 86 of N93958_PEA_1_P7, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of N93958_PEA_1_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TGREQMPRTVVQV in N93958_PEA_1_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein N93958_PEA_1_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 16 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Amino acid mutations
The glycosylation sites of variant protein N93958_PEA_1_P7, as compared to the known protein Group X secretory phospholipase A2 precursor, are described in Table 17 (given according to their position(s) on the amino acid sequence in the first column; the second column indicates whether the glycosylation site is present in the variant protein; and the last column indicates whether the position is different on the variant protein). Table 17 - Glycosylation site(s)
Variant protein N93958_PEA_1_P7 is encoded by the following transcript(s): N93958_PEA_1_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript N93958_PEA_1_T9 is shown in bold; this coding portion starts at position 471 and ends at position 728. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein N93958_PEA_1_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 18 - Nucleic acid SNPs
As noted above, cluster N93958 features 5 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster N93958_PEA_1_node_4 according to the present invention is supported by 8 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): N93958_PEA_1_T0, N93958_PEA_1_T3, N93958_PEA_1_T8, N93958_PEA_1_T9 and N93958_PEA_1_T10. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Segment cluster N93958_PEA_1_node_6 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): N93958_PEA_1_T0, N93958_PEA_1_T3, N93958_PEA_1_T8, N93958_PEA_1_T9 and N93958_PEA_1_T10. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster N93958_PEA_1_node_15 according to the present invention is supported by 27 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): N93958_PEA_1_T0, N93958_PEA_1_T3, N93958_PEA_1_T8, N93958_PEA_1_T9 and N93958_PEA_1_T10. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster N93958_PEA_1_node_8 according to the present invention is supported by 9 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): N93958_PEA_1_T0 and N93958_PEA_1_T3. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Segment cluster N93958_PEA_1_node_12 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): N93958_PEA_1_T3 and N93958_PEA_1_T10. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: PA2X_HUMAN
Sequence documentation:
Alignment of: N93958_PEA_1_P1 x PA2X_HUMAN
Alignment segment 1/1:
Quality: 1583.00    Escore: 0
Matching length: 155    Total length: 155
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
 11 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 60
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 50
 61 AYMKYGCFCGLGGHGQPRDAIDWCCHGHDCCYTRAEEAGCSPKTERYSWQ 110
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AYMKYGCFCGLGGHGQPRDAIDWCCHGHDCCYTRAEEAGCSPKTERYSWQ 100
111 CVNQSVLCGPAENKCQELLCKCDQEIANCLAQTEYNLKYLFYPQFLCEPD 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 CVNQSVLCGPAENKCQELLCKCDQEIANCLAQTEYNLKYLFYPQFLCEPD 150
161 SPKCD 165
    |||||
151 SPKCD 155
Sequence name: PA2X_HUMAN
Sequence documentation:
Alignment of: N93958_PEA_1_P2 x PA2X_HUMAN
Alignment segment 1/1:
Quality: 1089.00    Escore: 0
Matching length: 108    Total length: 108
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
 11 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 60
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 50
 61 AYMKYGCFCGLGGHGQPRDAIDWCCHGHDCCYTRAEEAGCSPKTERYSWQ 110
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AYMKYGCFCGLGGHGQPRDAIDWCCHGHDCCYTRAEEAGCSPKTERYSWQ 100
111 CVNQSVLC 118
    ||||||||
101 CVNQSVLC 108
Sequence name: PA2X_HUMAN
Sequence documentation:
Alignment of: N93958_PEA_1_P4 x PA2X_HUMAN
Alignment segment 1/1:
Quality: 706.00    Escore:
Matching length: 73    Total length: 73
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
 11 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 60
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 50
 61 AYMKYGCFCGLGGHGQPRDAIDW 83
    |||||||||||||||||||||||
 51 AYMKYGCFCGLGGHGQPRDAIDW 73
Sequence name: PA2X_HUMAN
Sequence documentation:
Alignment of: N93958_PEA_1_P5 x PA2X_HUMAN
Alignment segment 1/1:
Quality: 689.00    Escore: 0
Matching length: 72    Total length: 72
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
 11 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 60
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 50
 61 AYMKYGCFCGLGGHGQPRDAID 82
    ||||||||||||||||||||||
 51 AYMKYGCFCGLGGHGQPRDAID 72
Sequence name: PA2X_HUMAN
Sequence documentation:
Alignment of: N93958_PEA_1_P7 x PA2X_HUMAN
Alignment segment 1/1:
Quality: 706.00    Escore: 0
Matching length: 73    Total length: 73
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MLLLLLPSLLLLLLLPGPGSGEASRILRVHRRGILELAGTVGCVGPRTPI 50
 51 AYMKYGCFCGLGGHGQPRDAIDW 73
    |||||||||||||||||||||||
 51 AYMKYGCFCGLGGHGQPRDAIDW 73
Subsection K: Group XII secretory phospholipase A2 precursor
DESCRIPTION FOR CLUSTER Z24931
Cluster Z24931 features 4 transcript(s) and 12 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3.
Table 1 - Transcripts of interest
Table 2 - Segments of interest
Segment Name              SEQ ID NO
Z24931_PEA_1_node_1       1245
Z24931_PEA_1_node_16      1246
Z24931_PEA_1_node_17      1247
Z24931_PEA_1_node_18      1248
Z24931_PEA_1_node_0       1249
Z24931_PEA_1_node_2       1250
Z24931_PEA_1_node_4       1251
Z24931_PEA_1_node_8       1252
Z24931_PEA_1_node_9       1253
Z24931_PEA_1_node_10      1254
Z24931_PEA_1_node_13      1255
Z24931_PEA_1_node_14      1256
Table 3 - Proteins of interest
These sequences are variants of the known protein Group XII secretory phospholipase A2 precursor (SwissProt accession identifier PA2Y_HUMAN; known also according to the synonyms EC 3.1.1.4; Phosphatidylcholine 2-acylhydrolase GXII; GXII sPLA2), SEQ ID NO: 1257, referred to herein as the previously known protein.
Protein Group XII secretory phospholipase A2 precursor is known or believed to have the following function(s): PA2 catalyzes the calcium-dependent hydrolysis of the 2-acyl groups in 3-sn-phosphoglycerides. This protein is secreted and binds 1 calcium ion per subunit as a cofactor. Its reaction is: Phosphatidylcholine + H2O = 1-acylglycerophosphocholine + a carboxylate. This gene belongs to the Phospholipase A2 family and is abundantly expressed in heart, skeletal muscle, kidney, liver and pancreas.
The human group XII (hGXII) cDNA contains a putative signal peptide of 22 residues followed by a mature protein of 167 amino acids that displays homology to all known sPLA2s only over a short stretch of amino acids in the active site region. Northern blot and reverse transcription-polymerase chain reaction analyses show that the tissue distribution of hGXII is distinct from the other human sPLA2s, with strong expression in heart, skeletal muscle, kidney, and pancreas and weaker expression in brain, liver, small intestine, lung, placenta, ovaries, testis, and prostate. Catalytically active hGXII was produced in Escherichia coli and shown to be Ca2+-dependent despite the fact that it is predicted to have an unusual Ca2+-binding loop. Similar to the previously characterized mouse group IIE sPLA2s, the specific activity of hGXII is low in comparison to that of other mammalian sPLA2s, suggesting that hGXII could have novel functions that are independent of its phospholipase A2 activity.
The rationale for suggesting this gene as a marker candidate is that this exact mechanism is the basis for the PLAC™ test (by Diadexus), which uses another secreted phospholipase A2 (Lp-PLA2) as a marker for LDL-independent coronary heart disease risk evaluation. Like the group X PLA2, this gene is also expressed in heart. The variants of the present invention are useful for these diagnostic utilities.
The sequence for protein Group XII secretory phospholipase A2 precursor is given at the end of the application, as "Group XII secretory phospholipase A2 precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4.
Table 4 - Amino acid mutations for Known Protein
Protein Group XII secretory phospholipase A2 precursor localization is believed to be Secreted.
It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: 5-Lipoxygenase inhibitor; Leucotriene B4 antagonist; Phospholipase A2 inhibitor; Phospholipase inhibitor. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Anti-inflammatory; Analgesic, NSAID; Cardiovascular; Neuroprotective; Septic shock treatment; Antiarthritic.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: lipid catabolism, which are annotation(s) related to Biological Process; calcium binding; hydrolase, which are annotation(s) related to Molecular Function; and extracellular, which are annotation(s) related to Cellular Component.
The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster Z24931 features 4 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein Group XII secretory phospholipase A2 precursor. A description of each variant protein according to the present invention is now provided.
Variant protein Z24931_PEA_1_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z24931_PEA_1_T5. An alignment is given to the known protein (Group XII secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z24931_PEA_1_P3 and PA2Y_HUMAN:
1. An isolated chimeric polypeptide encoding for Z24931_PEA_1_P3, comprising a first amino acid sequence being at least 90 % homologous to MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLNAALDLLGGEDGLCQYKCSD corresponding to amino acids 1 - 69 of PA2Y_HUMAN, which also corresponds to amino acids 1 - 69 of Z24931_PEA_1_P3, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence A corresponding to amino acids 70 - 70 of Z24931_PEA_1_P3, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because of manual inspection of known protein localization and/or gene structure.
Variant protein Z24931_PEA_1_P3 is encoded by the following transcript(s): Z24931_PEA_1_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z24931_PEA_1_T5 is shown in bold; this coding portion starts at position 279 and ends at position 488. The transcript also has the following SNPs as listed in Table 5 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z24931_PEA_1_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 5 - Nucleic acid SNPs
Variant protein Z24931_PEA_1_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z24931_PEA_1_T6. An alignment is given to the known protein (Group XII secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z24931_PEA_1_P4 and PA2Y_HUMAN:
1. An isolated chimeric polypeptide encoding for Z24931_PEA_1_P4, comprising a first amino acid sequence being at least 90 % homologous to MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLNAALDLLGGEDGLCQYKCSDGS corresponding to amino acids 1 - 71 of PA2Y_HUMAN, which also corresponds to amino acids 1 - 71 of Z24931_PEA_1_P4, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LWLLC corresponding to amino acids 72 - 76 of Z24931_PEA_1_P4, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z24931_PEA_1_P4, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LWLLC in Z24931_PEA_1_P4.
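By way of non-limiting illustration only, the head-and-tail relationship recited above may be checked with a short script. The following Python sketch joins the 71-residue head and the 5-residue tail LWLLC quoted above, and scores a candidate tail against LWLLC. Ungapped percent identity is used here merely as a simple stand-in for the homology percentages recited above; the function names and the candidate tail LWLLS are hypothetical and form no part of the application.

```python
# Head (amino acids 1-71) and tail (amino acids 72-76) of Z24931_PEA_1_P4,
# copied from the comparison report above.
HEAD = ("MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN"
        "AALDLLGGEDGLCQYKCSDGS")
TAIL = "LWLLC"


def build_chimera(head: str, tail: str) -> str:
    """Join head and tail contiguously and in sequential order."""
    return head + tail


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences.

    Assumption: identity is used as a simple proxy for 'homologous';
    the application does not define the scoring method in this passage.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


variant = build_chimera(HEAD, TAIL)
candidate_tail = "LWLLS"  # hypothetical tail with one substitution
score = percent_identity(candidate_tail, TAIL)
print(len(variant), score, score >= 70.0)
```

Under these assumptions the joined sequence is 76 residues long, consistent with the tail occupying amino acids 72 - 76, and the hypothetical candidate would satisfy the 70% and 80% thresholds but not the 85% threshold.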
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
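By way of non-limiting illustration only, the localization reasoning above may be expressed as a simple decision rule. The following Python sketch assumes that each prediction program yields a yes/no call; the function name and the "undetermined" fallback are hypothetical, since the passage above states only the condition under which the protein is believed to be secreted.

```python
def predict_localization(signal_peptide_calls, transmembrane_calls):
    """Return 'secreted' when every signal-peptide program predicts a signal
    peptide and no transmembrane program predicts a trans-membrane region.

    Any other combination returns 'undetermined' (an assumed fallback; the
    text does not say how such cases are handled).
    """
    has_signal = bool(signal_peptide_calls) and all(signal_peptide_calls)
    has_tm = any(transmembrane_calls)
    if has_signal and not has_tm:
        return "secreted"
    return "undetermined"


# Two signal-peptide programs and two transmembrane programs, as in the text.
print(predict_localization([True, True], [False, False]))
print(predict_localization([True, False], [False, False]))
```

The first call reflects the situation described above; the second shows that disagreement between the signal-peptide programs would not meet the stated condition.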
Variant protein Z24931_PEA_1_P4 is encoded by the following transcript(s): Z24931_PEA_1_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z24931_PEA_1_T6 is shown in bold; this coding portion starts at position 279 and ends at position 506. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z24931_PEA_1_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 6 - Nucleic acid SNPs
Variant protein Z24931_PEA_1_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z24931_PEA_1_T10. An alignment is given to the known protein (Group XII secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z24931_PEA_1_P6 and PA2Y_HUMAN:
1. An isolated chimeric polypeptide encoding for Z24931_PEA_1_P6, comprising a first amino acid sequence being at least 90 % homologous to MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLNAALDLLGGEDGLCQYKCSDGSKPFPRYGYKPSPPNGCGSPLFGVH corresponding to amino acids 1 - 95 of PA2Y_HUMAN, which also corresponds to amino acids 1 - 95 of Z24931_PEA_1_P6, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VL corresponding to amino acids 96 - 97 of Z24931_PEA_1_P6, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z24931_PEA_1_P6 is encoded by the following transcript(s): Z24931_PEA_1_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z24931_PEA_1_T10 is shown in bold; this coding portion starts at position 279 and ends at position 569. The transcript also has the following SNPs as listed in Table 7 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z24931_PEA_1_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 7 - Nucleic acid SNPs
Variant protein Z24931_PEA_1_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) Z24931_PEA_1_T14. An alignment is given to the known protein (Group XII secretory phospholipase A2 precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between Z24931_PEA_1_P8 and PA2Y_HUMAN:
1. An isolated chimeric polypeptide encoding for Z24931_PEA_1_P8, comprising a first amino acid sequence being at least 90 % homologous to MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLNAALDLLGGEDGLCQYKC corresponding to amino acids 1 - 67 of PA2Y_HUMAN, which also corresponds to amino acids 1 - 67 of Z24931_PEA_1_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence TFPTLWL corresponding to amino acids 68 - 74 of Z24931_PEA_1_P8, wherein said first amino acid sequence and second amino acid sequence are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of Z24931_PEA_1_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence TFPTLWL in Z24931_PEA_1_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein Z24931_PEA_1_P8 is encoded by the following transcript(s): Z24931_PEA_1_T14, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript Z24931_PEA_1_T14 is shown in bold; this coding portion starts at position 279 and ends at position 500. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein Z24931_PEA_1_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention).
Table 8 - Nucleic acid SNPs
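By way of non-limiting illustration only, the coding-portion positions recited above for the four transcripts of cluster Z24931 may be cross-checked against the lengths of the variant proteins given in the comparison reports (70, 76, 97 and 74 amino acids). The following Python sketch assumes that the positions are 1-based and inclusive and that the stated coding portion does not include a stop codon; these are assumptions drawn from the arithmetic, not statements made in the application, and the names used are hypothetical.

```python
# (transcript, coding start, coding end, variant protein length in amino acids)
CODING_PORTIONS = [
    ("Z24931_PEA_1_T5", 279, 488, 70),   # encodes Z24931_PEA_1_P3
    ("Z24931_PEA_1_T6", 279, 506, 76),   # encodes Z24931_PEA_1_P4
    ("Z24931_PEA_1_T10", 279, 569, 97),  # encodes Z24931_PEA_1_P6
    ("Z24931_PEA_1_T14", 279, 500, 74),  # encodes Z24931_PEA_1_P8
]


def coding_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive coding portion."""
    return end - start + 1


def is_consistent(start: int, end: int, protein_len: int) -> bool:
    """True when the coding portion is a whole number of codons that
    equals the protein length (stop codon assumed not to be counted)."""
    codons, remainder = divmod(coding_length(start, end), 3)
    return remainder == 0 and codons == protein_len


for name, start, end, protein_len in CODING_PORTIONS:
    print(name, coding_length(start, end), is_consistent(start, end, protein_len))
```

Under these assumptions each coding portion (210, 228, 291 and 222 nucleotides) corresponds to exactly the number of codons expected from the length of the respective variant protein.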
As noted above, cluster Z24931 features 12 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster Z24931_PEA_1_node_1 according to the present invention is supported by 52 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T5, Z24931_PEA_1_T6, Z24931_PEA_1_T10 and Z24931_PEA_1_T14. Table 9 below describes the starting and ending position of this segment on each transcript.
Table 9 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_16 according to the present invention is supported by 118 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T5, Z24931_PEA_1_T6, Z24931_PEA_1_T10 and Z24931_PEA_1_T14. Table 10 below describes the starting and ending position of this segment on each transcript.
Table 10 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_17 according to the present invention is supported by 16 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T5, Z24931_PEA_1_T6, Z24931_PEA_1_T10 and Z24931_PEA_1_T14. Table 11 below describes the starting and ending position of this segment on each transcript.
Table 11 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_18 according to the present invention is supported by 135 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T5, Z24931_PEA_1_T6, Z24931_PEA_1_T10 and Z24931_PEA_1_T14. Table 12 below describes the starting and ending position of this segment on each transcript.
Table 12 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster Z24931_PEA_1_node_0 according to the present invention is supported by 24 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T5, Z24931_PEA_1_T6, Z24931_PEA_1_T10 and Z24931_PEA_1_T14. Table 13 below describes the starting and ending position of this segment on each transcript.
Table 13 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_2 according to the present invention can be found in the following transcript(s): Z24931_PEA_1_T5, Z24931_PEA_1_T6 and Z24931_PEA_1_T10. Table 14 below describes the starting and ending position of this segment on each transcript.
Table 14 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_4 according to the present invention is supported by 3 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T6. Table 15 below describes the starting and ending position of this segment on each transcript.
Table 15 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_8 according to the present invention can be found in the following transcript(s): Z24931_PEA_1_T6 and Z24931_PEA_1_T10. Table 16 below describes the starting and ending position of this segment on each transcript.
Table 16 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_9 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T6, Z24931_PEA_1_T10 and Z24931_PEA_1_T14. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_10 according to the present invention can be found in the following transcript(s): Z24931_PEA_1_T6, Z24931_PEA_1_T10 and Z24931_PEA_1_T14. Table 18 below describes the starting and ending position of this segment on each transcript.
Table 18 - Segment location on transcripts
Segment cluster Z24931_PEA_1_node_13 according to the present invention is supported by 59 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T5, Z24931_PEA_1_T6 and Z24931_PEA_1_T14. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Transcript name       Segment starting position    Segment ending position
Z24931_PEA_1_T5       487                          536
Z24931_PEA_1_T6       678                          727
Z24931_PEA_1_T14      550                          599
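By way of non-limiting illustration only, the positions given in Table 19 may be checked for internal consistency. The following Python sketch assumes that the starting and ending positions are 1-based and inclusive (an assumption, not a statement of the application) and verifies that the segment has the same length on every transcript and falls within the limit of about 120 bp stated for short segments. The names used are hypothetical.

```python
# Starting and ending positions of segment Z24931_PEA_1_node_13 (Table 19).
NODE_13_LOCATIONS = {
    "Z24931_PEA_1_T5": (487, 536),
    "Z24931_PEA_1_T6": (678, 727),
    "Z24931_PEA_1_T14": (550, 599),
}

SHORT_SEGMENT_LIMIT_BP = 120  # "up to about 120 bp" per the text


def segment_length(start: int, end: int) -> int:
    """Length of a segment given 1-based, inclusive positions."""
    return end - start + 1


lengths = {
    transcript: segment_length(start, end)
    for transcript, (start, end) in NODE_13_LOCATIONS.items()
}
same_length = len(set(lengths.values())) == 1
is_short = max(lengths.values()) <= SHORT_SEGMENT_LIMIT_BP
print(lengths, same_length, is_short)
```

Under the stated assumption the segment is 50 bp long on each of the three transcripts.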
Segment cluster Z24931_PEA_1_node_14 according to the present invention is supported by 65 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): Z24931_PEA_1_T5, Z24931_PEA_1_T6, Z24931_PEA_1_T10 and Z24931_PEA_1_T14. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: PA2Y_HUMAN
Sequence documentation:
Alignment of: Z24931_PEA_1_P3 x PA2Y_HUMAN
Alignment segment 1/1:
Quality: 662.00    Escore: 0
Matching length: 69    Total length: 69
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN 50
 51 AALDLLGGEDGLCQYKCSD 69
    |||||||||||||||||||
 51 AALDLLGGEDGLCQYKCSD 69
Sequence name: PA2Y_HUMAN
Sequence documentation:
Alignment of: Z24931_PEA_1_P4 x PA2Y_HUMAN
Alignment segment 1/1:
Quality: 680.00    Escore: 0
Matching length: 71    Total length: 71
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN 50
 51 AALDLLGGEDGLCQYKCSDGS 71
    |||||||||||||||||||||
 51 AALDLLGGEDGLCQYKCSDGS 71
Sequence name: PA2Y_HUMAN
Sequence documentation:
Alignment of: Z24931_PEA_1_P6 x PA2Y_HUMAN
Alignment segment 1/1:
Quality: 936.00    Escore: 0
Matching length: 95    Total length: 95
Matching Percent Similarity: 100.00    Matching Percent Identity: 100.00
Total Percent Similarity: 100.00    Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN 50
 51 AALDLLGGEDGLCQYKCSDGSKPFPRYGYKPSPPNGCGSPLFGVH 95
    |||||||||||||||||||||||||||||||||||||||||||||
 51 AALDLLGGEDGLCQYKCSDGSKPFPRYGYKPSPPNGCGSPLFGVH 95
Sequence name: PA2Y_HUMAN
Sequence documentation:
Alignment of: Z24931_PEA_1_P8 x PA2Y_HUMAN
Alignment segment 1/1:
Quality: 645.00    Escore: 0
Matching length: 68    Total length: 68
Matching Percent Similarity: 100.00    Matching Percent Identity: 98.53
Total Percent Similarity: 100.00    Total Percent Identity: 98.53
Gaps: 0
Alignment:
  1 MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN 50
 51 AALDLLGGEDGLCQYKCT 68
    |||||||||||||||||:
 51 AALDLLGGEDGLCQYKCS 68
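By way of non-limiting illustration only, the percent identity reported for the alignment of Z24931_PEA_1_P8 to PA2Y_HUMAN may be reproduced by counting identical positions. The following Python sketch uses the first 68 residues of each sequence as described in the comparison report (the 67-residue shared head followed by T in the variant and S in the known protein). It assumes an ungapped alignment, consistent with the reported gap count of 0; the function name is hypothetical and the alignment program's own scoring is not reproduced.

```python
# First 68 aligned residues of the variant and of the known protein.
SHARED_HEAD = ("MALLSRPALTLLLLLMAAVVRCQEQAQTTDWRATLKTIRNGVHKIDTYLN"
               "AALDLLGGEDGLCQYKC")
VARIANT_1_68 = SHARED_HEAD + "T"  # Z24931_PEA_1_P8, amino acids 1-68
KNOWN_1_68 = SHARED_HEAD + "S"    # PA2Y_HUMAN, amino acids 1-68


def identity_report(a: str, b: str):
    """Return (percent identity rounded to 2 places, 1-based mismatch positions)
    for two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    identity = 100.0 * (len(a) - len(mismatches)) / len(a)
    return round(identity, 2), mismatches


percent, mismatch_positions = identity_report(VARIANT_1_68, KNOWN_1_68)
print(percent, mismatch_positions)
```

With 67 identical positions out of 68, the computed value is 98.53, matching the figure reported above, and the single mismatch falls at position 68.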
Subsection L: NONSPECIFIC ALKALINE PHOSPHATASE The Alkaline Phosphatase family consists of a group of a few enzymes, all glycoproteins, encoded by four different gene loci: tissue-nonspecific, intestinal, placental, and germ-cell. The major isoenzyme, the nonspecific one, is expressed in bone liver and kidney. The different isoenzymes are found the serum and usually detected by enzymatic assay but also by ELISA assay when needed. The major variant is 485 amino acid (50kDa) and it is a homodimer bound to the cell membrane by a GPI anchor. Rise in serum levels of Alkaline phosphatase is a none-specific marker for a list of clinical situations including but not limited to: liver diseases (including infectious, malignant, autoimmune and more); cholestatic liver disorders; bone conditions characterized by rapid bone turnover including but not limited to: Paget's disease - excessive resorption of bone by osteoclasts, followed by the replacement of normal marrow by vascular, fibrous connective tissue; Osteomalacia; Rickets; cancer; bone changes occurring due to parathyroid disorders; and specific diseases including but not limited to hodgkin's disease, diabetes, hyperthyroidism and congestive heart failure; Tumors - benign, malignant (metastatic or not). When interpreting results of serum alkaline phosphatase levels, care should be taken to use the proper reference ranges, taking into account the age and sex ofthe patient. A normal total Alkaline phosphatase activity does not rule out the presence of an abnormal isoenzyme pattern, particularly in children. (Crit Rev Clin Lab Sci. 1994,31(3): 197-293). The isoenzyme produced by bone and liver is the one most used clinically. Changes in serum alkaline phosphatase that are of main diagnostic importance result from increased entry of enzyme into the circulation. 
This increased entry results from increased osteoblastic activity in bone disease, and from increased synthesis of alkaline phosphatase by hepatocytes in hepatobiliary disease. Though this enzyme is a single gene product, the liver and bone forms of alkaline phosphatase are differently glycosylated and can therefore be distinguished from one another (Clin Biochem. 1987 Aug;20(4):225-30). Special attention has been given to the detection of osteoporosis by Alkaline Phosphatase serum levels. Osteoporosis affects more than 10 million people in the USA alone, yet only 20% of them are diagnosed and treated. The major affected group is menopausal women. Non-imaging diagnostic tools to diagnose and monitor the disease and
response to treatment are sought after. Osteoporosis is characterized by bone resorption, and it has therefore been hypothesized that bone-specific Alkaline Phosphatase could be of diagnostic value. However, it was found to have only limited value. Kyd et al. showed that bone-specific Alkaline Phosphatase had value as a marker for suppression of bone turnover by alendronate (an anti-resorptive drug) but was not useful in the detection of osteoporosis, nor in the prediction of individual bone mineral density response to alendronate therapy (Ann Clin Biochem. 1998 Nov;35(Pt 6):717-25). Morote et al. concluded that serum bone-specific Alkaline Phosphatase should not be considered a good marker for the diagnosis of osteoporosis in men with prostate cancer under androgen ablation and cannot replace bone densitometry as a diagnostic tool (Int J Biol Markers. 2003 Oct-Dec;18(4):290-4). The present invention provides (bone/liver/kidney) alkaline phosphatase variants, which may optionally be used as diagnostic markers. Preferably these (bone/liver/kidney) alkaline phosphatase variants are useful as diagnostic markers for liver diseases including but not limited to infectious, malignant, degenerative, cholestatic and autoimmune diseases; bone conditions including but not limited to Paget's disease, Osteomalacia, Rickets, bone tumors, osteoporosis and bone changes occurring due to parathyroid disorders; tumors (either benign, malignant or metastatic) in general; and more specific diseases including but not limited to Hodgkin's disease, diabetes, hyperthyroidism and congestive heart failure. The variants of the present invention are also useful in that they may optionally be detected through a variant amino acid and/or nucleic acid sequence, alone or in combination with glycosylation pattern(s), as opposed to known markers which are distinguished by glycosylation patterns alone.
DESCRIPTION FOR CLUSTER HSAPHOL Cluster HSAPHOL features 7 transcript(s) and 18 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
HSAPHOL_T9 1268
Table 2 - Segments of interest
Table 3 - Proteins of interest
These sequences are variants of the known protein Alkaline phosphatase, tissue-nonspecific isozyme precursor (SwissProt accession identifier PPBT_HUMAN; known also according to the synonyms EC 3.1.3.1; AP-TNAP; Liver/bone/kidney isozyme; TNSALP), SEQ ID NO: 1287, referred to herein as the previously known protein. Protein Alkaline phosphatase, tissue-nonspecific isozyme precursor is known or believed to have the following function(s): this isozyme may play a role in skeletal mineralization. The sequence for protein Alkaline phosphatase, tissue-nonspecific isozyme precursor is given at the end of the application, as "Alkaline phosphatase, tissue-nonspecific
isozyme precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
Protein Alkaline phosphatase, tissue-nonspecific isozyme precursor is believed to be attached to the membrane by a GPI anchor.
The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: skeletal development; ossification; metabolism, which are annotation(s) related to Biological Process; magnesium binding; alkaline phosphatase; hydrolase, which are annotation(s) related to Molecular Function; and integral membrane protein, which is an annotation related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or LocusLink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>.
As noted above, cluster HSAPHOL features 7 transcript(s), which were listed in Table 1 above. These transcript(s) encode protein(s) which are variant(s) of protein Alkaline phosphatase, tissue-nonspecific isozyme precursor. A description of each variant protein according to the present invention is now provided. Variant protein HSAPHOL_P2 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T4.
An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSAPHOL_P2 and AAH21289 (SEQ ID NO: 1427): 1. An isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence PHSGPAAAFIRRRGWWPGPRCA corresponding to amino acids 1 - 22 of HSAPHOL_P2, a second amino acid sequence being at least 90 % homologous to PATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 27 of AAH21289, which also corresponds to amino acids 23 - 49 of HSAPHOL_P2, and a third amino acid sequence being at least 90 % homologous to EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILK GQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVG VSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRD WYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDE KARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDM QYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHE AVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDK KPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALA LYPLSVLF corresponding to amino acids 83 - 586 of AAH21289, which also corresponds to amino acids 50 - 553 of HSAPHOL_P2, wherein said first, second and third amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCA of HSAPHOL_P2. 3. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino
acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 49; and ending at any of amino acid numbers 50+ ((n-2) - x), in which x varies from 0 to n-2.
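By way of non-limiting illustration only, the edge-portion definition above can be read as the set of all windows of length n that contain both amino acid 49 and amino acid 50 of HSAPHOL_P2. The following Python sketch enumerates those windows. The function name edge_windows and the shortened demonstration sequence (amino acids 1 - 49 of HSAPHOL_P2 as quoted above, followed by only the first 12 residues of the third segment) are assumptions made for this example and are not part of the application.

```python
def edge_windows(sequence, junction, n):
    """Return (start, end, peptide) tuples for every length-n window that
    contains both residue `junction` and residue `junction + 1` (1-based)."""
    if n < 2:
        raise ValueError("n must be at least 2 to span the junction")
    windows = []
    for x in range(n - 1):                    # x varies from 0 to n-2
        start = junction - x                  # e.g. 49 - x
        end = (junction + 1) + ((n - 2) - x)  # e.g. 50 + ((n-2) - x)
        if start < 1 or end > len(sequence):
            continue                          # window would leave the sequence
        windows.append((start, end, sequence[start - 1:end]))
    return windows


# Demonstration data (assumption: truncated purely to keep the example short).
# Residues 1-49 of HSAPHOL_P2 as quoted in the text, then the first 12
# residues of the following segment.
head = "PHSGPAAAFIRRRGWWPGPRCA" + "PATPRPLSWLRAPTRLCLDGPSPVLCA"
tail_start = "EKEKDPKYWRDQ"
demo = head + tail_start

windows = edge_windows(demo, junction=49, n=10)
for start, end, peptide in windows:
    print(start, end, peptide)
```

With n = 10 the sketch yields nine windows, each ten residues long, and each containing the junction residues AE.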
Comparison report between HSAPHOL_P2 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P2, comprising a first amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence
PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA corresponding to amino acids 1 - 49 of HSAPHOL_P2, and a second amino acid sequence being at least 90 % homologous to EKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILK GQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVG VSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRD WYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDE KARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDM QYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHE AVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDK KPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALA LYPLSVLF corresponding to amino acids 21 - 524 of PPBT_HUMAN, which also corresponds to amino acids 50 - 553 of HSAPHOL_P2, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a head of HSAPHOL_P2, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence PHSGPAAAFIRRRGWWPGPRCAPATPRPLSWLRAPTRLCLDGPSPVLCA of HSAPHOL_P2. 3. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P2, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in
length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise AE, having a structure as follows: a sequence starting from any of amino acid numbers 49-x to 49; and ending at any of amino acid numbers 50+ ((n-2) - x), in which x varies from 0 to n-2.
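The thresholds recited above (at least 70%, 80%, 85%, 90% or 95% homologous) presuppose some measure of sequence similarity. As a simplified, non-limiting sketch, the following Python code computes plain percent identity between two ungapped sequences of equal length. This is an assumption made for illustration: the measure actually intended by the application may differ (for example, a score derived from a gapped alignment). The function names and the substituted peptide are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def meets_threshold(a, b, threshold):
    """True if the percent identity is at least `threshold` (e.g. 90)."""
    return percent_identity(a, b) >= threshold


# Head sequence of HSAPHOL_P2 as quoted in the text (amino acids 1-22).
reference = "PHSGPAAAFIRRRGWWPGPRCA"
# Hypothetical test peptide: the same sequence with two substitutions
# (first and last residue), invented only for this example.
candidate = "AHSGPAAAFIRRRGWWPGPRCG"

identity = percent_identity(reference, candidate)
print(round(identity, 1), meets_threshold(reference, candidate, 90))
```

Under this simplified measure, two substitutions in a 22-residue peptide leave 20 of 22 positions identical, which satisfies a 90% threshold but not a 95% threshold.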
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although it is a partial protein, both trans-membrane region prediction programs predict that this protein has a trans-membrane region, and similarity to known proteins suggests a GPI anchor. Variant protein HSAPHOL_P2 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
Variant protein HSAPHOL_P2 is encoded by the following transcript(s): HSAPHOL_T4, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T4 is shown in bold; this coding portion starts at position 1 and ends at position 1659. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P2 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Variant protein HSAPHOL_P3 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T5. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P3 and AAH21289: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVP
corresponding to amino acids 63 - 82 of AAH21289, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90 % homologous to
GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVN HATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKY MYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDP HNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRI DHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPR GNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSA VPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA SSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 123 - 586 of AAH21289, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21+ ((n-2) - x), in which x varies from 0 to n-2. Comparison report between HSAPHOL_P3 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P3, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVP corresponding to amino acids 1 - 20 of PPBT_HUMAN, which also corresponds to amino acids 1 - 20 of HSAPHOL_P3, and a second amino acid sequence being at least 90 % homologous to
GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTAT AYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVN HATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKY MYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDP HNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRI DHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPR
GNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSA VPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA SSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 61 - 524 of PPBT_HUMAN, which also corresponds to amino acids 21 - 484 of HSAPHOL_P3, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P3, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise PG, having a structure as follows: a sequence starting from any of amino acid numbers 20-x to 20; and ending at any of amino acid numbers 21+ ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure, and/or similarity to known proteins. Variant protein HSAPHOL_P3 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Variant protein HSAPHOL_P3 is encoded by the following transcript(s): HSAPHOL_T5, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T5 is shown in bold; this coding portion starts at position 253 and ends at position 1704. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P3 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
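As a non-limiting consistency check, the length of a coding portion in nucleotides, divided by three, can be compared with the number of amino acids of the encoded variant. The Python sketch below does this for the two coding portions quoted so far (HSAPHOL_T4 and HSAPHOL_T5), taking the protein lengths from the comparison reports above (553 and 484 amino acids). It assumes that the quoted end position is the last nucleotide of the last codon, i.e. that no stop codon is counted. That assumption is not stated in the text, although it agrees with the quoted figures. The function and variable names are illustrative only.

```python
def encoded_length(start, end):
    """Number of codons in a coding portion given 1-based inclusive positions."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("coding portion length is not a multiple of three")
    return length_nt // 3


# (start, end, protein length in amino acids), all as quoted in the text.
coding_portions = {
    "HSAPHOL_T4": (1, 1659, 553),    # encodes HSAPHOL_P2
    "HSAPHOL_T5": (253, 1704, 484),  # encodes HSAPHOL_P3
}

results = {
    name: encoded_length(start, end) == protein_length
    for name, (start, end, protein_length) in coding_portions.items()
}
print(results)
```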
Variant protein HSAPHOL_P4 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T6. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P4 and AAH21289: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90 % homologous to
MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATA YLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNH ATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYM YPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHN VDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDH GHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGN SIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVP LRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASS AGSLAAGPLLLALALYPLSVLF corresponding to amino acids 124 - 586 of AAH21289, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4.
Comparison report between HSAPHOL_P4 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P4, comprising a first amino acid sequence being at least 90 % homologous to
MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATA YLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTTRVNH ATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYM YPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLTLDPHN VDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDH GHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGN SIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVP LRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASS
AGSLAAGPLLLALALYPLSVLF corresponding to amino acids 62 - 524 of PPBT_HUMAN, which also corresponds to amino acids 1 - 463 of HSAPHOL_P4.
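Each comparison report pairs a range of amino acids in a previously published protein with a range in the variant. As a non-limiting illustration, the Python sketch below checks that the paired ranges quoted for HSAPHOL_P4 cover the same number of residues, and shows how a position in the published protein can be translated into the variant's numbering. The helper names span_length and map_position are hypothetical and are introduced only for this example. The mapping assumes an ungapped, one-to-one correspondence within each range.

```python
def span_length(first, last):
    """Number of residues in a 1-based inclusive range."""
    return last - first + 1


def map_position(ref_pos, ref_first, variant_first):
    """Translate a position in the published protein to the variant's numbering,
    assuming the two ranges correspond residue by residue without gaps."""
    return ref_pos - ref_first + variant_first


# (published protein, its range, corresponding HSAPHOL_P4 range), as quoted above.
reports = [
    ("AAH21289", (124, 586), (1, 463)),
    ("PPBT_HUMAN", (62, 524), (1, 463)),
]

consistent = {
    name: span_length(*ref_range) == span_length(*variant_range)
    for name, ref_range, variant_range in reports
}
print(consistent)

# The last residue of each published range maps to the last residue of the variant.
print(map_position(586, 124, 1), map_position(524, 62, 1))
```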
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because, although only one of the two trans-membrane region prediction programs (Tmpred: 1, Tmhmm: 0) has predicted that this protein has a trans-membrane region, similarity to known proteins suggests a GPI anchor. In addition, both signal-peptide prediction programs predict that this protein is a non-secreted protein. Variant protein HSAPHOL_P4 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Variant protein HSAPHOL_P4 is encoded by the following transcript(s): HSAPHOL_T6, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T6 is shown in bold; this coding portion starts at position 215 and ends at position 1603. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P4 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSAPHOL_P5 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T7. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P5 and AAH21289: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM
FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLT LDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEG GRIDHGHHEGKAKQALHEAVEM corresponding to amino acids 63 - 417 of AAH21289, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90 % homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSM VDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAY AACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 440 - 586 of AAH21289, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356+ ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between HSAPHOL_P5 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P5, comprising a first amino acid sequence being at least 90 % homologous to
MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLT LDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEG GRIDHGHHEGKAKQALHEAVEM corresponding to amino acids 1 - 355 of
PPBT_HUMAN, which also corresponds to amino acids 1 - 355 of HSAPHOL_P5, and a second amino acid sequence being at least 90 % homologous to DHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSM VDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAY AACIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 378 - 524 of PPBT_HUMAN, which also corresponds to amino acids 356 - 502 of HSAPHOL_P5, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P5, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise MD, having a structure as follows: a sequence starting from any of amino acid numbers 355-x to 355; and ending at any of amino acid numbers 356+ ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because of manual inspection of known protein localization and/or gene structure and/or similarity to known proteins. Variant protein HSAPHOL_P5 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein HSAPHOL_P5 is encoded by the following transcript(s): HSAPHOL_T7, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T7 is shown in bold; this coding portion starts at position 253 and ends at position 1758. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P5 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HSAPHOL_P6 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T8. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows: Comparison report between HSAPHOL_P6 and AAH21289: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLT LDPHNVDYLL corresponding to amino acids 63 - 349 of AAH21289, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90 % homologous to GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGG YTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQ AQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGH CAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 395 - 586 of AAH21289, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably
at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288+ ((n-2) - x), in which x varies from 0 to n-2.
Comparison report between HSAPHOL_P6 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P6, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLT LDPHNVDYLL corresponding to amino acids 1 - 287 of PPBT_HUMAN, which also corresponds to amino acids 1 - 287 of HSAPHOL_P6, and a second amino acid sequence being at least 90 % homologous to
GGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGG YTPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQ AQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGH CAPASSAGSLAAGPLLLALALYPLSVLF corresponding to amino acids 333 - 524 of PPBT_HUMAN, which also corresponds to amino acids 288 - 479 of HSAPHOL_P6, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated chimeric polypeptide encoding for an edge portion of HSAPHOL_P6, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise LG, having a structure as follows: a sequence starting from any of amino acid numbers 287-x to 287; and ending at any of amino acid numbers 288+ ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: membrane. The protein localization is believed to be membrane because both signal-peptide prediction programs predict that this protein has a signal peptide, and at least one of two trans-membrane region prediction programs predicts that this protein has a trans-membrane region; in addition, similarity to known proteins suggests a GPI anchor. Variant protein HSAPHOL_P6 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein HSAPHOL_P6 is encoded by the following transcript(s): HSAPHOL_T8, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T8 is shown in bold; this coding portion starts at position 253 and ends at position 1689. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P6 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein HSAPHOL_P7 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T9. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSAPHOL_P7 and AAH21289: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 63 - 326 of AAH21289, which also corresponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 265 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
Comparison report between HSAPHOL_P7 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPR corresponding to amino acids 1 - 262 of PPBT_HUMAN, which also corresponds to amino acids 1 - 262 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 263 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence YKLPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
Comparison report between HSAPHOL_P7 and 075090 (SEQ ID NO: 1428): 1. An isolated chimeric polypeptide encoding for HSAPHOL_P7, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYK corresponding to amino acids 1 - 264 of 075090, which also corresponds to amino acids 1 - 264 of HSAPHOL_P7, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP corresponding to amino acids 265 - 306 of HSAPHOL_P7, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL_P7, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence LPPRCPLANRVDFSWAGREYRLQTFSKPLIFLANVFLQTQRP in HSAPHOL_P7.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSAPHOL_P7 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
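The localization calls described for the variant proteins follow a simple rule over the outputs of two signal-peptide predictors and two trans-membrane region predictors. The sketch below illustrates that rule only; the function name `predict_localization` and the boolean inputs are hypothetical, the actual prediction programs are not modeled, and the "undetermined" fallback for combinations the text does not discuss is an assumption.

```python
from typing import Sequence


def predict_localization(
    signal_peptide_calls: Sequence[bool],
    transmembrane_calls: Sequence[bool],
) -> str:
    """Combine predictor outputs into a localization call.

    signal_peptide_calls: one boolean per signal-peptide prediction program.
    transmembrane_calls:  one boolean per trans-membrane prediction program.
    """
    has_signal_peptide = all(signal_peptide_calls)  # "both ... programs predict"
    has_tm_region = any(transmembrane_calls)        # "at least one of two ... predicts"

    if has_signal_peptide and has_tm_region:
        return "membrane"
    if has_signal_peptide and not has_tm_region:
        return "secreted"
    # Combinations not covered by the text (assumption of this sketch).
    return "undetermined"


if __name__ == "__main__":
    print(predict_localization([True, True], [False, True]))   # membrane-type call
    print(predict_localization([True, True], [False, False]))  # secreted-type call
```

Supporting evidence such as the GPI-anchor similarity mentioned for the membrane call is not part of this rule and would have to be handled separately.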
Variant protein HSAPHOL_P7 is encoded by the following transcript(s): HSAPHOL_T9, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T9 is shown in bold; this coding portion starts at position 253 and ends at position 1170. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P7 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
Variant protein HSAPHOL_P8 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSAPHOL_T10. An alignment is given to the known protein (Alkaline phosphatase, tissue-nonspecific isozyme precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSAPHOL_P8 and AAH21289: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLT LDPHNVDYLLG corresponding to amino acids 63 - 350 of AAH21289, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
Comparison report between HSAPHOL_P8 and PPBT_HUMAN: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLT LDPHNVDYLLG corresponding to amino acids 1 - 288 of PPBT_HUMAN, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
Comparison report between HSAPHOL_P8 and 075090: 1. An isolated chimeric polypeptide encoding for HSAPHOL_P8, comprising a first amino acid sequence being at least 90 % homologous to MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIM FLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGG RKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSHFIWNRTELLT LDPHNVDYLLG corresponding to amino acids 1 - 288 of 075090, which also corresponds to amino acids 1 - 288 of HSAPHOL_P8, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP corresponding to amino acids 289 - 316 of HSAPHOL_P8, wherein said first and second amino acid sequences are contiguous and in a sequential order. 2. An isolated polypeptide encoding for a tail of HSAPHOL_P8, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence KWRGWRGGCMARSLVAGAACGQHLGTRP in HSAPHOL_P8.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither trans-membrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSAPHOL_P8 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 17 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 17 - Amino acid mutations
Variant protein HSAPHOL_P8 is encoded by the following transcript(s): HSAPHOL_T10, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSAPHOL_T10 is shown in bold; this coding portion starts at position 253 and ends at position 1200. The transcript also has the following SNPs as listed in Table 18 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSAPHOL_P8 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 18 - Nucleic acid SNPs
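The coding-portion coordinates quoted for the three transcripts can be checked against the stated protein lengths with simple arithmetic. The following is a small sketch rather than part of the original disclosure: the helper name `codons_in_cds` is hypothetical, positions are assumed to be 1-based and inclusive, and the protein lengths (479, 306 and 316 residues) are read from the amino acid ranges given in the comparison reports.

```python
def codons_in_cds(start: int, end: int) -> int:
    """Return the number of codons in a coding portion given 1-based,
    inclusive nucleotide coordinates; reject lengths not divisible by 3."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"coding portion of {length} nt is not a whole number of codons")
    return length // 3


# (transcript, CDS start, CDS end, protein length taken from the comparison reports)
CODING_PORTIONS = [
    ("HSAPHOL_T8", 253, 1689, 479),   # encodes HSAPHOL_P6
    ("HSAPHOL_T9", 253, 1170, 306),   # encodes HSAPHOL_P7
    ("HSAPHOL_T10", 253, 1200, 316),  # encodes HSAPHOL_P8
]

if __name__ == "__main__":
    for name, start, end, protein_len in CODING_PORTIONS:
        codons = codons_in_cds(start, end)
        print(name, codons, "codons; matches protein length:", codons == protein_len)
```

Under these assumptions each coding portion contains exactly as many codons as the protein has residues, which suggests the quoted end positions do not include the stop codon.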
2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
Segment cluster HSAPHOL_node_11 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T5, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSAPHOL_T10        149                          313
HSAPHOL_T5         149                          313
HSAPHOL_T7         149                          313
HSAPHOL_T8         149                          313
HSAPHOL_T9         149                          313
Segment cluster HSAPHOL_node_13 according to the present invention is supported by 50 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 20 below describes the starting and ending position of this segment on each transcript. Table 20 - Segment location on transcripts
Segment cluster HSAPHOL_node_15 according to the present invention is supported by 1 library. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T6. Table 21 below describes the starting and ending position of this segment on each transcript. Table 21 - Segment location on transcripts
Segment cluster HSAPHOL_node_19 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 22 below describes the starting and ending position of this segment on each transcript. Table 22 - Segment location on transcripts
Segment cluster HSAPHOL_node_2 according to the present invention is supported by 33 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 23 below describes the starting and ending position of this segment on each transcript. Table 23 - Segment location on transcripts
Segment cluster HSAPHOL_node_21 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSAPHOL_T10        725                          900
HSAPHOL_T4         560                          735
HSAPHOL_T5         605                          780
HSAPHOL_T6         504                          679
HSAPHOL_T7         725                          900
HSAPHOL_T8         725                          900
HSAPHOL_T9         725                          900
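Because a segment is the same stretch of nucleic acid sequence wherever it occurs, its length should be identical on every transcript even though its start position shifts. The sketch below checks this for the Table 24 coordinates; it is an illustration only, the names `TABLE_24` and `segment_lengths` are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from typing import Dict, Tuple

# Start and end positions of segment HSAPHOL_node_21, as listed in Table 24.
TABLE_24: Dict[str, Tuple[int, int]] = {
    "HSAPHOL_T10": (725, 900),
    "HSAPHOL_T4": (560, 735),
    "HSAPHOL_T5": (605, 780),
    "HSAPHOL_T6": (504, 679),
    "HSAPHOL_T7": (725, 900),
    "HSAPHOL_T8": (725, 900),
    "HSAPHOL_T9": (725, 900),
}


def segment_lengths(locations: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Return the segment length on each transcript (inclusive coordinates)."""
    return {name: end - start + 1 for name, (start, end) in locations.items()}


if __name__ == "__main__":
    lengths = segment_lengths(TABLE_24)
    for name, length in lengths.items():
        print(name, length)
    print("consistent length:", len(set(lengths.values())) == 1)
```

The same check can be applied to any of the other segment location tables by substituting their coordinates; a mismatch in length would point to a transcription error in one of the positions.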
Segment cluster HSAPHOL_node_23 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 25 below describes the starting and ending position of this segment on each transcript. Table 25 - Segment location on transcripts
Segment cluster HSAPHOL_node_26 according to the present invention is supported by 2 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10. Table 26 below describes the starting and ending position of this segment on each transcript. Table 26 - Segment location on transcripts
Segment cluster HSAPHOL_node_28 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6 and HSAPHOL_T7. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSAPHOL_T4         950                          1084
HSAPHOL_T5         995                          1129
HSAPHOL_T6         894                          1028
HSAPHOL_T7         1115                         1249
Segment cluster HSAPHOL_node_38 according to the present invention is supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 28 below describes the starting and ending position of this segment on each transcript. Table 28 - Segment location on transcripts
Segment cluster HSAPHOL_node_40 according to the present invention is supported by 69 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 29 below describes the starting and ending position of this segment on each transcript. Table 29 - Segment location on transcripts
Segment cluster HSAPHOL_node_42 according to the present invention is supported by 99 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 30 below describes the starting and ending position of this segment on each transcript. Table 30 - Segment location on transcripts
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
Segment cluster HSAPHOL_node_16 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4,
HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7, HSAPHOL_T8 and HSAPHOL_T9. Table 31 below describes the starting and ending position of this segment on each transcript. Table 31 - Segment location on transcripts
Segment cluster HSAPHOL_node_25 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T10, HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSAPHOL_T10        1045                         1114
HSAPHOL_T4         880                          949
HSAPHOL_T5         925                          994
HSAPHOL_T6         824                          893
HSAPHOL_T7         1045                         1114
HSAPHOL_T8         1045                         1114
Segment cluster HSAPHOL_node_34 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 33 below describes the starting and ending position of this segment on each transcript. Table 33 - Segment location on transcripts
Segment cluster HSAPHOL_node_35 according to the present invention is supported by 51 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6 and HSAPHOL_T8. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Transcript name    Segment starting position    Segment ending position
HSAPHOL_T4         1156                         1221
HSAPHOL_T5         1201                         1266
HSAPHOL_T6         1100                         1165
HSAPHOL_T8         1186                         1251
Segment cluster HSAPHOL_node_36 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 35 below describes the starting and ending position of this segment on each transcript. Table 35 - Segment location on transcripts
Segment cluster HSAPHOL_node_41 according to the present invention is supported by 60 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSAPHOL_T4, HSAPHOL_T5, HSAPHOL_T6, HSAPHOL_T7 and HSAPHOL_T8. Table 36 below describes the starting and ending position of this segment on each transcript. Table 36 - Segment location on transcripts
Microarray (chip) data is also available for this gene as follows. As described above with regard to the cluster itself, various oligonucleotides were tested for being differentially expressed in various disease conditions, particularly cancer. The following oligonucleotides were found to hit this segment, shown in Table 37.
Table 37 - Oligonucleotides related to this gene
Oligonucleotide name
HSAPHOL_0_11_0        Ovarian cancer        Ovary
Variant protein alignment to the previously known protein:
Sequence name: /tmp/rTOip70HMr/xEFXPsrVLD: PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P2 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 4926.00
Escore: 0
Matching length: 507
Total length: 507
Matching Percent Similarity: 99.61
Matching Percent Identity: 99.41
Total Percent Similarity: 99.61
Total Percent Identity: 99.41
Gaps: 0
Alignment:
 47 LCAEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTV 96
    |  |||||||||||||||||||||||||||||||||||||||||||||||
 18 LVPEKEKDPKYWRDQAQETLKYALELQKLNTNVAKNVIMFLGDGMGVSTV 67

 97 TAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAY 146
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 68 TAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSAGTATAY 117

147 LCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT 196
    ||||||||||||||||||||||||||||||||||||||||||||||||||
118 LCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSVGIVTTT 167

197 RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDV 246
    ||||||||||||||||||||||||||||||||||||||||||||||||||
168 RVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHNIRDIDV 217

247 IMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRYKHSH 296
    |||||||||||||||||||||||||||||||||||||||||||||:||||
218 IMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKPRHKHSH 267

297 FIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAI 346
    ||||||||||||||||||||||||||||||||||||||||||||||||||
268 FIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSEMVVVAI 317

347 QILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTS 396
    ||||||||||||||||||||||||||||||||||||||||||||||||||
318 QILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQAGSLTS 367

397 SEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGN 446
    ||||||||||||||||||||||||||||||||||||||||||||||||||
368 SEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFTAILYGN 417

447 GPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPM 496
    ||||||||||||||||||||||||||||||||||||||||||||||||||
418 GPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAVFSKGPM 467

497 AHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALAL 546
    ||||||||||||||||||||||||||||||||||||||||||||||||||
468 AHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPLLLALAL 517

547 YPLSVLF 553
    |||||||
518 YPLSVLF 524
Sequence name: /tmp/rTOip70HMr/xEFXPsrVLD: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P2 x AAH21289
Alignment segment 1/1:
Quality: 5108.00
Escore: 0
Matching length: 531
Total length: 586
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 90.61
Total Percent Identity: 90.61
Gaps: 1
Alignment : 23 PATPRPLSWLRAPTRLCLDGPSPVLCA 49 I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 PATPRPLS LRAPTRLCLDGPSPVLCAGLEHQLTSDHCQPTPSHPRRLHL 50 50 EKEKDPKYWRDQAQETLK 67 I I I I I I I I I I I I I I I I I I 51 APGIKQVLGCTMISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLK
100 68 YALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLE 117 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 101 YALELQKLNTNVAKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLE
150 118 MDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSR
167
151 MDKFPFVALSKTYNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSR 200 168 CNTTQGNEVTSILR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYS 217 I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 201 CNTTQGNEVTSILR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYS
250 . . . . . 218 DNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYES
267 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 251 DNEMPPEALSQGCKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYES 300 268 DEKARGTRLDGLDLVDT KSFKPRYKHSHFI NRTELLTLDPHNVDYLLG 317 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 301 DEKARGTRLDGLDLVDTWKSFKPRYKHSHFI NRTELLTLDPHNVDYLLG 350 318 LFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDH 367 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 351 LFEPGDMQYELNRNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDH 400 368 GHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGY 417 I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 401 GHHEGKAKQALHEAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGY 450 418 TPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAH 467 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 451 TPRGNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAH
500 . . . . . 468 NNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAA
517 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 501 NNYQAQSAVPLRHETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAA 550 518 CIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF 553 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 551 CIGANLGHCAPASSAGSLAAGPLLLALALYPLSVLF 586
Sequence name: /tmp/pYLJnulFqm/UcqrrsA3UA: PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P3 x PPBT_HUMAN
Alignment segment 1/1: Quality: 4615.00
Escore: 0
Matching length: 484
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 99.79
Total Percent Similarity: 92.37
Total Percent Identity: 92.18
Gaps: 1
Alignment:
  1 MISPFLVLAIGTCLTNSLVP.............................. 20
    ||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 21 ..........GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 60
              ||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

 61 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 110
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

111 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

161 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 210
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

211 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 260
    ||||||||||||:|||||||||||||||||||||||||||||||||||||
251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300

261 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 310
    ||||||||||||||||||||||||||||||||||||||||||||||||||
301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350

311 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 360
    ||||||||||||||||||||||||||||||||||||||||||||||||||
351 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 400

361 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 410
    ||||||||||||||||||||||||||||||||||||||||||||||||||
401 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 450

411 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 460
    ||||||||||||||||||||||||||||||||||||||||||||||||||
451 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 500

461 SSAGSLAAGPLLLALALYPLSVLF 484
    ||||||||||||||||||||||||
501 SSAGSLAAGPLLLALALYPLSVLF 524
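The summary statistics reported for the HSAPHOL_P3 x PPBT_HUMAN alignment can be reproduced from three counts. The sketch below is an illustration under stated assumptions, not the alignment program's own method: the function name `alignment_percentages` is hypothetical, the identical-position count of 483 is inferred from the single conservative mismatch (Y versus H, marked ":") in the alignment, and "total length" is taken to mean matching length plus gapped positions.

```python
from typing import Dict


def alignment_percentages(
    identical: int,
    similar: int,
    matching_length: int,
    total_length: int,
) -> Dict[str, float]:
    """Compute percent identity/similarity over the matched region and over
    the full alignment (matched region plus gaps), rounded to two decimals.

    `similar` counts identical positions plus conservative substitutions.
    """
    def pct(part: int, whole: int) -> float:
        return round(100.0 * part / whole, 2)

    return {
        "matching_percent_similarity": pct(similar, matching_length),
        "matching_percent_identity": pct(identical, matching_length),
        "total_percent_similarity": pct(similar, total_length),
        "total_percent_identity": pct(identical, total_length),
    }


if __name__ == "__main__":
    # 484 aligned positions, 1 conservative mismatch, 40 gapped positions.
    stats = alignment_percentages(
        identical=483, similar=484, matching_length=484, total_length=524
    )
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

With these inputs the computed values agree with the figures reported for this alignment (100.00, 99.79, 92.37 and 92.18).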
Sequence name: /tmp/pYLJnulFqm/UcqrrsA3UA: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P3 x AAH21289
Alignment segment 1/1:
Quality: 4626.00
Escore: 0
Matching length: 484
Total length: 524
Matching Percent Similarity: 100.00
Matching Percent Identity: 100.00
Total Percent Similarity: 92.37
Total Percent Identity: 92.37
Gaps: 1
Alignment:
  1 MISPFLVLAIGTCLTNSLVP.............................. 20
    ||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 21 ..........GMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 60
              ||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

 61 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 110
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

111 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 160
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

161 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 210
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

211 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 260
    ||||||||||||||||||||||||||||||||||||||||||||||||||
313 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 362

261 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 310
    ||||||||||||||||||||||||||||||||||||||||||||||||||
363 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 412

311 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 360
    ||||||||||||||||||||||||||||||||||||||||||||||||||
413 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 462

361 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 410
    ||||||||||||||||||||||||||||||||||||||||||||||||||
463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 512

411 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 460
    ||||||||||||||||||||||||||||||||||||||||||||||||||
513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562

461 SSAGSLAAGPLLLALALYPLSVLF 484
    ||||||||||||||||||||||||
563 SSAGSLAAGPLLLALALYPLSVLF 586
Sequence name: /tmp/iYbOicGuUc/lM HKKvSld: PPBT_HUMAN Sequence documentation:
Alignment of: HSAPHOL_P4 x PPBT_HUMAN
Alignment segment 1/1: Quality: 4517.00
Escore: 0 Matching length: 463 Total length: 463 Matching Percent Similarity: 100.00 Matching Percent
Identity: 99.78 Total Percent Similarity: 100.00 Total Percent
Identity: 99.78 Gaps : 0
Alignment:
1 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 62 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 111 51 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSV 100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 112 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSV 161 101 GIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQGCKDIAYQLMHN 150 I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 162 GIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHN 211 151 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDT KSFKP 200 I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 212 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKP 261 201 RYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 250 I : I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I 262 RHKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 311 . . . . . 251 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 300 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 312 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 361 301 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 350 I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 362 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 411 351 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV ' 400 I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 412 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 461 401 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 450 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
462 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL
511 451 LLALALYPLSVLF
463 512 LLALALYPLSVLF
524
Sequence name: /tmp/iYbOicGuUc/lM HKKvSld: AAH21289 Sequence documentation: Alignment of: HSAPHOL_P4 x AAH21289 Alignment segment 1/1: Quality: 4528.00
Escore: 0 Matching length: 463 Total length: 463 Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00 Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps : 0
Alignment: 1 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 124 MGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKTYNTNAQVPDSA
173 51 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILRWAKDAGKSV
100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 174 GTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSILR AKDAGKSV
223 101 GIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQGCKDIAYQLMHN
150 I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 224 GIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQGCKDIAYQLMHN 273
151 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDTWKSFKP 200 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 27 IRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGLDLVDT KSFKP 323 201 RYKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 324 RYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELNRNNVTDPSLSE 373 251 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 300 I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 374 MVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALHEAVEMDRAIGQ 423 301 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 350 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 424 AGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAPMLSDTDKKPFT 473 351 AILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV 400 I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II II I I 474 AILYGNGPGYKWGGERENVSMVDYAHNNYQAQSAVPLRHETHGGEDVAV
523 . . . . . 401 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL
450 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 524 FSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPASSAGSLAAGPL 573 451 LLALALYPLSVLF 463 574 LLALALYPLSVLF 586
Sequence name: /tmp/v0YiupJ4xl/W6HH5Tm6Ym: PPBT_HUMAN Sequence documentation:
Alignment of: HSAPHOL_P5 x PPBT_HUMAN
Alignment segment 1/1: Quality: 4816.00
Escore: 0 Matching length: 502 Total length: 524 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.80 Total Percent Similarity: 95.80 Total Percent Identity: 95.61 Gaps : 1 Alignment: 1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 50 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT
100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100 101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI
150 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150 151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200 I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200 201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250 251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 300 I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 251 DLVDT KSFKPRHKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELN 300 . . . . .
301 RNNVTDPSLSEMVWAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 301 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 350 351 EAVEM DHSHVFTFGGYTPRGNSIFGLAP
378 I I I I I I I I I I I I I I I I I I I I I I I I I I I I 351 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 400 379 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 428 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 401 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 450 429 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 47! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 451 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA
500 479 SSAGSLAAGPLLLALALYPLSVLF 502 I I I I I I I I I I I I I I I I I I I I I I I I 501 SSAGSLAAGPLLLALALYPLSVLF
524
Sequence name: /tmp/v0YiupJ4xl/W6HH5Tm6Ym: AAH21289
Sequence documentation: Alignment of: HSAPHOL_P5 x AAH21289
Alignment segment 1/1: Quality: 4827.00 Escore: 0 Matching length: 502 Total length: 524 Matching Percent Similarity: 100.00 Matching Percent
Identity: 100.00 Total Percent Similarity: 95.80 Total Percent
Identity: 95.80
Gaps : 1
Alignment: 1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 63 MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 112 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT
100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT
162 . . . . . 101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI
150 I I I I I I I I I I I I I I I I I I.I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212 151 LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 213 LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG 262 201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 12 251 DLVDT KSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 00 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 313 DLVDT KSFKPRYKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELN 62 301 RNNVTDPSLSEMWVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 363 RNNVTDPSLSEMWVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 12 . . . . . 351 EAVEM DHSHVFTFGGYTPRGNSIFGLAP 78 I I I I I I I I I I I I I I I I I I I I I I I I I I I I 413 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 62
379 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR
428 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR
512 429 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA
478 I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562 479 SSAGSLAAGPLLLALALYPLSVLF
502 I I I I II I I I I I I I I I I I I I I I I I I 563 SSAGSLAAGPLLLALALYPLSVLF
586
Sequence name: /tmp/LlylqOddii/lFFtdNNCUx : PPBT_HUMAN
Sequence documentation: Alignment of: HSAPHOL_P6 x PPBT_HUMAN Alignment segment 1/1: Quality: 4575.00
Escore: 0 Matching length: 479 Total length: 524 Matching Percent Similarity: 100.00 Matching Percent Identity: 99.79 Total Percent Similarity: 91.41 Total Percent Identity: 91.22 Gaps : 1
Alignment : 1 MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 50 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT
YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI
LR AKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG . . . . . CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL
DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLL I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I DLVDTWKSFKPRHKHSHFI NRTELLTLDPHNVDYLLGLFEPGDMQYELN GGRIDHGHHEGKAKQALH I I I I I I I I I I I I I I I I I I RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH
EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I EAVEMDRAIGQAGSLTSSEDTLTWTADHSHVFTFGGYTPRGNSIFGLAP
MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR . . . . . HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA
456 SSAGSLAAGPLLLALALYPLSVLF 479 I I I I I I I I I I I I I I I I I I I I I I I I 501 SSAGSLAAGPLLLALALYPLSVLF 524
Sequence name: /tmp/LlylqOddii/lFFtdNNCUx: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P6 x AAH21289
Alignment segment 1/1: Quality: 4586.00
Escore: 0 Matching length: 479 Total length: 524 Matching Percent Similarity: 100.00 Matching Percent
Identity: 100.00 Total Percent Similarity: 91.41 Total Percent
Identity: 91.41 Gaps : 1 Alignment: 1 MISPFLVLAIGTCLTNSLVPEKEKDPKY RDQAQETLKYALELQKLNTNV 50 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT
100 I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162 101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212 151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRD YSDNEMPPEALSQG 262 201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312 251 DLVDT KSFKPRYKHSHFIWNRTELLTLDPHNVDYLL
287 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 313 DLVDT KSFKPRYKHSHFIWNRTELLTLDPHNVDYLLGLFEPGDMQYELN 362 . . . . . 288 GGRIDHGHHEGKAKQALH
305 I I I I I I I I I I I I I I I I I I 363 RNNVTDPSLSEMVVVAIQILRKNPKGFFLLVEGGRIDHGHHEGKAKQALH 412 306 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 355 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 413 EAVEMDRAIGQAGSLTSSEDTLTVVTADHSHVFTFGGYTPRGNSIFGLAP 462 356 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 405 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 463 MLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR 512 406 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 455 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 513 HETHGGEDVAVFSKGPMAHLLHGVHEQNYVPHVMAYAACIGANLGHCAPA 562 456 SSAGSLAAGPLLLALALYPLSVLF 479 I I I I I I I I I I I I I I I I I I I I I I I I 563 SSAGSLAAGPLLLALALYPLSVLF 586
Sequence name: /tmp/K05Xam2Hdo/CV0GTdjKcW: PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P7 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 2574.00 Escore: 0 Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.62
Total Percent Similarity: 100.00 Total Percent Identity: 99.62 Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYK 264
    ||||||||||||:|
251 DLVDTWKSFKPRHK 264
Sequence name: /tmp/K05Xam2Hdo/CV0GTdjKcW: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P7 x AAH21289
Alignment segment 1/1:
Quality: 2585.00 Escore: 0 Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYK 264
    ||||||||||||||
313 DLVDTWKSFKPRYK 326
Sequence name: /tmp/K05Xam2Hdo/CV0GTdjKcW: O75090
Sequence documentation:
Alignment of: HSAPHOL_P7 x O75090
Alignment segment 1/1:
Quality: 2585.00 Escore: 0 Matching length: 264 Total length: 264
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYK 264
    ||||||||||||||
251 DLVDTWKSFKPRYK 264
Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll: PPBT_HUMAN
Sequence documentation:
Alignment of: HSAPHOL_P8 x PPBT_HUMAN
Alignment segment 1/1:
Quality: 2819.00 Escore: 0 Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 99.65
Total Percent Similarity: 100.00 Total Percent Identity: 99.65 Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||:|||||||||||||||||||||||||
251 DLVDTWKSFKPRHKHSHFIWNRTELLTLDPHNVDYLLG 288
Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll: AAH21289
Sequence documentation:
Alignment of: HSAPHOL_P8 x AAH21289
Alignment segment 1/1:
Quality: 2830.00 Escore: 0 Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 63 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 112

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
113 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 162

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
163 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 212

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
213 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 262

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
263 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 312

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||||||||||||||||||||||||||||
313 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 350
Sequence name: /tmp/H6G7vkGMmy/rSljwUOCll: O75090
Sequence documentation:
Alignment of: HSAPHOL_P8 x O75090
Alignment segment 1/1:
Quality: 2830.00 Escore: 0 Matching length: 288 Total length: 288
Matching Percent Similarity: 100.00 Matching Percent Identity: 100.00
Total Percent Similarity: 100.00 Total Percent Identity: 100.00 Gaps: 0
Alignment:
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV 50

 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT 100

101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI 150

151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200
    ||||||||||||||||||||||||||||||||||||||||||||||||||
151 LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG 200

201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250
    ||||||||||||||||||||||||||||||||||||||||||||||||||
201 CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL 250

251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
    ||||||||||||||||||||||||||||||||||||||
251 DLVDTWKSFKPRYKHSHFIWNRTELLTLDPHNVDYLLG 288
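The identity figures reported with the alignments above appear consistent with a simple definition: matching percent identity is the share of identical residues over the aligned (matching) length, and total percent identity is the same count of identical residues divided by the total length. The following Python sketch illustrates this under that assumption; the function names are hypothetical, and the sequences are the HSAPHOL_P7 and PPBT_HUMAN residues 1-264 as listed in the alignment above.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of identical residues in a gap-free, equal-length alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


def total_percent_identity(identical: int, total_length: int) -> float:
    """Identical residues as a percentage of the total (longer) length."""
    return 100.0 * identical / total_length


# Residues 1-250 are shared by HSAPHOL_P7 and PPBT_HUMAN in the listing above.
shared = (
    "MISPFLVLAIGTCLTNSLVPEKEKDPKYWRDQAQETLKYALELQKLNTNV"
    "AKNVIMFLGDGMGVSTVTAARILKGQLHHNPGEETRLEMDKFPFVALSKT"
    "YNTNAQVPDSAGTATAYLCGVKANEGTVGVSAATERSRCNTTQGNEVTSI"
    "LRWAKDAGKSVGIVTTTRVNHATPSAAYAHSADRDWYSDNEMPPEALSQG"
    "CKDIAYQLMHNIRDIDVIMGGGRKYMYPKNKTDVEYESDEKARGTRLDGL"
)
p7 = shared + "DLVDTWKSFKPRYK"    # HSAPHOL_P7, residues 251-264
ppbt = shared + "DLVDTWKSFKPRHK"  # PPBT_HUMAN, residues 251-264 (Y/H at 263)

p7_identity = percent_identity(p7, ppbt)

# HSAPHOL_P5 x PPBT_HUMAN: 502 aligned residues with one mismatch, total 524.
p5_total_identity = total_percent_identity(501, 524)
```

Rounded to two decimals, these give 99.62 and 95.61, the values listed for HSAPHOL_P7 x PPBT_HUMAN and HSAPHOL_P5 x PPBT_HUMAN respectively.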
A schematic presentation of the wild type and new variants of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) HSAPHOL mRNA and protein structures is shown in Figure 27. Orange boxes represent exons. Arrows represent introns. Yellow boxes indicate the amino acid coding regions. Green boxes represent the unique amino acids encoded by the new variants; the number of unique amino acids in each variant is indicated within each box. The known mRNA and protein are indicated by "WT". The new variants are marked as T10, T4, T6, T5, and T8, respectively. The location of the GPI-anchor and the location of the CGEN-oligo are indicated.
Expression of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) HSAPHOL transcripts which are detectable by amplicon as depicted in sequence name HSAPHOL junc2-13 in different normal tissues
Expression of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) transcripts detectable by or according to HSAPHOL junc2-13 amplicon (SEQ ID NO: 1400) and primers HSAPHOL junc2-13F (SEQ ID NO: 1401) and HSAPHOL junc2-13R (SEQ ID NO: 1402) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the liver samples (Sample Nos. 47-49 above), to obtain a value of expression for each sample relative to the median of the liver samples. These data are plotted in Figures 28 and 29, in 2 different scales.
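The normalization described above can be sketched in Python as follows. This is a minimal illustration only: the function name, the sample identifiers and all quantities are hypothetical and are not taken from the experiments reported here.

```python
from statistics import geometric_mean, median


def relative_expression(target, housekeeping, reference_samples):
    """Normalize amplicon quantities and express them relative to a reference group.

    target: {sample_id: quantity of the amplicon of interest}
    housekeeping: {sample_id: [quantities of the housekeeping genes]}
    reference_samples: ids of the samples whose median is set to 1.0
    """
    # Step 1: divide each quantity by the geometric mean of the housekeeping genes.
    normalized = {
        sample: quantity / geometric_mean(housekeeping[sample])
        for sample, quantity in target.items()
    }
    # Step 2: divide by the median normalized quantity of the reference samples.
    reference_median = median(normalized[s] for s in reference_samples)
    return {sample: value / reference_median for sample, value in normalized.items()}


# Hypothetical quantities for three liver samples and one other tissue.
target = {"liver_1": 4.0, "liver_2": 8.0, "liver_3": 16.0, "kidney_1": 32.0}
housekeeping = {
    "liver_1": [2.0, 2.0, 2.0, 2.0],
    "liver_2": [2.0, 2.0, 2.0, 2.0],
    "liver_3": [2.0, 2.0, 2.0, 2.0],
    "kidney_1": [1.0, 4.0, 1.0, 4.0],
}
relative = relative_expression(
    target, housekeeping, ["liver_1", "liver_2", "liver_3"]
)
```

The geometric mean is used rather than the arithmetic mean so that a single highly expressed housekeeping gene does not dominate the normalization factor.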
HSAPHOL junc2-13F (SEQ ID NO: 1401): GACCCTCGCCAGTGCTCTG
HSAPHOL junc2-13R (SEQ ID NO: 1402): GGTGTTGAGCTTCTGAAGCTCC
Amplicon (SEQ ID NO: 1400):
GACCCTCGCCAGTGCTCTGCGCAGAGAAAGAGAAAGACCCCAAGTACTGGCGA
GACCAAGCGCAAGAGACACTGAAATATGCCCTGGAGCTTCAGAAGCTCAACAC
C
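As a consistency check on the sequences listed above, the amplicon should begin with the forward primer and end with the reverse complement of the reverse primer. The short Python sketch below performs this check; the helper names are hypothetical, and the sequences are copied from SEQ ID NOs: 1400-1402 as given above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(
        reverse_complement(reverse)
    )


forward = "GACCCTCGCCAGTGCTCTG"     # HSAPHOL junc2-13F
reverse = "GGTGTTGAGCTTCTGAAGCTCC"  # HSAPHOL junc2-13R
amplicon = (
    "GACCCTCGCCAGTGCTCTGCGCAGAGAAAGAGAAAGACCCCAAGTACTGGCGA"
    "GACCAAGCGCAAGAGACACTGAAATATGCCCTGGAGCTTCAGAAGCTCAACAC"
    "C"
)
flanked = primers_flank_amplicon(forward, reverse, amplicon)
```

The same check can be applied to the other primer pairs and amplicons in this section.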
Expression of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) HSAPHOL transcripts which are detectable by amplicon as depicted in sequence name HSAPHOL seg26F2R2 in different normal tissues
Expression of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) transcripts detectable by or according to HSAPHOL seg26F2R2 amplicon (SEQ ID NO: 1403) and primers HSAPHOL seg26F2 (SEQ ID NO: 1404) and HSAPHOL seg26R2 (SEQ ID NO: 1405) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the liver samples (Sample Nos. 47-49 above), to obtain a value of expression for each sample relative to the median of the liver samples. These data are plotted in Figure 30.
HSAPHOL seg26F2 (SEQ ID NO: 1404): GCACAAGTGACAGCGGTACG
HSAPHOL seg26R2 (SEQ ID NO: 1405): GAGCTGACTCCAGGTCCCAG
Amplicon (SEQ ID NO: 1403):
GCACAAGTGACAGCGGTACGGCCCAGGCAAGTTTGAGCCCTGGCTGGGAACTG
GGACTTAACAGCTCCTGGGCTATGGAGCCTGGGACCTGGAGTCAGCTC
Expression of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) HSAPHOL transcripts which are detectable by amplicon as depicted in sequence name HSAPHOL seg38 in different normal tissues
Expression of Homo sapiens alkaline phosphatase, liver/bone/kidney (ALPL) transcripts detectable by or according to HSAPHOL seg38 amplicon (SEQ ID NO: 1406) and primers HSAPHOL seg38F (SEQ ID NO: 1407) and HSAPHOL seg38R (SEQ ID NO: 1408) was measured by real time PCR. These transcripts relate to the sequence of the known (WT or wild type) protein. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the liver samples (Sample Nos. 47-49 above), to obtain a value of expression for each sample relative to the median of the liver samples. These data are plotted in Figure 31.
Forward primer HSAPHOL seg38F (SEQ ID NO: 1407): TGGCCCCCATGCTGAGT
Reverse primer HSAPHOL seg38R (SEQ ID NO: 1408): CGTTCACCGCCCACCA
Amplicon (SEQ ID NO: 1406):
TGGCCCCCATGCTGAGTGACACAGACAAGAAGCCCTTCACTGCCATCCTGTATG
GCAATGGGCCTGGCTACAAGGTGGTGGGCGGTGAACG
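Because the seg38 amplicon relates to the sequence of the known (WT) protein, one of its three forward reading frames should translate to a peptide found in that protein. The Python sketch below tests this using the standard genetic code; the helper names are hypothetical, and the WT fragment is copied from the PPBT_HUMAN rows of the alignments in this section (residues following ...RGNSIFGLAP).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int) -> str:
    """Translate complete codons of dna starting at offset frame (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


amplicon = (
    "TGGCCCCCATGCTGAGTGACACAGACAAGAAGCCCTTCACTGCCATCCTGTATG"
    "GCAATGGGCCTGGCTACAAGGTGGTGGGCGGTGAACG"
)
# Fragment of the known protein, taken from the alignment listings above.
wt_fragment = "GNSIFGLAPMLSDTDKKPFTAILYGNGPGYKVVGGERENVSMVDYAHNNYQAQSAVPLR"

# Frames whose translation occurs verbatim in the WT fragment.
in_frame = [frame for frame in range(3) if translate(amplicon, frame) in wt_fragment]
peptide = translate(amplicon, 2)
```

Under these assumptions only the third frame (offset 2) matches, giving the peptide APMLSDTDKKPFTAILYGNGPGYKVVGGE; the other two frames run into stop codons.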
Subsection M: CRP variants C-reactive protein (CRP, named for its capacity to precipitate the somatic C- polysaccharide of Streptococcus pneumoniae) is a member of the acute-phase reactant proteins and its serum level rises in response to inflammation, infection, and tissue damage.
Therefore it is a non-specific marker for a long list of diseases, yet it makes a powerful
contribution to the management and monitoring of these diseases. CRP reflects ongoing inflammation and/or tissue damage much more accurately than do other laboratory parameters of the acute-phase response (J. Clin. Invest. 111:1805-1812). The clinical conditions in which CRP serum level is used for diagnostic purposes are:
1. Screening test for an organic disease
2. Assessment of disease activity in inflammatory conditions (Juvenile rheumatoid arthritis, Rheumatoid arthritis, Ankylosing spondylitis, Reiter disease, Psoriatic arthropathy, Vasculitides Behcet syndrome, Wegener granulomatosis, Polyarteritis nodosa, Polymyalgia rheumatica, Crohn's disease, Rheumatic fever, Familial fevers including familial Mediterranean fever, Acute pancreatitis)
3. Diagnosis and management of infection (Bacterial endocarditis, Neonatal septicemia and meningitis, Intercurrent infection in systemic lupus erythematosus, Intercurrent infection in leukemia and its treatment, Postoperative complications including infection, and thromboembolism)
4. Differential diagnosis/classification of inflammatory disease (Systemic lupus erythematosus vs. rheumatoid arthritis, Crohn disease vs. ulcerative colitis).
5. Tissue necrosis itself is a potent acute-phase stimulus; therefore, following myocardial infarction, there is a major CRP response, the magnitude of which reflects the extent of myocardial necrosis (Br. Heart J. 47:239-243). The peak CRP values at around 48 hours after the onset powerfully predict outcome after myocardial infarction (Eur. Heart J. 17:1345-1349).
Plasma CRP is produced only by hepatocytes, predominantly under transcriptional control by the cytokine IL-6. CRP base-line level is in the range of a few (<10) mg/l. Yet, following an acute-phase stimulus, values usually increase significantly to more than 500 mg/l (more than 100 times more).
Its hepatic synthesis starts very rapidly after a single stimulus, with serum concentrations rising above 5 mg/l by about 6 hours and peaking around 48 hours. CRP plasma half-life is about 19 hours, which does not change in different disease conditions. The sole determinant of circulating CRP concentration is the synthesis rate (J. Clin. Invest. 91:1351-1357), which directly reflects the intensity of the pathological process.
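The kinetics described above can be illustrated with a minimal Python sketch. It assumes a one-compartment model with first-order elimination, which is a simplification chosen here for illustration and is not taken from the cited references; the function names are hypothetical. Only the half-life of about 19 hours comes from the text.

```python
import math

HALF_LIFE_H = 19.0  # CRP plasma half-life in hours, as stated in the text


def clearance_rate(half_life_h: float = HALF_LIFE_H) -> float:
    """First-order elimination constant k (per hour); assumes exponential decay."""
    return math.log(2) / half_life_h


def crp_after_synthesis_stops(c0_mg_per_l: float, hours: float) -> float:
    """CRP level (mg/l) a given number of hours after synthesis ceases."""
    return c0_mg_per_l * math.exp(-clearance_rate() * hours)


def steady_state_level(synthesis_mg_per_l_per_h: float) -> float:
    """Level (mg/l) reached under constant synthesis: synthesis rate divided by k."""
    return synthesis_mg_per_l_per_h / clearance_rate()
```

Under this assumed model, doubling the synthesis rate doubles the steady-state level, which is in line with the statement that the synthesis rate is the sole determinant of circulating CRP concentration when the half-life is constant.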
Surprisingly, in view of the rapid and significant rise in CRP levels as part of the acute response, subjects in the general population tend to have stable base-line CRP concentrations characteristic for each individual. In the last decade new diagnostic properties were associated with CRP. Unlike previous usages, where the serum CRP level in disease states appeared to be of diagnostic value, it was found and repeatedly verified that the base-line CRP level is a predictor for major diseases. Diagnostic properties associated with CRP base-line level are (Cardiol Clin. 2003 Aug;21(3):327-31):
1. Coronary artery disease, non-fatal and fatal
2. Stroke
3. Progression of peripheral vascular disease
4. Development of Congestive Heart Failure (Acta Cardiol. 2004 Apr;59(2):217-8)
5. Sudden Cardiac Death.
6. Poor prognosis in severe unstable angina (J. Clin. Invest. 91:1351-1357)
7. Poor prognosis after angioplasty.
The relative risk of a coronary event is 2.0 for a single base-line CRP concentration >2.4 mg/l versus <1 mg/l (based on a meta-analysis of all published studies up to the year 2000, comprising a total of 1,953 coronary events). In January 2003, a comprehensive review and guideline recommendations for the use of CRP measurements were published, sponsored by the Centers for Disease Control and Prevention and the American Heart Association (Circulation. 2003;107:499-511). Therefore, new laboratory tests to accurately identify low base-line CRP levels were developed. These tests measure hsCRP - "high-sensitivity" CRP. The "high sensitivity" refers simply to the lower detection limit of the assay procedures being used. The actual CRP analyte, the plasma protein that is being measured, is the same regardless of the assay range. The mechanisms responsible for the low-grade upregulation of CRP production that predicts coronary events, stroke and other cardiovascular diseases such as unstable angina are unknown (N. Engl. J. Med. 331:417-424, N. Engl. J. Med. 336:973-979).
CRP base-line level has a disadvantage as a marker - it is affected by irrelevant factors including:
1. Body Mass Index - BMI (Eur. Heart J. 20:954-959)
2. Weight loss (e.g. due to diet)
3. Insulin resistance
4. Oral contraceptive use (Fibrinolysis and Proteolysis. 13:239-244) and systemic, but not transdermal, postmenopausal hormone replacement therapy (Circulation. 100:717-722, Circulation. 100:713-716)
5. Physical exercise (Epidemiology. 13:561-568)
6. Moderate alcohol consumption (Lancet. 357:763-767)
7. Periodontal disease
8. Smoking (Eur. Heart J. 20:954-959).
HMG CoA-reductase inhibitors (statins) reduce CRP values, independently of their effects on lipid profiles (Circulation. 100:230-235). The mechanism is not known, but recent studies suggest that statins reduce the risk of future cardiovascular events to the same extent in patients with raised LDL cholesterol values and in those with normal LDL but with baseline CRP concentrations above the median (N. Engl. J. Med. 344:1959-1965). Therefore, CRP may become an indication for prophylactic antiatherosclerotic therapy in otherwise apparently low-risk individuals and populations. The present invention provides CRP variants, which may optionally be used as diagnostic markers.
Preferably these CRP variants are useful as diagnostic markers for the following conditions, among others: Assessment of disease activity in inflammatory conditions (Juvenile rheumatoid arthritis, Rheumatoid arthritis, Ankylosing spondylitis, Reiter disease, Psoriatic arthropathy, Vasculitides Behcet syndrome, Wegener granulomatosis, Polyarteritis nodosa, Polymyalgia rheumatica, Crohn's disease, Rheumatic fever, Familial fevers including familial Mediterranean fever, Acute pancreatitis); Diagnosis and management of infection (Bacterial endocarditis, Neonatal septicemia and meningitis, Intercurrent infection in systemic lupus erythematosus, Intercurrent infection in leukemia and its treatment, Postoperative complications including infection, and thromboembolism); Differential diagnosis/classification of inflammatory disease (Systemic lupus erythematosus vs. rheumatoid arthritis, Crohn disease vs. ulcerative colitis); Tissue necrosis after myocardial infarction and/or outcome of such an infarction. This first group is described as preferably being used for measurement of differential CRP variant levels as a diagnostic marker. Optionally and preferably, baseline levels of a CRP variant may also be used as a diagnostic marker for the following conditions:
1. Coronary artery disease, non-fatal and fatal
2. Stroke
3. Progression of peripheral vascular disease
4. Development of Congestive Heart Failure
5. Sudden Cardiac Death.
6. Poor prognosis in severe unstable angina
7. Poor prognosis after angioplasty.
Also optionally and preferably, low-grade upregulation of CRP variant production may be detected for predicting coronary events, stroke and cerebrovascular events, and other cardiovascular diseases such as unstable angina. This second group is described as preferably being used for measurement of baseline CRP variant levels as a diagnostic marker. In combination, both the differential variant markers and the baseline variant markers are collectively described as "CRP variant disease markers".
DESCRIPTION FOR CLUSTER HSCREACT Cluster HSCREACT features 10 transcript(s) and 55 segment(s) of interest, the names for which are given in Tables 1 and 2, respectively; the sequences themselves are given at the end of the application. The selected protein variants are given in Table 3. Table 1 - Transcripts of interest
Table 2 - Segments of interest
Table 3 - Proteins of interest
Protein Name            Sequence ID No.
HSCREACT_PEA_1_P9       1361
HSCREACT_PEA_1_P10      1362
These sequences are variants of the known protein C-reactive protein precursor (SwissProt accession identifier CRP_HUMAN), SEQ ID NO: 1360, referred to herein as the previously known protein. Protein C-reactive protein precursor is known or believed to have the following function(s): displays several functions associated with host defense: it promotes agglutination, bacterial capsular swelling, phagocytosis and complement fixation through its calcium-dependent binding to phosphorylcholine. Can interact with DNA and histones and may scavenge nuclear material released from damaged circulating cells. The sequence for protein C-reactive protein precursor is given at the end of the application, as "C-reactive protein precursor amino acid sequence". Known polymorphisms for this sequence are as shown in Table 4. Table 4 - Amino acid mutations for Known Protein
* This conflict is caused by two SNPs, deletion and insertion, that locally change the reading frame. Protein C-reactive protein precursor localization is believed to be secreted. Also, the concentration of C-reactive protein precursor in plasma increases greatly during acute phase response to tissue injury, infection or other inflammatory stimuli. It is induced by IL-1 and IL-6. The previously known protein also has the following indication(s) and/or potential therapeutic use(s): Infarction, myocardial; Surgery adjunct; Coronary artery bypass grafting; Systemic inflammatory response syndrome. It has been investigated for clinical/therapeutic use in humans, for example as a target for an antibody or small molecule, and/or as a direct therapeutic; available information related to these investigations is as follows. Potential pharmaceutically related or therapeutically related activity or activities of the previously known protein are as follows: Polymorphonuclear neutrophil inhibitor. A therapeutic role for a protein represented by the cluster has been predicted. The cluster was assigned this field because there was information in the drug database or the public databases (e.g., described herein above) that this protein, or part thereof, is used or can be used for a potential therapeutic indication: Septic shock treatment; Cardiovascular. The following GO Annotation(s) apply to the previously known protein. The following annotation(s) were found: acute-phase response; inflammatory response, which are annotation(s) related to Biological Process; ligand binding or carrier, which are annotation(s) related to Molecular Function; and extracellular space, which are annotation(s) related to Cellular Component. The GO assignment relies on information from one or more of the SwissProt/TrEMBL Protein knowledgebase, available from <http://www.expasy.ch/sprot/>; or Locuslink, available from <http://www.ncbi.nlm.nih.gov/projects/LocusLink/>. As noted above, cluster HSCREACT features 10 transcript(s), which were listed in Table 1 above. These transcript(s) encode for protein(s) which are variant(s) of protein C-reactive protein precursor. A description of each variant protein according to the present invention is now provided.
Variant protein HSCREACT_PEA_1_P9 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCREACT_PEA_1_T12. An alignment is given to the known protein (C-reactive protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCREACT_PEA_1_P9 and CRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P9, comprising a first amino acid sequence being at least 90 % homologous to MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYTELSST corresponding to amino acids 1 - 64 of CRP_HUMAN, which also corresponds to amino acids 1 - 64 of HSCREACT_PEA_1_P9, a second (bridging) amino acid sequence comprising H, and a third amino acid sequence being at least 90 % homologous to EINTIYLGGPFSPNVLNWRALKYEVQGEVFTKPQLWP corresponding to amino acids 188 - 224 of CRP_HUMAN, which also corresponds to amino acids 66 - 102 of HSCREACT_PEA_1_P9, wherein said first, second and third amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for an edge portion of HSCREACT_PEA_1_P9, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise THE, having a structure as follows (numbering according to HSCREACT_PEA_1_P9): a sequence starting from any of amino acid numbers 64-x to 64; and ending at any of amino acid numbers 66 + ((n-2) - x), in which x varies from 0 to n-2.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCREACT_PEA_1_P9 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 5 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 5 - Amino acid mutations
Variant protein HSCREACT_PEA_1_P9 is encoded by the following transcript(s): HSCREACT_PEA_1_T12, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCREACT_PEA_1_T12 is shown in bold; this coding portion starts at position 117 and ends at position 422. The transcript also has the following SNPs as listed in Table 6 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P9 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 6 - Nucleic acid SNPs
Variant protein HSCREACT_PEA_1_P10 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCREACT_PEA_1_T13. An alignment is given to the known protein (C-reactive protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCREACT_PEA_1_P10 and CRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P10, comprising a first amino acid sequence being at least 90 % homologous to MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYTELSSTRG corresponding to amino acids 1 - 66 of CRP_HUMAN, which also corresponds to amino acids 1 - 66 of HSCREACT_PEA_1_P10.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCREACT_PEA_1_P10 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 7 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 7 - Amino acid mutations
Variant protein HSCREACT_PEA_1_P10 is encoded by the following transcript(s): HSCREACT_PEA_1_T13, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCREACT_PEA_1_T13 is shown in bold; this coding portion starts at position 117 and ends at position 314. The transcript also has the following SNPs as listed in Table 8 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P10 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 8 - Nucleic acid SNPs
Variant protein HSCREACT_PEA_1_P12 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCREACT_PEA_1_T15. An alignment is given to the known protein (C-reactive protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCREACT_PEA_1_P12 and CRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P12, comprising a first amino acid sequence being at least 90 % homologous to MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYTELSSTRG corresponding to amino acids 1 - 66 of CRP_HUMAN, which also corresponds to amino acids 1 - 66 of HSCREACT_PEA_1_P12, and a second amino acid sequence being at least 90 % homologous to PNVLNWRALKYEVQGEVFTKPQLWP corresponding to amino acids 200 - 224 of CRP_HUMAN, which also corresponds to amino acids 67 - 91 of HSCREACT_PEA_1_P12, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated chimeric polypeptide encoding for an edge portion of HSCREACT_PEA_1_P12, comprising a polypeptide having a length "n", wherein n is at least about 10 amino acids in length, optionally at least about 20 amino acids in length, preferably at least about 30 amino acids in length, more preferably at least about 40 amino acids in length and most preferably at least about 50 amino acids in length, wherein at least two amino acids comprise GP, having a structure as follows: a sequence starting from any of amino acid numbers 66-x to 66; and ending at any of amino acid numbers 67 + ((n-2) - x), in which x varies from 0 to n-2.
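The edge-portion definition above amounts to sliding a window of length n across the junction between residues 66 and 67. The following Python sketch enumerates those windows for HSCREACT_PEA_1_P12. The full variant sequence is assembled here from the two fragments quoted in the comparison report, which is an assumption made for illustration; the function name is hypothetical.

```python
# Fragments quoted in the comparison report for HSCREACT_PEA_1_P12.
HEAD = ("MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHF"
        "YTELSSTRG")                    # amino acids 1 - 66
TAIL = "PNVLNWRALKYEVQGEVFTKPQLWP"      # amino acids 67 - 91
P12 = HEAD + TAIL                       # assumed full-length variant (91 residues)


def edge_portions(protein: str, n: int, left: int = 66, right: int = 67):
    """Yield (start, end, peptide) for every length-n window spanning the junction.

    Positions are 1-based, as in the text: the window starts at left - x and
    ends at right + ((n - 2) - x), with x running from 0 to n - 2.
    """
    for x in range(n - 1):
        start = left - x
        end = right + ((n - 2) - x)
        if start < 1 or end > len(protein):
            continue  # window would fall outside the protein
        yield start, end, protein[start - 1:end]


windows = list(edge_portions(P12, n=10))
```

Every window has exactly n residues and contains the junction dipeptide GP, so for n = 10 there are nine candidate edge portions.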
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCREACT_PEA_1_P12 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 9 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 9 - Amino acid mutations
Variant protein HSCREACT_PEA_1_P12 is encoded by the following transcript(s): HSCREACT_PEA_1_T15, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCREACT_PEA_1_T15 is shown in bold; this coding portion starts at position 117 and ends at position 389. The transcript also has the following SNPs as listed in Table 10 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P12 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 10 - Nucleic acid SNPs
Variant protein HSCREACT_PEA_1_P16 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCREACT_PEA_1_T22. An alignment is given to the known protein (C-reactive protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCREACT_PEA_1_P16 and CRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P16, comprising a first amino acid sequence being at least 90 % homologous to MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYTELSSTRGYSIFSYATKRQDNEILIFWSKDIGYSFTVGGSEILFEVPEVTVAPVHICTSWESASGIVEFWVDGKPRVRKSLKKGYTVGAEASIILGQEQDSF corresponding to amino acids 1 - 160 of CRP_HUMAN, which also corresponds to amino acids 1 - 160 of HSCREACT_PEA_1_P16, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence VSESGHWPGVWFGSRVLIIMS corresponding to amino acids 161 - 181 of HSCREACT_PEA_1_P16, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCREACT_PEA_1_P16, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence VSESGHWPGVWFGSRVLIIMS in HSCREACT_PEA_1_P16.
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region.
Variant protein HSCREACT_PEA_1_P16 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 11 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 11 - Amino acid mutations
Variant protein HSCREACT_PEA_1_P16 is encoded by the following transcript(s): HSCREACT_PEA_1_T22, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCREACT_PEA_1_T22 is shown in bold; this coding portion starts at position 117 and ends at position 659. The transcript also has the following SNPs as listed in Table 12 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P16 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 12 - Nucleic acid SNPs
Variant protein HSCREACT_PEA_1_P22 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCREACT_PEA_1_T29. An alignment is given to the known protein (C-reactive protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCREACT_PEA_1_P22 and CRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P22, comprising a first amino acid sequence being at least 90 % homologous to MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYTELSSTRG corresponding to amino acids 1 - 66 of CRP_HUMAN, which also corresponds to amino acids 1 - 66 of HSCREACT_PEA_1_P22, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence AFLILWLFWETPPLFHTNLVGL corresponding to amino acids 67 - 88 of HSCREACT_PEA_1_P22, wherein said first and second amino acid sequences are contiguous and in a sequential order.
2. An isolated polypeptide encoding for a tail of HSCREACT_PEA_1_P22, comprising a polypeptide being at least 70%, optionally at least about 80%, preferably at least about 85%, more preferably at least about 90% and most preferably at least about 95% homologous to the sequence AFLILWLFWETPPLFHTNLVGL in HSCREACT_PEA_1_P22.
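The graded homology thresholds recited for the tail (70%, 80%, 85%, 90%, 95%) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that homology is scored as ungapped position-by-position identity between two sequences of equal length; the measure actually intended in the application may rely on alignment software and is not reproduced here. The function names are hypothetical.

```python
TAIL_P22 = "AFLILWLFWETPPLFHTNLVGL"  # amino acids 67 - 88 of HSCREACT_PEA_1_P22

THRESHOLDS = (95, 90, 85, 80, 70)    # tiers recited in the text, strictest first


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences (an assumed measure)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def highest_tier_met(pct: float):
    """Return the strictest recited threshold satisfied by pct, or None if below 70%."""
    for threshold in THRESHOLDS:
        if pct >= threshold:
            return threshold
    return None


# Hypothetical candidate differing from the tail at two positions (A->G, L->V).
candidate = "GFLILWLFWETPPLFHTNLVGV"
tier = highest_tier_met(percent_identity(TAIL_P22, candidate))
```

With two substitutions out of 22 residues the candidate scores roughly 90.9% under this assumed measure, so it meets the 90% tier but not the 95% tier.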
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCREACT_PEA_1_P22 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 13 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 13 - Amino acid mutations
Variant protein HSCREACT_PEA_1_P22 is encoded by the following transcript(s): HSCREACT_PEA_1_T29, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCREACT_PEA_1_T29 is shown in bold; this coding portion starts at position 117 and ends at position 380. The transcript also has the following SNPs as listed in Table 14 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P22 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 14 - Nucleic acid SNPs
Variant protein HSCREACT_PEA_1_P28 according to the present invention has an amino acid sequence as given at the end of the application; it is encoded by transcript(s) HSCREACT_PEA_1_T33. An alignment is given to the known protein (C-reactive protein precursor) at the end of the application. One or more alignments to one or more previously published protein sequences are given at the end of the application. A brief description of the relationship of the variant protein according to the present invention to each such aligned protein is as follows:
Comparison report between HSCREACT_PEA_1_P28 and CRP_HUMAN:
1. An isolated chimeric polypeptide encoding for HSCREACT_PEA_1_P28, comprising a first amino acid sequence being at least 90 % homologous to MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHFYTELSST corresponding to amino acids 1 - 64 of CRP_HUMAN, which also corresponds to amino acids 1 - 64 of HSCREACT_PEA_1_P28, and a second amino acid sequence being at least 70%, optionally at least 80%, preferably at least 85%, more preferably at least 90% and most preferably at least 95% homologous to a polypeptide having the sequence LLS corresponding to amino acids 65 - 67 of HSCREACT_PEA_1_P28, wherein said first and second amino acid sequences are contiguous and in a sequential order.
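The comparison reports above describe each variant as a concatenation of CRP_HUMAN fragments and, where applicable, a bridging residue or a unique tail. The following Python sketch assembles the six variants from the fragments quoted in those reports and reports their lengths. It is an illustration only: the authoritative sequences are those given at the end of the application, and the variable names are hypothetical.

```python
# CRP_HUMAN fragments as quoted in the comparison reports.
CRP_1_66 = ("MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKAFTVCLHF"
            "YTELSSTRG")
CRP_67_160 = ("YSIFSYATKRQDNEILIFWSKDIGYSFTVGGSEILFEVPEVTVAPVHICTS"
              "WESASGIVEFWVDGKPRVRKSLKKGYTVGAEASIILGQEQDSF")
CRP_188_224 = "EINTIYLGGPFSPNVLNWRALKYEVQGEVFTKPQLWP"
CRP_200_224 = CRP_188_224[200 - 188:]  # residues 200 - 224, sliced from 188 - 224

VARIANTS = {
    # residues 1 - 64, bridging H, then residues 188 - 224
    "HSCREACT_PEA_1_P9": CRP_1_66[:64] + "H" + CRP_188_224,
    # residues 1 - 66 only
    "HSCREACT_PEA_1_P10": CRP_1_66,
    # residues 1 - 66 followed by residues 200 - 224
    "HSCREACT_PEA_1_P12": CRP_1_66 + CRP_200_224,
    # residues 1 - 160 followed by a unique 21-residue tail
    "HSCREACT_PEA_1_P16": CRP_1_66 + CRP_67_160 + "VSESGHWPGVWFGSRVLIIMS",
    # residues 1 - 66 followed by a unique 22-residue tail
    "HSCREACT_PEA_1_P22": CRP_1_66 + "AFLILWLFWETPPLFHTNLVGL",
    # residues 1 - 64 followed by the unique tail LLS
    "HSCREACT_PEA_1_P28": CRP_1_66[:64] + "LLS",
}

LENGTHS = {name: len(seq) for name, seq in VARIANTS.items()}
```

The resulting lengths (102, 66, 91, 181, 88 and 67 residues) agree with the last amino acid position recited for each variant in its comparison report.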
The location of the variant protein was determined according to results from a number of different software programs and analyses, including analyses from SignalP and other specialized programs. The variant protein is believed to be located as follows with regard to the cell: secreted. The protein localization is believed to be secreted because both signal-peptide prediction programs predict that this protein has a signal peptide, and neither transmembrane region prediction program predicts that this protein has a trans-membrane region. Variant protein HSCREACT_PEA_1_P28 also has the following non-silent SNPs (Single Nucleotide Polymorphisms) as listed in Table 15 (given according to their position(s) on the amino acid sequence, with the alternative amino acid(s) listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P28 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 15 - Amino acid mutations
Variant protein HSCREACT_PEA_1_P28 is encoded by the following transcript(s): HSCREACT_PEA_1_T33, for which the sequence(s) is/are given at the end of the application. The coding portion of transcript HSCREACT_PEA_1_T33 is shown in bold; this coding portion starts at position 117 and ends at position 317. The transcript also has the following SNPs as listed in Table 16 (given according to their position on the nucleotide sequence, with the alternative nucleic acid listed; the last column indicates whether the SNP is known or not; the presence of known SNPs in variant protein HSCREACT_PEA_1_P28 sequence provides support for the deduced sequence of this variant protein according to the present invention). Table 16 - Nucleic acid SNPs
As noted above, cluster HSCREACT features 55 segment(s), which were listed in Table 2 above and for which the sequence(s) are given at the end of the application. These segment(s) are portions of nucleic acid sequence(s) which are described herein separately because they are of particular interest. A description of each segment according to the present invention is now provided.
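Before turning to the individual segments, the coding-portion coordinates quoted earlier for each transcript can be cross-checked against the lengths of the encoded variant proteins. The Python sketch below does this arithmetic. The protein lengths are those implied by the comparison reports, and the check assumes that each quoted coding portion excludes the stop codon; that assumption is an inference from the numbers and is not stated in the text.

```python
# (transcript, coding start, coding end, encoded protein, protein length in residues)
CODING_PORTIONS = [
    ("HSCREACT_PEA_1_T12", 117, 422, "HSCREACT_PEA_1_P9", 102),
    ("HSCREACT_PEA_1_T13", 117, 314, "HSCREACT_PEA_1_P10", 66),
    ("HSCREACT_PEA_1_T15", 117, 389, "HSCREACT_PEA_1_P12", 91),
    ("HSCREACT_PEA_1_T22", 117, 659, "HSCREACT_PEA_1_P16", 181),
    ("HSCREACT_PEA_1_T29", 117, 380, "HSCREACT_PEA_1_P22", 88),
    ("HSCREACT_PEA_1_T33", 117, 317, "HSCREACT_PEA_1_P28", 67),
]


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the inclusive, 1-based range start..end."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a multiple of three nucleotides")
    return length // 3


def check_coding_portions(rows=CODING_PORTIONS) -> dict:
    """Map each transcript to True if its codon count equals the protein length."""
    return {
        transcript: codon_count(start, end) == residues
        for transcript, start, end, _protein, residues in rows
    }


RESULTS = check_coding_portions()
```

All six transcripts pass under the stated assumption; for example, positions 117 to 422 of HSCREACT_PEA_1_T12 span 306 nucleotides, that is, 102 codons for the 102-residue variant HSCREACT_PEA_1_P9.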
Segment cluster HSCREACT_PEA_1_node_63 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 17 below describes the starting and ending position of this segment on each transcript.
Table 17 - Segment location on transcripts
Transcript name         Segment starting position    Segment ending position
HSCREACT_PEA_1_T12      1446                         1582
HSCREACT_PEA_1_T13      1450                         1586
HSCREACT_PEA_1_T15      1413                         1549
HSCREACT_PEA_1_T22      1154                         1290
HSCREACT_PEA_1_T29      604                          740
HSCREACT_PEA_1_T30      705                          841
HSCREACT_PEA_1_T32      738                          874
HSCREACT_PEA_1_T33      600                          736
HSCREACT_PEA_1_T38      479                          615
HSCREACT_PEA_1_T39      524                          660
According to an optional embodiment of the present invention, short segments related to the above cluster are also provided. These segments are up to about 120 bp in length, and so are included in a separate description.
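The coordinates listed in Table 17 can be sanity-checked by confirming that segment HSCREACT_PEA_1_node_63 has the same length on every transcript. The Python sketch below performs that check; the dictionary and function names are hypothetical, and the positions are taken from the table as printed.

```python
# Start and end positions of segment node_63 on each transcript (Table 17).
NODE_63_POSITIONS = {
    "HSCREACT_PEA_1_T12": (1446, 1582),
    "HSCREACT_PEA_1_T13": (1450, 1586),
    "HSCREACT_PEA_1_T15": (1413, 1549),
    "HSCREACT_PEA_1_T22": (1154, 1290),
    "HSCREACT_PEA_1_T29": (604, 740),
    "HSCREACT_PEA_1_T30": (705, 841),
    "HSCREACT_PEA_1_T32": (738, 874),
    "HSCREACT_PEA_1_T33": (600, 736),
    "HSCREACT_PEA_1_T38": (479, 615),
    "HSCREACT_PEA_1_T39": (524, 660),
}


def segment_length(start: int, end: int) -> int:
    """Length in nucleotides of the inclusive, 1-based range start..end."""
    return end - start + 1


SEGMENT_LENGTHS = {
    transcript: segment_length(start, end)
    for transcript, (start, end) in NODE_63_POSITIONS.items()
}

# A shared segment should have one and the same length on every transcript.
DISTINCT_LENGTHS = set(SEGMENT_LENGTHS.values())
```

The segment spans 137 nucleotides on all ten transcripts, which is above the roughly 120 bp limit mentioned for the short segments described separately.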
Segment cluster HSCREACT_PEA_1_node_10 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 18 below describes the starting and ending position of this segment on each transcript. Table 18 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_11 according to the present invention is supported by 46 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 19 below describes the starting and ending position of this segment on each transcript.
Table 19 - Segment location on transcripts
Transcript name         Segment starting position    Segment ending position
HSCREACT_PEA_1_T12      284                          309
HSCREACT_PEA_1_T13      284                          309
HSCREACT_PEA_1_T15      284                          309
HSCREACT_PEA_1_T22      284                          309
HSCREACT_PEA_1_T29      284                          309
HSCREACT_PEA_1_T30      284                          309
HSCREACT_PEA_1_T32      284                          309
HSCREACT_PEA_1_T33      284                          309
HSCREACT_PEA_1_T38      284                          309
HSCREACT_PEA_1_T39      284                          309
Segment cluster HSCREACT_PEA_1_node_12 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30 and HSCREACT_PEA_1_T38. Table 20 below describes the starting and ending position of this segment on each transcript.
Table 20 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_13 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 21 below describes the starting and ending position of this segment on each transcript.
Table 21 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_14 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 22 below describes the starting and ending position of this segment on each transcript.
Table 22 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T22    346                         351
Segment cluster HSCREACT_PEA_1_node_15 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 23 below describes the starting and ending position of this segment on each transcript.
Table 23 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T22    352                         384
Segment cluster HSCREACT_PEA_1_node_16 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 24 below describes the starting and ending position of this segment on each transcript.
Table 24 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_17 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 25 below describes the starting and ending position of this segment on each transcript.
Table 25 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_18 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 26 below describes the starting and ending position of this segment on each transcript.
Table 26 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_19 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 27 below describes the starting and ending position of this segment on each transcript.
Table 27 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_2 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 28 below describes the starting and ending position of this segment on each transcript.
Table 28 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_20 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 29 below describes the starting and ending position of this segment on each transcript.
Table 29 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_21 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 30 below describes the starting and ending position of this segment on each transcript.
Table 30 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_22 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 31 below describes the starting and ending position of this segment on each transcript.
Table 31 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_23 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 32 below describes the starting and ending position of this segment on each transcript.
Table 32 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_24 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T22. Table 33 below describes the starting and ending position of this segment on each transcript.
Table 33 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_3 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 34 below describes the starting and ending position of this segment on each transcript.
Table 34 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_30 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T32 and HSCREACT_PEA_1_T39. Table 35 below describes the starting and ending position of this segment on each transcript.
Table 35 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_31 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 36 below describes the starting and ending position of this segment on each transcript.
Table 36 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_32 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 37 below describes the starting and ending position of this segment on each transcript.
Table 37 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_33 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 38 below describes the starting and ending position of this segment on each transcript.
Table 38 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_34 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 39 below describes the starting and ending position of this segment on each transcript.
Table 39 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_35 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 40 below describes the starting and ending position of this segment on each transcript.
Table 40 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T12    433                         447
HSCREACT_PEA_1_T13    437                         451
HSCREACT_PEA_1_T15    400                         414
HSCREACT_PEA_1_T30    400                         414
HSCREACT_PEA_1_T32    433                         447
HSCREACT_PEA_1_T38    400                         414
HSCREACT_PEA_1_T39    433                         447
Segment cluster HSCREACT_PEA_1_node_36 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 41 below describes the starting and ending position of this segment on each transcript.
Table 41 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_37 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 42 below describes the starting and ending position of this segment on each transcript.
Table 42 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T12    487                         544
HSCREACT_PEA_1_T13    491                         548
HSCREACT_PEA_1_T15    454                         511
Segment cluster HSCREACT_PEA_1_node_38 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 43 below describes the starting and ending position of this segment on each transcript.
Table 43 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_39 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 44 below describes the starting and ending position of this segment on each transcript.
Table 44 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T12    583                         620
HSCREACT_PEA_1_T13    587                         624
HSCREACT_PEA_1_T15    550                         587
Segment cluster HSCREACT_PEA_1_node_4 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 45 below describes the starting and ending position of this segment on each transcript.
Table 45 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T12    105                         110
HSCREACT_PEA_1_T13    105                         110
HSCREACT_PEA_1_T15    105                         110
HSCREACT_PEA_1_T22    105                         110
HSCREACT_PEA_1_T29    105                         110
HSCREACT_PEA_1_T30    105                         110
HSCREACT_PEA_1_T32    105                         110
HSCREACT_PEA_1_T33    105                         110
HSCREACT_PEA_1_T38    105                         110
HSCREACT_PEA_1_T39    105                         110
Segment cluster HSCREACT_PEA_1_node_40 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 46 below describes the starting and ending position of this segment on each transcript.
Table 46 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_41 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 47 below describes the starting and ending position of this segment on each transcript.
Table 47 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_42 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 48 below describes the starting and ending position of this segment on each transcript.
Table 48 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_43 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 49 below describes the starting and ending position of this segment on each transcript.
Table 49 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_44 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 50 below describes the starting and ending position of this segment on each transcript.
Table 50 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T12    758                         783
HSCREACT_PEA_1_T13    762                         787
HSCREACT_PEA_1_T15    725                         750
Segment cluster HSCREACT_PEA_1_node_45 according to the present invention is supported by 44 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 51 below describes the starting and ending position of this segment on each transcript.
Table 51 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T12    784                         836
HSCREACT_PEA_1_T13    788                         840
HSCREACT_PEA_1_T15    751                         803
Segment cluster HSCREACT_PEA_1_node_46 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13 and HSCREACT_PEA_1_T15. Table 52 below describes the starting and ending position of this segment on each transcript.
Table 52 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_47 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15 and HSCREACT_PEA_1_T22. Table 53 below describes the starting and ending position of this segment on each transcript.
Table 53 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_48 according to the present invention is supported by 39 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15 and HSCREACT_PEA_1_T22. Table 54 below describes the starting and ending position of this segment on each transcript.
Table 54 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_49 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15 and HSCREACT_PEA_1_T22. Table 55 below describes the starting and ending position of this segment on each transcript.
Table 55 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T12    970                         992
HSCREACT_PEA_1_T13    974                         996
HSCREACT_PEA_1_T15    937                         959
HSCREACT_PEA_1_T22    678                         700
Segment cluster HSCREACT_PEA_1_node_5 according to the present invention is supported by 47 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 56 below describes the starting and ending position of this segment on each transcript.
Table 56 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_50 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15 and HSCREACT_PEA_1_T22. Table 57 below describes the starting and ending position of this segment on each transcript.
Table 57 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_51 according to the present invention is supported by 37 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15 and HSCREACT_PEA_1_T22. Table 58 below describes the starting and ending position of this segment on each transcript.
Table 58 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_52 according to the present invention is supported by 38 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15 and HSCREACT_PEA_1_T22. Table 59 below describes the starting and ending position of this segment on each transcript.
Table 59 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_53 according to the present invention is supported by 42 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32 and HSCREACT_PEA_1_T33. Table 60 below describes the starting and ending position of this segment on each transcript.
Table 60 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T12    1156                        1183
HSCREACT_PEA_1_T13    1160                        1187
HSCREACT_PEA_1_T15    1123                        1150
HSCREACT_PEA_1_T22    864                         891
HSCREACT_PEA_1_T29    314                         341
Segment cluster HSCREACT_PEA_1_node_54 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32 and HSCREACT_PEA_1_T33. Table 61 below describes the starting and ending position of this segment on each transcript.
Table 61 - Segment location on transcripts
supported by 45 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32 and HSCREACT_PEA_1_T33. Table 63 below describes the starting and ending position of this segment on each transcript.
Table 63 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_56 according to the present invention is supported by 43 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32 and HSCREACT_PEA_1_T33. Table 65 below describes the starting and ending position of this segment on each transcript.
Table 65 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_57 according to the present invention is supported by 41 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32 and HSCREACT_PEA_1_T33. Table 66 below describes the starting and ending position of this segment on each transcript.
Table 66 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_58 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33 and HSCREACT_PEA_1_T39. Table 67 below describes the starting and ending position of this segment on each transcript.
Table 67 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_59 according to the present invention is supported by 40 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 68 below describes the starting and ending position of this segment on each transcript.
Table 68 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_60 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 69 below describes the starting and ending position of this segment on each transcript.
Table 69 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_61 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 70 below describes the starting and ending position of this segment on each transcript.
Table 70 - Segment location on transcripts
Transcript name       Segment starting position   Segment ending position
HSCREACT_PEA_1_T39    510                         523
Segment cluster HSCREACT_PEA_1_node_64 according to the present invention is supported by 34 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 71 below describes the starting and ending position of this segment on each transcript.
Table 71 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_8 according to the present invention is supported by 48 libraries. The number of libraries was determined as previously described. This segment can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 72 below describes the starting and ending position of this segment on each transcript.
Table 72 - Segment location on transcripts
Segment cluster HSCREACT_PEA_1_node_9 according to the present invention can be found in the following transcript(s): HSCREACT_PEA_1_T12, HSCREACT_PEA_1_T13, HSCREACT_PEA_1_T15, HSCREACT_PEA_1_T22, HSCREACT_PEA_1_T29, HSCREACT_PEA_1_T30, HSCREACT_PEA_1_T32, HSCREACT_PEA_1_T33, HSCREACT_PEA_1_T38 and HSCREACT_PEA_1_T39. Table 73 below describes the starting and ending position of this segment on each transcript.
Table 73 - Segment location on transcripts
Variant protein alignment to the previously known protein:
Sequence name: /tmp/wYPvuKc8qt/2EpY3XLyuY:CRP_HUMAN
Sequence documentation:
Alignment of: HSCREACT_PEA_1_P9 x CRP_HUMAN
Alignment segment 1/1:
Quality: 899.00   Escore: 0
Matching length: 102   Total length: 224
Matching Percent Similarity: 99.02   Matching Percent Identity: 99.02
Total Percent Similarity: 45.09   Total Percent Identity: 45.09
Gaps: 1
Alignment:
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50

 51 FTVCLHFYTELSSTH................................... 65
    ||||||||||||||
 51 FTVCLHFYTELSSTRGYSIFSYATKRQDNEILIFWSKDIGYSFTVGGSEI 100

 65 .................................................. 65

101 LFEVPEVTVAPVHICTSWESASGIVEFWVDGKPRVRKSLKKGYTVGAEAS 150

 66 .....................................EINTIYLGGPFSP 78
                                         |||||||||||||
151 IILGQEQDSFGGNFEGSQSLVGDIGNVNMWDFVLSPDEINTIYLGGPFSP 200

 79 NVLNWRALKYEVQGEVFTKPQLWP 102
    ||||||||||||||||||||||||
201 NVLNWRALKYEVQGEVFTKPQLWP 224
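The two pairs of percentages reported for the HSCREACT_PEA_1_P9 alignment can be related to one another with a short calculation. The sketch below assumes that "matching" percentages are taken over the matching length and "total" percentages over the total length; this definition is an interpretation of the reported figures, not something stated in the text. The count of 101 identical positions is likewise inferred from the reported 99.02% over a matching length of 102, and the function name is illustrative.

```python
def alignment_percentages(identical: int, matching_length: int, total_length: int) -> dict:
    """Return identity as a percentage of the matching length and of the total length."""
    return {
        "matching_percent_identity": 100.0 * identical / matching_length,
        "total_percent_identity": 100.0 * identical / total_length,
    }


# Values for HSCREACT_PEA_1_P9 x CRP_HUMAN: matching length 102, total length 224.
# 101 identical positions is an inferred value (one mismatch in the matched region).
stats = alignment_percentages(identical=101, matching_length=102, total_length=224)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under this reading, the large difference between the matching and total percentages reflects the gap in the variant relative to the known protein rather than sequence divergence within the aligned region.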
Sequence name: /tmp/E0hIQJQMu /ZZUuTJScck:CRP_HUMAN
Sequence documentation:
Alignment of: HSCREACT_PEA_1_P10 x CRP_HUMAN
Alignment segment 1/1:
Quality: 642.00   Escore: 0
Matching length: 66   Total length: 66
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50

 51 FTVCLHFYTELSSTRG 66
    ||||||||||||||||
 51 FTVCLHFYTELSSTRG 66
Sequence name: /tmp/SjZGekAOzM/KOEWo2vkcH:CRP_HUMAN
Sequence documentation:
Alignment of: HSCREACT_PEA_1_P12 x CRP_HUMAN
Alignment segment 1/1:
Quality: 803.00   Escore: 0
Matching length: 91   Total length: 224
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 40.62   Total Percent Identity: 40.62
Gaps: 1
Alignment:
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50

 51 FTVCLHFYTELSSTRG.................................. 66
    ||||||||||||||||
 51 FTVCLHFYTELSSTRGYSIFSYATKRQDNEILIFWSKDIGYSFTVGGSEI 100

 66 .................................................. 66

101 LFEVPEVTVAPVHICTSWESASGIVEFWVDGKPRVRKSLKKGYTVGAEAS 150

 67 .................................................P 67
                                                     |
151 IILGQEQDSFGGNFEGSQSLVGDIGNVNMWDFVLSPDEINTIYLGGPFSP 200

 68 NVLNWRALKYEVQGEVFTKPQLWP 91
    ||||||||||||||||||||||||
201 NVLNWRALKYEVQGEVFTKPQLWP 224
Sequence name: /tmp/ThtZ21L8W4/dzqzEjwmeE:CRP_HUMAN
Sequence documentation:
Alignment of: HSCREACT_PEA_1_P16 x CRP_HUMAN
Alignment segment 1/1:
Quality: 1556.00   Escore: 0
Matching length: 160   Total length: 160
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50

 51 FTVCLHFYTELSSTRGYSIFSYATKRQDNEILIFWSKDIGYSFTVGGSEI 100
    ||||||||||||||||||||||||||||||||||||||||||||||||||
 51 FTVCLHFYTELSSTRGYSIFSYATKRQDNEILIFWSKDIGYSFTVGGSEI 100

101 LFEVPEVTVAPVHICTSWESASGIVEFWVDGKPRVRKSLKKGYTVGAEAS 150
    ||||||||||||||||||||||||||||||||||||||||||||||||||
101 LFEVPEVTVAPVHICTSWESASGIVEFWVDGKPRVRKSLKKGYTVGAEAS 150

151 IILGQEQDSF 160
    ||||||||||
151 IILGQEQDSF 160
Sequence name: /tmp/DFoGmxPA04/oOc616sFDL:CRP_HUMAN
Sequence documentation:
Alignment of: HSCREACT_PEA_1_P22 x CRP_HUMAN
Alignment segment 1/1:
Quality: 642.00   Escore: 0
Matching length: 66   Total length: 66
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50

 51 FTVCLHFYTELSSTRG 66
    ||||||||||||||||
 51 FTVCLHFYTELSSTRG 66
Sequence name: /tmp/UEbuKp67sJ/siZFMsAVtb:CRP_HUMAN
Sequence documentation:
Alignment of: HSCREACT_PEA_1_P28 x CRP_HUMAN
Alignment segment 1/1:
Quality: 623.00   Escore: 0
Matching length: 64   Total length: 64
Matching Percent Similarity: 100.00   Matching Percent Identity: 100.00
Total Percent Similarity: 100.00   Total Percent Identity: 100.00
Gaps: 0
Alignment:
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50
    ||||||||||||||||||||||||||||||||||||||||||||||||||
  1 MEKLLCFLVLTSLSHAFGQTDMSRKAFVFPKESDTSYVSLKAPLTKPLKA 50

 51 FTVCLHFYTELSST 64
    ||||||||||||||
 51 FTVCLHFYTELSST 64
Experimental results for the variants according to the present invention are described below. Experimental materials and methods, including relevant tables of tissues, were described above with regard to results obtained for cluster HSKITCR (KIT).
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts which are detectable by amplicon as depicted in sequence name HSCREACT junc11-53F2R2 in different normal tissues
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) transcripts detectable by or according to HSCREACT junc11-53F2R2 amplicon (SEQ ID NO: 1409) and primers HSCREACT junc11-53F2 (SEQ ID NO: 1410) and HSCREACT junc11-53R2 (SEQ ID NO: 1411) was measured by real time PCR. In parallel, the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the liver samples (Sample Nos. 47-49 above in the normal tissues table), to obtain a value of relative expression of each sample relative to the median of the liver samples. These data are plotted in figures 32 and 33 in two different scales.
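The two-step normalization described above (division by the geometric mean of the housekeeping gene quantities, then division by the median of the liver samples) can be sketched as follows. This is a minimal illustration rather than the procedure actually used to produce the figures: all quantities are hypothetical placeholder numbers, and the sample labels and function names are assumptions introduced for the example.

```python
from statistics import geometric_mean, median


def normalized_quantity(amplicon_qty: float, housekeeping_qtys: list) -> float:
    """Divide the amplicon quantity by the geometric mean of the housekeeping quantities."""
    return amplicon_qty / geometric_mean(housekeeping_qtys)


def relative_expression(normalized: dict, reference_samples: list) -> dict:
    """Divide every normalized quantity by the median over the reference samples."""
    reference = median(normalized[name] for name in reference_samples)
    return {name: value / reference for name, value in normalized.items()}


# Hypothetical data: sample -> (amplicon quantity, [RPL19, TATA box, Ubiquitin, SDHA]).
raw = {
    "liver_47": (8.0, [2.0, 2.0, 2.0, 2.0]),
    "liver_48": (2.0, [1.0, 1.0, 1.0, 1.0]),
    "liver_49": (6.0, [2.0, 2.0, 2.0, 2.0]),
    "other_tissue": (1.5, [1.0, 1.0, 1.0, 1.0]),
}

normalized = {name: normalized_quantity(qty, hk) for name, (qty, hk) in raw.items()}
relative = relative_expression(normalized, ["liver_47", "liver_48", "liver_49"])
for name, value in relative.items():
    print(f"{name}: {value:.2f}")
```

Using the geometric mean rather than the arithmetic mean keeps a single highly expressed housekeeping gene from dominating the normalization factor, and the median makes the liver reference less sensitive to one outlying sample.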
HSCREACT junc11-53F2 Forward primer (SEQ ID NO: 1410):
AACTGTCCTCGACCCTGCTTT
HSCREACT junc11-53R2 Reverse primer (SEQ ID NO: 1411):
GTGGCCTGGGTATATTGGGA
Amplicon (SEQ ID NO: 1409):
AACTGTCCTCGACCCTGCTTTCTTAATTTTATGGCTCTTCTGGGAAACTCCTCCCCTTTTCCACACGAACCTTGTGGGGCTGTGAATTCTTTCTTCATCCCCGCATTCCCAATATACCCAGGCCAC
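A simple way to check that a primer pair and an amplicon sequence are mutually consistent is to confirm that the amplicon begins with the forward primer and ends with the reverse complement of the reverse primer. The sketch below applies this check to the sequences listed for SEQ ID NOs: 1409-1411; the helper names are illustrative and the check is offered only as a sanity test on the transcribed sequences, not as part of the described method.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with
    the reverse complement of the reverse primer."""
    return amplicon.startswith(forward) and amplicon.endswith(reverse_complement(reverse))


FORWARD = "AACTGTCCTCGACCCTGCTTT"   # SEQ ID NO: 1410
REVERSE = "GTGGCCTGGGTATATTGGGA"    # SEQ ID NO: 1411
AMPLICON = (                         # SEQ ID NO: 1409
    "AACTGTCCTCGACCCTGCTTTCTTAATTTTATGGCTCTTCTGGGAAACTCCTCCCC"
    "TTTTCCACACGAACCTTGTGGGGCTGTGAATTCTTTCTTCATCCCCGCATTCCCA"
    "ATATACCCAGGCCAC"
)

print(primers_flank_amplicon(FORWARD, REVERSE, AMPLICON))
```

The same check can be repeated for the other primer pairs and amplicons in this section by substituting their sequences.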
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts which are detectable by amplicon as depicted in sequence name HSCREACT junc12-30F2R2 in different normal tissues
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) transcripts detectable by or according to HSCREACT junc12-30F2R2 amplicon (SEQ ID NO: 1412) and primers HSCREACT junc12-30F2 (SEQ ID NO: 1413) and HSCREACT junc12-30R2 (SEQ ID NO: 1414) was measured by real time PCR. In parallel, the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the liver samples (Sample Nos. 47-49 above in the normal tissues table), to obtain a value of relative expression of each sample relative to the median of the liver samples. These data are plotted in figures 34 and 35 in two different scales.
Forward primer HSCREACT junc12-30F2 (SEQ ID NO: 1413):
CTCGACCCGTGGATGAGATT
Reverse primer HSCREACT junc12-30R2 (SEQ ID NO: 1414):
ACACTTCGCCTTGCACTTCA
Amplicon (SEQ ID NO: 1412):
CTCGACCCGTGGATGAGATTAACACCATCTATCTTGGCGGGCCCTTCAGTCCTAATGTCCTGAACTGGCGGGCACTGAAGTATGAAGTGCAAGGCGAAGTGT
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts which are detectable by amplicon as depicted in sequence name HSCREACT juncl 2-53F2R2 in different normal tissues
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) transcripts detectable by or according to HSCREACT junc12-53F2R2 amplicon (SEQ ID NO: 1415) and primers HSCREACT junc12-53F2 (SEQ ID NO: 1416) and HSCREACT junc12-53R2 (SEQ ID NO: 1417) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the liver samples (Sample Nos. 47-49 above in the normal tissues table), to obtain a value of relative expression of each sample relative to the median of the liver samples. These data are plotted in figures 36 and 37 in two different scales.
Forward primer HSCREACT junc12-53F2 (SEQ ID NO: 1416):
CCTCGACCCGTGGTGCT
Reverse primer HSCREACT junc12-53R2 (SEQ ID NO: 1417):
GTGGCCTGGGTATATTGGGA
Amplicon (SEQ ID NO: 1415):
CCTCGACCCGTGGTGCTTTCTTAATTTTATGGCTCTTCTGGGAAACTCCTCCCCTT
TTCCACACGAACCTTGTGGGGCTGTGAATTCTTTCTTCATCCCCGCATTCCCAAT
ATACCCAGGCCAC
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts which are detectable by amplicon as depicted in sequence name HSCREACT junc24-47F2R2 in different normal tissues
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) transcripts detectable by or according to HSCREACT junc24-47F2R2 amplicon (SEQ ID NO: 1418) and primers HSCREACT junc24-47F2 (SEQ ID NO: 1419) and HSCREACT junc24-47R2 (SEQ ID NO: 1420) was measured by real time PCR. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the quantity of the one liver sample that expresses this amplicon (Sample No. 47 above in the normal tissues table), to obtain a value of relative expression of each sample relative to this liver sample. These data are plotted in figure 38.
Forward primer HSCREACT junc24-47F2 (SEQ ID NO: 1419):
GCAGGATTCCTTCGTCTCAGA
Reverse primer HSCREACT junc24-47R2 (SEQ ID NO: 1420):
GAGAAAGTGGAGGGACTGCG
Amplicon (SEQ ID NO: 1418):
GCAGGATTCCTTCGTCTCAGAATCAGGACACTGGCCAGGTGTCTGGTTTGGGTCC
AGAGTGCTCATCATCATGTCATAGAACTGCTGGGCCCAGGTCTCCTGAAATGGG
AAGCCCAGCAATACCACGCAGTCCCTCCACTTTCTC
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) HSCREACT transcripts which are detectable by amplicon as depicted in sequence name HSCREACT seg8-11 in different normal tissues
Expression of Homo sapiens C-reactive protein, pentraxin-related (CRP) transcripts detectable by or according to HSCREACT seg8-11 amplicon (SEQ ID NO: 1421) and primers HSCREACT seg8-11F (SEQ ID NO: 1422) and HSCREACT seg8-11R (SEQ ID NO: 1423) was measured by real time PCR. These transcripts are related to the known (WT, or wild type) protein sequence. In parallel the expression of four housekeeping genes - RPL19 (GenBank Accession No. NM_000981; RPL19 amplicon, SEQ ID NO: 1378), TATA box (GenBank Accession No. NM_003194; TATA amplicon, SEQ ID NO: 1393), Ubiquitin (GenBank Accession No. BC000449; amplicon - Ubiquitin-amplicon, SEQ ID NO: 1390) and SDHA (GenBank Accession No. NM_004168; amplicon - SDHA-amplicon, SEQ ID NO: 1369) - was measured similarly. For each RT sample, the expression of the above amplicon was normalized to the geometric mean of the quantities of the housekeeping genes. The normalized quantity of each RT sample was then divided by the median of the quantities of the liver samples (Sample Nos. 47-49 above in the normal tissues table), to obtain a value of relative expression of each sample relative to the median of the liver samples. These data are plotted in figure 39.
Forward primer HSCREACT seg8-11F (SEQ ID NO: 1422):
GAAGGCTTTTGTGTTTCCCAAA
Reverse primer HSCREACT seg8-11R (SEQ ID NO: 1423):
AGAAGTGGAGGCACACAGTGAA
Amplicon (SEQ ID NO: 1421):
GAAGGCTTTTGTGTTTCCCAAAGAGTCGGATACTTCCTATGTATCCCTCAAAGCA
CCGTTAACGAAGCCTCTCAAAGCCTTCACTGTGTGCCTCCACTTCT
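As a consistency check on primer and amplicon listings of the kind given above, one can confirm that an amplicon begins with its forward primer and ends with the reverse complement of its reverse primer. The following is a minimal sketch; the helper names are hypothetical, and it assumes plain A/C/G/T sequences with exact (mismatch-free) primer matches.

```python
# Complement table for unambiguous DNA bases only (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def primers_flank_amplicon(forward: str, reverse: str, amplicon: str) -> bool:
    """True if the amplicon starts with the forward primer and ends with the
    reverse complement of the reverse primer (exact matches only)."""
    clean = "".join(amplicon.split()).upper()  # drop line breaks and spaces
    return clean.startswith(forward.upper()) and clean.endswith(
        reverse_complement(reverse)
    )

# Sequences copied from the seg8-11 listing above.
seg8_11_forward = "GAAGGCTTTTGTGTTTCCCAAA"
seg8_11_reverse = "AGAAGTGGAGGCACACAGTGAA"
seg8_11_amplicon = (
    "GAAGGCTTTTGTGTTTCCCAAAGAGTCGGATACTTCCTATGTATCCCTCAAAGCA"
    "CCGTTAACGAAGCCTCTCAAAGCCTTCACTGTGTGCCTCCACTTCT"
)
seg8_11_ok = primers_flank_amplicon(seg8_11_forward, seg8_11_reverse, seg8_11_amplicon)
print(seg8_11_ok)
```

The same check can be applied to the other primer pairs listed in this section; a failure would point to a transcription error in either a primer or an amplicon.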
Therapeutic applications of splice variants of the present invention
Splice variants described herein (including any polynucleotide, oligonucleotide, polypeptide, peptide or fragments thereof) or antibodies that specifically bind thereto may optionally be used for therapeutic applications, for example to treat the diseases described herein with regard to diagnostic applications thereof. A "variant-treatable" disease refers to any disease that is treatable by using a splice variant of any of the therapeutic proteins according to the present invention. "Treatment" also encompasses prevention, amelioration, elimination and control of the disease and/or pathological condition. The diseases for which such variants may be useful therapeutic agents are described in greater detail below for each of the variants. The variants themselves are described by "cluster" or by gene, as these variants are splice variants of known proteins. Therefore, a "cluster-related disease" or a "variant-related disease" refers to a disease that may be treated by a particular protein which, with regard to the description of such diseases below, is a therapeutic protein variant according to the present invention. The term "biologically active", as used herein, refers to a protein having structural, regulatory, or biochemical functions of a naturally occurring molecule. Likewise, "immunologically active" refers to the capability of the natural, recombinant, or synthetic ligand, or any oligopeptide thereof, to induce a specific immune response in appropriate animals or cells and to bind with specific antibodies. The term "modulate", as used herein, refers to a change in the activity of at least one receptor-mediated activity. For example, modulation may cause an increase or a decrease in protein activity, binding characteristics, or any other biological, functional or immunological properties of a ligand.
METHODS OF TREATMENT
As mentioned hereinabove, the novel therapeutic protein variants of the present invention and compositions derived therefrom (i.e., peptides, oligonucleotides) can be used to treat cluster-related diseases.
Thus, according to an additional aspect of the present invention there is provided a method of treating cluster-related disease in a subject. The subject according to the present invention is a mammal, preferably a human, which has at least one type of the cluster-related diseases described hereinabove. As mentioned hereinabove, the biomolecular sequences of the present invention can be used to treat subjects with the above-described diseases. The subject according to the present invention is a mammal, preferably a human, which is diagnosed with one of the diseases described hereinabove, or alternatively is predisposed to having one of the diseases described hereinabove. As used herein the term "treating" refers to preventing, curing, reversing, attenuating, alleviating, minimizing, suppressing or halting the deleterious effects of the above-described diseases. Treating, according to the present invention, can be effected by specifically upregulating or alternatively downregulating the expression of at least one of the polypeptides of the present invention in the subject. Optionally, upregulation may be effected by administering to the subject at least one of the polypeptides of the present invention (e.g., recombinant or synthetic) or an active portion thereof, as described herein. However, since the bioavailability of large polypeptides may potentially be relatively small due to high degradation rate and low penetration rate, administration of polypeptides is preferably confined to small peptide fragments (e.g., about 100 amino acids). The polypeptide or peptide may optionally be administered in a pharmaceutical composition, described in more detail below. It will be appreciated that treatment of the above-described diseases according to the present invention may be combined with other treatment methods known in the art (i.e., combination therapy).
Thus, treatment of malignancies using the agents of the present invention may be combined with, for example, radiation therapy, antibody therapy and/or chemotherapy. Alternatively or additionally, an upregulating method may optionally be effected by specifically upregulating the amount (optionally expression) in the subject of at least one of the polypeptides of the present invention or active portions thereof. As is mentioned hereinabove and in the Examples section which follows, the biomolecular sequences of this aspect of the present invention may be used as valuable therapeutic tools in the treatment of diseases in which altered activity or expression of the wild-type gene product is known to contribute to disease onset or progression. For example, in case a disease is caused by overexpression of a membrane-bound receptor, a soluble variant thereof may be used as an antagonist which competes with the receptor for binding the ligand, to thereby terminate signaling from the receptor. Examples of such diseases are listed in the Examples section which follows. It will be appreciated that the polypeptides of the present invention may also have agonistic properties. These include increasing the stability of the ligand (e.g., IL-4), protection from proteolysis and modification of the pharmacokinetic properties of the ligand (i.e., increasing the half-life of the ligand, while decreasing the clearance thereof). As such, the biomolecular sequences of this aspect of the present invention may be used to treat conditions or diseases in which the wild-type gene product plays a favorable role, for example, increasing angiogenesis in cases of diabetes or ischemia. Upregulating expression of the therapeutic protein variants of the present invention may be effected via the administration of at least one of the exogenous polynucleotide sequences of the present invention, ligated into a nucleic acid expression construct designed for expression of coding sequences in eukaryotic cells (e.g., mammalian cells), as described above. Accordingly, the exogenous polynucleotide sequence may be a DNA or RNA sequence encoding the variants of the present invention or active portions thereof. It will be appreciated that the nucleic acid construct can be administered to the individual employing any suitable mode of administration, described hereinbelow (i.e., in-vivo gene therapy). Alternatively, the nucleic acid construct is introduced into a suitable cell via an appropriate gene delivery vehicle/method (transfection, transduction, homologous recombination, etc.)
and an expression system as needed and then the modified cells are expanded in culture and returned to the individual (i.e., ex-vivo gene therapy). Nucleic acid constructs are described in greater detail above. It will be appreciated that the present methodology may also be effected by specifically upregulating the expression of the variants of the present invention endogenously in the subject. Agents for upregulating endogenous expression of specific splice variants of a given gene include antisense oligonucleotides, which are directed at splice sites of interest, thereby altering the splicing pattern of the gene. This approach has been successfully used for shifting the balance of expression of the two isoforms of Bcl-x [Taylor (1999) Nat. Biotechnol. 17:1097-1100; and Mercatante (2001) J. Biol. Chem. 276:16411-16417]; IL-5R [Karras (2000) Mol. Pharmacol. 58:380-387]; and c-myc [Giles (1999) Antisense Nucleic Acid Drug Dev. 9:213-220]. For example, interleukin 5 and its receptor play a critical role as regulators of hematopoiesis and as mediators in some inflammatory diseases such as allergy and asthma. Two alternatively spliced isoforms are generated from the IL-5R gene, which include (i.e., long form) or exclude (i.e., short form) exon 9. The long form encodes the intact membrane-bound receptor, while the shorter form encodes a secreted soluble nonfunctional receptor. Using 2'-O-MOE-oligonucleotides specific to regions of exon 9, Karras and co-workers (supra) were able to significantly decrease the expression of the wild type receptor and increase the expression of the shorter isoforms. Design and synthesis of oligonucleotides which can be used according to the present invention are described hereinbelow and by Sazani and Kole (2003) Progress in Molecular and Subcellular Biology 31:217-239. Upregulating expression of the polypeptides of the present invention in a subject may be effected via the administration of at least one of the exogenous polynucleotide sequences of the present invention (e.g., SEQ ID NOs: 3, 7, 11, 15, 19, 23, 27, 31, 35, 39 or 43) ligated into a nucleic acid expression construct designed for expression of coding sequences in eukaryotic cells (e.g., mammalian cells). Accordingly, the exogenous polynucleotide sequence may be a DNA or RNA sequence encoding the variants of the present invention or active portions thereof. It will be appreciated that the nucleic acid construct can be administered to the individual employing any suitable mode of administration, described hereinbelow (i.e., in-vivo gene therapy). Alternatively, the nucleic acid construct is introduced into a suitable cell via an appropriate gene delivery vehicle/method (transfection, transduction, homologous recombination, etc.)
and an expression system as needed and then the modified cells are expanded in culture and returned to the individual (i.e., ex-vivo gene therapy). Preferably, the promoter utilized by the nucleic acid construct of the present invention is active in the specific cell population transformed. Examples of cell type-specific and/or tissue-specific promoters include the albumin promoter, which is liver specific [Pinkert et al., (1987) Genes Dev. 1:268-277], lymphoid-specific promoters [Calame et al., (1988) Adv. Immunol. 43:235-275], in particular promoters of T-cell receptors [Winoto et al., (1989) EMBO J. 8:729-733] and immunoglobulins [Banerji et al. (1983) Cell 33:729-740], neuron-specific promoters such as the neurofilament promoter [Byrne et al. (1989) Proc. Natl. Acad. Sci. USA 86:5473-5477], pancreas-specific promoters [Edlund et al. (1985) Science 230:912-916] or mammary gland-specific promoters such as the milk whey promoter (U.S. Pat. No. 4,873,316 and European Patent Application No. EP 264,166). Examples of suitable constructs include, but are not limited to, pcDNA3, pcDNA3.1 (+/-), pGL3, PzeoSV2 (+/-), pDisplay, pEF/myc/cyto and pCMV/myc/cyto, each of which is commercially available from Invitrogen Co. (www.invitrogen.com). Examples of retroviral vector and packaging systems are those sold by Clontech, San Diego, Calif., including Retro-X vectors pLNCX and pLXSN, which permit cloning into multiple cloning sites and in which the transgene is transcribed from the CMV promoter. Vectors derived from Mo-MuLV are also included, such as pBabe, where the transgene will be transcribed from the 5' LTR promoter. Currently preferred in vivo nucleic acid transfer techniques include transfection with viral or non-viral constructs, such as adenovirus, lentivirus, Herpes simplex I virus, or adeno-associated virus (AAV) and lipid-based systems. Useful lipids for lipid-mediated transfer of the gene are, for example, DOTMA, DOPE, and DC-Chol [Tonkinson et al., Cancer Investigation, 14(1): 54-65 (1996)]. The most preferred constructs for use in gene therapy are viruses, most preferably adenoviruses, AAV, lentiviruses, or retroviruses. A viral construct such as a retroviral construct includes at least one transcriptional promoter/enhancer or locus-defining element(s), or other elements that control gene expression by other means such as alternate splicing, nuclear RNA export, or post-translational modification of messenger. Such vector constructs also include a packaging signal, long terminal repeats (LTRs) or portions thereof, and positive and negative strand primer binding sites appropriate to the virus used, unless it is already present in the viral construct.
In addition, such a construct typically includes a signal sequence for secretion of the peptide from a host cell in which it is placed. Preferably the signal sequence for this purpose is a mammalian signal sequence or the signal sequence of the polypeptide variants of the present invention. Optionally, the construct may also include a signal that directs polyadenylation, as well as one or more restriction sites and a translation termination sequence. By way of example, such constructs will typically include a 5' LTR, a tRNA binding site, a packaging signal, an origin of second-strand DNA synthesis, and a 3' LTR or a portion thereof. Other vectors can be used that are non-viral, such as cationic lipids, polylysine, and dendrimers. It will be appreciated that the present methodology may also be performed by specifically upregulating the expression of the splice variants of the present invention endogenously in the subject. Agents for upregulating endogenous expression of specific splice variants of a given gene include antisense oligonucleotides, which are directed at splice sites of interest, thereby altering the splicing pattern of the gene. This approach has been successfully used for shifting the balance of expression of the two isoforms of Bcl-x [Taylor (1999) Nat. Biotechnol. 17:1097-1100; and Mercatante (2001) J. Biol. Chem. 276:16411-16417]; IL-5R [Karras (2000) Mol. Pharmacol. 58:380-387]; and c-myc [Giles (1999) Antisense Nucleic Acid Drug Dev. 9:213-220]. For example, interleukin 5 and its receptor play a critical role as regulators of hematopoiesis and as mediators in some inflammatory diseases such as allergy and asthma. Two alternatively spliced isoforms are generated from the IL-5R gene, which include (i.e., long form) or exclude (i.e., short form) exon 9. The long form encodes the intact membrane-bound receptor, while the shorter form encodes a secreted soluble nonfunctional receptor. Using 2'-O-MOE-oligonucleotides specific to regions of exon 9, Karras and co-workers (supra) were able to significantly decrease the expression of the wild type receptor and increase the expression of the shorter isoforms. Design and synthesis of oligonucleotides which can be used according to the present invention are described hereinbelow and by Sazani and Kole (2003) Progress in Molecular and Subcellular Biology 31:217-239. Treatment can preferably be effected by agents which are capable of specifically downregulating expression (or activity) of at least one of the polypeptide variants of the present invention. Downregulating the expression of the therapeutic protein variants of the present invention may be achieved using oligonucleotide agents such as those described in greater detail below.
SiRNA molecules - Small interfering RNA (siRNA) molecules can be used to downregulate expression of the therapeutic protein variants of the present invention. RNA interference is a two-step process. In the first step, which is termed the initiation step, input dsRNA is digested into 21-23 nucleotide (nt) small interfering RNAs (siRNA), probably by the action of Dicer, a member of the RNase III family of dsRNA-specific ribonucleases, which processes (cleaves) dsRNA (introduced directly or via a transgene or a virus) in an ATP-dependent manner. Successive cleavage events degrade the RNA to 19-21 bp duplexes (siRNA), each with 2-nucleotide 3' overhangs [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002); and Bernstein Nature 409:363-366 (2001)].
In the effector step, the siRNA duplexes bind to a nuclease complex to form the RNA-induced silencing complex (RISC). An ATP-dependent unwinding of the siRNA duplex is required for activation of the RISC. The active RISC then targets the homologous transcript by base pairing interactions and cleaves the mRNA into 12 nucleotide fragments from the 3' terminus of the siRNA [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002); Hammond et al. Nat. Rev. Gen. 2:110-119 (2001); and Sharp Genes. Dev. 15:485-90 (2001)]. Although the mechanism of cleavage is still to be elucidated, research indicates that each RISC contains a single siRNA and an RNase [Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002)]. Because of the remarkable potency of RNAi, an amplification step within the RNAi pathway has been suggested. Amplification could occur by copying of the input dsRNAs, which would generate more siRNAs, or by replication of the siRNAs formed. Alternatively or additionally, amplification could be effected by multiple turnover events of the RISC [Hammond et al. Nat. Rev. Gen. 2:110-119 (2001); Sharp Genes. Dev. 15:485-90 (2001); Hutvagner and Zamore Curr. Opin. Genetics and Development 12:225-232 (2002)]. For more information on RNAi see the following reviews: Tuschl ChemBiochem. 2:239-245 (2001); Cullen Nat. Immunol. 3:597-599 (2002); and Brantl Biochem. Biophys. Act. 1575:15-25 (2002). Synthesis of RNAi molecules suitable for use with the present invention can be effected as follows. First, the mRNA sequence is scanned downstream of the AUG start codon for AA dinucleotide sequences. Occurrence of each AA and the 3' adjacent 19 nucleotides is recorded as a potential siRNA target site. Preferably, siRNA target sites are selected from the open reading frame, as untranslated regions (UTRs) are richer in regulatory protein binding sites. UTR-binding proteins and/or translation initiation complexes may interfere with binding of the siRNA endonuclease complex [Tuschl ChemBiochem. 2:239-245]. It will be appreciated though, that siRNAs directed at untranslated regions may also be effective, as demonstrated for GAPDH wherein siRNA directed at the 5' UTR mediated about 90 % decrease in cellular GAPDH mRNA and completely abolished protein level (www.ambion.com/techlib/tn/91/912.html). Second, potential target sites are compared to an appropriate genomic database (e.g., human, mouse, rat etc.) using any sequence alignment software, such as the BLAST software available from the NCBI server (www.ncbi.nlm.nih.gov/BLAST/).
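The first selection step described above, scanning downstream of the AUG start codon for AA dinucleotides and recording each AA together with its 3' adjacent 19 nucleotides, can be illustrated with a short Python sketch. The function name is hypothetical, the first AUG in the input is assumed to be the start codon, and the optional G/C cut-off (default 55 %, computed over the 19 nucleotides following the AA) is an illustrative parameter reflecting the stated preference for low G/C content. The database comparison step (e.g., BLAST) is not covered.

```python
def sirna_candidate_sites(mrna: str, max_gc: float = 0.55):
    """Return (position, 21-nt target, GC fraction) for each AA + 19 nt window
    found downstream of the first AUG (assumed here to be the start codon)."""
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA notation
    start = seq.find("AUG")
    if start < 0:
        return []
    sites = []
    # Scan from the first base after the start codon; the window needs 21 nt.
    for i in range(start + 3, len(seq) - 20):
        if seq[i:i + 2] == "AA":
            target = seq[i:i + 21]
            following = target[2:]  # the 19 nt 3' of the AA dinucleotide
            gc = sum(base in "GC" for base in following) / len(following)
            if gc <= max_gc:
                sites.append((i, target, gc))
    return sites

# Toy transcript: AUG, then one AA dinucleotide followed by 19 nucleotides.
toy_mrna = "CCAUG" + "AA" + "GCUAGCUAGCUAGCUAGCU"
toy_sites = sirna_candidate_sites(toy_mrna)
```

Candidate sites returned this way would still need to be screened against a genomic database for unintended homology before being used as siRNA templates.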
Putative target sites which exhibit significant homology to other coding sequences are filtered out. Qualifying target sequences are selected as templates for siRNA synthesis. Preferred sequences are those including low G/C content, as these have proven to be more effective in mediating gene silencing as compared to those with G/C content higher than 55 %. Several target sites are preferably selected along the length of the target gene for evaluation. Target sites are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated. For better evaluation of the selected siRNAs, a negative control is preferably used in conjunction. Negative control siRNAs preferably include the same nucleotide composition as the siRNAs but lack significant homology to the genome. Thus, a scrambled nucleotide sequence of the siRNA is preferably used, provided it does not display any significant homology to any other gene.
DNAzyme molecules - Another agent capable of downregulating expression of the polypeptides of the present invention is a DNAzyme molecule capable of specifically cleaving an mRNA transcript or DNA sequence of the polynucleotides of the present invention. DNAzymes are single-stranded polynucleotides which are capable of cleaving both single and double stranded target sequences (Breaker, R.R. and Joyce, G. Chemistry and Biology 1995;2:655; Santoro, S.W. & Joyce, G.F. Proc. Natl. Acad. Sci. USA 1997;94:4262). A general model (the "10-23" model) for the DNAzyme has been proposed. "10-23" DNAzymes have a catalytic domain of 15 deoxyribonucleotides, flanked by two substrate-recognition domains of seven to nine deoxyribonucleotides each. This type of DNAzyme can effectively cleave its substrate RNA at purine:pyrimidine junctions (Santoro, S.W. & Joyce, G.F. Proc. Natl. Acad. Sci. USA 199; for a review of DNAzymes see Khachigian, LM [Curr Opin Mol Ther 4:119-21 (2002)]).
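To illustrate the "10-23" model described above, the following Python sketch lists purine:pyrimidine junctions in a target RNA that have enough flanking sequence on each side for a substrate-recognition domain of seven to nine nucleotides. It is a rough illustration under stated assumptions: the function name is hypothetical, the exact pairing register of the recognition arms relative to the junction is not specified in the text and is simplified here to the windows immediately 5' of the purine and starting at the pyrimidine, and no catalytic-domain sequence is assembled.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CU")

def dnazyme_candidate_sites(rna: str, arm_length: int = 8):
    """Return (purine_index, 5' flank, 3' flank) for each purine:pyrimidine
    junction with arm_length nucleotides available on both sides."""
    if not 7 <= arm_length <= 9:
        raise ValueError("recognition arms are described as 7 to 9 nucleotides")
    seq = rna.upper().replace("T", "U")  # accept DNA or RNA notation
    sites = []
    for i in range(arm_length, len(seq) - arm_length):
        if seq[i] in PURINES and seq[i + 1] in PYRIMIDINES:
            five_prime = seq[i - arm_length:i]           # target bases 5' of the purine
            three_prime = seq[i + 1:i + 1 + arm_length]  # bases from the pyrimidine onward
            sites.append((i, five_prime, three_prime))
    return sites

# Toy target with a single A:C junction flanked by eight bases on each side.
toy_target = "UUUUUUUU" + "A" + "CCCCCCCC"
toy_junctions = dnazyme_candidate_sites(toy_target, arm_length=8)
```

Restricting the input to the unique nucleotide sequence of a given variant would keep the candidate sites specific to that variant.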
Target sites for DNAzymes are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated. Examples of construction and amplification of synthetic, engineered DNAzymes recognizing single and double-stranded target cleavage sites have been disclosed in U.S. Pat. No. 6,326,174 to Joyce et al. DNAzymes of similar design directed against the human Urokinase receptor were recently observed to inhibit Urokinase receptor expression, and successfully inhibit colon cancer cell metastasis in vivo (Itoh et al., 2002, Abstract 409, Ann Meeting Am Soc Gen Ther www.asgt.org). In another application, DNAzymes complementary to bcr-abl oncogenes were successful in inhibiting the oncogene expression in leukemia cells, and lessening relapse rates in autologous bone marrow transplant in cases of CML and ALL.
Antisense molecules - Downregulation of the polynucleotides of the present invention can also be effected by using an antisense polynucleotide capable of specifically hybridizing with an mRNA transcript encoding the polypeptide variants of the present invention. The term "antisense", as used herein, refers to any composition containing nucleotide sequences which are complementary to a specific DNA or RNA sequence. The term "antisense strand" is used in reference to a nucleic acid strand that is complementary to the "sense" strand. Antisense molecules also include peptide nucleic acids and may be produced by any method including synthesis or transcription. Once introduced into a cell, the complementary nucleotides combine with natural sequences produced by the cell to form duplexes and block either transcription or translation. The designation "negative" is sometimes used in reference to the antisense strand, and "positive" is sometimes used in reference to the sense strand. Antisense oligonucleotides are also used for modulation of alternative splicing in vivo and for diagnostics in vivo and in vitro (Khelifi C. et al., 2002, Current Pharmaceutical Design 8:451-1466; Sazani, P., and Kole, R. Progress in Molecular and Cellular Biology, 2003, 31:217-239). Design of antisense molecules which can be used to efficiently downregulate expression of the polypeptides of the present invention must be effected while considering two aspects important to the antisense approach.
The first aspect is delivery of the oligonucleotide into the cytoplasm of the appropriate cells, while the second aspect is design of an oligonucleotide which specifically binds the designated mRNA within cells in a way which inhibits translation thereof. The prior art teaches a number of delivery strategies which can be used to efficiently deliver oligonucleotides into a wide variety of cell types [see, for example, Luft J Mol Med 76: 75-6 (1998); Kronenwett et al. Blood 91: 852-62 (1998); Rajur et al. Bioconjug Chem 8: 935-40 (1997); Lavigne et al. Biochem Biophys Res Commun 237: 566-71 (1997) and Aoki et al. Biochem Biophys Res Commun 231: 540-5 (1997)]. In addition, algorithms for identifying those sequences with the highest predicted binding affinity for their target mRNA based on a thermodynamic cycle that accounts for the energetics of structural alterations in both the target mRNA and the oligonucleotide are also available [see, for example, Walton et al. Biotechnol Bioeng 65: 1-9 (1999)]. Such algorithms have been successfully used to implement an antisense approach in cells. For example, the algorithm developed by Walton et al. enabled scientists to successfully design antisense oligonucleotides for rabbit beta-globin (RBG) and mouse tumor necrosis factor-alpha (TNF alpha) transcripts. The same research group has more recently reported that the antisense activity of rationally selected oligonucleotides against three model target mRNAs (human lactate dehydrogenase A and B and rat gp130) in cell culture, as evaluated by a kinetic PCR technique, proved effective in almost all cases, including tests against three different targets in two cell types with phosphodiester and phosphorothioate oligonucleotide chemistries. In addition, several approaches for designing and predicting efficiency of specific oligonucleotides using an in vitro system were also published [Matveeva et al., Nature Biotechnology 16: 1374-1375 (1998)]. Several clinical trials have demonstrated safety, feasibility and activity of antisense oligonucleotides. For example, antisense oligonucleotides suitable for the treatment of cancer have been successfully used [Holmund et al., Curr Opin Mol Ther 1:372-85 (1999)], while treatment of hematological malignancies via antisense oligonucleotides targeting c-myb gene, p53 and Bcl-2 had entered clinical trials and had been shown to be tolerated by patients [Gerwitz Curr Opin Mol Ther 1:297-306 (1999)]. More recently, antisense-mediated suppression of human heparanase gene expression has been reported to inhibit pleural dissemination of human cancer cells in a mouse model [Uno et al., Cancer Res 61:7855-60 (2001)].
Thus, the current consensus is that recent developments in the field of antisense technology which, as described above, have led to the generation of highly accurate antisense design algorithms and a wide variety of oligonucleotide delivery systems, enable an ordinarily skilled artisan to design and implement antisense approaches suitable for downregulating expression of known sequences without having to resort to undue trial and error experimentation. Target sites for antisense molecules are selected from the unique nucleotide sequences of each of the polynucleotides of the present invention, such that each polynucleotide is specifically downregulated.
Ribozymes - Another agent capable of downregulating expression of the polypeptides of the present invention is a ribozyme molecule capable of specifically cleaving an mRNA transcript encoding the polypeptide variants of the present invention. Ribozymes are being increasingly used for the sequence-specific inhibition of gene expression by the cleavage of mRNAs encoding proteins of interest [Welch et al., Curr Opin Biotechnol. 9:486-96 (1998)]. The possibility of designing ribozymes to cleave any specific target RNA has rendered them valuable tools in both basic research and therapeutic applications. In the therapeutics area, ribozymes have been exploited to target viral RNAs in infectious diseases, dominant oncogenes in cancers and specific somatic mutations in genetic disorders [Welch et al., Clin Diagn Virol. 10:163-71 (1998)]. Most notably, several ribozyme gene therapy protocols for HIV patients are already in Phase 1 trials. More recently, ribozymes have been used for transgenic animal research, gene target validation and pathway elucidation. Several ribozymes are in various stages of clinical trials. ANGIOZYME was the first chemically synthesized ribozyme to be studied in human clinical trials. ANGIOZYME specifically inhibits formation of the VEGF-r (Vascular Endothelial Growth Factor receptor), a key component in the angiogenesis pathway. Ribozyme Pharmaceuticals, Inc., as well as other firms, have demonstrated the importance of anti-angiogenesis therapeutics in animal models. HEPTAZYME, a ribozyme designed to selectively destroy Hepatitis C Virus (HCV) RNA, was found effective in decreasing Hepatitis C viral RNA in cell culture assays (Ribozyme Pharmaceuticals, Incorporated - WEB home page). Alternatively, downregulation of the polypeptide variants of the present invention may be achieved at the polypeptide level using downregulating agents such as antibodies or antibody fragments capable of specifically binding the polypeptides of the present invention and inhibiting the activity thereof (i.e., neutralizing antibodies). Such antibodies can be directed, for example, to the heterodimerizing domain on the variant, or to a putative ligand binding domain. Further description of antibodies and methods of generating same is provided below.
PHARMACEUTICAL COMPOSITIONS AND DELIVERY THEREOF The present invention features a pharmaceutical composition comprising a therapeutically effective amount of a therapeutic agent according to the present invention, which is preferably a therapeutic protein variant as described herein. Optionally and alternatively, the therapeutic agent could be an antibody or an oligonucleotide that
specifically recognizes and binds to the therapeutic protein variant, but not to the corresponding full length known protein. Alternatively, the pharmaceutical composition of the present invention includes a therapeutically effective amount of at least an active portion of a therapeutic protein variant polypeptide. The pharmaceutical composition according to the present invention is preferably used for the treatment of cluster-related diseases. "Treatment" refers to both therapeutic treatment and prophylactic or preventative measures. Those in need of treatment include those already with the disorder as well as those in which the disorder is to be prevented. Hence, the mammal to be treated herein may have been diagnosed as having the disorder or may be predisposed or susceptible to the disorder. "Mammal" for purposes of treatment refers to any animal classified as a mammal, including humans, domestic and farm animals, and zoo, sports, or pet animals, such as dogs, horses, cats, cows, etc. Preferably, the mammal is human. A "disorder" is any condition that would benefit from treatment with the agent according to the present invention. This includes chronic and acute disorders or diseases including those pathological conditions which predispose the mammal to the disorder in question. Non-limiting examples of disorders to be treated herein are described with regard to specific examples given herein. The term "therapeutically effective amount" refers to an amount of agent according to the present invention that is effective to treat a disease or disorder in a mammal.
In the case of cancer, the therapeutically effective amount of the agent may reduce the number of cancer cells; reduce the tumor size; inhibit (i.e., slow to some extent and preferably stop) cancer cell infiltration into peripheral organs; inhibit (i.e., slow to some extent and preferably stop) tumor metastasis; inhibit, to some extent, tumor growth; and/or relieve to some extent one or more of the symptoms associated with the cancer. To the extent the agent may prevent growth and/or kill existing cancer cells, it may be cytostatic and/or cytotoxic. For cancer therapy, efficacy can, for example, be measured by assessing the time to disease progression (TTP) and/or determining the response rate (RR). The therapeutic agents of the present invention can be provided to the subject per se, or as part of a pharmaceutical composition where they are mixed with a pharmaceutically acceptable carrier.
As used herein a "pharmaceutical composition" refers to a preparation of one or more of the active ingredients described herein with other chemical components such as physiologically suitable carriers and excipients. The purpose of a pharmaceutical composition is to facilitate administration of a compound to an organism. Herein the term "active ingredient" refers to the preparation accountable for the biological effect. Hereinafter, the phrases "physiologically acceptable carrier" and "pharmaceutically acceptable carrier", which may be used interchangeably, refer to a carrier or a diluent that does not cause significant irritation to an organism and does not abrogate the biological activity and properties of the administered compound. An adjuvant is included under these phrases. One of the ingredients included in the pharmaceutically acceptable carrier can be, for example, polyethylene glycol (PEG), a biocompatible polymer with a wide range of solubility in both organic and aqueous media (Mutter et al. (1979)). Herein the term "excipient" refers to an inert substance added to a pharmaceutical composition to further facilitate administration of an active ingredient. Examples, without limitation, of excipients include calcium carbonate, calcium phosphate, various sugars and types of starch, cellulose derivatives, gelatin, vegetable oils and polyethylene glycols. Techniques for formulation and administration of drugs may be found in "Remington's Pharmaceutical Sciences," Mack Publishing Co., Easton, PA, latest edition, which is incorporated herein by reference. Suitable routes of administration may, for example, include oral, rectal, transmucosal, especially transnasal, intestinal or parenteral delivery, including intramuscular, subcutaneous and intramedullary injections as well as intrathecal, direct intraventricular, intravenous, intraperitoneal, intranasal, or intraocular injections.
Alternately, one may administer a preparation in a local rather than systemic manner, for example, via injection of the preparation directly into a specific region of a patient's body. Pharmaceutical compositions of the present invention may be manufactured by processes well known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping or lyophilizing processes. Pharmaceutical compositions for use in accordance with the present invention may be formulated in conventional manner using one or more physiologically acceptable carriers comprising excipients and auxiliaries, which facilitate processing of the active ingredients
into preparations which can be used pharmaceutically. Proper formulation is dependent upon the route of administration chosen. For injection, the active ingredients of the invention may be formulated in aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's solution, or physiological salt buffer. For transmucosal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art. For oral administration, the compounds can be formulated readily by combining the active compounds with pharmaceutically acceptable carriers well known in the art. Such carriers enable the compounds of the invention to be formulated as tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for oral ingestion by a patient. Pharmacological preparations for oral use can be made using a solid excipient, optionally grinding the resulting mixture, and processing the mixture of granules, after adding suitable auxiliaries if desired, to obtain tablets or dragee cores. Suitable excipients are, in particular, fillers such as sugars, including lactose, sucrose, mannitol, or sorbitol; cellulose preparations such as, for example, maize starch, wheat starch, rice starch, potato starch, gelatin, gum tragacanth, methyl cellulose, hydroxypropylmethyl-cellulose, sodium carboxymethylcellulose; and/or physiologically acceptable polymers such as polyvinylpyrrolidone (PVP). If desired, disintegrating agents may be added, such as cross-linked polyvinyl pyrrolidone, agar, or alginic acid or a salt thereof such as sodium alginate. Dragee cores are provided with suitable coatings. For this purpose, concentrated sugar solutions may be used which may optionally contain gum arabic, talc, polyvinyl pyrrolidone, carbopol gel, polyethylene glycol, titanium dioxide, lacquer solutions and suitable organic solvents or solvent mixtures.
Dyestuffs or pigments may be added to the tablets or dragee coatings for identification or to characterize different combinations of active compound doses. Pharmaceutical compositions, which can be used orally, include push-fit capsules made of gelatin as well as soft, sealed capsules made of gelatin and a plasticizer, such as glycerol or sorbitol. The push-fit capsules may contain the active ingredients in admixture with filler such as lactose, binders such as starches, lubricants such as talc or magnesium stearate and, optionally, stabilizers. In soft capsules, the active ingredients may be dissolved or suspended in suitable liquids, such as fatty oils, liquid paraffin, or liquid polyethylene
glycols. In addition, stabilizers may be added. All formulations for oral administration should be in dosages suitable for the chosen route of administration. For buccal administration, the compositions may take the form of tablets or lozenges formulated in conventional manner. For administration by nasal inhalation, the active ingredients for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from a pressurized pack or a nebulizer with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichloro-tetrafluoroethane or carbon dioxide. In the case of a pressurized aerosol, the dosage unit may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for use in a dispenser may be formulated containing a powder mix of the compound and a suitable powder base such as lactose or starch. The preparations described herein may be formulated for parenteral administration, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampoules or in multidose containers with, optionally, an added preservative. The compositions may be suspensions, solutions or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Pharmaceutical compositions for parenteral administration include aqueous solutions of the active preparation in water-soluble form. Additionally, suspensions of the active ingredients may be prepared as appropriate oily or water based injection suspensions.
Suitable lipophilic solvents or vehicles include fatty oils such as sesame oil, or synthetic fatty acids esters such as ethyl oleate, triglycerides or liposomes. Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol or dextran. Optionally, the suspension may also contain suitable stabilizers or agents which increase the solubility of the active ingredients to allow for the preparation of highly concentrated solutions. Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., sterile, pyrogen-free water based solution, before use. The preparation of the present invention may also be formulated in rectal compositions such as suppositories or retention enemas, using, e.g., conventional suppository bases such as cocoa butter or other glycerides.
Pharmaceutical compositions suitable for use in context of the present invention include compositions wherein the active ingredients are contained in an amount effective to achieve the intended purpose. More specifically, a therapeutically effective amount means an amount of active ingredients effective to prevent, alleviate or ameliorate symptoms of disease or prolong the survival of the subject being treated. Determination of a therapeutically effective amount is well within the capability of those skilled in the art. For any preparation used in the methods of the invention, the therapeutically effective amount or dose can be estimated initially from in vitro assays. For example, a dose can be formulated in animal models and such information can be used to more accurately determine useful doses in humans. Toxicity and therapeutic efficacy of the active ingredients described herein can be determined by standard pharmaceutical procedures in vitro, in cell cultures or experimental animals. The data obtained from these in vitro and cell culture assays and animal studies can be used in formulating a range of dosage for use in human. The dosage may vary depending upon the dosage form employed and the route of administration utilized. The exact formulation, route of administration and dosage can be chosen by the individual physician in view of the patient's condition. (See e.g., Fingl, et al., 1975, in "The Pharmacological Basis of Therapeutics", Ch. 1 p. 1). Depending on the severity and responsiveness of the condition to be treated, dosing can be of a single or a plurality of administrations, with course of treatment lasting from several days to several weeks or until cure is effected or diminution of the disease state is achieved. The amount of a composition to be administered will, of course, be dependent on the subject being treated, the severity of the affliction, the manner of administration, the judgment of the prescribing physician, etc.
Compositions including the preparation of the present invention formulated in a compatible pharmaceutical carrier may also be prepared, placed in an appropriate container, and labeled for treatment of an indicated condition. Pharmaceutical compositions of the present invention may, if desired, be presented in a pack or dispenser device, such as an FDA approved kit, which may contain one or more unit dosage forms containing the active ingredient. The pack may, for example, comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be
accompanied by instructions for administration. The pack or dispenser may also be accommodated by a notice associated with the container in a form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals, which notice is reflective of approval by the agency of the form of the compositions or human or veterinary administration. Such notice, for example, may be of labeling approved by the U.S. Food and Drug Administration for prescription drugs or of an approved product insert.
IMMUNOGENIC COMPOSITIONS A therapeutic agent according to the present invention may optionally be a molecule which promotes a specific immunogenic response against at least one of the polypeptides of the present invention in the subject. The molecule can be polypeptide variants of the present invention, a fragment derived therefrom or a nucleic acid sequence encoding thereof. Although such a molecule can be provided to the subject per se, the agent is preferably administered with an immunostimulant in an immunogenic composition. An immunostimulant may be any substance that enhances or potentiates an immune response (antibody and/or cell-mediated) to an exogenous antigen. Examples of immunostimulants include adjuvants, biodegradable microspheres (e.g., polylactic galactide) and liposomes into which the compound is incorporated (see e.g., U.S. Pat. No. 4,235,877). Vaccine preparation is generally described in, for example, M. F. Powell and M. J. Newman, eds., "Vaccine Design (the subunit and adjuvant approach)," Plenum Press (NY, 1995). Illustrative immunogenic compositions may contain DNA encoding one or more of the polypeptides as described above, such that the polypeptide is generated in situ. The DNA may be present within any of a variety of delivery systems known to those of ordinary skill in the art, including nucleic acid expression systems (see below), and bacterial and viral expression systems. Numerous gene delivery techniques are well known in the art, such as those described by Rolland, Crit. Rev. Therap. Drug Carrier Systems 15:143-198, 1998, and references cited therein. Appropriate nucleic acid expression systems contain the necessary DNA sequences for expression in the subject (such as a suitable promoter and terminating signal). Bacterial delivery systems involve the administration of a bacterium (such as Bacillus Calmette-Guerin) that expresses an immunogenic portion of the polypeptide on its cell surface or secretes such an epitope.
In a preferred embodiment, the DNA may be introduced using a viral expression system (e.g., vaccinia or other pox virus, retrovirus, or adenovirus), which may involve the use of a non-pathogenic (defective), replication
competent virus. Suitable systems are disclosed, for example, in Fisher-Hoch et al., Proc. Natl. Acad. Sci. USA 86:317-321, 1989; Flexner et al., Ann. N.Y. Acad. Sci. 569:86-103, 1989; Flexner et al., Vaccine 8:17-21, 1990; U.S. Pat. Nos. 4,603,112, 4,769,330, and 5,017,487; WO 89/01973; U.S. Pat. No. 4,777,127; GB 2,200,651; EP 0,345,242; WO 91/02805; Berkner, Biotechniques 6:616-627, 1988; Rosenfeld et al., Science 252:431-434, 1991; Kolls et al., Proc. Natl. Acad. Sci. USA 91:215-219, 1994; Kass-Eisler et al., Proc. Natl. Acad. Sci. USA 90:11498-11502, 1993; Guzman et al., Circulation 88:2838-2848, 1993; and Guzman et al., Circ. Res. 73:1202-1207, 1993. Techniques for incorporating DNA into such expression systems are well known to those of ordinary skill in the art. The DNA may also be "naked," as described, for example, in Ulmer et al., Science 259:1745-1749, 1993 and reviewed by Cohen, Science 259:1691-1692, 1993. The uptake of naked DNA may be increased by coating the DNA onto biodegradable beads, which are efficiently transported into the cells. It will be appreciated that an immunogenic composition may comprise both a polynucleotide and a polypeptide component. Such immunogenic compositions may provide for an enhanced immune response. Any of a variety of immunostimulants may be employed in the immunogenic compositions of this invention. For example, an adjuvant may be included. Most adjuvants contain a substance designed to protect the antigen from rapid catabolism, such as aluminum hydroxide or mineral oil, and a stimulator of immune responses, such as lipid A, Bordetella pertussis or Mycobacterium tuberculosis derived proteins.
Suitable adjuvants are commercially available as, for example, Freund's Incomplete Adjuvant and Complete Adjuvant (Difco Laboratories, Detroit, Mich.); Merck Adjuvant 65 (Merck and Company, Inc., Rahway, N.J.); AS-2 (SmithKline Beecham, Philadelphia, Pa.); aluminum salts such as aluminum hydroxide gel (alum) or aluminum phosphate; salts of calcium, iron or zinc; an insoluble suspension of acylated tyrosine; acylated sugars; cationically or anionically derivatized polysaccharides; polyphosphazenes; biodegradable microspheres; monophosphoryl lipid A and Quil A. Cytokines, such as GM-CSF or interleukin-2, -7, or -12, may also be used as adjuvants. The adjuvant composition may be designed to induce an immune response predominantly of the Th1 type. High levels of Th1-type cytokines (e.g., IFN-γ, TNF-α, IL-2 and IL-12) tend to favor the induction of cell mediated immune responses to an administered antigen. In contrast, high levels of Th2-type cytokines (e.g., IL-4, IL-5,
IL-6 and IL-10) tend to favor the induction of humoral immune responses. Following application of an immunogenic composition as provided herein, the subject will support an immune response that includes Th1- and Th2-type responses. The levels of these cytokines may be readily assessed using standard assays. For a review of the families of cytokines, see Mosmann and Coffman, Ann. Rev. Immunol. 7:145-173, 1989. Preferred adjuvants for use in eliciting a predominantly Th1-type response include, for example, a combination of monophosphoryl lipid A, preferably 3-de-O-acylated monophosphoryl lipid A (3D-MPL), together with an aluminum salt. MPL adjuvants are available from Corixa Corporation (Seattle, Wash.; see U.S. Pat. Nos. 4,436,727; 4,877,611; 4,866,034 and 4,912,094). CpG-containing oligonucleotides (in which the CpG dinucleotide is unmethylated) also induce a predominantly Th1 response. Such oligonucleotides are well known and are described, for example, in WO 96/02555, WO 99/33488 and U.S. Pat. Nos. 6,008,200 and 5,856,462. Immunostimulatory DNA sequences are also described, for example, by Sato et al., Science 273:352, 1996. Another preferred adjuvant is a saponin, preferably QS21 (Aquila Biopharmaceuticals Inc., Framingham, Mass.), which may be used alone or in combination with other adjuvants. For example, an enhanced system involves the combination of a monophosphoryl lipid A and saponin derivative, such as the combination of QS21 and 3D-MPL as described in WO 94/00153, or a less reactogenic composition where the QS21 is quenched with cholesterol, as described in WO 96/33739. Other preferred formulations comprise an oil-in-water emulsion and tocopherol. A particularly potent adjuvant formulation involving QS21, 3D-MPL and tocopherol in an oil-in-water emulsion is described in WO 95/17210.
Other preferred adjuvants include Montanide ISA 720 (Seppic, France), SAF (Chiron, Calif., United States), ISCOMS (CSL), MF-59 (Chiron), the SBAS series of adjuvants (e.g., SBAS-2 or SBAS-4, available from SmithKline Beecham, Rixensart,
Belgium), Detox (Corixa, Hamilton, Mont.), RC-529 (Corixa, Hamilton, Mont.) and other aminoalkyl glucosaminide 4-phosphates (AGPs), such as those described in pending U.S. patent application Ser. Nos. 08/853,826 and 09/074,720. A delivery vehicle may be employed within the immunogenic composition of the present invention to facilitate production of an antigen-specific immune response that targets tumor cells. Delivery vehicles include antigen presenting cells (APCs), such as dendritic cells, macrophages, B cells, monocytes and other cells that may be engineered to be efficient APCs. Such cells may be genetically modified to increase the capacity for presenting the
antigen, to improve activation and/or maintenance of the T cell response, to have anti-tumor effects per se and/or to be immunologically compatible with the receiver (i.e., matched HLA haplotype). APCs may generally be isolated from any of a variety of biological fluids and organs, including tumor and peritumoral tissues, and may be autologous, allogeneic, syngeneic or xenogeneic cells. Dendritic cells are highly potent APCs (Banchereau and Steinman, Nature 392:245-251, 1998) and have been shown to be effective as a physiological adjuvant for eliciting prophylactic or therapeutic antitumor immunity (see Timmerman and Levy, Ann. Rev. Med. 50:507-529, 1999). In general, dendritic cells may be identified based on their typical shape (stellate in situ, with marked cytoplasmic processes (dendrites) visible in vitro), their ability to take up, process and present antigens with high efficiency and their ability to activate naive T cell responses. Dendritic cells may, of course, be engineered to express specific cell-surface receptors or ligands that are not commonly found on dendritic cells in vivo or ex vivo, and such modified dendritic cells are contemplated by the present invention. As an alternative to dendritic cells, vesicles secreted by antigen-loaded dendritic cells (called exosomes) may be used within an immunogenic composition (see Zitvogel et al., Nature Med. 4:594-600, 1998). Dendritic cells and progenitors may be obtained from peripheral blood, bone marrow, tumor-infiltrating cells, peritumoral tissues-infiltrating cells, lymph nodes, spleen, skin, umbilical cord blood or any other suitable tissue or fluid. For example, dendritic cells may be differentiated ex vivo by adding a combination of cytokines such as GM-CSF, IL-4, IL-13 and/or TNF-α to cultures of monocytes harvested from peripheral blood.
Alternatively, CD34 positive cells harvested from peripheral blood, umbilical cord blood or bone marrow may be differentiated into dendritic cells by adding to the culture medium combinations of GM-CSF, IL-3, TNF-α, CD40 ligand, LPS, flt3 ligand and/or other compound(s) that induce differentiation, maturation and proliferation of dendritic cells. Dendritic cells are categorized as "immature" and "mature" cells, which allows a simple way to discriminate between two well characterized phenotypes. Immature dendritic cells are characterized as APCs with a high capacity for antigen uptake and processing, which correlates with the high expression of Fcγ receptor and mannose receptor. The mature phenotype is typically characterized by a lower expression of these markers, but a high expression of cell surface molecules responsible for T cell activation such as class I and class
II MHC, adhesion molecules (e.g., CD54 and CD11) and costimulatory molecules (e.g., CD40, CD80, CD86 and 4-1BB). APCs may generally be transfected with at least one polynucleotide encoding a polypeptide of the present invention, such that variant II, or an immunogenic portion thereof, is expressed on the cell surface. Such transfection may take place ex vivo, and a composition comprising such transfected cells may then be used for therapeutic purposes, as described herein. Alternatively, a gene delivery vehicle that targets a dendritic or other antigen presenting cell may be administered to the subject, resulting in transfection that occurs in vivo. In vivo and ex vivo transfection of dendritic cells, for example, may generally be performed using any methods known in the art, such as those described in WO 97/24447, or the gene gun approach described by Mahvi et al., Immunology and Cell Biology 75:456-460, 1997. Antigen loading of dendritic cells may be achieved by incubating dendritic cells or progenitor cells with a polypeptide of the present invention, DNA (naked or within a plasmid vector) or RNA; or with antigen-expressing recombinant bacteria or viruses (e.g., vaccinia, fowlpox, adenovirus or lentivirus vectors). Prior to loading, the polypeptide may be covalently conjugated to an immunological partner that provides T cell help (e.g., a carrier molecule) such as described above. Alternatively, a dendritic cell may be pulsed with a non-conjugated immunological partner, separately or in the presence of the polypeptide. It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any
reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.