Abstract
Studying alternative splicing and gene regulation in life science projects requires a clean basis: a powerful data model and a strict validation procedure that assigns levels of reliability to given gene models. One common problem of public genome databases is insufficiently organized and linked description data, which makes it difficult to study relations among the alternative isoforms of a gene that are relevant for medicine and plant genome research. This is a severe obstacle to the integration of biological data and motivated us to establish a new modeling instance that we call splice template, or sTMP. Every sTMP has a unique splicing pattern, but the lengths of its first and last exons remain undefined. This makes it possible to model different gene isoforms with the same splicing pattern. Utilizing this more fine-grained data structure uncovers many cases of plurivalent mRNA-CDS relations. In the human genome, more than 3,000 extra CDSs are compatible with the categories sTMP, mRNA and CDS, exceeding the classical one-to-one relation between mRNAs and CDSs. In one case, 11 extra CDSs are compatible with one mRNA. Crosslinks between mRNAs that are derived from different sTMPs but lead to the same CDS are now accessible, as are disease-related ruptures in UTR regions. This allows discovering and validating disease- and tissue-specific differences in alternative splicing, gene expression and regulation. Another problem in public databases is an overly relaxed standard for labeling genes “confirmed by ESTs and full-length cDNAs.” We provide a pipeline that handles gene annotations from different sources, integrates them into complex gene models and assigns strict validation tags, constrained by a local low-error model for the alignments of genome annotation and transcripts.
The data structures are being implemented and made publicly available at the Plant Data Warehouse of the Bioinformatics Center Gatersleben-Halle (http://portal.bic-gh.de/sTMP).
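The splice template idea can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Transcript`, `splice_template` and `group_by_template`, the coordinate convention, and the example transcripts are all hypothetical. The sketch assumes that a splicing pattern can be represented by the chain of inner exon boundaries, so that the start of the first exon and the end of the last exon do not enter the key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: an exon is (start, end) in 1-based inclusive genomic coordinates.
Exon = Tuple[int, int]
# Assumption: a splicing pattern is the chain of (exon end, next exon start) pairs.
Template = Tuple[Tuple[int, int], ...]


@dataclass(frozen=True)
class Transcript:
    """Hypothetical mRNA record: a name plus exons sorted by start position."""
    name: str
    exons: Tuple[Exon, ...]


def splice_template(t: Transcript) -> Template:
    """Return the inner exon boundaries of a transcript.

    The start of the first exon and the end of the last exon are ignored,
    so transcripts that differ only in those two lengths share one key.
    """
    return tuple((a[1], b[0]) for a, b in zip(t.exons, t.exons[1:]))


def group_by_template(transcripts: List[Transcript]) -> Dict[Template, List[str]]:
    """Group transcript names by their splicing pattern."""
    groups: Dict[Template, List[str]] = defaultdict(list)
    for t in transcripts:
        groups[splice_template(t)].append(t.name)
    return dict(groups)


# Invented example data: mRNA_a and mRNA_b differ only in the outer exon ends,
# mRNA_c skips the middle exon and therefore has a different pattern.
m1 = Transcript("mRNA_a", ((100, 200), (300, 400), (500, 650)))
m2 = Transcript("mRNA_b", ((150, 200), (300, 400), (500, 600)))
m3 = Transcript("mRNA_c", ((100, 200), (500, 650)))

groups = group_by_template([m1, m2, m3])
for template, names in groups.items():
    print(template, names)
```

The sketch leaves out chromosome and strand, and every single-exon transcript maps to the same empty key, so a real model would need more fields. Relations to CDSs, which the abstract describes as many-to-many, are not modeled here.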
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mielordt, S., Grosse, I., Kleffe, J. (2006). Data Structures for Genome Annotation, Alternative Splicing, and Validation. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_11
DOI: https://doi.org/10.1007/11799511_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36593-8
Online ISBN: 978-3-540-36595-2
eBook Packages: Computer Science, Computer Science (R0)