Abstract
The advent of cDNA microarrays has made it possible, for the first time, to obtain a global understanding of biological processes in living organisms through simultaneous readouts of tens of thousands of genes. Initial experiments suggest that genes with similar function have similar expression patterns in microarray experiments. Until now, most approaches to the computational analysis of gene expression data have used unsupervised learning. Although unsupervised methods may be sufficient in some cases, the biological processes are so complex that purely syntactical analyses are unlikely to exploit the full richness of the microarray data. In addition, it seems natural to re-use existing biological (background) knowledge. In this paper, we present some elements of a methodology for knowledge discovery from microarray experiments. Two sources of bio-medical knowledge are used: Ashburner's gene ontology and our own literature-derived network of gene-gene relations, obtained by analysing Medline citation records. Predictive models can be induced, their classification quality validated through ROC/AUC analysis, and the models then applied to provide hypotheses regarding the function of unclassified genes. The methodology has so far been tested on publicly available gene expression data, and its results have been evaluated by molecular biologists and medical researchers.
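The ROC/AUC validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `roc_auc` and the toy labels and scores are hypothetical, and the code only assumes a classifier that outputs one score per gene for a single functional class. The AUC is computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for genes known to belong to the class, 0 otherwise.
    scores: classifier outputs, higher meaning "more likely in the class".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six genes, three annotated to the class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values well above 0.5 on held-out genes would support using the model to propose functions for unclassified genes.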
Keywords
- Knowledge Discovery
- Squalene Epoxidase
- Computational Molecular Biology
- Norwegian Radium Hospital
- Universal Academy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Komorowski, J. et al. (2000). Towards Knowledge Discovery from cDNA Microarray Gene Expression Data. In: Zighed, D.A., Komorowski, J., Żytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2000. Lecture Notes in Computer Science, vol 1910. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45372-5_53
DOI: https://doi.org/10.1007/3-540-45372-5_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41066-9
Online ISBN: 978-3-540-45372-7
eBook Packages: Springer Book Archive