Abstract
The development of a high-throughput laboratory technology known as DNA microarrays has made it feasible for scientists to monitor the transcriptional activity of all known genes in many living organisms. Such assays are typically repeated along a time course or across a series of predefined experimental conditions, yielding a set of expression profiles. Arranging these profiles into subsets based on their pairwise similarity is known as clustering. Clusters of genes exhibiting similar expression behavior are often related in a biologically meaningful way, which makes them a central interest of research in functional genomics.
We present a distributed, parallel system based on spectral graph theory and numerical linear algebra that can solve this problem for datasets generated by the latest generation of microarrays, even at high levels of experimental noise. It allows us to process hundreds of thousands of expression profiles, thereby vastly increasing the current size limit for unsupervised clustering with full similarity information.
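To make the idea of spectral clustering on full pairwise similarity information concrete, the following is a minimal single-machine sketch and not the distributed system described in this chapter. The abstract does not specify the similarity measure, the matrix, or the eigensolver, so every choice here is an assumption made for illustration: Pearson correlation mapped to [0, 1] as similarity, the normalized affinity matrix D^(-1/2) W D^(-1/2), power iteration with deflation to approximate its second eigenvector, and a two-way split by sign. The function names `pearson`, `similarity_matrix`, and `spectral_bipartition` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def similarity_matrix(profiles):
    """Full pairwise similarity; assumed mapping w = (1 + r) / 2, zero diagonal."""
    n = len(profiles)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = (1.0 + pearson(profiles[i], profiles[j])) / 2.0
    return w

def spectral_bipartition(w, iterations=200):
    """Split the similarity graph in two using the second eigenvector of
    M = D^(-1/2) W D^(-1/2), approximated by power iteration."""
    n = len(w)
    deg = [max(sum(row), 1e-12) for row in w]  # guard against isolated nodes
    m = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # The leading eigenvector of M is known in closed form: sqrt(deg), normalized.
    norm = math.sqrt(sum(deg))
    u1 = [math.sqrt(d) / norm for d in deg]
    x = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        # Deflate: remove the component along the leading eigenvector.
        dot = sum(a * b for a, b in zip(x, u1))
        x = [a - dot * b for a, b in zip(x, u1)]
        # Apply (M + I) / 2 so that all eigenvalues are non-negative.
        x = [0.5 * (sum(m[i][j] * x[j] for j in range(n)) + x[i]) for i in range(n)]
        length = math.sqrt(sum(a * a for a in x))
        if length == 0.0:
            break
        x = [a / length for a in x]
    # Rescale by D^(-1/2) and split by sign; which side gets label 0 is arbitrary.
    return [0 if x[i] / math.sqrt(deg[i]) < 0 else 1 for i in range(n)]

# Made-up toy data: three rising and three falling expression profiles.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.2, 2.9, 4.1, 5.2],
    [0.9, 1.8, 3.2, 3.9, 4.8],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.2, 3.9, 3.1, 2.2, 0.8],
    [4.9, 4.1, 2.8, 1.9, 1.2],
]
labels = spectral_bipartition(similarity_matrix(profiles))
print(labels)
```

The dense similarity matrix needs memory quadratic in the number of profiles, which is why a sketch like this does not scale to hundreds of thousands of profiles on one machine.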
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ernst, J. (2006). A Distributed, Parallel System for Large-Scale Structure Recognition in Gene Expression Data. In: Gerndt, M., Kranzlmüller, D. (eds) High Performance Computing and Communications. HPCC 2006. Lecture Notes in Computer Science, vol 4208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847366_3
DOI: https://doi.org/10.1007/11847366_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39368-9
Online ISBN: 978-3-540-39372-6
eBook Packages: Computer Science (R0)