Abstract
The rapid advancement of DNA microarray technology has revolutionized genetic research in bioscience. Because this technology generates an enormous amount of gene expression data, computer processing and analysis of the data have become indispensable. In this paper, we present a computational framework for the extraction, analysis and visualization of gene expression data from microarray experiments. A novel, fully automated spot segmentation algorithm for DNA microarray images, which makes use of adaptive thresholding, morphological processing and statistical intensity modeling, is proposed to: (i) segment the blocks of spots, (ii) generate the grid structure, and (iii) segment the spot within each subregion. For data analysis, we propose a binary hierarchical clustering (BHC) framework for the clustering of gene expression data. The BHC algorithm involves two major steps. First, the fuzzy C-means algorithm and the average linkage hierarchical clustering algorithm are used to split the data into two classes. Second, Fisher linear discriminant analysis is applied to the two classes to assess whether the split is acceptable. The BHC algorithm is applied to the sub-classes recursively and terminates when no cluster can be split any further. BHC does not require the number of clusters to be known in advance, and it makes no assumption about the number of samples in each cluster or about the class distribution. The hierarchical framework naturally leads to a tree structure representation for effective visualization of gene expression data.
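The recursive split-and-test structure of BHC can be illustrated with a minimal Python sketch. It is not the authors' implementation and simplifies both steps: the split uses fuzzy C-means alone (the average linkage step is omitted), and the acceptance test is a Fisher-style ratio computed along the line joining the two class means rather than a full Fisher linear discriminant analysis. The function names, the acceptance threshold of 10.0 and the minimum cluster size of 4 are hypothetical choices made only for illustration.

```python
import math
import random


def fuzzy_c_means_2(points, m=2.0, iters=100, seed=0):
    """Fuzzy C-means with C = 2; returns a hard label (0 or 1) per point."""
    rng = random.Random(seed)
    dim = len(points[0])
    centres = [list(p) for p in rng.sample(points, 2)]  # two data points as seeds
    u = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        u = []
        for p in points:
            inv = [max(math.dist(p, c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centres]
            u.append([v / sum(inv) for v in inv])
        # Centre update: mean of the points weighted by u_ik^m.
        new_centres = []
        for k in range(2):
            wts = [row[k] ** m for row in u]
            total = sum(wts)
            new_centres.append(
                [sum(w * p[j] for w, p in zip(wts, points)) / total for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < 1e-9:
            break
    return [0 if row[0] >= row[1] else 1 for row in u]


def _variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def fisher_score(a, b):
    """Simplified Fisher ratio: squared gap between the projected class means
    divided by the summed projected variances. ASSUMPTION: the projection
    direction is the line joining the two means, not the full LDA direction."""
    dim = len(a[0])
    mean_a = [sum(p[j] for p in a) / len(a) for j in range(dim)]
    mean_b = [sum(p[j] for p in b) / len(b) for j in range(dim)]
    direction = [x - y for x, y in zip(mean_a, mean_b)]
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0.0:
        return 0.0
    w = [x / norm for x in direction]
    proj_a = [sum(wj * pj for wj, pj in zip(w, p)) for p in a]
    proj_b = [sum(wj * pj for wj, pj in zip(w, p)) for p in b]
    gap = (sum(proj_a) / len(proj_a) - sum(proj_b) / len(proj_b)) ** 2
    return gap / (_variance(proj_a) + _variance(proj_b) + 1e-12)


def bhc(points, threshold=10.0, min_size=4):
    """Recursive binary split. A leaf is a list of points; an internal node
    is a (left, right) tuple. threshold and min_size are hypothetical."""
    points = list(points)
    if len(points) < min_size:
        return points
    labels = fuzzy_c_means_2(points)
    left = [p for p, lab in zip(points, labels) if lab == 0]
    right = [p for p, lab in zip(points, labels) if lab == 1]
    if len(left) < 2 or len(right) < 2:
        return points  # degenerate split: keep the cluster whole
    if fisher_score(left, right) < threshold:
        return points  # split rejected: the two halves are not well separated
    return (bhc(left, threshold, min_size), bhc(right, threshold, min_size))


def leaves(tree):
    """Flatten the binary tree into its list of leaf clusters."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]
```

Calling `leaves(bhc(data))` on a list of equal-length numeric tuples returns the final clusters, while the nested tuples returned by `bhc` mirror the tree representation mentioned in the abstract. The number of clusters is never passed in; it emerges from where the separability test stops accepting splits, so the result depends on the chosen threshold.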
Cite this article
Liew, A.WC., Szeto, L.K., Tang, SS. et al. A Computational Approach to Gene Expression Data Extraction and Analysis. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 38, 237–258 (2004). https://doi.org/10.1023/B:VLSI.0000042490.35986.84
DOI: https://doi.org/10.1023/B:VLSI.0000042490.35986.84