Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values

Published: 01 August 2010

Abstract

Distance-based clustering algorithms can group genes that show similar expression values under multiple experimental conditions, but they cannot identify a group of genes that share a similar pattern of variation in their expression values. We previously developed the divisive correlation clustering algorithm (DCCA), which is based on the concept of correlation clustering, to address this situation. However, DCCA may also fail in certain cases. To overcome these failures, we propose a new clustering algorithm, the average correlation clustering algorithm (ACCA), which is able to produce a better clustering solution than some other methods. ACCA is able to find groups of genes that have more common transcription factors and a similar pattern of variation in their expression values. Moreover, ACCA is more efficient than DCCA in terms of execution time. Like DCCA, ACCA uses the concept of correlation clustering introduced by Bansal et al. ACCA uses the correlation matrix in such a way that each gene in a cluster has its highest average correlation with the genes in that cluster. We applied ACCA and several well-known conventional methods, including DCCA, to two artificial and nine gene expression datasets and compared the performance of the algorithms. The clustering results of ACCA were found to be more significantly relevant to the biological annotations than those of the other methods. Analysis of the results shows the superiority of ACCA over some other methods in identifying groups of genes that have more common transcription factors and a similar pattern of variation in their expression profiles.

Availability of the software: The software was developed in C and Visual Basic and runs on Microsoft Windows platforms. It may be downloaded as a zip file from http://www.isical.ac.in/~rajat and must then be installed. Two Word files included in the zip file should be consulted before installing and running the software.
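
The property stated in the abstract, that each gene should have its highest average correlation with the genes of its own cluster, can be illustrated with a small Python sketch. This is not the published ACCA procedure, whose steps are not given in the abstract. The function name `average_correlation_clustering`, the parameters `k`, `max_iter`, and `seed`, the random initialization, and the simultaneous reassignment loop are all assumptions made for illustration. The sketch also assumes that no gene has a constant expression profile, since Pearson correlation is undefined in that case.

```python
import numpy as np


def average_correlation_clustering(expr, k, max_iter=100, seed=0):
    """Illustrative sketch only; not the published ACCA procedure.

    expr: (n_genes, n_conditions) array of expression values.
    k: number of clusters (assumed here to be chosen by the user).
    Returns one integer cluster label per gene.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[0]
    corr = np.corrcoef(expr)            # Pearson correlation between gene rows
    rng = np.random.default_rng(seed)
    labels = rng.permutation(n) % k     # random start, clusters of near-equal size
    rows = np.arange(n)

    for _ in range(max_iter):
        member = np.zeros((n, k))
        member[rows, labels] = 1.0      # one-hot cluster membership
        # sums[i, c] = sum of corr[i, j] over all genes j in cluster c
        sums = corr @ member
        sizes = np.tile(member.sum(axis=0), (n, 1))
        # Leave each gene's self-correlation out of its own cluster's average
        sums[rows, labels] -= np.diag(corr)
        sizes[rows, labels] -= 1.0
        # Average correlation of every gene with every cluster
        # (-inf where a cluster has no other member to compare with)
        avg = np.divide(sums, sizes,
                        out=np.full((n, k), -np.inf), where=sizes > 0)
        new_labels = avg.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break                       # every gene already sits in its best cluster
        labels = new_labels
    return labels
```

As a usage note, consider a profile such as `[1, 2, 3, 4, 5]` and a rescaled, shifted copy of it. The two are far apart in Euclidean distance but have a Pearson correlation of 1, so a correlation-based rule like the one sketched here places them together, whereas a distance-based rule may not.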

References

[1]
Alon N, Makarychev K, Makarychev Y, Naor A. Quadratic forms on graphs. In: Proceedings of the 37th STOC; 2005. p. 634-43.
[2]
Arai, K. and Barakbah, A.R., Hierarchical k-means: an algorithm for centroids initialization for k-means. Rep Fac Sci Eng Saga Univ. v36. 25-31.
[3]
Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H. and Cherry, J., Gene ontology: tool for the unification of biology. Nat Genet. v25 i1. 25-29.
[4]
Bansal, N., Blum, A. and Chawla, S., Correlation clustering. Mach Learn. v56. 89-113.
[5]
Berriz, F.G., King, O.D., Bryant, B., Sander, C. and Roth, F.P., Characterizing gene sets with FuncAssociate. Bioinformatics. v19 i18. 2502-2504.
[6]
Bezdek, J.C., Pattern recognition with fuzzy objective function algorithms. 1981. Plenum Press, New York.
[7]
Bezdek, J.C., Ehrlich, R. and Full, W., FCM: the fuzzy c-means clustering algorithm. Comput Geosci. v10. 191-203.
[8]
Bhattacharya, A. and De, R.K., Divisive correlation clustering algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles. Bioinformatics. v24 i11. 1359-1366.
[9]
Charikar M, Guruswami V, Wirth A. Clustering with qualitative information. In: Proceedings of the 44th FOCS; 2003. p. 524-33.
[10]
Charikar M, Wirth A. Maximizing quadratic programs: extending Grothendieck's inequality. In: Proceedings of the 45th FOCS; 2004. p. 524-33.
[11]
Cheng, Y. and Church, G.M., Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol. v8. 93-103.
[12]
Cohen W, Richman J. Learning to match and cluster large high-dimensional data sets for data integration. In: Proceedings of the eighth ACM SIGKDD; 2002. p. 475-80.
[13]
Datta, S. and Datta, S., Evaluation of clustering algorithms for gene expression data. BMC Bioinform. v7. s17
[14]
Demaine, E.D., Emanuel, D., Fiat, A. and Immorlica, N., Correlation clustering in general weighted graphs. Theor Comput Sci. v361. 172-187.
[15]
Demaine ED, Immorlica N. Correlation clustering with partial information. In: Proceedings of the RANDOM-APPROX; 2003. p. 1-13.
[16]
Dembele, D. and Kastner, P., Fuzzy c-means method for clustering microarray data. Bioinformatics. v19. 973-980.
[17]
Dunn, J.C., A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern. v3. 32-57.
[18]
Everitt, B.S., Landau, S. and Leese, M., Cluster analysis. 2001. Hodder Arnold, London.
[19]
Fraley, C. and Raftery, A.E., Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc. v97. 611-631.
[20]
Getz, G., Levine, E. and Domany, E., Coupled two-way clustering analysis of gene microarray data. Proc Natl Acad Sci USA. 12079-12084.
[21]
Gibbons, F. and Roth, F., Judging the quality of gene expression-based clustering methods using gene annotation. Genome Res. v12 i10. 1574-1581.
[22]
Gordon, A.D., Classification. 1999. CRC Press, Boca Raton (FL).
[23]
Gun, A.M., Gupta, M.K. and Dasgupta, B., Fundamentals of statistics. 2005. The World Press Private Limited, Kolkata.
[24]
Gustafson, E.E. and Kessel, W.C., Fuzzy clustering with a fuzzy covariance matrix. Proc IEEE Conf Decision Control. 761-766.
[25]
Han, J. and Kamber, M., Data mining: concepts and techniques. 2001. Morgan Kaufmann, Los Altos (CA).
[26]
Hartigan, J.A., Direct clustering of a data matrix. J Am Stat Assoc. v67 i337. 123-129.
[27]
Heyer, L.J., Kruglyak, S. and Yooseph, S., Exploring expression data: Identification and analysis of coexpressed genes. Genome Res. v9. 1106-1115.
[28]
Ihmels, J., Friedlander, G., Bergmann, S., Sarig, O., Ziv, Y. and Barkai, N., Revealing modular organization in the yeast transcriptional network. Nat Genet. v31. 370-377.
[29]
Issel-Tarver, L., Christie, K., Dolinski, K., Andrada, R., Balakrishnan, R. and Ball, C., Saccharomyces genome database. Methods Enzymol. v350. 329-346.
[30]
Jain, A.K. and Dubes, R.C., Algorithms for clustering data. 1988. Prentice-Hall, Englewood Cliffs (NJ).
[31]
Jain, N.C., Indrayan, A. and Goel, L.R., Monte Carlo comparison of six hierarchical clustering methods on random data. Pattern Recogn. v19. 95-99.
[32]
Kim, D.W., Lee, K.H. and Lee, D., Detecting clusters of different geometrical shapes in microarray gene expression data. Bioinformatics. v21 i9. 1927-1934.
[33]
Kluger, Y., Basri, R., Chang, J.T. and Gerstein, M., Spectral biclustering of microarray cancer data: co-clustering genes and conditions. Genome Res. v13 i4. 703-716.
[34]
Li, K., Wang, L. and Hao, L., Comparison of cluster ensembles methods based on hierarchical clustering. Proc Int Conf Comput Intell Natl Comput. 499-502.
[35]
Loganantharaj, R., Cheepala, S. and Clifford, J., Metric for measuring the effectiveness of clustering of DNA microarray expression. BMC Bioinform. v7. s5
[36]
Lukashin, A.V. and Fuchs, R., Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters. Bioinformatics. v17. 405-414.
[37]
Maulik, U. and Bandyopadhyay, S., Genetic algorithm-based clustering technique. Pattern Recogn. v33. 1455-1465.
[38]
Press, W., Flannery, B., Teukolsky, S. and Vetterling, W., Numerical recipes - the art of scientific computing. 2003. Cambridge University Press, Cambridge.
[39]
Qin, J., Lewis, D.P. and Noble, W.S., Kernel hierarchical gene clustering from microarray expression data. Bioinformatics. v19. 2097-2104.
[40]
Reich, M., Ohm, K., Angelo, M., Tamayo, P. and Mesirov, J.P., GeneCluster 2.0: an advanced toolset for bioarray analysis. Bioinformatics. v20. 1797-1798.
[41]
Shao, J., Tanner, S.W., Thompson, N. and Cheatham, T.E., Clustering molecular dynamics trajectories: 1. Characterizing the performance of different clustering algorithms. J Chem Theory Comput. v2. 2312-2334.
[42]
Sharan, R., Maron-Katz, A. and Shamir, R., CLICK and EXPANDER: a system for clustering and visualizing gene expression data. Bioinformatics. v19. 1787-1799.
[43]
Tanay, A., Sharan, R. and Shamir, R., Discovering statistically significant biclusters in gene expression data. Bioinformatics. v18. S136-S144.
[44]
Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J. and Church, G.M., Systematic determination of genetic network architecture. Nat Genet. v22. 281-285.
[45]
Teyra, J., Paszkowski-Rogacz, M., Anders, G. and Pisabarro, M.T., SCOWLP classification: Structural comparison and analysis of protein binding regions. BMC Bioinform. v9. 9
[46]
Wang H, Wang W, Yang J, Yu PS. Clustering by pattern similarity in large data sets. In: Proceedings of the ACM SIGMOD; 2002. p. 394-405.
[47]
Xu, Y., Olman, V. and Xu, D., Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees. Bioinformatics. v18 i4. 536-545.
[48]
Yang J, Wang W, Wang H, Yu P. δ-Clusters: capturing subspace correlation in a large data set. In: Proceedings of the 18th IEEE international conference on data engineering; 2002. p. 517-28.
[49]
Yang J, Wang W, Wang H, Yu P. Enhanced biclustering on expression data. In: Proceedings of the third IEEE conference on bioinformatics and bioengineering; 2003. p. 321-27.
[50]
Zhang, Y., Luxon, B.A., Casola, A., Garofalo, R.P., Jamaluddin, M. and Brasier, A.R., Expression of respiratory syncytial virus-induced chemokine gene networks in lower airway epithelial cells revealed by cDNA microarrays. J Virol. v75 i19. 9044-9058.


        Published In

        Journal of Biomedical Informatics, Volume 43, Issue 4
        August, 2010
        195 pages

        Publisher

        Elsevier Science

        San Diego, CA, United States

        Author Tags

        1. Correlation clustering
        2. Functional enrichment
        3. P-Value
        4. Transcription factors
        5. z-Score

        Qualifiers

        • Article
