
Concepts of relative sample outlier (RSO) and weighted sample similarity (WSS) for improving performance of clustering genes: co-function and co-regulation

Published: 01 February 2015

Abstract

The performance of clustering algorithms depends largely on the chosen similarity measure, and efficiency in handling outliers is a major contributor to the success of a similarity measure. The better a similarity measure captures the similarity between genes in the presence of outliers, the better the clustering algorithm will perform in forming biologically relevant groups of genes. In this article, we discuss the problem of handling outliers with different existing similarity measures and introduce the concept of Relative Sample Outlier (RSO). We formulate a new similarity, called Weighted Sample Similarity (WSS), incorporate it into the Euclidean distance and the Pearson correlation coefficient, and then use these measures in various clustering and biclustering algorithms to group different gene expression profiles. Our results suggest that WSS improves the performance of all the considered clustering algorithms in terms of finding biologically relevant groups of genes.
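The abstract does not give the definitions of RSO or WSS, so the sketch below does not reproduce them. It only illustrates the underlying problem the article addresses: a single outlying sample can dominate the Pearson correlation and the Euclidean distance between two gene expression profiles, and per-sample weights can reduce that influence. The weighting rule (`sample_weights`, a median/MAD robust z-score with a commonly used cutoff of 3.5), the function names, and the toy profiles are all hypothetical assumptions chosen for illustration.

```python
import math
import statistics
from typing import List, Sequence


def robust_z(values: Sequence[float]) -> List[float]:
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0] * len(values)  # no spread at all: flag nothing
    # 0.6745 rescales MAD so the score is comparable to a standard z-score under normality
    return [0.6745 * (v - med) / mad for v in values]


def sample_weights(x: Sequence[float], y: Sequence[float], cutoff: float = 3.5) -> List[float]:
    """HYPOTHETICAL weighting rule, not the article's RSO/WSS definition:
    weight 0 for any sample where either gene is a robust-z outlier, weight 1 otherwise."""
    zx, zy = robust_z(x), robust_z(y)
    return [0.0 if abs(a) > cutoff or abs(b) > cutoff else 1.0 for a, b in zip(zx, zy)]


def weighted_pearson(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Pearson correlation; equal weights give the ordinary Pearson coefficient."""
    total = sum(w)
    mx = sum(wk * xk for wk, xk in zip(w, x)) / total
    my = sum(wk * yk for wk, yk in zip(w, y)) / total
    cov = sum(wk * (xk - mx) * (yk - my) for wk, xk, yk in zip(w, x, y))
    vx = sum(wk * (xk - mx) ** 2 for wk, xk in zip(w, x))
    vy = sum(wk * (yk - my) ** 2 for wk, yk in zip(w, y))
    if vx == 0 or vy == 0:
        raise ValueError("a profile has zero weighted variance")
    return cov / math.sqrt(vx * vy)


def weighted_euclidean(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance in which each sample's squared difference is scaled by its weight."""
    return math.sqrt(sum(wk * (xk - yk) ** 2 for wk, xk, yk in zip(w, x, y)))


# Toy profiles over six samples; the sixth sample is a shared spike (an outlier).
gene_a = [1.0, 2.0, 1.5, 2.5, 1.8, 10.0]
gene_b = [2.0, 1.0, 2.5, 1.5, 2.2, 12.0]
uniform = [1.0] * len(gene_a)
w = sample_weights(gene_a, gene_b)

print(round(weighted_pearson(gene_a, gene_b, uniform), 2))    # 0.97 (ordinary Pearson)
print(round(weighted_pearson(gene_a, gene_b, w), 2))          # -0.55 (outlier down-weighted)
print(round(weighted_euclidean(gene_a, gene_b, uniform), 2))  # 2.86
print(round(weighted_euclidean(gene_a, gene_b, w), 2))        # 2.04
```

On the five non-outlying samples the two genes are moderately anti-correlated, but the shared spike in the sixth sample pushes the ordinary Pearson coefficient close to +1; down-weighting that sample restores the negative value. A hard 0/1 weight is the simplest possible choice; a graded weight would soften outlying samples instead of discarding them.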


Cited By

• (2018) Discovery of deep order-preserving submatrix in DNA microarray data based on sequential pattern mining. International Journal of Data Mining and Bioinformatics, 17:3, pp. 217-237. DOI: 10.1504/IJDMB.2017.085280. Online publication date: 23-Dec-2018.


Published In

International Journal of Data Mining and Bioinformatics, Volume 11, Issue 3
February 2015
107 pages
ISSN: 1748-5673
EISSN: 1748-5681

Publisher

Inderscience Publishers
Geneva 15, Switzerland
