DOI:10.1145/2003351.2003352
research-article

Algorithm for low-variance biclusters to identify coregulation modules in sequencing datasets

Published: 21 August 2011

Abstract

High-throughput sequencing (ChIP-Seq) data capture binding events along with their possible binding locations and strengths; the locations of the resulting peaks must then be interpreted. Recent methods tend to summarize all ChIP-Seq peaks detected within a limited region upstream and downstream of each gene into a single real-valued score that quantifies the probability of regulation in that region. Applying subspace clustering (or biclustering) techniques to these scores can reveal important knowledge such as potential co-regulation or co-factor mechanisms. Ideal biclusters should contain subsets of genes and transcription factors (TFs) such that the cell values in each bicluster are distributed around a mean value with low variance. Such biclusters indicate TF sets that regulate gene sets with nearly the same probability values. However, most existing biclustering algorithms can neither enforce variance as a strict limit on the values contained in a bicluster nor use variance as the guiding metric while searching for desirable biclusters. This paper presents an algorithm that uses search spaces defined by lattices containing all overlapping biclusters, with a bound on variance values as the guiding metric. The algorithm is shown to be an efficient and effective method for discovering possibly overlapping biclusters under predefined variance bounds. We present the algorithm and its results on synthetic, ChIP-Seq, and motif datasets, and compare them with the results obtained by other algorithms to demonstrate its power and effectiveness.
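
To make the notion of a variance-bounded bicluster concrete, the following is a minimal Python sketch. It is not the lattice-based algorithm described in the abstract: it enumerates every gene subset and TF subset by brute force and keeps those whose selected cells have a population variance within a given bound, which is only feasible for tiny matrices. The function names, the `max_variance` parameter, the minimum-size parameters, and the score matrix are all hypothetical and chosen for illustration.

```python
from itertools import combinations
from statistics import pvariance


def bicluster_variance(scores, genes, tfs):
    """Population variance of the cells selected by the gene rows and TF columns."""
    cells = [scores[g][t] for g in genes for t in tfs]
    return pvariance(cells)


def low_variance_biclusters(scores, max_variance, min_genes=2, min_tfs=2):
    """Brute-force list of (genes, tfs) pairs whose cells satisfy the variance bound.

    Exponential in the matrix size; an illustration only, not the
    lattice-guided search that the paper proposes.
    """
    n_genes, n_tfs = len(scores), len(scores[0])
    found = []
    for g_size in range(min_genes, n_genes + 1):
        for genes in combinations(range(n_genes), g_size):
            for t_size in range(min_tfs, n_tfs + 1):
                for tfs in combinations(range(n_tfs), t_size):
                    if bicluster_variance(scores, genes, tfs) <= max_variance:
                        found.append((genes, tfs))
    return found


# Hypothetical regulation scores: rows are genes, columns are TFs.
scores = [
    [0.90, 0.88, 0.10],
    [0.91, 0.89, 0.75],
    [0.20, 0.85, 0.40],
]

hits = low_variance_biclusters(scores, max_variance=0.001)
for genes, tfs in hits:
    print("genes", genes, "TFs", tfs, "variance", bicluster_variance(scores, genes, tfs))
```

With this toy matrix, the only bicluster of at least two genes and two TFs that meets the bound is genes 0 and 1 with TFs 0 and 1, whose four scores all lie close to 0.9. Because overlapping biclusters are allowed, a larger matrix could yield many nested or overlapping hits, which is why a guided search over a lattice of candidates is attractive compared with exhaustive enumeration.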

Published In

BIOKDD '11: Proceedings of the Tenth International Workshop on Data Mining in Bioinformatics
August 2011
47 pages
ISBN:9781450308397
DOI:10.1145/2003351
  • General Chairs: Mohammed Zaki, Jake Chen
  • Program Chairs: Mohammad Al Hasan, Jun (Luke) Huan
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '11

Acceptance Rates

Overall Acceptance Rate 7 of 16 submissions, 44%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 172
  • Downloads (Last 12 months): 1
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 09 Nov 2024
