Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1376916.1376945acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Approximation algorithms for co-clustering

Published: 09 June 2008 Publication History

Abstract

Co-clustering is the simultaneous partitioning of the rows and columns of a matrix such that the blocks induced by the row/column partitions are good clusters. Motivated by several applications in text mining, market-basket analysis, and bioinformatics, this problem has attracted severe attention in the past few years. Unfortunately, to date, most of the algorithmic work on this problem has been heuristic in nature.
In this work we obtain the first approximation algorithms for the co-clustering problem. Our algorithms are simple and obtain constant-factor approximation solutions to the optimum. We also show that co-clustering is NP-hard, thereby complementing our algorithmic result.

References

[1]
D. Agarwal and S. Merugu. Predictive discrete latent factor models for large scale dyadic data. In Proc. of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 26--35, 2007.
[2]
V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544--562, June 2004.
[3]
M. Bǎdoiu, S. Har-Peled, and P. Indyk. Approximate clustering via core-sets. In Proc. of the 34th Annual ACM Symposium on Theory of Computing, pages 250--257, 2002.
[4]
A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha. A generalized maximum entropy approach to Bregman co-clustering and matrix approximation. Journal of Machine Learning Research, 8:1919--1986, 2007.
[5]
N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1-3):89--113, 2004.
[6]
Y. Cheng and G. M. Church. Biclustering of expression data. In Proc. of the 8th International Conference on Intelligent Systems for Molecular Biology, pages 93--103, 2000.
[7]
H. Cho, I. S. Dhillon, Y. Guan, and S. Sra. Minimum sum-squared residue co-clustering of gene expression data. In Proc. of the 4th SIAM International Conference on Data Mining. SIAM, 2004.
[8]
I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 269--274, 2001.
[9]
P. Drineas, A. M. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering large graphs via the singular value decomposition. Machine Learning, 56(1-3):9--33, 2004.
[10]
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley Interscience, 2000.
[11]
U. Feige and S. Kogan. Hardness of approximation of the balanced complete bipartite subgraph problem, 2004.
[12]
B. Gao, T. Liu, X. Zheng, Q. Cheng, and W. Ma. Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering. In Proc. of the 11th ACM Conference on Knowledge Discovery and Data Mining, pages 41--50, 2005.
[13]
S. Gollapudi, R. Kumar, and D. Sivakumar. Programmable clustering. In Proc. 25th ACM Symposium on Principles of Database Systems, pages 348--354, 2006.
[14]
J. A. Hartigan. Direct clustering of a data matrix. Journal of the American Statistical Association, 67(337):123--129, 1972.
[15]
S. Hassanpour. Computational complexity of bi-clustering. Master's thesis, University of Waterloo, 2007.
[16]
A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264--323, 1999.
[17]
K. Jain and V. V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of the ACM, 48(2):274--296, 2001.
[18]
M. Jambyu and M. O. Lebeaux. Cluster Analysis and Data Analysis. North-Holland, 1983.
[19]
J. Kleinberg. An impossibility theorem for clustering. In Advances in Neural Information Processing Systems 15, pages 446--453, 2002.
[20]
Y. Kluger, R. Basri, J. T. Chang, and M. Gerstein. Spectral biclustering of microarray data: Coclustering genes and conditions. Genome Research, 13:703--716, 2003.
[21]
A. Kumar, Y. Sabharwal, and S. Sen. A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In Proc. of the 45th IEEE Symposium on Foundations of Computer Science, pages 454--462, 2004.
[22]
S. C. Madeira and A. L. Oliveira. Biclustering algorithms for biological data analysis: A survey. IEEE Transactions on Computational Biology and Bioinformatics, 1(1):24--45, 2004.
[23]
N. Megiddo and K. J. Supowit. On the complexity of some common geometric location problems. SIAM Journal on Computing, 13(1):182--196, 1984.
[24]
N. Mishra, D. Ron, and R. Swaminathan. On finding large conjunctive clusters. In Proc. of the 16th Annual Conference on Computational Learning Theory, pages 448--462, 2003.
[25]
N. Mishra, D. Ron, and R. Swaminathan. A new conceptual clustering framework. Machine Learning, 56(1-3):115--151, 2004.
[26]
R. Peeters. The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics, 131(3):651--654, 2003.
[27]
K. Puolamäki, S. Hanhijärvi, and G. C. Garriga. An approximation ratio for biclustering. CoRR, abs/0712.2682, 2007.
[28]
R. Shamir, R. Sharan, and D. Tsur. Cluster graph modification problems. Discrete Applied Mathematics, 144(1-2):173--182, 2004.
[29]
H. Takamura and Y. Matsumoto. Co-clustering for text categorization. Information Processing Society of Japan Journal, 2003.
[30]
A. Tanay, R. Sharan, and R. Shamir. Biclustering algorithms: A survey. In E. by Srinivas Aluru, editor, In Handbook of Computational Molecular Biology. Chapman & Hall/CRC, Computer and Information Science Series, 2005.
[31]
V. V. Vazirani. Approximation Algorithms. Springer-Verlag, 2001.
[32]
J. Yang, H. Wang, W. Wang, and P. Yu. Enhanced biclustering on expression data. In Proc. of the 3rd IEEE Conference on Bioinformatics and Bioengineering, pages 321--327, 2003.
[33]
J. Yang, W. Wang, H. Wang, and P. S. Yu. delta-clusters: Capturing subspace correlation in a large data set. In Proc. of the 18th International Conference on Data Engineering, pages 517--528, 2002.
[34]
H. Zhou and D. P. Woodruff. Clustering via matrix powering. In Proc. of the 23rd ACM Symposium on Principles of Database Systems, pages 136--142, 2004.

Cited By

View all
  • (2022)Privacy in Social NetworksundefinedOnline publication date: 24-Mar-2022
  • (2018)Network Anomaly Detection Using Co-clusteringEncyclopedia of Social Network Analysis and Mining10.1007/978-1-4939-7131-2_354(1501-1516)Online publication date: 12-Jun-2018
  • (2017)Multi-manifold matrix decomposition for data co-clusteringPattern Recognition10.1016/j.patcog.2016.11.02764:C(386-398)Online publication date: 1-Apr-2017
  • Show More Cited By

Index Terms

  1. Approximation algorithms for co-clustering

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    June 2008
    330 pages
    ISBN:9781605581521
    DOI:10.1145/1376916
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 June 2008

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. approximation
    2. biclustering
    3. clustering
    4. co-clustering

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '08
    Sponsor:

    Acceptance Rates

    PODS '08 Paper Acceptance Rate 28 of 159 submissions, 18%;
    Overall Acceptance Rate 642 of 2,707 submissions, 24%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)6
    • Downloads (Last 6 weeks)3
    Reflects downloads up to 12 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Privacy in Social NetworksundefinedOnline publication date: 24-Mar-2022
    • (2018)Network Anomaly Detection Using Co-clusteringEncyclopedia of Social Network Analysis and Mining10.1007/978-1-4939-7131-2_354(1501-1516)Online publication date: 12-Jun-2018
    • (2017)Multi-manifold matrix decomposition for data co-clusteringPattern Recognition10.1016/j.patcog.2016.11.02764:C(386-398)Online publication date: 1-Apr-2017
    • (2017)Network Anomaly Detection Using Co-clusteringEncyclopedia of Social Network Analysis and Mining10.1007/978-1-4614-7163-9_354-1(1-17)Online publication date: 22-Apr-2017
    • (2016)HICCKnowledge and Information Systems10.1007/s10115-015-0823-x46:2(343-367)Online publication date: 1-Feb-2016
    • (2015)Parameter selection for nonnegative l 1 matrix/tensor sparse decompositionOperations Research Letters10.1016/j.orl.2015.06.00543:4(423-426)Online publication date: 1-Jul-2015
    • (2015)Multimedia Recommendation and Delivery StrategiesData Management in Pervasive Systems10.1007/978-3-319-20062-0_16(327-342)Online publication date: 2015
    • (2014)Improving Co-Cluster Quality with Application to Product RecommendationsProceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management10.1145/2661829.2661980(679-688)Online publication date: 3-Nov-2014
    • (2014)CoBaFiProceedings of the 23rd international conference on World wide web10.1145/2566486.2568040(97-108)Online publication date: 7-Apr-2014
    • (2014)Reduce and aggregateProceedings of the 23rd international conference on World wide web10.1145/2566486.2568025(349-360)Online publication date: 7-Apr-2014
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media