DOI: 10.1145/1989323.1989400
research-article

Advancing data clustering via projective clustering ensembles

Published: 12 June 2011

Abstract

Projective Clustering Ensembles (PCE) are a very recent advance in data clustering research that combines two powerful tools: clustering ensembles and projective clustering. Specifically, PCE enables clustering ensemble methods to handle ensembles composed of projective clustering solutions. PCE has been formalized as an optimization problem with either a two-objective or a single-objective function. Two-objective PCE has been shown to generally produce more accurate clustering results than its single-objective counterpart, although it can handle the object-based and feature-based cluster representations only independently of one another. Moreover, neither of the early formulations of PCE follows any of the standard approaches to clustering ensembles, namely instance-based, cluster-based, and hybrid. In this paper, we propose an alternative formulation of the PCE problem that overcomes these issues. We investigate the drawbacks of the early formulations of PCE and define a new single-objective formulation of the problem. This formulation treats the object-based and feature-based cluster representations as a whole, essentially tying them together in a distance computation between a projective clustering solution and a given ensemble. We propose two cluster-based algorithms for computing approximations to the proposed PCE formulation; both conform to one of the standard approaches to clustering ensembles. Experiments on benchmark datasets show the significance of our PCE formulation, as both proposed heuristics outperform existing PCE methods.
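
To make the idea of treating object-based and feature-based representations "as a whole" more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the formulation defined in the paper: each projective cluster is assumed to be a pair of weight vectors (one over objects, one over features), the two are coupled through their outer product, clusters are compared with a Tanimoto (generalized Jaccard) distance, and a solution is scored by its mean best-match distance to the ensemble members. All function names are hypothetical.

```python
import numpy as np

def cluster_matrix(obj_weights, feat_weights):
    # Assumption: couple the two representations via an outer product,
    # so entry (i, j) is large only if object i AND feature j belong to the cluster.
    return np.outer(obj_weights, feat_weights)

def tanimoto_distance(a, b):
    # 1 - generalized Jaccard (Tanimoto) similarity of the flattened matrices.
    a, b = a.ravel(), b.ravel()
    dot = float(a @ b)
    denom = float(a @ a) + float(b @ b) - dot
    return 0.0 if denom == 0.0 else 1.0 - dot / denom

def solution_distance(sol_a, sol_b):
    # Each solution is a list of (object_weights, feature_weights) pairs.
    mats_a = [cluster_matrix(o, f) for o, f in sol_a]
    mats_b = [cluster_matrix(o, f) for o, f in sol_b]
    d = np.array([[tanimoto_distance(a, b) for b in mats_b] for a in mats_a])
    # Symmetric average of best-match distances (an illustrative choice).
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def ensemble_distance(candidate, ensemble):
    # Single objective: mean distance from the candidate to every ensemble member.
    return float(np.mean([solution_distance(candidate, s) for s in ensemble]))

# Two solutions with the same object partition but swapped feature subspaces.
sol = [(np.array([1, 1, 0, 0]), np.array([1, 0])),
       (np.array([0, 0, 1, 1]), np.array([0, 1]))]
swapped = [(np.array([1, 1, 0, 0]), np.array([0, 1])),
           (np.array([0, 0, 1, 1]), np.array([1, 0]))]

print(ensemble_distance(sol, [sol]))      # 0.0
print(solution_distance(sol, swapped))    # 1.0
```

In this toy example the two solutions group the objects identically, yet their distance is maximal because the feature subspaces differ. A measure that compared object assignments and feature assignments independently would not capture this coupling in the same way.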

Published In

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN: 9781450306614
DOI: 10.1145/1989323

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. clustering
  2. clustering ensembles
  3. data mining
  4. dimensionality reduction
  5. optimization
  6. projective clustering
  7. subspace clustering

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '11
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
