Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1281192.1281248acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Joint cluster analysis of attribute and relationship data withouta-priori specification of the number of clusters

Published: 12 August 2007 Publication History

Abstract

In many applications, attribute and relationship data areavailable, carrying complementary information about real world entities. In such cases, a joint analysis of both types of data can yield more accurate results than classical clustering algorithms that either use only attribute data or only relationship (graph) data. The Connected k-Center (CkC) has been proposed as the first joint cluster analysis model to discover k clusters which are cohesive on both attribute and relationship data. However, it is well-known that prior knowledge on the number of clusters is often unavailable in applications such as community dentification and hotspot analysis. In this paper, we introduce and formalize the problem of discovering an a-priori unspecified number of clusters in the context of joint cluster analysis of attribute and relationship data, called Connected X Clusters (CXC) problem. True clusters are assumed to be compact and distinctive from their neighboring clusters in terms of attribute data and internally connected in terms of relationship data. Different from classical attribute-based clustering methods, the neighborhood of clusters is not defined in terms of attribute data but in terms of relationship data. To efficiently solve the CXC problem, we present JointClust, an algorithm which adopts a dynamic two-phase approach. In the first phase, we find so called cluster atoms. We provide a probability analysis for thisphase, which gives us a probabilistic guarantee, that each true cluster is represented by at least one of the initial cluster atoms. In the second phase, these cluster atoms are merged in a bottom-up manner resulting in a dendrogram. The final clustering is determined by our objective function. Our experimental evaluation on several real datasets demonstrates that JointClust indeed discovers meaningful and accurate clusterings without requiring the user to specify the number of clusters.

References

[1]
M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. Optics: ordering points to identify the clustering structure. In SIGMOD, 1999.
[2]
C. M. Bishop. Neural networks for pattern recognition. Oxford: Clarendon Press, 1995.
[3]
P. S. Bradley and U. M. Fayyad. Refining initial points for k-means clustering. In ICML, 1998.
[4]
U. Brandes, M. Gaertler, and D. Wagner. Experiments on graph clustering algorithms. In Algorithms -- ESA 2003, 11th Annual European Symposium, Budapest,Hungary, 2003.
[5]
C. Carson, M. Thomas, S. Belongie, J. M. Hellerstein, and J. Malik. Blobworld: A system for region--based image indexing and retrieval. In VIS. Springer, 1999.
[6]
C. Ding, X. He, H. Zha, M. Gu, and H. Simon. A min-max cut algorithm for graph partitioning and data clustering. ICDM, 2001.
[7]
R. Duda and P. Hart. Pattern Classification and Scene Analysis. John Wiley and Sons, 1973.
[8]
M. Ester, R. Ge, B. J. Gao, Z. Hu, and B. Ben-Moshe. Joint cluster analysis of attribute data and relationship data: the connected k-center problem. In SDM, 2006.
[9]
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 1996.
[10]
W. Feller. An Introduction to Probability Theory and Its Applications, volume 1. New York: Wiley, 1968.
[11]
P. Fjallstrom. Algorithms for graph partitioning: A survey. Linkoping Electronic Articles in Computer and Information Science, 1998.
[12]
M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some simplified np--complete problems. In Sixth annual ACM symposium on Theory of computing, 1974.
[13]
S. Guha, R. Rastogi, and K. Shim. Cure: an efficient clustering algorithm for large databases. In SIGMOD, 1998.
[14]
S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. In Information Systems, 2000.
[15]
L. Hagen and A. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Trans. on Computed Aided Design, 1992.
[16]
R. A. Hanneman and M. Riddle. Introduction to social network methods. http://faculty.ucr.edu/$\sim$hanneman/, 2005.
[17]
A. Jain and R. Dubes. Algorithms for clustering data. Prentice Hall, 1988.
[18]
G. Karypis, E.-H. Han, and V. Kumar. Chameleon: Hierarchical clustering using dynamic modeling. IEEE Computer, 1999.
[19]
L. Kaufman and P. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley, 1990.
[20]
T. M. Mitchell. Machine Learning. New York; London: McGraw-Hill, 1997.
[21]
M. Newman. The structure of scientific collaboration networks. In Proc. Natl. Acad. Sci. 98, 2001.
[22]
M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review, 2004.
[23]
A. Okabe, K.-I. Okunuki, and S. Shino. The sanet toolbox: New methods for network spatial analysis. In Transactions in GIS. Blackwell Publishing, July 2006.
[24]
D. Pelleg and A. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In ICML, San Francisco, 2000.
[25]
G. Salton and M. J. McGill. Introduction to modern information retrieval. McGraw-Hill, 1983.
[26]
J. Scott. Social Network Analysis: A handbook. Sage Publications, London, 2000.
[27]
J. Shi and J. Malik. Normalized cuts and image segmentation. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.
[28]
S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press, 1994.
[29]
Y. Wei and C. Cheng. Towards efficient hierarchical designs by ratio cut partitioning. In IEEE Conf. on Computer-Aided Design, 1989.
[30]
S. White and P. Smyth. A spectral clustering approach to finding comm. in graphs. In SDM, 2005.
[31]
T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An efficient data clustering method for very large databases. In SIGMOD, 1996.

Cited By

View all
  • (2020)Attributed Networks Partitioning Based on Modularity OptimizationAdvances in Data Science10.1002/9781119695110.ch8(169-185)Online publication date: 15-Jan-2020
  • (2017)Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clusteringKnowledge and Information Systems10.1007/s10115-017-1030-853:1(239-268)Online publication date: 1-Oct-2017
  • (2017)Towards Contextualizing Community Detection in Dynamic Social NetworksModeling and Using Context10.1007/978-3-319-57837-8_26(324-336)Online publication date: 4-May-2017
  • Show More Cited By

Index Terms

  1. Joint cluster analysis of attribute and relationship data withouta-priori specification of the number of clusters

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2007
    1080 pages
    ISBN:9781595936097
    DOI:10.1145/1281192
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 August 2007

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. algorithms
    2. clustering
    3. community identification
    4. graph-structured data
    5. hotspot analysis
    6. joint cluster analysis

    Qualifiers

    • Article

    Conference

    KDD07

    Acceptance Rates

    KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)5
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 24 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2020)Attributed Networks Partitioning Based on Modularity OptimizationAdvances in Data Science10.1002/9781119695110.ch8(169-185)Online publication date: 15-Jan-2020
    • (2017)Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clusteringKnowledge and Information Systems10.1007/s10115-017-1030-853:1(239-268)Online publication date: 1-Oct-2017
    • (2017)Towards Contextualizing Community Detection in Dynamic Social NetworksModeling and Using Context10.1007/978-3-319-57837-8_26(324-336)Online publication date: 4-May-2017
    • (2016)Hybrid community detection approach in multilayer social network: Scientific collaboration recommendation case study2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA)10.1109/AICCSA.2016.7945701(1-8)Online publication date: Nov-2016
    • (2015)Bayesian Multiscale Modeling of Spatial Infrastructure Performance Predictions with an Application to Electric Power Outage ForecastingJournal of Infrastructure Systems10.1061/(ASCE)IS.1943-555X.000022221:2Online publication date: Jun-2015
    • (2015)I-Louvain: An Attributed Graph Clustering MethodAdvances in Intelligent Data Analysis XIV10.1007/978-3-319-24465-5_16(181-192)Online publication date: 22-Nov-2015
    • (2014)Exploring the Properties of Online Social Network Data and Their Implications for Consumer Social Data AnalyticsHarnessing the Power of Social Media and Web Analytics10.4018/978-1-4666-5194-4.ch009(210-230)Online publication date: 2014
    • (2013)Ambient Intelligence for Monitoring and Research in Clinical Neurophysiology and MedicineClinical EEG and Neuroscience10.1177/155005941246365844:2(144-149)Online publication date: 31-Mar-2013
    • (2013)An effective community detection algorithm of the social networks2013 IEEE Third International Conference on Information Science and Technology (ICIST)10.1109/ICIST.2013.6747668(824-827)Online publication date: Mar-2013
    • (2013)Moblog-Based Social NetworksSocial Networks: A Framework of Computational Intelligence10.1007/978-3-319-02993-1_5(75-97)Online publication date: 10-Dec-2013
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media