Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2020408.2020547acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
poster

A game theoretic framework for heterogenous information network clustering

Published: 21 August 2011 Publication History

Abstract

Heterogeneous information networks are pervasive in applications ranging from bioinformatics to e-commerce. As a result, unsupervised learning and clustering methods pertaining to such networks have gained significant attention recently. Nodes in a heterogeneous information network are regarded as objects derived from distinct domains such as 'authors' and 'papers'. In many cases, feature sets characterizing the objects are not available, hence, clustering of the objects depends solely on the links and relationships amongst objects. Although several previous studies have addressed information network clustering, shortcomings remain. First, the definition of what constitutes an information network cluster varies drastically from study to study. Second, previous algorithms have generally focused on non-overlapping clusters, while many algorithms are also limited to specific network topologies. In this paper we introduce a game theoretic framework (GHIN) for defining and mining clusters in heterogeneous information networks. The clustering problem is modeled as a game wherein each domain represents a player and clusters are defined as the Nash equilibrium points of the game. Adopting the abstraction of Nash equilibrium points as clusters allows for flexible definition of reward functions that characterize clusters without any modification to the underlying algorithm. We prove that well-established definitions of clusters in 2-domain information networks such as formal concepts, maximal bi-cliques, and noisy binary tiles can always be represented as Nash equilibrium points. Moreover, experimental results employing a variety of reward functions and several real world information networks illustrate that the GHIN framework produces more accurate and informative clusters than the recently proposed NetClus and state of the art MDC algorithms.

References

[1]
F. Alqadah and R. Bhatnagar. An effective algorithm for mining 3-clusters in vertically partitioned data. In CIKM '08, 2008.
[2]
F. Alqadah and R. Bhatnagar. Discovering substantial distinctions among incremental bi-clusters. In SDM'09, 2009.
[3]
E. Amigó, J. Gonzalo, J. Artiles, and F. Verdejo. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval, 12 (4):461--486, 2009.
[4]
A. Asuncion and D. Newman. UCI machine learning repository, 2007.
[5]
A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In SDM'07, 2007.
[6]
R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In ICML '05, pages 41--48, 2005.
[7]
I. Bhattacharya and L. Getoor. Relational clustering for multi-type entity resolution. In MRDM '05, 2005.
[8]
S. R. Bulo' and M. Pelillo. A game-theoretic approach to hypegraph clustering. In NIPS2009, 1571--1579, 2009.
[9]
S. Busygin, O. Prokopyev, and P. M. Pardalos. Biclustering in data mining. Comput. Oper. Res., 35(9):2964--2987, 2008.
[10]
Y. Cheng and G. Church. Biclustering of expression data. In ISMB '00, 2000.
[11]
H. Cho, I. Dhillon, Y. Guan, and S. Sra. Minimum sum squared residue co-clustering of gene expression data. In SDM '04, 2004.
[12]
I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD '01, 2001.
[13]
I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In KDD '03, pages 89--98, 2003.
[14]
B. Gamter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin, 1999.
[15]
J. Hartigan. Direct clustering of a data matrix. Journal of the American Statistical Association, 67, No. 337:123--129, 1972.
[16]
J. Li, G. Liu, H. Li, and L. Wong. Maximal biclique subgraphs and closed pattern pairs of the adjacency matrix: A one-to-one correspondence and mining algorithms. IEEE Trans. Knowl. Data Eng., 19(12):1625--1637, 2007.
[17]
B. Long, X. Wu, Z. M. Zhang, and P. S. Yu. Unsupervised learning on k-partite graphs. In KDD '06, pages 317--326, 2006.
[18]
B. Long, Z. M. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML '06, pages 585--592, 2006.
[19]
S. C. Madeira and A. L. Oliveira. Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1 (1):24--45, 2004.
[20]
E. Mendelson. Introducing Game Theory and Its Applications. Chapman & Hall / CRC, 2004.
[21]
G. Owen. Game Theory. Academic Press, 1995.
[22]
R. Pensa and J. Boulicaut. Towards fault-tolerant formal concept analysis. Congress of the Italian Association for Artificial Intelligence AI* IA, 3673:21--23, 2005.
[23]
R. Porter, E. Nudelman, and Y. Shoham. Simple search methods for finding a nash equilibrium. In Games and Economic Behavior, pages 664--669, 2004.
[24]
C. M. Procopiuc, M. Jones, P. K. Agarwal, and T. Murali. A monte carlo algorithm for fast projective clustering. In SIGMOD '02, pages 418--427, 2002.
[25]
K. Sim, J. Li, V. Gopalkrishnan, and G. Liu. Mining maximal quasi-bicliques to co-cluster stocks and financial ratios for value investment. In ICDM '06, pages 1059--1063, 2006.
[26]
Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. Rankclus: Integrating clustering with ranking for heterogeneous information network analysis. In EDBT'09, 2009.
[27]
Y. Sun, Y. Yu, and J. Han. Ranking-based clustering of heterogeneous information networks with star network schema. In KDD'09, 2009.
[28]
A. Tanay, R. Sharan, and R. Shamir. Discovering statistically significant biclusters in gene expression data. In ISMB 2002, 2002.
[29]
X. Yin, J. Han, and P. S. Yu. Cross-relational clustering with user's guidance. In KDD '05, 2005.
[30]
X. Yin, J. Han, and P. S. Yu. Linkclus: efficient clustering via heterogeneous semantic links. In VLDB '06, 2006.
[31]
M. J. Zaki and M. Ogihara. Theoretical foundations of association rules. In SIGMOD-DMKD '98, 1998.
[32]
M. J. Zaki, M. Peters, I. Assent, and T. Seidl. Clicks: An effective algorithm for mining subspace clusters in categorical datasets. Data and Knowledge Engineering special issue on Intelligent Data Mining, 60 (2):51--70, 2007.
[33]
C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In KDD '04, 2004.

Cited By

View all

Index Terms

  1. A game theoretic framework for heterogenous information network clustering

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2011
    1446 pages
    ISBN:9781450308137
    DOI:10.1145/2020408
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 August 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. clustering
    2. game theoretic clustering
    3. information networks
    4. multi-way clustering

    Qualifiers

    • Poster

    Conference

    KDD '11
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)7
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 24 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2019)Cross the data desertMultimedia Tools and Applications10.1007/s11042-018-6297-678:6(6409-6440)Online publication date: 1-Mar-2019
    • (2018)Ranking-based Collaborative Clustering for Heterogeneous Information Network2018 IEEE 18th International Conference on Communication Technology (ICCT)10.1109/ICCT.2018.8600099(246-250)Online publication date: Oct-2018
    • (2017)Picture or it didn't happenMultimedia Tools and Applications10.1007/s11042-016-3864-676:14(15681-15706)Online publication date: 1-Jul-2017
    • (2017)Survey of Current DevelopmentsHeterogeneous Information Network Analysis and Applications10.1007/978-3-319-56212-4_2(13-30)Online publication date: 26-May-2017
    • (2014)Community Dynamics: Event and Role Analysis in Social Network AnalysisAdvanced Data Mining and Applications10.1007/978-3-319-14717-8_7(85-97)Online publication date: 2014

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media