Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Community detection via heterogeneous interaction analysis

Published: 01 July 2012 Publication History

Abstract

The pervasiveness of Web 2.0 and social networking sites has enabled people to interact with each other easily through various social media. For instance, popular sites like Del.icio.us, Flickr, and YouTube allow users to comment on shared content (bookmarks, photos, videos), and users can tag their favorite content. Users can also connect with one another, and subscribe to or become a fan or a follower of others. These diverse activities result in a multi-dimensional network among actors, forming group structures with group members sharing similar interests or affiliations. This work systematically addresses two challenges. First, it is challenging to effectively integrate interactions over multiple dimensions to discover hidden community structures shared by heterogeneous interactions. We show that representative community detection methods for single-dimensional networks can be presented in a unified view. Based on this unified view, we present and analyze four possible integration strategies to extend community detection from single-dimensional to multi-dimensional networks. In particular, we propose a novel integration scheme based on structural features. Another challenge is the evaluation of different methods without ground truth information about community membership. We employ a novel cross-dimension network validation (CDNV) procedure to compare the performance of different methods. We use synthetic data to deepen our understanding, and real-world data to compare integration strategies as well as baseline methods in a large scale. We study further the computational time of different methods, normalization effect during integration, sensitivity to related parameters, and alternative community detection methods for integration.

References

[1]
Abou-Rjeili A, Karypis G (2006) Multilevel algorithms for partitioning power-law graphs. In: Proceedings of the 20th international conference on parallel and distributed processing (IPDPS'06). IEEE Computer Society, Washington, DC, p 124.
[2]
Airodi EM, Blei D, Fienberg SE, Xing EP (2008) Mixed membership stochastic blockmodels. J Mach Learn Res 9:1981-2014.
[3]
Aleman-Meza B, Nagarajan M, Ramakrishnan C, Ding L, Kolari P, Sheth AP, Arpinar BI, Joshi A, Finin T (2006) Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection. In: WWW '06: proceedings of the 15th international conference on World Wide Web. ACM Press, New York, pp 407-416.
[4]
Argyriou A, Herbster M, Pontil M (2006) Combining graph Laplacians for semi-supervised learning. Adv Neural Inf Process Syst 18:67.
[5]
Asur S, Parthasarathy S, Ucar D (2007) An event-based framework for characterizing the evolutionary behavior of interaction graphs. In: KDD '07: proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 913-921.
[6]
Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: KDD '06: proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 44-54.
[7]
Bansal N, Chiang F, Koudas N, Tompa FW (2007) Seeking stable clusters in the blogosphere. In: VLDB '07: proceedings of the 33rd international conference on very large data bases. Vienna, Austria, pp 806-817.
[8]
Bickel S, Scheffere T (2004) Multi-view clustering. In: ICDM '04: proceedings of the fourth IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 19-26.
[9]
Borg I, Groenen P (2005) Modern multidimensional scaling: theory and applications. Springer, New York.
[10]
Chaudhuri K, Kakade SM, Livescu K, Sridharan K (2009) Multi-view clustering via canonical correlation analysis. In: ICML '09: proceedings of the 26th annual international conference on machine learning. ACM, New York, pp 1-8.
[11]
Clauset A, Newman M, Moore C (2004) Finding community structure in very large networks. Arxiv preprint cond-mat/0408187.
[12]
Clauset A, Shalizi CR, Newman MEJ (2007) Power-law distributions in empirical data. arXiv, 706.
[13]
de Sa VR (2005) Spectral clustering with two views. In: Proceedings of workshop of learning with multiple views. Bonn, Germany, pp 20-27.
[14]
Demmel J, Dongarra J, Ruhe A, van der Vorst H (2000) Templates for the solution of algebraic eigenvalue problems: a practical guide. Society for Industrial and Applied Mathematics, Philadelphia.
[15]
Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: ICML '04: proceedings of the twenty-first international conference on machine learning. ACM, New York, p 36.
[16]
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3-5):75-174.
[17]
Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci USA 104(1):36-41.
[18]
Goder A, Filkov V (2008) Consensus clustering algorithms: comparison and refinement. In: Proceedings of the workshop on algorithm engineering and experiments, ALENEX 2008, San Francisco, CA, January 19, 2008.
[19]
Good B, De Montjoye Y, Clauset A (2010) Performance of modularity maximization in practical contexts. Phys Rev E 81(4):046106.
[20]
Guimera R, Amaral LAN (2005) Functional cartography of complex metabolic networks. Nature 433:895-900.
[21]
Hopcroft J, Khan O, Kulis B, Selman B (2003) Natural communities in large linked networks. In: KDD '03: proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 541-546.
[22]
Hotelling H (1936) Relations between two sets of variates. Biometrika 28(3/4):321-377.
[23]
Hu T, Sung SY (2005) Consensus clustering. Intell Data Anal 9(6):551-565.
[24]
Jung J, Euzenat J (2007) Towards semantic social networks. In: Proceedings of the European semantic Web conference (ESWC2007), volume 4519 of LNCS. Springer, Berlin, pp 267-280.
[25]
Kang H, Getoor L, Singh L (2007) Visual analysis of dynamic group membership in temporal social networks. SIGKDD Explor Spec Issue Vis Anal 9(2):13-21.
[26]
Kemp C, Tenenbaum J, Griffiths T, Yamada T, Ueda N (1999/2006) Learning systems of concepts with an infinite relational model. In: Proceedings of the national conference on artificial intelligence, vol 21. AAAI Press/MIT Press, Menlo Park/Cambridge, p 381.
[27]
Kettenring J (1971) Canonical analysis of several sets of variables. Biometrika 58:433-451.
[28]
Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM 46(5):604-632.
[29]
Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E 80(5):056117.
[30]
Leskovec J, Lang KJ, Mahoney M (2010) Empirical comparison of algorithms for network community detection. In: WWW '10: proceedings of the 19th international conference on World Wide Web. ACM, New York, pp 631-640.
[31]
Lin Y-R, Chi Y, Zhu S, Sundaram H, Tseng BL (2008) Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. In: Proceeding of the 17th international conference on World Wide Web (WWW '08). ACM, New York, NY, pp 685-694.
[32]
Lin Y-R, Chi Y, Zhu S, Sundaram H, Tseng BL (2009) Analyzing communities and their evolutions in dynamic social networks. ACM Trans Knowl Discov Data 3(2):1-31.
[33]
Lizorkin D, Medelyan O, Grineva M (2009) Analysis of community structure in Wikipedia. In: WWW'09: proceedings of the 18th international conference on World Wide Web. ACM, New York, pp 1221-1222.
[34]
Long B, Yu PS, Zhang ZM (2008) A general model for multiple view unsupervised learning. In: Proceedings of the 2008 SIAM international conference on data mining, pp 822-833.
[35]
Luxburg Uv (2007) A tutorial on spectral clustering. Stat Comput 17(4):395-416.
[36]
McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Annu Rev Sociol 27:415-444.
[37]
Mika P (2007) Ontologies are us: a unified model of social networks and semantics. In: Web semantics: science, services and agents on the World Wide Web, vol 5, no 1, pp 5-15. Selected papers from the international semantic Web conference (ISWC2005).
[38]
Miller K, Griffiths T, Jordan M (2009) Nonparametric latent feature models for link prediction. Adv Neural Inf Process Syst 22:1276-1284.
[39]
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1-2):91-118.
[40]
Mucha P, Richardson T, Macon K, Porter M, Onnela J (2010) Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980):876-878.
[41]
Newman M (2006a) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104.
[42]
Newman M (2006b) Modularity and community structure in networks. Proc Natl Acad Sci USA 103(23):8577-8582.
[43]
Nguyen N, Caruana R (2007) Consensus clusterings. In: ICDM '07: proceedings of the 2007 seventh IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 607-612.
[44]
Palla G, Barabasi A-L, Vicsek T (2007) Quantifying social group evolution. Nature 446(7136):664-667.
[45]
Richardson M, Domingos P (2002) Mining knowledge-sharing sites for viral marketing. In: KDD '02: proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, pp 61-70.
[46]
Rosvall M, Bergstrom C (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105(4):1118-1123.
[47]
Sarkar P, Moore AW (2005) Dynamic social network analysis using latent space models. SIGKDD Explor Newsl 7(2):31-40.
[48]
Specia L, Motta E (2007) Integrating folksonomies with the semantic web. In: Proceedings of the 4th European conference on the semantic Web: research and applications. Springer, Berlin, pp 624-639.
[49]
Strehl A, Ghosh J (2003) Cluster ensembles--a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583-617.
[50]
Tang L, Liu H (2009a) Relational learning via latent social dimensions. In: KDD '09: proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 817-826.
[51]
Tang L, Liu H (2009b) Scalable learning of collective behavior based on sparse social dimensions. In: CIKM '09: proceeding of the 18th ACM conference on information and knowledge management. ACM, New York, pp 1107-1116.
[52]
Tang L, Liu H, Zhang J, Nazeri Z (2008) Community evolution in dynamic multi-mode networks. In: KDD '08: proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 677-685.
[53]
Tang L, Wang X, Liu H (2009a) Uncovering groups via heterogeneous interaction analysis. In: Proceeding of IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 503-512.
[54]
Tang W, Lu Z, Dhillon IS (2009b) Clustering with multiple graphs. In: ICDM '09: proceedings of the 2009 ninth IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 1016-1021.
[55]
Topchy A, Jain AK, Punch W (2003) Combining multiple weak clusterings. In: ICDM '03: proceedings of the third IEEE international conference on data mining. IEEE Computer Society, Washington, DC, p 331.
[56]
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, Cambridge.
[57]
White S, Smyth P (2005) A spectral clustering approach to finding communities in graphs. In: Proceedings of the fifth SIAM international conference on data mining, pp 274-285.
[58]
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufman, San Francisco.
[59]
Yingzi Jin YM, Ishizuka M (2008) Ranking entities on the web using social network mining and ranking learning. In: WWW'08: proceedings of the 17th international conference on World Wide Web. ACM Press, New York, pp 21-25.
[60]
Yu K, Yu S, Tresp V (2005) Soft clustering on graphs. In: Advances in neural information processing systems 18. MIT Press, Cambridge, pp 1553-1560.
[61]
Zhou D, Burges CJC (2007) Spectral clustering and transductive learning with multiple views. In: ICML'07: proceedings of the 24th international conference on machine learning. ACM, New York, 1159-1166.
[62]
Zhou D, Zhu S, Yu K, Song X, Tseng BL, Zha H, Giles CL (2008) Learning multiple graphs for document recommendations. In: WWW '08: proceeding of the 17th international conference on World Wide Web. ACM, New York, pp 141-150.

Cited By

View all
  • (2024)Multiplex Community Detection in Social Networks Using a Chaos-Based Hybrid Evolutionary ApproachComplexity10.1155/2024/10160862024Online publication date: 1-Jan-2024
  • (2023)Supervised community detection in multiplex networks based on layers convex flattening and modularity optimizationProcedia Computer Science10.1016/j.procs.2022.11.002212:C(181-190)Online publication date: 20-Jan-2023
  • (2022)Density-Based Subspace Clustering in Heterogeneous NetworksMachine Learning and Knowledge Discovery in Databases10.1007/978-3-662-44848-9_10(149-164)Online publication date: 10-Mar-2022
  • Show More Cited By

Recommendations

Reviews

Dejing Dou

Real-world social networks normally have multiple dimensions, each of which represents one type of interaction between connected people, such as face-to-face, by email, or by phone. Most previous research on community detection has focused on only one dimension of interaction. Community detection in a multiple-dimensional network is definitely a more challenging problem. To handle the problem, the authors first show that representative community detection methods can be classified using four components: network interactions, utility matrix, structural features, and community partitions. Then, the information from different network dimensions can be integrated using four corresponding integration schemes: network integration, utility integration, feature integration, and partition integration. Through experiments with synthetic data and real-world YouTube data, the authors show that feature integration via principal component analysis (PCA) outperforms other integration schemes in terms of normalized mutual information (NMI) and cross-dimension network validation (CDNV), while utility integration is more efficient with a clever weighting scheme over each dimension. The main contribution of this paper is the presentation of various community detection methods in a unified view with four components. A second contribution is the comparison of different integration schemes in terms of performance and efficiency. The limitation of the approach is the assumption that "heterogeneous interactions share the same community structure," which may not be true in real-world social networks. A possible future extension of the proposed framework would be to enable it to "automatically determine which dimensions share the same community structure." Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Data Mining and Knowledge Discovery
Data Mining and Knowledge Discovery  Volume 25, Issue 1
July 2012
168 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 July 2012

Author Tags

  1. Community detection
  2. Heterogeneous interactions
  3. Multi-dimensional networks
  4. Network integration
  5. Social media

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Multiplex Community Detection in Social Networks Using a Chaos-Based Hybrid Evolutionary ApproachComplexity10.1155/2024/10160862024Online publication date: 1-Jan-2024
  • (2023)Supervised community detection in multiplex networks based on layers convex flattening and modularity optimizationProcedia Computer Science10.1016/j.procs.2022.11.002212:C(181-190)Online publication date: 20-Jan-2023
  • (2022)Density-Based Subspace Clustering in Heterogeneous NetworksMachine Learning and Knowledge Discovery in Databases10.1007/978-3-662-44848-9_10(149-164)Online publication date: 10-Mar-2022
  • (2022)Improving Community Detection Performance in Heterogeneous Music Network by Learning Edge-Type Usefulness DistributionInformation for a Better World: Shaping the Global Future10.1007/978-3-030-96960-8_5(68-78)Online publication date: 28-Feb-2022
  • (2021)A Novel Emerging Topic Identification and Evolution Discovery Method on Time-Evolving and Heterogeneous Online Social NetworksComplexity10.1155/2021/88592252021Online publication date: 1-Jan-2021
  • (2021)Community Detection in Multiplex NetworksACM Computing Surveys10.1145/344468854:3(1-35)Online publication date: 8-May-2021
  • (2021)MultiGBSJournal of Biomedical Informatics10.1016/j.jbi.2021.103706116:COnline publication date: 1-Apr-2021
  • (2021)Combining Information from Multiple Views for Vertex-Based Change Detection in Dynamic Networks: A Comparative StudySN Computer Science10.1007/s42979-021-00982-13:2Online publication date: 17-Dec-2021
  • (2021)A survey of community detection methods in multilayer networksData Mining and Knowledge Discovery10.1007/s10618-020-00716-635:1(1-45)Online publication date: 1-Jan-2021
  • (2020)Nonlinear Structural Fusion for Multiplex NetworkComplexity10.1155/2020/70415642020Online publication date: 1-Jan-2020
  • Show More Cited By

View Options

View options

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media