Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Structure and Overlaps of Ground-Truth Communities in Networks

Published: 30 April 2014 Publication History

Abstract

One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Even though detection of such communities is of great interest, understanding the structure communities in large networks remains relatively limited. In particular, due to the unavailability of labeled ground-truth data, it was traditionally very hard to develop accurate models of network community structure.
Here we use six large social, collaboration, and information networks where nodes explicitly state their ground-truth community memberships. For example, nodes in social networks join into explicitly defined interest based groups, and we use such groups as explicitly labeled ground-truth communities. We use such ground-truth communities to study their structural signatures by analyzing how ground-truth communities emerge in networks and how they overlap. We observe some surprising phenomena. First, ground-truth communities contain high-degree hub nodes that reside in community overlaps and link to most of the members of the community. Second, the overlaps of communities are more densely connected than the non-overlapping parts of communities. We show that this in contrast to the conventional wisdom that community overlaps are more sparsely connected than the non-overlapping parts themselves. We then show that many existing models of network communities do not capture dense community overlaps. This in turn means that most present models and community detection methods confuse overlaps as separate communities. In contrast, we present the community-affiliation graph model (AGM), a conceptual model of network community structure. We demonstrate that AGM reliably captures the overall structure of networks as well as the overlapping and hierarchical nature of network communities.

References

[1]
Bruno D. Abrahao, Sucheta Soundarajan, John E. Hopcroft, and Robert Kleinberg. 2012. On the separability of structural classes of communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'12). 624--632.
[2]
Y.-Y. Ahn, J. P. Bagrow, and S. Lehmann. 2010. Link communities reveal multi-scale complexity in networks. Nature 466.
[3]
E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. 2007. Mixed membership stochastic blockmodels. J. Machine Learn. Res. 9, 1981--2014.
[4]
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. 2006. Group formation in large social networks: Membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06). 44--54.
[5]
Brian Ball, Brian Karrer, and M. E. J. Newman. 2011. Efficient and principled method for detecting communities in networks. Phys. Rev. E 84.
[6]
A.-L. Barabási and Z. N. Oltvai. 2004. Network biology: Understanding the cell's functional organization. Nature Rev. Genetics 5, 2, 101--113.
[7]
Jeffrey Baumes, Mark Goldberg, and Malik Magdon-ismail. 2005. Efficient identification of overlapping communities. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI'05). 27--36.
[8]
J. Baumes, M. Goldberg, M. Magdon-Ismail, and A. Wallace. 2004. Discovering hidden groups in communication networks. In Proceedings of the 2nd NSF/NIJ Symposium on Intelligence and Security Informatics (ISI'04).
[9]
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. J. Machine Learn. Res. 3, 993--1022.
[10]
R. L. Breiger. 1974. The duality of persons and groups. Social Forces 53, 2, 181--190.
[11]
D. Chakrabarti and C. Faloutsos. 2006. Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38, 1, 2.
[12]
D. Chakrabarti, Y. Zhan, and C. Faloutsos. 2004. R-MAT: A recursive model for graph mining. In Proceedings of the 4th SIAM International Conference on Data Mining (SDM'04).
[13]
A. Clauset, M. E. J. Newman, and C. Moore. 2004. Finding community structure in very large networks. Phys. Rev. E 70.
[14]
V. Colizza, A. Flammini, M. Serrano, and A. Vespignani. 2005. Characterization and modeling of protein protein interaction networks. Physica A Stat. Mech. Appl. 352, 1--27.
[15]
V. Colizza, A. Flammini, M. Serrano, and A. Vespignani. 2006. Detecting rich-club ordering in complex networks. Nature Phys. 2, 110--115.
[16]
I. S. Dhillon, Y. Guan, and B. Kulis. 2007. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Machine Intell. 29, 11, 1944--1957.
[17]
T. S. Evans and R. Lambiotte. 2009. Line graphs, link partitions, and overlapping communities. Phys. Rev. E 80.
[18]
M. Faloutsos, P. Faloutsos, and C. Faloutsos. 1999. On power-law relationships of the internet topology. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM'99). 251--262.
[19]
Illes J. Farkas, Imre Derényi, Albert-László Barabási, and Tamas Vicsek. 2001. Spectra of ‘Real-World’ graphs: Beyond the semi-circle law. Phys. Rev. E 64.
[20]
Scott L. Feld. 1981. The focused organization of social ties. Amer. J. Sociol. 86, 5, 1015--1035.
[21]
M. Fiedler. 1973. Algebraic connectivity of graphs. Czechoslovak Math. J. 23, 98, 298--305.
[22]
G. W. Flake, S. Lawrence, and C. L. Giles. 2000. Efficient identification of Web communities. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00). 150--160.
[23]
S. Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3--5, 75--174.
[24]
S. Fortunato and M. Barthélemy. 2007. Resolution limit in community detection. Proc. Nat. Acad. Sci. U.S.A. 104, 1, 36--41.
[25]
M. Girvan and M. E. J. Newman. 2002. Community structure in social and biological networks. Proc. Nat. Acad. Sci. U.S.A. 99, 12, 7821--7826.
[26]
David F. Gleich and C. Seshadhri. 2012. Neighborhoods are good communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'12).
[27]
M. S. Granovetter. 1973. The strength of weak ties. Amer. J. Sociol. 78, 1360--1380.
[28]
S. Gregory. 2010. Finding overlapping communities in networks by label propagation. New J. Physics 12, 10.
[29]
S. Guattery and G. L. Miller. 1998. On the quality of spectral separators. SIAM J. Matrix Anal. Appl. 19, 701--719.
[30]
Keith Henderson and Tina Eliassi-Rad. 2009. Applying latent dirichlet allocation to group discovery in large graphs. In Proceedings of the ACM Symposium on Applied Computing (SAC'09). 1456--1461.
[31]
Paul W. Holland, Kathryn Laskey, and Samuel Leinhardt. 1983. Stochastic blockmodels: First steps. Social Netw. 5, 2, 109--137.
[32]
J. Hopcroft, O. Khan, B. Kulis, and B. Selman. 2003. Natural communities in large linked networks. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'03). 541--546.
[33]
S. Kairam, D. Wang, and J. Leskovec. 2012. The life and death of online groups: Predicting group growth and longevity. In Proceedings of the ACM International Conference on Web Search and Data Minig (WSDM'12).
[34]
R. Kannan, S. Vempala, and A. Vetta. 2004. On clusterings: Good, bad and spectral. J. ACM 51, 3, 497--515.
[35]
B. Karrer, E. Levina, and M. E. J. Newman. 2008. Robustness of community structure in networks. Phys. Rev. E 77.
[36]
Brian Karrer and M. E. J. Newman. 2011. Stochastic blockmodels and community structure in networks. Phys. Rev. E 83.
[37]
G. Karypis and V. Kumar. 1998. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20, 359--392.
[38]
R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. 2000. Stochastic models for the Web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS'00). 57.
[39]
Andrea Lancichinetti and Santo Fortunato. 2009. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys. Rev. E 80, 1.
[40]
Silvio Lattanzi and D. Sivakumar. 2009. Affiliation networks. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC'09). 427--434.
[41]
Conrad Lee, Fergal Reid, Aaron McDaid, and Neil Hurley. 2010. Detecting highly overlapping community structure by greedy clique expansion. In Proceedings of the 4th International Workshop on Advances in Social Network Mining and Analysis (SNA-KDD'10).
[42]
J. Leskovec, L. A. Adamic, and B. A. Huberman. 2007a. The dynamics of viral marketing. ACM Trans. Web 1, 1.
[43]
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceeding of the 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD'05). 177--187.
[44]
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2007b. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1, 1.
[45]
J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 1, 29--123.
[46]
J. Leskovec, K. Lang, and M. Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web (WWW'10).
[47]
Wangqun Lin, Xiangnan Kong, Philip S. Yu, Quanyuan Wu, Yan Jia, and Chuan Li. 2012. Community detection in incomplete information networks. In Proceedings of the 21st International Conference on World Wide Web (WWW'12). ACM, New York, NY, 341--350.
[48]
J. McAuley and J. Leskovec. 2012. Learning to discover social circles in ego networks. In Proceedings of the 26th Annual Conference on Advances in Neural Information Processing Systems (NIPS'12). 548--556.
[49]
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement (IMC'07). 29--42.
[50]
M. Mitzenmacher. 2004. A brief history of generative models for power law and lognormal distributions. Internet Math. 1, 2, 226--251.
[51]
M. Molloy and B. Reed. 1995. A critical point for random graphs with a given degree sequence. Random Struct. Algorit. 6, 161--180.
[52]
Morten Mørup, Mikkel N. Schmidt, and Lars Kai Hansen. 2011. Infinite multiple membership relational modeling for complex networks. CoRR abs/1101.5097.
[53]
Raj Rao Nadakuditi and M. E. J. Newman. 2012. Graph spectra and the detectability of community structure in networks. Phys. Rev. Lett. 108.
[54]
Tamás Nepusz, Haiyuan Yu, and Alberto Paccanaro. 2012. Detecting overlapping protein complexes in protein-protein interaction networks. Nature Methods 9, 471--472.
[55]
M. E. J. Newman. 2006. Modularity and community structure in networks. Proc. Nat. Acad. Sci. U.S.A. 103, 23, 8577--8582.
[56]
M. E. J. Newman and M. Girvan. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69.
[57]
G. Palla, I. Derényi, I. Farkas, and T. Vicsek. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 7043, 814--818.
[58]
S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos. 2012. Community detection in social media. Data Mining Knowl. Discov. 24, 3, 515--554.
[59]
Ioannis Psorakis, Stephen Roberts, Mark Ebden, and Ben Sheldon. 2011. Overlapping community detection using Bayesian non-negative matrix factorization. Phys. Rev. E 83, 6.
[60]
Y. Qi, J. K. Seetharaman, and Z. B. Joseph. 2005. Random forest similarity for protein-protein interaction prediction from multiple sources. In Proceedings of the Pacific Symposium on Biocomputing.
[61]
F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. 2004. Defining and identifying communities in networks. Proc. Nat. Acad. Sci. U.S.A. 101, 9, 2658--2663.
[62]
M. Sales-Pardo, R. Guimerà, A. A. Moreira, and L. A. N. Amaral. 2007. Extracting the hierarchical organization of complex systems. Proc. Nat. Acad. Sci. U.S.A. 104, 18874--18874.
[63]
E. N. Sawardecker, M. Sales-Pardo, and L. A. N. Amaral. 2009. Detection of node group membership in networks with group overlap. Euro. Phys. J. B 67, 277--284.
[64]
S. E. Schaeffer. 2007. Graph Clustering. Comput. Sci. Rev. 1, 1, 27--64.
[65]
C. Seshadhri, Tamara G. Kolda, and Ali Pinar. 2012. Community structure and scale-free collections of Erdos-Renyi graphs. Phys. Rev. E 85.
[66]
Huawei Shen, Xueqi Cheng, Kai Cai, and Mao-Bin Hu. 2009. Detect overlapping and hierarchical community structure in networks. Physica A: Stat. Mech. Appl. 388, 8, 1706--1712.
[67]
J. Shi and J. Malik. 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intell. 22, 8, 888--905.
[68]
Georg Simmel. 1964. Conflict and the Web of Group Affiliations. Simon and Schuster.
[69]
D. A. Spielman and S.-H. Teng. 2007. Spectral partitioning works: Planar graphs and finite element meshes. Linear Algebra Appl. 421, 2--3, 284--305.
[70]
Yizhou Sun, Yintao Yu, and Jiawei Han. 2009. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 797--806.
[71]
Chayant Tantipathananandh, Tanya Berger-Wolf, and David Kempe. 2007. A framework for community identification in dynamic social networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'07). 717--726.
[72]
Charalampos E. Tsourakakis. 2008. Fast counting of triangles in large real networks without counting: Algorithms and laws. In Proceedings of the IEEE International Conference on Data Mining (ICDM'08). 608--617.
[73]
U. von Luxburg. 2007. A tutorial on spectral clustering. Stat. Comput. 17, 395--416.
[74]
Chunyan Wang, Mao Ye, and Wang-Chien Lee. 2012. From face-to-face gathering to social structure. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM'12). 465--474.
[75]
D. J. Watts and S. H. Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 440--442.
[76]
Jaewon Yang and Jure Leskovec. 2012a. Community-affiliation graph model for overlapping network community detection. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 1170--1175.
[77]
Jaewon Yang and Jure Leskovec. 2012b. Defining and evaluating network communities based on ground-truth. In Proceedings of the IEEE International Conference on Data Mining (ICDM'12). 745--754.
[78]
Jaewon Yang and Jure Leskovec. 2012c. Structure and overlaps of communities in large networks. In Proceedings of the 6th International Workshop on Advances in Social Network Mining and Analysis (SNA-KDD'12).
[79]
Jaewon Yang and Jure Leskovec. 2013a. Defining and evaluating network communities based on ground-truth. Knowl. Inform. Syst. J.
[80]
J. Yang and J. Leskovec. 2013b. Overlapping community detection at scale: A non-negative factorization approach. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM'13). 587--596.
[81]
E. Zheleva, H. Sharara, and L. Getoor. 2009. Co-evolution of social and affiliation networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 1007--1016.
[82]
Ding Zhou, Eren Manavoglu, Jia Li, C. Lee Giles, and Hongyuan Zha. 2006. Probabilistic models for discovering e-communities. In Proceedings of the 15th International Conference on World Wide Web (WWW'06). 173--182.

Cited By

View all
  • (2024)Cuckoo Search Optimization-Based Influence Maximization in Dynamic Social NetworksACM Transactions on the Web10.1145/369064418:4(1-25)Online publication date: 28-Aug-2024
  • (2024)DCDIMB: Dynamic Community-based Diversified Influence Maximization using Bridge NodesACM Transactions on the Web10.1145/366461818:4(1-32)Online publication date: 11-May-2024
  • (2024)Temporal Graph Generation Featuring Time-Bound Communities2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00187(2365-2378)Online publication date: 13-May-2024
  • Show More Cited By

Index Terms

  1. Structure and Overlaps of Ground-Truth Communities in Networks

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Intelligent Systems and Technology
    ACM Transactions on Intelligent Systems and Technology  Volume 5, Issue 2
    Special Issue on Linking Social Granularity and Functions
    April 2014
    347 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/2611448
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 April 2014
    Accepted: 01 April 2013
    Revised: 01 March 2013
    Received: 01 January 2013
    Published in TIST Volume 5, Issue 2

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Network communities
    2. affiliation networks
    3. social networks

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)55
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 14 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Cuckoo Search Optimization-Based Influence Maximization in Dynamic Social NetworksACM Transactions on the Web10.1145/369064418:4(1-25)Online publication date: 28-Aug-2024
    • (2024)DCDIMB: Dynamic Community-based Diversified Influence Maximization using Bridge NodesACM Transactions on the Web10.1145/366461818:4(1-32)Online publication date: 11-May-2024
    • (2024)Temporal Graph Generation Featuring Time-Bound Communities2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00187(2365-2378)Online publication date: 13-May-2024
    • (2024)Community Robotics: A New Paradigm for Robotic Design and Innovation2024 IEEE International Conference on Advanced Robotics and Its Social Impacts (ARSO)10.1109/ARSO60199.2024.10557833(153-160)Online publication date: 20-May-2024
    • (2024) Hierarchical communities in the larval Drosophila connectome: Links to cellular annotations and network topology Proceedings of the National Academy of Sciences10.1073/pnas.2320177121121:38Online publication date: 13-Sep-2024
    • (2024)OlapGN: A multi-layered graph convolution network-based model for locating influential nodes in graph networksKnowledge-Based Systems10.1016/j.knosys.2023.111163283(111163)Online publication date: Jan-2024
    • (2024)Multiple rumor source identification in social networks leveraging community and monitor informationInformation Fusion10.1016/j.inffus.2024.102530111(102530)Online publication date: Nov-2024
    • (2024)Local clustering coefficient-based iterative peeling strategy to extract the core and peripheral layers of a networkApplied Network Science10.1007/s41109-024-00667-79:1Online publication date: 27-Aug-2024
    • (2024)Dual graph neural network for overlapping community detectionThe Journal of Supercomputing10.1007/s11227-023-05435-580:2(2196-2222)Online publication date: 1-Jan-2024
    • (2024)The Complexity of Cluster Vertex Splitting and CompanySOFSEM 2024: Theory and Practice of Computer Science10.1007/978-3-031-52113-3_16(226-239)Online publication date: 19-Feb-2024
    • Show More Cited By

    View Options

    Get Access

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media