Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Uncovering Hierarchical and Overlapping Communities with a Local-First Approach

Published: 25 August 2014 Publication History

Abstract

Community discovery in complex networks is the task of organizing a network’s structure by grouping together nodes related to each other. Traditional approaches are based on the assumption that there is a global-level organization in the network. However, in many scenarios, each node is the bearer of complex information and cannot be classified in disjoint clusters. The top-down global view of the partition approach is not designed for this. Here, we represent this complex information as multiple latent labels, and we postulate that edges in the networks are created among nodes carrying similar labels. The latent labels are the communities a node belongs to and we discover them with a simple local-first approach to community discovery. This is achieved by democratically letting each node vote for the communities it sees surrounding it in its limited view of the global system, its ego neighborhood, using a label propagation algorithm, assuming that each node is aware of the label it shares with each of its connections. The local communities are merged hierarchically, unveiling the modular organization of the network at the global level and identifying overlapping groups and groups of groups. We tested this intuition against the state-of-the-art overlapping community discovery and found that our new method advances in the chosen scenarios in the quality of the obtained communities. We perform a test on benchmark and on real-world networks, evaluating the quality of the community coverage by using the extracted communities to predict the metadata attached to the nodes, which we consider external information about the latent labels. We also provide an explanation about why real-world networks contain overlapping communities and how our logic is able to capture them. Finally, we show how our method is deterministic, is incremental, and has a limited time complexity, so that it can be used on real-world scale networks.

References

[1]
Yong-Yeol Ahn, James P. Bagrow, and Sune Lehmann. 2010. Link communities reveal multiscale complexity in networks. Nature 466, 7307 (June 2010), 761--764.
[2]
James P. Bagrow and Erik M. Bollt. 2005. Local method for detecting communities. Physical Review E 72, 4 (Oct. 2005), 046108+.
[3]
Michele Berlingerio, Michele Coscia, and Fosca Giannotti. 2011. Finding and characterizing communities in multidimensional networks. In Proceedings of the International Conference on Advances in Social Network Analysis and Mining (ASONAM’11), IEEE, Kaoshiung, Taiwan, 490--494.
[4]
Michele Berlingerio, Michele Coscia, Fosca Giannotti, Anna Monreale, and Dino Pedreschi. 2012. Multidimensional networks: Foundations of structural analysis. World Wide Web 16, 5--6, 567--593.
[5]
Steffen Bickel and Tobias Scheffer. 2004. Multi-view clustering. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM’04). IEEE Computer Society, Washington, DC, 19--26.
[6]
Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics (2008), P10008.
[7]
Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna. 2011. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In Proceedings of the 20th International Conference on World Wide Web (WWW’11). 587--596.
[8]
Aaron Clauset, M. E. J. Newman, and Cristopher Moore. 2004. Finding community structure in very large networks. Physical Review E 70 (2004), 066111.
[9]
Michele Coscia, Fosca Giannotti, and Dino Pedreschi. 2011. A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining 4, 5 (2011), 512--546.
[10]
Michele Coscia, Giulio Rossetti, Fosca Giannotti, and Dino Pedreschi. 2012. DEMON: A local-first discovery method for overlapping communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’12). 615--623.
[11]
Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation (OSDI’’04). 137--150.
[12]
Imre Derényi, Gergely Palla, and Tamás Vicsek. 2005. Clique percolation in random networks. Physical Review Letters 94, 16 (April 2005), 160202+. http://dx.doi.org/10.1103/PhysRevLett.94.160202
[13]
T. S. Evans and R. Lambiotte. 2009. Line graphs, link partitions, and overlapping communities. Physical Review E 80, 1 (July 2009), 016105+.
[14]
S. Fortunato. 2010. Community detection in graphs. Physics Reports 486 (Feb. 2010), 75--174.
[15]
Santo Fortunato and Marc Barthélemy. 2007. Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 1 (Jan. 2007), 36--41.
[16]
M. Girvan and M. E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (June 2002), 7821--7826.
[17]
Shantanu Godbole and Sunita Sarawagi. 2004. Discriminative methods for multi-labeled classification. In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’04). 22--30.
[18]
Amit Goyal, Byung-Won On, Francesco Bonchi, and Laks V. S. Lakshmanan. 2009. GuruMine: A pattern mining system for discovering leaders and tribes. In Proceedings of the International Conference on Data Engineering (ICDE’09). 1471--1474.
[19]
Keith Henderson, Tina Eliassi-Rad, Spiros Papadimitriou, and Christos Faloutsos. 2010. HCDF: A hybrid community discovery framework. In Proceedings of the SIAM Conference on Data Mining (SDM’10). 754--765.
[20]
Yang Jaewon and Jure Leskovec. 2013. Overlapping community detection at scale: A nonnegative matrix factorization approach. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM’13).
[21]
Liran Katzir, Edo Liberty, and Oren Somekh. 2011. Estimating sizes of social networks via biased sampling. In Proceedings of the International Conference on World Wide Web (WWW’11). 597--606.
[22]
Abhishek Kumar and Hal Daumé III. 2011. A co-training approach for multi-view spectral clustering. In Proceedings of the International Conference on Machine Learning (ICML’11). 393--400.
[23]
Andrea Lancichinetti and Santo Fortunato. 2009a. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E 80, 1 (July 2009), 016118.
[24]
A. Lancichinetti and S. Fortunato. 2009b. Community detection algorithms: A comparative analysis. Physical Review E 80, 5 (Nov. 2009), 056117--+.
[25]
Andrea Lancichinetti, Santo Fortunato, and Jnos Kertsz. 2009. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics 11, 3 (2009), 033015. http://stacks.iop.org/1367-2630/11/i=3/a=033015
[26]
Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. 2007. The dynamics of viral marketing. ACM Transactions of the Web 1 (May 2007). http://dx.doi.org/10.1145/1232722.1232727
[27]
Bo Long, Philip S. Yu, and Zhongfei Zhang. 2008. A general model for multiple view unsupervised learning. In Proceedings of the 2008 SIAM International Conference on Data Mining.
[28]
Julian J. McAuley and Jure Leskovec. 2012. Learning to discover social circles in ego networks. In NIPS, Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger (Eds.). 548--556. Available at http://dblp.uni-trier.de/db/conf/nips/nips2012.html# McAuleyL12.
[29]
Aaron F. McDaid, Derek Greene, and Neil Hurley. 2011. Normalized mutual information to evaluate overlapping community finding algorithms. ArXiv. (Oct. 2011). http://arxiv.org/abs/1110.2515
[30]
Peter J. Mucha, Thomas Richardson, Kevin Macon, Mason A. Porter, and J.-P. Onnela. 2010. Community structure in time-dependent, multiscale, and multiplex networks. Science 328, 5980 (2010), 876--878. http://dx.doi.org/10.1126/science.1184819
[31]
M. E. J. Newman. 2006. Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103, 23 (June 2006), 8577--8582.
[32]
Spiros Papadimitriou, Jimeng Sun, Christos Faloutsos, and Philip S. Yu. 2008. Hierarchical, parameter-free community discovery. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’08). 170--187.
[33]
Pascal Pons and Matthieu Latapy. 2006. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications 10, 2 (2006), 191--218.
[34]
Usha N. Raghavan, Réka Albert, and Soundar Kumara. 2007. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76, 3 (Sept. 2007), 036106+.
[35]
Jianhua Ruan and Weixiong Zhang. 2007. An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In Proceedings of the IEEE International Conference on Data Mining. 643--648.
[36]
G. Tsoumakas and I. Katakis. 2007. Multi label classification: An overview. International Journal of Data Warehousing and Mining 3, 3 (2007), 1--13.
[37]
Dashun Wang, Zhen Wen, Hanghang Tong, Ching-Yung Lin, Chaoming Song, and Albert-László Barabási. 2011. Information spreading in context. In Proceedings of the International Conference on World Wide Web (WWW’11). 735--744.
[38]
J. Xie and B. Szymanski. 2012. Towards linear time overlapping community detection in social networks. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’12) (2012).
[39]
Jierui Xie, Boleslaw K. Szymanski, and Xiaoming Liu. 2011. SLPA: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In ICDM Workshops. 344--349.
[40]
Jaewon Yang and Jure Leskovec. 2012. Defining and evaluating network communities based on ground-truth. ArXiv. (Nov. 2012). http://arxiv.org/abs/1205.6233
[41]
Dengyong Zhou and Christopher J. C. Burges. 2007. Spectral clustering and transductive learning with multiple views. In Proceedings of the 24th International Conference on Machine Learning (ICML’07). ACM, New York, NY, 1159--1166.

Cited By

View all
  • (2024)From plan to practice: Interorganizational crisis response networks from governmental guidelines and real‐world collaborations during hurricane eventsJournal of Contingencies and Crisis Management10.1111/1468-5973.1260132:3Online publication date: 27-Jul-2024
  • (2023)CDRKDJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-23061445:2(2513-2527)Online publication date: 1-Jan-2023
  • (2023)An Overlapping Community Detection Approach Based on Deepwalk and Improved Label PropagationIEEE Transactions on Computational Social Systems10.1109/TCSS.2022.315257910:1(311-321)Online publication date: Feb-2023
  • Show More Cited By

Index Terms

  1. Uncovering Hierarchical and Overlapping Communities with a Local-First Approach

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Knowledge Discovery from Data
    ACM Transactions on Knowledge Discovery from Data  Volume 9, Issue 1
    October 2014
    209 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/2663598
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 August 2014
    Accepted: 01 March 2014
    Revised: 01 December 2013
    Received: 01 August 2013
    Published in TKDD Volume 9, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Complex networks
    2. community discovery
    3. data mining

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)30
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 16 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)From plan to practice: Interorganizational crisis response networks from governmental guidelines and real‐world collaborations during hurricane eventsJournal of Contingencies and Crisis Management10.1111/1468-5973.1260132:3Online publication date: 27-Jul-2024
    • (2023)CDRKDJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-23061445:2(2513-2527)Online publication date: 1-Jan-2023
    • (2023)An Overlapping Community Detection Approach Based on Deepwalk and Improved Label PropagationIEEE Transactions on Computational Social Systems10.1109/TCSS.2022.315257910:1(311-321)Online publication date: Feb-2023
    • (2023)Overlapping community detection with adaptive density peaks clustering and iterative partition strategyExpert Systems with Applications10.1016/j.eswa.2022.119213213(119213)Online publication date: Mar-2023
    • (2022)Community Splitter: A Network Embedding Method for Predicting Missing Links2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA)10.1109/DSAA54385.2022.10032336(1-10)Online publication date: 13-Oct-2022
    • (2022)Multilayer cellular learning automata: A computational model to solve multilayer infrastructure problems with its application in community detection for multilayer networksJournal of Computational Science10.1016/j.jocs.2022.10168361(101683)Online publication date: May-2022
    • (2022)Identifiability and parameter estimation of the overlapped stochastic co-block modelStatistics and Computing10.1007/s11222-022-10114-132:4Online publication date: 28-Jun-2022
    • (2021)Persona2vec: a flexible multi-role representations learning framework for graphsPeerJ Computer Science10.7717/peerj-cs.4397(e439)Online publication date: 30-Mar-2021
    • (2021)Dynamic Community Structure in Online Social GroupsInformation10.3390/info1203011312:3(113)Online publication date: 5-Mar-2021
    • (2021)Detecting overlapping communities using ensemble-based distributed neighbourhood threshold method in social networksIntelligent Decision Technologies10.3233/IDT-200059(1-17)Online publication date: 21-May-2021
    • Show More Cited By

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media