Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3097983.3098069acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

Local Higher-Order Graph Clustering

Published: 04 August 2017 Publication History

Abstract

Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal \emph{motif conductance}, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks, shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology.

Supplementary Material

MP4 File (yin_graph_clustering.mp4)

References

[1]
N. K. Ahmed, J. Neville, R. A. Rossi, and N. Duffield. Efficient graphlet counting for large networks. In ICDM, 2015.
[2]
R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, 2006.
[3]
R. Andersen, F. Chung, and K. Lang. Local partitioning for directed graphs using PageRank. Internet Mathematics, 2008.
[4]
A. R. Benson, D. F. Gleich, and J. Leskovec. Tensor spectral clustering for partitioning higher-order network structures. In SDM, 2015.
[5]
A. R. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 2016.
[6]
M. Bressan, F. Chierichetti, R. Kumar, S. Leucci, and A. Panconesi. Counting graphlets: Space vs time. In WSDM, 2017.
[7]
N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 1985.
[8]
F. Chung and O. Simpson. Solving linear systems with boundary conditions using heat kernel pagerank. In WAW, 2013.
[9]
F. R. Chung. Four proofs for the cheeger inequality and graph partition algorithms. In ICCM, 2007.
[10]
A. Clauset. Finding local community structure in networks. Phys. Rev. E, 72(2), 2005.
[11]
W. Cui, Y. Xiao, H. Wang, and W. Wang. Local search of communities in large graphs. In SIGMOD, 2014.
[12]
L. De Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. In KDD, 2016.
[13]
A. Epasto, J. Feldman, S. Lattanzi, S. Leonardi, and V. Mirrokni. Reduce and aggregate: similarity ranking in multi-categorical bipartite graphs. In WWW, 2014.
[14]
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical J., 1973.
[15]
S. Fortunato. Community detection in graphs. Physics Reports, 2010.
[16]
U. Gargi, W. Lu, V. Mirrokni, and S. Yoon. Large-scale community detection on youtube for topic discovery and exploration. In ICWSM, 2011.
[17]
D. F. Gleich and M. W. Mahoney. Using local spectral methods to robustify graph-based learning algorithms. In KDD, 2015.
[18]
D. F. Gleich and C. Seshadhri. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In KDD, 2012.
[19]
M. S. Granovetter. The strength of weak ties. American J. of Sociology, 1973.
[20]
X. Huang, H. Cheng, L. Qin, W. Tian, and J. X. Yu. Querying k-truss community in large and dynamic graphs. In SIGMOD, 2014.
[21]
S. Jain and C. Seshadhri. A fast and provable method for estimating clique counts using turán's theorem. In WWW, 2017.
[22]
L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney. Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Physics Review E, 91, 2015.
[23]
M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, 2015.
[24]
B.-B. Jiang, J.-G. Wang, Y. Wang, and J. Xiao. Gene prioritization for type 2 diabetes in tissue-specific protein interaction networks. Systems Biology, 2009.
[25]
B. Klimt and Y. Yang. Introducing the Enron corpus. In CEAS, 2004.
[26]
K. Kloster and D. F. Gleich. Heat kernel based community detection. In KDD, 2014.
[27]
C. Klymko, D. F. Gleich, and T. G. Kolda. Using triangles to improve community detection in directed networks. In ASE BigData Conference, 2014.
[28]
A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 2009.
[29]
K. J. Lang. Fixing two weaknesses of the spectral method. In NIPS, 2005.
[30]
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. TKDD, 2007.
[31]
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 2009.
[32]
Y. Li, K. He, D. Bindel, and J. E. Hopcroft. Uncovering the small community structure in large networks: A local spectral approach. In WWW, 2015.
[33]
M. W. Mahoney, L. Orecchia, and N. K. Vishnoi. A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. JMLR, 2012.
[34]
D. Marcus and Y. Shavitt. Efficient counting of network motifs. In ICDCSW, 2010.
[35]
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 2002.
[36]
L. Orecchia and Z. A. Zhu. Flow-based algorithms for local graph clustering. In SODA, 2014.
[37]
N. Prvzulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 2007.
[38]
K. Rohe and T. Qin. The blessing of transitivity in sparse and stochastic networks. arXiv:1307.2302, 2013.
[39]
S. E. Schaeffer. Graph clustering. Computer Science Rev., 2007.
[40]
M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In KDD, 2010.
[41]
O. Sporns and R. Kötter. Motifs in brain networks. PLOS Biology, 2004.
[42]
A. L. Traud, P. J. Mucha, and M. A. Porter. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications, 2012.
[43]
C. Tsourakakis, J. Pachocki, and M. Mitzenmacher. Scalable motif-aware graph clustering. In WWW, 2017.
[44]
J. Ugander, L. Backstrom, and J. Kleinberg. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In WWW, 2013.
[45]
K. Voevodski, S.-H. Teng, and Y. Xia. Spectral affinity in protein networks. BMC Systems Biology, 2009.
[46]
D. Wagner and F. Wagner. Between min cut and graph bisection. In MFCS, 1993.
[47]
J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 2015.
[48]
Ö. N. Yaverouglu, N. Malod-Dognin, D. Davis, Z. Levnajic, V. Janjic, R. Karapandza, A. Stojmirovic, and N. Prvzulj. Revealing the hidden language of complex networks. Scientific Reports, 2014.
[49]
H. Yin, A. R. Benson, and J. Leskovec. Higher-order clustering in networks. arXiv:1704.03913, 2017.
[50]
Z. A. Zhu, S. Lattanzi, and V. S. Mirrokni. A local algorithm for finding well-connected clusters. In ICML, 2013.endthebibliography

Cited By

View all
  • (2025)Iterative active learning strategies for subgraph matchingPattern Recognition10.1016/j.patcog.2024.110797158(110797)Online publication date: Feb-2025
  • (2024)Edge Deletion based Subgraph HidingWSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS10.37394/23209.2024.21.3221(333-347)Online publication date: 17-Jul-2024
  • (2024)Approaches to Extracting Patterns of Service Utilization for Patients with Complex Conditions: Graph Community Detection vs. Natural Language Processing ClusteringBioMedInformatics10.3390/biomedinformatics40301034:3(1884-1900)Online publication date: 9-Aug-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN:9781450348874
DOI:10.1145/3097983
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 August 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. clustering
  2. community detection
  3. higher-order structure
  4. motifs

Qualifiers

  • Research-article

Funding Sources

  • DARPA SIMPLEX
  • Sloan Research Fellowship
  • Chan Zuckerberg Biohub
  • NSF Center for Science of Information STC
  • Huawei
  • NSF
  • Tencent
  • Stanford Data Science Initiative

Conference

KDD '17
Sponsor:

Acceptance Rates

KDD '17 Paper Acceptance Rate 64 of 748 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)786
  • Downloads (Last 6 weeks)118
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2025)Iterative active learning strategies for subgraph matchingPattern Recognition10.1016/j.patcog.2024.110797158(110797)Online publication date: Feb-2025
  • (2024)Edge Deletion based Subgraph HidingWSEAS TRANSACTIONS ON INFORMATION SCIENCE AND APPLICATIONS10.37394/23209.2024.21.3221(333-347)Online publication date: 17-Jul-2024
  • (2024)Approaches to Extracting Patterns of Service Utilization for Patients with Complex Conditions: Graph Community Detection vs. Natural Language Processing ClusteringBioMedInformatics10.3390/biomedinformatics40301034:3(1884-1900)Online publication date: 9-Aug-2024
  • (2024)Community Search Based on Containment Control of Multi-Agent System with Opinion Leaders2024 43rd Chinese Control Conference (CCC)10.23919/CCC63176.2024.10661912(6103-6108)Online publication date: 28-Jul-2024
  • (2024)Efficient Maximal Motif-Clique Enumeration over Large Heterogeneous Information NetworksProceedings of the VLDB Endowment10.14778/3681954.368197517:11(2946-2959)Online publication date: 30-Aug-2024
  • (2024)Well-connectedness and community detectionPLOS Complex Systems10.1371/journal.pcsy.00000091:3(e0000009)Online publication date: 5-Nov-2024
  • (2024)Higher-Order Networks Representation and Learning: A SurveyACM SIGKDD Explorations Newsletter10.1145/3682112.368211426:1(1-18)Online publication date: 25-Jul-2024
  • (2024)DCDIMB: Dynamic Community-based Diversified Influence Maximization using Bridge NodesACM Transactions on the Web10.1145/366461818:4(1-32)Online publication date: 11-May-2024
  • (2024)Representative and Back-In-Time Sampling from Real-world HypergraphsACM Transactions on Knowledge Discovery from Data10.1145/365330618:6(1-48)Online publication date: 26-Apr-2024
  • (2024)Regularized Unconstrained Weakly Submodular MaximizationProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679651(3537-3548)Online publication date: 21-Oct-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media