Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1989323.1989399acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Local graph sparsification for scalable clustering

Published: 12 June 2011 Publication History

Abstract

In this paper we look at how to sparsify a graph i.e. how to reduce the edgeset while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges that are likely to be part of the same cluster. We propose to rank edges using a simple similarity-based heuristic that we efficiently compute by comparing the minhash signatures of the nodes incident to the edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks and using four state-of-the-art graph clustering and community discovery algorithms reveal that our proposed approach realizes excellent speedups (often in the range 10-50), with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracies.

References

[1]
D. Achlioptas and F. McSherry. Fast computation of low rank matrix approximations. In STOC '01, pages 611--618, 2001.
[2]
S. Arora, E. Hazan, and S. Kale. A fast random sampling algorithm for sparsifying matrices. APPROX-RANDOM '06, pages 272--279, 2006.
[3]
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD '08, pages 16--24, 2008.
[4]
A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in O(n2) time. In STOC '96, pages 47--55, 1996.
[5]
T. Bohman, C. Coopery, and A. Friezez. Min-Wise independent linear permutations. the electronic journal of combinatorics, 7(R26):2, 2000.
[6]
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations (extended abstract). In STOC '98, pages 327--336, 1998.
[7]
G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM '08, pages 95--106, 2008.
[8]
A. Clauset, C. Shalizi, and M. Newman. Power-law distributions in empirical data. SIAM review, 51(4):661--703, 2009.
[9]
M. Costanzo, A. Baryshnikova, J. Bellay, Y. Kim, E. Spear, C. Sevier, H. Ding, J. Koh, K. Toufighi, S. Mostafavi, et al. The genetic landscape of a cell. Science's STKE, 327(5964):425, 2010.
[10]
B. B. Dalvi, M. Kshirsagar, and S. Sudarshan. Keyword search on external memory data graphs. PVLDB, 1(1):1189--1204, 2008.
[11]
I. S. Dhillon, Y. Guan, and B. Kulis. Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. IEEE Trans. PAMI, 29(11):1944--1957, 2007.
[12]
S. Fortunato. Community detection in graphs. Phys. Reports, 486:75--174, 2010.
[13]
D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB '05, pages 721--732, 2005.
[14]
J. Glattfelder and S. Battiston. Backbone of complex networks of corporations: The flow of control. Phys. Rev. E, 80(3):36104, 2009.
[15]
R. C. K. and S. Sudarshan. Graph clustering for keyword search. In COMAD '09, 2009.
[16]
D. R. Karger. Random sampling in cut, flow, and network design problems. In STOC '94, pages 648--657, 1994.
[17]
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359, 1999.
[18]
V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, J. Cui, and A. Percus. Reducing large internet topologies for faster simulations. NETWORKING '05, pages 328--341, 2005.
[19]
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In WWW '10, pages 591--600, 2010.
[20]
A. Lancichinetti, S. Fortunato, and F. Radicchi. Benchmark graphs for testing community detection algorithms. Phys. Rev. E, 78(4):46110, 2008.
[21]
K. Lang and S. Rao. A flow-based method for improving the expansion or conductance of graph cuts. LNCS '04, pages 325--337, 2004.
[22]
J. Leskovec and C. Faloutsos. Sampling from large graphs. In KDD '06, pages 631--636, 2006.
[23]
J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Statistical properties of community structure in large social and information networks. In WWW '08, pages 695--704, 2008.
[24]
S. Macskassy and F. Provost. Classification in networked data: A toolkit and a univariate case study. The Journal of Machine Learning Research, 8:935--983, 2007.
[25]
A. S. Maiya and T. Y. Berger-Wolf. Sampling community structure. In WWW '10, pages 701--710, 2010.
[26]
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and Analysis of Online Social Networks. In IMC '07, pages 29--42, October 2007.
[27]
M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):026113, Feb 2004.
[28]
F. Provost and V. Kolluri. A survey of methods for scaling up inductive algorithms. DMKD, 3(2):131--169, 1999.
[29]
S. Pu, J. Wong, B. Turner, E. Cho, and S. Wodak. Up-to-date catalogues of yeast protein complexes. Nucleic acids research, 37(3):825--831, 2009.
[30]
A. Ruepp et al. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Research, 36(suppl 1):D646, 2008.
[31]
V. Satuluri and S. Parthasarathy. Scalable graph clustering using stochastic flows: applications to community discovery. In KDD '09, pages 737--746, 2009.
[32]
V. Satuluri and S. Parthasarathy. Symmetrizations for clustering directed graphs. In EDBT '11, 2011.
[33]
V. Satuluri, S. Parthasarathy, and Y. Ruan. Local graph sparsification for scalable clustering. Technical Report OSU-CISRC-11/10-TR25, The Ohio State University.
[34]
J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE Trans. PAMI, 22(8), 2000.
[35]
D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. In STOC '08, pages 563--568, 2008.
[36]
D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04, pages 81--90, 2004.
[37]
M. Tumminello, T. Aste, T. Di Matteo, and R. Mantegna. A tool for filtering information in complex systems. PNAS, 102(30):10421, 2005.
[38]
S. Van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, 2000.

Cited By

View all
  • (2024)Graph modeling of relational structures among functioning variables with low back pain: an exploratory analysis based on International Classification of Functioning, Disability and HealthEuropean Journal of Physical and Rehabilitation Medicine10.23736/S1973-9087.24.08089-460:3Online publication date: Jul-2024
  • (2024)Interpretation of course conceptual structure and student self-efficacy: an integrated strategy of knowledge graphs with item response modelingBMC Medical Education10.1186/s12909-024-05401-624:1Online publication date: 23-May-2024
  • (2024)DoppelGanger++: Towards Fast Dependency Graph Generation for Database ReplayProceedings of the ACM on Management of Data10.1145/36393222:1(1-26)Online publication date: 26-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN:9781450306614
DOI:10.1145/1989323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 June 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. graph clustering
  2. graph sparsification
  3. minwise hashing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)97
  • Downloads (Last 6 weeks)14
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Graph modeling of relational structures among functioning variables with low back pain: an exploratory analysis based on International Classification of Functioning, Disability and HealthEuropean Journal of Physical and Rehabilitation Medicine10.23736/S1973-9087.24.08089-460:3Online publication date: Jul-2024
  • (2024)Interpretation of course conceptual structure and student self-efficacy: an integrated strategy of knowledge graphs with item response modelingBMC Medical Education10.1186/s12909-024-05401-624:1Online publication date: 23-May-2024
  • (2024)DoppelGanger++: Towards Fast Dependency Graph Generation for Database ReplayProceedings of the ACM on Management of Data10.1145/36393222:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Graph Representation Learning for Contention and Interference Management in Wireless NetworksIEEE/ACM Transactions on Networking10.1109/TNET.2024.335593532:3(2479-2494)Online publication date: Jun-2024
  • (2024)SparGNN: Efficient Joint Feature-Model Sparsity Exploitation in Graph Neural Network Acceleration2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)10.1109/ASP-DAC58780.2024.10473883(225-230)Online publication date: 22-Jan-2024
  • (2024)New business model and local governance in supporting social and environmental solutions: A social network analysis to evaluate the Italian local action group’s “Terra è Vita” roleSocio-Economic Planning Sciences10.1016/j.seps.2024.101945(101945)Online publication date: May-2024
  • (2024)Generic network sparsification via degree- and subgraph-based edge samplingInformation Sciences: an International Journal10.1016/j.ins.2024.121096679:COnline publication date: 1-Sep-2024
  • (2024)Identifying similar-bicliques in bipartite graphsThe VLDB Journal10.1007/s00778-023-00834-933:3(703-726)Online publication date: 25-Jan-2024
  • (2023)On the ability of graph neural networks to model interactions between verticesProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3667275(26501-26545)Online publication date: 10-Dec-2023
  • (2023)Demystifying Graph Sparsification Algorithms in Graph Properties PreservationProceedings of the VLDB Endowment10.14778/3632093.363210617:3(427-440)Online publication date: 1-Nov-2023
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media