Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2736277.2741640acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions

Published: 18 May 2015 Publication History

Abstract

Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective, and only find a few such subgraphs without providing any structural relations. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-cores and k-truss decompositions. We give provably efficient algorithms for nucleus decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour.

References

[1]
University of Florida sparse matrix collection. http://www.cise.ufl.edu/research/sparse/matrices/.
[2]
J. I. Alvarez-Hamelin, A. Barrat, and A. Vespignani. Large scale networks fingerprinting and visualization using the k-core decomposition. In Advances in Neural Information Processing Systems 18, pages 41--50, 2006.
[3]
R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In Workshop on Algorithms and Models for the Web-Graph (WAW), pages 25--37, 2009.
[4]
A. Angel, N. Sarkas, N. Koudas, and D. Srivastava. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. Proc. VLDB Endow., 5(6):574--585, Feb. 2012.
[5]
Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. Greedily finding a dense subgraph. J. Algorithms, 34(2):203--221, Feb. 2000.
[6]
D. J. Beal, R. Cohen, M. J. Burke, and C. L. McLendon. Cohesion and performance in groups: A meta-analytic clarification of construct relation. Journal of Applied Psychology, 88:989--1004, 2003.
[7]
C. Bron and J. Kerbosch. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM, 16(9):575--577, Sept. 1973.
[8]
G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In Proc. of the 2008 International Conference on Web Search and Data Mining, WSDM '08, pages 95--106, 2008.
[9]
M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In Proc. of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization, APPROX '00, pages 84--95, 2000.
[10]
N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM J. Comput., 14:210--223, February 1985.
[11]
J. Cohen. Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report, 2008.
[12]
J. Cohen. Graph twiddling in a MapReduce world. Computing in Science & Engineering, 11:29--41, 2009.
[13]
Y. Dourisboure, F. Geraci, and M. Pellegrini. Extraction and classification of dense communities in the web. In Proc. of the 16th International Conference on World Wide Web, WWW '07, pages 461--470, 2007.
[14]
X. Du, R. Jin, L. Ding, V. E. Lee, and J. H. T. Jr. Migration motif: a spatial - temporal pattern mining approach for financial markets. In Proc. of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, 2009.
[15]
P. Erdos and A. Hajnal. On chromatic number of graphs and set-systems. Acta Mathematica Hungarica, 17:61--99, 1966.
[16]
U. Feige. Relations between average case complexity and approximation complexity. In Proceedings of Symposium on Theory of Computing, pages 534--543, 2002.
[17]
D. R. Forsyth. Group Dynamics. Cengage Learning, 2010.
[18]
E. Fratkin, B. T. Naughton, D. L. Brutlag, and S. Batzoglou. Motifcut: regulatory motifs finding with maximum density subgraphs. In ISMB (Supplement of Bioinformatics), pages 156--157, 2006.
[19]
G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM J. Comput., 18(1):30--55, Feb. 1989.
[20]
D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In Proc. of the 31st International Conference on Very Large Data Bases, VLDB '05, pages 721--732, 2005.
[21]
A. Gionis, F. Junqueira, V. Leroy, M. Serafini, and I. Weber. Piggybacking on social networks. Proc. VLDB Endow., 6(6):409--420, 2013.
[22]
A. V. Goldberg. Finding a maximum density subgraph. Technical report, Berkeley, CA, USA, 1984.
[23]
R. Gupta, T. Roughgarden, and C. Seshadhri. Decompositions of triangle-dense graphs. In Innovations in Theoretical Computer Science (ITCS), pages 471--482, 2014.
[24]
J. Hastad. Clique is hard to approximate within n(1). In Acta Mathematica, pages 627--636, 1996.
[25]
H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213--221, Jan. 2005.
[26]
L. Iasemidis, D.-S. Shiau, W. Chaovalitwongse, J. Sackellares, P. Pardalos, J. Principe, P. Carney, A. Prasad, B. Veeramani, and K. Tsakalis. Adaptive epileptic seizure prediction system. Biomedical Engineering, IEEE Transactions on, 50:616--627, 2003.
[27]
R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD Conference, pages 813{826, 2009.
[28]
S. Khot. Ruling out ptas for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM Journal on Computing, 36(4):1025--1071, 2006.
[29]
R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the web for emerging cyber-communities. In Proc. of the Eighth International Conference on World Wide Web, WWW '99, pages 1481--1493, 1999.
[30]
V. E. Lee, N. Ruan, R. Jin, and C. C. Aggarwal. A survey of algorithms for dense subgraph discovery. In C. C. Aggarwal and H. Wang, editors, Managing and Mining Graph Data, volume 40. Springer, 2010.
[31]
D. Lick and A. White. k-degenerate graphs. Canadian Journal of Mathematics, 22:1082--1096, 1970.
[32]
D. Matula and L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. Journal of ACM, 30(3):417--427, 1983.
[33]
R. Mokken. Cliques, clubs and clans. Quality and Quantity, 13(2):161--173, 1979.
[34]
R. A. Rossi, D. F. Gleich, A. H. Gebremedhin, and M. M. A. Patwary. A fast parallel maximum clique algorithm for large sparse graphs and temporal strong components. CoRR, abs/1302.6256, 2013.
[35]
A. Sala, L. Cao, C. Wilson, R. Zablit, H. Zheng, and B. Y. Zhao. Measurement-calibrated graph models for social network experiments. In WWW '10, pages 861--870. ACM, 2010.
[36]
A. E. Sariyuce, B. Gedik, G. Jacques-Silva, K.-L. Wu, and U. V. Catalyurek. Streaming algorithms for k-core decomposition. In 39th International Conference on Very Large Data Bases (VLDB), Aug 2013.
[37]
T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In Experimental and Efficient Algorithms, pages 606--609. Springer Berlin / Heidelberg, 2005.
[38]
S. B. Seidman. Network structure and minimum degree. Social Networks, 5(3):269--287, 1983.
[39]
S. B. Seidman and B. Foster. A graph-theoretic generalization of the clique concept. Journal of Mathematical Sociology, 1978.
[40]
C. Seshadhri, A. Pinar, and T. G. Kolda. Triadic measures on graphs: The power of wedge sampling. Statistical Analysis and Data Mining, 7(4):294--307, 2014.
[41]
SNAP. Stanford network analysis package. http://snap.stanford.edu/snap, retrieved March, 2014.
[42]
S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW'11, pages 607--614, 2011.
[43]
C. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. Tsiarli. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proc. of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, 2013.
[44]
C. E. Tsourakakis. A novel approach to finding near-cliques: The triangle-densest subgraph problem. CoRR, abs/1405.1477, 2014.
[45]
J. Wang and J. Cheng. Truss decomposition in massive networks. Proceedings of the VLDB Endowment, 5(9):812--823, 2012.
[46]
N. Wang, J. Zhang, K.-L. Tan, and A. K. H. Tung. On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endow., 4:58--68, 2010.
[47]
S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
[48]
D. Watts and S. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393:440--442, 1998.
[49]
B. Zhang and S. Horvath. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1):Article 17+, 2005.
[50]
Y. Zhang and S. Parthasarathy. Extracting analyzing and visualizing triangle k-core motifs within networks. In Proc. of the 2012 IEEE 28th International Conference on Data Engineering, ICDE '12, pages 1049--1060, 2012.
[51]
F. Zhao and A. K. H. Tung. Large scale cohesive subgraphs discovery for social network visual analysis. In Proc. of the 39th international conference on Very Large Data Bases, PVLDB'13, pages 85--96, 2013.

Cited By

View all
  • (2024)Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic GraphsElectronics10.3390/electronics1301019213:1(192)Online publication date: 1-Jan-2024
  • (2024)An Algorithm for Finding Optimal k-Core in Attribute NetworksApplied Sciences10.3390/app1403125614:3(1256)Online publication date: 2-Feb-2024
  • (2024)Efficient k-Clique Count Estimation with Accuracy GuaranteeProceedings of the VLDB Endowment10.14778/3681954.368203217:11(3707-3719)Online publication date: 1-Jul-2024
  • Show More Cited By

Index Terms

  1. Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    WWW '15: Proceedings of the 24th International Conference on World Wide Web
    May 2015
    1460 pages
    ISBN:9781450334693

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    In-Cooperation

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Publication History

    Published: 18 May 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. dense subgraph discovery
    2. density hierarchy
    3. graph decomposition
    4. k-core
    5. k-truss
    6. overlapping dense subgraphs

    Qualifiers

    • Research-article

    Funding Sources

    • DARPA

    Conference

    WWW '15
    Sponsor:
    • IW3C2

    Acceptance Rates

    WWW '15 Paper Acceptance Rate 131 of 929 submissions, 14%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)31
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 23 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic GraphsElectronics10.3390/electronics1301019213:1(192)Online publication date: 1-Jan-2024
    • (2024)An Algorithm for Finding Optimal k-Core in Attribute NetworksApplied Sciences10.3390/app1403125614:3(1256)Online publication date: 2-Feb-2024
    • (2024)Efficient k-Clique Count Estimation with Accuracy GuaranteeProceedings of the VLDB Endowment10.14778/3681954.368203217:11(3707-3719)Online publication date: 1-Jul-2024
    • (2024)Efficient Algorithms for Density Decomposition on Large Static and Dynamic GraphsProceedings of the VLDB Endowment10.14778/3681954.368197417:11(2933-2945)Online publication date: 30-Aug-2024
    • (2024)Efficient Parallel D-Core Decomposition at ScaleProceedings of the VLDB Endowment10.14778/3675034.367505417:10(2654-2667)Online publication date: 1-Jun-2024
    • (2024)A Survey on the Densest Subgraph Problem and its VariantsACM Computing Surveys10.1145/365329856:8(1-40)Online publication date: 22-Mar-2024
    • (2024)A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and InteractionProceedings of the ACM on Management of Data10.1145/36393152:1(1-29)Online publication date: 26-Mar-2024
    • (2024)Efficient k-Clique Listing: An Edge-Oriented Branching StrategyProceedings of the ACM on Management of Data10.1145/36392622:1(1-26)Online publication date: 26-Mar-2024
    • (2024)Hierarchical Structure Construction on HypergraphsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679765(1597-1606)Online publication date: 21-Oct-2024
    • (2024)Towards Optimal Output-Sensitive Clique Listing or: Listing Cliques from Smaller CliquesProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649663(923-934)Online publication date: 10-Jun-2024
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media