
Dynamic Structural Clustering on Graphs

Authors:

Boyu Ruan,

Junhao Gan,

Hao Wu,

Anthony Wirth

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Pages 1491 - 1503

https://doi.org/10.1145/3448016.3452828

Published: 18 June 2021


Abstract

Structural Clustering (StrClu) is one of the most popular graph clustering paradigms. In this paper, we consider StrClu under Jaccard similarity on a dynamic graph, $G = (V, E)$, subject to edge insertions and deletions (updates). The goal is to maintain certain information under updates, so that the StrClu clustering result on $G$ can be retrieved in $O(|V| + |E|)$ time, upon request. The state-of-the-art worst-case cost is $O(|V|)$ per update; we improve this update-time bound significantly with the ρ-approximate notion. Specifically, for a specified failure probability, $\delta^*$, and every sequence of $M$ updates (no need to know $M$'s value in advance), our algorithm, DynELM, achieves $O(\log^2 |V| + \log |V| \cdot \log \frac{M}{\delta^*})$ amortized cost for each update, at all times in linear space. Moreover, DynELM provides a provable "sandwich" guarantee on the clustering quality at all times after each update with probability at least $1 - \delta^*$. We further develop DynELM into our ultimate algorithm, DynStrClu, which also supports cluster-group-by queries. Given $Q \subseteq V$, this puts the non-empty intersection of $Q$ and each StrClu cluster into a distinct group. DynStrClu not only achieves all the guarantees of DynELM, but also runs cluster-group-by queries in $O(|Q| \cdot \log |V|)$ time. We demonstrate the performance of our algorithms via extensive experiments, on 15 real datasets. Experimental results confirm that our algorithms are up to three orders of magnitude more efficient than state-of-the-art competitors, and still provide quality structural clustering results.
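To make the similarity measure concrete, the following is a minimal Python sketch of exact Jaccard similarity between two adjacent vertices and of labelling each edge as similar or dissimilar against a threshold. It is an illustration, not the paper's DynELM algorithm: it recomputes every similarity from scratch on a static graph. The use of closed neighbourhoods (each vertex counted as its own neighbour), the threshold name `eps`, and all function names are assumptions introduced here and are not taken from the abstract.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard_similarity(adj, u, v):
    """Jaccard similarity of the closed neighbourhoods of u and v.

    Assumption: a vertex is included in its own neighbourhood.
    """
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def label_edges(adj, eps):
    """Label every edge True (similar) if its similarity is at least eps.

    `eps` is a hypothetical similarity threshold used for illustration.
    """
    labels = {}
    for u in list(adj):
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                labels[(u, v)] = jaccard_similarity(adj, u, v) >= eps
    return labels


if __name__ == "__main__":
    # A triangle 1-2-3 with a pendant vertex 4 attached to 3.
    g = build_graph([(1, 2), (2, 3), (1, 3), (3, 4)])
    print(jaccard_similarity(g, 1, 3))  # 3 shared of 4 total -> 0.75
    print(jaccard_similarity(g, 3, 4))  # 2 shared of 4 total -> 0.5
    print(label_edges(g, eps=0.6))
```

Recomputing a similarity this way touches the full neighbourhoods of both endpoints, and a single edge update can change the similarity of every edge incident on its endpoints, which is why avoiding exact recomputation matters for the update cost.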

Supplementary Material

MP4 File (3448016.3452828.mp4)

Structural Clustering (StrClu) is one of the most popular graph clustering paradigms. In this paper, we consider StrClu under the Jaccard similarity on a dynamic graph, $G = \langle V, E \rangle$, subject to edge insertions and deletions. The goal is to maintain certain information under updates, so that the StrClu clustering result on $G$ can be retrieved in $O(|V| + |E|)$ time, upon request. The state-of-the-art worst-case update cost is $O(|V|)$. We improve this bound significantly with the ρ-approximate notion. Specifically, for any sequence of updates in which, for every vertex $u \in V$, the frequency of deletions incident on $u$ is at most a constant fraction of that of the insertions incident on $u$, our algorithm, DynELM, achieves $\tilde{O}(1)$ amortized cost for each update, with linear space consumption, where the notation $\tilde{O}(\cdot)$ hides a poly-logarithmic factor in the complexity. Moreover, DynELM provides a provable "sandwich" guarantee on the clustering quality with high probability. We further develop DynELM into our ultimate algorithm, DynStrClu, which also supports cluster-group-by queries. Given an arbitrary subset $Q$ of $V$, this puts the non-empty intersection of $Q$ and each StrClu cluster into a distinct group. DynStrClu not only achieves all the guarantees of DynELM, but also runs cluster-group-by queries in $\tilde{O}(|Q|)$ time. We demonstrate the performance of our algorithms via extensive experiments, on 14 real datasets. Experimental results confirm that both our algorithms are up to three orders of magnitude more efficient than state-of-the-art competitors, and still provide quality structural clustering results.
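The semantics of a cluster-group-by query can be sketched in a few lines of Python. This is only an illustration of what the query returns, assuming a precomputed index from each vertex to the identifiers of the clusters containing it. The index `cluster_ids_of`, the function name, and the choice to let a vertex appear in more than one cluster are assumptions made here. It does not reproduce how DynStrClu answers the query on a graph that is being updated.

```python
from collections import defaultdict


def cluster_group_by(cluster_ids_of, query):
    """Group the query vertices by the clusters they belong to.

    cluster_ids_of: hypothetical index {vertex: iterable of cluster ids}.
    query: iterable of vertices (the set Q).
    Returns {cluster id: set of query vertices in that cluster}; clusters
    that contain no query vertex do not appear in the result.
    """
    groups = defaultdict(set)
    for v in query:
        for cid in cluster_ids_of.get(v, ()):
            groups[cid].add(v)
    return dict(groups)


if __name__ == "__main__":
    # Vertex 3 lies in two clusters; vertex 5 lies in none.
    index = {1: [0], 2: [0], 3: [0, 1], 4: [1], 5: []}
    print(cluster_group_by(index, [1, 3, 5]))
```

With a static index the work is proportional to the number of (vertex, cluster) memberships among the query vertices. The harder part, which this sketch leaves out, is keeping such membership information valid while edges are inserted and deleted.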


