Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3490148.3538584acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article
Open access

Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering

Published: 11 July 2022 Publication History

Abstract

Hierarchical agglomerative clustering (HAC) is a popular algorithm for clustering data, but despite its importance, no dynamic algorithms for HAC with good theoretical guarantees exist. In this paper, we study dynamic HAC on edge-weighted graphs. As single-linkage HAC reduces to computing a minimum spanning forest (MSF), our first result is a parallel batch-dynamic algorithm for maintaining MSFs. On a batch of k edge insertions or deletions, our batch-dynamic MSF algorithm runs in O(k log6 n) expected amortized work and O(log4 n) span with high probability. It is the first fully dynamic MSF algorithm handling batches of edge updates with polylogarithmic work per update and polylogarithmic span. Using our MSF algorithm, we obtain a parallel batch-dynamic algorithm that can answer queries about single-linkage graph HAC clusters.
Our second result is that dynamic graph HAC is significantly harder for other common linkage functions. For example, assuming the strong exponential time hypothesis, dynamic graph HAC requires Ω(n1-o(1)) work per update or query on a graph with n vertices for complete linkage, weighted average linkage, and average linkage. For complete linkage and weighted average linkage, the bound still holds even for incremental or decremental algorithms and even if we allow poly(n)-approximation. For average linkage, the bound weakens to Ω(n1/2-o(1)) for incremental and decremental algorithms, and the bounds still hold when allowing no(1) -approximation.

References

[1]
Amir Abboud and Virginia Vassilevska Williams. 2014. Popular conjectures imply strong lower bounds for dynamic problems. In Proceedings of the 55th Annual Symposium on Foundations of Computer Science. 434--443.
[2]
Umut A. Acar, Daniel Anderson, Guy E. Blelloch, and Laxman Dhulipala. 2019. Parallel Batch-Dynamic Graph Connectivity. In Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures. 381--392.
[3]
Umut A. Acar, Daniel Anderson, Guy E. Blelloch, Laxman Dhulipala, and Sam Westrick. 2020. Parallel Batch-Dynamic Trees via Change Propagation. In 28th Annual European Symposium on Algorithms, Vol. 173. 2:1--2:23.
[4]
Umit A Acar, Guy E Blelloch, Robert Harper, Jorge L Vittes, and Shan Leung Maverick Woo. 2004. Dynamizing Static Algorithms, with Applications to Dynamic Trees and History Independence. In Proceedings of the Fifteenth Annual ACMSIAM Symposium on Discrete Algorithms, Vol. 531. 540.
[5]
Stephen Alstrup, Jacob Holm, Kristian de Lichtenberg, and Mikkel Thorup. 1997. Minimizing Diameters of Dynamic Trees. In International Colloquium on Automata, Languages, and Programming. 270--280.
[6]
Daniel Anderson, Guy E. Blelloch, and Kanat Tangwongsan. 2020. Work-efficient Batch-Incremental Minimum Spanning Trees with Applications to the Sliding Window Model. In Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures. 51--61.
[7]
MohammadHossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, Mohammad- Taghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, and Vahab Mirrokni. 2017. Affinity Clustering: Hierarchical Clustering at Scale. In Advances in Neural Information Processing Systems. 6864--6874.
[8]
J.-P. Benzécri. 1982. Construction d'une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques. Cahiers de l'analyse des données 7, 2 (1982), 209--218.
[9]
Guy E Blelloch. 1996. Programming Parallel Algorithms. Commun. ACM 39, 3 (1996), 85--97.
[10]
Guy E Blelloch, Daniel Ferizovic, and Yihan Sun. 2016. Just Join for Parallel Ordered sets. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. 253--264.
[11]
Chris Calabro, Russell Impagliazzo, and Ramamohan Paturi. 2009. The Complexity of Satisfiability of Small Depth Circuits. In International Workshop on Parameterized and Exact Computation. Springer, 75--85.
[12]
Timothy M Chan. 2006. Dynamic subgraph connectivity with geometric applications. SIAM J. Comput. 36, 3 (2006), 681--694.
[13]
Julia Chuzhoy, Yu Gao, Jason Li, Danupon Nanongkai, Richard Peng, and Thatchaphol Saranurak. 2020. A Deterministic Algorithm for Balanced Cut with Applications to Dynamic Connectivity, Flows, and Beyond. In Proceedings of the 61st Annual Symposium on Foundations of Computer Science. 1158--1167.
[14]
Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-trenn, and Claire Mathieu. 2019. Hierarchical Clustering: Objective Functions and Algorithms. J. ACM 66, 4 (2019).
[15]
Sajal K Das and Paolo Ferragina. 1995. Parallel Dynamic Algorithms for Minimum Spanning Trees. (1995).
[16]
Sanjoy Dasgupta. 2016. A Cost Function for Similarity-Based Hierarchical Clustering. In ACM Symposium on Theory of Computing (STOC). 118--127.
[17]
Erik Demaine. 2004. Lecture Notes on Skip Lists. https://courses.csail.mit.edu/6.046/spring04/handouts/skiplists.pdf. [Online; accessed 08-January-2022].
[18]
Laxman Dhulipala, David Eisenstat, Vahab Mirrokni, and Jessica Shi. 2021. Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time. In International Conference on Machine Learning. 2676--2686.
[19]
Paolo Ferragina and Fabrizio Luccio. 1994. Batch dynamic algorithms for two graph problems. In International Conference on Parallel Architectures and Languages Europe. 713--724.
[20]
Pasi Franti, Olli Virmajoki, and Ville Hautamaki. 2006. Fast Agglomerative Clustering Using a k-nearest Neighbor Graph. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 11 (2006), 1875--1881.
[21]
Daniele Frigioni and Giuseppe F Italiano. 2000. Dynamically Switching Vertices in Planar Graphs. Algorithmica 28, 1 (2000), 76--103.
[22]
Anka Gajentaan and Mark H Overmars. 1995. On a class of (2) problems in computational geometry. Computational geometry 5, 3 (1995), 165--185.
[23]
Joseph Gil, Yossi Matias, and Uzi Vishkin. 1991. Towards a Theory of Nearly Constant Time Parallel Algorithms. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science. 698--710.
[24]
John C Gower and Gavin JS Ross. 1969. Minimum Spanning Trees and Single Linkage Cluster Analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 18, 1 (1969), 54--64.
[25]
Yan Gu, Julian Shun, Yihan Sun, and Guy E. Blelloch. 2015. A Top-Down Parallel Semisort. In Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures. 24--34.
[26]
Timothy C Havens, James C Bezdek, James M Keller, Mihail Popescu, and Jacalyn M Huband. 2009. Is VAT really single linkage in disguise? Annals of Mathematics and Artificial Intelligence 55, 3 (2009), 237--251.
[27]
Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. 2015. Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing. 21--30.
[28]
Monika Rauch Henzinger and Valerie King. 1995. Randomized Dynamic Graph Algorithms with Polylogarithmic Time per Operation. In Proceedings of the Twenty- Seventh Annual ACM Symposium on Theory of Computing. 519--527.
[29]
Jacob Holm, Kristian de Lichtenberg, and Mikkel Thorup. 2001. Poly-Logarithmic Deterministic Fully-Dynamic Algorithms for Connectivity, Minimum Spanning Tree, 2-Edge, and Biconnectivity. J. ACM 48, 4 (2001), 723--760.
[30]
Jacob Holm, Eva Rotenberg, and Christian Wulff-Nilsen. 2015. Faster fully dynamic minimum spanning forest. In 23rd Annual European Symposium on Algorithms. 742--753.
[31]
Joseph JáJá. 1992. An Introduction to Parallel Algorithms. Addison-Wesley.
[32]
George Karypis, Eui-Hong Han, and Vipin Kumar. 1999. Chameleon: Hierarchical Clustering using Dynamic Modeling. Computer 32, 8 (1999), 68--75.
[33]
Ari Kobren, Nicholas Monath, Akshay Krishnamurthy, and Andrew McCallum. 2017. A Hierarchical Algorithm for Extreme Clustering. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 255--264.
[34]
Tsvi Kopelowitz, Ely Porat, and Yair Rosenmutter. 2018. Improved Worst-Case Deterministic Parallel Dynamic Minimum Spanning Forest. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures. 333--341.
[35]
Silvio Lattanzi, Thomas Lavastida, Kefu Lu, and Benjamin Moseley. 2019. A Framework for Parallelizing Hierarchical Clustering Methods. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 73--89.
[36]
Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, and Sanjiv Kumar. 2019. Online hierarchical clustering approximations. arXiv preprint arXiv:1909.09667 (2019).
[37]
Peter Bro Miltersen, Sairam Subramanian, Jeffrey Scott Vitter, and Roberto Tamassia. 1994. Complexity models for incremental computation. Theoretical Computer Science 130, 1 (1994), 203--236.
[38]
Nicholas Monath, Ari Kobren, Akshay Krishnamurthy, Michael R Glass, and Andrew McCallum. 2019. Scalable Hierarchical Clustering with Tree Grafting. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1438--1448.
[39]
Benjamin Moseley and Joshua R. Wang. 2017. Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search. In Advances in Neural Information Processing Systems. 3094--3103.
[40]
Danupon Nanongkai, Thatchaphol Saranurak, and Christian Wulff-Nilsen. 2017. Dynamic minimum spanning forest with subpolynomial worst-case update time. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science. 950--961.
[41]
Mihai Patrascu. 2010. Towards polynomial lower bounds for dynamic problems. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing. 603--610.
[42]
Shaunak Pawagi and Owen Kaser. 1993. Optimal parallel algorithms for multiple updates of minimum spanning trees. Algorithmica 9, 4 (1993), 357--381.
[43]
Seth Pettie and Vijaya Ramachandran. 2002. A randomized time-work optimal parallel algorithm for finding a minimum spanning forest. SIAM J. Comput. 31, 6 (2002), 1879--1895.
[44]
Xiaojun Shen and Weifa Liang. 1993. A parallel algorithm for multiple edge updates of minimum spanning trees. In Proceedings of the Seventh International Parallel Processing Symposium. 310--317.
[45]
Daniel D Sleator and Robert Endre Tarjan. 1983. A data structure for dynamic trees. Journal of computer and system sciences 26, 3 (1983), 362--391.
[46]
Yihan Sun. 2019. Join-based Parallel Balanced Binary Trees. Ph.D. Dissertation. Carngie Mellon University.
[47]
Yihan Sun, Daniel Ferizovic, and Guy E Belloch. 2018. PAM: Parallel Augmented Maps. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 290--304.
[48]
Thomas Tseng, Laxman Dhulipala, and Guy E. Blelloch. 2019. Batch-Parallel Euler Tour Trees. In Proceedings of the Meeting on Algorithm Engineering and Experiments. 92--106.
[49]
Tom Tseng, Laxman Dhulipala, and Julian Shun. 2022. Parallel Batch-Dynamic Minimum Spanning Forest and the Efficiency of Dynamic Agglomerative Graph Clustering. arXiv preprint arXiv:2205.04956 (2022).
[50]
Qi Zhang and Wei Wang. 2007. An efficient algorithm for approximate biased quantile computation in data streams. In Proceedings of the 16th ACM International Conference on Information and Knowledge Management. 1023--1026.

Cited By

View all
  • (2024)Efficient Maintenance of Minimum Spanning Trees in Dynamic Weighted Undirected GraphsMathematics10.3390/math1207102112:7(1021)Online publication date: 28-Mar-2024
  • (2024)Concurrent Data Structures Made EasyProceedings of the ACM on Programming Languages10.1145/36897758:OOPSLA2(1814-1842)Online publication date: 8-Oct-2024
  • (2024)Parallel Dynamic Maximal MatchingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659982(427-437)Online publication date: 17-Jun-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SPAA '22: Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures
July 2022
464 pages
ISBN:9781450391467
DOI:10.1145/3490148
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 July 2022

Check for updates

Author Tags

  1. dynamic algorithms
  2. graph clustering
  3. parallel algorithms

Qualifiers

  • Research-article

Funding Sources

Conference

SPAA '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)142
  • Downloads (Last 6 weeks)23
Reflects downloads up to 21 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient Maintenance of Minimum Spanning Trees in Dynamic Weighted Undirected GraphsMathematics10.3390/math1207102112:7(1021)Online publication date: 28-Mar-2024
  • (2024)Concurrent Data Structures Made EasyProceedings of the ACM on Programming Languages10.1145/36897758:OOPSLA2(1814-1842)Online publication date: 8-Oct-2024
  • (2024)Parallel Dynamic Maximal MatchingProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3659982(427-437)Online publication date: 17-Jun-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media