Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3558481.3591075acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
research-article
Open access

On Parallel k-Center Clustering

Published: 17 June 2023 Publication History

Abstract

We consider the classic k-center problem in a parallel setting, on the low-local-space Massively Parallel Computation (MPC) model, with local space per machine of O (nδ), where δ ∈ (0,1) is an arbitrary constant. As a central clustering problem, the k-center problem has been studied extensively. Still, until very recently, all parallel MPC algorithms have been requiring Ω(k) or even Ω(knδ) local space per machine. While this setting covers the case of small values of k, for a large number of clusters these algorithms require large local memory, making them poorly scalable. The case of large k,k ≥ Ω(nδ), has been considered recently for the low-local-space MPC model by Bateni et al. (2021), who gave an O (log log n)-round MPC algorithm that produces k(1 + ο (1)) centers whose cost has multiplicative approximation of O (log log log n). In this paper we extend the algorithm of Bateni et al. and design a low-local-space MPC algorithm that in O (log log n) rounds returns a clustering with k(1 + ο(1)) clusters that is an O (log*n)-approximation for k-center.

References

[1]
Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. 2014. Parallel Algorithms for Geometric Graph Problems. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC). 574--583. https://doi.org/10.1145/2591796.2591805
[2]
Mohammad Hossein Bateni, Hossein Esfandiari, Manuela Fischer, and Vahab Mirrokni. 2021. Extreme k-Center Clustering. In Proceedings of the 35th AAAI Conference on Artificial Intelligence(AAAI). 3941--3949. https://ojs.aaai.org/index.php/AAAI/article/view/16513
[3]
Paul Beame, Paraschos Koutris, and Dan Suciu. 2017. Communication Steps for Parallel Query Processing. J. ACM, Vol. 64, 6 (2017), 40:1--40:58. https://doi.org/10.1145/3125644
[4]
Guy E Blelloch and Kanat Tangwongsan. 2010. Parallel approximation algorithms for facility-location problems. In Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures. 315--324.
[5]
Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. 2019. Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially. Proceedings of the VLDB Endowment, Vol. 12, 7 (2019), 766--778. https://doi.org/10.14778/3317315.3317319
[6]
Jiecao Chen, He Sun, David P. Woodruff, and Qin Zhang. 2016. Communication-Optimal Distributed Clustering. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS). 3720--3728. https://proceedings.neurips.cc/paper/2016/hash/7503cfacd12053d309b6bed5c89de212-Abstract.html
[7]
Sam Coy, Artur Czumaj, and Gopinath Mishra. 2023. On Parallel k-Center Clustering. Preprint arXiv:2304.05883 (2023).
[8]
Artur Czumaj, Jakub Łcacki, Aleksander Mcadry, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski. 2018. Round Compression for Parallel Matching Algorithms. In Proceedings of the 50th Annual ACM Symposium on Theory of Computing (STOC). 471--484.
[9]
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, Vol. 51, 1 (January 2008), 107--113. https://doi.org/10.1145/1327452.1327492
[10]
Martin E. Dyer and Alan M. Frieze. 1985. A Simple Heuristic for the p-Centre Problem. Operations Research Letters, Vol. 3, 6 (1985), 285--288.
[11]
Alina Ene, Sungjin Im, and Benjamin Moseley. 2011. Fast Clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 681--689. https://doi.org/10.1145/2020408.2020515
[12]
Teofilo F. Gonzalez. 1985. Clustering to Minimize the Maximum Intercluster Distance. Theoretical Computer Science, Vol. 38 (1985), 293--306. Issue 2--3. https://doi.org/10.1016/0304-3975(85)90224-5
[13]
Michael T. Goodrich, Nodari Sitchinava, and Qin Zhang. 2011. Sorting, Searching, and Simulation in the MapReduce Framework. In Proceedings of the 22nd International Symposium on Algorithms and Computation (ISAAC). 374--383.
[14]
Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. 2012. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality. Theoretical Computer Science, Vol. 8, 1 (2012), 321--350. https://doi.org/10.4086/toc.2012.v008a014
[15]
Dorit S. Hochbaum and David B. Shmoys. 1985. A Best Possible Heuristic for the k-Center Problem. Mathematics of Operations Research, Vol. 10, 2 (1985), 180--184. https://doi.org/10.1287/moor.10.2.180
[16]
Wen-Lian Hsu and George L. Nemhauser. 1979. Easy and Hard Bottleneck Location Problems. Discrete Applied Mathematics, Vol. 1, 3 (1979), 209--215. https://doi.org/10.1016/0166-218X(79)90044-1
[17]
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. 2007. Dryad: Distributed Data-parallel Programs from Sequential Building Blocks. SIGOPS Operating Systems Review, Vol. 41, 3 (March 2007), 59--72.
[18]
Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A Model of Computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 938--948. https://doi.org/10.1137/1.9781611973075.76
[19]
Gustavo Malkomes, Matt J. Kusner, Wenlin Chen, Kilian Q. Weinberger, and Benjamin Moseley. 2015. Fast Distributed k-Center Clustering with Outliers on Massive Data. In Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS). 1063--1071. https://proceedings.neurips.cc/paper/2015/hash/8fecb20817b3847419bb3de39a609afe-Abstract.html
[20]
Haofen Wang, Yan Liang, Linyun Fu, Gui-Rong Xue, and Yong Yu. 2009. Efficient Query Expansion for Advertisement Search. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 51--58. https://doi.org/10.1145/1571941.1571953
[21]
Tom White. 2015. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale 4th ed.). O'Reilly Media, Sebastopol, CA.
[22]
Rui Xu and Donald Wunsch. 2005. Survey of Clustering Algorithms. IEEE Transactions on Neural Networks, Vol. 16, 3 (2005), 645--678. https://doi.org/10.1109/TNN.2005.845141
[23]
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster Computing with Working Sets. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).

Cited By

View all
  • (2024)A Comprehensive Evaluation of Rough Sets Clustering in Uncertainty Driven ContextsStudia Universitatis Babeș-Bolyai Informatica10.24193/subbi.2024.1.0369:1(41-56)Online publication date: 10-Jun-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures
June 2023
504 pages
ISBN:9781450395458
DOI:10.1145/3558481
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 June 2023

Check for updates

Author Tags

  1. clustering
  2. k-center
  3. mapreduce
  4. mpc
  5. parallel computing

Qualifiers

  • Research-article

Funding Sources

Conference

SPAA '23
Sponsor:

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)167
  • Downloads (Last 6 weeks)8
Reflects downloads up to 02 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Comprehensive Evaluation of Rough Sets Clustering in Uncertainty Driven ContextsStudia Universitatis Babeș-Bolyai Informatica10.24193/subbi.2024.1.0369:1(41-56)Online publication date: 10-Jun-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media