Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3519935.3520011acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article
Open access

Improved approximations for Euclidean k-means and k-median, via nested quasi-independent sets

Published: 10 June 2022 Publication History

Abstract

Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean k-median and k-means problems. We propose a new primal-dual algorithm, inspired by the classic algorithm of Jain and Vazirani and the recent algorithm of Ahmadian, Norouzi-Fard, Svensson, and Ward. Our algorithm achieves an approximation ratio of 2.406 and 5.912 for Euclidean k-median and k-means, respectively, improving upon the 2.633 approximation ratio of Ahmadian et al. and the 6.1291 approximation ratio of Grandoni, Ostrovsky, Rabani, Schulman, and Venkat.
Our techniques involve a much stronger exploitation of the Euclidean metric than previous work on Euclidean clustering. In addition, we introduce a new method of removing excess centers using a variant of independent sets over graphs that we dub a “nested quasi-independent set”. In turn, this technique may be of interest for other optimization problems in Euclidean and ℓp metric spaces.

References

[1]
Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. 2019. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. SIAM J. Comput. 49, 4 ( 2019 ), FOCS17-97-FOCS17-156.
[2]
David Arthur and Sergei Vassilvitskii. 2007. k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, Nikhil Bansal, Kirk Pruhs, and Cliford Stein (Eds.). SIAM, 1027-1035. http://dl.acm.org/citation.cfm?id= 1283383. 1283494
[3]
David Arthur and Sergei Vassilvitskii. 2009. Worst-Case and Smoothed Analysis of the ICP Algorithm, with an Application to the k-Means Method. SIAM J. Comput. 39, 2 ( 2009 ), 766-782. https://doi.org/10.1137/070683921
[4]
Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. 2004. Local Search Heuristics for k-Median and Facility Location Problems. SIAM J. Comput. 33, 3 ( 2004 ), 544-562. https://doi.org/10. 1137/S0097539702416402
[5]
Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. 2015. The Hardness of Approximation of Euclidean k-Means. In 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands (LIPIcs, Vol. 34 ), Lars Arge and János Pach (Eds.). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 754-767. https: //doi.org/10.4230/LIPIcs.SOCG. 2015.754
[6]
Sayan Bandyapadhyay and Kasturi R. Varadarajan. 2016. On Variants of k-means Clustering. In 32nd International Symposium on Computational Geometry, SoCG 2016, June 14-18, 2016, Boston, MA, USA (LIPIcs, Vol. 51 ), Sándor P. Fekete and Anna Lubiw (Eds.). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 14 : 1-14 : 15. https://doi.org/10.4230/LIPIcs.SoCG. 2016.14
[7]
Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris Schwiegelshohn. 2019. Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, Moses Charikar and Edith Cohen (Eds.). ACM, 1039-1050. https://doi.org/10.1145/3313276.3316318
[8]
Jaroslaw Byrka, Thomas W. Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. 2017. An Improved Approximation for k-Median and Positive Correlation in Budgeted Optimization. ACM Trans. Algorithms 13, 2 ( 2017 ), 23 : 1-23 : 31. https://doi.org/10.1145/2981561
[9]
Moses Charikar and Sudipto Guha. 1999. Improved Combinatorial Algorithms for the Facility Location and k-Median Problems. In 40th Annual Symposium on Foundations of Computer Science, FOCS ' 99, 17-18 October, 1999, New York, NY, USA. IEEE Computer Society, 378-388. https://doi.org/10.1109/SFFCS. 1999.814609
[10]
Moses Charikar and Sudipto Guha. 2005. Improved Combinatorial Algorithms for Facility Location Problems. SIAM J. Comput. 34, 4 ( 2005 ), 803-824. https: //doi.org/10.1137/S0097539701398594
[11]
Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. 2002. A Constant-Factor Approximation Algorithm for the-Median Problem. J. Comput. Syst. Sci. 65, 1 ( 2002 ), 129-149. https://doi.org/10.1006/jcss. 2002.1882
[12]
Moses Charikar and Shi Li. 2012. A Dependent LP-Rounding Approach for the kMedian Problem. In Automata, Languages, and Programming-39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 7391 ), Artur Czumaj, Kurt Mehlhorn, Andrew M. Pitts, and Roger Wattenhofer (Eds.). Springer, 194-205. https://doi.org/10.1007/ 978-3-642-31594-7_17
[13]
Vincent Cohen-Addad. 2018. A Fast Approximation Scheme for Low-Dimensional k-Means. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, Artur Czumaj (Ed.). SIAM, 430-440. https://doi.org/10.1137/1.9781611975031.29
[14]
Vincent Cohen-Addad, Hossein Esfandiari, Vahab Mirrokni, and Shyam Narayanan. 2022. Improved Approximations for Euclidean k-means and k-median, via Nested Quasi-Independent Sets. CoRR abs/2204.04828 ( 2022 ).
[15]
Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, and David Saulpic. 2022. An Improved Local Search Algorithm for-Median. In Proceedings of the Thirty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2022. SIAM, 1556-1612.
[16]
Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. 2019. Tight FPT Approximations for k-Median and k-Means. In 46th International Colloquium on Automata, Languages, and Programming, ICALP 2019, July 9-12, 2019, Patras, Greece (LIPIcs, Vol. 132 ), Christel Baier, Ioannis Chatzigiannakis, Paola Flocchini, and Stefano Leonardi (Eds.). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 42 : 1-42 : 14. https://doi.org/10.4230/LIPIcs.ICALP. 2019.42
[17]
Vincent Cohen-Addad and Karthik C. S. 2019. Inapproximability of Clustering in Lp Metrics. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, David Zuckerman (Ed.). IEEE Computer Society, 519-539. https://doi.org/10.1109/FOCS. 2019.00040
[18]
Vincent Cohen-Addad, Euiwoong Lee, and Karthik C. S. 2022. Johnson Coverage Hypothesis: Inapproximability of-means and-median in ℓ metrics. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022. SIAM.
[19]
Vincent Cohen-Addad and Claire Mathieu. 2015. Efectiveness of Local Search for Geometric Optimization. In 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands (LIPIcs, Vol. 34 ), Lars Arge and János Pach (Eds.). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 329-343. https://doi.org/10.4230/LIPIcs.SOCG. 2015.329
[20]
Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. 2021. On Approximability of Clustering Problems Without Candidate Centers. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10-13, 2021, Dániel Marx (Ed.). SIAM, 2635-2648. https://doi.org/10.1137/1.9781611976465.156
[21]
Sanjoy Dasgupta. 2008. The hardness of k-means clustering. Department of Computer Science and Engineering, University of California....
[22]
Dan Feldman and Michael Langberg. 2011. A unified framework for approximating and clustering data. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, Lance Fortnow and Salil P. Vadhan (Eds.). ACM, 569-578. https://doi.org/10.1145/1993636.1993712
[23]
Fabrizio Grandoni, Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Rakesh Venkat. 2022. A refined approximation for Euclidean k-means. Inf. Process. Lett. 176 ( 2022 ), 106251. https://doi.org/10.1016/j.ipl. 2022.106251
[24]
Sudipto Guha and Samir Khuller. 1999. Greedy Strikes Back: Improved Facility Location Algorithms. J. Algorithms 31, 1 ( 1999 ), 228-248. https://doi.org/10. 1006/jagm. 1998.0993
[25]
Venkatesan Guruswami and Piotr Indyk. 2003. Embeddings and nonapproximability of geometric problems. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA. ACM/SIAM, 537-538. http://dl.acm.org/citation.cfm?id= 644108. 644198
[26]
S Louis Hakimi. 1964. Optimum locations of switching centers and the absolute centers and medians of a graph. Operations research 12, 3 ( 1964 ), 450-459.
[27]
Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V. Vazirani. 2003. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP. J. ACM 50, 6 ( 2003 ), 795-824. https://doi.org/10.1145/ 950620.950621
[28]
Kamal Jain and Vijay V. Vazirani. 2001. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. Journal of the ACM 48, 2 ( 2001 ), 274-296.
[29]
Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. 2004. A local search approximation algorithm for k-means clustering. Comput. Geom. 28, 2-3 ( 2004 ), 89-112. https://doi.org/10. 1016/j.comgeo. 2004. 03.003
[30]
Madhukar R. Korupolu, C. Greg Plaxton, and Rajmohan Rajaraman. 2000. Analysis of a Local Search Heuristic for Facility Location Problems. J. Algorithms 37, 1 ( 2000 ), 146-188. https://doi.org/10.1006/jagm. 2000.1100
[31]
Amit Kumar, Yogish Sabharwal, and Sandeep Sen. 2010. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM 57, 2 ( 2010 ), 5 : 1-5 : 32. https://doi.org/10.1145/1667053.1667054
[32]
Euiwoong Lee, Melanie Schmidt, and John Wright. 2017. Improved and simplified inapproximability for k-means. Inf. Process. Lett. 120 ( 2017 ), 40-43. https: //doi.org/10.1016/j.ipl. 2016. 11.009
[33]
Shi Li. 2013. A 1.488 approximation algorithm for the uncapacitated facility location problem. Inf. Comput. 222 ( 2013 ), 45-58. https://doi.org/10.1016/j.ic. 2012. 01.007
[34]
Shi Li and Ola Svensson. 2016. Approximating k-Median via PseudoApproximation. SIAM J. Comput. 45, 2 ( 2016 ), 530-547. https://doi.org/10. 1137/130938645
[35]
SP Lloyd. 1957. Least square quantization in PCM. Bell Telephone Laboratories Paper. Published in journal much later: Lloyd, SP: Least squares quantization in PCM. IEEE Trans. Inform. Theor.( 1957 / 1982 ) 18 ( 1957 ).
[36]
Konstantin Makarychev, Yury Makarychev, and Ilya P. Razenshteyn. 2019. Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, Moses Charikar and Edith Cohen (Eds.). ACM, 1027-1038. https://doi.org/10.1145/3313276.3316350
[37]
Konstantin Makarychev, Yury Makarychev, Maxim Sviridenko, and Justin Ward. 2016. A Bi-Criteria Approximation Algorithm for k-Means. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2016, September 7-9, 2016, Paris, France (LIPIcs, Vol. 60 ), Klaus Jansen, Claire Mathieu, José D. P. Rolim, and Chris Umans (Eds.). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 14 : 1-14 : 20. https://doi.org/10.4230/LIPIcs. APPROX-RANDOM. 2016.14
[38]
Jirí Matousek. 2000. On Approximate Geometric k-Clustering. Discrete & Computational Geometry 24, 1 ( 2000 ), 61-84. https://doi.org/10.1007/s004540010019
[39]
Nimrod Megiddo and Kenneth J Supowit. 1984. On the complexity of some common geometric location problems. SIAM journal on computing 13, 1 ( 1984 ), 182-196.
[40]
David Saulpic, Vincent Cohen-Addad, and Andreas Emil Feldmann. 2019. NearLinear Time Approximations Schemes for Clustering in Doubling Metrics. In 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, David Zuckerman (Ed.). IEEE Computer Society, 540-559. https://doi.org/10.1109/FOCS. 2019.00041
[41]
Hugo Steinhaus. 1957. Sur la division des corps matériels en parties. Bull. Acad. Pol. Sci., Cl. III 4 ( 1957 ), 801-804.

Cited By

View all
  • (2024)A (1+varepsilon)-approximation algorithm for diversity-aware k-medianSCIENTIA SINICA Informationis10.1360/SSI-2024-010855:1(32)Online publication date: 26-Dec-2024
  • (2024)Resilient k-ClusteringProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671888(29-38)Online publication date: 25-Aug-2024
  • (2024)On the Budgeted Priority p-Median Problem in High-Dimensional Euclidean SpacesJournal of the Operations Research Society of China10.1007/s40305-023-00533-wOnline publication date: 22-Mar-2024
  • Show More Cited By

Index Terms

  1. Improved approximations for Euclidean k-means and k-median, via nested quasi-independent sets

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    STOC 2022: Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing
    June 2022
    1698 pages
    ISBN:9781450392648
    DOI:10.1145/3519935
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 June 2022

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. $k$-means
    2. $k$-median
    3. approximation algorithms
    4. primal-dual

    Qualifiers

    • Research-article

    Conference

    STOC '22
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Upcoming Conference

    STOC '25
    57th Annual ACM Symposium on Theory of Computing (STOC 2025)
    June 23 - 27, 2025
    Prague , Czech Republic

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)349
    • Downloads (Last 6 weeks)40
    Reflects downloads up to 28 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)A (1+varepsilon)-approximation algorithm for diversity-aware k-medianSCIENTIA SINICA Informationis10.1360/SSI-2024-010855:1(32)Online publication date: 26-Dec-2024
    • (2024)Resilient k-ClusteringProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671888(29-38)Online publication date: 25-Aug-2024
    • (2024)On the Budgeted Priority p-Median Problem in High-Dimensional Euclidean SpacesJournal of the Operations Research Society of China10.1007/s40305-023-00533-wOnline publication date: 22-Mar-2024
    • (2023)k-means clustering with distance-based privacyProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3666981(19570-19593)Online publication date: 10-Dec-2023
    • (2023)Deterministic Clustering in High Dimensional Spaces: Sketches and Approximation2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS57990.2023.00066(1105-1130)Online publication date: 6-Nov-2023
    • (2023)On parameterized approximation algorithms for balanced clusteringJournal of Combinatorial Optimization10.1007/s10878-022-00980-w45:1Online publication date: 8-Jan-2023
    • (2023)A PTAS Framework for Clustering Problems in Doubling MetricsComputing and Combinatorics10.1007/978-3-031-49190-0_28(384-397)Online publication date: 9-Dec-2023
    • (2022)Near-optimal private and scalable k-clusteringProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3601030(10462-10475)Online publication date: 28-Nov-2022

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media