

Local Search k-means++ with Foresight

Authors: Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt



File

LIPIcs.SEA.2024.7.pdf
  • Filesize: 10.87 MB
  • 20 pages

Document Identifiers
  • DOI: 10.4230/LIPIcs.SEA.2024.7

Author Details

Theo Conrads
  • Department of Computer Science, University of Cologne, Germany
Lukas Drexler
  • Faculty of Mathematics and Natural Sciences, Department of Computer Science, Heinrich Heine University Düsseldorf, Germany
Joshua Könen
  • Institute of Computer Science, University of Bonn, Germany
Daniel R. Schmidt
  • Faculty of Mathematics and Natural Sciences, Department of Computer Science, Heinrich Heine University Düsseldorf, Germany
Melanie Schmidt
  • Faculty of Mathematics and Natural Sciences, Department of Computer Science, Heinrich Heine University Düsseldorf, Germany

Cite As

Theo Conrads, Lukas Drexler, Joshua Könen, Daniel R. Schmidt, and Melanie Schmidt. Local Search k-means++ with Foresight. In 22nd International Symposium on Experimental Algorithms (SEA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 301, pp. 7:1-7:20, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2024)
https://doi.org/10.4230/LIPIcs.SEA.2024.7

Abstract

Since its introduction in 1957, Lloyd’s algorithm for k-means clustering has been extensively studied and has undergone several improvements. While in its original form it does not guarantee any approximation factor at all, Arthur and Vassilvitskii (SODA 2007) proposed k-means++, which enhances Lloyd’s algorithm with a seeding method that guarantees an 𝒪(log k)-approximation in expectation. More recently, Lattanzi and Sohler (ICML 2019) proposed LS++, which further improves the solution quality of k-means++ via local search techniques to obtain an 𝒪(1)-approximation. On the practical side, the greedy variant of k-means++ is often used, although its worst-case behaviour is provably worse than that of the standard k-means++ variant. We investigate how to improve LS++ further in practice. We study two options for improving the practical performance: (a) combining LS++ with greedy k-means++ instead of k-means++, and (b) improving LS++ by better entangling it with Lloyd’s algorithm. Option (a) worsens the theoretical guarantees of k-means++ but, as we confirm in our experiments, improves the practical quality, also in combination with LS++. Option (b) yields our new algorithm, Foresight LS++ (FLS++). We experimentally show that FLS++ improves upon the solution quality of LS++ while retaining the asymptotic runtime and worst-case approximation bounds of LS++.
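For orientation, the seeding the abstract builds on can be sketched in a few lines. The following Python sketch (assuming NumPy; the function names `dsq` and `kmeanspp_seed` are illustrative choices, not taken from the authors' FLSpp implementation) shows D²-sampling as in k-means++ and, with `greedy_trials > 1`, its greedy variant that samples several candidates per round and keeps the cheapest:

```python
import numpy as np

def dsq(X, centers):
    """Squared distance from each point in X to its nearest center."""
    return np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)

def kmeanspp_seed(X, k, rng, greedy_trials=1):
    """D^2-sampling seeding (Arthur and Vassilvitskii, SODA 2007).

    With greedy_trials > 1 this becomes the greedy variant: sample
    several candidate centers per round and keep the one whose
    addition lowers the k-means cost the most.
    """
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # first center: uniform at random
    for _ in range(k - 1):
        d = dsq(X, centers)
        # sample candidate indices proportionally to squared distance
        cand = rng.choice(n, size=greedy_trials, p=d / d.sum())
        best = min(cand, key=lambda i: dsq(X, centers + [X[i]]).sum())
        centers.append(X[best])
    return centers

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
centers = kmeanspp_seed(X, k=5, rng=rng, greedy_trials=3)
```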
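The local-search idea of LS++ (Lattanzi and Sohler, ICML 2019) can be sketched in the same style: D²-sample one more candidate, then perform the cost-improving swap with an existing center, if one exists. This simplified illustration reuses `dsq` and the setup from the sketch above and is not the paper's exact procedure; `local_search_step` is a hypothetical name:

```python
def local_search_step(X, centers, rng):
    """One LS++-style swap step: D^2-sample a candidate center, then
    replace the existing center whose removal in favour of the
    candidate lowers the cost the most; keep everything unchanged if
    no swap improves the cost."""
    d = dsq(X, centers)
    cand = X[rng.choice(len(X), p=d / d.sum())]
    best_cost, best_i = d.sum(), None
    for i in range(len(centers)):
        trial = centers[:i] + centers[i + 1:] + [cand]
        cost = dsq(X, trial).sum()
        if cost < best_cost:
            best_cost, best_i = cost, i
    if best_i is not None:
        centers[best_i] = cand
    return centers

centers = local_search_step(X, centers, rng)
```

Repeating such swap steps after seeding and combining them with Lloyd's iterations is the LS++ pipeline that FLS++ refines; the actual foresight mechanism is described in the paper.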

Subject Classification

ACM Subject Classification
  • Mathematics of computing → Combinatorial algorithms
  • Theory of computation → Facility location and clustering
  • Information systems → Clustering
Keywords
  • k-means clustering
  • kmeans++
  • greedy
  • local search


References

  1. KDD Cup 2004 data. Available at https://www.kdd.org/kdd-cup/view/kdd-cup-2004/data.
  2. Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. SIAM J. Comput., 49(4), 2020.
  3. Daniel Aloise, Pierre Hansen, and Leo Liberti. An improved column generation algorithm for minimum sum-of-squares clustering. Mathematical Programming, 131:195-220, 2012.
  4. David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the 18th SODA, pages 1027-1035, 2007.
  5. Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of Euclidean k-means. In Lars Arge and János Pach, editors, Proc. of the 31st SoCG, volume 34 of LIPIcs, pages 754-767, 2015.
  6. Anup Bhattacharya, Jan Eube, Heiko Röglin, and Melanie Schmidt. Noisy, greedy and not so greedy k-means++. In Proc. of the 28th ESA, 2020.
  7. M. Emre Celebi, Hassan A. Kingravi, and Patricio A. Vela. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl., 40(1):200-210, 2013.
  8. Davin Choo, Christoph Grunau, Julian Portmann, and Václav Rozhoň. k-means++: Few more steps yield constant approximation. In International Conference on Machine Learning, pages 1909-1917. PMLR, 2020.
  9. Theo Conrads. Lokale Such- und Samplingmethoden für das k-Means- und k-Median-Problem [Local search and sampling methods for the k-means and k-median problem]. Master’s thesis, Universität zu Köln, 2021.
  10. Sanjoy Dasgupta. The hardness of k-means clustering. Technical report, 2008.
  11. Lukas Drexler, Joshua Könen, Daniel R. Schmidt, Melanie Schmidt, and Giulia Baldini. algo-hhu/FLSpp. Software, swhId: https://archive.softwareheritage.org/swh:1:dir:39136777e542456572a51515813b6cb377ed0940;origin=https://github.com/algo-hhu/FLSpp;visit=swh:1:snp:18f736f0d6ee924ac77fcfdbeed3e15015a6b867;anchor=swh:1:rev:72d58dc26e599c190373274ed6c22009d07a911b (visited on 2024-06-27). URL: https://github.com/algo-hhu/FLSpp.
  12. Charles Elkan. Using the triangle inequality to accelerate k-means. In Tom Fawcett and Nina Mishra, editors, Proc. of the 20th ICML, pages 147-153, 2003.
  13. Gereon Frahling and Christian Sohler. A fast k-means implementation using coresets. In Proc. of the 22nd SoCG, pages 135-143, 2006.
  14. Pasi Fränti and Sami Sieranoja. K-means properties on six clustering benchmark datasets. Appl. Intell., 48(12):4743-4759, 2018.
  15. Pasi Fränti and Sami Sieranoja. How much can k-means be improved by using better initialization and repeats? Pattern Recognition, 93:95-112, 2019.
  16. Bernd Fritzke. The k-means-u* algorithm: Non-local jumps and greedy retries improve k-means++ clustering. CoRR, abs/1706.09059, 2017.
  17. Christoph Grunau, Ahmet Alper Özüdoğru, Václav Rozhoň, and Jakub Tětek. A nearly tight analysis of greedy k-means++. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1012-1070. SIAM, 2023.
  18. Greg Hamerly. Making k-means even faster. In SDM, pages 130-140. SIAM, 2010.
  19. Grete Heinz, Louis J. Peterson, Roger W. Johnson, and Carter J. Kerk. Exploring relationships in body dimensions. Journal of Statistics Education, 11(2), 2003.
  20. Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Comput. Geom., 28(2-3):89-112, 2004.
  21. Silvio Lattanzi and Christian Sohler. A better k-means++ algorithm via local search. In Proc. of the 36th ICML, volume 97 of Proceedings of Machine Learning Research, pages 3662-3671. PMLR, 2019.
  22. Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Inf. Process. Lett., 120:40-43, 2017.
  23. Meena Mahajan, Prajakta Nimbhorkar, and Kasturi R. Varadarajan. The planar k-means problem is NP-hard. Theor. Comput. Sci., 442:13-21, 2012.
  24. Manfred Padberg and Giovanni Rinaldi. A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, 33(1):60-100, 1991.
  25. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
  26. J. M. Peña, J. A. Lozano, and P. Larrañaga. An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recognition Letters, 20(10):1027-1040, 1999.
  27. Dennis Wei. A constant-factor bi-criteria approximation guarantee for k-means++. In Advances in Neural Information Processing Systems, volume 29, 2016.
  28. I.-C. Yeh. Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797-1808, 1998.