Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

A Complexity Theoretical Study of Fuzzy K-Means

Published: 16 September 2020 Publication History

Abstract

The fuzzy K-means problem is a popular generalization of the well-known K-means problem to soft clusterings. In this article, we present the first algorithmic study of the problem going beyond heuristics. Our main result is that, assuming a constant number of clusters, there is a polynomial time approximation scheme for the fuzzy K-means problem. As a part of our analysis, we also prove the existence of small coresets for fuzzy K-means. At the heart of our proofs are two novel techniques developed to analyze the otherwise notoriously difficult fuzzy K-means objective function.

References

[1]
A. Aggarwal, A. Deshpande, and R. Kannan. 2009. Adaptive sampling for k-means clustering. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Lecture Notes in Computer Science, Vol. 5687. Springer, 15--28.
[2]
P. Awasthi, M. Charikar, R. Krishnaswamy, and A. K. Sinop. 2015. The hardness of approximation of Euclidean k-means. In Proceedings of the 31st International Symposium on Computational Geometry (SoCG’15). 754–767.
[3]
C. Bajaj. 1988. The algebraic degree of geometric optimization problems. Discrete 8 Computational Geometry 3, 2 (Jan. 1988), 177--191.
[4]
J. C. Bezdek, R. Ehrlich, and W. Full. 1984. FCM: The fuzzy -means clustering algorithm. Computers 8 Geosciences 10, 2 (1984), 191--203.
[5]
J. C. Bezdek, R. J. Hathaway, M. J. Sabin, and W. T. Tucker. 1987. Convergence theory for fuzzy c-means: Counterexamples and repairs. IEEE Transactions on Systems, Man and Cybernetics 17, 5 (Sept. 1987), 873--877.
[6]
J. Blömer, S. Brauer, and K. Bujna. 2016. A theoretical analysis of the fuzzy K-means problem. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM’16). 805--810.
[7]
J. Blömer, S. Brauer, and K. Bujna. 2018. Coresets for fuzzy -means with applications. In Proceedings of the 29th International Symposium on Algorithms and Computation (ISAAC’18).
[8]
J. Blömer, C. Lammersen, M. Schmidt, and C. Sohler. 2016. Theoretical analysis of the k-means algorithm—A survey. In Algorithm Engineering. Springer International, 81--116.
[9]
S. Brauer. 2019. Classification and Approximation of Geometric Location Problems. Ph.D. Dissertation. Paderborn University.
[10]
K. Chen. 2009. On coresets for K-median and K-means clustering in metric and Euclidean spaces and their applications. SIAM Journal on Computing 39, 3 (Jan. 2009), 923--947.
[11]
D. Dembélé and P. Kastner. 2003. Fuzzy C-means method for clustering microarray data. Bioinformatics 19, 8 (May 2003), 973--980.
[12]
J. C. Dunn. 1973. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics 3, 3 (Jan. 1973), 32--57.
[13]
D. Feldman and M. Langberg. 2011. A unified framework for approximating and clustering data. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC’11). ACM, New York, NY, 569--578.
[14]
D. Feldman, F. Matthew, and A. Krause. 2011. Scalable training of mixture models via coresets. In Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2142--2150.
[15]
Dan Feldman, Morteza Monemizadeh, and Christian Sohler. 2007. A PTAS for K-means clustering based on weak coresets. In Proceedings of the 23rd Annual Symposium on Computational Geometry (SCG’07). ACM, New York, NY, 11--18.
[16]
D. Feldman, M. Schmidt, and C. Sohler. 2013. Turning big data into tiny data: Constant-size coresets for K-means, PCA and projective clustering. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’13). 1434--1453.
[17]
S. Har-Peled and A. Kushal. 2005. Smaller coresets for K-median and K-means clustering. In Proceedings of the 21st Annual Symposium on Computational Geometry (SCG’05). ACM, New York, NY, 126--134.
[18]
S. Har-Peled and S. Mazumdar. 2003. Coresets for k-means and k-median clustering and their applications. In Proceedings of the 36th Annual ACM Sympososium on Theory of Computing. 291--300.
[19]
R. J. Hathaway and J. C. Bezdek. 1986. Local convergence of the fuzzy c-means algorithms. Pattern Recognition 19, 6 (1986), 477--480.
[20]
R. J. Hathaway and J. C. Bezdek. 1988. Recent convergence results for the fuzzy c-means clustering algorithms. Journal of Classification 5, 2 (Sept. 1988), 237--247.
[21]
D. Haussler. 1992. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation 100, 1 (1992), 78--150.
[22]
K. Hirota and W. Pedrycz. 1999. Fuzzy computing for data mining. Proceedings of the IEEE 87, 9 (1999), 1575–1600.
[23]
F. Hoppner and F. Klawonn. 2003. A contribution to convergence theory of fuzzy c-means and derivatives. IEEE Transactions on Fuzzy Systems 11, 5 (Oct. 2003), 682--694.
[24]
M. Inaba, N. Katoh, and H. Imai. 1994. Applications of weighted Voronoi diagrams and randomization to variance-based K-clustering. In Proceedings of the 10th Annual Symposium on Computational Geometry (SoCG’94). ACM, New York, NY, 332--339.
[25]
W. B. Johnson and J. Lindenstrauss. 1984. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics 26 (1984), 189--206.
[26]
T. Kim, J. C. Bezdek, and R. J. Hathaway. 1988. Optimality tests for fixed points of the fuzzy c-means algorithm. Pattern Recognition 21, 6 (1988), 651--663.
[27]
S. Lloyd. 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 2 (March 1982), 129--137.
[28]
Mario Lucic, Olivier Bachem, and Andreas Krause. 2016. Strong coresets for hard and soft Bregman clustering with applications to exponential family mixtures. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS’16). 1--9.
[29]
M. R. Rezaee, P. M. J. van der Zwet, B. P. F. Lelieveldt, R. J. van der Geest, and J. H. C. Reiber. 2000. A multiresolution image segmentation technique based on pyramidal segmentation and fuzzy clustering. IEEE Transactions on Image Processing 9, 7 (July 2000), 1238--1248.

Cited By

View all
  • (2024)Wasserstein gradient flow for optimal probability measure decompositionSSRN Electronic Journal10.2139/ssrn.4831118Online publication date: 2024
  • (2022)How to improve urban transportation planning in big data era? A practice in the study of traffic analysis zone delineationTransport Policy10.1016/j.tranpol.2022.08.002127(1-14)Online publication date: Oct-2022

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Algorithms
ACM Transactions on Algorithms  Volume 16, Issue 4
October 2020
404 pages
ISSN:1549-6325
EISSN:1549-6333
DOI:10.1145/3407674
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 September 2020
Accepted: 01 June 2020
Revised: 01 November 2019
Received: 01 December 2018
Published in TALG Volume 16, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Clustering
  2. approximation algorithms
  3. coresets
  4. fuzzy K-means

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • German Research Foundation (DFG)

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)0
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Wasserstein gradient flow for optimal probability measure decompositionSSRN Electronic Journal10.2139/ssrn.4831118Online publication date: 2024
  • (2022)How to improve urban transportation planning in big data era? A practice in the study of traffic analysis zone delineationTransport Policy10.1016/j.tranpol.2022.08.002127(1-14)Online publication date: Oct-2022

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media