Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1007/11425274_24guideproceedingsArticle/Chapter ViewAbstractPublication PagesConference Proceedingsacm-pubtype
Article

On autonomous k-means clustering

Published: 25 May 2005 Publication History

Abstract

Clustering is a basic tool in unsupervised machine learning and data mining. One of the simplest clustering approaches is the iterative k-means algorithm. The quality of k-means clustering suffers from being confined to run with fixed k rather than being able to dynamically alter the value of k. Moreover, it would be much more elegant if the user did not have to supply the number of clusters for the algorithm.
In this paper we consider recently proposed autonomous versions of the k-means algorithm. We demonstrate some of their shortcomings and put forward solutions for their deficiencies. In particular, we examine the problem of automatically determining a good initial candidate as the number of clusters.

References

[1]
Fisher, D.: Knowledge acquisition via incremental conceptual clustering. Machine Learning 2 (1987) 139-172
[2]
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Computing Surveys 31 (1999) 264-323
[3]
MacQueen, J.B.: On convergence of k-means and partitions with minimum average variance (abstract). Annals of Mathematical Statistics 36 (1965) 1084
[4]
Forgy, E.: Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics 21 (1965) 768
[5]
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. John Wiley & Sons (1973)
[6]
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society, Series B 39 (1977) 1-38
[7]
Bradley, P.S., Fayyad, U.M.: Refining initial points for k-means clustering. In: Proc. 15th Intl. Conf. on Machine Learning, Morgan Kaufmann (1998) 91-99
[8]
Pelleg, D., Moore, A.: X-means: Extending k-means with efficient estimation of the number of clusters. In: Proc. 17th Intl. Conf. on Machine Learning, Morgan Kaufmann (2000) 727-734
[9]
Hamerly, G., Elkan, C.: Learning the k in k-means. In: Advances in Neural Information Processing Systems 16. MIT Press (2004)
[10]
Ng, R.T., Han, J.: Efficient and effective clustering methods for spatial data mining. In: Proc. 20th Intl. Conf. on Very Large Data Bases, Morgan Kaufmann (1994) 144-155
[11]
Zhang, T., Ramakrishnan, R., Livny, M.: Birch: An efficient data clustering method for very large databases. In: Proc. ACM SIGMOD Intl. Conf. on Management of Data, ACM Press (1995) 103-114
[12]
Guha, S., Rastogi, R., Shim, K.: Cure: An efficient clustering algorithm for large datasets. In: Proc. ACM SIGMOD Intl. Conf. on Management of Data, ACM Press (1998) 73-84
[13]
Moore, A.W.: The anchors hierarchy: Using the triangle inequality to survive high dimensional data. In: Proc. 16th Conf. on Uncertainty in Artificial Intelligence, Morgan Kaufmann (2000) 397-405
[14]
Elkan, C.: Using the triangle inequality to accelerate k-means. In: Proc. 20th Intl. Conf. on Machine Learning, AAAI Press (2003) 147-153
[15]
Provost, F., Jensen, D., Oates, T.: Efficient progressive sampling. In: Proc. 5th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, ACM Press (1999) 23-32
[16]
Fayyad, U.M., Reina, C., Bradley, P.S.: Initialization of iterative refinement clustering. In: Proc. 4th Intl. Conf. on Knowledge Discovery and Data Mining, AAAI Press (1998) 194-198
[17]
Bischof, H., Leonardis, A., Selb, A.: MDL-principle for robust vector quantisation. Pattern Analysis and Applications 2 (1999) 59-72
[18]
Pelleg, D., Moore, A.W.: Accelerating exact k-means algorithms with geometric reasoning. In: Proc. 5th Intl. Conf. on Knowledge Discovery and Data Mining, AAAI Press (1999) 277-281
[19]
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proc. 36th Annual Symp. on Theory of Computing, ACM Press (2004) 291-300
[20]
Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: Proc. 45th Annual IEEE Symp. on Foundations on Computer Science, IEEE Press (2004) 454-462

Cited By

View all
  • (2020)GQM-based Tree Model for Automatic Recommendation of Design Pattern CategoryProceedings of the 9th International Conference on Software and Information Engineering10.1145/3436829.3436862(126-130)Online publication date: 11-Nov-2020
  • (2017)A Bayesian approach and probabilistic latent variable clustering based web services selectionInternational Journal of Knowledge Engineering and Soft Data Paradigms10.1504/IJKESDP.2017.0895246:1(82-93)Online publication date: 1-Jan-2017
  • (2006)A voronoi diagram approach to autonomous clusteringProceedings of the 9th international conference on Discovery Science10.1007/11893318_17(149-160)Online publication date: 7-Oct-2006

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Guide Proceedings
ISMIS'05: Proceedings of the 15th international conference on Foundations of Intelligent Systems
May 2005
697 pages
ISBN:3540258787
  • Editors:
  • Mohand-Said Hacid,
  • Neil V. Murray,
  • Zbigniew W. Raś,
  • Shusaku Tsumoto

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 25 May 2005

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2020)GQM-based Tree Model for Automatic Recommendation of Design Pattern CategoryProceedings of the 9th International Conference on Software and Information Engineering10.1145/3436829.3436862(126-130)Online publication date: 11-Nov-2020
  • (2017)A Bayesian approach and probabilistic latent variable clustering based web services selectionInternational Journal of Knowledge Engineering and Soft Data Paradigms10.1504/IJKESDP.2017.0895246:1(82-93)Online publication date: 1-Jan-2017
  • (2006)A voronoi diagram approach to autonomous clusteringProceedings of the 9th international conference on Discovery Science10.1007/11893318_17(149-160)Online publication date: 7-Oct-2006

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media