Abstract
Clustering problems arise in many different applications: machine learning, data mining, knowledge discovery, data compression, vector quantization, pattern recognition and pattern classification. One of the most popular and widely studied clustering methods is K-means. Several improvements to the standard K-means algorithm have been carried out, most of them related to the initial parameter values. In contrast, this article proposes an improvement using a new convergence condition that consists of stopping the execution when a local optimum is found or no more object exchanges among groups can be performed. For assessing the improvement attained, the modified algorithm (Early Stop K-means) was tested on six databases of the UCI repository, and the results were compared against SPSS, Weka and the standard K-means algorithm. Experimentally Early Stop K-means obtained important reductions in the number of iterations and improvements in the solution quality with respect to the other algorithms.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bottou, L., Bengio, Y.: Convergence Properties of the K-means Algorithms. Advances in Neural Information Processing Systems. MIT Press, Cambridge (1995)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. John Wiley & Sons, New York (1973)
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R.: Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press (1996)
Fisher, D.: Knowlwdge Acquisition via Incremental Conceptual Clustering. Machine Learning 2(2), 139–172 (1987)
Garner, S.: Weka: The Waikato Environment for Knowledge Analysis. In: Proc. New Zealand Computer Science Research Students Conference, pp. 57–64 (1995)
Hamerly, G., Elkan, C.: Alternatives to the K-means Algorithm that Find Better Clusterings. In: Proc. 11th International Conf. on Information and Knowledge Management CIKM’02. ACM. Virginia, USA (2002)
Kanungo, T., Netanyahu, N.S., Wu, A.Y.: An Efficient K-means Clustering Algorithm: Analysis and Implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: A Local Search Approximation Algorithm for k-Means Clustering. In: Proc. 18th Annual ACM Symposium on Computational Geometry (SoCG’02). Barcelona, Spain, pp. 10–18 (2002)
Likas, A., Vlassis, N., Verbeek, J.J.: The Global K-means Clustering Algorithm. Pattern Recognition, 451–461 (2003)
MacQueen, J.: Some Methods for Classification and Analysis of Multivariate Observations. Proc. 5th Berkeley Symp. Math. Statistics and Probability 1, 281–297 (1967)
Mehmed, K.: Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons, Chichester (2003)
Meila, M., Heckerman, D.: An Experimental Comparison of Several Clustering and Initialization Methods. In: Proc. 14th Conf. on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Francisco
Merz, C., Murphy, P., Aha, D.: UCI Repository of Machine Learning Databases. Department of Information and Computer Science, University of California, http://www.ics.uci.edu/~mlearn/MLRepository.html
Pelleg, D., Moore, A.: X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Proc. 17th International Conf. on Machine Learning (2000)
Peña, J.M., Lozano, J.A., Larrañaga, P.: An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm. Dept. of Computer Science and Artificial Intelligence, University of the Basque Country, San Sebastian, España.
SPSS, Inc. Headquarters, Chicago, Illinois, http://www.spss.com/es/
Su, M.C., Chou, C.H.: A Modified Version of the K-Means Algorithm with a Distance Based on Cluster Symmetry. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6), 674–680 (2001)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pérez O, J., Pazos R, R., Cruz R, L., Reyes S, G., Basave T, R., Fraire H, H. (2007). Improving the Efficiency and Efficacy of the K-means Clustering Algorithm Through a New Convergence Condition. In: Gervasi, O., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2007. ICCSA 2007. Lecture Notes in Computer Science, vol 4707. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74484-9_58
Download citation
DOI: https://doi.org/10.1007/978-3-540-74484-9_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74482-5
Online ISBN: 978-3-540-74484-9
eBook Packages: Computer ScienceComputer Science (R0)