Abstract
In this paper, we propose a new approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) to perform clustering via the minimum sum-of-squares Euclidean distance criterion. The so-called Minimum Sum-of-Squares Clustering (MSSC) problem is first formulated as a hard combinatorial optimization problem. It is then recast as a continuous DC program by means of exact penalty techniques in DC programming. A DCA scheme is then investigated. The resulting DCA is original and very inexpensive because, at each iteration, it amounts to computing projections of points onto a simplex and/or onto a ball, all of which are given in explicit form. Numerical results on real-world data sets show the efficiency of DCA and its great superiority over K-means, a standard clustering method.
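The abstract does not spell out the DCA scheme itself, so the following is only a minimal sketch of the building blocks it names: the MSSC objective and Euclidean projections onto a simplex and onto a ball. The function names (`mssc_objective`, `project_onto_simplex`, `project_onto_ball`) are hypothetical, and the sort-based simplex projection is a standard textbook technique assumed here for illustration; it is not necessarily the explicit formula used in the paper.

```python
import math


def mssc_objective(points, centers):
    """Sum over all points of the squared Euclidean distance to the nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


def project_onto_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}.

    Assumption: standard sort-based method. Find a threshold theta such that
    clipping (v_i - theta) at zero yields nonnegative entries summing to one.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            # The last index satisfying this condition gives the threshold.
            theta = t
    return [max(x - theta, 0.0) for x in v]


def project_onto_ball(x, radius):
    """Euclidean projection of x onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return list(x)  # already inside the ball, nothing to do
    scale = radius / norm
    return [c * scale for c in x]
```

Both projections have closed forms that cost at most a sort plus a linear pass per point, which is consistent with the abstract's remark that each iteration is inexpensive. How these projections are combined inside an iteration is left to the full paper.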
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hoai An, L.T., Tao, P.D. (2009). Minimum Sum-of-Squares Clustering by DC Programming and DCA. In: Huang, DS., Jo, KH., Lee, HH., Kang, HJ., Bevilacqua, V. (eds) Emerging Intelligent Computing Technology and Applications. With Aspects of Artificial Intelligence. ICIC 2009. Lecture Notes in Computer Science, vol 5755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04020-7_35
DOI: https://doi.org/10.1007/978-3-642-04020-7_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04019-1
Online ISBN: 978-3-642-04020-7
eBook Packages: Computer Science (R0)