Abstract
Stability is a common tool to verify the validity of sample-based algorithms. In clustering, it is widely used to tune the parameters of the algorithm, such as the number k of clusters. Despite the popularity of stability in practical applications, there has been very little theoretical analysis of this notion. In this paper we provide a formal definition of stability and analyze some of its basic properties. Quite surprisingly, our analysis shows that for large sample sizes, stability is fully determined by the behavior of the objective function that the clustering algorithm aims to minimize: if the objective function has a unique global minimizer, the algorithm is stable; otherwise it is unstable. In particular, we conclude that stability is not a well-suited tool for determining the number of clusters, because it is determined by the symmetries of the data, which may be unrelated to clustering parameters. We prove our results for center-based clusterings and for spectral clustering, and support our conclusions with many examples in which the behavior of stability is counter-intuitive.
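To make the notion concrete, the following is a minimal sketch of how instability is often estimated in practice for a center-based algorithm. It is an illustration under stated assumptions, not the formal definition given in the paper: the function names (`kmeans`, `clustering_distance`, `instability`, `two_blobs`), the use of Lloyd's algorithm with random restarts, the shared evaluation sample, and all sample sizes are hypothetical choices made here.

```python
import itertools
import numpy as np

def kmeans(X, k, rng, n_restarts=5, n_iter=50):
    """Lloyd's algorithm with random restarts; returns the lowest-cost centers found."""
    best_centers, best_cost = None, np.inf
    for _ in range(n_restarts):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Keep the old center if a cluster happens to become empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        cost = d.min(axis=1).sum()
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def clustering_distance(a, b, k):
    """Fraction of points labelled differently, minimised over label permutations."""
    return min(np.mean(a != np.asarray(perm)[b])
               for perm in itertools.permutations(range(k)))

def instability(sample, k, n_pairs=10, n=200, n_eval=500, seed=0):
    """Average distance between clusterings fitted on independent samples.

    Assumption: both clusterings are compared on one shared evaluation sample.
    """
    rng = np.random.default_rng(seed)
    X_eval = sample(n_eval, rng)
    dists = []
    for _ in range(n_pairs):
        c1 = kmeans(sample(n, rng), k, rng)
        c2 = kmeans(sample(n, rng), k, rng)
        dists.append(clustering_distance(assign(X_eval, c1), assign(X_eval, c2), k))
    return float(np.mean(dists))

def two_blobs(n, rng):
    """Hypothetical data source: two well-separated Gaussian blobs in the plane."""
    centers = np.array([[-5.0, 0.0], [5.0, 0.0]])
    return centers[rng.integers(0, 2, size=n)] + 0.5 * rng.standard_normal((n, 2))
```

Calling `instability(two_blobs, k=2)` estimates the instability for a distribution whose 2-means objective has an essentially unique minimizer. Passing a different sampler or a different `k` lets one probe cases where several minimizers exist, which is the situation the abstract identifies as the source of instability.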
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ben-David, S., von Luxburg, U., Pál, D. (2006). A Sober Look at Clustering Stability. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_4
DOI: https://doi.org/10.1007/11776420_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35294-5
Online ISBN: 978-3-540-35296-9
eBook Packages: Computer Science (R0)