Abstract
Given an arbitrary data set for which no particular parametric, statistical, or geometrical structure can be assumed, different clustering algorithms will in general produce different data partitions. In fact, a single clustering algorithm can also yield several partitions, owing to its dependence on initialization or on the value chosen for some design parameter. This paper addresses the problem of finding consistent clusters in data partitions, proposing the analysis of the most common associations performed in a majority voting scheme. Clustering results are combined by transforming the data partitions into a co-association sample matrix, which maps coherent associations. This matrix is then used to extract the underlying consistent clusters. The proposed methodology is evaluated in the context of k-means clustering, and a new clustering algorithm, voting-k-means, is presented. Examples using both simulated and real data show how this majority voting combination scheme simultaneously handles the problems of selecting the number of clusters and of dependency on initialization. Furthermore, the resulting clusters are not constrained to be hyper-spherically shaped.
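As a rough illustration of the combination step described above, the following Python sketch builds a co-association matrix from several partitions of the same samples and then extracts clusters by majority vote. It is one plausible reading of the abstract, not the paper's implementation: the function names `co_association` and `consistent_clusters`, the 0.5 threshold, and the merging of linked pairs through a union-find structure are all assumptions made for this example. The partitions are taken as given label lists; in the setting of the abstract they would come from repeated k-means runs.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster.

    `partitions` is a list of label lists, one per clustering run, all
    labelling the same samples in the same order (hypothetical input format).
    """
    n = len(partitions[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same samples")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                co[i][j] += 1.0
                co[j][i] += 1.0
    m = len(partitions)
    for i in range(n):
        for j in range(n):
            co[i][j] /= m
        co[i][i] = 1.0  # a sample is always associated with itself
    return co


def consistent_clusters(co, threshold=0.5):
    """Join samples whose co-association exceeds `threshold` (assumed 0.5).

    Pairs voted together by a majority of partitions are linked, and the
    connected groups of linked samples become the final clusters.
    """
    n = len(co)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel group representatives as 0, 1, 2, ... in order of appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]
```

For instance, the three partitions `[0, 0, 0, 1, 1, 1]`, `[1, 1, 0, 0, 2, 2]` and `[0, 0, 0, 0, 1, 1]` disagree on both the number of clusters and the placement of the fourth sample, yet the sketch returns `[0, 0, 0, 0, 1, 1]`. Because groups are formed by chaining linked pairs, two samples can end up together without ever having a majority vote between them directly, which is one way clusters of non-spherical shape can arise.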
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fred, A. (2001). Finding Consistent Clusters in Data Partitions. In: Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2001. Lecture Notes in Computer Science, vol 2096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48219-9_31
DOI: https://doi.org/10.1007/3-540-48219-9_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42284-6
Online ISBN: 978-3-540-48219-2
eBook Packages: Springer Book Archive