Abstract
The problem of cluster analysis eludes a unique mathematical definition. Instead, a variety of different instantiations of the problem can be defined using specific measures of internal cluster validity. In turn, such internal cluster validity measures rely on quantifying dissimilarity between entities. This article explores the interaction between dissimilarity measures and internal cluster validity techniques in the context of multi-objective clustering. It does so by contrasting two conceptually different approaches to multi-objective clustering: the multi-criterion clustering algorithm \(\Delta\)-MOCK, designed to optimise different measures of internal cluster validity over a single dissimilarity space, and the multi-view clustering algorithm MVMC, designed to optimise a single measure of internal cluster validity over distinct dissimilarity spaces. Our comparison highlights the interchangeable roles of distance functions and measures of internal cluster validity, which paves the way for the future design of a flexible, dual-purpose approach to multi-objective clustering.
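The contrast between the two formulations can be made concrete with a minimal sketch. It is not an implementation of \(\Delta\)-MOCK or MVMC: the two validity measures (`mean_within`, `min_between`), the two dissimilarity functions, and the toy data are all hypothetical stand-ins chosen for illustration. The sketch only shows how the objective vector for a fixed partition is assembled in each case: several measures over one dissimilarity function, or one measure over several dissimilarity functions.

```python
from itertools import combinations
from math import dist as euclidean  # Euclidean distance (Python 3.8+)


def manhattan(p, q):
    """City-block distance, used here as a second dissimilarity space."""
    return sum(abs(a - b) for a, b in zip(p, q))


def mean_within(points, labels, d):
    """Mean dissimilarity over all same-cluster pairs (lower is better).

    Illustrative measure; assumes at least one cluster has two or more points.
    """
    pairs = [(i, j) for i, j in combinations(range(len(points)), 2)
             if labels[i] == labels[j]]
    return sum(d(points[i], points[j]) for i, j in pairs) / len(pairs)


def min_between(points, labels, d):
    """Smallest dissimilarity between points in different clusters (higher is better).

    Illustrative measure; assumes the partition has at least two clusters.
    """
    return min(d(points[i], points[j])
               for i, j in combinations(range(len(points)), 2)
               if labels[i] != labels[j])


def multi_criterion(points, labels, d):
    """Multi-criterion view: different validity measures, one dissimilarity."""
    return (mean_within(points, labels, d), min_between(points, labels, d))


def multi_view(points, labels, dissimilarities):
    """Multi-view view: one validity measure, several dissimilarities."""
    return tuple(mean_within(points, labels, d) for d in dissimilarities)


# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = [0, 0, 1, 1]

print(multi_criterion(points, labels, euclidean))
print(multi_view(points, labels, [euclidean, manhattan]))
```

In both functions the objective vector is indexed by a (measure, dissimilarity) pair; only the component that varies differs. This is the sense in which the two roles can be treated as interchangeable in a multi-objective setting.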
About this article
Cite this article
José-García, A., Handl, J. What’s in a distance? Exploring the interplay between distance measures and internal cluster validity in multi-objective clustering. Nat Comput 22, 259–270 (2023). https://doi.org/10.1007/s11047-022-09909-y