Abstract
Random restarts of a given clustering algorithm produce many partitions that can be aggregated to yield a consensus clustering. Such ensemble methods have been recognized as more robust approaches to data clustering than single clustering algorithms. We propose the use of determinantal point processes (DPPs) for the random restarts of clustering algorithms that rely on initial sets of center points, such as k-medoids or k-means. The relation between DPPs and kernel-based methods makes DPPs well suited to describing and quantifying similarity between objects. DPPs favor diversity of the center points within initial sets, so that sets containing similar points are less likely to be generated than sets of very distinct points. In contrast, most current initial sets are generated with center points sampled uniformly at random. We show through extensive simulations that, unlike DPP sampling, uniform sampling fails both to ensure diversity and to obtain a good coverage of all data facets, the two key properties behind the good performance of DPPs. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms consensus clusterings based on uniform random sampling of center points.
References
Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) Optics: ordering points to identify the clustering structure. ACM SIGMOD Rec 28(2):49–60
Ao SI, Yip K, Ng M, Cheung D, Fong P, Melhado I, Sham P (2005) CLUSTAG: hierarchical clustering and graph methods for selecting tag SNPs. Bioinformatics 21:1735–1736. https://doi.org/10.1093/bioinformatics/bti201
Arthur D, Vassilvitskii S (2007) K-means++: The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, USA, SODA ’07. Society for Industrial and Applied Mathematics, pp 1027–1035
Aurenhammer F (1991) Voronoi diagrams—a survey of a fundamental geometric data structure. ACM Comput Surv 23(3):345–405. https://doi.org/10.1145/116873.116880
Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821
Ben Hough J, Krishnapur M, Peres Y, Virág B (2006) Determinantal processes and independence. Probab Surv [electronic only] 3:206–229
Bicego M, Baldo S (2016) Properties of the Box–Cox transformation for pattern classification. Neurocomputing 218:390–400
Bien J, Tibshirani R (2011) Hierarchical clustering with prototypes via minimax linkage. J Am Stat Assoc 106:1075–1084. https://doi.org/10.1198/jasa.2011.tm10183
Bilodeau M, Nangue AG (2017) Tests of mutual or serial independence of random vectors with applications. J Mach Learn Res 18(1):2518–2557
Blatt M, Wiseman S, Domany E (1996) Superparamagnetic clustering of data. Phys Rev Lett 76:3251–3254
Blatt M, Wiseman S, Domany E (1997) Data clustering using a model granular magnet. Neural Comput 9(8):1805–1842
Borodin A, Olshanski G (2000) Distributions on partitions, point processes, and the hypergeometric kernel. Commun Math Phys 211:335–358. https://doi.org/10.1007/s002200050815. arXiv:math/9904010
Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc B 26:211–252
Budiaji W, Leisch F (2019) Simple k-medoids partitioning algorithm for mixed variable data. Algorithms 12(9):177
Capó M, Pérez A, Lozano JA (2017) An efficient approximation to the k-means clustering for massive data. Knowl-Based Syst 117:56–69
Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst Appl 40(1):200–210. https://doi.org/10.1016/j.eswa.2012.07.021
Chaudhuri A, Kakde D, Sadek C, Gonzalez L, Kong S (2017) The mean and median criteria for kernel bandwidth selection for support vector data description. In: 2017 IEEE international conference on data mining workshops (ICDMW). IEEE, pp 842–849
Chen D, Xing K, Henson D, Sheng L, Schwartz AM, Cheng X (2009) Developing prognostic systems of cancer patients by ensemble clustering. J Biomed Biotechnol 2009:632786
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Boca Raton
Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96:226–231
Fan Z, Jiang X, Xu B, Jiang Z (2010) An automatic index validity for clustering. In: Tan Y, Shi Y, Tan KC (eds) Advances in swarm intelligence. Springer, Berlin, pp 359–366
Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recogn 41(1):176–190
Florek K, Łukaszewicz J, Perkal J, Steinhaus H, Zubrzycki S (1951) Sur la liaison et la division des points d’un ensemble fini. Colloq Math 2:282–285
Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769
Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588
Fränti P, Sieranoja S (2019) How much can k-means be improved by using better initialization and repeats? Pattern Recogn 93:95–112
Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13(3):780–784
Gonzalez TF (1985) Clustering to minimize the maximum intercluster distance. Theor Comput Sci 38:293–306
Hafiz Affandi R, Fox EB, Taskar B (2013) Approximate inference in continuous determinantal point processes. ArXiv e-prints arXiv:1311.2971
Hafiz Affandi R, Fox EB, Adams RP, Taskar B (2014) Learning the parameters of determinantal point process kernels. arXiv e-prints arXiv:1402.4862
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco
Hennig C (2019) Cluster validation by measurement of clustering characteristics relevant to the user, Chap 1. Wiley, Hoboken, pp 1–24
Herbrich R (2001) Learning kernel classifiers: theory and algorithms. MIT Press, Cambridge
Hinneburg A, Keim DA (1999) Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering. In: 25th International conference on very large databases, pp 506–517
Howley T, Madden MG (2006) An evolutionary approach to automatic kernel construction. In: Kollias S, Stafylopatis A, Duch W, Oja E (eds) Artificial Neural Networks—ICANN 2006. Springer, Berlin, pp 417–426
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218. https://doi.org/10.1007/BF01908075
Ibrahim LF, Harbi MHA (2013) Using modified partitioning around medoids clustering technique in mobile network planning. arXiv preprint arXiv:1302.6602
Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall Inc, Upper Saddle River
Kang B (2013) Fast determinantal point process sampling with application to clustering. In: Burges C, Bottou L, Welling M, Ghahramani Z, Weinberger K (eds) Advances in neural information processing systems, vol 26. Curran Associates, Inc., New York, pp 2319–2327
Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab—an S4 package for kernel methods in R. J Stat Softw 11(9):1–20
Katsavounidis I, Kuo CCJ, Zhang Z (1994) A new initialization technique for generalized Lloyd iteration. IEEE Signal Process Lett 1(10):144–146
Kaufmann L, Rousseeuw P (1987) Clustering by means of medoids. In: Proceedings of the statistical data analysis based on the L1 norm conference, Neuchatel, Switzerland, vol 31, pp 405–416
Kulesza A, Taskar B (2012) Determinantal point processes for machine learning. Found Trends Mach Learn 5(2–3):123–286. https://doi.org/10.1561/2200000044
Lago-Fernández LF, Corbacho F (2010) Normality-based validation for crisp clustering. Pattern Recogn 43(3):782–795
Lewis D (1997) Reuters-21578 text categorization collection, distribution 1.0. http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html
Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of the 15th international conference on machine learning. Morgan Kaufmann, San Francisco, pp 296–304
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28:129–136
Maitra R (2009) Initializing partition-optimization algorithms. IEEE/ACM Trans Comput Biol Bioinf 6(1):144–157
Melnykov V, Chen WC, Maitra R (2012) MixSim: an R package for simulating data to study performance of clustering algorithms. J Stat Softw 51(12):1–25. https://doi.org/10.18637/jss.v051.i12
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1):91–118
Muñoz J, Murua A (2018) Building cancer prognosis systems with survival function clusters. Stat Anal Data Min ASA Data Sci J 11(3):98–110. https://doi.org/10.1002/sam.11373
Murua A, Wicker N (2014) The conditional-Potts clustering model. J Comput Graph Stat 23(3):717–739
Murua A, Stanberry L, Stuetzle W (2008) On Potts model clustering, kernel k-means, and density estimation. J Comput Graph Stat 17(3):629–658
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856
Okabe A, Boots B, Sugihara K, Chiu SN (2000) Spatial tessellations: concepts and applications of Voronoi diagrams, 2nd edn. Wiley series in probability and statistics. Wiley, Hoboken
Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341
Pena JM, Lozano JA, Larranaga P (1999) An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recogn Lett 20(10):1027–1040
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
Saad D (1998) Online algorithms and stochastic approximations. Online Learn 5:3–6
Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
Schölkopf B, Tsuda K, Vert JP (2004) Kernel methods in computational biology. MIT Press, Cambridge
Schubert E, Rousseeuw PJ (2019) Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. In: International conference on similarity search and applications. Springer, pp 171–187
Sejdinovic D, Sriperumbudur B, Gretton A, Fukumizu K (2013) Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann Stat 41:2263–2291
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Smyth P (1997) Clustering sequences with hidden Markov models. In: Proceedings of the 9th international conference on neural information processing systems (NIPS 1996). MIT Press, Cambridge, pp 648–654
Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617. https://doi.org/10.1162/153244303321897735
Stuetzle W (2003) Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J Classif 20(1):25–47
Stuetzle W, Nugent R (2010) A generalized single linkage method for estimating the cluster tree of a density. J Comput Graph Stat 19(2):397–418
Thygesen HH, Zwinderman AH (2004) Comparing transformation methods for DNA microarray data. BMC Bioinform 5(1):77
Vanschoren J, van Rijn JN, Bischl B, Torgo L (2013) OpenML: networked science in machine learning. SIGKDD Explor 15(2):49–60. https://doi.org/10.1145/2641190.2641198
Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin
Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recogn Artif Intell 25(03):337–372. https://doi.org/10.1142/S0218001411008683
Verleysen M, François D (2005) The curse of dimensionality in data mining and time series prediction. In: International work-conference on artificial neural networks. Springer, pp 758–770
Vert JP, Tsuda K, Schölkopf B (2004) A primer on kernel methods. Kernel Methods Comput Biol 47:35–70
Wang F, Landau DP (2001) Efficient, multiple-range random walk algorithm to calculate the density of states. Phys Rev Lett 86(10):2050
Wang W, Yang J, Muntz R et al (1997) Sting: a statistical information grid approach to spatial data mining. VLDB 97:186–195
Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244
Watkins C (1999) Dynamic alignment kernels. In: Smola AJ, Bartlett PL, Schölkopf B, Schuurmans D (eds) Advances in large margin classifiers. MIT Press, pp 39–50
Xuan L, Zhigang C, Fan Y (2013) Exploring of clustering algorithm on class-imbalanced data. In: 8th International conference on computer science and education, ICCSE 2013, pp 89–93. https://doi.org/10.1109/ICCSE.2013.6553890
Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17:977–987
Acknowledgements
This work was partially funded by a Discovery grant 2019-05444 from the Natural Sciences and Engineering Research Council of Canada, and by a Fundamental Research Projects grant PRF-2017-20 from the Institute for Data Valorization (IVADO) of Quebec, Canada.
Appendices
Appendix: Proof that an increasing sequence of thresholds gives rise to hierarchical cluster partitions
Consider a given sequence of increasing thresholds \(\theta _1< \theta _2< \cdots < \theta _m\). As mentioned in Sect. 4, the associated consolidated matrices \(C^{(1)}, \ldots , C^{(m)}\) are nested in the sense that, whenever \(s < t\), \(C^{(s)}_{ij} \ge C^{(t)}_{ij}\) for all pairs (i, j), \(s, t=1, \ldots ,m\). Let \(\mathcal {P}_s\) and \(\mathcal {P}_t\) be the connected component partitions associated with the consolidated matrices \(C^{(s)}\) and \(C^{(t)}\), respectively. Let points \(x_i\) and \(x_j\) be in the same connected component of \(\mathcal {P}_t\). That is, there is a path \(x_i \mapsto x_{l_1} \mapsto \cdots \mapsto x_{l_{g-1}} \mapsto x_j\) of points satisfying \(C^{(t)}_{l_h, l_{h+1}} = 1\), \(h=0, \ldots , g-1\), with \(l_0 = i\) and \(l_{g} = j\). Since \(C^{(s)}_{ij} \ge C^{(t)}_{ij}\) for all pairs (i, j), points \(x_i\) and \(x_j\) are necessarily also in the same connected component of \(\mathcal {P}_s\). The converse is not true. To see this, suppose that points \(x_i\) and \(x_j\) are in the same connected component, say \(\mathcal {C}_\ell \), of \(\mathcal {P}_s\). That is, there is a path \(x_i \mapsto x_{l_1} \mapsto \cdots \mapsto x_{l_{g-1}} \mapsto x_j\) of points satisfying \(C^{(s)}_{l_h, l_{h+1}} = 1\), \(h=0, \ldots , g-1\), with \(l_0 = i\) and \(l_{g} = j\). Suppose as well that for some \(h\in \{0, \ldots , g-1\}\), \(C^{(t)}_{l_h, l_{h+1}} = 0\). Then the proportion of times the pair \((x_{l_h}, x_{l_{h+1}})\) appears together is at least \(\theta _s\) but less than \(\theta _t\), so this edge of the path is absent in \(C^{(t)}\). If every path connecting \(x_i\) and \(x_j\) in \(C^{(s)}\) contains such an edge, then \(x_i\) and \(x_j\) lie in different connected components of \(\mathcal {P}_t\). This means that the component \(\mathcal {C}_\ell \in \mathcal {P}_s\) may split into several connected components of \(\mathcal {P}_t\), while components of \(\mathcal {P}_s\) can never merge. In this sense the sequence of thresholds \(\theta _1< \theta _2< \cdots < \theta _m\) leads to a sequence of partitions satisfying \(\mathcal {P}_1 \supseteq \mathcal {P}_2 \supseteq \cdots \supseteq \mathcal {P}_m\), where \(\mathcal {P}_s \supseteq \mathcal {P}_t\) means that each connected component in \(\mathcal {P}_t\) arises from a connected component of \(\mathcal {P}_s\), and the number of connected components of \(\mathcal {P}_t\) is greater than or equal to the number of connected components of \(\mathcal {P}_s\).
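The nesting can be checked directly in code. Below is a minimal sketch (our own illustration, not the authors' implementation) which assumes that the consolidated matrix at threshold \(\theta \) is obtained by thresholding a symmetric matrix P of co-occurrence proportions, \(C_{ij}=1\) if \(P_{ij}\ge \theta \) and 0 otherwise, as in Sect. 4. It extracts the connected component partition by depth-first search; increasing \(\theta \) can then only split components, never merge them.

import numpy as np

def connected_component_partition(C):
    # Label each point with the index of its connected component in the
    # graph whose adjacency matrix is the 0/1 consolidated matrix C.
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)
    comp = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = comp
        stack = [start]
        while stack:                          # depth-first search
            u = stack.pop()
            for v in np.nonzero(C[u])[0]:
                if labels[v] < 0:
                    labels[v] = comp
                    stack.append(v)
        comp += 1
    return labels

# Toy illustration with a random symmetric "co-occurrence" matrix P:
rng = np.random.default_rng(0)
A = rng.random((10, 10))
P = (A + A.T) / 2
np.fill_diagonal(P, 1.0)
for theta in (0.3, 0.5, 0.7):                  # increasing thresholds
    print(theta, connected_component_partition(P >= theta))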
Appendix: Scattering formulas in the transformed space: using the kernel trick
Consider the RBF kernel \(\kappa (x_i,x_j)\) defined by (3). The mean scattering induced by the kernel on the data (Vert et al. 2004; Fan et al. 2010) is defined as
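In standard kernel-trick notation (stated here as an assumption about the exact display), with implicit feature map \(\Phi \) satisfying \(\kappa (x_i,x_j)=\langle \Phi (x_i),\Phi (x_j)\rangle \), mean \(\bar{\Phi }=\frac{1}{n}\sum _{j=1}^{n}\Phi (x_j)\), and scattering measured by squared feature-space distances, this quantity reads
\[
\frac{1}{n}\sum _{i=1}^{n}\bigl \Vert \Phi (x_i)-\bar{\Phi }\bigr \Vert ^{2}
\;=\; \frac{1}{n}\sum _{i=1}^{n}\kappa (x_i,x_i)\;-\;\frac{1}{n^{2}}\sum _{i=1}^{n}\sum _{j=1}^{n}\kappa (x_i,x_j)
\;=\; 1-\frac{1}{n^{2}}\sum _{i=1}^{n}\sum _{j=1}^{n}\kappa (x_i,x_j),
\]
the last equality holding because \(\kappa (x_i,x_i)=1\) for the RBF kernel.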
This formula is derived using the kernel trick, which links kernel evaluations with inner products between transformed points in a high-dimensional space. It can be shown that this quantity corresponds to the average distance between the transformed points and the mean of the transformed points. Let \(\mathcal {V}= \{\mathcal {V}_1, \ldots , \mathcal {V}_K\}\) be a clustering configuration with K clusters. Also, let \(n_k\) be the size of cluster \(\mathcal {V}_k\), and let \(J_k = \{ i : x_i \in \mathcal {V}_k\}\), \(k=1,\ldots ,K.\) As with the mean scattering, Fan et al. (2010) define the kernel-induced scattering within each cluster \(\mathcal {V}_k\),
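Under the same convention (again our notation, with squared feature-space distances assumed), with cluster mean \(\bar{\Phi }_k=\frac{1}{n_k}\sum _{i\in J_k}\Phi (x_i)\), this within-cluster scattering can be written as
\[
\frac{1}{n_k}\sum _{i\in J_k}\bigl \Vert \Phi (x_i)-\bar{\Phi }_k\bigr \Vert ^{2}
\;=\; 1-\frac{1}{n_k^{2}}\sum _{i\in J_k}\sum _{j\in J_k}\kappa (x_i,x_j),
\]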
which can be seen as the average distance between the transformed points of cluster \(\mathcal {V}_k\) and the mean of cluster \(\mathcal {V}_k\) in the transformed space. Finally, to measure the total scattering between clusters, one considers the distance between clusters in the transformed space,
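One natural choice, written here by the same kernel trick and stated as an assumption about the exact measure used, is the squared feature-space distance between cluster means,
\[
\bigl \Vert \bar{\Phi }_k-\bar{\Phi }_l\bigr \Vert ^{2}
\;=\; \frac{1}{n_k^{2}}\sum _{i\in J_k}\sum _{j\in J_k}\kappa (x_i,x_j)
\;+\; \frac{1}{n_l^{2}}\sum _{i\in J_l}\sum _{j\in J_l}\kappa (x_i,x_j)
\;-\; \frac{2}{n_k n_l}\sum _{i\in J_k}\sum _{j\in J_l}\kappa (x_i,x_j),
\qquad k\ne l,
\]
which again involves only kernel evaluations.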
Appendix: Scheme to sample \(\varvec{Y}\sim DPP_{\mathcal {S}}(L)\)
Below, \(e_i\) represents the ith standard basis vector, an n-dimensional vector that is all zeros except for a one in the ith position. The span of a given finite collection of vectors V, denoted by \({{\,\mathrm{span}\,}}(V)\), is the set of all linear combinations of the vectors in V, i.e., \({{\,\mathrm{span}\,}}(V)=\left\{ \sum _{v\in V}a_v v \mid a_v\in \mathbb {R}\right\} \).
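The sampler referred to here is the standard spectral scheme for finite DPPs (Hough et al. 2006; Kulesza and Taskar 2012): eigenvectors of the L-ensemble kernel are retained independently with probability \(\lambda _i/(\lambda _i+1)\), and items are then drawn one at a time while the retained subspace is projected onto the complement of the chosen coordinate. A minimal NumPy sketch of that scheme follows; the function name and implementation details (such as the QR re-orthonormalization) are ours and not taken from the article.

import numpy as np

def sample_dpp(L, rng=None):
    # Sample Y ~ DPP(L) for a symmetric positive semi-definite L-ensemble
    # kernel L, using the spectral algorithm of Hough et al. (2006).
    rng = np.random.default_rng(rng)
    n = L.shape[0]
    eigvals, eigvecs = np.linalg.eigh(L)
    # Keep eigenvector i independently with probability lambda_i / (lambda_i + 1).
    keep = rng.random(n) < eigvals / (eigvals + 1.0)
    V = eigvecs[:, keep]          # orthonormal basis of the sampled subspace
    Y = []
    while V.shape[1] > 0:
        # Choose item i with probability proportional to the squared norm
        # of the ith row of V (the ith coordinate of the subspace).
        p = np.sum(V ** 2, axis=1)
        i = rng.choice(n, p=p / p.sum())
        Y.append(i)
        # Project the subspace onto the complement of e_i: eliminate the ith
        # coordinate using one column, drop that column, re-orthonormalize.
        j = np.argmax(np.abs(V[i, :]))
        V = V - np.outer(V[:, j], V[i, :] / V[i, j])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(Y)

In the consensus procedure, L would be the kernel (similarity) matrix over the data points, and each call to sample_dpp would provide one initial set of center points.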
Appendix: Sensitivity analysis on kernel choice
As mentioned in Sect. 5.3, we considered three kernels: the linear kernel, \(\kappa (x_i,x_j)=x_i^Tx_j\), and two variants of the q-stable kernel (Bilodeau and Nangue 2017), also known as the q-Gaussian kernel (Verleysen and François 2005), which is given by \(\kappa (x_i, x_j)=\exp \left( -\left\Vert x_i-x_j\right\Vert ^{q}/\sigma ^q\right) \), \(q\in (0,2]\), where \(\sigma >0\) is the kernel bandwidth. The variants considered are the Laplace kernel (\(q=1\)), and the Gaussian kernel (\(q=2\)).
For each of the two q-stable kernels studied, two bandwidth estimates were considered. For the Laplace kernel, the bandwidth estimates were the mean \(\hat{\sigma }_2\) and the median \(\tilde{\sigma }_2\) of all pairwise Euclidean distances. For the Gaussian kernel, the estimates were the mean \(\hat{\sigma }^2\) and the median \(\tilde{\sigma }^2\) of all pairwise squared Euclidean distances. Note that the Gaussian kernel with bandwidth estimate \(\hat{\sigma }^2\) corresponds to our proposed DPP kernel.
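As a concrete illustration of these bandwidth choices, the sketch below (our own code, not the authors'; the function name is hypothetical) computes the Gram matrix of the q-stable kernel with either the mean or the median of the pairwise \(\left\Vert x_i-x_j\right\Vert ^q\) values as the estimate of \(\sigma ^q\); setting \(q=1\) gives the Laplace kernel and \(q=2\) the Gaussian kernel.

import numpy as np
from scipy.spatial.distance import pdist, squareform

def q_stable_gram(X, q=2.0, bandwidth="mean"):
    # Gram matrix of the q-stable kernel exp(-||x_i - x_j||^q / sigma^q),
    # with sigma^q estimated by the mean or median of the pairwise
    # ||x_i - x_j||^q values (q=1: Laplace kernel, q=2: Gaussian kernel).
    d_q = pdist(X) ** q
    sigma_q = np.mean(d_q) if bandwidth == "mean" else np.median(d_q)
    return np.exp(-squareform(d_q) / sigma_q)

# Example: Gaussian kernel with the mean of squared distances (the proposed
# DPP kernel) versus the median-based variant.
X = np.random.default_rng(1).normal(size=(50, 4))
K_mean = q_stable_gram(X, q=2.0, bandwidth="mean")
K_median = q_stable_gram(X, q=2.0, bandwidth="median")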
We further investigated the behavior of the kernels on data whose clusters are non-Gaussian. For this simulation, we followed the setup of Sect. 5.2. Recall that the data were generated from 24 experimental conditions of a \(3^3\)-factorial design. For this study, we generated five replicas from each experimental condition. The non-ellipsoidal (non-Gaussian) clusters were obtained by applying the inverse Box–Cox transformation to the datasets. As before, we selected \(\lambda =1/3\), \(\lambda =1/2\), and \(\lambda = 2\) for the power parameter \(\lambda \) of the Box–Cox transformations. We ran the consensus DPP algorithm described in Sect. 4 on each dataset with the five different kernels. The values of the consensus DPP parameters were set to \(a=0.5\), \(R=200\), and \(\tau =0.6\). To select the optimal clustering configuration, we used the kernel-based validation index KVI\(_\mathcal {V}\) given by (6). To measure the quality of the selected configuration, we used the adjusted Rand index (see Sect. 4.2). The results are displayed in Fig. 5 in the form of barplots. Each bar represents a different kernel or consensus method. The segments on top of the bars represent ± one standard error. We also compared the results from the different kernels with those yielded by CC-PAM and CC-k-means\(++\).
For both q-stable kernels, estimating the bandwidth parameter with the median or the mean of all pairwise distances does not produce any major difference in the ARI results.
Keywords
- Classification
- Kernel-based validation index
- Kernel k-means
- Partitioning about medoids
- Radial basis function
- Repulsion
- Voronoi diagram