
Determinantal consensus clustering

Regular Article · Advances in Data Analysis and Classification

Abstract

Random restart of a given algorithm produces many partitions that can be aggregated to yield a consensus clustering. Ensemble methods have been recognized as more robust approaches to data clustering than single clustering algorithms. We propose the use of determinantal point processes, or DPPs, for the random restart of clustering algorithms based on initial sets of center points, such as k-medoids or k-means. The relation between DPPs and kernel-based methods makes DPPs suitable for describing and quantifying similarity between objects. DPPs favor diversity of the center points in initial sets, so that sets with similar points have less chance of being generated than sets with very distinct points. Most current initial sets are generated with center points sampled uniformly at random. We show through extensive simulations that, contrary to DPPs, this technique fails both to ensure diversity and to obtain good coverage of all data facets, the two key properties behind the good performance of DPPs. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms consensus clustering based on uniform random sampling of center points.


References

  • Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. ACM SIGMOD Rec 28(2):49–60
  • Ao SI, Yip K, Ng M, Cheung D, Fong P, Melhado I, Sham P (2005) CLUSTAG: hierarchical clustering and graph methods for selecting tag SNPs. Bioinformatics 21:1735–1736. https://doi.org/10.1093/bioinformatics/bti201
  • Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA '07. Society for Industrial and Applied Mathematics, USA, pp 1027–1035
  • Aurenhammer F (1991) Voronoi diagrams—a survey of a fundamental geometric data structure. ACM Comput Surv 23(3):345–405. https://doi.org/10.1145/116873.116880
  • Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821
  • Ben Hough J, Krishnapur M, Peres Y, Virág B (2006) Determinantal processes and independence. Probab Surv 3:206–229
  • Bicego M, Baldo S (2016) Properties of the Box–Cox transformation for pattern classification. Neurocomputing 218:390–400
  • Bien J, Tibshirani R (2011) Hierarchical clustering with prototypes via minimax linkage. J Am Stat Assoc 106:1075–1084. https://doi.org/10.1198/jasa.2011.tm10183
  • Bilodeau M, Nangue AG (2017) Tests of mutual or serial independence of random vectors with applications. J Mach Learn Res 18(1):2518–2557
  • Blatt M, Wiseman S, Domany E (1996) Superparamagnetic clustering of data. Phys Rev Lett 76:3251–3254
  • Blatt M, Wiseman S, Domany E (1997) Data clustering using a model granular magnet. Neural Comput 9(8):1805–1842
  • Borodin A, Olshanski G (2000) Distributions on partitions, point processes, and the hypergeometric kernel. Commun Math Phys 211:335–358. https://doi.org/10.1007/s002200050815. arXiv:math/9904010
  • Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc B 26:211–252
  • Budiaji W, Leisch F (2019) Simple k-medoids partitioning algorithm for mixed variable data. Algorithms 12(9):177
  • Capó M, Pérez A, Lozano JA (2017) An efficient approximation to the k-means clustering for massive data. Knowl-Based Syst 117:56–69
  • Celebi ME, Kingravi HA, Vela PA (2013) A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst Appl 40(1):200–210. https://doi.org/10.1016/j.eswa.2012.07.021
  • Chaudhuri A, Kakde D, Sadek C, Gonzalez L, Kong S (2017) The mean and median criteria for kernel bandwidth selection for support vector data description. In: 2017 IEEE international conference on data mining workshops (ICDMW). IEEE, pp 842–849
  • Chen D, Xing K, Henson D, Sheng L, Schwartz AM, Cheng X (2009) Developing prognostic systems of cancer patients by ensemble clustering. J Biomed Biotechnol 2009:632786
  • Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
  • Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Boca Raton
  • Ester M, Kriegel HP, Sander J, Xu X et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96:226–231
  • Fan Z, Jiang X, Xu B, Jiang Z (2010) An automatic index validity for clustering. In: Tan Y, Shi Y, Tan KC (eds) Advances in swarm intelligence. Springer, Berlin, pp 359–366
  • Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recogn 41(1):176–190
  • Florek K, Łukaszewicz J, Perkal J, Steinhaus H, Zubrzycki S (1951) Sur la liaison et la division des points d'un ensemble fini. Colloq Math 2:282–285
  • Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769
  • Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588
  • Fränti P, Sieranoja S (2019) How much can k-means be improved by using better initialization and repeats? Pattern Recogn 93:95–112
  • Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13(3):780–784
  • Gonzalez TF (1985) Clustering to minimize the maximum intercluster distance. Theor Comput Sci 38:293–306
  • Hafiz Affandi R, Fox EB, Taskar B (2013) Approximate inference in continuous determinantal point processes. arXiv:1311.2971
  • Hafiz Affandi R, Fox EB, Adams RP, Taskar B (2014) Learning the parameters of determinantal point process kernels. arXiv:1402.4862
  • Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco
  • Hennig C (2019) Cluster validation by measurement of clustering characteristics relevant to the user, Chap 1. Wiley, Hoboken, pp 1–24
  • Herbrich R (2001) Learning kernel classifiers: theory and algorithms. MIT Press, Cambridge
  • Hinneburg A, Keim DA (1999) Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering. In: 25th international conference on very large databases, pp 506–517
  • Howley T, Madden MG (2006) An evolutionary approach to automatic kernel construction. In: Kollias S, Stafylopatis A, Duch W, Oja E (eds) Artificial neural networks—ICANN 2006. Springer, Berlin, pp 417–426
  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218. https://doi.org/10.1007/BF01908075
  • Ibrahim LF, Harbi MHA (2013) Using modified partitioning around medoids clustering technique in mobile network planning. arXiv:1302.6602
  • Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall Inc, Upper Saddle River
  • Kang B (2013) Fast determinantal point process sampling with application to clustering. In: Burges C, Bottou L, Welling M, Ghahramani Z, Weinberger K (eds) Advances in neural information processing systems, vol 26. Curran Associates, Inc., New York, pp 2319–2327
  • Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab—an S4 package for kernel methods in R. J Stat Softw 11(9):1–20
  • Katsavounidis I, Kuo CCJ, Zhang Z (1994) A new initialization technique for generalized Lloyd iteration. IEEE Signal Process Lett 1(10):144–146
  • Kaufmann L, Rousseeuw P (1987) Clustering by means of medoids. In: Proceedings of the statistical data analysis based on the L1 norm conference, Neuchatel, Switzerland, vol 31, pp 405–416
  • Kulesza A, Taskar B (2012) Determinantal point processes for machine learning. Found Trends Mach Learn 5(2–3):123–286. https://doi.org/10.1561/2200000044
  • Lago-Fernández LF, Corbacho F (2010) Normality-based validation for crisp clustering. Pattern Recogn 43(3):782–795
  • Lewis D (1997) Reuters-21578 text categorization collection, distribution 1.0. http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html
  • Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of the 15th international conference on machine learning. Morgan Kaufmann, San Francisco, pp 296–304
  • Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28:129–136
  • Maitra R (2009) Initializing partition-optimization algorithms. IEEE/ACM Trans Comput Biol Bioinf 6(1):144–157
  • Melnykov V, Chen WC, Maitra R (2012) MixSim: an R package for simulating data to study performance of clustering algorithms. J Stat Softw 51(12):1–25. https://doi.org/10.18637/jss.v051.i12
  • Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1):91–118
  • Muñoz J, Murua A (2018) Building cancer prognosis systems with survival function clusters. Stat Anal Data Min ASA Data Sci J 11(3):98–110. https://doi.org/10.1002/sam.11373
  • Murua A, Wicker N (2014) The conditional-Potts clustering model. J Comput Graph Stat 23(3):717–739
  • Murua A, Stanberry L, Stuetzle W (2008) On Potts model clustering, kernel k-means, and density estimation. J Comput Graph Stat 17(3):629–658
  • Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 14:849–856
  • Okabe A, Boots B, Sugihara K, Chiu SN (2000) Spatial tessellations: concepts and applications of Voronoi diagrams, 2nd edn. Wiley series in probability and statistics. Wiley, Hoboken
  • Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341
  • Pena JM, Lozano JA, Larranaga P (1999) An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recogn Lett 20(10):1027–1040
  • Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
  • Saad D (1998) Online algorithms and stochastic approximations. Online Learn 5:3–6
  • Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
  • Schölkopf B, Tsuda K, Vert JP (2004) Kernel methods in computational biology. MIT Press, Cambridge
  • Schubert E, Rousseeuw PJ (2019) Faster k-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. In: International conference on similarity search and applications. Springer, pp 171–187
  • Sejdinovic D, Sriperumbudur B, Gretton A, Fukumizu K (2013) Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann Stat 41:2263–2291
  • Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
  • Smyth P (1997) Clustering sequences with hidden Markov models. In: Proceedings of the 9th international conference on neural information processing systems (NIPS 1996). MIT Press, Cambridge, pp 648–654
  • Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617. https://doi.org/10.1162/153244303321897735
  • Stuetzle W (2003) Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J Classif 20(1):25–47
  • Stuetzle W, Nugent R (2010) A generalized single linkage method for estimating the cluster tree of a density. J Comput Graph Stat 19(2):397–418
  • Thygesen HH, Zwinderman AH (2004) Comparing transformation methods for DNA microarray data. BMC Bioinform 5(1):77
  • Vanschoren J, van Rijn JN, Bischl B, Torgo L (2013) OpenML: networked science in machine learning. SIGKDD Explor 15(2):49–60. https://doi.org/10.1145/2641190.2641198
  • Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin
  • Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recogn Artif Intell 25(03):337–372. https://doi.org/10.1142/S0218001411008683
  • Verleysen M, François D (2005) The curse of dimensionality in data mining and time series prediction. In: International work-conference on artificial neural networks. Springer, pp 758–770
  • Vert JP, Tsuda K, Schölkopf B (2004) A primer on kernel methods. Kernel Methods Comput Biol 47:35–70
  • Wang F, Landau DP (2001) Efficient, multiple-range random walk algorithm to calculate the density of states. Phys Rev Lett 86(10):2050
  • Wang W, Yang J, Muntz R et al (1997) STING: a statistical information grid approach to spatial data mining. VLDB 97:186–195
  • Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244
  • Watkins C (1999) Dynamic alignment kernels. In: Smola AJ, Bartlett PL, Schölkopf B, Schuurmans D (eds) Advances in large margin classifiers. MIT Press, pp 39–50
  • Xuan L, Zhigang C, Fan Y (2013) Exploring of clustering algorithm on class-imbalanced data. In: 8th international conference on computer science and education, ICCSE 2013, pp 89–93. https://doi.org/10.1109/ICCSE.2013.6553890
  • Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17:977–987

Acknowledgements

This work was partially funded by a Discovery grant 2019-05444 from the Natural Sciences and Engineering Research Council of Canada, and by a Fundamental Research Projects grant PRF-2017-20 from the Institute for Data Valorization (IVADO) of Quebec, Canada.

Author information

Corresponding author

Correspondence to Alejandro Murua-Sazo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: Proof that an increasing sequence of thresholds gives rise to hierarchical cluster partitions

Consider a given sequence of increasing thresholds \(\theta _1< \theta _2< \cdots < \theta _m\). As mentioned in Sect. 4, the associated consolidated matrices \(C^{(1)}, \ldots , C^{(m)}\) are nested in the sense that when \(s < t\), \(C^{(s)}_{ij} \ge C^{(t)}_{ij}\) for all pairs \((i,j)\), \(s, t=1, \ldots ,m\). Let \(\mathcal {P}_s\) and \(\mathcal {P}_t\) be the connected component partitions associated with the consolidated matrices \(C^{(s)}\) and \(C^{(t)}\), respectively. Let points \(x_i\) and \(x_j\) be in the same connected component of \(\mathcal {P}_t\). That is, there is a path \(x_i \mapsto x_{l_1} \mapsto \cdots \mapsto x_{l_{g-1}} \mapsto x_j\) of points satisfying \(C^{(t)}_{l_h, l_{h+1}} = 1\), \(h=0, \ldots , g-1\), with \(l_0 = i\) and \(l_{g} = j\). Since \(C^{(s)}_{ij} \ge C^{(t)}_{ij}\) for all pairs \((i,j)\), points \(x_i\) and \(x_j\) are necessarily also in the same connected component of \(\mathcal {P}_s\).

The converse is not true. Suppose that points \(x_i\) and \(x_j\) are in the same connected component, say \(\mathcal {C}_\ell \), of \(\mathcal {P}_s\). That is, there is a path \(x_i \mapsto x_{l_1} \mapsto \cdots \mapsto x_{l_{g-1}} \mapsto x_j\) of points satisfying \(C^{(s)}_{l_h, l_{h+1}} = 1\), \(h=0, \ldots , g-1\), with \(l_0 = i\) and \(l_{g} = j\). Suppose as well that for some \(h\in \{0, \ldots , g-1\}\), \(C^{(t)}_{l_h, l_{h+1}} = 0\). Then the proportion of times the pair \((x_{l_h}, x_{l_{h+1}})\) appears together is at least \(\theta _s\) but less than \(\theta _t\), so this path is broken in \(C^{(t)}\). If every such path between \(x_i\) and \(x_j\) is broken in this way, the two points fall in different connected components of \(\mathcal {P}_t\). This means that the component \(\mathcal {C}_\ell \in \mathcal {P}_s\) may split into several connected components, each isolated from all other components of \(\mathcal {P}_s\). In this sense the sequence of thresholds \(\theta _1< \theta _2< \cdots < \theta _m\) leads to a sequence of partitions satisfying \(\mathcal {P}_1 \supseteq \mathcal {P}_2 \supseteq \cdots \supseteq \mathcal {P}_m\), where \(\mathcal {P}_s \supseteq \mathcal {P}_t\) means that each connected component in \(\mathcal {P}_t\) arises from a connected component of \(\mathcal {P}_s\), and the number of connected components of \(\mathcal {P}_t\) is larger than or equal to the number of connected components of \(\mathcal {P}_s\).
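The nesting property can be illustrated numerically. The sketch below is our own minimal illustration, not code from the paper: the co-occurrence matrix `P` and the threshold grid are made-up toy values. It thresholds a symmetric co-occurrence matrix at increasing values, extracts the connected-component partitions, and checks that any pair joined at a higher threshold is also joined at every lower one.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def partition_at_threshold(P, theta):
    """Build the consolidated matrix C (C_ij = 1 iff the co-occurrence
    proportion P_ij is at least theta) and return the labels of its
    connected components."""
    C = (P >= theta).astype(int)
    np.fill_diagonal(C, 1)
    _, labels = connected_components(csr_matrix(C), directed=False)
    return labels

rng = np.random.default_rng(0)
A = rng.random((10, 10))
P = (A + A.T) / 2  # toy symmetric matrix of co-occurrence proportions

parts = [partition_at_threshold(P, th) for th in (0.3, 0.5, 0.7)]
# nesting: points together at a higher threshold stay together at a lower one
for coarse, fine in zip(parts, parts[1:]):
    n = len(coarse)
    assert all(coarse[i] == coarse[j]
               for i in range(n) for j in range(n) if fine[i] == fine[j])
```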

Appendix: Scattering formulas in the transformed space: using the kernel trick

Consider the RBF kernel \(\kappa (x_i,x_j)\) defined by (3). The mean scattering induced by the kernel on the data (Vert et al. 2004; Fan et al. 2010) is defined as

$$\begin{aligned} V_{\mathcal {S}} = \frac{1}{n}\sum _{i=1}^{n}\left\{ \kappa (x_i,x_i)-\sum _{j=1}^{n}\frac{2\, \kappa (x_i,x_j)}{n}+\sum _{j, \ell =1}^{n}\frac{\kappa (x_j,x_\ell )}{n^2} \right\} ^{\frac{1}{2}}. \end{aligned}$$

This formula is derived using the kernel trick, which links kernel evaluations to inner products between transformed points in a high-dimensional space. It can be shown that this quantity corresponds to the average distance between the transformed points and their mean. Let \(\mathcal {V}= \{\mathcal {V}_1, \ldots , \mathcal {V}_K\}\) be a clustering configuration with K clusters. Also, let \(n_k\) be the size of cluster \(\mathcal {V}_k\), and let \(J_k = \{ i : x_i \in \mathcal {V}_k\}\), \(k=1,\ldots ,K.\) In analogy with the mean scattering, Fan et al. (2010) define the kernel-induced scattering within each cluster \(\mathcal {V}_k\),

$$\begin{aligned} W_{\mathcal {V}_k} = \frac{1}{n_k}\sum _{i\in J_k} \left\{ \kappa (x_i,x_i)-\sum _{j\in J_k}\frac{2\, \kappa (x_i,x_j)}{n_k}+\sum _{j, \ell \in J_k} \frac{\kappa (x_j,x_\ell )}{n_k^2} \right\} ^{\frac{1}{2}} \end{aligned}$$

which can be seen as the average distance between the transformed points of cluster \(\mathcal {V}_k\) and the mean of that cluster in the transformed space. Finally, to measure the total scattering between clusters, one considers the distance between cluster means in the transformed space,

$$\begin{aligned} B^2(\mathcal {V}_k,\mathcal {V}_m) = \bigl [ B(\mathcal {V}_k,\mathcal {V}_m) \bigr ]^2&= \sum _{i, j \in J_k} \frac{\kappa (x_i,x_j)}{n_k^2} - \sum _{i\in J_k}\sum _{j\in J_m}\frac{2\, \kappa (x_i,x_j)}{n_k n_m} +\sum _{i, j\in J_m} \frac{\kappa (x_i,x_j)}{n_m^2}. \end{aligned}$$
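To make the three formulas concrete, here is a small numpy sketch. It is our own illustration, not code from the paper: it assumes a precomputed \(n\times n\) kernel matrix `K` with entries \(\kappa (x_i,x_j)\), and clusters given as (hypothetical) index arrays.

```python
import numpy as np

def mean_scattering(K):
    """V_S: average distance between the transformed points and their mean."""
    sq = np.diag(K) - 2 * K.mean(axis=1) + K.mean()
    # np.maximum guards against tiny negative values from floating-point error
    return np.sqrt(np.maximum(sq, 0)).mean()

def within_scattering(K, idx):
    """W_{V_k}: the same quantity restricted to the cluster indexed by idx."""
    Kk = K[np.ix_(idx, idx)]
    sq = np.diag(Kk) - 2 * Kk.mean(axis=1) + Kk.mean()
    return np.sqrt(np.maximum(sq, 0)).mean()

def between_scattering_sq(K, idx_k, idx_m):
    """B^2(V_k, V_m): squared distance between the two cluster means
    in the transformed space."""
    Kkk = K[np.ix_(idx_k, idx_k)].mean()
    Kmm = K[np.ix_(idx_m, idx_m)].mean()
    Kkm = K[np.ix_(idx_k, idx_m)].mean()
    return Kkk - 2 * Kkm + Kmm
```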

Appendix: Scheme to sample \(\varvec{Y}\sim DPP_{\mathcal {S}}(L)\)

Below, \(e_i\) represents the ith standard basis vector, an n-dimensional vector that is all zeros except for a one in the ith position. The span of a given finite collection of vectors V, denoted by \({{\,\mathrm{span}\,}}(V)\), is the set of all linear combinations of the vectors in V, i.e., \({{\,\mathrm{span}\,}}(V)=\left\{ \sum _{v\in V}a_v v \mid a_v\in \mathbb {R}\right\} \).

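The sampling scheme itself appeared as an algorithm figure in the published version and is not recoverable here. As a stand-in, below is a minimal numpy sketch of the standard spectral sampling scheme for an L-ensemble DPP (Ben Hough et al. 2006; Kulesza and Taskar 2012), which matches the setup above (standard basis vectors \(e_i\) and the span of a shrinking collection of eigenvectors). It assumes \(L\) is a symmetric positive semi-definite matrix; it is a sketch, not the authors' implementation.

```python
import numpy as np

def sample_dpp(L, rng=None):
    """Spectral sampler for Y ~ DPP(L), L symmetric PSD."""
    rng = np.random.default_rng(rng)
    evals, evecs = np.linalg.eigh(L)
    # Phase 1: keep eigenvector n with probability lambda_n / (1 + lambda_n)
    keep = rng.random(len(evals)) < evals / (1.0 + evals)
    V = evecs[:, keep]            # columns span the sampled subspace
    Y = []
    while V.shape[1] > 0:
        # Phase 2: pick item i with probability (1/|V|) * sum_v (v^T e_i)^2
        p = (V ** 2).sum(axis=1)
        p /= p.sum()
        i = rng.choice(len(p), p=p)
        Y.append(i)
        # project span(V) onto the subspace orthogonal to e_i:
        # zero out the i-th coordinate using a column with V[i, j] != 0,
        # drop that column, then re-orthonormalize
        j = np.argmax(np.abs(V[i, :]))
        V = V - np.outer(V[:, j] / V[i, j], V[i, :])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(Y)
```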

Appendix: Sensitivity analysis on kernel choice

As mentioned in Sect. 5.3, we considered three kernels: the linear kernel, \(\kappa (x_i,x_j)=x_i^Tx_j\), and two variants of the q-stable kernel (Bilodeau and Nangue 2017), also known as the q-Gaussian kernel (Verleysen and François 2005), which is given by \(\kappa (x_i, x_j)=\exp \left( -\left\Vert x_i-x_j\right\Vert ^{q}/\sigma ^q\right) \), \(q\in (0,2]\), where \(\sigma >0\) is the kernel bandwidth. The variants considered are the Laplace kernel (\(q=1\)), and the Gaussian kernel (\(q=2\)).
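For reference, these kernels are straightforward to evaluate. The sketch below is our own illustration (the function names are ours, not the paper's):

```python
import numpy as np

def linear_kernel(X):
    """Linear kernel matrix: kappa(x_i, x_j) = x_i^T x_j (rows of X are points)."""
    return X @ X.T

def q_stable_kernel(X, sigma, q):
    """q-stable kernel matrix: kappa(x_i, x_j) = exp(-||x_i - x_j||^q / sigma^q),
    q in (0, 2]; q = 1 gives the Laplace kernel, q = 2 the Gaussian kernel."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.exp(-(d ** q) / (sigma ** q))
```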

Fig. 5  Mean ARIs over five runs associated with DPP, CC-PAM and CC-k-means\(++\) for different scenarios with datasets with diverse cluster shapes. The horizontal axis is divided by the different powers \(\lambda \) involved in the inverse Box–Cox transformations. The results are aggregated over the three data sizes (\(n\in \{150, 500, 1500\}\)). The segments on top of the bars represent \(\pm 1\) standard error

Two bandwidth estimates were considered for each of the two q-stable kernels studied. For the Laplace kernel, the bandwidth estimates were the mean \(\hat{\sigma }_2\) and the median \(\tilde{\sigma }_2\) of all pairwise Euclidean distances. For the Gaussian kernel, the estimates were the mean \(\hat{\sigma }^2\) and the median \(\tilde{\sigma }^2\) of all pairwise squared Euclidean distances. Note that the Gaussian kernel with bandwidth estimate \(\hat{\sigma }^2\) corresponds to our proposed DPP kernel; a sketch of these estimates follows.
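The four estimates reduce to means and medians of pairwise (squared) Euclidean distances. A small sketch under the same assumptions as above (the function name is ours):

```python
import numpy as np
from scipy.spatial.distance import pdist

def bandwidth_estimates(X):
    """Mean/median of pairwise Euclidean distances (Laplace kernel) and of
    pairwise squared distances (Gaussian kernel)."""
    d = pdist(X)  # condensed vector of all pairwise Euclidean distances
    return {
        "laplace_mean": d.mean(),
        "laplace_median": np.median(d),
        "gaussian_mean": (d ** 2).mean(),   # yields the proposed DPP kernel
        "gaussian_median": np.median(d ** 2),
    }
```

With the `q_stable_kernel` sketch above, the Gaussian kernel \(\exp (-\Vert \cdot \Vert ^2/\hat{\sigma }^2)\) would be obtained by passing `sigma=np.sqrt(est["gaussian_mean"])` and `q=2`.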

We further investigated the behavior of the kernels on data whose clusters are non-Gaussian. For this simulation, we followed the setup of Sect. 5.2. Recall that the data were generated from 24 experimental conditions of a \(3^3\)-factorial design. For this study, we generated five replicas from each experimental condition. The non-ellipsoidal (non-Gaussian) clusters were obtained by applying the inverse Box–Cox transformation to the datasets. As before, we selected \(\lambda =1/3\), \(\lambda =1/2\), and \(\lambda = 2\) for the power parameter \(\lambda \) of the Box–Cox transformations. We ran the consensus DPP algorithm described in Sect. 4 on each dataset with the five different kernels. The values of the consensus DPP parameters were set to \(a=0.5\), \(R=200\), and \(\tau =0.6\). To select the optimal clustering configuration, we used the kernel-based validation index KVI\(_\mathcal {V}\) given by (6). To measure its quality, we used the adjusted Rand index (see Sect. 4.2). The results are displayed in Fig. 5 in the form of barplots. Each bar represents a different kernel or consensus method. The segments on top of the bars represent ± one standard error. We also compare the results from the different kernels with those yielded by CC-PAM and CC-k-means\(++\).
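For concreteness, a minimal sketch of the inverse Box–Cox map (Box and Cox 1964) used to warp the simulated clusters. The exact centering or scaling applied to each dataset before transforming is not specified in this excerpt, so the helper below is a simplified stand-in; the shifted input `X_shifted` in the usage comment is hypothetical.

```python
import numpy as np

def inverse_box_cox(y, lam):
    """Inverse of the Box-Cox transform y = (x^lam - 1) / lam for lam != 0.
    Requires lam * y + 1 > 0 elementwise so the power is well defined."""
    return (lam * y + 1.0) ** (1.0 / lam)

# e.g., warp Gaussian-like simulated clusters into non-ellipsoidal ones:
# for lam in (1/3, 1/2, 2): X_warped = inverse_box_cox(X_shifted, lam)
```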

For both q-stable kernels, estimating the bandwidth parameter with the median or the mean of all pairwise distances does not produce any major difference in the ARI results.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Vicente, S., Murua-Sazo, A. Determinantal consensus clustering. Adv Data Anal Classif 17, 829–858 (2023). https://doi.org/10.1007/s11634-022-00514-6

