Abstract
Model-based clustering is a popular application of the rapidly developing area of finite mixture modeling. While there is ample work on clustering multivariate data, a growing number of advances aim to extend existing theory to the matrix-variate framework. Matrix-variate Gaussian mixtures are the most popular choice in this setting despite their potential misfit for skewed and heavy-tailed data. To overcome this lack of flexibility, a new contaminated transformation matrix mixture model is proposed. We illustrate its utility in a series of experiments on simulated data and apply it to a real-life data set containing COVID-related information. The performance of the developed model is promising in all considered settings.
About this article
Cite this article
Zhu, X., Melnykov, Y. & Kolomoytseva, A.S. Contamination transformation matrix mixture modeling for skewed data groups with heavy tails and scatter. Adv Data Anal Classif 18, 85–101 (2024). https://doi.org/10.1007/s11634-023-00550-w
DOI: https://doi.org/10.1007/s11634-023-00550-w