Contamination transformation matrix mixture modeling for skewed data groups with heavy tails and scatter

  • Regular Article
Advances in Data Analysis and Classification

Abstract

Model-based clustering is a popular application of the rapidly developing area of finite mixture modeling. While there is ample work on clustering multivariate data, a growing number of contributions extend the existing theory to the matrix-variate framework. Matrix-variate Gaussian mixtures are the most popular choice in this setting despite their potential misfit for skewed and heavy-tailed data. To overcome this lack of flexibility, a new contaminated transformation matrix mixture model is proposed. We illustrate its utility in a series of experiments on simulated data and apply it to a real-life data set containing COVID-related information. The performance of the developed model is promising in all considered settings.

Author information

Corresponding author

Correspondence to Yana Melnykov.

About this article

Cite this article

Zhu, X., Melnykov, Y. & Kolomoytseva, A.S. Contamination transformation matrix mixture modeling for skewed data groups with heavy tails and scatter. Adv Data Anal Classif 18, 85–101 (2024). https://doi.org/10.1007/s11634-023-00550-w

  • DOI: https://doi.org/10.1007/s11634-023-00550-w
