On the Use of the Matrix-Variate Tail-Inflated Normal Distribution for Parsimonious Mixture Modeling

  • Conference paper
  • In: Studies in Theoretical and Applied Statistics (SIS 2021)
  • Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 406)

Abstract

Recent advances in the matrix-variate model-based clustering literature reflect the growing interest in modeling this kind of data. In this framework, finite mixture models constitute a powerful clustering technique, although they tend to suffer from overparameterization because of the large number of parameters to be estimated. To cope with this issue, parsimonious matrix-variate normal mixtures have recently been proposed in the literature. However, for many real phenomena, the tails of the mixture components of such models are lighter than required, with a direct effect on the corresponding fitting results. Thus, in this paper we introduce a family of 196 parsimonious mixture models based on the matrix-variate tail-inflated normal distribution, an elliptical heavy-tailed generalization of the matrix-variate normal distribution. Parsimony is attained by applying the well-known eigen-decomposition of the component scale matrices, as well as by allowing the tailedness parameters of the mixture components to be tied across groups. An AECM algorithm for parameter estimation is presented. The proposed models are then fitted to simulated and real data. Comparisons with parsimonious matrix-variate normal mixtures are also provided.


References

  1. Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41(3–4), 561–575 (2003)

  2. Browne, R.P., McNicholas, P.D.: Estimating common principal components in high dimensions. Adv. Data Anal. Classific. 8(2), 217–226 (2014)

  3. Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. Pattern Recognit. 28(5), 781–793 (1995)

  4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–22 (1977)

  5. Doğru, F.Z., Bulut, Y.M., Arslan, O.: Finite mixtures of matrix variate t distributions. Gazi Univ. J. Sci. 29(2), 335–341 (2016)

  6. Farcomeni, A., Punzo, A.: Robust model-based clustering with mild and gross outliers. Test 29(4), 989–1007 (2020)

  7. Gallaugher, M.P.B., McNicholas, P.D.: Finite mixtures of skewed matrix variate distributions. Pattern Recognit. 80, 83–93 (2018)

  8. Gupta, A.K., Varga, T., Bodnar, T.: Elliptically Contoured Models in Statistics and Portfolio Theory. Springer, New York (2013)

  9. Leisch, F.: FlexMix: a general framework for finite mixture models and latent class regression in R. J. Stat. Softw. 11(8), 1–18 (2004)

  10. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. Wiley (2007)

  11. McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)

  12. Melnykov, V., Melnykov, I.: Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Comput. Stat. Data Anal. 56(6), 1381–1395 (2012)

  13. Melnykov, V., Zhu, X.: On model-based clustering of skewed matrix data. J. Multivar. Anal. 167, 181–194 (2018)

  14. Melnykov, V., Zhu, X.: Studying crime trends in the USA over the years 2000–2012. Adv. Data Anal. Classific. 13(1), 325–341 (2019)

  15. Meng, X.L., Van Dyk, D.: The EM algorithm - an old folk-song sung to a fast new tune. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 59(3), 511–567 (1997)

  16. Michael, S., Melnykov, V.: An effective strategy for initializing the EM algorithm in finite mixture models. Adv. Data Anal. Classific. 10(4), 563–583 (2016)

  17. Sarkar, S., Zhu, X., Melnykov, V., Ingrassia, S.: On parsimonious models for modeling matrix data. Comput. Stat. Data Anal. 142, 106822 (2020)

  18. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)

  19. Tomarchio, S.D., Punzo, A., Bagnato, L.: Two new matrix-variate distributions with application in model-based clustering. Comput. Stat. Data Anal. 152, 107050 (2020)

  20. Tomarchio, S.D., Gallaugher, M.P.B., Punzo, A., McNicholas, P.D.: Mixtures of matrix-variate contaminated normal distributions. J. Comput. Graph. Stat. 31(2), 413–421 (2022)

  21. Tomarchio, S.D., McNicholas, P.D., Punzo, A.: Matrix normal cluster-weighted models. J. Classific. 38(3), 556–575 (2021)

  22. Viroli, C.: Finite mixtures of matrix normal distributions for classifying three-way data. Stat. Comput. 21(4), 511–522 (2011)

  23. Viroli, C.: Model based clustering for three-way data structures. Bayesian Anal. 6(4), 573–602 (2011)

  24. Zhu, X., Melnykov, V.: MatTransMix: an R package for clustering matrices. R package version 0.1.15 (2021)

Author information

Corresponding author: Salvatore D. Tomarchio.

Appendices

Appendix A

Let \(\ddot{\textbf{V}}=\displaystyle \sum _{g=1}^G \ddot{\textbf{V}}_g\), where \(\ddot{\textbf{V}}_g = \displaystyle \sum _{i=1}^N \ddot{z}_{ig}\ddot{w}_{ig}\left( \textbf{X}_{i}-\ddot{\textbf{M}}_g\right) \dot{\boldsymbol{\Psi }}_g^{-1}\left( \textbf{X}_{i}-\ddot{\textbf{M}}_g\right) '\). Then, we have the following updates (an illustrative sketch of some of these computations is given after the list):

  • Model EII

    $$\begin{aligned} \ddot{\lambda } = \frac{\text{tr}\left\{ \ddot{\textbf{V}}\right\} }{prN}; \end{aligned}$$
  • Model VII

    $$\begin{aligned} \ddot{\lambda }_g = \frac{\text{tr}\left\{ \ddot{\textbf{V}}_g\right\} }{pr \displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EEI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }} =\frac{\text {diag}\left( \ddot{\textbf{V}}\right) }{ \left| \text {diag}\left( \ddot{\textbf{V}}\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda } = \frac{\left| \text {diag}\left( \ddot{\textbf{V}}\right) \right| ^\frac{1}{p}}{rN}; \end{aligned}$$
  • Model VEI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }} = \frac{\text {diag}\left( \displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1}\ddot{\textbf{V}}_g\right) }{\left| \text {diag}\left( \displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1}\ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\text{tr}\left\{ \ddot{\boldsymbol{\Delta }}^{-1} \ddot{\textbf{V}}_g \right\} }{pr\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EVI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\textbf{V}}_g\right) }{\left| \text {diag}\left( \ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda } = \frac{\displaystyle \sum \limits _{g=1}^G\left| \text {diag}\left( \ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}}{rN}; \end{aligned}$$
  • Model VVI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\textbf{V}}_g\right) }{\left| \text {diag}\left( \ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\left| \text {diag}\left( \ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}}{r\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EEE

    $$\begin{aligned} \ddot{\boldsymbol{\Sigma }}= \frac{\ddot{\textbf{V}}}{rN}; \end{aligned}$$
  • Model VEE

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}\ddot{\boldsymbol{\Delta }}\ddot{\boldsymbol{\Gamma }}' = \frac{\displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1}\ddot{\textbf{V}}_g}{\left| \displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1}\ddot{\textbf{V}}_g\right| ^{\frac{1}{p}}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\text{tr}\left\{ (\ddot{\boldsymbol{\Gamma }}\ddot{\boldsymbol{\Delta }}\ddot{\boldsymbol{\Gamma }}')^{-1} \ddot{\textbf{V}}_g \right\} }{pr\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EVE

    For this model, there is no analytical solution for \(\boldsymbol{\Gamma }\). Thus, an iterative majorization-minimization (MM) algorithm [2] is implemented. Specifically, the objective function is majorized as

    $$\begin{aligned} f\left( \boldsymbol{\Gamma }\right) = \sum \limits _{g=1}^G \text{tr}\left\{ \textbf{V}_g\boldsymbol{\Gamma }\dot{\boldsymbol{\Delta }}_g^{-1}\boldsymbol{\Gamma }'\right\} \le S + \text{tr}\left\{ \dot{\textbf{F}}\boldsymbol{\Gamma }\right\} , \end{aligned}$$

    where \(S\) is a constant and \(\dot{\textbf{F}} = \displaystyle \sum _{g=1}^G\left( \dot{\boldsymbol{\Delta }}_g^{-1} \dot{\boldsymbol{\Gamma }}' \textbf{V}_g - e_g \dot{\boldsymbol{\Delta }}_g^{-1} \dot{\boldsymbol{\Gamma }}'\right) \), with \(e_g\) being the largest eigenvalue of \(\textbf{V}_g\). The update of \(\boldsymbol{\Gamma }\) is given by \(\ddot{\boldsymbol{\Gamma }} = \dot{\textbf{G}} \dot{\textbf{H}}'\), where \(\dot{\textbf{G}}\) and \(\dot{\textbf{H}}\) are obtained from the singular value decomposition of \(\dot{\textbf{F}}\). This process is repeated until a specified convergence criterion is met, and the estimate \(\ddot{\boldsymbol{\Gamma }}\) is taken from the last iteration. Then, we obtain

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) }{\left| \text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda } = \frac{\displaystyle \sum \limits _{g=1}^{G}\text{tr}\left\{ \ddot{\boldsymbol{\Gamma }} \ddot{\boldsymbol{\Delta }}_{g}^{-1} \ddot{\boldsymbol{\Gamma }}'\ddot{\textbf{V}}_{g}\right\} }{prN}; \end{aligned}$$
  • Model VVE

    Similarly to the EVE case, there is no analytical solution for \(\boldsymbol{\Gamma }\), and the MM algorithm described above is implemented. Then, we have

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) }{\left| \text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\left| \text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) \right| ^{\frac{1}{p}}}{r\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EEV

    Consider the eigen-decomposition \(\textbf{V}_g=\textbf{L}_g \textbf{D}_g \textbf{L}_g'\), where the diagonal matrix \(\textbf{D}_g\) contains the eigenvalues in descending order and the orthogonal matrix \(\textbf{L}_g\) contains the corresponding eigenvectors. Then, we obtain

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}_g=\ddot{\textbf{L}}_g , \quad \ddot{\boldsymbol{\Delta }} = \frac{\displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g}{\left| \displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g\right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda } = \frac{\left| \displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g\right| ^\frac{1}{p}}{rN}; \end{aligned}$$
  • Model VEV

    By applying the same eigen-decomposition used for the EEV model, we have

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}_g=\ddot{\textbf{L}}_g , \quad \ddot{\boldsymbol{\Delta }} = \frac{\displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1} \ddot{\textbf{D}}_g}{\left| \displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1} \ddot{\textbf{D}}_g\right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\text{tr}\left\{ \ddot{\textbf{D}}_g \ddot{\boldsymbol{\Delta }}^{-1} \right\} }{pr\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EVV

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}_g\ddot{\boldsymbol{\Delta }}_g\ddot{\boldsymbol{\Gamma }}_g' = \frac{\ddot{\textbf{V}}_g}{\left| \ddot{\textbf{V}}_g\right| ^{\frac{1}{p}}} \quad \text {and} \quad \ddot{\lambda } = \frac{\displaystyle \sum \limits _{g=1}^G \left| \ddot{\textbf{V}}_g\right| ^{\frac{1}{p}}}{rN}; \end{aligned}$$
  • Model VVV

    $$\begin{aligned} \ddot{\boldsymbol{\Sigma }}_g = \frac{\ddot{\textbf{V}}_g}{r\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}. \end{aligned}$$
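
To make the mapping from these formulas to code concrete, the following is a minimal NumPy sketch of the scatter matrices \(\ddot{\textbf{V}}_g\) and a few representative updates (EII, VVV, and the MM step for \(\boldsymbol{\Gamma }\) used by the EVE and VVE models). All array names, shapes, and convergence settings are illustrative assumptions; this is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def scatter_V(X, z, w, M, Psi_inv):
    """Compute V_g = sum_i z_ig w_ig (X_i - M_g) Psi_g^{-1} (X_i - M_g)'.

    Assumed shapes: X is (N, p, r); z and w are (N, G) E-step quantities;
    M is (G, p, r); Psi_inv is (G, r, r) with the current inverses of Psi_g.
    """
    N, p, r = X.shape
    G = M.shape[0]
    V = np.zeros((G, p, p))
    for g in range(G):
        R = X - M[g]  # residual matrices, shape (N, p, r)
        for i in range(N):
            V[g] += z[i, g] * w[i, g] * R[i] @ Psi_inv[g] @ R[i].T
    return V

def update_EII(V, p, r, N):
    # EII: common spherical volume, lambda = tr{V} / (p r N)
    return np.trace(V.sum(axis=0)) / (p * r * N)

def update_VVV(V, z, r):
    # VVV: unconstrained Sigma_g = V_g / (r n_g), n_g the effective group size
    n_g = z.sum(axis=0)
    return V / (r * n_g)[:, None, None]

def mm_gamma(V, Delta_diag, Gamma, max_iter=100, tol=1e-8):
    """MM step for the common orthogonal Gamma (EVE/VVE models).

    Delta_diag is (G, p), holding the diagonals of the Delta_g matrices.
    Follows the update Gamma = G H' from the SVD of F stated above.
    """
    n_groups, p, _ = V.shape
    obj_old = np.inf
    for _ in range(max_iter):
        F = np.zeros((p, p))
        for g in range(n_groups):
            e_g = np.linalg.eigvalsh(V[g])[-1]   # largest eigenvalue of V_g
            Dinv_Gt = (Gamma / Delta_diag[g]).T  # Delta_g^{-1} Gamma'
            F += Dinv_Gt @ V[g] - e_g * Dinv_Gt
        U, _, Vt = np.linalg.svd(F)
        Gamma = U @ Vt
        # objective f(Gamma) = sum_g tr{V_g Gamma Delta_g^{-1} Gamma'}
        obj = sum(np.trace(V[g] @ (Gamma / Delta_diag[g]) @ Gamma.T)
                  for g in range(n_groups))
        if abs(obj_old - obj) < tol:
            break
        obj_old = obj
    return Gamma
```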

Appendix B

Let \(\ddot{\textbf{W}}=\displaystyle \sum _{g=1}^G \ddot{\textbf{W}}_g\), where \(\ddot{\textbf{W}}_g = \displaystyle \sum _{i=1}^N \ddot{z}_{ig}\ddot{w}_{ig}\left( \textbf{X}_{i}-\ddot{\textbf{M}}_{g}\right) '\ddot{\boldsymbol{\Sigma }}_g^{-1}\left( \textbf{X}_{i}-\ddot{\textbf{M}}_{g}\right) \). Except for the II model, for which no parameters need to be estimated, we have the following updates (see the sketch after this list):

  • Model EI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }} = \frac{\text {diag}\left( \ddot{\textbf{W}}\right) }{\left| \text {diag}\left( \ddot{\textbf{W}}\right) \right| ^\frac{1}{r}}; \end{aligned}$$
  • Model VI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\textbf{W}}_g\right) }{\left| \text {diag}\left( \ddot{\textbf{W}}_g\right) \right| ^\frac{1}{r}}; \end{aligned}$$
  • Model EE

    $$\begin{aligned} \ddot{\boldsymbol{\Psi }} = \frac{\ddot{\textbf{W}}}{\left| \ddot{\textbf{W}} \right| ^\frac{1}{r}}; \end{aligned}$$
  • Model VE

    As for the EVE and VVE models, there is no analytical solution for \(\boldsymbol{\Gamma }\), and an MM algorithm of the type described for the EVE model is implemented, after replacing \(\textbf{V}\) with \(\textbf{W}\). Then, we have

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g= \frac{\text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{W}}_g \ddot{\boldsymbol{\Gamma }}\right) }{\left| \text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{W}}_g \ddot{\boldsymbol{\Gamma }}\right) \right| ^\frac{1}{r}}; \end{aligned}$$
  • Model EV

    By using the same approach as for the EEV and VEV models, after replacing \(\ddot{\textbf{V}}\) with \(\ddot{\textbf{W}}\), we have

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}_g=\ddot{\textbf{L}}_g \quad \text {and} \quad \ddot{\boldsymbol{\Delta }} = \frac{\displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g}{\left| \displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g\right| ^\frac{1}{r}}; \end{aligned}$$
  • Model VV

    $$\begin{aligned} \ddot{\boldsymbol{\Psi }}_g = \frac{\ddot{\textbf{W}}_g}{\left| \ddot{\textbf{W}}_g\right| ^\frac{1}{r}}. \end{aligned}$$
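
As for Appendix A, a short NumPy sketch may help map the column-scale updates to code; the division by \(\left| \cdot \right| ^{1/r}\) in the EE and VV updates below normalizes each estimate to unit determinant. Names and shapes are again illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def scatter_W(X, z, w, M, Sigma_inv):
    """Compute W_g = sum_i z_ig w_ig (X_i - M_g)' Sigma_g^{-1} (X_i - M_g).

    Mirrors scatter_V above with rows and columns swapped; Sigma_inv is
    (G, p, p), holding the current inverses of the row scale matrices.
    """
    N, p, r = X.shape
    G = M.shape[0]
    W = np.zeros((G, r, r))
    for g in range(G):
        R = X - M[g]  # residual matrices, shape (N, p, r)
        for i in range(N):
            W[g] += z[i, g] * w[i, g] * R[i].T @ Sigma_inv[g] @ R[i]
    return W

def update_EE(W):
    # EE: common Psi = W / |W|^{1/r}, normalized to unit determinant
    W_tot = W.sum(axis=0)
    r = W_tot.shape[0]
    return W_tot / np.linalg.det(W_tot) ** (1.0 / r)

def update_VV(W):
    # VV: group-wise Psi_g = W_g / |W_g|^{1/r}, each with unit determinant
    r = W.shape[1]
    return W / (np.linalg.det(W) ** (1.0 / r))[:, None, None]
```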

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Tomarchio, S.D., Punzo, A., Bagnato, L. (2022). On the Use of the Matrix-Variate Tail-Inflated Normal Distribution for Parsimonious Mixture Modeling. In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_24
