On the Use of the Matrix-Variate Tail-Inflated Normal Distribution for Parsimonious Mixture Modeling

  • Conference paper
  • In: Studies in Theoretical and Applied Statistics (SIS 2021)
  • Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 406)

Abstract

Recent advances in the matrix-variate model-based clustering literature reflect the growing interest in modeling this kind of data. In this framework, finite mixture models constitute a powerful clustering technique, although they tend to suffer from overparameterization because of the large number of parameters to be estimated. To cope with this issue, parsimonious matrix-variate normal mixtures have recently been proposed in the literature. However, for many real phenomena, the tails of the mixture components of such models are lighter than required, with a direct effect on the corresponding fitting results. Thus, in this paper we introduce a family of 196 parsimonious mixture models based on the matrix-variate tail-inflated normal distribution, an elliptical heavy-tailed generalization of the matrix-variate normal distribution. Parsimony is attained by applying the well-known eigen-decomposition of the component scale matrices, as well as by allowing the tailedness parameters of the mixture components to be tied across groups. An AECM algorithm for parameter estimation is presented. The proposed models are then fitted to simulated and real data. Comparisons with parsimonious matrix-variate normal mixtures are also provided.


References

  1. Biernacki, C., Celeux, G., Govaert, G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput. Stat. Data Anal. 41(3–4), 561–575 (2003)

  2. Browne, R.P., McNicholas, P.D.: Estimating common principal components in high dimensions. Adv. Data Anal. Classific. 8(2), 217–226 (2014)

  3. Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. Pattern Recognit. 28(5), 781–793 (1995)

  4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39(1), 1–22 (1977)

  5. Doğru, F.Z., Bulut, Y.M., Arslan, O.: Finite mixtures of matrix variate t distributions. Gazi Univ. J. Sci. 29(2), 335–341 (2016)

  6. Farcomeni, A., Punzo, A.: Robust model-based clustering with mild and gross outliers. Test 29(4), 989–1007 (2020)

  7. Gallaugher, M.P.B., McNicholas, P.D.: Finite mixtures of skewed matrix variate distributions. Pattern Recognit. 80, 83–93 (2018)

  8. Gupta, A.K., Varga, T., Bodnar, T.: Elliptically Contoured Models in Statistics and Portfolio Theory. Springer, New York (2013)

  9. Leisch, F.: FlexMix: a general framework for finite mixture models and latent class regression in R. J. Stat. Softw. 11(8), 1–18 (2004)

  10. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. Wiley (2007)

  11. McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)

  12. Melnykov, V., Melnykov, I.: Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Comput. Stat. Data Anal. 56(6), 1381–1395 (2012)

  13. Melnykov, V., Zhu, X.: On model-based clustering of skewed matrix data. J. Multivar. Anal. 167, 181–194 (2018)

  14. Melnykov, V., Zhu, X.: Studying crime trends in the USA over the years 2000–2012. Adv. Data Anal. Classific. 13(1), 325–341 (2019)

  15. Meng, X.L., Van Dyk, D.: The EM algorithm - an old folk-song sung to a fast new tune. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 59(3), 511–567 (1997)

  16. Michael, S., Melnykov, V.: An effective strategy for initializing the EM algorithm in finite mixture models. Adv. Data Anal. Classific. 10(4), 563–583 (2016)

  17. Sarkar, S., Zhu, X., Melnykov, V., Ingrassia, S.: On parsimonious models for modeling matrix data. Comput. Stat. Data Anal. 142, 106822 (2020)

  18. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)

  19. Tomarchio, S.D., Punzo, A., Bagnato, L.: Two new matrix-variate distributions with application in model-based clustering. Comput. Stat. Data Anal. 152, 107050 (2020)

  20. Tomarchio, S.D., Gallaugher, M.P.B., Punzo, A., McNicholas, P.D.: Mixtures of matrix-variate contaminated normal distributions. J. Comput. Graph. Stat. 31(2), 413–421 (2022)

  21. Tomarchio, S.D., McNicholas, P.D., Punzo, A.: Matrix normal cluster-weighted models. J. Classific. 38(3), 556–575 (2021)

  22. Viroli, C.: Finite mixtures of matrix normal distributions for classifying three-way data. Stat. Comput. 21(4), 511–522 (2011)

  23. Viroli, C.: Model based clustering for three-way data structures. Bayesian Anal. 6(4), 573–602 (2011)

  24. Zhu, X., Melnykov, V.: MatTransMix: an R package for clustering matrices. R package version 0.1.15 (2021)

Author information

Corresponding author: Salvatore D. Tomarchio.

Appendices

Appendix A

Let \(\ddot{\textbf{V}}=\displaystyle \sum _{g=1}^G \ddot{\textbf{V}}_g\), where \(\ddot{\textbf{V}}_g = \displaystyle \sum _{i=1}^N \ddot{z}_{ig}\ddot{w}_{ig}\left( \textbf{X}_{i}-\ddot{\textbf{M}}_g\right) \dot{\boldsymbol{\Psi }}_g^{-1}\left( \textbf{X}_{i}-\ddot{\textbf{M}}_g\right) '\). Then, we have the following updates (an illustrative sketch of some of these computations is given after the list):

  • Model EII

    $$\begin{aligned} \ddot{\lambda } = \frac{\text{tr}\left\{ \ddot{\textbf{V}}\right\} }{prN}; \end{aligned}$$
  • Model VII

    $$\begin{aligned} \ddot{\lambda }_g = \frac{\text{tr}\left\{ \ddot{\textbf{V}}_g\right\} }{pr \displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EEI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }} =\frac{\text {diag}\left( \ddot{\textbf{V}}\right) }{ \left| \text {diag}\left( \ddot{\textbf{V}}\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda } = \frac{\left| \text {diag}\left( \ddot{\textbf{V}}\right) \right| ^\frac{1}{p}}{rN}; \end{aligned}$$
  • Model VEI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }} = \frac{\text {diag}\left( \displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1}\ddot{\textbf{V}}_g\right) }{\left| \text {diag}\left( \displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1}\ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\text{tr}\left\{ \ddot{\boldsymbol{\Delta }}^{-1} \ddot{\textbf{V}}_g \right\} }{pr\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EVI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\textbf{V}}_g\right) }{\left| \text {diag}\left( \ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda } = \frac{\displaystyle \sum \limits _{g=1}^G\left| \text {diag}\left( \ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}}{rN}; \end{aligned}$$
  • Model VVI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\textbf{V}}_g\right) }{\left| \text {diag}\left( \ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\left| \text {diag}\left( \ddot{\textbf{V}}_g\right) \right| ^\frac{1}{p}}{r\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EEE

    $$\begin{aligned} \ddot{\boldsymbol{\Sigma }}= \frac{\ddot{\textbf{V}}}{rN}; \end{aligned}$$
  • Model VEE

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}\ddot{\boldsymbol{\Delta }}\ddot{\boldsymbol{\Gamma }}' = \frac{\displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1}\ddot{\textbf{V}}_g}{\left| \displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1}\ddot{\textbf{V}}_g\right| ^{\frac{1}{p}}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\text{tr}\left\{ (\ddot{\boldsymbol{\Gamma }}\ddot{\boldsymbol{\Delta }}\ddot{\boldsymbol{\Gamma }}')^{-1} \ddot{\textbf{V}}_g \right\} }{pr\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EVE

    For this model, there is no analytical solution for \(\boldsymbol{\Gamma }\). Thus, an iterative majorization-minimization (MM) algorithm [2] is implemented. Specifically, the objective function is majorized as

    $$\begin{aligned} f\left( \boldsymbol{\Gamma }\right) = \sum \limits _{g=1}^G \text{tr}\left\{ \textbf{V}_g\boldsymbol{\Gamma }\dot{\boldsymbol{\Delta }}_g^{-1}\boldsymbol{\Gamma }'\right\} \le S + \text{tr}\left\{ \dot{\textbf{F}}\boldsymbol{\Gamma }\right\} , \end{aligned}$$

    where \(S\) is a constant and \(\dot{\textbf{F}} = \displaystyle \sum _{g=1}^G\left( \dot{\boldsymbol{\Delta }}_g^{-1} \dot{\boldsymbol{\Gamma }}' \textbf{V}_g - e_g \dot{\boldsymbol{\Delta }}_g^{-1} \dot{\boldsymbol{\Gamma }}'\right) \), with \(e_g\) being the largest eigenvalue of \(\textbf{V}_g\). The update of \(\boldsymbol{\Gamma }\) is given by \(\ddot{\boldsymbol{\Gamma }} = \dot{\textbf{G}} \dot{\textbf{H}}'\), where \(\dot{\textbf{G}}\) and \(\dot{\textbf{H}}\) are obtained from the singular value decomposition of \(\dot{\textbf{F}}\). This process is repeated until a specified convergence criterion is met, and the estimate \(\ddot{\boldsymbol{\Gamma }}\) is taken from the last iteration. Then, we obtain

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) }{\left| \text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda } = \frac{\displaystyle \sum \limits _{g=1}^{G}\text{tr}\left\{ \ddot{\boldsymbol{\Gamma }} \ddot{\boldsymbol{\Delta }}_{g}^{-1} \ddot{\boldsymbol{\Gamma }}'\ddot{\textbf{V}}_{g}\right\} }{prN}; \end{aligned}$$
  • Model VVE

    Similarly to the EVE case, there is no analytical solution for \(\boldsymbol{\Gamma }\), and the MM algorithm described above is implemented. Then, we have

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) }{\left| \text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) \right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\left| \text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{V}}_g \ddot{\boldsymbol{\Gamma }}\right) \right| ^{\frac{1}{p}}}{r\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EEV

    Consider the eigen-decomposition \(\textbf{V}_g=\textbf{L}_g \textbf{D}_g \textbf{L}_g'\), where the diagonal matrix \(\textbf{D}_g\) contains the eigenvalues in descending order and the orthogonal matrix \(\textbf{L}_g\) contains the corresponding eigenvectors. Then, we obtain

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}_g=\ddot{\textbf{L}}_g , \quad \ddot{\boldsymbol{\Delta }} = \frac{\displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g}{\left| \displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g\right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda } = \frac{\left| \displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g\right| ^\frac{1}{p}}{rN}; \end{aligned}$$
  • Model VEV

    By applying the same eigen-decomposition used for the EEV model, we have

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}_g=\ddot{\textbf{L}}_g , \quad \ddot{\boldsymbol{\Delta }} = \frac{\displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1} \ddot{\textbf{D}}_g}{\left| \displaystyle \sum \limits _{g=1}^G \dot{\lambda }_g^{-1} \ddot{\textbf{D}}_g\right| ^\frac{1}{p}} \quad \text {and} \quad \ddot{\lambda }_g = \frac{\text{tr}\left\{ \ddot{\textbf{D}}_g \ddot{\boldsymbol{\Delta }}^{-1} \right\} }{pr\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}; \end{aligned}$$
  • Model EVV

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}_g\ddot{\boldsymbol{\Delta }}_g\ddot{\boldsymbol{\Gamma }}_g' = \frac{\ddot{\textbf{V}}_g}{\left| \ddot{\textbf{V}}_g\right| ^{\frac{1}{p}}} \quad \text {and} \quad \ddot{\lambda } = \frac{\displaystyle \sum \limits _{g=1}^G \left| \ddot{\textbf{V}}_g\right| ^{\frac{1}{p}}}{rN}; \end{aligned}$$
  • Model VVV

    $$\begin{aligned} \ddot{\boldsymbol{\Sigma }}_g = \frac{\ddot{\textbf{V}}_g}{r\displaystyle \sum _{i=1}^N \ddot{z}_{ig}}. \end{aligned}$$
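
To make the mapping from these formulas to code concrete, the following is a minimal NumPy sketch of the scatter matrices \(\ddot{\textbf{V}}_g\) and a few representative updates (EII, VVV, and the MM step for \(\boldsymbol{\Gamma }\) used by the EVE and VVE models). All array names, shapes, and convergence settings are illustrative assumptions; this is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def scatter_V(X, z, w, M, Psi_inv):
    """Compute V_g = sum_i z_ig w_ig (X_i - M_g) Psi_g^{-1} (X_i - M_g)'.

    Assumed shapes: X is (N, p, r); z and w are (N, G) E-step quantities;
    M is (G, p, r); Psi_inv is (G, r, r) with the current inverses of Psi_g.
    """
    N, p, r = X.shape
    G = M.shape[0]
    V = np.zeros((G, p, p))
    for g in range(G):
        R = X - M[g]  # residual matrices, shape (N, p, r)
        for i in range(N):
            V[g] += z[i, g] * w[i, g] * R[i] @ Psi_inv[g] @ R[i].T
    return V

def update_EII(V, p, r, N):
    # EII: common spherical volume, lambda = tr{V} / (p r N)
    return np.trace(V.sum(axis=0)) / (p * r * N)

def update_VVV(V, z, r):
    # VVV: unconstrained Sigma_g = V_g / (r n_g), n_g the effective group size
    n_g = z.sum(axis=0)
    return V / (r * n_g)[:, None, None]

def mm_gamma(V, Delta_diag, Gamma, max_iter=100, tol=1e-8):
    """MM step for the common orthogonal Gamma (EVE/VVE models).

    Delta_diag is (G, p), holding the diagonals of the Delta_g matrices.
    Follows the update Gamma = G H' from the SVD of F stated above.
    """
    n_groups, p, _ = V.shape
    obj_old = np.inf
    for _ in range(max_iter):
        F = np.zeros((p, p))
        for g in range(n_groups):
            e_g = np.linalg.eigvalsh(V[g])[-1]   # largest eigenvalue of V_g
            Dinv_Gt = (Gamma / Delta_diag[g]).T  # Delta_g^{-1} Gamma'
            F += Dinv_Gt @ V[g] - e_g * Dinv_Gt
        U, _, Vt = np.linalg.svd(F)
        Gamma = U @ Vt
        # objective f(Gamma) = sum_g tr{V_g Gamma Delta_g^{-1} Gamma'}
        obj = sum(np.trace(V[g] @ (Gamma / Delta_diag[g]) @ Gamma.T)
                  for g in range(n_groups))
        if abs(obj_old - obj) < tol:
            break
        obj_old = obj
    return Gamma
```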

Appendix B

Let \(\ddot{\textbf{W}}=\displaystyle \sum _{g=1}^G \ddot{\textbf{W}}_g\), where \(\ddot{\textbf{W}}_g = \displaystyle \sum _{i=1}^N \ddot{z}_{ig}\ddot{w}_{ig}\left( \textbf{X}_{i}-\ddot{\textbf{M}}_{g}\right) '\ddot{\boldsymbol{\Sigma }}_g^{-1}\left( \textbf{X}_{i}-\ddot{\textbf{M}}_{g}\right) \). Except for the II model, for which no parameters need to be estimated, we have the following updates (see the sketch after this list):

  • Model EI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }} = \frac{\text {diag}\left( \ddot{\textbf{W}}\right) }{\left| \text {diag}\left( \ddot{\textbf{W}}\right) \right| ^\frac{1}{r}}; \end{aligned}$$
  • Model VI

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g = \frac{\text {diag}\left( \ddot{\textbf{W}}_g\right) }{\left| \text {diag}\left( \ddot{\textbf{W}}_g\right) \right| ^\frac{1}{r}}; \end{aligned}$$
  • Model EE

    $$\begin{aligned} \ddot{\boldsymbol{\Psi }} = \frac{\ddot{\textbf{W}}}{\left| \ddot{\textbf{W}} \right| ^\frac{1}{r}}; \end{aligned}$$
  • Model VE

    As for the EVE and VVE models, there is no analytical solution for \(\boldsymbol{\Gamma }\), and an MM algorithm of the type described for the EVE model is implemented, after replacing \(\textbf{V}\) with \(\textbf{W}\). Then, we have

    $$\begin{aligned} \ddot{\boldsymbol{\Delta }}_g= \frac{\text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{W}}_g \ddot{\boldsymbol{\Gamma }}\right) }{\left| \text {diag}\left( \ddot{\boldsymbol{\Gamma }}' \ddot{\textbf{W}}_g \ddot{\boldsymbol{\Gamma }}\right) \right| ^\frac{1}{r}}; \end{aligned}$$
  • Model EV

    By using the same approach as for the EEV and VEV models, after replacing \(\ddot{\textbf{V}}\) with \(\ddot{\textbf{W}}\), we have

    $$\begin{aligned} \ddot{\boldsymbol{\Gamma }}_g=\ddot{\textbf{L}}_g \quad \text {and} \quad \ddot{\boldsymbol{\Delta }} = \frac{\displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g}{\left| \displaystyle \sum \limits _{g=1}^G \ddot{\textbf{D}}_g\right| ^\frac{1}{r}}; \end{aligned}$$
  • Model VV

    $$\begin{aligned} \ddot{\boldsymbol{\Psi }}_g = \frac{\ddot{\textbf{W}}_g}{\left| \ddot{\textbf{W}}_g\right| ^\frac{1}{r}}. \end{aligned}$$
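
As for Appendix A, a short NumPy sketch may help map the column-scale updates to code; the division by \(\left| \cdot \right| ^{1/r}\) in the EE and VV updates below normalizes each estimate to unit determinant. Names and shapes are again illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def scatter_W(X, z, w, M, Sigma_inv):
    """Compute W_g = sum_i z_ig w_ig (X_i - M_g)' Sigma_g^{-1} (X_i - M_g).

    Mirrors scatter_V above with rows and columns swapped; Sigma_inv is
    (G, p, p), holding the current inverses of the row scale matrices.
    """
    N, p, r = X.shape
    G = M.shape[0]
    W = np.zeros((G, r, r))
    for g in range(G):
        R = X - M[g]  # residual matrices, shape (N, p, r)
        for i in range(N):
            W[g] += z[i, g] * w[i, g] * R[i].T @ Sigma_inv[g] @ R[i]
    return W

def update_EE(W):
    # EE: common Psi = W / |W|^{1/r}, normalized to unit determinant
    W_tot = W.sum(axis=0)
    r = W_tot.shape[0]
    return W_tot / np.linalg.det(W_tot) ** (1.0 / r)

def update_VV(W):
    # VV: group-wise Psi_g = W_g / |W_g|^{1/r}, each with unit determinant
    r = W.shape[1]
    return W / (np.linalg.det(W) ** (1.0 / r))[:, None, None]
```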

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Tomarchio, S.D., Punzo, A., Bagnato, L. (2022). On the Use of the Matrix-Variate Tail-Inflated Normal Distribution for Parsimonious Mixture Modeling. In: Salvati, N., Perna, C., Marchetti, S., Chambers, R. (eds) Studies in Theoretical and Applied Statistics. SIS 2021. Springer Proceedings in Mathematics & Statistics, vol 406. Springer, Cham. https://doi.org/10.1007/978-3-031-16609-9_24
