Abstract
This paper introduces a significant extension of the flexible Dirichlet (FD) distribution, a quite tractable special mixture model for compositional data, i.e. data representing vectors of proportions of a whole. The FD model displays several theoretical properties that make it suitable for inference and fairly easy to handle from a computational viewpoint. However, the rigid mixture structure implied by the FD makes it unsuitable for describing many compositional datasets, and the FD only allows for negative correlations. By considerably relaxing the strict constraints among clusters entailed by the FD, the new extended model allows for a more general dependence structure (including positive correlations) and greatly expands its applicative potential, while retaining its good properties to a large extent. EM-type estimation procedures, including ad hoc reliable initialization methods, can be developed for this more complex model, keeping the computational burden at a manageable level. Accurate evaluation of standard error estimates can be provided as well.
Funding
This study was partially funded by Università degli Studi di Milano-Bicocca (Grant No. FA 2018).
Appendix
1.1 Proof of Proposition 3
The conditional distribution function of \({\varvec{S}}_1\mid {\varvec{X}}_{2}={\varvec{x}}_{2}\) can be derived most easily by conditioning on \({\varvec{Z}}\):
Given that \({\varvec{X}}\mid {\varvec{Z}}={\varvec{e}}_i\sim \mathcal{D}({\dot{\varvec{\alpha }}}_i)\), by using well-known Dirichlet independence properties we have that:
Recalling that the Dirichlet distribution is closed under the operation of subcomposition, it follows that:
and
The probabilities \(P({\varvec{Z}}={\varvec{e}}_i\mid {\varvec{X}}_{2}={\varvec{x}}_{2})\) can be computed by Bayes' theorem. In particular, the distribution of \(({\varvec{X}}_{2},1-X_2^+)^\intercal | {\varvec{Z}}={\varvec{e}}_i\) can be obtained by resorting to the closure of the Dirichlet under marginalization; it takes the form
if \(i\le k\) and
if \(i> k\). From Bayes' formula, some algebraic manipulation shows that the probabilities \(P({\varvec{Z}}={\varvec{e}}_i\mid {\varvec{X}}_{2}={\varvec{x}}_{2})\) are proportional to the \(p_{i}^{'}\)’s provided by (14). Plugging all the computed quantities into (35) yields the result.
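The displayed equations of this proof are not reproduced here; as a minimal sketch of the first step (assuming only the standard finite-mixture argument, not the exact form of (35)), conditioning on the latent label \({\varvec{Z}}\) gives
\[
f_{{\varvec{S}}_1\mid {\varvec{X}}_{2}}({\varvec{s}}_1\mid {\varvec{x}}_{2})
=\sum _{i=1}^{D} P({\varvec{Z}}={\varvec{e}}_i\mid {\varvec{X}}_{2}={\varvec{x}}_{2})\,
f_{{\varvec{S}}_1\mid {\varvec{X}}_{2},\,{\varvec{Z}}={\varvec{e}}_i}({\varvec{s}}_1\mid {\varvec{x}}_{2}),
\]
where each conditional density on the right-hand side is Dirichlet by the closure properties recalled above, and the mixing weights are the posterior probabilities computed via Bayes' theorem.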
1.2 Proof of Proposition 5
It is obvious that if \(\varvec{\theta }=\varvec{\theta }^\prime \), then \(\mathbf{X } \sim \mathbf{X }^\prime \). In order to show the converse, one can focus on the marginal distribution of \(X_i\). By virtue of Proposition 3, we can write its density function \(g(x_i; \varvec{\theta })\) as:
If \(\mathbf{X } \sim \mathbf{X }^\prime \), then \(X_i \sim X_i^\prime \) and therefore \(g(x_i; \varvec{\theta }) = g(x_i; \varvec{\theta }^\prime )\) \(\forall x_i \in (0, 1)\), as these density functions are continuous. It follows that \(\displaystyle \lim \limits _{x_i \rightarrow 0^+} \frac{g(x_i; \varvec{\theta })}{x_i^{\alpha _i - 1}} = \lim \limits _{x_i \rightarrow 0^+} \frac{g(x_i; \varvec{\theta }^\prime )}{x_i^{\alpha _i - 1}}\). We have:
and
In order for these two limits to be equal, the quantity \(\displaystyle \left( \lim _{x_i \rightarrow 0^+} \frac{x_i^{\alpha _i^\prime - 1}}{x_i^{\alpha _i - 1}}\right) \) must be finite and different from 0. Since this limit equals 0 when \(\alpha _i^\prime > \alpha _i\) and diverges when \(\alpha _i^\prime < \alpha _i\), it follows that \(\alpha _i = \alpha _i^\prime \); repeating the argument for each \(i\) shows that \(\varvec{\alpha }=\varvec{\alpha }^\prime \). As a consequence, the equality \(g(x_i; \varvec{\theta }) = g(x_i; \varvec{\theta }^\prime )\) can be rewritten as:
By taking the limits as \(x_i \rightarrow 1^-\) on both sides, one obtains:
Equation (38) implies that \(p_i\) and \( p_i^\prime \) are either both null or both strictly positive. In the former case, because of the parameter space definition, \( \tau _i=\tau _i^\prime =1\). In the latter case, plugging (38) into equality (37) and differentiating both sides, the following equality must hold \(\forall x_i \in (0, 1)\):
Taking the limits as \(x_i \rightarrow 1^-\) on both sides, we have:
It follows that \(\tau _i=\tau _i^\prime \) for any \(i\) such that \(p_i>0\), and hence for all \(i\). Finally, substituting this constraint into (38), one concludes that \(\mathbf{p }= \mathbf{p }^\prime \).
1.3 Proof of Proposition 8
Recall that \(\mathbf{X }| Y^+ = y^+ \sim EFD(\varvec{\alpha },\mathbf{p }^*(y^+),\varvec{\tau }, \beta )\), where \(\mathbf{p }^*(y^+)\) is defined as in (23). Then, if \(\tau _i=\tau \) \(\forall i\), it can be seen immediately that the \(p_i^*(y^+)\)'s do not depend on \(y^+\) (and coincide with the \(p_i\)'s). Conversely, if the basis is compositionally invariant, then \(p_i^*(y^+)\) does not depend on \(y^+\), and therefore neither does the ratio \(p_i^*(y^+)/p_l^*(y^+)\) \(\forall i\ne l\). Since this ratio is proportional to \({(y^+)}^{\tau _i-\tau _l}\), it follows that \(\tau _i=\tau _l\) \(\forall i\ne l\).
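The last implication can be spelled out as a routine verification (the constant \(c_{il}\) below is only the proportionality factor implied by the statement above and is not part of the original display): if
\[
\frac{p_i^*(y^+)}{p_l^*(y^+)}=c_{il}\,(y^+)^{\tau _i-\tau _l}
\]
is constant over the support of \(Y^+\), then taking logarithms shows that \((\tau _i-\tau _l)\log y^+\) is constant in \(y^+\), which is possible only if \(\tau _i=\tau _l\).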
1.4 Partial derivatives
In this section we show the partial derivatives of the complete-data log-likelihood (25). In particular, for \(i=1, \ldots ,D\), the first-order partial derivatives are:
where \(z_{\cdot i}=\sum _{j=1}^n z_{ji}\).
The second-order partial derivatives are:
where \(\mathbb {1}_{i=h}\) is the indicator function that is equal to 1 if \(i = h\) and 0 otherwise.
where \(\psi ^\prime (\cdot )\) is the trigamma function.
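Although the displayed derivatives are not reproduced here, their building blocks are the digamma and trigamma terms that arise whenever a Dirichlet-type log-density is differentiated with respect to its shape parameters. The following sketch is only illustrative: it computes the score and Hessian of a single Dirichlet log-density, not the complete-data log-likelihood (25), in which each such term is weighted by \(z_{ji}\) and summed over observations; the helper names and the toy values of alpha and x are hypothetical.

# Illustrative sketch (not Eq. (25)): score and Hessian of one Dirichlet
# log-density with respect to its shape parameters, showing the
# digamma/trigamma structure mentioned above.
import numpy as np
from scipy.special import digamma, polygamma, gammaln

def dirichlet_loglik(alpha, x):
    # log f(x; alpha) = log Gamma(sum alpha) - sum log Gamma(alpha_j) + sum (alpha_j - 1) log x_j
    return gammaln(alpha.sum()) - gammaln(alpha).sum() + ((alpha - 1.0) * np.log(x)).sum()

def dirichlet_score(alpha, x):
    # d/d alpha_i = psi(sum alpha) - psi(alpha_i) + log x_i
    return digamma(alpha.sum()) - digamma(alpha) + np.log(x)

def dirichlet_hessian(alpha):
    # d2/(d alpha_i d alpha_h) = psi'(sum alpha) - 1{i = h} psi'(alpha_i)
    trigamma = lambda a: polygamma(1, a)
    D = alpha.size
    return trigamma(alpha.sum()) * np.ones((D, D)) - np.diag(trigamma(alpha))

alpha = np.array([2.0, 3.0, 1.5])   # hypothetical shape parameters
x = np.array([0.2, 0.5, 0.3])       # a toy composition
print(dirichlet_loglik(alpha, x))
print(dirichlet_score(alpha, x))
print(dirichlet_hessian(alpha))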
1.5 Results of the univariate case of the olive oil dataset
In this section we report the values of the AIC and BIC criteria for the models under consideration (Table 7), together with the fitted density curves (Fig. 9).
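For completeness, the two criteria reported in Table 7 are the usual penalized-likelihood quantities; the sketch below shows their standard computation from a maximized log-likelihood (generic formulas only, with purely hypothetical numeric values, not tied to any model fitted above).

import math

def aic(loglik, n_params):
    # Akaike information criterion: 2k - 2 log L
    return 2.0 * n_params - 2.0 * loglik

def bic(loglik, n_params, n_obs):
    # Bayesian information criterion: k log n - 2 log L
    return n_params * math.log(n_obs) - 2.0 * loglik

# Hypothetical values for illustration only
print(aic(loglik=-1234.5, n_params=7))
print(bic(loglik=-1234.5, n_params=7, n_obs=100))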