Abstract
Despite the flexibility and popularity of mixture models, their associated parameter spaces are often difficult to represent due to fundamental identification problems. This paper presents a novel way of representing such a space for general mixtures of exponential families, in which the parameters are identifiable and interpretable and, owing to a tractable geometric structure, fast computational algorithms can be constructed.
References
Anaya-Izquierdo, K., Marriott, P.: Local mixture models of exponential families. Bernoulli 13, 623–640 (2007)
Celeux, G.: Mixture models for classification. In: Decker, R., Lenz, H.-J. (eds.) Advances in Data Analysis, pp. 3–14. Springer, Berlin (2007)
Chen, J., Kalbfleisch, J.: Penalized minimum-distance estimates in finite mixture models. Can. J. Stat. 24(2), 167–175 (1996)
Cutler, A., Windham, M.P.: Information-based validity functionals for mixture analysis. In: Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, pp. 149–170. Kluwer, Amsterdam (1994)
Donoho, D.L.: One-sided inference about functionals of a density. Ann. Stat. 16, 1390–1420 (1988)
Everitt, B.S.: An introduction to finite mixture distributions. Stat. Methods Med. Res. 5(2), 107–127 (1996)
Gan, L., Jiang, J.: A test for global maximum. J. Am. Stat. Assoc. 94(447), 847–854 (1999)
Hall, P., Stewart, M.: Theoretical analysis of power in a two-component normal mixture model. J. Stat. Plan. Inference 134, 158–179 (2005)
Leroux, B.G.: Consistent estimation of a mixing distribution. Ann. Stat. 20(3), 1350–1360 (1992)
Li, P., Chen, J.: Testing the order of a finite mixture. J. Am. Stat. Assoc. 105(491), 1084–1092 (2010)
Li, P., Chen, J., Marriott, P.: Non-finite Fisher information and homogeneity: an EM approach. Biometrika 96(2), 411–426 (2009)
Lindsay, B.G.: Mixture Models: Theory, Geometry and Applications. Institute of Mathematical Statistics, Hayward (1995)
Lindsay, B.G., Roeder, K.: Uniqueness of estimation and identifiability in mixture models. Can. J. Stat. 21(2), 139–147 (1993)
Maciejowska, K.: Assessing the number of components in a normal mixture: an alternative approach. MPRA Paper No. 50303, University Library of Munich (2013)
Maroufy, V., Marriott, P.: Generalizing the frailty assumptions in survival analysis. arXiv:1510.02425 (2015)
Marriott, P.: On the local geometry of mixture models. Biometrika 89, 77–93 (2002)
Marriott, P.: Extending local mixture models. Ann. Inst. Stat. Math. 59, 95–110 (2006)
McLachlan, G., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
Morris, C.: Natural exponential families with quadratic variance functions. Ann. Stat. 10(1), 65–80 (1982)
Richardson, S., Green, P.J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. R. Stat. Soc. B 59, 731–792 (1997)
Schlattmann, P.: Medical Applications of Finite Mixture Models. Springer, Berlin (2009)
Shun, Z., McCullagh, P.: Laplace approximation of high dimensional integrals. J. R. Stat. Soc. Ser. B (Methodological) 57, 749–760 (1995)
Struik, D.J.: Lectures on Classical Differential Geometry. Dover Publications, Mineola (1988)
Tallis, G.: The identifiability of mixtures of distributions. J. Appl. Prob. 6(2), 389–398 (1969)
Appendix
Following the methodology in Sect. 2.1, select suitably separated grid points \(\varvec{\mu }=(\mu _1,\ldots ,\mu _L)\), which are fixed throughout. Select initial proportions \(\varvec{\rho }^{(0)}=(\rho _1^{(0)},\ldots ,\rho _L^{(0)})\) and local mixture parameters \(\underline{\varvec{\lambda }}^{(0)} =(\varvec{\lambda }^{1,(0)},\ldots ,\varvec{\lambda }^{L,(0)})\). Suppose we have \(\varvec{\rho }^{(r)}\) and \(\underline{\varvec{\lambda }}^{(r)}\) at step r, where \(L_r \le L\) is the number of non-zero proportions. To obtain the estimates at step \(r+1\), run the following steps.
1.
Calculate \(\rho _l^{(r+1)}=\frac{n_l}{n}\), where \(n_l= \sum \nolimits _{i=1}^{n}w^{(r+1)}_{il}\) and, for \(i=1,\ldots ,n\) and \(l=1,\ldots ,L_r\),
$$\begin{aligned} w^{(r+1)}_{il}=\frac{\rho _l^{(r)} g_{\mu _l}\left( x_i,\varvec{\lambda }^{l,(r)}\right) }{\sum \nolimits _{l'=1}^{L_r} \rho _{l'}^{(r)} g_{\mu _{l'}}\left( x_i,\varvec{\lambda }^{l',(r)}\right) }. \end{aligned}$$
2.
Choose a threshold \(0< \gamma < 1\) and check whether there is any l such that \(\rho _l^{(r+1)} < \gamma \).
(a)
If yes: exclude the components corresponding to \(\rho _l^{(r+1)} < \gamma \), update \(L_r\rightarrow L_{r+1}\), and return to step 1.
(b)
If no: go to step 3.
3.
Classify the data set into \(\varvec{x}^1,\ldots ,\varvec{x}^{L_{r+1}}\) by assigning each \(x_i\) to exactly one local mixture component. For each \(l=1,\ldots ,L_{r+1}\), update \(\varvec{\lambda }^{l,(r)}\) by
$$\begin{aligned} \varvec{\lambda }^{l,(r+1)}= \mathop {\arg \max }\limits _{\varvec{\lambda } \in \Lambda _{\mu _l}} l_{\mu _l}\left( \varvec{x}^l,\varvec{\lambda }\right) , \end{aligned}$$
where \( l_{\mu _l}(\varvec{x}^l,\cdot )\) is the log-likelihood function for component l as defined in Marriott (2002).
Remark 1
Step 2 restricts the number of components used to fit a data set, ensuring that each local mixture component retains enough observations for inference to be run on it. The threshold \(\gamma \) influences the final result of the algorithm in much the same way that an initial value affects the convergence of a general EM algorithm (Table 1). A minimal sketch implementation of steps 1–3 is given below.
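For concreteness, the following Python sketch implements steps 1–3 under stated assumptions: the local mixture density \(g_{\mu _l}(x,\varvec{\lambda })\) is supplied as a user-defined callable g, the initial proportions are uniform, the component maximisation in step 3 uses an unconstrained Nelder–Mead search rather than respecting \(\varvec{\lambda }\in \Lambda _{\mu _l}\), and the stopping rule is a simple log-likelihood check. The function name and these choices are illustrative, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def fit_local_mixture(x, mu, lam, g, gamma=0.05, max_iter=200, tol=1e-8):
    """EM-type fitting over a fixed grid mu, following appendix steps 1-3.

    x     : (n,) data vector
    mu    : (L,) fixed, suitably separated grid points
    lam   : (L, p) initial local mixture parameters, one row per component
    g     : callable g(mu_l, x, lam_l) -> (n,) local mixture density values
    gamma : threshold below which a component is excluded (step 2)
    """
    x, mu, lam = np.asarray(x, float), np.asarray(mu, float), np.asarray(lam, float)
    n = len(x)
    rho = np.full(len(mu), 1.0 / len(mu))          # uniform initial proportions
    prev_ll = -np.inf

    for _ in range(max_iter):
        # Step 1: weights w_il proportional to rho_l * g_{mu_l}(x_i; lam_l),
        # then rho_l^(r+1) = n_l / n with n_l = sum_i w_il.
        dens = np.column_stack([g(m, x, lm) for m, lm in zip(mu, lam)])  # (n, L_r)
        w = rho * dens
        w /= w.sum(axis=1, keepdims=True)
        rho = w.sum(axis=0) / n

        # Step 2: exclude components with rho_l < gamma and return to step 1.
        keep = rho >= gamma
        if not keep.all():
            mu, lam, rho = mu[keep], lam[keep], rho[keep] / rho[keep].sum()
            continue

        # Step 3: hard-classify each x_i to its highest-weight component and
        # maximise each component log-likelihood over lam_l.  The constraint
        # lam in Lambda_{mu_l} is ignored here; a constrained optimiser would
        # be needed in practice.
        labels = w.argmax(axis=1)
        for l in range(len(mu)):
            xl = x[labels == l]
            if xl.size:
                nll = lambda v, m=mu[l], xl=xl: -np.sum(np.log(g(m, xl, v)))
                lam[l] = minimize(nll, lam[l], method="Nelder-Mead").x

        # Stop once the (approximate) mixture log-likelihood stabilises.
        ll = np.sum(np.log(dens @ rho))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll

    return mu, rho, lam
```

Note that excluding a component in step 2 shrinks all the arrays in place, so the re-run of step 1 after an exclusion operates only on the surviving components, mirroring the update \(L_r\rightarrow L_{r+1}\) above.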
Maroufy, V., Marriott, P.: Mixture models: building a parameter space. Stat. Comput. 27, 591–597 (2017). https://doi.org/10.1007/s11222-016-9641-6