Abstract
In this paper we explore the possibilities of applying \(\phi \)-divergence measures to inferential problems in the field of latent class models (LCMs) for multinomial data. We first treat the problem of estimating the model parameters. As explained below, the minimum \(\phi \)-divergence estimators (M\(\phi \)Es) considered in this paper are a natural extension of the maximum likelihood estimator (MLE), the usual estimator for this problem; we study the asymptotic properties of M\(\phi \)Es, showing that they share the same asymptotic distribution as the MLE. To compare the efficiency of the M\(\phi \)Es when the sample size is not large enough to apply the asymptotic results, we have carried out an extensive simulation study; from this study, we conclude that there are estimators in this family that are competitive with the MLE. Next, we deal with the problem of testing whether an LCM for multinomial data fits a data set; again, \(\phi \)-divergence measures can be used to generate a family of test statistics generalizing both the classical likelihood ratio test and the chi-squared test statistics. Finally, we treat the problem of choosing the best model out of a sequence of nested LCMs; as before, \(\phi \)-divergence measures can handle the problem, and we derive a family of \(\phi \)-divergence test statistics based on them; we study the asymptotic behavior of these test statistics, showing that it coincides with that of the classical test statistics. A simulation study for small and moderate sample sizes shows that there are test statistics in the family that can compete with the classical likelihood ratio and chi-squared test statistics.
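To make the central quantity of the abstract concrete, the following Python sketch computes the Cressie–Read power-divergence subfamily of \(\phi \)-divergences between two multinomial probability vectors, \(D_{\phi }(\varvec{p},\varvec{q})=\sum _{j}q_{j}\,\phi (p_{j}/q_{j})\). This is an illustrative sketch only, not the authors' Fortran 95 software; the function name `phi_divergence` and the parameter `lam` are ours.

```python
import numpy as np

def phi_divergence(p, q, lam):
    """Cressie-Read power divergence D_phi(p, q) = sum_j q_j * phi(p_j / q_j),
    with phi(x) = (x**(lam+1) - x - lam*(x - 1)) / (lam*(lam + 1)).
    lam = 1 gives half the Pearson chi-square discrepancy; lam -> 0 gives
    the Kullback-Leibler divergence (likelihood-ratio discrepancy)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if abs(lam) < 1e-12:                 # limiting case: Kullback-Leibler
        return float(np.sum(p * np.log(p / q)))
    x = p / q
    phi = (x**(lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))
    return float(np.sum(q * phi))
```

For instance, with `lam = 1` the value reduces to \(\sum _{j}(p_{j}-q_{j})^{2}/(2q_{j})\), which is how the family generalizes the chi-squared statistic mentioned above.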
Notes
Formann (1992) proposed using the EM algorithm developed in Dempster et al. (1977) to compute the MLE; however, we have preferred to use our own algorithm in order to compare the different estimators on an equal footing. Our Fortran 95 implementation can be found at http://www.sites.google.com/site/nirianmartinswebsite/software.
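For orientation, the EM approach mentioned in the note can be sketched as follows for the special case of binary items. This is a minimal illustrative implementation, not the authors' Fortran 95 algorithm; the function name `lcm_em` and all variable names are ours.

```python
import numpy as np

def lcm_em(X, n_classes, n_iter=200, seed=0):
    """EM for an (unconstrained) latent class model with binary items.
    X: (N, J) array of 0/1 responses. Returns the class weights w and the
    item-response probabilities theta[c, j] = P(item j = 1 | class c)."""
    rng = np.random.default_rng(seed)
    N, J = X.shape
    w = np.full(n_classes, 1.0 / n_classes)          # class weights
    theta = rng.uniform(0.25, 0.75, (n_classes, J))  # random start
    for _ in range(n_iter):
        # E-step: posterior class-membership probabilities per observation
        log_post = (X @ np.log(theta).T
                    + (1 - X) @ np.log(1 - theta).T
                    + np.log(w))
        log_post -= log_post.max(axis=1, keepdims=True)  # stabilize exp
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update weights and conditional response probabilities
        w = post.mean(axis=0)
        theta = (post.T @ X) / post.sum(axis=0)[:, None]
        theta = np.clip(theta, 1e-6, 1 - 1e-6)  # keep logs finite
    return w, theta
```

The E-step/M-step structure above is the standard Dempster et al. (1977) recipe; a direct minimization of a \(\phi \)-divergence, as done in the paper, replaces the likelihood objective with \(D_{\phi }(\widehat{\varvec{p}},\varvec{p}(\varvec{\lambda },\varvec{\eta }))\).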
References
Abar B, Loken E (2010) Self-regulated learning and self-directed study in a pre-college sample. Learn Individ Differ 20:25–29
Agresti A (1996) An introduction to categorical data analysis. Wiley, New York
Bartolucci F, Forcina A (2002) Extended RC association models allowing for order restrictions and marginal modelling. J Am Stat Assoc 97(460):1192–1199
Berkson J (1980) Minimum chi-square, not maximum likelihood! Ann Stat 8(3):457–487
Biemer P (2011) Latent class analysis and survey error. Wiley, New York
Birch M (1964) A new proof of the Pearson-Fisher theorem. Ann Math Stat 35:817–824
Bryant F, Satorra A (2012) Principles and practice of scaled difference chi-square testing. Struct Equ Model 19:372–398
Clogg C (1988) Latent class models for measuring. In: Latent trait and class models. Plenum, New York, pp 173–205
Collins L, Lanza S (2010) Latent class and latent transition analysis for the social, behavioral, and health sciences. Wiley, New York
Cressie N, Pardo L (2000) Minimum phi-divergence estimator and hierarchical testing in loglinear models. Stat Sin 10:867–884
Cressie N, Pardo L (2002) Phi-divergence statistics. In: Elshaarawi A, Plegorich W (eds) Encyclopedia of environmetrics, vol 13. Wiley, New York, pp 1551–1555
Cressie N, Read T (1984) Multinomial goodness-of-fit tests. J R Stat Soc Ser B 46:440–464
Cressie N, Pardo L, Pardo M (2003) Size and power considerations for testing loglinear models using \(\phi \)-divergence test statistics. Stat Sin 13(2):555–570
Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observations. Stud Sci Math Hung 2:299–318
Dale J (1986) Asymptotic normality of goodness-of-fit statistics for sparse product multinomials. J R Stat Soc Ser B 48:48–59
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Feldman B, Masyn K, Conger R (2009) New approaches to studying behaviors: a comparison of methods for modelling longitudinal, categorical and adolescent drinking data. Dev Psychol 45(3):652–676
Felipe A, Miranda P, Martín N, Pardo L (2014) Phi-divergence test statistics for testing the validity of latent class models for binary data. arXiv:1407.2165
Felipe A, Miranda P, Pardo L (2015) Minimum \(\phi \)-divergence estimation in constrained latent class models for binary data. Psychometrika 80(4):1020–1042
Formann A (1982) Linear logistic latent class analysis. Biom J 24:171–190
Formann A (1985) Constrained latent class models: theory and applications. Br J Math Stat Psychol 38:87–111
Formann A (1992) Linear logistic latent class analysis for polytomous data. J Am Stat Assoc 87:476–486
Genge E (2014) A latent class analysis of the public attitude towards the euro adoption in Poland. Adv Data Anal Classif 8(4):427–442
Gnaldi M, Bacci S, Bartolucci F (2016) A multilevel finite mixture item response model to cluster examinees and schools. Adv Data Anal Classif 10(1):53–70
Goodman L (1974) Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61:215–231
Goodman L (1979) Simple models for the analysis of association in cross-classifications having ordered categories. J Am Stat Assoc 74:537–552
Hagenaars JA, McCutcheon A (2002) Applied latent class analysis. Cambridge University Press, Cambridge
Laska M, Pash K, Lust K, Story M, Ehlinger E (2009) Latent class analysis of lifestyle characteristics and health risk behaviors among college youth. Prev Sci 10:376–386
Lazarsfeld P (1950) The logical and mathematical foundation of latent structure analysis. Studies in social psychology in World War II, vol IV. Princeton University Press, Princeton, pp 362–412
Martín N, Mata R, Pardo L (2015) Comparing two treatments in terms of the likelihood ratio order. J Stat Comput Simul 85(17):3512–3534
Moon K, Hero A (2014) Multivariate \(f\)-divergence estimation with confidence. In: Advances in neural information processing systems, pp 2420–2428
Morales D, Pardo L, Vajda I (1995) Asymptotic divergence of estimators of discrete distributions. J Stat Plan Inference 48:347–369
Oberski DL (2016) Beyond the number of classes: separating substantive from non-substantive dependence in latent class analysis. Adv Data Anal Classif 10(2):171–182
Pardo L (2006) Statistical inference based on divergence measures. Chapman and Hall CRC, Boca Raton
Satorra A, Bentler P (2010) Ensuring positiveness of the scaled difference chi-square test statistic. Psychometrika 75(2):243–248
Uebersax J (2009) Latent structure analysis. Web document: http://www.john-uebersax.com/stat/index.htm
Acknowledgements
We are very grateful to the associate editor as well as the anonymous referees for fruitful comments and remarks that have improved the final version of the paper.
This paper was supported by the Spanish Grants MTM-2012-33740 and MTM-2015-67057.
Appendix
1.1 Proof of Theorem 1
We denote by \(I^{g}\) the interior of the g-dimensional unit cube, where \({g:=\prod _{i=1}^{k}g_{i}}\). The interior of \(\varDelta _{g}\) defined in (10) is contained in \(I^{g}\). Let \(W_{(\varvec{\lambda }_{0},\varvec{\eta }_{0})}\) be a neighborhood of \((\varvec{\lambda }_{0},\varvec{\eta }_{0})\), the true value of the unknown parameter \((\varvec{\lambda },\varvec{\eta })\), on which
has continuous second partial derivatives. Consider the map
whose components \(F_{j},\,j=1,{\ldots },t+u\) are defined by
where \(\theta _{j}\) is either \(\lambda _{j}\) if \(j\le t\) or \(\eta _{j-t}\) if \(j>t\) and \(\varvec{p}\) is a g-dimensional probability vector.
Then \(F_{j},\,j=1,{\ldots },t+u\), vanish at \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta } _{0}))\). Since
the \((t+u)\times (t+u)\) matrix \(\varvec{J}_{\varvec{F}} (\varvec{\theta }_{0})\) associated with function \(\varvec{F}\) at point \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\) is given by
Next, it is a simple algebraic exercise to prove that \(\varvec{J} _{\varvec{F}}(\varvec{\theta }_{0})\) is nonsingular. Indeed, since \(\varvec{J} _{\varvec{F}}(\varvec{\theta }_{0})=\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda } _{0},\varvec{\eta }_{0})\phi ^{\prime \prime }(1)\), this matrix is nonsingular at the point \((\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\).
Applying the Implicit Function Theorem, there exists a neighborhood U of \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\) on which the matrix \(\varvec{J}_{\varvec{F}}\) is nonsingular (in our case, \(\varvec{J} _{\varvec{F}}\) is positive definite at \((\varvec{p}(\varvec{\lambda }_{0} ,\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\), and it is continuously differentiable). Also, there exist a continuously differentiable function \(\widetilde{\varvec{\theta }}\) and a set A such that
such that \(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\in A\) and
Let us define
As \(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\in A\), we conclude that
In other words, \(\widehat{\varvec{\theta }}(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}))\) is the minimizer of the function \(\psi \). On the other hand, applying (23),
and then \(\varvec{J}_{\varvec{F}}\) is positive definite at \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ),\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})))\). Therefore,
and by the \(\phi \)-divergence properties \(\widehat{\varvec{\theta } }(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ))=(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\), and
Further, we know that
The (i, j)th element of the \((t+u)\times g\) matrix \({\frac{\partial F_{j} }{\partial p_{i}}}\) is given by
and for \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta } _{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\) we have
Since \(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta } _{0})=\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}\varvec{J}(\varvec{\lambda }_{0} ,\varvec{\eta }_{0})\), we have
and
A first order Taylor expansion of the function \(\widehat{\varvec{\theta }}\) around \(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\) yields
But \(\widehat{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0}))=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}\), hence
It is well known that the nonparametric estimator \(\widehat{\varvec{p}}\) converges almost surely to the probability vector \(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0})\). Therefore \(\widehat{\varvec{p}}\in A\) and \(\widehat{\varvec{\theta } }(\widehat{\varvec{p}})\) is the unique solution of the system of equations
and also \((\widehat{\varvec{p}},\widetilde{\varvec{\theta } }(\widehat{\varvec{p}}))\in U\). Therefore, \(\widehat{\varvec{\theta } }(\widehat{\varvec{p}})\) is the minimum \(\phi \)-divergence estimator, \(\widehat{\varvec{\theta }}_{\phi }\), satisfying the relation
This finishes the proof. \(\square \)
1.2 Proof of Theorem 2
From the BAN decomposition obtained in the previous theorem, it holds that
By the Central Limit Theorem we conclude that \(\sqrt{N}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ))\underset{N\rightarrow \infty }{\overset{\mathcal {L}}{\longrightarrow }}\mathcal {N}(\varvec{0},\varvec{\Sigma }_{\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0})})\), where \(\varvec{\Sigma }_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} )}=\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta } _{0})\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\). The result now follows after some algebra. \(\square \)
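As a numerical sanity check on the covariance matrix \(\varvec{\Sigma }_{\varvec{p}}=\varvec{D}_{\varvec{p}}-\varvec{p}\varvec{p}^{T}\) appearing above, the following Python sketch (illustrative only, with an arbitrary probability vector of our choosing) verifies two of its defining properties: it annihilates the all-ones vector, because the components of \(\widehat{\varvec{p}}\) sum to one, and it is symmetric positive semidefinite.

```python
import numpy as np

# Multinomial-proportion covariance from Theorem 2: Sigma_p = diag(p) - p p^T.
p = np.array([0.1, 0.2, 0.3, 0.4])        # an arbitrary probability vector
Sigma = np.diag(p) - np.outer(p, p)

# Each row sums to zero: Sigma @ 1 = p - p * (p^T 1) = p - p = 0,
# reflecting the simplex constraint on p-hat.
row_sums = Sigma @ np.ones(len(p))

# Sigma is symmetric positive semidefinite: all eigenvalues are >= 0.
eigvals = np.linalg.eigvalsh(Sigma)
```

The singularity of \(\varvec{\Sigma }_{\varvec{p}}\) (the zero eigenvalue in the direction of the ones vector) is what ultimately reduces the degrees of freedom of the limiting chi-square distributions by one.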
1.3 Proof of Theorem 3
A second order Taylor expansion of \(D_{\phi _{1}}\left( \varvec{p} ,\varvec{q}\right) \) around \(\left( \varvec{p}\left( \varvec{\theta }_{0}\right) ,\varvec{p}\left( \varvec{\theta } _{0}\right) \right) \), evaluated at \(\left( \widehat{\varvec{p}},\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) \), is given by
By Theorem 1 we have
Therefore,
with \(\varvec{V}\left( \varvec{\theta }_{0}\right) :=\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}\left( \varvec{\theta }_{0}\right) \left( \varvec{A}\left( \varvec{\theta }_{0}\right) ^{T}\varvec{A}\left( \varvec{\theta }_{0}\right) \right) ^{-1}\varvec{A}\left( \varvec{\theta } _{0}\right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0} )}^{-1/2}\). On the other hand,
Then we have
and we conclude that
Notice that the asymptotic distribution of
coincides with the asymptotic distribution of the quadratic form
with
Now, as
we conclude that the asymptotic distribution of \(\varvec{x}^{T} \varvec{x}\) will be a chi-square distribution if the matrix
is idempotent and symmetric; in this case the degrees of freedom are given by the trace of the matrix \(\mathbf {Q}\left( {\varvec{\theta }}_{0}\right) \). Symmetry is evident. Establishing that the matrix \(\mathbf {Q}\left( {\varvec{\theta }}_{0}\right) \) is idempotent and that its trace is \(g-(u+t)-1\) is a simple but long and tedious exercise. \(\square \)
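The mechanism behind this degrees-of-freedom count can be illustrated numerically: for any symmetric idempotent matrix, the trace equals the rank, which is the number of chi-square degrees of freedom contributed by a quadratic form in standard normal variables. The sketch below builds a generic orthogonal projector from a toy full-rank matrix; the matrices `A`, `P`, `Q` and the dimensions `g`, `r` are ours for illustration and are not the paper's \(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\) or \(\mathbf {Q}(\varvec{\theta }_{0})\).

```python
import numpy as np

rng = np.random.default_rng(1)
g, r = 10, 4                           # toy sizes: g cells, r free parameters
A = rng.standard_normal((g, r))        # a full-column-rank matrix

# Orthogonal projector onto the column space of A, and its complement.
P = A @ np.linalg.inv(A.T @ A) @ A.T
Q = np.eye(g) - P

# Q is symmetric and idempotent (Q @ Q == Q), so trace(Q) = rank(Q) = g - r:
# this trace is exactly the chi-square degrees of freedom of z^T Q z
# for z ~ N(0, I_g).
```

In the theorem itself the projector is further corrected for the simplex constraint, which is why the trace works out to \(g-(u+t)-1\) rather than \(g-(u+t)\).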
1.4 Proof of Theorem 4
In Felipe et al. (2014) we established the asymptotic distribution of \(S_{A-B}^{\phi _{1},\phi _{2}}\) for LCMs for binary data. Here we establish the asymptotic distribution of \(T_{A-B}^{\phi _{1},\phi _{2}}\); the argument for the other statistic is similar. A second order Taylor expansion of \(D_{\phi _{1}}\left( \varvec{p},\varvec{q}\right) \) around \(\left( \varvec{p}\left( \varvec{\theta }_{0}\right) ,\varvec{p}\left( \varvec{\theta } _{0}\right) \right) \), evaluated at \(\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A}),\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}} ^{B})\right) \), is given by (see the proof of Theorem 1)
Therefore,
with
On the other hand, we already know that
Let us define
Consequently, the asymptotic distribution of \(\varvec{x}\) coincides with the asymptotic distribution of
Now,
where
Consequently, it suffices to show that \(\varvec{\Sigma }^{*}\) is a symmetric and idempotent matrix. Symmetry is trivial, hence it suffices to show idempotency. Notice that
Next, a direct computation shows that \(\varvec{W}_{A}\sqrt{\varvec{p} (\varvec{\theta }_{0})}=\varvec{W}_{B}\sqrt{\varvec{p} (\varvec{\theta }_{0})}=\varvec{0}\). Finally,
On the other hand,
hence we conclude that
and it is an idempotent matrix. We conclude that
To obtain the degrees of freedom we compute
This finishes the proof. \(\square \)
Felipe, A., Martín, N., Miranda, P. et al. Statistical inference in constrained latent class models for multinomial data based on \(\phi \)-divergence measures. Adv Data Anal Classif 12, 605–636 (2018). https://doi.org/10.1007/s11634-017-0289-7
Keywords
- Latent class models
- Minimum \(\phi \)-divergence estimator
- Maximum likelihood estimator
- \(\phi \)-Divergence test statistics
- Goodness-of-fit
- Nested latent class models