
Statistical inference in constrained latent class models for multinomial data based on \(\phi \)-divergence measures

  • Regular Article
  • Published in Advances in Data Analysis and Classification

Abstract

In this paper we explore the possibilities of applying \(\phi \)-divergence measures to inferential problems in the field of latent class models (LCMs) for multinomial data. We first treat the problem of estimating the model parameters. As explained below, the minimum \(\phi \)-divergence estimators (M\(\phi \)Es) considered in this paper are a natural extension of the maximum likelihood estimator (MLE), the usual estimator for this problem; we study the asymptotic properties of M\(\phi \)Es, showing that they share the same asymptotic distribution as the MLE. To compare the efficiency of the M\(\phi \)Es when the sample size is not large enough to apply the asymptotic results, we have carried out an extensive simulation study; from this study, we conclude that there are estimators in this family that are competitive with the MLE. Next, we deal with the problem of testing whether an LCM for multinomial data fits a data set; again, \(\phi \)-divergence measures can be used to generate a family of test statistics generalizing both the classical likelihood ratio test and the chi-squared test statistics. Finally, we treat the problem of choosing the best model out of a sequence of nested LCMs; as before, \(\phi \)-divergence measures can handle the problem, and we derive a family of \(\phi \)-divergence test statistics to this end; we study the asymptotic behavior of these test statistics, showing that it coincides with that of the classical test statistics. A simulation study for small and moderate sample sizes shows that some test statistics in the family can compete with the classical likelihood ratio and chi-squared test statistics.


Notes

  1. Formann (1992) proposed using the EM algorithm of Dempster et al. (1977) to compute the MLE; however, we prefer our own algorithm so that the different estimators can be compared under a common implementation. Our Fortran 95 code can be found at http://www.sites.google.com/site/nirianmartinswebsite/software.
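The family of estimators compared here can be sketched numerically. The following is a minimal, hypothetical Python illustration (not the authors' Fortran 95 code): a minimum \(\phi \)-divergence estimator for a toy two-class latent class model with two binary items, using the Cressie–Read power-divergence family, where \(\lambda \rightarrow 0\) gives the Kullback divergence (whose minimizer is the MLE) and \(\lambda =1\) the chi-square divergence. The model, grid search, and all parameter names are illustrative only.

```python
import numpy as np
from itertools import product

# Hedged sketch: minimum phi-divergence estimation in a toy 2-class latent
# class model with two binary items, using the Cressie-Read family
#   D_lambda(p, q) = sum_i q_i * phi_lambda(p_i / q_i),
#   phi_lambda(x)  = (x^(lambda+1) - x - lambda*(x-1)) / (lambda*(lambda+1)).

def power_divergence(p, q, lam):
    if abs(lam) < 1e-12:                       # Kullback-Leibler limit (MLE)
        return float(np.sum(p * np.log(p / q)))
    r = p / q
    return float(np.sum(q * (r**(lam + 1) - r - lam * (r - 1))) / (lam * (lam + 1)))

def cell_probs(w, a, b):
    """P(item1=i, item2=j) under 2 latent classes with response probs a, b."""
    p = np.zeros(4)
    for i, j in product((0, 1), repeat=2):
        p[2 * i + j] = (w * a[0]**i * (1 - a[0])**(1 - i) * a[1]**j * (1 - a[1])**(1 - j)
                        + (1 - w) * b[0]**i * (1 - b[0])**(1 - i) * b[1]**j * (1 - b[1])**(1 - j))
    return p

def min_phi_estimate(p_hat, lam, grid=np.linspace(0.05, 0.95, 19)):
    """Crude grid search for the minimum phi-divergence estimator."""
    best = None
    for w in grid:
        for a1 in grid:
            for b1 in grid:
                q = cell_probs(w, (a1, a1), (b1, b1))  # equal item probs per class
                d = power_divergence(p_hat, q, lam)
                if best is None or d < best[0]:
                    best = (d, w, a1, b1)
    return best

rng = np.random.default_rng(0)
truth = cell_probs(0.4, (0.8, 0.8), (0.2, 0.2))
counts = rng.multinomial(2000, truth)
p_hat = counts / counts.sum()
for lam in (0.0, 2 / 3, 1.0):                  # MLE-like, Cressie-Read, chi-square
    d, w, a1, b1 = min_phi_estimate(p_hat, lam)
    print(f"lambda={lam:4.2f}  divergence={d:.5f}  (w,a,b)=({w:.2f},{a1:.2f},{b1:.2f})")
```

The grid search is only for transparency; any numerical optimizer over the parameter space would serve the same purpose.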

References

  • Abar B, Loken E (2010) Self-regulated learning and self-directed study in a pre-college sample. Learn Individ Differ 20:25–29
  • Agresti A (1996) An introduction to categorical data analysis. Wiley, New York
  • Bartolucci F, Forcina A (2002) Extended RC association models allowing for order restrictions and marginal modelling. J Am Stat Assoc 97(460):1192–1199
  • Berkson J (1980) Minimum chi-square, not maximum likelihood! Ann Stat 8(3):457–487
  • Biemer P (2011) Latent class analysis and survey error. Wiley, New York
  • Birch M (1964) A new proof of the Pearson–Fisher theorem. Ann Math Stat 35:817–824
  • Bryant F, Satorra A (2012) Principles and practice of scaled difference chi-square testing. Struct Equ Model 19:372–398
  • Clogg C (1988) Latent class models for measuring. In: Latent trait and class models. Plenum, New York, pp 173–205
  • Collins L, Lanza S (2010) Latent class and latent transition analysis for the social, behavioral, and health sciences. Wiley, New York
  • Cressie N, Pardo L (2000) Minimum phi-divergence estimator and hierarchical testing in loglinear models. Stat Sin 10:867–884
  • Cressie N, Pardo L (2002) Phi-divergence statistics. In: El-Shaarawi A, Piegorsch W (eds) Encyclopedia of environmetrics, vol 13. Wiley, New York, pp 1551–1555
  • Cressie N, Read T (1984) Multinomial goodness-of-fit tests. J R Stat Soc Ser B 46:440–464
  • Cressie N, Pardo L, Pardo M (2003) Size and power considerations for testing loglinear models using \(\phi \)-divergence test statistics. Stat Sin 13(2):555–570
  • Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observations. Stud Sci Math Hung 2:299–318
  • Dale J (1986) Asymptotic normality of goodness-of-fit statistics for sparse product multinomials. J R Stat Soc Ser B 41:48–59
  • Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
  • Feldman B, Masyn K, Conger R (2009) New approaches to studying behaviors: a comparison of methods for modelling longitudinal, categorical adolescent drinking data. Dev Psychol 45(3):652–676
  • Felipe A, Miranda P, Martín N, Pardo L (2014) Phi-divergence test statistics for testing the validity of latent class models for binary data. arXiv:1407.2165
  • Felipe A, Miranda P, Pardo L (2015) Minimum \(\phi \)-divergence estimation in constrained latent class models for binary data. Psychometrika 80(4):1020–1042
  • Formann A (1982) Linear logistic latent class analysis. Biom J 24:171–190
  • Formann A (1985) Constrained latent class models: theory and applications. Br J Math Stat Psychol 38:87–111
  • Formann A (1992) Linear logistic latent class analysis for polytomous data. J Am Stat Assoc 87:476–486
  • Genge E (2014) A latent class analysis of the public attitude towards the euro adoption in Poland. Adv Data Anal Classif 8(4):427–442
  • Gnaldi M, Bacci S, Bartolucci F (2016) A multilevel finite mixture item response model to cluster examinees and schools. Adv Data Anal Classif 10(1):53–70
  • Goodman L (1974) Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61:215–231
  • Goodman L (1979) Simple models for the analysis of association in cross-classifications having ordered categories. J Am Stat Assoc 74:537–552
  • Hagenaars JA, McCutcheon A (2002) Applied latent class analysis. Cambridge University Press, Cambridge
  • Laska M, Pash K, Lust K, Story M, Ehlinger E (2009) Latent class analysis of lifestyle characteristics and health risk behaviors among college youth. Prev Sci 10:376–386
  • Lazarsfeld P (1950) The logical and mathematical foundation of latent structure analysis. In: Studies in social psychology in World War II, vol IV. Princeton University Press, Princeton, pp 362–412
  • Martín N, Mata R, Pardo L (2015) Comparing two treatments in terms of the likelihood ratio order. J Stat Comput Simul 85(17):3512–3534
  • Moon K, Hero A (2014) Multivariate \(f\)-divergence estimation with confidence. In: Advances in neural information processing systems, pp 2420–2428
  • Morales D, Pardo L, Vajda I (1995) Asymptotic divergence of estimators of discrete distributions. J Stat Plan Inference 48:347–369
  • Oberski DL (2016) Beyond the number of classes: separating substantive from non-substantive dependence in latent class analysis. Adv Data Anal Classif 10(2):171–182
  • Pardo L (2006) Statistical inference based on divergence measures. Chapman and Hall/CRC, Boca Raton
  • Satorra A, Bentler P (2010) Ensuring positiveness of the scaled difference chi-square test statistic. Psychometrika 75(2):243–248
  • Uebersax J (2009) Latent structure analysis. Web document: http://www.john-uebersax.com/stat/index.htm


Acknowledgements

We are very grateful to the associate editor as well as the anonymous referees for fruitful comments and remarks that have improved the final version of the paper.

Author information

Correspondence to L. Pardo.

Additional information

This paper was supported by the Spanish Grants MTM-2012-33740 and MTM-2015-67057.

Appendix


1.1 Proof of Theorem 1

We denote by \(l^{g}\) the interior of the g-dimensional unit cube, where \({g:=\prod _{i=1}^{k}g_{i}}\). The interior of \(\varDelta _{g}\) defined in (10) is contained in \(l^{g}\). Let \(W_{(\varvec{\lambda }_{0},\varvec{\eta }_{0})}\) be a neighborhood of \((\varvec{\lambda } _{0},\varvec{\eta }_{0})\), the true value of the unknown parameter \((\varvec{\lambda },\varvec{\eta })\), on which

$$\begin{aligned} \varvec{p}:\Theta&\rightarrow \varDelta _{g}\\ (\varvec{\lambda }^{T},\varvec{\eta }^{T})^{T}&\mapsto \varvec{p}^{T}(\varvec{\lambda },\varvec{\eta }):=(p_{1} (\varvec{\lambda },\varvec{\eta }),{\ldots },p_{g}(\varvec{\lambda },\varvec{\eta }))^{T} \end{aligned}$$

has continuous second partial derivatives. Consider the mapping

$$\begin{aligned} \varvec{F}:=(F_{1},{\ldots },F_{t+u})^{T}:l^{g}\times W_{(\varvec{\lambda }_{0},\varvec{\eta }_{0})}\rightarrow \mathbb {R}^{t+u}, \end{aligned}$$

whose components \(F_{j},\,j=1,{\ldots },t+u\) are defined by

$$\begin{aligned} F_{j}(\varvec{p};(\varvec{\lambda },\varvec{\eta })):={\frac{\partial D_{\phi }(\varvec{p},\varvec{p}(\varvec{\lambda },\varvec{\eta }))}{\partial \theta _{j}}},\,j=1,{\ldots },t+u, \end{aligned}$$

where \(\theta _{j}\) is either \(\lambda _{j}\) if \(j\le t\) or \(\eta _{j-t}\) if \(j>t\) and \(\varvec{p}\) is a g-dimensional probability vector.

Then \(F_{j},\,j=1,{\ldots },t+u\), vanish at \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\). Since

$$\begin{aligned} {\frac{\partial ^{2}D_{\phi }(\varvec{p},\varvec{p}(\varvec{\lambda },\varvec{\eta }))}{\partial \theta _{r}\partial \theta _{j}}}=\sum _{\nu =1}^{g}\phi ^{\prime \prime }\left( {\frac{\tilde{p}_{\nu }}{p_{\nu }(\varvec{\lambda },\varvec{\eta })}}\right) {\frac{\tilde{p}_{\nu }^{2}}{p_{\nu }(\varvec{\lambda },\varvec{\eta })^{3}}}{\frac{\partial p_{\nu }(\varvec{\lambda },\varvec{\eta })}{\partial \theta _{r}}}{\frac{\partial p_{\nu }(\varvec{\lambda },\varvec{\eta })}{\partial \theta _{j}}}, \end{aligned}$$

the \((t+u)\times (t+u)\) matrix \(\varvec{J}_{\varvec{F}} (\varvec{\theta }_{0})\) associated with function \(\varvec{F}\) at point \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\) is given by

$$\begin{aligned} {\frac{\partial \varvec{F}}{\partial (\varvec{\lambda }_{0} ,\varvec{\eta }_{0})}}&=\left. {\frac{\partial \varvec{F}}{\partial (\varvec{\lambda },\varvec{\eta })}}\right| _{(\varvec{p},(\varvec{\lambda },\varvec{\eta }))=(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda } _{0},\varvec{\eta }_{0}))}\\&=\phi ^{\prime \prime }(1)\left( \sum _{l=1}^{g}{\frac{1}{p_{l} (\varvec{\lambda }_{0},\varvec{\eta }_{0})}}{\frac{\partial p_{l}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}{\partial \theta _{r}} }{\frac{\partial p_{l}(\varvec{\lambda }_{0},\varvec{\eta }_{0} )}{\partial \theta _{j}}}\right) _{\overset{j=1,{\ldots },t+u}{r=1,{\ldots },t+u}}. \end{aligned}$$

Next, it is a simple algebra exercise to prove that \(\varvec{J}_{\varvec{F}}(\varvec{\theta }_{0})\) is nonsingular: since \(\varvec{J}_{\varvec{F}}(\varvec{\theta }_{0})=\phi ^{\prime \prime }(1)\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\), with \(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\) of full column rank and \(\phi ^{\prime \prime }(1)>0\), this matrix is nonsingular (indeed, positive definite) at the point \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\).
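The identity \(\varvec{J}_{\varvec{F}}(\varvec{\theta }_{0})=\phi ^{\prime \prime }(1)\varvec{A}^{T}\varvec{A}\) is easy to check numerically. The sketch below uses a hypothetical two-parameter multinomial model (not the paper's LCM) and the Pearson divergence \(\phi (x)=(x-1)^{2}/2\), for which \(\phi ^{\prime \prime }(1)=1\); it compares a finite-difference Hessian of \(\varvec{\theta }\mapsto D_{\phi }(\varvec{p}(\varvec{\theta }_{0}),\varvec{p}(\varvec{\theta }))\) with \(\varvec{A}^{T}\varvec{A}\), \(\varvec{A}=\varvec{D}_{\varvec{p}}^{-1/2}\varvec{J}\).

```python
import numpy as np

# Hedged numerical check on a toy 2-parameter multinomial model (softmax
# parametrization, illustrative only): the Hessian of the map
# theta -> D_phi(p(theta0), p(theta)) at theta0 equals phi''(1) * A^T A.

def p_model(theta):
    z = np.array([0.0, theta[0], theta[1], theta[0] + theta[1]])
    e = np.exp(z)
    return e / e.sum()

def d_phi(p, q):                              # Pearson divergence, phi(x)=(x-1)^2/2
    return 0.5 * np.sum((p - q) ** 2 / q)     # phi''(1) = 1

theta0 = np.array([0.3, -0.2])
p0 = p_model(theta0)
h = 1e-4

# Jacobian J = dp/dtheta and A = D_p^{-1/2} J by central differences
J = np.column_stack([(p_model(theta0 + h * e) - p_model(theta0 - h * e)) / (2 * h)
                     for e in np.eye(2)])
A = J / np.sqrt(p0)[:, None]

# Finite-difference Hessian of psi(theta) = D_phi(p(theta0), p(theta)) at theta0
def psi(t):
    return d_phi(p0, p_model(t))

H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = h * np.eye(2)[i], h * np.eye(2)[j]
        H[i, j] = (psi(theta0 + ei + ej) - psi(theta0 + ei - ej)
                   - psi(theta0 - ei + ej) + psi(theta0 - ei - ej)) / (4 * h * h)

print(np.round(H, 6))
print(np.round(A.T @ A, 6))                   # agrees with H up to O(h^2)
```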

Applying the Implicit Function Theorem, there exist a neighborhood U of \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\) on which the matrix \(\varvec{J}_{\varvec{F}}\) is nonsingular (in our case, \(\varvec{J}_{\varvec{F}}\) is positive definite at \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\)), a set A and a continuously differentiable function

$$\begin{aligned} \widetilde{\varvec{\theta }}:A\subset l^{g}\rightarrow \mathbb {R}^{t+u} \end{aligned}$$

such that \(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\in A\) and

$$\begin{aligned} \left\{ (\varvec{p},(\varvec{\lambda },\varvec{\eta }))\in U:\varvec{F}(\varvec{p},(\varvec{\lambda },\varvec{\eta }))=\varvec{0}\right\} =\left\{ (\varvec{p} ,\widetilde{\varvec{\theta }}(\varvec{p})):\varvec{p}\in A\right\} . \end{aligned}$$
(23)

Let us define

$$\begin{aligned} \psi (\varvec{\lambda },\varvec{\eta }):=D_{\phi }(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}),\varvec{p} (\varvec{\lambda },\varvec{\eta })). \end{aligned}$$

As \(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\in A\), we conclude that

$$\begin{aligned} \varvec{F}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta } _{0}),\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})))={\frac{\partial D_{\phi }(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}),\varvec{p} (\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0}))))}{\partial (\varvec{\lambda },\varvec{\eta })}}=\varvec{0}. \end{aligned}$$

Briefly speaking, \(\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\) is the minimum of the function \(\psi \). On the other hand, applying (23),

$$\begin{aligned} (\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ),\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})))\in U, \end{aligned}$$

and then \(\varvec{J}_{\varvec{F}}\) is positive definite at \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ),\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})))\). Therefore,

$$\begin{aligned} D_{\phi }(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta } _{0}),\varvec{p}(\widetilde{\varvec{\theta }}(\varvec{p} (\varvec{\lambda }_{0},\varvec{\eta }_{0}))))=\inf _{(\varvec{\lambda },\varvec{\eta })\in \Theta }D_{\phi }(\varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0}),\varvec{p}(\varvec{\lambda } ,\varvec{\eta })), \end{aligned}$$

and by the \(\phi \)-divergence properties, \(\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}\). Moreover, differentiating the identity \(\varvec{F}(\varvec{p},\widetilde{\varvec{\theta }}(\varvec{p}))=\varvec{0}\) with respect to \(\varvec{p}\),

$$\begin{aligned} {\frac{\partial \varvec{F}}{\partial \varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}}+{\frac{\partial \varvec{F}}{\partial (\varvec{\lambda }_{0},\varvec{\eta }_{0})}}{\frac{\partial (\varvec{\lambda }_{0},\varvec{\eta }_{0})}{\partial \varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}}=\varvec{0}. \end{aligned}$$

Further, we know that

$$\begin{aligned} {\frac{\partial \varvec{F}}{\partial (\varvec{\lambda }_{0} ,\varvec{\eta }_{0})}}=\phi ^{\prime \prime }(1)\varvec{A} (\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A} (\varvec{\lambda }_{0},\varvec{\eta }_{0}). \end{aligned}$$

The \((j,i)\)th element of the \((t+u)\times g\) matrix \({\frac{\partial F_{j}}{\partial p_{i}}}\) is given by

$$\begin{aligned} {\frac{\partial }{\partial p_{i}}}\left( {\frac{\partial D_{\phi }(\varvec{p},\varvec{p}(\varvec{\lambda },\varvec{\eta } ))}{\partial \theta _{j}}}\right) ={\frac{1}{p_{i}(\varvec{\lambda },\varvec{\eta })}}\left( -{\frac{p_{i}}{p_{i}(\varvec{\lambda },\varvec{\eta })}}\phi ^{\prime \prime }\left( {\frac{p_{i}}{p_{i} (\varvec{\lambda },\varvec{\eta })}}\right) \right) {\frac{\partial p_{i}(\varvec{\lambda },\varvec{\eta })}{\partial \theta _{j}}} \end{aligned}$$

and for \((\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta } _{0}),(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\) we have

$$\begin{aligned} {\frac{\partial }{\partial p_{i}}}\left( {\frac{\partial D_{\phi }(\varvec{p},\varvec{p}(\varvec{\lambda },\varvec{\eta }))}{\partial \theta _{j}}}\right) =-{\frac{1}{p_{i}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}}\phi ^{\prime \prime }\left( 1\right) {\frac{\partial p_{i}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}{\partial \theta _{j}}}. \end{aligned}$$

Since \(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta } _{0})=\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}\varvec{J}(\varvec{\lambda }_{0} ,\varvec{\eta }_{0})\), then

$$\begin{aligned} {\frac{\partial \varvec{F}}{\partial \varvec{p}(\varvec{\lambda } _{0},\varvec{\eta }_{0})}}=-\phi ^{\prime \prime }(1)\varvec{A} (\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D} _{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}} \end{aligned}$$

and

$$\begin{aligned} {\frac{\partial (\varvec{\lambda }_{0},\varvec{\eta }_{0})}{\partial \varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})} }=(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0} )^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0} ))^{-1}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0} )^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}. \end{aligned}$$

A first order Taylor expansion of the function \(\widetilde{\varvec{\theta }}\) around \(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\) yields

$$\begin{aligned} \widetilde{\varvec{\theta }}(\varvec{p})=\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))+\left. {\frac{\partial \widetilde{\varvec{\theta }}(\varvec{p})}{\partial \varvec{p}}}\right| _{\varvec{p}=\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}(\varvec{p}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))+o(\Vert \varvec{p}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\Vert ). \end{aligned}$$

But \(\widetilde{\varvec{\theta }}(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}\), hence

$$\begin{aligned} \widetilde{\varvec{\theta }}(\varvec{p})&=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}+(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))^{-1}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}(\varvec{p}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\\&\quad +\,o(\Vert \varvec{p}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\Vert ). \end{aligned}$$

It is well known that the nonparametric estimator \(\widehat{\varvec{p}}\) converges almost surely to the probability vector \(\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\). Therefore, for N large enough, \(\widehat{\varvec{p}}\in A\) and \(\widetilde{\varvec{\theta }}(\widehat{\varvec{p}})\) is the unique solution of the system of equations

$$\begin{aligned} {\frac{\partial D_{\phi }(\widehat{\varvec{p}},\varvec{p}(\widetilde{\varvec{\theta }}(\widehat{\varvec{p}})))}{\partial \theta _{j}}}=0,\,j=1,{\ldots },t+u, \end{aligned}$$

and also \((\widehat{\varvec{p}},\widetilde{\varvec{\theta }}(\widehat{\varvec{p}}))\in U\). Therefore, \(\widetilde{\varvec{\theta }}(\widehat{\varvec{p}})\) is the minimum \(\phi \)-divergence estimator, \(\widehat{\varvec{\theta }}_{\phi }\), satisfying the relation

$$\begin{aligned} \widehat{\varvec{\theta }}_{\phi }&=(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}+(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))^{-1}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\\&\quad +\,o_{p}(N^{-1/2}). \end{aligned}$$

This finishes the proof. \(\square \)

1.2 Proof of Theorem 2

From the best asymptotically normal (BAN) decomposition obtained in the previous theorem, we have

$$\begin{aligned} \sqrt{N}(\widehat{\varvec{\theta }}_{\phi }-(\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T})&=\left( \varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\right) ^{-1}\\&\quad \times \varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}\sqrt{N}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))+o_{p}(1). \end{aligned}$$

By the Central Limit Theorem, \(\sqrt{N}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}(\varvec{0},\varvec{\Sigma }_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})})\), where \(\varvec{\Sigma }_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}=\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\). Now the result follows after some algebra. \(\square \)
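The covariance matrix used in this step is the standard multinomial one. A small Monte Carlo sketch (with illustrative toy cell probabilities, not from the paper) confirms that \(\sqrt{N}(\widehat{\varvec{p}}-\varvec{p})\) has covariance \(\varvec{D}_{\varvec{p}}-\varvec{p}\varvec{p}^{T}\):

```python
import numpy as np

# Monte Carlo sketch: empirical covariance of sqrt(N)*(p_hat - p) for
# multinomial proportions versus Sigma_p = D_p - p p^T (toy probabilities).

rng = np.random.default_rng(42)
p = np.array([0.1, 0.2, 0.3, 0.4])
N, reps = 500, 20000

samples = rng.multinomial(N, p, size=reps) / N      # reps draws of p_hat
Z = np.sqrt(N) * (samples - p)
emp_cov = Z.T @ Z / reps
sigma = np.diag(p) - np.outer(p, p)
print(np.round(emp_cov, 3))
print(np.round(sigma, 3))                           # the two should agree closely
```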

1.3 Proof of Theorem 3

A second order Taylor expansion of \(D_{\phi _{1}}\left( \varvec{p} ,\varvec{q}\right) \) around \(\left( \varvec{p}\left( \varvec{\theta }_{0}\right) ,\varvec{p}\left( \varvec{\theta } _{0}\right) \right) \) at \(\left( \widehat{\varvec{p}},\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) \) is given by

$$\begin{aligned} D_{\phi _{1}}\left( \widehat{\varvec{p}},\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) ={\frac{\phi _{1} ^{\prime \prime }(1)}{2}}\text { }\left( \widehat{\varvec{p}}-\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) ^{T}\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{-1}\left( \widehat{\varvec{p} }-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}})\right) +o_{p}(N^{-1}). \end{aligned}$$

By Theorem 1 we have

$$\begin{aligned} \widehat{\varvec{\theta }}_{\phi _{2}}= & {} (\varvec{\lambda }_{0}^{T},\varvec{\eta }_{0}^{T})^{T}+(\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))^{-1}\varvec{A}(\varvec{\lambda }_{0},\varvec{\eta }_{0})^{T}\varvec{D}_{\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0})}^{-{\frac{1}{2}}}(\widehat{\varvec{p}}-\varvec{p}(\varvec{\lambda }_{0},\varvec{\eta }_{0}))\nonumber \\&+\,o_{p}(N^{-1/2}). \end{aligned}$$

Therefore,

$$\begin{aligned} \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}})-\varvec{p}\left( \varvec{\theta }_{0}\right) =\varvec{V}\left( \varvec{\theta } _{0}\right) \left( \hat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}(N^{-1/2}) \end{aligned}$$

with \(\varvec{V}\left( \varvec{\theta }_{0}\right) :=\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{1/2}\varvec{A}\left( \varvec{\theta }_{0}\right) \left( \varvec{A}\left( \varvec{\theta }_{0}\right) ^{T}\varvec{A}\left( \varvec{\theta }_{0}\right) \right) ^{-1}\varvec{A}\left( \varvec{\theta }_{0}\right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\). On the other hand,

$$\begin{aligned} \sqrt{N}\left( \hat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0},\varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\right) . \end{aligned}$$

Then we have

$$\begin{aligned} \widehat{\varvec{p}}-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}})=\left( \varvec{I}-\varvec{V}\left( {\varvec{\theta }_{0}}\right) \right) \left( \hat{\varvec{p}}-\varvec{p}(\varvec{\theta }_{0})\right) +o_{p}(N^{-1/2}), \end{aligned}$$

and we conclude that

$$\begin{aligned} \sqrt{N}\left( \widehat{\varvec{p}}-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}})\right) \overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0},\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) \varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) ^{T}\right) . \end{aligned}$$

Notice that the asymptotic distribution of

$$\begin{aligned} \frac{2N}{\phi _{1}^{\prime \prime }(1)}D_{\phi _{1}}\left( \widehat{\varvec{p}},\varvec{p}(\widehat{\varvec{\theta }} _{\phi _{2}})\right) \end{aligned}$$

coincides with the asymptotic distribution of the quadratic form

$$\begin{aligned} N\left( \widehat{\varvec{p}}-\varvec{p}(\widehat{\varvec{\theta } }_{\phi _{2}})\right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1}\left( \widehat{\varvec{p}}-\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) =\varvec{X} ^{T}\varvec{X}, \end{aligned}$$

with

$$\begin{aligned} \varvec{X}:=\sqrt{N}\varvec{D}_{\varvec{p}(\varvec{\theta } _{0})}^{-1/2}\left( \widehat{\varvec{p}}-\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}})\right) . \end{aligned}$$

Now, as

$$\begin{aligned} \varvec{X}\overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0},\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) \varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\right) , \end{aligned}$$

we conclude that the asymptotic distribution of \(\varvec{X}^{T}\varvec{X}\) will be a chi-square distribution if the matrix

$$\begin{aligned} \mathbf {Q}\left( {\varvec{\theta }}_{0}\right) :=\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) \varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\left( \varvec{I}-\varvec{V}\left( \varvec{\theta }_{0}\right) \right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2} \end{aligned}$$

is idempotent and symmetric; in that case the degrees of freedom are given by the trace of the matrix \(\mathbf {Q}\left( {\varvec{\theta }}_{0}\right) \). Symmetry is evident. Establishing that the matrix \(\mathbf {Q}\left( {\varvec{\theta }}_{0}\right) \) is idempotent and that its trace is \(g-(u+t)-1\) is a simple but long and tedious exercise. \(\square \)
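The idempotency and trace claims can be illustrated numerically. The sketch below (all matrices illustrative) builds a random full-rank Jacobian \(\varvec{J}\) whose columns sum to zero, as derivatives of cell probabilities do, forms \(\varvec{A}=\varvec{D}_{\varvec{p}}^{-1/2}\varvec{J}\), \(\varvec{V}=\varvec{D}_{\varvec{p}}^{1/2}\varvec{A}(\varvec{A}^{T}\varvec{A})^{-1}\varvec{A}^{T}\varvec{D}_{\varvec{p}}^{-1/2}\) and \(\varvec{Q}=\varvec{D}_{\varvec{p}}^{-1/2}(\varvec{I}-\varvec{V})\varvec{\Sigma }_{\varvec{p}}(\varvec{I}-\varvec{V})^{T}\varvec{D}_{\varvec{p}}^{-1/2}\), and checks symmetry, idempotency and the trace \(g-(t+u)-1\).

```python
import numpy as np

# Numerical sketch (random, illustrative matrices): Q is a symmetric,
# idempotent matrix with trace g - (t+u) - 1 when J^T 1 = 0 and J has
# full column rank.

rng = np.random.default_rng(1)
g, k = 7, 3                                   # g cells, k = t+u parameters
p = rng.dirichlet(np.ones(g))                 # positive cell probabilities
J = rng.standard_normal((g, k))
J -= J.mean(axis=0)                           # enforce J^T 1 = 0
D_sqrt = np.diag(np.sqrt(p))
D_inv_sqrt = np.diag(1 / np.sqrt(p))
A = D_inv_sqrt @ J

V = D_sqrt @ A @ np.linalg.inv(A.T @ A) @ A.T @ D_inv_sqrt
Sigma = np.diag(p) - np.outer(p, p)
I = np.eye(g)
Q = D_inv_sqrt @ (I - V) @ Sigma @ (I - V).T @ D_inv_sqrt

print(np.allclose(Q @ Q, Q), np.allclose(Q, Q.T), round(np.trace(Q), 6))
# expect: True True 3.0  (= g - k - 1)
```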

1.4 Proof of Theorem 4

In Felipe et al. (2014) we established the asymptotic distribution of \(S_{A-B}^{\phi _{1},\phi _{2}}\) for LCMs for binary data. Let us establish here the asymptotic distribution of \(T_{A-B}^{\phi _{1},\phi _{2}}\); the other case is similar. A second order Taylor expansion of \(D_{\phi _{1}}\left( \varvec{p},\varvec{q}\right) \) around \(\left( \varvec{p}\left( \varvec{\theta }_{0}\right) ,\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \) at \(\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A}),\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right) \) is given by (see the proof of Theorem 3)

$$\begin{aligned} D_{\phi _{1}}\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A}),\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right)&={\frac{\phi _{1}^{\prime \prime }(1)}{2}}\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right) ^{T}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1}\left( \varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right) \\&\quad +o(||\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p}(\varvec{\theta }_{0})||^{2})+o(||\varvec{p}(\widehat{\varvec{\theta }}_{\phi _{2}}^{B})-\varvec{p}(\varvec{\theta }_{0})||^{2}). \end{aligned}$$

Therefore,

$$\begin{aligned} T_{A-B}^{\phi _{1},\phi _{2}}=\varvec{X}_{A-B}^{T}\varvec{X}_{A-B}+o_{p}(1), \end{aligned}$$

with

$$\begin{aligned} \varvec{X}_{A-B}:=\sqrt{N}\varvec{D}_{\varvec{p} (\varvec{\theta }_{0})}^{-1/2}\left( \varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}}^{A})-\varvec{p} (\widehat{\varvec{\theta }}_{\phi _{2}}^{B})\right) . \end{aligned}$$

On the other hand, we already know that

$$\begin{aligned} \widehat{\varvec{\theta }}_{\phi _{2}}^{A}-\varvec{\theta }_{0}&=\varvec{J}\left( \varvec{\theta }_{0}\right) \left( \varvec{A} _{A}^{T}\varvec{A}_{A}\right) ^{-1}\varvec{A}_{A}^{T}\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \hat{\varvec{p} }-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}(N^{-1/2}),\\ \widehat{\varvec{\theta }}_{\phi _{2}}^{B}-\varvec{\theta }_{0}&=\varvec{J}\left( \varvec{\theta }_{0}\right) \left( \varvec{A} _{B}^{T}\varvec{A}_{B}\right) ^{-1}\varvec{A}_{B}^{T}\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \hat{\varvec{p} }-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) +o_{p}(N^{-1/2}). \end{aligned}$$

Let us define

$$\begin{aligned} \varvec{W}_{A}:=\varvec{A}_{A}\left( \varvec{A}_{A} ^{T}\varvec{A}_{A}\right) ^{-1}\varvec{A}_{A}^{T},\quad \varvec{W} _{B}:=\varvec{A}_{B}\left( \varvec{A}_{B}^{T}\varvec{A} _{B}\right) ^{-1}\varvec{A}_{B}^{T}. \end{aligned}$$

Consequently, the asymptotic distribution of \(\varvec{X}_{A-B}\) coincides with the asymptotic distribution of

$$\begin{aligned} \sqrt{N}\left( \varvec{W}_{A}-\varvec{W}_{B}\right) \varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \hat{\varvec{p} }-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) . \end{aligned}$$

Now,

$$\begin{aligned} \sqrt{N}\left( \hat{\varvec{p}}-\varvec{p}\left( \varvec{\theta }_{0}\right) \right) \overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\mathcal {N}\left( \varvec{0,\Sigma }^{*}\right) \end{aligned}$$

where

$$\begin{aligned} \varvec{\Sigma }^{*}=\left( \varvec{W}_{A}-\varvec{W} _{B}\right) \varvec{D}_{\varvec{p}(\varvec{\theta }_{0})} ^{-1/2}\varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0} )}\varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{{-1/2}}\left( \varvec{W}_{A}-\varvec{W}_{B}\right) . \end{aligned}$$

Consequently, it suffices to show that \(\varvec{\Sigma }^{*}\) is a symmetric and idempotent matrix. Symmetry is trivial, hence it suffices to show idempotency. Notice that

$$\begin{aligned} \varvec{D}_{\varvec{p}(\varvec{\theta }_{0})}^{-1/2} \varvec{\Sigma }_{\varvec{p}(\varvec{\theta }_{0})}\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{{-1/2}}&=\varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{-1/2}\left( \varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}-\varvec{p}(\varvec{\theta }_{0})\varvec{p}(\varvec{\theta }_{0})^{T}\right) \varvec{D} _{\varvec{p}(\varvec{\theta }_{0})}^{{-1/2}}\\&=\varvec{I}-\sqrt{\varvec{p}(\varvec{\theta }_{0})} \sqrt{\varvec{p}(\varvec{\theta }_{0})}^{T}. \end{aligned}$$

Next, it can be computed that \(\varvec{W}_{A}\sqrt{\varvec{p} (\varvec{\theta }_{0})}=\varvec{W}_{B}\sqrt{\varvec{p} (\varvec{\theta }_{0})}=\varvec{0}\). Finally,

$$\begin{aligned} \varvec{\Sigma }^{*}=\left( \varvec{W}_{A}-\varvec{W}_{B}\right) \left( \varvec{I}-\sqrt{\varvec{p}(\varvec{\theta }_{0})}\sqrt{\varvec{p}(\varvec{\theta }_{0})}^{T}\right) \left( \varvec{W}_{A}-\varvec{W}_{B}\right) =\left( \varvec{W}_{A}-\varvec{W}_{B}\right) \left( \varvec{W}_{A}-\varvec{W}_{B}\right) . \end{aligned}$$

On the other hand, since the models are nested, the column space of \(\varvec{A}_{B}\) is contained in that of \(\varvec{A}_{A}\), so that

$$\begin{aligned} \varvec{W}_{A}\varvec{W}_{B}=\varvec{W}_{B},\,\varvec{W} _{B}\varvec{W}_{A}=\varvec{W}_{B},\,\varvec{W}_{A}\varvec{W} _{A}=\varvec{W}_{A},\,\varvec{W}_{B}\varvec{W}_{B}=\varvec{W} _{B}, \end{aligned}$$

hence we conclude that

$$\begin{aligned} \varvec{\Sigma }^{*}=\varvec{W}_{A}-\varvec{W}_{B} \end{aligned}$$

which is an idempotent matrix. We conclude that

$$\begin{aligned} T_{A-B}^{\phi _{1},\phi _{2}}\overset{\mathcal {L}}{\underset{N\rightarrow \infty }{\longrightarrow }}\chi _{\hbox {tr}(\varvec{\Sigma }^{*})}^{2}. \end{aligned}$$

To obtain the degrees of freedom we compute

$$\begin{aligned} \hbox {tr}(\varvec{\Sigma }^{*})&=\hbox {tr}\left( \varvec{W} _{A}-\varvec{W}_{B}\right) \\&=\hbox {tr}\left( \varvec{W}_{A}\right) -\hbox {tr}\left( \varvec{W}_{B}\right) \\&=\hbox {tr}\left( \varvec{A}_{A}\left( \varvec{A}_{A} ^{T}\varvec{A}_{A}\right) ^{-1}\varvec{A}_{A}^{T}\right) -\hbox {tr}\left( \varvec{A}_{B}\left( \varvec{A}_{B} ^{T}\varvec{A}_{B}\right) ^{-1}\varvec{A}_{B}^{T}\right) \\&=\hbox {tr}\left( \varvec{A}_{A}^{T}\varvec{A}_{A}\left( \varvec{A}_{A}^{T}\varvec{A}_{A}\right) ^{-1}\right) -\hbox {tr} \left( \varvec{A}_{B}^{T}\varvec{A}_{B}\left( \varvec{A}_{B} ^{T}\varvec{A}_{B}\right) ^{-1}\right) \\&=h_{1}-h_{2}. \end{aligned}$$

This finishes the proof. \(\square \)
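The projection identities used in this last computation are easy to verify numerically. The sketch below uses random illustrative matrices with the column space of \(\varvec{A}_{B}\) nested in that of \(\varvec{A}_{A}\), as for nested models, and checks that \(\varvec{W}_{A}\varvec{W}_{B}=\varvec{W}_{B}\), that \(\varvec{W}_{A}-\varvec{W}_{B}\) is idempotent, and that its trace is \(h_{1}-h_{2}\).

```python
import numpy as np

# Numerical sketch (random, illustrative matrices): with col(A_B) contained
# in col(A_A), the difference of projection matrices W_A - W_B is itself a
# projection with trace h1 - h2.

rng = np.random.default_rng(2)
g, h1, h2 = 8, 5, 2
A_A = rng.standard_normal((g, h1))
A_B = A_A @ rng.standard_normal((h1, h2))     # col(A_B) inside col(A_A)

def proj(M):
    """Orthogonal projection onto the column space of M."""
    return M @ np.linalg.inv(M.T @ M) @ M.T

W_A, W_B = proj(A_A), proj(A_B)
S = W_A - W_B
print(np.allclose(W_A @ W_B, W_B),            # W_A W_B = W_B
      np.allclose(S @ S, S),                  # idempotent
      round(np.trace(S), 6))                  # h1 - h2 = 3
# expect: True True 3.0
```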


About this article


Cite this article

Felipe, A., Martín, N., Miranda, P. et al. Statistical inference in constrained latent class models for multinomial data based on \(\phi \)-divergence measures. Adv Data Anal Classif 12, 605–636 (2018). https://doi.org/10.1007/s11634-017-0289-7

