Abstract
In this article, we propose two classes of semiparametric mixture regression models with a single index for model-based clustering. Unlike many semiparametric/nonparametric mixture regression models, which can only handle low-dimensional predictors, the new models easily incorporate high-dimensional predictors into the nonparametric components. The proposed models are very general, and many recently proposed semiparametric/nonparametric mixture regression models are special cases of them. Backfitting estimates and corresponding modified EM algorithms are proposed to achieve optimal convergence rates for both the parametric and the nonparametric parts. We establish identifiability results for the two proposed models and investigate the asymptotic properties of the proposed estimation procedures. Simulation studies demonstrate the finite-sample performance of the proposed models, and two real data applications reveal some interesting findings.
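The modified EM algorithm mentioned above alternates a standard E-step with kernel-weighted local M-steps. Below is a minimal sketch of one such iteration for a k-component Gaussian mixture of regressions whose parameters vary with the single index \(z={\varvec{\alpha }}^\top {{\varvec{x}}}\), holding \({\varvec{\alpha }}\) fixed; the function name local_em_step, the grid/interpolation scheme, and the Epanechnikov kernel are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def epanechnikov(t):
    # Compactly supported kernel; satisfies the moment conditions in (C3).
    return 0.75 * np.maximum(1.0 - t ** 2, 0.0)

def local_em_step(x, y, alpha, z_grid, pi, m, sigma2, h):
    """One kernel-weighted EM iteration (alpha held fixed).

    pi, m, sigma2: (len(z_grid), k) arrays of current curve estimates on an
    increasing grid z_grid; h is the bandwidth.
    """
    z = x @ alpha  # observed single-index values
    k = pi.shape[1]
    # Evaluate the current curve estimates at the observed index values.
    pi_i = np.column_stack([np.interp(z, z_grid, pi[:, j]) for j in range(k)])
    m_i = np.column_stack([np.interp(z, z_grid, m[:, j]) for j in range(k)])
    s2_i = np.column_stack([np.interp(z, z_grid, sigma2[:, j]) for j in range(k)])

    # E-step: posterior component probabilities (responsibilities).
    dens = pi_i * norm.pdf(y[:, None], loc=m_i, scale=np.sqrt(s2_i))
    r = dens / dens.sum(axis=1, keepdims=True)

    # M-step: kernel-weighted updates at every grid point z0.
    pi_new, m_new, s2_new = np.empty_like(pi), np.empty_like(m), np.empty_like(sigma2)
    for g, z0 in enumerate(z_grid):
        w = epanechnikov((z - z0) / h)          # local weights around z0
        wr = w[:, None] * r
        sw = wr.sum(axis=0)
        pi_new[g] = sw / w.sum()
        m_new[g] = (wr * y[:, None]).sum(axis=0) / sw
        s2_new[g] = (wr * (y[:, None] - m_new[g]) ** 2).sum(axis=0) / sw
    return pi_new, m_new, s2_new
```

In the full backfitting procedure, such local updates would alternate with a profile step that re-estimates \({\varvec{\alpha }}\) on the unit sphere; that step is omitted in this sketch.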
References
Cao J, Yao W (2012) Semiparametric mixture of binomial regression with a degenerate component. Statistica Sinica 22:27–46
Chatterjee S, Handcock MS, Simonoff JS (1995) A casebook for a first course in statistics and data analysis. Wiley, New York
Chen J, Li P (2009) Hypothesis test for normal mixture models: the EM approach. Ann Stat 37:2523–2542
Cook RD, Li B (2002) Dimension reduction for conditional mean in regression. Ann Stat 30:455–474
Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman & Hall, London
Fan J, Zhang C, Zhang J (2001) Generalized likelihood ratio statistics and Wilks phenomenon. Ann Stat 29:153–193
Frühwirth-Schnatter S (2001) Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J Am Stat Assoc 96:194–209
Green PJ, Richardson S (2002) Hidden Markov models and disease mapping. J Am Stat Assoc 97:1055–1070
Härdle W, Hall P, Ichimura H (1993) Optimal smoothing in single-index models. Ann Stat 21:157–178
Hennig C (2000) Identifiability of models for clusterwise linear regression. J Classif 17:273–296
Hu H, Yao W, Wu Y (2017) The robust EM-type algorithms for log-concave mixtures of regression models. Comput Stat Data Anal 111:14–26
Huang M, Yao W (2012) Mixture of regression models with varying mixing proportions: a semiparametric approach. J Am Stat Assoc 107:711–724
Huang M, Li R, Wang S (2013) Nonparametric mixture of regression models. J Am Stat Assoc 108:929–941
Huang M, Li R, Wang H, Yao W (2014) Estimating mixture of Gaussian processes by kernel smoothing. J Bus Econ Stat 32:259–270
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econom 58:71–120
Jordan MI, Jacobs RA (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Comput 6:181–214
Li K (1991) Sliced inverse regression for dimension reduction. J Am Stat Assoc 86(414):316–327
Li P, Chen J (2010) Testing the order of a finite mixture. J Am Stat Assoc 105:1084–1092
Li B, Zha H, Chiaromonte F (2005) Contour regression: a general approach to dimension reduction. Ann Stat 33:1580–1616
Luo R, Wang H, Tsai CL (2009) Contour projected dimension reduction. Ann Stat 37:3743–3778
Ma Y, Zhu L (2012) A semiparametric approach to dimension reduction. J Am Stat Assoc 107(497):168–179
Ma Y, Zhu L (2013) Efficient estimation in sufficient dimension reduction. Ann Stat 41:250–268
Shao J (1993) Linear models selection by cross-validation. J Am Stat Assoc 88:486–494
Stephens M (2000) Dealing with label switching in mixture models. J R Stat Soc B 62:795–809
Titterington D, Smith A, Makov U (1985) Statistical analysis of finite mixture distributions. Wiley, New York
Wang H, Xia Y (2008) Sliced regression for dimension reduction. J Am Stat Assoc 103:811–821
Wang Q, Yao W (2012) An adaptive estimation of MAVE. J Multivar Anal 104:88–100
Wang S, Yao W, Huang M (2014) A note on the identifiability of nonparametric and semiparametric mixtures of GLMs. Stat Probab Lett 93:41–45
Wedel M, DeSarbo WS (1993) A latent class binomial logit methodology for the analysis of paired comparison data. Decis Sci 24:1157–1170
Xiang S, Yao W (2018) Semiparametric mixtures of nonparametric regressions. Ann Inst Stat Math 70:131–154
Xiang S, Yao W, Yang G (2019) An overview of semiparametric extensions of finite mixture models. Stat Sci 34:391–404
Yao W, Lindsay BG (2009) Bayesian mixture labeling by highest posterior density. J Am Stat Assoc 104:758–767
Yao W, Nandy D, Lindsay B, Chiaromonte F (2019) Covariate information matrix for sufficient dimension reduction. J Am Stat Assoc 114:1752–1764
Young DS, Hunter DR (2010) Mixtures of regressions with predictor-dependent mixing proportions. Comput Stat Data Anal 54:2253–2266
Zeng P (2012) Finite mixture of heteroscedastic single-index models. Open J Stat 2:12–20
Acknowledgements
The authors are grateful to the editor, the guest editor, and two referees for numerous helpful comments during the preparation of the article. Funding was provided by the National Natural Science Foundation of China (Grant No. 11601477), the National Science Foundation (USA) (Grant No. DMS-1461677), the Department of Energy (Grant No. 10006272), the First Class Discipline of Zhejiang - A (Zhejiang University of Finance and Economics - Statistics), China, and the Natural Science Foundation of Zhejiang Province (Grant No. LY19A010006).
Appendix A
Technical conditions
(C1) The sample \(\{({{\varvec{x}}}_i,Y_i),i=1,\ldots ,n\}\) is independent and identically distributed from its population \(({{\varvec{x}}},Y)\). The support of \({{\varvec{x}}}\), denoted by \(\mathscr {X}\), is a compact subset of \(\mathbb {R}^p\).
(C2) The marginal density of \({\varvec{\alpha }}^\top {{\varvec{x}}}\), denoted by \(f(\cdot )\), is twice continuously differentiable and positive at the point z.

(C3) The kernel function \(K(\cdot )\) has bounded support and satisfies
$$\begin{aligned}&\int K(t)dt=1,\qquad \int tK(t)dt=0,\qquad \int t^2K(t)dt<\infty ,\nonumber \\&\int K^2(t)dt<\infty ,\qquad \int |K^3(t)|dt<\infty . \end{aligned}$$

(C4) \(h\rightarrow 0\), \(nh\rightarrow \infty \), and \(nh^5=O(1)\) as \(n\rightarrow \infty \). (A kernel and bandwidth satisfying (C3)–(C4) are given in the example after this list.)

(C5) The third derivative is bounded: \(|\partial ^3\ell ({\varvec{\theta }},y)/\partial \theta _i\partial \theta _j\partial \theta _k|\le M(y)\) for all y and all \({\varvec{\theta }}\) in a neighborhood of \({\varvec{\theta }}(z)\), with \(E[M(y)]<\infty \).

(C6) The unknown functions \({\varvec{\theta }}(z)\) have continuous second derivatives. For \(j=1,\ldots ,k\), \(\sigma _j^2(z)>0\) and \(\pi _j(z)>0\) for all \({{\varvec{x}}}\in \mathscr {X}\).

(C7) For all i and j, the following conditions hold:
$$\begin{aligned} E\left[ \left| \frac{\partial \ell ({\varvec{\theta }}(z),Y)}{\partial \theta _i}\right| ^3\right]<\infty ,\qquad E\left[ \left( \frac{\partial ^2\ell ({\varvec{\theta }}(z),Y)}{\partial \theta _i\partial \theta _j}\right) ^2\right] <\infty . \end{aligned}$$

(C8) \({\varvec{\theta }}_0''(\cdot )\) is continuous at the point z.

(C9) The third derivative is bounded: \(|\partial ^3\ell ({\varvec{\pi }},y)/\partial \pi _i\partial \pi _j\partial \pi _k|\le M(y)\) for all y and all \({\varvec{\pi }}\) in a neighborhood of \({\varvec{\pi }}(z)\), with \(E[M(y)]<\infty \).

(C10) The unknown functions \({\varvec{\pi }}(z)\) have continuous second derivatives. For \(j=1,\ldots ,k\), \(\pi _j(z)>0\) for all \({{\varvec{x}}}\in \mathscr {X}\).

(C11) For all i and j, the following conditions hold:
$$\begin{aligned} E\left[ \left| \frac{\partial \ell ({\varvec{\pi }}(z),Y)}{\partial \pi _i}\right| ^3\right]<\infty ,\qquad E\left[ \left( \frac{\partial ^2\ell ({\varvec{\pi }}(z),Y)}{\partial \pi _i\partial \pi _j}\right) ^2\right] <\infty . \end{aligned}$$

(C12) \({\varvec{\pi }}''(\cdot )\) is continuous at the point z.
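For concreteness, here is one kernel/bandwidth pair meeting (C3)–(C4); it is an illustration, not the authors' prescribed choice:
$$\begin{aligned} K(t)=\tfrac{3}{4}(1-t^2)\mathbf{1}\{|t|\le 1\},\qquad h=cn^{-1/5}\ \text {for some}\ c>0. \end{aligned}$$
Indeed, \(\int K(t)dt=1\), \(\int tK(t)dt=0\) by symmetry, \(\int t^2K(t)dt=\tfrac{1}{5}<\infty \), and \(h\rightarrow 0\), \(nh=cn^{4/5}\rightarrow \infty \), \(nh^5=c^5=O(1)\) as \(n\rightarrow \infty \).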
Proof of Theorem 1
Ichimura (1993) has shown that under conditions (i)–(iv), \({\varvec{\alpha }}\) is identifiable. Further, Huang et al. (2013) showed that with condition (v), the nonparametric functions are identifiable. This completes the proof. \(\square \)
Proof of Theorem 2
Let
Define \(\hat{{\varvec{\pi }}}^*=(\hat{\pi }_1^*,\ldots ,\hat{\pi }_{k-1}^*)^\top \), \(\hat{{{\varvec{m}}}}^*=(\hat{m}_1^*,\ldots ,\hat{m}_k^*)^\top \), \(\hat{{{\varvec{\sigma }}}}^*=(\hat{\sigma }_1^*,\ldots ,\hat{\sigma }_k^*)^\top \) and denote \(\hat{{\varvec{\theta }}}^*=(\hat{{\varvec{\pi }}}^{*\top },\hat{{{\varvec{m}}}}^{*\top },(\hat{{{\varvec{\sigma }}}}^{*2})^\top )^\top \). Let \(a_n=(nh)^{-1/2}\) and
If \((\hat{{\varvec{\pi }}},\hat{{{\varvec{m}}}},\hat{{{\varvec{\sigma }}}}^2)^\top \) maximizes (4), then \(\hat{{\varvec{\theta }}}^*\) maximizes
with respect to \({\varvec{\theta }}^*\). By a Taylor expansion,
where
and
By the WLLN, it can be shown that \({{\varvec{A}}}_{1n}=-f(z)\mathscr {I}^{(1)}_\theta (z)+o_p(1)\). Therefore,
Using the quadratic approximation lemma (see, for example, Fan and Gijbels 1996), we have that
Note that
where
Since \(\sqrt{n}(\tilde{{\varvec{\alpha }}}-{\varvec{\alpha }})=O_p(1)\), it can be shown that
and
Therefore,
To complete the proof, we now calculate the mean and variance of \({{\varvec{W}}}_n\). Note that
Similarly, we can show that
where \(\kappa _l=\int t^lK(t)dt\) and \(\nu _l=\int t^lK^2(t)dt\). The rest of the proof follows a standard argument. \(\square \)
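To make the kernel constants concrete: for the Epanechnikov kernel of the example after the technical conditions (again an illustration, not the paper's choice), the moments \(\kappa _l=\int t^lK(t)dt\) and \(\nu _l=\int t^lK^2(t)dt\) evaluate to
$$\begin{aligned} \kappa _0=1,\qquad \kappa _1=0,\qquad \kappa _2=\tfrac{1}{5},\qquad \nu _0=\tfrac{3}{5},\qquad \nu _1=0,\qquad \nu _2=\tfrac{3}{35}, \end{aligned}$$
with \(\kappa _2\) driving the \(O(h^2)\) bias and \(\nu _0\) the leading variance term in expansions of this type.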
Proof of Theorem 3
Denote \(Z={\varvec{\alpha }}^\top {{\varvec{x}}}\) and \(\hat{Z}=\hat{{\varvec{\alpha }}}^\top {{\varvec{x}}}\). Let \(\ell ({\varvec{\theta }}(z),X,Y)=\log \sum _{j=1}^k\pi _j(z)\phi (Y|m_j(z),\sigma _j^2(z))\). If \(\hat{{\varvec{\theta }}}(z_0;\hat{{\varvec{\alpha }}})\) maximizes (4), then it solves
Applying a Taylor expansion and using the conditions on h, we obtain
By an argument similar to that in the previous proof,
Note that
where the second part is handled by (27).
Since \(\hat{{\varvec{\alpha }}}\) maximizes (9), it is the solution to
where \(\lambda \) is the Lagrange multiplier. By a Taylor expansion and using (28), we have that
Define
and apply (27),
Interchanging the summations in the last term, we get
Let \(\varGamma _\alpha =I-{\varvec{\alpha }}{\varvec{\alpha }}^\top +o_p(1)\). Combining (29) and (30) and multiplying by \(\varGamma _\alpha \), we have
It can be shown that the right-hand side of (31) has covariance matrix \(\varGamma _\alpha {{\varvec{Q}}}_1\varGamma _\alpha \), which completes the proof. \(\square \)
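To spell out the Lagrange step in its standard form (a sketch under the assumption that (9) is maximized subject to \(\Vert {\varvec{\alpha }}\Vert =1\), with \(\ell _n\) denoting the objective in (9)): the estimator solves
$$\begin{aligned} \frac{\partial }{\partial {\varvec{\alpha }}}\left\{ \ell _n({\varvec{\alpha }})-\lambda ({\varvec{\alpha }}^\top {\varvec{\alpha }}-1)\right\} =\mathbf{0}, \end{aligned}$$
and multiplying the resulting score equation by \(\varGamma _\alpha =I-{\varvec{\alpha }}{\varvec{\alpha }}^\top \) removes the \(\lambda {\varvec{\alpha }}\) term, since \(\varGamma _\alpha {\varvec{\alpha }}=\mathbf{0}\) when \(\Vert {\varvec{\alpha }}\Vert =1\); this is why the limiting covariance in (31) appears in the projected form \(\varGamma _\alpha {{\varvec{Q}}}_1\varGamma _\alpha \).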
Proof of Theorem 4
Ichimura (1993) has shown that under conditions (i)–(iv), \({\varvec{\alpha }}\) is identifiable. Furthermore, Huang and Yao (2012) showed that with condition (v), \(({\varvec{\pi }}(\cdot ),{\varvec{\beta }},{{\varvec{\sigma }}}^2)\) are identifiable. This completes the proof. \(\square \)
Proof of Theorem 5
This proof is similar to the proof of Theorem 2.
Let \(\hat{\pi }_j^*=\sqrt{nh}\{\hat{\pi }_j-\pi _j(z)\}\), \(j=1,\ldots ,k-1\), and \(\hat{{\varvec{\pi }}}^*=(\hat{\pi }_1^*,\ldots ,\hat{\pi }_{k-1}^*)^\top \). It can be shown that
where
To complete the proof, notice that
and \(\text {Cov}({{\varvec{W}}}_{2n})=f(z)\mathscr {I}^{(2)}_\pi (z)\nu _0+o_p(1)\). The rest of the proof follows a standard argument. \(\square \)
Proof of Theorem 6
The proof is similar to the proof of Theorem 3. It can be shown that
and therefore,
Since \(\hat{{{\varvec{\lambda }}}}\) maximizes (14), it is the solution to
where \(\gamma \) is the Lagrange multiplier. By a Taylor expansion and (32),
where \({{\varvec{\varLambda }}}_{1i}=\begin{pmatrix}{{\varvec{x}}}_i{\varvec{\pi }}'(Z_i)\\ \mathbf{I} \end{pmatrix}\), and the last equation results from interchanging the summations. Let \(\varGamma _\alpha =\begin{pmatrix}{{\varvec{I}}}-{\varvec{\alpha }}{\varvec{\alpha }}^\top &{}\mathbf 0 \\ \mathbf 0 &{}{{\varvec{I}}}\end{pmatrix}+o_p(1)\). By (33), multiplying by \(\varGamma _\alpha \), we have
It can be shown that the right-hand side of (34) has covariance matrix \(\varGamma _\alpha {{\varvec{Q}}}_2\varGamma _\alpha \), which completes the proof. \(\square \)
Cite this article
Xiang, S., Yao, W. Semiparametric mixtures of regressions with single-index for model based clustering. Adv Data Anal Classif 14, 261–292 (2020). https://doi.org/10.1007/s11634-020-00392-w