Estimation of a zero-inflated Poisson regression model with missing covariates via nonparametric multiple imputation methods

Shen-Ming Lee,
T. Martin Lukusa &
Chin-Shang Li

Abstract

Zero-inflated Poisson (ZIP) regression is widely applied to model the effects of covariates on a count outcome with excess zeros. In some applications, covariates in a ZIP regression model are only partially observed. Based on the imputed data generated by applying the multiple imputation (MI) schemes developed by Wang and Chen (Ann Stat 37:490–517, 2009), two methods are proposed to estimate the parameters of a ZIP regression model with covariates missing at random. One, proposed by Rubin (in: Proceedings of the survey research methods section of the American Statistical Association, 1978), consists of obtaining a unified estimate as the average of the estimates from all imputed datasets. The other, proposed by Fay (J Am Stat Assoc 91:490–498, 1996), consists of averaging the estimating scores from all imputed datasets to solve the imputed estimating equation. Moreover, it is shown that the two proposed estimation methods are asymptotically equivalent to the semiparametric inverse probability weighting method. A modified formula is proposed to estimate the variances of the MI estimators. An extensive simulation study is conducted to investigate the performance of the estimation methods. The practicality of the methodology is illustrated with a dataset from a motorcycle survey of traffic regulations.

References

Barry SC, Welsh AH (2002) Generalized additive modeling and zero-inflated count data. Ecol Model 157:179–188
Böhning D, Dietz E, Schlattmann P, Mendonça L, Kirchner U (1999) The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Stat Soc Ser A 162:195–209
Cameron AC, Trivedi PK (2013) Regression analysis of count data, 2nd edn. Cambridge University Press, New York
Clayton D, Spiegelhalter D, Dunn G, Pickles A (1998) Analysis of longitudinal binary data from multiphase sampling (with discussion). J R Stat Soc Ser B 60:71–87
Chen XD, Fu YZ (2011) Model selection for zero-inflated regression with missing covariates. Comput Stat Data Anal 55:765–773
Cheung YB (2002) Zero-inflated models for regression analysis of count data, a study of growth and development. Stat Med 21:1461–1469
Creemers A, Aerts M, Hens N, Molenberghs G (2012) A nonparametric approach to weighted estimating equations for regression analysis with missing covariates. Comput Stat Data Anal 56:100–113
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Deng D, Paul SR (2000) Score tests for zero inflation in generalized linear models. Can J Stat 27:563–570
Deng D, Paul SR (2005) Score tests for zero-inflation and over-dispersion in generalized linear models. Stat Sin 15:257–276
Dietz K, Böhning D (1997) The use of two-component mixture models with one completely or partly known component. Comput Stat 12:219–234
Fay RE (1996) Alternative paradigms for the analysis of imputed survey data. J Am Stat Assoc 91:490–498
Hall DB, Shen J (2010) Robust estimation for zero-inflated Poisson regression. Scand J Stat 37:237–252
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Hsieh SH, Lee SM, Shen PS (2009) Semiparametric analysis of randomized response data with missing covariates in logistic regression. Comput Stat Data Anal 53:2673–2692
Hsieh SH, Lee SM, Shen PS (2010) Logistic regression analysis of randomized response data with missing covariates. J Stat Plan Inference 140:927–940
Huang L, Zheng D, Zalkikar J, Tiwari R (2017) Zero-inflated Poisson model based likelihood ratio test for drug safety signal detection. Stat Methods Med Res 26:471–488
Jansakul N, Hinde JP (2002) Score tests for zero-inflated Poisson models. Comput Stat Data Anal 40:75–96
Johnson NL, Kemp AW, Kotz S (2005) Univariate discrete distributions, 3rd edn. Wiley, New York
Lambert D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34:1–14
Lee SM, Gee MJ, Hsieh SH (2011) Semiparametric methods in the proportional odds model for ordinal response data with missing covariates. Biometrics 67:788–798
Lee JH, Han G, Fulp WJ, Giuliano AR (2012a) Analysis of overdispersed count data: application to the human papillomavirus infection in men (HIM) study. Epidemiol Infect 140:1087–1094
Lee SM, Li CS, Hsieh SH, Huang LH (2012b) Semiparametric estimation of logistic regression model with missing covariates and outcome. Metrika 75:621–653
Lee SM, Hwang WH, Tapsoba JD (2016) Estimation in closed capture-recapture models when covariates are missing at random. Biometrics 72:1294–1304
Li CS (2011) A lack-of-fit test for parametric zero-inflated Poisson models. J Stat Comput Simul 81:1081–1098
Li CS (2012) Score test for semiparametric zero-inflated Poisson model. Int J Stat Probab 1:1–7
Little RJA (1992) Regression with missing X’s: a review. J Am Stat Assoc 87:1227–1237
Liu H, Powers DA (2007) Growth curve models for zero-inflated count data: An application to smoking behavior. Struct Equ Model Multidiscip J 14:247–279
Lu SE, Lin Y, Shih WCJ (2004) Analyzing excessive no changes in clinical trials with clustered data. Biometrics 60:257–267
Lukusa TM, Lee SM, Li CS (2016) Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates. Metrika 79:457–483
Lukusa TM, Lee SM, Li CS (2017) Review of zero-inflated models with missing data. Curr Res Biostat 7:1–12
Mullahy J (1986) Specification and testing of some modified count data models. J Econ 33:341–365
Pahel BT, Preisser JS, Stearns SC, Rozier RG (2011) Multiple imputation of dental caries data using a zero-inflated Poisson regression model. J Public Health Dent 71:71–78
Reilly M, Pepe MS (1995) A mean score method for missing and auxiliary covariates data in regression methods. Biometrika 82:299–314
Ridout M, Demetrio CGB, Hinde J (1998) Models for count data with many zeros. In: 19th international biometric conference, Cape Town, pp 179–192
Righi P, Falorsi S, Fasulo A (2014) Methods for variance estimation under random hot deck imputation in business surveys. Rivista Di Statistica Ufficiale N 1–2(2014):45–64
Robins JM, Wang N (2000) Inference for imputation estimators. Biometrika 87:113–124
Robins JM, Rotnitzky A, Zhao LP (1994) Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 89:846–866
Rubin DB (1976) Inference and missing data. Biometrika 63:581–592
Rubin DB (1978) Multiple imputations in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Proceedings of the survey research methods section of the American Statistical Association, vol 1. American Statistical Association, Boston, pp 20–28
Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York
Rubin DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91:473–489
Rubin DB, Schenker N (1986) Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81:366–374
Samani EB, Ganjali M, Amirian Y (2012) Zero-inflated power series joint model to analyze count data with missing responses. J Stat Theor Pract 6:334–343
Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Methods 7:147–177
Singh S (1963) A note on inflated Poisson distribution. J Indian Stat Assoc 1:140–144
Van den Broek J (1995) A score test for zero inflation in a Poisson distribution. Biometrics 51:738–743
Wang S, Wang CY (2001) A note on kernel assisted estimators in missing covariate regression. Stat Probab Lett 55:439–449
Wang D, Chen SX (2009) Empirical likelihood for estimating equations with missing values. Ann Stat 37:490–517
Wang CY, Wang S, Zhao LP, Ou ST (1997) Weighted semiparametric estimation in regression with missing covariate data. J Am Stat Assoc 92:512–525
Wang CY, Chen JC, Lee SM, Ou ST (2002) Joint conditional likelihood estimator in logistic regression with missing covariate data. Stat Sin 12:555–574
Xiang L, Lee AH, Yau KKW, McLachlan GJ (2007) A score test for overdispersion in zero-inflated Poisson mixed regression model. Stat Med 26:1608–1622
Yau KKW, Lee AH (2001) Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Stat Med 20:2907–2920
Zhao LP, Lipsitz S (1992) Designs and analysis of two-stage studies. Stat Med 11:769–782

Acknowledgements

The authors are very grateful for two referees’ helpful comments and suggestions that improved the presentation. This work was supported by the Ministry of Science and Technology of Taiwan (S.M. Lee).

Author information

Authors and Affiliations

Department of Statistics, Feng Chia University, Taichung, Taiwan, ROC
Shen-Ming Lee
Center for Survey Research, Research Center for Humanities and Social Science, Academia Sinica, Taipei, Taiwan, ROC
Shen-Ming Lee
Institute of Statistical Science, Academia Sinica, Taipei, Taiwan, ROC
T. Martin Lukusa
School of Nursing, The State University of New York, University at Buffalo, Buffalo, NY, USA
Chin-Shang Li

Corresponding author

Correspondence to Chin-Shang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 1

It can be obtained from the empirical CDF $\hat{F}(x|Y_i,\varvec{V}_i)$ in (10) that for $i=1,\dots ,n$ and $v=1,\dots ,M$,

$$\begin{aligned} E_{\hat{F}}(\tilde{S}_{iv}({\varvec{\theta }})|Y_i,\varvec{V}_i) =\sum _{k=1}^{n}\dfrac{\delta _kI(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)S_k({\varvec{\theta }})}{\sum _{r=1}^{n}\delta _rI(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)}. \end{aligned}$$

(18)

By using the expression of $U_v({\varvec{\theta }})$ in (11), the expression of $E_{\hat{F}}(\tilde{S}_{iv}({\varvec{\theta }})|Y_i,\varvec{V}_i)$ in (18), and the fact that

$$\begin{aligned} \sum _{i=1}^{n}(1-\delta _i)E_{\hat{F}}(\tilde{S}_{iv}({\varvec{\theta }})|Y_i,\varvec{V}_i) =\sum _{k=1}^n\delta _kS_k({\varvec{\theta }})\left[ \dfrac{1}{\hat{\pi }(Y_k,\varvec{V}_k)}-1\right] , \end{aligned}$$

(19)

$v=1,\ldots ,M$, we obtain $E_{\hat{F}}(U_v({\varvec{\theta }})|\mathcal {O})=n^{-1/2}\sum _{i=1}^n[\delta _i/\hat{\pi }(Y_i,\varvec{V}_i)]S_i({\varvec{\theta }}) =U_w({\varvec{\theta }},\hat{\varvec{\pi }})$, $v=1,\ldots ,M$. Similarly, it can be shown that $E_{\hat{F}}(\partial {U}_v({\varvec{\theta }})/\partial {\varvec{\theta }}|\mathcal {O}) =\partial {U}_w({\varvec{\theta }},\hat{\varvec{\pi }})/\partial {\varvec{\theta }}$ and, hence, $E(\partial {U}_v({\varvec{\theta }})/\partial {\varvec{\theta }}) =E(\partial {U}_w({\varvec{\theta }},\hat{\varvec{\pi }})/\partial {\varvec{\theta }})$, $v=1,\ldots ,M$.
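Identity (19) is a regrouping of sums within each $(Y,\varvec{V})$ cell, so it can be checked numerically. The Python sketch below is illustrative only: the scalar `score` values stand in for $S_k({\varvec{\theta }})$, the cell labels and the function name `check_identity_19` are hypothetical, and $\hat{\pi }$ is assumed to be the within-cell fraction of complete cases, which is the form implied by the ratio used in (22).

```python
import random
from collections import defaultdict


def check_identity_19(n=200, seed=0):
    """Return both sides of identity (19) for synthetic discrete data."""
    rng = random.Random(seed)
    # Hypothetical discrete cells (Y, V) and scalar stand-ins for the scores S_k.
    cells = [(rng.randint(0, 2), rng.randint(0, 1)) for _ in range(n)]
    delta = [1 if rng.random() < 0.6 else 0 for _ in range(n)]
    score = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Assumption: every cell holds at least one complete case, so that the
    # empirical conditional distribution is defined for every missing unit.
    seen = set()
    for i, c in enumerate(cells):
        if c not in seen:
            delta[i] = 1
            seen.add(c)

    n_cell = defaultdict(int)    # sum_s I(cell_s = c)
    n_obs = defaultdict(int)     # sum_r delta_r I(cell_r = c)
    s_obs = defaultdict(float)   # sum_k delta_k S_k I(cell_k = c)
    for c, d, s in zip(cells, delta, score):
        n_cell[c] += 1
        n_obs[c] += d
        s_obs[c] += d * s

    # Left side: sum over missing units of the complete-case cell mean, as in (18).
    lhs = sum((1 - d) * s_obs[c] / n_obs[c] for c, d in zip(cells, delta))
    # Right side: sum_k delta_k S_k (1/pi_hat_k - 1), with pi_hat = n_obs / n_cell.
    rhs = sum(d * s * (n_cell[c] / n_obs[c] - 1.0)
              for c, d, s in zip(cells, delta, score))
    return lhs, rhs


if __name__ == "__main__":
    print(check_identity_19())
```

Both sides reduce to $\sum _c (n_c-m_c)\bar{S}_c$, where $n_c$ and $m_c$ are the numbers of all units and of complete cases in cell $c$ and $\bar{S}_c$ is the complete-case mean score, so the two returned values should agree up to floating-point error.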

Recall that $S_i^*({\varvec{\theta }})=E(S_i({\varvec{\theta }})|Y_i,\varvec{V}_i)$, $i=1,\dots ,n$. As given in (16), $U_{m2}({\varvec{\theta }})$ can be expressed as follows:

$$\begin{aligned} U_{m2}({\varvec{\theta }})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\left[ \delta _iS_i({\varvec{\theta }})+(1-\delta _i)S_i^*({\varvec{\theta }})\right] \nonumber \\&+\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ \tilde{S}_{i}({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|\mathcal {O})\right] \nonumber \\&+\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})\big |\mathcal {O})-S_i^*({\varvec{\theta }})\right] . \end{aligned}$$

(20)

Note that the second term of the expression of $U_{m2}({\varvec{\theta }})$ in (20) can be reformulated as

$$\begin{aligned} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) [\tilde{S}_{i}({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|\mathcal {O})] =O_p(M^{-1/2}), \end{aligned}$$

(21)

where $\tilde{S}_i({\varvec{\theta }})=\sum _{v=1}^{M}\tilde{S}_{iv}({\varvec{\theta }})/M$. The third term of the expression of $U_{m2}({\varvec{\theta }})$ in (20) can be rewritten as follows:

$$\begin{aligned}&\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i)\left[ E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|\mathcal {O})-S_i^*({\varvec{\theta }})\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i)\left[ \sum _{k=1}^{n}\dfrac{\delta _kS_k({\varvec{\theta }})I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{r=1}^{n}\delta _rI(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)} \nonumber \right. \\&\quad \left. -S_i^*({\varvec{\theta }})\sum _{k=1}^{n}\dfrac{\delta _kI(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{r=1}^{n}\delta _rI(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)}\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i)\left[ \sum _{k=1}^{n}\dfrac{\delta _k[S_k({\varvec{\theta }})-S_i^*({\varvec{\theta }})] I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{r=1}^{n}\delta _rI(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)}\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ \sum _{k=1}^n\dfrac{\delta _k[S_k({\varvec{\theta }})-S_i^*({\varvec{\theta }})] I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{s=1}^{n}I(Y_s=Y_i,\varvec{V}_s=\varvec{V}_i)}\right] \nonumber \\&\quad \times \,\left[ \dfrac{\sum _{s=1}^{n}I(Y_s=Y_i,\varvec{V}_s=\varvec{V}_i)}{\sum _{r=1}^{n}\delta _r I(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)}\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\dfrac{(1-\delta _i)}{\hat{\pi }(Y_i,\varvec{V}_i)} \left[ \dfrac{\sum _{k=1}^{n}\delta _k[S_k({\varvec{\theta }})-S_k^*({\varvec{\theta }})]I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{s=1}^{n}I(Y_s=Y_i,\varvec{V}_s=\varvec{V}_i)}\right] \nonumber \\&=\dfrac{1}{\sqrt{n}}\sum _{k=1}^n\delta _k[S_k({\varvec{\theta }})-S_k^*({\varvec{\theta }})] \left\{ \sum _{i=1}^{n}\dfrac{(1-\delta _i)}{\hat{\pi }(Y_i,\varvec{V}_i)} \left[ \dfrac{I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{s=1}^{n}I(Y_s=Y_i,\varvec{V}_s=\varvec{V}_i)}\right] \right\} \nonumber \\&=\dfrac{1}{\sqrt{n}}\sum _{k=1}^{n}\delta _k[S_k({\varvec{\theta }})-S_k^*({\varvec{\theta }})] \left[ \dfrac{1-\hat{\pi }(Y_k,\varvec{V}_k)}{\hat{\pi }(Y_k,\varvec{V}_k)}\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{k=1}^{n}\delta _k\left[ \dfrac{1-\pi (Y_k,\varvec{V}_k)}{\pi (Y_k,\varvec{V}_k)}\right] \left[ S_k({\varvec{\theta }})-S_k^*({\varvec{\theta }})\right] +o_p(1). \end{aligned}$$

(22)

Hence, from (21) and (22), $U_{m2}({\varvec{\theta }})$ can be re-expressed as

$$\begin{aligned} U_{m2}({\varvec{\theta }})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\left\{ \dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}S_i({\varvec{\theta }})+ \left[ 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right] S_i^*({\varvec{\theta }})\right\} \nonumber \\&+ O_p(M^{-1/2})+o_p(1). \end{aligned}$$

(23)

Because the first term is the sum of independent and identically distributed random vectors, it can be shown by the multivariate central limit theorem that $U_{m2}({\varvec{\theta }}){\mathop {\rightarrow }\limits ^{d}}\mathcal {N}(\varvec{0},M({\varvec{\theta }},\varvec{\pi }))$ as $n,M\rightarrow \infty $, where $M({\varvec{\theta }},\varvec{\pi })$ is given in (15). In addition, $U_{m2}({\varvec{\theta }})$ in (23) can be expressed as

$$\begin{aligned} U_{m2}({\varvec{\theta }})= & {} U_w({\varvec{\theta }},\varvec{\pi }) +\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\left[ 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right] S_i^*({\varvec{\theta }})\nonumber \\&+ O_p(M^{-1/2}) +o_p(1). \end{aligned}$$

(24)

Because $\hat{\varvec{\theta }}_{m2}^{(M)}$ is the solution of $U_{m2}({\varvec{\theta }})=\varvec{0}$, it follows from a Taylor expansion of $U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)})$ at ${\varvec{\theta }}$ and the expression for $U_{m2}({\varvec{\theta }})$ in (23) that

$$\begin{aligned} \varvec{0}=U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\left\{ \dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}S_i({\varvec{\theta }}) +\left[ 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right] S_i^*({\varvec{\theta }})\right\} \\&- G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }})+O_p(M^{-1/2})+o_p(1), \end{aligned}$$

where $G({\varvec{\theta }},\varvec{\pi })=E[-\partial {U}_w({\varvec{\theta }},\varvec{\pi })/(\sqrt{n}\partial {\varvec{\theta }})]$. Therefore, it can be obtained that

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }})= & {} \dfrac{1}{\sqrt{n}}G^{-1}({\varvec{\theta }},\varvec{\pi })\left\{ \sum _{i=1}^n \left[ \dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}S_i({\varvec{\theta }})+\left( 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right) S_i^*({\varvec{\theta }})\right] \right\} \\&+\, O_p(M^{-1/2})+o_p(1). \end{aligned}$$

This implies that $\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }}){\mathop {\rightarrow }\limits ^{d}}\mathcal {N}(\varvec{0},\Delta _m({\varvec{\theta }}))$ as $n,M\rightarrow \infty $, where $\Delta _m({\varvec{\theta }})=G^{-1}({\varvec{\theta }},\varvec{\pi })M({\varvec{\theta }},\varvec{\pi })[G^{-1}({\varvec{\theta }},\varvec{\pi })]^T$.

Let $\hat{\varvec{\theta }}_v$ be the solution to the estimating equations $U_v({\varvec{\theta }})=\varvec{0}$. A Taylor expansion of $U_v(\hat{\varvec{\theta }}_v)$ at ${\varvec{\theta }}$ gives

$$\begin{aligned} \varvec{0}=U_v(\hat{\varvec{\theta }}_v) =U_v({\varvec{\theta }})-G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_v-{\varvec{\theta }})+o_p(1). \end{aligned}$$

Hence, it follows that $\sqrt{n}(\hat{\varvec{\theta }}_v-{\varvec{\theta }})=G^{-1}({\varvec{\theta }},\varvec{\pi })U_v({\varvec{\theta }})+o_p(1)$. Because $\hat{\varvec{\theta }}_{m1}^{(M)}=\sum _{v=1}^{M}\hat{\varvec{\theta }}_v/M$, using the above result and the expressions for $U_{m2}({\varvec{\theta }})$ in (13) and (23), we can have

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\theta }}_{m1}^{(M)}-{\varvec{\theta }})= & {} G^{-1}({\varvec{\theta }},\varvec{\pi })\left( \dfrac{1}{M}\sum _{v=1}^MU_v({\varvec{\theta }})\right) +o_p(1)\nonumber \\= & {} \dfrac{1}{\sqrt{n}}G^{-1}({\varvec{\theta }},\varvec{\pi })\sum _{i=1}^n\left\{ \dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)} S_i({\varvec{\theta }})+\left[ 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right] S_i^*({\varvec{\theta }})\right\} \nonumber \\&+\, O_p(M^{-1/2})+o_p(1), \end{aligned}$$

(25)

and it follows readily that $\sqrt{n}(\hat{\varvec{\theta }}_{m1}^{(M)}-{\varvec{\theta }}){\mathop {\rightarrow }\limits ^{d}}\mathcal {N}(\varvec{0},\Delta _{m}({\varvec{\theta }}))$ as $n,M\rightarrow \infty $.

1.2 Proof of Theorem 2

Because $\hat{\varvec{\theta }}_{m2}^{(M)}$ is the solution of $U_{m2}({\varvec{\theta }})=M^{-1}\sum _{v=1}^{M}U_v({\varvec{\theta }})=\varvec{0}$, a Taylor expansion of $U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)})$ at ${\varvec{\theta }}$ yields $\varvec{0}=U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)}) =M^{-1}\sum _{v=1}^MU_v({\varvec{\theta }})-G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }})+o_p(1)$, which implies that

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }}) =G^{-1}({\varvec{\theta }},\varvec{\pi })\left( \dfrac{1}{M}\sum _{v=1}^{M}U_v({\varvec{\theta }})\right) +o_p(1). \end{aligned}$$

(26)

Thus, it follows from (25) and (26) that $\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-\hat{\varvec{\theta }}_{m1}^{(M)})=o_p(1)$. This shows that $\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-\hat{\varvec{\theta }}_{m1}^{(M)})$ converges in probability to $\varvec{0}$ as $n,M\rightarrow \infty $.
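The closeness of the two MI-type estimators can be illustrated on a toy problem. The sketch below is not the ZIP model of the paper: as a labelled simplification it uses a plain Poisson regression with a single binary covariate, so that each score equation has a closed-form solution, and it imputes a missing covariate by drawing from the complete cases with the same outcome value. All names (`poisson_draw`, `fit_poisson_binary`, `compare_mi_estimators`), the true coefficients, and the selection probabilities are hypothetical choices made for illustration.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_poisson_binary(y, x):
    """Closed-form root of the Poisson score for log-mean b0 + b1*x, x in {0, 1}."""
    sum_y, count = [0.0, 0.0], [0, 0]
    for yi, xi in zip(y, x):
        sum_y[xi] += yi
        count[xi] += 1
    b0 = math.log(sum_y[0] / count[0])
    return b0, math.log(sum_y[1] / count[1]) - b0


def compare_mi_estimators(n=2000, M=20, seed=1):
    rng = random.Random(seed)
    x = [int(rng.random() < 0.5) for _ in range(n)]
    y = [poisson_draw(rng, math.exp(0.5 * xi)) for xi in x]  # b0 = 0, b1 = 0.5
    # Assumed MAR mechanism: selection depends on y only; y >= 3 is always observed.
    pi = {0: 0.5, 1: 0.7, 2: 0.8}
    delta = [int(rng.random() < pi.get(yi, 1.0)) for yi in y]

    # Donor pools: observed covariate values grouped by outcome value.
    donors = {}
    for xi, yi, di in zip(x, y, delta):
        if di:
            donors.setdefault(yi, []).append(xi)

    fits, stacked_y, stacked_x = [], [], []
    for _ in range(M):
        x_imp = [xi if di else rng.choice(donors[yi])
                 for xi, yi, di in zip(x, y, delta)]
        fits.append(fit_poisson_binary(y, x_imp))
        stacked_y += y
        stacked_x += x_imp

    # First type: average the M per-dataset estimates.
    theta_m1 = tuple(sum(f[j] for f in fits) / M for j in range(2))
    # Second type: solve the averaged score, i.e. the score of the stacked data.
    theta_m2 = fit_poisson_binary(stacked_y, stacked_x)
    return theta_m1, theta_m2


if __name__ == "__main__":
    print(compare_mi_estimators())
```

For this toy model the averaged score over the $M$ imputed datasets equals, up to the factor $1/M$, the score of the stacked data, which is why the second estimator can be computed with the same closed-form routine.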

Next, we show that the semiparametric IPW estimator and the second MI-type estimator are asymptotically equivalent. $U_{m2}({\varvec{\theta }})$ can be expressed as

$$\begin{aligned} U_{m2}({\varvec{\theta }})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n} \left\{ \delta _iS_i({\varvec{\theta }})+(1-\delta _i)E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})\big |Y_i,\varvec{V}_i) \right. \\&\left. +\,(1-\delta _i)\left[ \tilde{S}_i({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)\right] \right\} . \end{aligned}$$

Using the fact given in (19), $n^{-1/2}\sum _{i=1}^n[\delta _iS_i({\varvec{\theta }})+(1-\delta _i)E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)]$ can be expressed as

$$\begin{aligned}&\dfrac{1}{\sqrt{n}} \sum _{i=1}^{n}[\delta _iS_i({\varvec{\theta }})+(1-\delta _i)E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})\big |Y_i,\varvec{V}_i)] \\&\quad = \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n} \delta _iS_i({\varvec{\theta }})+\dfrac{1}{\sqrt{n}}\sum _{k=1}^{n}\delta _kS_k({\varvec{\theta }}) \left[ \dfrac{1}{\hat{\pi }(Y_k,\varvec{V}_k)}-1\right] =U_w({\varvec{\theta }},\hat{\varvec{\pi }}). \end{aligned}$$

Hence it can be obtained that

$$\begin{aligned} U_{m2}({\varvec{\theta }})=U_w({\varvec{\theta }},\hat{\varvec{\pi }})+\dfrac{1}{\sqrt{n}}\sum _{i=1}^n(1-\delta _i) \left[ \tilde{S}_i({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)\right] . \end{aligned}$$

Recall $\mathcal {B}({\varvec{\theta }};Y_i,\varvec{V}_i) =M^{-1/2}\sum _{v=1}^M\left[ \tilde{S}_{iv}({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)\right] $, $i=1,\ldots ,n$. Because

$$\begin{aligned} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ \tilde{S}_i({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)\right] =M^{-1/2}\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i)\mathcal {B}({\varvec{\theta }};Y_i,\varvec{V}_i) \end{aligned}$$

and $(1-\delta _i)\mathcal {B}({\varvec{\theta }};Y_i,\varvec{V}_i)$ are independent and identically distributed random vectors with mean $\varvec{0}$ and covariance matrix $E\{(1-\delta _1)[\mathcal {B}({\varvec{\theta }};Y_1,\varvec{V}_1)]^{\otimes 2}\}$, it follows from the multivariate central limit theorem that $n^{-1/2}\sum _{i=1}^{n}(1-\delta _i)\mathcal {B}({\varvec{\theta }};Y_i,\varvec{V}_i)=O_p(1)$ and, hence,

$$\begin{aligned} U_{m2}({\varvec{\theta }})-U_w({\varvec{\theta }},\hat{\varvec{\pi }})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ \tilde{S}_i({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})\big |Y_i,\varvec{V}_i)\right] \nonumber \\= & {} M^{-1/2}O_p(1)=O_p(M^{-1/2}). \end{aligned}$$

(27)
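The $M^{-1/2}$ rate in (27) is the usual Monte Carlo rate for an average of $M$ draws from the empirical distribution. A minimal sketch, assuming a single cell with a handful of hypothetical complete-case scores (the values in `donors` and the function name `rms_mc_error` are made up for illustration), estimates the root-mean-square deviation of the imputed average from its conditional mean for a given $M$:

```python
import math
import random


def rms_mc_error(M, reps=500, seed=2):
    """RMS deviation of the mean of M resampled scores from the donor mean."""
    rng = random.Random(seed)
    # Hypothetical complete-case scores sharing one (Y, V) cell.
    donors = [0.3, -1.2, 0.8, 2.1, -0.5]
    center = sum(donors) / len(donors)  # plays the role of E_F-hat(S-tilde | Y, V)
    total = 0.0
    for _ in range(reps):
        # Average of M imputed scores drawn with replacement from the donors.
        avg = sum(rng.choice(donors) for _ in range(M)) / M
        total += (avg - center) ** 2
    return math.sqrt(total / reps)


if __name__ == "__main__":
    for M in (4, 400):
        print(M, rms_mc_error(M))
```

Increasing $M$ by a factor of 100 should shrink the estimated error by roughly a factor of 10, up to simulation noise.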

Let $\hat{\varvec{\theta }}_{m2}^{(M)}$ be the solution of $U_{m2}({\varvec{\theta }})=\varvec{0}$ and let $\hat{\varvec{\theta }}_{ws}$ be the solution of $U_w({\varvec{\theta }},\hat{\varvec{\pi }})=\varvec{0}$. Because Taylor expansions of $U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)})$ and $U_w(\hat{\varvec{\theta }}_{ws},\hat{\varvec{\pi }})$ at ${\varvec{\theta }}$ give

$$\begin{aligned} \varvec{0}=U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)}) =U_{m2}({\varvec{\theta }})-G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }})+o_p(1) \end{aligned}$$

and

$$\begin{aligned} \varvec{0}=U_w(\hat{\varvec{\theta }}_{ws},\hat{\varvec{\pi }}) =U_{w}({\varvec{\theta }},\hat{\varvec{\pi }})-G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_{ws}-{\varvec{\theta }})+o_p(1), \end{aligned}$$

it can be shown from (27) that

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-\hat{\varvec{\theta }}_{ws}) =G^{-1}({\varvec{\theta }},\varvec{\pi })\left[ U_{m2}({\varvec{\theta }})-U_w({\varvec{\theta }},\hat{\varvec{\pi }})\right] +o_p(1) =o_p(1)+O_p(M^{-1/2}). \end{aligned}$$

Therefore, it follows that $\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-\hat{\varvec{\theta }}_{ws})$ converges in probability to $\varvec{0}$ as $n,M\rightarrow \infty $.
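The agreement between the second MI-type estimator and the semiparametric IPW estimator can also be seen on a toy problem. As a labelled simplification, the sketch below again replaces the ZIP model with a plain Poisson regression on one binary covariate, estimates $\hat{\pi }$ by the fraction of complete cases at each outcome value, and solves both estimating equations in closed form. The function names, coefficients, and selection probabilities are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate by Knuth's method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def weighted_fit(y, x, w):
    """Root of sum_i w_i (y_i - exp(b0 + b1 x_i)) (1, x_i) = 0 for binary x."""
    num, den = [0.0, 0.0], [0.0, 0.0]
    for yi, xi, wi in zip(y, x, w):
        num[xi] += wi * yi
        den[xi] += wi
    b0 = math.log(num[0] / den[0])
    return b0, math.log(num[1] / den[1]) - b0


def ipw_vs_second_mi(n=1000, M=200, seed=3):
    rng = random.Random(seed)
    x = [int(rng.random() < 0.5) for _ in range(n)]
    y = [poisson_draw(rng, math.exp(0.5 * xi)) for xi in x]  # b0 = 0, b1 = 0.5
    # Assumed MAR mechanism: selection depends on y only; y >= 3 is always observed.
    pi_true = {0: 0.5, 1: 0.7, 2: 0.8}
    delta = [int(rng.random() < pi_true.get(yi, 1.0)) for yi in y]

    n_cell, donors = {}, {}
    for xi, yi, di in zip(x, y, delta):
        n_cell[yi] = n_cell.get(yi, 0) + 1
        if di:
            donors.setdefault(yi, []).append(xi)

    # IPW: complete cases only, weighted by 1 / pi_hat(y) = n_cell / n_observed.
    cc = [(yi, xi, n_cell[yi] / len(donors[yi]))
          for xi, yi, di in zip(x, y, delta) if di]
    theta_ws = weighted_fit(*zip(*cc))

    # Second MI type: unweighted fit to M stacked imputed datasets.
    ys, xs = [], []
    for _ in range(M):
        ys += y
        xs += [xi if di else rng.choice(donors[yi])
               for xi, yi, di in zip(x, y, delta)]
    theta_m2 = weighted_fit(ys, xs, [1.0] * len(ys))
    return theta_ws, theta_m2


if __name__ == "__main__":
    print(ipw_vs_second_mi())
```

In this setting the conditional expectation of the stacked score given the observed data equals the IPW score with $\hat{\varvec{\pi }}$, so the gap between the two returned estimates reflects only the imputation noise, which shrinks as $M$ grows.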

About this article

Cite this article

Lee, SM., Lukusa, T.M. & Li, CS. Estimation of a zero-inflated Poisson regression model with missing covariates via nonparametric multiple imputation methods. Comput Stat 35, 725–754 (2020). https://doi.org/10.1007/s00180-019-00930-x

Received: 20 December 2018
Accepted: 09 October 2019
Published: 14 October 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s00180-019-00930-x
