Estimation of parameters of logistic regression for two-stage randomized response technique

Pei-Chieh Chang¹,
Kim-Hung Pho¹,²,
Shen-Ming Lee¹ &
Chin-Shang Li³

Abstract

When a survey study is related to sensitive issues such as political orientation, sexual orientation, and income, respondents may not be willing to reply truthfully, which leads to biased results. To protect respondents’ privacy and improve their willingness to provide true answers, Warner (J Am Stat Assoc 60:63–69, 1965) proposed the randomized response (RR) technique, in which each respondent uses a random device to select the question to be answered, so that the respondent's privacy is maintained. Huang (Stat Neerl 58:75–82, 2004) extended the RR design of Warner (1965) to a two-stage RR design. This method can be used not only to estimate the population proportion of persons with a sensitive characteristic but also to estimate the honest answer rate in the first stage. This work develops a covariate extension of the two-stage RR design of Huang (2004) by applying logistic regression to investigate the effects of covariates on a sensitive characteristic and on an honest response. Simulation experiments are conducted to study the finite-sample performance of the maximum likelihood estimators of the logistic regression parameters. The proposed methodology is applied to analyze survey data on the sexuality of freshmen at Feng Chia University in Taiwan in 2016.
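As background, Warner's (1965) one-stage design can be sketched in a few lines. In the version assumed here, each respondent is shown the statement "I have characteristic A" with probability $p$ and its negation otherwise, and answers "yes" or "no" truthfully, so that $P(\text{yes})=p\pi+(1-p)(1-\pi)$ with $p\neq 1/2$. This minimal sketch simulates such answers and inverts that relation; the values of $n$, $p$, $\pi$ and the function name `warner_estimate` are illustrative assumptions, and the sketch is neither the two-stage design of Huang (2004) nor the covariate model of this paper.

```python
import numpy as np

def warner_estimate(answers, p):
    """Moment estimator of pi under Warner's design (hypothetical helper, p != 1/2)."""
    n = len(answers)
    lam_hat = np.mean(answers)                     # observed proportion of "yes"
    pi_hat = (lam_hat - (1 - p)) / (2 * p - 1)     # invert P(yes) = p*pi + (1 - p)*(1 - pi)
    # plug-in standard error: binomial variance plus the extra variance due to randomization
    se = np.sqrt(pi_hat * (1 - pi_hat) / n + p * (1 - p) / (n * (2 * p - 1) ** 2))
    return pi_hat, se

rng = np.random.default_rng(1965)
n, p, pi_true = 5000, 0.7, 0.2                     # illustrative values, not from the paper
has_trait = rng.random(n) < pi_true                # hidden sensitive status
gets_statement_a = rng.random(n) < p               # random device picks "I have A" w.p. p
# a truthful respondent says "yes" iff the selected statement is true for them
answers = np.where(gets_statement_a, has_trait, ~has_trait)
pi_hat, se = warner_estimate(answers, p)
print(f"pi_hat = {pi_hat:.3f} (SE {se:.3f})")
```

The second variance term shows the price of privacy: as $p$ approaches $1/2$ the answers carry less information about $\pi$ and the standard error grows without bound.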

References

Chang HJ, Huang KC (2001) Estimation of proportion and sensitivity of a qualitative character. Metrika 53:269–280
Chaudhuri A (2002) Estimating sensitive proportions from randomized responses in unequal probability sampling. Calcutta Stat Assoc Bull 52:315–322
Chaudhuri A, Mukerjee R (1988) Randomized response: theory and techniques. Marcel Dekker, New York
Christofides TC (2003) A generalized randomized response technique. Metrika 57:195–200
Christofides TC (2005) Randomized response in stratified sampling. J Stat Plan Inference 128:303–310
Cruyff MJLF, Böckenholt U, van den Hout A, van der Heijden PGM (2008) Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model. Ann Appl Stat 2:316–331
Foutz RV (1977) On the unique consistent solution to the likelihood equations. J Am Stat Assoc 72:147–148
Greenberg BG, Abul-Ela A, Simmons WR, Horvitz DG (1969) The unrelated question randomized response model: theoretical framework. J Am Stat Assoc 64:520–539
Groenitz H (2014) A new privacy-protecting survey design for multichotomous sensitive variables. Metrika 77:211–224
Hsieh SH, Lee SM, Shen PS (2009) Semiparametric analysis of randomized response data with missing covariates in logistic regression. Comput Stat Data Anal 53:2673–2692
Hsieh SH, Lee SM, Shen PS (2010) Logistic regression analysis of randomized response data with missing covariates. J Stat Plan Inference 140:927–940
Hsieh SH, Lee SM, Li CS (2020) A two-stage multilevel randomized response technique with proportional odds models and missing covariates. Sociol Methods Res. https://doi.org/10.1177/0049124120914954
Huang KC (2004) A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Stat Neerl 58:75–82
Kim JM, Warde WD (2004) A stratified Warner’s randomized response model. J Stat Plan Inference 120:155–165
Kim JM, Warde WD (2005) A mixed randomized response model. J Stat Plan Inference 133:211–221
Kim JM, Tebbs JM, An SW (2006) Extensions of Mangat’s randomized-response model. J Stat Plan Inference 136:1554–1567
Kuk AYC (1990) Asking sensitive questions indirectly. Biometrika 77:436–438
Lee SM, Peng TC, Tapsoba JDD, Hsieh SH (2017) Improved estimation methods for unrelated question randomized response techniques. Commun Stat Theory Methods 46:8101–8112
Maddala GS (1983) Limited-dependent and qualitative variables in econometrics. Cambridge University Press, Cambridge
Saha A (2004) On efficacies of Dalenius–Vitale technique with compulsory versus optional randomized responses from complex surveys. Calcutta Stat Assoc Bull 54:223–230
Scheers NJ, Dayton CM (1988) Covariate randomized response models. J Am Stat Assoc 83:969–974
Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60:63–69
Yu JW, Tian GL, Tang ML (2008) Two new models for survey sampling with sensitive characteristic: design and analysis. Metrika 67:251–263

Acknowledgements

The authors are very thankful for a reviewer’s constructive comments that improved the presentation. The research of S.M. Lee and K.H. Pho was supported by the Ministry of Science and Technology (MOST) of Taiwan, ROC (105-2118-M-035-005-MY2 and 107-2118-M-035-004-MY2).

Author information

Authors and Affiliations

Department of Statistics, Feng Chia University, Taichung City, Taiwan, ROC
Pei-Chieh Chang, Kim-Hung Pho & Shen-Ming Lee
Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Kim-Hung Pho
School of Nursing, The State University of New York, University at Buffalo, Buffalo, USA
Chin-Shang Li

Corresponding author

Correspondence to Chin-Shang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Lemma 1

$\varvec{\Psi }_i(\Theta )$ in (4) can be expressed as

$$\begin{aligned}
\varvec{\Psi }_i(\Theta )&=\frac{Y_{i1}}{g_{i1}(\Theta )}\left( \frac{\partial g_{i1}(\Theta )}{\partial \Theta }\right) + \frac{Y_{i2}}{g_{i2}(\Theta )}\left( \frac{\partial g_{i2}(\Theta )}{\partial \Theta }\right) \\
&\quad -\frac{1-Y_{i1}-Y_{i2}}{1-g_{i1}(\Theta )-g_{i2}(\Theta )}\left( \frac{\partial g_{i1}(\Theta )}{\partial \Theta } + \frac{\partial g_{i2}(\Theta )}{\partial \Theta }\right) \\
&=\frac{\left[ Y_{i1}-g_{i1}(\Theta )\right] g_{i2}(\Theta )[1-g_{i2}(\Theta )]}{g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]} \left( \frac{\partial g_{i1}(\Theta )}{\partial \Theta }\right) \\
&\quad + \frac{\left[ Y_{i2}-g_{i2}(\Theta )\right] g_{i1}(\Theta )[1-g_{i1}(\Theta )]}{g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]} \left( \frac{\partial g_{i2}(\Theta )}{\partial \Theta }\right) \\
&\quad + \frac{\left[ Y_{i1}-g_{i1}(\Theta )\right] g_{i1}(\Theta )g_{i2}(\Theta )}{g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]} \left( \frac{\partial g_{i2}(\Theta )}{\partial \Theta }\right) \\
&\quad + \frac{\left[ Y_{i2}-g_{i2}(\Theta )\right] g_{i1}(\Theta )g_{i2}(\Theta )}{g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]} \left( \frac{\partial g_{i1}(\Theta )}{\partial \Theta }\right) \\
&=\left( \frac{\partial g_{i1}(\Theta )}{\partial \Theta }, \frac{\partial g_{i2}(\Theta )}{\partial \Theta }\right) \frac{1}{\det \varvec{V}_{i}(\Theta )} \\
&\quad \times \begin{bmatrix} g_{i2}(\Theta )[1-g_{i2}(\Theta )] & g_{i1}(\Theta )g_{i2}(\Theta ) \\ g_{i1}(\Theta )g_{i2}(\Theta ) & g_{i1}(\Theta )[1-g_{i1}(\Theta )] \end{bmatrix} [\varvec{Y}_i-\varvec{g}_i(\Theta )]^T \\
&=\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta )[\varvec{Y}_i-\varvec{g}_i(\Theta )]^T,\quad i=1,2,\dots , n,
\end{aligned}$$

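where $\det \varvec{V}_i(\Theta )= g_{i1}(\Theta )g_{i2}(\Theta )[1-g_{i1}(\Theta )-g_{i2}(\Theta )]$ is the determinant of $\varvec{V}_i(\Theta )$ given in (8), and the displayed $2\times 2$ matrix is the adjugate of $\varvec{V}_i(\Theta )$, so that dividing it by $\det \varvec{V}_i(\Theta )$ gives $\varvec{V}_i^{-1}(\Theta )$. Hence the score function $\varvec{U}_n(\Theta )$ can be written as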

$$\begin{aligned} \varvec{U}_n(\Theta ) =\sum _{i=1}^n\varvec{\Psi }_i(\Theta ) =\sum _{i=1}^{n}\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_{i}^{-1}(\Theta )[\varvec{Y}_i-\varvec{g}_i(\Theta )]^T. \end{aligned}\tag{9}$$
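The algebra in the proof of Lemma 1 can be checked numerically. The sketch below is a hypothetical check, not part of the authors' method: it fixes arbitrary values of $g_{i1}(\Theta)$ and $g_{i2}(\Theta)$, draws a random stand-in `D` for the Jacobian $\partial \varvec{g}_i(\Theta)/\partial\Theta$ (the names `D` and `p_dim` are illustrative), and confirms, for each of the three possible outcomes of $(Y_{i1},Y_{i2})$, that the first and last lines of the display agree and that $\det\varvec{V}_i(\Theta)$ matches its closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
p_dim = 4                                 # hypothetical dimension of Theta
g1, g2 = 0.3, 0.45                        # any values with g1, g2 > 0 and g1 + g2 < 1
g = np.array([g1, g2])
D = rng.standard_normal((p_dim, 2))       # stand-in for (dg_{i1}/dTheta, dg_{i2}/dTheta)
V = np.array([[g1 * (1 - g1), -g1 * g2],
              [-g1 * g2, g2 * (1 - g2)]])

det_ok = bool(np.isclose(np.linalg.det(V), g1 * g2 * (1 - g1 - g2)))
results = []
for y1, y2 in [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]:       # the three possible outcomes
    # first line of the proof: gradient of the multinomial log-likelihood
    lhs = (y1 / g1) * D[:, 0] + (y2 / g2) * D[:, 1] \
        - (1 - y1 - y2) / (1 - g1 - g2) * (D[:, 0] + D[:, 1])
    # last line of the proof: (dg_i/dTheta) V_i^{-1} [Y_i - g_i]^T
    rhs = D @ np.linalg.solve(V, np.array([y1, y2]) - g)
    results.append(np.allclose(lhs, rhs))
print(det_ok, results)
```

Because the identity is exact, every comparison should report agreement for any admissible choice of `g1`, `g2` and `D`.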

1.2 Proof of Theorem 1

1.2.1 (a) Proof of consistency of ${\widehat{\Theta }}$

By condition (C1) and the inverse function theorem of Foutz (1977), with probability tending to one the likelihood equation $\varvec{U}_n(\Theta )=\varvec{0}$ has a unique solution, and this solution is consistent. Hence the ML estimator ${\widehat{\Theta }}$ is a consistent estimator of $\Theta $.

1.2.2 (b) Proof of asymptotic normality of $\sqrt{n}({\widehat{\Theta }}-\Theta )$

Let $\varvec{{\mathcal {U}}}_n(\Theta )=\frac{1}{\sqrt{n}}\varvec{U}_n(\Theta )=\frac{1}{\sqrt{n}}\sum _{i=1}^n\varvec{\Psi }_i(\Theta )$. A Taylor expansion of $\varvec{{\mathcal {U}}}_n({\widehat{\Theta }})$ about $\Theta $ gives

$$\begin{aligned} \varvec{0}=\varvec{{\mathcal {U}}}_n({\widehat{\Theta }})&=\varvec{{\mathcal {U}}}_n(\Theta ) +\frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\partial \Theta ^T} ({\widehat{\Theta }}-\Theta )+o_p\left( \sqrt{n}[({\widehat{\Theta }}-\Theta )]^{\otimes 2}\right) \\&=\varvec{{\mathcal {U}}}_n(\Theta )+\frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\sqrt{n}\partial \Theta ^T}\sqrt{n}({\widehat{\Theta }}-\Theta )+o_p(\varvec{1}), \end{aligned}\tag{10}$$

where

$$\begin{aligned} \frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\sqrt{n}\partial \Theta ^T}&=\frac{1}{n}\sum _{i=1}^{n}\frac{\partial \varvec{\Psi }_i(\Theta )}{\partial \Theta ^T}\\&=\frac{1}{n}\sum _{i=1}^{n}\frac{\partial \left[ \left( \dfrac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta )\right] }{\partial \Theta ^T} [\varvec{Y}_i-\varvec{g}_i(\Theta )]^T\\&\quad -\frac{1}{n}\sum _{i=1}^{n}\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta ) \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) ^T. \end{aligned}$$

Because $(Y_{i1},Y_{i2},1-Y_{i1}-Y_{i2})|\varvec{X}_i\sim {Mult}\left( 1,g_{i1}(\Theta ),g_{i2}(\Theta ),1-g_{i1}(\Theta )-g_{i2}(\Theta )\right) $, $i=1,2,\dots $, we have

$$\begin{aligned} E\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\}&=\varvec{0}, \end{aligned}\tag{11}$$

$$\begin{aligned} E[\varvec{Y}_i-\varvec{g}_i(\Theta )]&=E\left\{ E\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\} \right\} =\varvec{0}, \\ Var\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\}&=E\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]^T[\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\} \\&=\begin{bmatrix} g_{i1}(\Theta )[1-g_{i1}(\Theta )] & -g_{i1}(\Theta )g_{i2}(\Theta ) \\ -g_{i1}(\Theta )g_{i2}(\Theta ) & g_{i2}(\Theta )[1-g_{i2}(\Theta )] \end{bmatrix} =\varvec{V}_i(\Theta ). \end{aligned}\tag{12}$$
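Both (11) and (12) follow from the moments of a single multinomial draw. A minimal sketch, assuming illustrative values of $g_{i1}(\Theta)$ and $g_{i2}(\Theta)$ at a fixed $\varvec{X}_i$, computes the conditional mean and covariance of $\varvec{Y}_i-\varvec{g}_i(\Theta)$ exactly by enumerating the three outcomes $(1,0)$, $(0,1)$ and $(0,0)$ with their probabilities.

```python
import numpy as np

g1, g2 = 0.3, 0.45                                   # illustrative g_{i1}(Theta), g_{i2}(Theta)
g = np.array([g1, g2])
outcomes = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 0.0])]
probs = [g1, g2, 1 - g1 - g2]                        # Mult(1, g1, g2, 1 - g1 - g2)

# exact conditional moments of the residual Y_i - g_i(Theta) given X_i
mean_resid = sum(pr * (y - g) for pr, y in zip(probs, outcomes))
cov_resid = sum(pr * np.outer(y - g, y - g) for pr, y in zip(probs, outcomes))

V = np.array([[g1 * (1 - g1), -g1 * g2],
              [-g1 * g2, g2 * (1 - g2)]])
print(np.allclose(mean_resid, 0.0), np.allclose(cov_resid, V))
```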

Because $(\varvec{Y}_i,\varvec{X}_i)$, $i=1,2,\ldots ,n$, are independent and identically distributed, and because each summand of the first sum below has mean zero by (11) (its factor multiplying $[\varvec{Y}_i-\varvec{g}_i(\Theta )]^T$ depends on $\varvec{X}_i$ only), condition (C2) and the weak law of large numbers give

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left\{ \frac{\partial }{\partial \Theta ^T}\left[ \left( \dfrac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta )\right] \right\} [\varvec{Y}_i- \varvec{g}_i(\Theta )]^T\overset{p}{\longrightarrow }\varvec{0}, \end{aligned}$$

and

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta ) \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) ^T\overset{p}{\longrightarrow }E\left[ \left( \frac{\partial \varvec{g}_1(\Theta )}{\partial \Theta }\right) \varvec{V}_1^{-1}(\Theta )\left( \frac{\partial \varvec{g}_1(\Theta )}{\partial \Theta }\right) ^T\right] . \end{aligned}$$

Hence

$$\begin{aligned} -\frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\sqrt{n}\partial \Theta ^T} \overset{p}{\longrightarrow }E\left[ \left( \frac{\partial \varvec{g}_1(\Theta )}{\partial \Theta }\right) \varvec{V}_1^{-1}(\Theta )\left( \frac{\partial \varvec{g}_1(\Theta )}{\partial \Theta }\right) ^T\right] =\varvec{\Delta }^{-1}. \end{aligned}$$

From (10), $\sqrt{n}({\widehat{\Theta }}-\Theta )$ can be expressed as

$$\begin{aligned} \sqrt{n}({\widehat{\Theta }}-\Theta )&=\left( -\frac{\partial \varvec{{\mathcal {U}}}_n(\Theta )}{\sqrt{n}\partial \Theta ^T}\right) ^{-1}\varvec{{\mathcal {U}}}_n(\Theta )+o_p(\varvec{1}) \\&=\left[ \varvec{\Delta }+o_p(\varvec{1})\right] \varvec{{\mathcal {U}}}_n(\Theta )+o_p(\varvec{1}) \\&=\varvec{\Delta }\varvec{{\mathcal {U}}}_n(\Theta )+o_p(\varvec{1}). \end{aligned}\tag{13}$$

Because (11) and (12) give

$$\begin{aligned} E\left[ \varvec{\Psi }_i(\Theta )\right] =E\left[ \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_{i}^{-1}(\Theta )[\varvec{Y}_i-\varvec{g}_i(\Theta )]^T\right] =\varvec{0} \end{aligned}$$

and

$$\begin{aligned} Var\left[ \varvec{\Psi }_i(\Theta )\right]&=E\left[ \varvec{\Psi }_i(\Theta )\varvec{\Psi }_i^T(\Theta )\right] \\&=E\left\{ E\left[ \varvec{\Psi }_i(\Theta )\varvec{\Psi }_i^T(\Theta )|\varvec{X}_i\right] \right\} \\&=E\left[ \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta ) E\left\{ [\varvec{Y}_i-\varvec{g}_i(\Theta )]^T[\varvec{Y}_i-\varvec{g}_i(\Theta )]|\varvec{X}_i\right\} \right. \\&\quad \left. \varvec{V}_i^{-1}(\Theta )\left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) ^T\right] \\&=E\left[ \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) \varvec{V}_i^{-1}(\Theta ) \left( \frac{\partial \varvec{g}_i(\Theta )}{\partial \Theta }\right) ^T\right] =\varvec{\Delta }^{-1},\ i=1,2,\dots ,n, \end{aligned}$$

the multivariate central limit theorem yields

$$\begin{aligned} \varvec{{\mathcal {U}}}_n(\Theta ) =\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\varvec{\Psi }_i(\Theta ) \overset{d}{\longrightarrow }{\mathcal {N}}\left( \varvec{0},E[\varvec{\Psi }_1(\Theta )\varvec{\Psi }_1^{T}(\Theta )]\right) ={\mathcal {N}}(\varvec{0},\varvec{\Delta }^{-1}). \end{aligned}$$
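Conditional on $\varvec{X}_i$, the two moment identities for $\varvec{\Psi }_i(\Theta)$ used above reduce to finite sums over the three multinomial outcomes. The sketch below, which assumes illustrative values of $g_{i1}$, $g_{i2}$ and a random stand-in `D` for $\partial \varvec{g}_i(\Theta)/\partial\Theta$ (both hypothetical), verifies that $E[\varvec{\Psi }_i(\Theta)|\varvec{X}_i]=\varvec{0}$ and $E[\varvec{\Psi }_i\varvec{\Psi }_i^T|\varvec{X}_i]$ equals the conditional contribution $(\partial \varvec{g}_i/\partial\Theta)\varvec{V}_i^{-1}(\partial \varvec{g}_i/\partial\Theta)^T$ to $\varvec{\Delta}^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
g1, g2 = 0.2, 0.5                                    # illustrative g_{i1}, g_{i2} at a fixed X_i
g = np.array([g1, g2])
D = rng.standard_normal((3, 2))                      # stand-in for dg_i/dTheta, dim(Theta) = 3
V = np.array([[g1 * (1 - g1), -g1 * g2],
              [-g1 * g2, g2 * (1 - g2)]])
Vinv = np.linalg.inv(V)

outcomes = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 0.0])]
probs = [g1, g2, 1 - g1 - g2]
psi = [D @ Vinv @ (y - g) for y in outcomes]         # Psi_i(Theta) for each outcome
mean_psi = sum(pr * s for pr, s in zip(probs, psi))
var_psi = sum(pr * np.outer(s, s) for pr, s in zip(probs, psi))
info = D @ Vinv @ D.T                                # conditional contribution to Delta^{-1}
print(np.allclose(mean_psi, 0.0), np.allclose(var_psi, info))
```

Averaging `info` over the distribution of $\varvec{X}_i$ then gives $\varvec{\Delta}^{-1}$, which is why the limiting covariance of $\varvec{{\mathcal U}}_n(\Theta)$ is the Fisher information.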

Therefore

$$\begin{aligned} \varvec{\Delta }\varvec{{\mathcal {U}}}_n(\Theta ) = \varvec{\Delta }\left[ \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\varvec{\Psi }_i(\Theta )\right] \overset{d}{\longrightarrow }{\mathcal {N}}(\varvec{0},\varvec{\Delta }), \end{aligned}$$

and, by (13) and Slutsky’s theorem, $\sqrt{n}({\widehat{\Theta }}-\Theta )=\varvec{\Delta }\varvec{{\mathcal {U}}}_n(\Theta ) +o_p(\varvec{1})\overset{d}{\longrightarrow }{\mathcal {N}}(\varvec{0},\varvec{\Delta })$, which completes the proof.
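Theorem 1 can be illustrated by simulation. The response probabilities $g_{i1}(\Theta)$ and $g_{i2}(\Theta)$ of the two-stage RR design are defined in the main text and are not reproduced in this appendix, so the sketch below substitutes a hypothetical nested model $g_{i1}=\pi_i h_i$, $g_{i2}=(1-\pi_i)h_i$ with $\pi_i=\mathrm{expit}(\varvec{X}_i^T\beta)$, $h_i=\mathrm{expit}(\varvec{X}_i^T\gamma)$ and $\Theta=(\beta^T,\gamma^T)^T$. It solves $\varvec{U}_n(\Theta)=\varvec{0}$ from (9) by Fisher scoring (a solver chosen here for illustration, not necessarily the authors'), approximates $\varvec{\Delta}^{-1}$ by averaging $(\partial \varvec{g}_i/\partial\Theta)\varvec{V}_i^{-1}(\partial \varvec{g}_i/\partial\Theta)^T$ over a large covariate sample, and compares the Monte Carlo spread of $\sqrt{n}({\widehat{\Theta }}-\Theta)$ with $\sqrt{\mathrm{diag}(\varvec{\Delta})}$. All parameter values, sample sizes and function names are illustrative assumptions.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def probs_and_jacobian(theta, X):
    """HYPOTHETICAL model: g1 = pi*h, g2 = (1 - pi)*h, pi = expit(X beta), h = expit(X gamma)."""
    q = X.shape[1]
    pi, h = expit(X @ theta[:q]), expit(X @ theta[q:])
    g = np.column_stack([pi * h, (1 - pi) * h])
    a = pi * (1 - pi) * h                    # dg1/d(X beta); dg2/d(X beta) = -a
    D = np.empty((X.shape[0], 2 * q, 2))     # D[i] = dg_i/dTheta, shape (2q, 2)
    D[:, :q, 0] = X * a[:, None]
    D[:, :q, 1] = -X * a[:, None]
    D[:, q:, 0] = X * (pi * h * (1 - h))[:, None]
    D[:, q:, 1] = X * ((1 - pi) * h * (1 - h))[:, None]
    return g, D

def v_inverse(g):
    """Closed-form V_i^{-1}(Theta): adjugate divided by determinant, as in Lemma 1."""
    g1, g2 = g[:, 0], g[:, 1]
    Vinv = np.empty((g.shape[0], 2, 2))
    Vinv[:, 0, 0] = g2 * (1 - g2)
    Vinv[:, 0, 1] = Vinv[:, 1, 0] = g1 * g2
    Vinv[:, 1, 1] = g1 * (1 - g1)
    return Vinv / (g1 * g2 * (1 - g1 - g2))[:, None, None]

def score_and_info(theta, X, Y):
    g, D = probs_and_jacobian(theta, X)
    DV = D @ v_inverse(g)                        # (n, 2q, 2)
    U = np.einsum("ijk,ik->j", DV, Y - g)        # U_n(Theta) as in (9)
    info = np.einsum("ijk,ilk->jl", DV, D)       # sum_i D_i V_i^{-1} D_i^T (expected information)
    return U, info

def fit(X, Y, tol=1e-8, max_iter=100):
    """Fisher scoring for U_n(Theta) = 0, started at Theta = 0."""
    theta = np.zeros(2 * X.shape[1])
    for _ in range(max_iter):
        U, info = score_and_info(theta, X, Y)
        step = np.linalg.solve(info, U)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def simulate(theta, n, rng):
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    g, _ = probs_and_jacobian(theta, X)
    u = rng.random(n)                            # one multinomial draw per respondent
    Y = np.column_stack([u < g[:, 0], (u >= g[:, 0]) & (u < g[:, 0] + g[:, 1])])
    return X, Y.astype(float)

rng = np.random.default_rng(2021)
theta_true = np.array([-0.5, 1.0, 1.0, -0.5])   # hypothetical (beta0, beta1, gamma0, gamma1)
n, reps = 2000, 200

# Delta^{-1} = E[D V^{-1} D^T], approximated by averaging over a large covariate sample
X_big, Y_big = simulate(theta_true, 200_000, rng)
_, info_big = score_and_info(theta_true, X_big, Y_big)
Delta = np.linalg.inv(info_big / len(X_big))

# Monte Carlo draws of sqrt(n) (Theta_hat - Theta)
Z = np.array([np.sqrt(n) * (fit(*simulate(theta_true, n, rng)) - theta_true) for _ in range(reps)])
print("empirical SD:     ", Z.std(axis=0, ddof=1))
print("sqrt(diag(Delta)):", np.sqrt(np.diag(Delta)))
```

Under a correctly specified model the two printed vectors should be close. In an application, $\varvec{\Delta}^{-1}$ would instead be estimated by the same plug-in average evaluated at ${\widehat{\Theta }}$ on the observed covariates, giving Wald standard errors $\sqrt{\mathrm{diag}(\widehat{\varvec{\Delta}})/n}$.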

About this article

Cite this article

Chang, PC., Pho, KH., Lee, SM. et al. Estimation of parameters of logistic regression for two-stage randomized response technique. Comput Stat 36, 2111–2133 (2021). https://doi.org/10.1007/s00180-021-01068-5

Received: 24 September 2020
Accepted: 06 January 2021
Published: 18 January 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s00180-021-01068-5
