Abstract
When a survey study concerns sensitive issues such as political orientation, sexual orientation, or income, respondents may be unwilling to answer truthfully, which leads to biased results. To protect respondents’ privacy and increase their willingness to give true answers, Warner (J Am Stat Assoc 60:63–69, 1965) proposed the randomized response (RR) technique, in which respondents select a question by means of a random device so that their privacy is protected. Huang (Stat Neerl 58:75–82, 2004) extended the RR design of Warner (1965) to a two-stage RR design. This design can be used not only to estimate the population proportion of persons with a sensitive characteristic but also to estimate the honest-answer rate in the first stage. This work develops a covariate extension of the two-stage RR design of Huang (2004) by applying logistic regression to investigate the effects of covariates on the sensitive characteristic and on honest responding. Simulation experiments are conducted to study the finite-sample performance of the maximum likelihood estimators of the logistic regression parameters. The proposed methodology is applied to data from a 2016 survey on the sexuality of freshmen at Feng Chia University in Taiwan.
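As a minimal illustration of the RR idea, the sketch below implements Warner’s (1965) original single-stage estimator; it is not Huang’s (2004) two-stage design or the covariate model developed in this work. In Warner’s design a respondent answers the statement “I have characteristic A” with probability \(p\) and its complement with probability \(1-p\), so the probability of a “yes” is \(\lambda =p\pi +(1-p)(1-\pi )\), and inverting gives \({\widehat{\pi }}=({\widehat{\lambda }}-(1-p))/(2p-1)\) for \(p\ne 1/2\). The function name `warner_estimate` is hypothetical.

```python
import math


def warner_estimate(yes_count: int, n: int, p: float) -> tuple[float, float]:
    """Estimate the sensitive proportion pi under Warner's (1965) design.

    p is the probability that the random device selects the direct statement
    "I have characteristic A"; otherwise the complementary statement is answered.
    Returns (pi_hat, estimated standard error).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("p = 0.5 makes pi non-identifiable")
    lam_hat = yes_count / n                      # observed proportion of "yes"
    pi_hat = (lam_hat - (1 - p)) / (2 * p - 1)   # invert lambda = p*pi + (1-p)*(1-pi)
    # Estimated variance: lambda(1 - lambda) / (n (2p - 1)^2)
    var_hat = lam_hat * (1 - lam_hat) / (n * (2 * p - 1) ** 2)
    return pi_hat, math.sqrt(var_hat)
```

For example, with \(p=0.7\) and 380 “yes” answers out of 1000, \({\widehat{\lambda }}=0.38\) and \({\widehat{\pi }}=(0.38-0.3)/0.4=0.2\). Note that the estimate is not truncated to \([0,1]\).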
References
Chang HJ, Huang KC (2001) Estimation of proportion and sensitivity of a qualitative character. Metrika 53:269–280
Chaudhuri A (2002) Estimating sensitive proportions from randomized responses in unequal probability sampling. Calcutta Stat Assoc Bull 52:315–322
Chaudhuri A, Mukerjee R (1988) Randomized response: theory and techniques. Marcel Dekker, New York
Christofides TC (2003) A generalized randomized response technique. Metrika 57:195–200
Christofides TC (2005) Randomized response in stratified sampling. J Stat Plan Inference 128:303–310
Cruyff MJLF, Böckenholt U, van den Hout A, van der Heijden PGM (2008) Accounting for self-protective responses in randomized response data from a social security survey using the zero-inflated Poisson model. Ann Appl Stat 2:316–331
Foutz RV (1977) On the unique consistent solution to the likelihood equations. J Am Stat Assoc 72:147–148
Greenberg BG, Abul-Ela A, Simmons WR, Horvitz DG (1969) The unrelated question randomized response model: theoretical framework. J Am Stat Assoc 64:520–539
Groenitz H (2014) A new privacy-protecting survey design for multichotomous sensitive variables. Metrika 77:211–224
Hsieh SH, Lee SM, Shen PS (2009) Semiparametric analysis of randomized response data with missing covariates in logistic regression. Comput Stat Data Anal 53:2673–2692
Hsieh SH, Lee SM, Shen PS (2010) Logistic regression analysis of randomized response data with missing covariates. J Stat Plan Inference 140:927–940
Hsieh SH, Lee SM, Li CS (2020) A two-stage multilevel randomized response technique with proportional odds models and missing covariates. Sociol Methods Res. https://doi.org/10.1177/0049124120914954
Huang KC (2004) A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Stat Neerl 58:75–82
Kim JM, Warde WD (2004) A stratified Warner’s randomized response model. J Stat Plan Inference 120:155–165
Kim JM, Warde WD (2005) A mixed randomized response model. J Stat Plan Inference 133:211–221
Kim JM, Tebbs JM, An SW (2006) Extensions of Mangat’s randomized-response model. J Stat Plan Inference 136:1554–1567
Kuk AYC (1990) Asking sensitive questions indirectly. Biometrika 77:436–438
Lee SM, Peng TC, Tapsoba JDD, Hsieh SH (2017) Improved estimation methods for unrelated question randomized response techniques. Commun Stat Theory Methods 46:8101–8112
Maddala GS (1983) Limited-dependent and qualitative variables in econometrics. Cambridge University Press, Cambridge
Saha A (2004) On efficacies of Dalenius–Vitale technique with compulsory versus optional randomized responses from complex surveys. Calcutta Stat Assoc Bull 54:223–230
Scheers NJ, Dayton CM (1988) Covariate randomized response models. J Am Stat Assoc 83:969–974
Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60:63–69
Yu JW, Tian GL, Tang ML (2008) Two new models for survey sampling with sensitive characteristic: design and analysis. Metrika 67:251–263
Acknowledgements
The authors are very thankful for a reviewer’s constructive comments that improved the presentation. The research of S.M. Lee and K.H. Pho was supported by the Ministry of Science and Technology (MOST) of Taiwan, ROC (105-2118-M-035-005-MY2 and 107-2118-M-035-004-MY2).
Appendix
1.1 Proof of Lemma 1
\(\varvec{\Psi }_i(\Theta )\) in (4) can be expressed as
where \(\det (\varvec{V}_i(\Theta ))= g_{i1}(\Theta )g_{i2}(\Theta )(1-g_{i1}(\Theta )-g_{i2}(\Theta ))\) and \(\varvec{V}_i(\Theta )\) is given in (8). Hence the score function \(\varvec{U}_n(\Theta )\) can be written as
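The determinant in Lemma 1 and the matrix form of the score can be checked numerically. The sketch below assumes that \(\varvec{V}_i(\Theta )\) is the covariance matrix of \((Y_{i1},Y_{i2})\) under the \({Mult}(1,g_{i1},g_{i2},1-g_{i1}-g_{i2})\) model and compares the directly differentiated multinomial log-likelihood with the form \(\varvec{D}^{\top }\varvec{V}^{-1}(\varvec{Y}-\varvec{g})\), where \(\varvec{D}\) (introduced notation) has rows \(\partial g_{i1}/\partial \Theta ^{\top }\) and \(\partial g_{i2}/\partial \Theta ^{\top }\). The function names and the gradient vectors `d1`, `d2` are hypothetical placeholders, not the article’s specification of (4) or (8).

```python
def cov_matrix(g1, g2):
    """Covariance of (Y1, Y2) under Mult(1, g1, g2, 1 - g1 - g2)."""
    return [[g1 * (1 - g1), -g1 * g2], [-g1 * g2, g2 * (1 - g2)]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def direct_score(y, g, d1, d2):
    """Gradient of y1*log g1 + y2*log g2 + y3*log g3, with y3 = 1 - y1 - y2.

    d1, d2 are the gradients of g1 and g2 with respect to Theta.
    """
    (y1, y2), (g1, g2) = y, g
    y3, g3 = 1 - y1 - y2, 1 - g1 - g2
    return [y1 / g1 * a + y2 / g2 * b - y3 / g3 * (a + b) for a, b in zip(d1, d2)]


def matrix_score(y, g, d1, d2):
    """The same score written as D^T V^{-1} (y - g); D has rows d1 and d2."""
    g1, g2 = g
    v = cov_matrix(g1, g2)
    det = det2(v)
    # Inverse of a 2x2 matrix via its adjugate
    vinv = [[v[1][1] / det, -v[0][1] / det], [-v[1][0] / det, v[0][0] / det]]
    r = (y[0] - g1, y[1] - g2)
    w1 = vinv[0][0] * r[0] + vinv[0][1] * r[1]
    w2 = vinv[1][0] * r[0] + vinv[1][1] * r[1]
    # D^T w = w1 * d1 + w2 * d2
    return [w1 * a + w2 * b for a, b in zip(d1, d2)]
```

Expanding \(g_1(1-g_1)g_2(1-g_2)-g_1^2g_2^2=g_1g_2(1-g_1-g_2)\) gives the determinant algebraically; the code simply confirms both identities at arbitrary values.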
1.2 Proof of Theorem 1
1.2.1 (a) Proof of consistency of \({\widehat{\Theta }}\)
By condition (C1) and the inverse function theorem of Foutz (1977), the likelihood equation \(\varvec{U}_n(\Theta )=\varvec{0}\) has, with probability tending to one, a unique solution, and this solution is consistent. Hence the ML estimator \({\widehat{\Theta }}\) is a consistent estimator of \(\Theta \).
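Consistency can be illustrated empirically in a simplified setting. The sketch below is a hypothetical one-stage analogue, not the article’s two-stage model: a Warner-type design with a logistic model for the sensitive characteristic, in the spirit of the covariate RR models of Scheers and Dayton (1988), fitted by Fisher scoring with step halving. The names `simulate`, `loglik`, `fit` and all parameter values are assumptions for illustration; the model requires \(0<p<1\) and \(p\ne 1/2\).

```python
import math
import random


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(n, beta, p, rng):
    """Hypothetical data: Z ~ Bernoulli(expit(b0 + b1*x)) is the sensitive status;
    the recorded answer is Z with probability p and 1 - Z otherwise."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = 1 if rng.random() < expit(beta[0] + beta[1] * x) else 0
        y = z if rng.random() < p else 1 - z
        data.append((x, y))
    return data


def loglik(b, data, p):
    """Bernoulli log-likelihood of the observed answers."""
    c = 2 * p - 1
    total = 0.0
    for x, y in data:
        lam = (1 - p) + c * expit(b[0] + b[1] * x)   # P(answer "yes" | x)
        total += math.log(lam) if y == 1 else math.log(1 - lam)
    return total


def fit(data, p, max_iter=50, tol=1e-8):
    """Fisher scoring with step halving for lambda(x) = (1-p) + (2p-1)*expit(b0 + b1*x)."""
    c = 2 * p - 1
    b = [0.0, 0.0]
    for _ in range(max_iter):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in data:
            pi = expit(b[0] + b[1] * x)
            lam = (1 - p) + c * pi
            dlam = c * pi * (1 - pi)             # d lambda / d(linear predictor)
            xv = (1.0, x)
            for j in range(2):
                score[j] += (y - lam) / (lam * (1 - lam)) * dlam * xv[j]
                for k in range(2):
                    info[j][k] += dlam * dlam / (lam * (1 - lam)) * xv[j] * xv[k]
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        step = [(info[1][1] * score[0] - info[0][1] * score[1]) / det,
                (info[0][0] * score[1] - info[1][0] * score[0]) / det]
        ll_old, t = loglik(b, data, p), 1.0
        while True:                              # halve the step until the likelihood does not drop
            cand = [b[0] + t * step[0], b[1] + t * step[1]]
            if loglik(cand, data, p) >= ll_old or t < 1e-8:
                break
            t /= 2.0
        b = cand
        if t * max(abs(step[0]), abs(step[1])) < tol:
            break
    return b
```

Refitting on samples of increasing size (for example \(n=500\) and \(n=10000\) with `beta=(-0.5, 1.0)` and `p=0.8`) should show the estimates settling near the generating values, which is the behavior the consistency result describes for the full model.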
1.2.2 (b) Proof of asymptotic normality of \(\sqrt{n}({\widehat{\Theta }}-\Theta )\)
Let \(\varvec{{\mathcal {U}}}_n(\Theta )={\frac{1}{\sqrt{n}}}\varvec{U}_n(\Theta )=\frac{1}{\sqrt{n}}\sum _{i=1}^n\varvec{\Psi }_i(\Theta )\). A Taylor expansion of \(\varvec{{\mathcal {U}}}_n({\widehat{\Theta }})\) about \(\Theta \) gives
where
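For orientation, the expansion has the following generic mean-value form. This is a standard sketch under the usual regularity conditions, not a reproduction of the article’s display (10); \(\Theta ^{*}\) is introduced notation for a point on the segment between \({\widehat{\Theta }}\) and \(\Theta \) (possibly different for each row), and the first equality uses \(\varvec{U}_n({\widehat{\Theta }})=\varvec{0}\).

```latex
\varvec{0}=\varvec{{\mathcal {U}}}_n({\widehat{\Theta }})
  =\varvec{{\mathcal {U}}}_n(\Theta )
  +\left[\frac{1}{n}\,\frac{\partial \varvec{U}_n(\Theta ^{*})}{\partial \Theta ^{\top }}\right]
   \sqrt{n}\,({\widehat{\Theta }}-\Theta ),
\qquad\text{so}\qquad
\sqrt{n}\,({\widehat{\Theta }}-\Theta )
  =\left[-\frac{1}{n}\,\frac{\partial \varvec{U}_n(\Theta ^{*})}{\partial \Theta ^{\top }}\right]^{-1}
   \varvec{{\mathcal {U}}}_n(\Theta ).
```

The last step of the proof, \(\sqrt{n}({\widehat{\Theta }}-\Theta )=\varvec{\Delta }\varvec{{\mathcal {U}}}_n(\Theta )+o_p(\varvec{1})\), amounts to the inverse of the bracketed matrix converging in probability to \(\varvec{\Delta }\).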
Because \((Y_{i1},Y_{i2},1-Y_{i1}-Y_{i2})|\varvec{X}_i\sim {Mult}\left( 1,g_{i1}(\Theta ),g_{i2}(\Theta ),1-g_{i1}(\Theta )-g_{i2}(\Theta )\right) \), \(i=1,2,\dots \), we have
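The multinomial moments used in this step can be checked by enumerating the three possible response vectors \((1,0)\), \((0,1)\), \((0,0)\). The sketch assumes the covariance of \((Y_{i1},Y_{i2})\) takes the standard multinomial form in `closed_form_cov`, which we take to be \(\varvec{V}_i(\Theta )\) since its determinant matches the one stated in Lemma 1; the function names and probability values are illustrative.

```python
def multinomial_moments(g1, g2):
    """Exact mean and covariance of (Y1, Y2) when
    (Y1, Y2, 1 - Y1 - Y2) ~ Mult(1, g1, g2, 1 - g1 - g2)."""
    g3 = 1.0 - g1 - g2
    if min(g1, g2, g3) < 0:
        raise ValueError("cell probabilities must be non-negative")
    outcomes = [((1, 0), g1), ((0, 1), g2), ((0, 0), g3)]
    mean = [sum(prob * y[j] for y, prob in outcomes) for j in range(2)]
    cov = [[sum(prob * (y[j] - mean[j]) * (y[k] - mean[k]) for y, prob in outcomes)
            for k in range(2)] for j in range(2)]
    return mean, cov


def closed_form_cov(g1, g2):
    """Standard multinomial covariance of (Y1, Y2)."""
    return [[g1 * (1 - g1), -g1 * g2], [-g1 * g2, g2 * (1 - g2)]]
```

The enumeration reproduces \(E[\varvec{Y}_i\mid \varvec{X}_i]=\varvec{g}_i(\Theta )\) and the closed-form covariance, including \(\mathrm{Cov}(Y_{i1},Y_{i2})=-g_{i1}g_{i2}\), because \(Y_{i1}Y_{i2}=0\) always.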
Because \((\varvec{Y}_i,\varvec{X}_i)\), \(i=1,2,\ldots ,n\), are independent and identically distributed and \(E[\varvec{Y}_i-\varvec{g}_i(\Theta )]=\varvec{0}\), condition (C2) and the weak law of large numbers imply that
and
Hence
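The weak-law step replaces sample averages by their expectations. For a score of the form \(\varvec{\Psi }_i=\varvec{D}_i^{\top }\varvec{V}_i^{-1}(\varvec{Y}_i-\varvec{g}_i)\), with \(\varvec{D}_i\) (introduced notation) the Jacobian of \(\varvec{g}_i\) with respect to \(\Theta \), the conditional expectations are \(\varvec{0}\) and \(\varvec{D}_i^{\top }\varvec{V}_i^{-1}\varvec{D}_i\) (the information identity). The sketch below verifies both by exact enumeration over the three outcomes; the assumed score form, the Jacobian `d`, and the helper names are illustrative rather than taken from the article’s (4) or (8).

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def mat_vec(a, v):
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def v_inverse(g1, g2):
    """Inverse of V = [[g1(1-g1), -g1 g2], [-g1 g2, g2(1-g2)]], det = g1 g2 (1-g1-g2)."""
    det = g1 * g2 * (1 - g1 - g2)
    return [[g2 * (1 - g2) / det, g1 * g2 / det],
            [g1 * g2 / det, g1 * (1 - g1) / det]]


def score(y, g, d):
    """Psi = D^T V^{-1} (y - g); d is the 2 x q Jacobian of (g1, g2) w.r.t. Theta."""
    w = mat_vec(v_inverse(*g), [y[0] - g[0], y[1] - g[1]])
    return mat_vec(transpose(d), w)


def score_moments(g, d):
    """Exact E[Psi] and E[Psi Psi^T] over the outcomes (1,0), (0,1), (0,0)."""
    g1, g2 = g
    outcomes = [((1, 0), g1), ((0, 1), g2), ((0, 0), 1 - g1 - g2)]
    q = len(d[0])
    mean = [0.0] * q
    second = [[0.0] * q for _ in range(q)]
    for y, prob in outcomes:
        s = score(y, g, d)
        for j in range(q):
            mean[j] += prob * s[j]
            for k in range(q):
                second[j][k] += prob * s[j] * s[k]
    return mean, second


def fisher_info(g, d):
    """D^T V^{-1} D, built column by column."""
    vinv = v_inverse(*g)
    q = len(d[0])
    cols = [mat_vec(vinv, [d[0][k], d[1][k]]) for k in range(q)]  # V^{-1} times column k of D
    return [[d[0][j] * cols[k][0] + d[1][j] * cols[k][1] for k in range(q)]
            for j in range(q)]
```

The identity holds because \(\mathrm{Cov}(\varvec{Y}_i\mid \varvec{X}_i)=\varvec{V}_i\), so \(\varvec{D}_i^{\top }\varvec{V}_i^{-1}\varvec{V}_i\varvec{V}_i^{-1}\varvec{D}_i=\varvec{D}_i^{\top }\varvec{V}_i^{-1}\varvec{D}_i\).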
From (10), \(\sqrt{n}({\widehat{\Theta }}-\Theta )\) can be expressed as
Since (11) and (12) yield
and
the central limit theorem shows that
Therefore
and by Slutsky’s theorem, \(\sqrt{n}({\widehat{\Theta }}-\Theta )=\varvec{\Delta }\varvec{{\mathcal {U}}}_n(\Theta ) +o_p(\varvec{1})\overset{d}{\longrightarrow }N(\varvec{0},\varvec{\Delta })\), which completes the proof.
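Asymptotic normality with covariance equal to the inverse information can be seen in the simplest case, Warner’s (1965) single-question design without covariates, where \({\widehat{\pi }}=({\widehat{\lambda }}-(1-p))/(2p-1)\) and \(n\,\mathrm{Var}({\widehat{\pi }})=\lambda (1-\lambda )/(2p-1)^2\) with \(\lambda =p\pi +(1-p)(1-\pi )\). The simulation below only illustrates the theorem’s conclusion in that simpler model; it is not the two-stage logistic model of Theorem 1, and the function names and parameter values are arbitrary choices.

```python
import math
import random
import statistics


def warner_sqrt_n_errors(pi, p, n, reps, rng):
    """Draw `reps` Warner samples of size n and return sqrt(n) * (pi_hat - pi)."""
    c = 2 * p - 1
    errors = []
    for _ in range(reps):
        yes = 0
        for _ in range(n):
            has_a = rng.random() < pi          # true sensitive status
            direct = rng.random() < p          # device picks the direct statement
            answer_yes = has_a if direct else not has_a
            yes += answer_yes
        pi_hat = (yes / n - (1 - p)) / c
        errors.append(math.sqrt(n) * (pi_hat - pi))
    return errors


def asymptotic_variance(pi, p):
    """Inverse Fisher information per observation: lambda(1 - lambda) / (2p - 1)^2."""
    lam = p * pi + (1 - p) * (1 - pi)
    return lam * (1 - lam) / (2 * p - 1) ** 2
```

With, say, \(\pi =0.3\), \(p=0.75\), \(n=400\) and 2000 replications, the empirical variance of the scaled errors should be close to `asymptotic_variance(0.3, 0.75)` (here 0.96), and roughly 95% of them should fall within \(\pm 1.96\) asymptotic standard deviations.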
Cite this article
Chang, PC., Pho, KH., Lee, SM. et al. Estimation of parameters of logistic regression for two-stage randomized response technique. Comput Stat 36, 2111–2133 (2021). https://doi.org/10.1007/s00180-021-01068-5