Abstract
When multiplicative noise is used to perturb a set of original data, the data provider needs to ensure that data intruders are unlikely to learn the original values from the noise-multiplied data. Several attacking strategies for unveiling the original values have been recognised in the literature, and the data provider must select a noise-generating variable that protects the noise-multiplied data against them. However, the large number of potential attacking strategies makes it difficult to quantify the protection level of a noise candidate. In this paper, we argue that, to quantify the protection level a noise candidate offers to the original data against an attacking strategy, the data provider might look at the average value disclosure risk it produces. Correspondingly, we propose an optimal estimator which maximizes the average value disclosure risk. The data provider can then use the maximized average value disclosure risk as a single measure of the protection level a noise candidate offers to the original data. This measure could help the data provider select a noise-generating variable in practice.
Acknowledgements
We thank the anonymous reviewers for their constructive comments on the paper. This research has been conducted with the support of the Australian Government Research Training Program Scholarship.
Appendix
In this section we show how \(z_i^{opt}\) is derived. Suppose that, for a set of original data \(\{y_i\}_{i=1}^n\), the data provider uses a probabilistic disclosure risk measure: the probability that an estimate of an original value lies within a relative distance \(\delta \) of that value.
In the following we assume \(Y>0\), \(C>0\) and \(\tilde{Y}=g(Y^*)\), where \(Y^*=CY\). The disclosure risk of an observation \(y\) is then given as:
\[ P\left( \frac{|g(Y^*)-Y|}{Y}<\delta \,\Big|\, Y=y\right) . \]
Therefore \(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y)\) is a function of the random variable \(Y\).
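The average disclosure risk for the original data is
\[ R_{overall}=\frac{1}{n}\sum _{i=1}^{n}P\left( \frac{|g(Y^*)-Y|}{Y}<\delta \,\Big|\, Y=y_i\right) . \]
Suppose \(E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y))\) exists. Then we have
\[ R_{overall}\rightarrow E_Y\left( P\left( \frac{|g(Y^*)-Y|}{Y}<\delta \,\Big|\, Y\right) \right) \]
as \(n\rightarrow \infty \).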
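The average disclosure risk can also be approximated by simulation. The following is a minimal sketch, not the authors' implementation: the function name `average_disclosure_risk`, the lognormal original data, the Uniform(0.5, 1.5) noise and the naive estimator \(g(y^*)=y^*\) are all assumptions chosen for illustration.

```python
import numpy as np

def average_disclosure_risk(y, sample_noise, g, delta, n_noise=200, seed=0):
    """Monte Carlo estimate of the average value disclosure risk.

    y            : positive original values y_1, ..., y_n
    sample_noise : callable (rng, shape) -> positive noise draws C (hypothetical helper)
    g            : intruder's estimator, applied to masked values y* = c * y
    delta        : relative tolerance, 0 < delta < 1
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    c = sample_noise(rng, (n_noise, y.size))   # n_noise independent noise draws per observation
    y_star = c * y                             # noise-multiplied data
    hit = np.abs(g(y_star) - y) / y < delta    # is the estimate within delta of y?
    per_obs = hit.mean(axis=0)                 # approximates P(... | Y = y_i)
    return per_obs.mean(), per_obs             # approximates R_overall, plus per-observation risks


def uniform_noise(rng, shape):
    # Assumed noise candidate: C ~ Uniform(0.5, 1.5), so E[C] = 1
    return rng.uniform(0.5, 1.5, size=shape)


# Assumed original data: 500 lognormal values
y = np.random.default_rng(1).lognormal(mean=3.0, sigma=0.5, size=500)
risk, _ = average_disclosure_risk(y, uniform_noise, g=lambda y_star: y_star, delta=0.1)
print(f"estimated average disclosure risk: {risk:.3f}")
```

For this particular choice the risk does not depend on \(y\): it equals \(P(|C-1|<0.1)=0.2\), so the estimate should land close to 0.2. Other noise candidates or estimators can be compared by swapping `sample_noise` and `g`.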
The objective is to find an expression for \(g(Y^*)\) which maximizes \(R_{overall}\) as \(n\rightarrow \infty \). Because \(\{Y^*|Y=y\}=yC\), we have \(f_{Y^*|Y}(y^*|Y=y)=\frac{1}{y}f_C(\frac{y^*}{y})\). We observe the following:
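The step from the expected risk to an integral over the conditional density of \(Y\) given \(Y^*\) can be sketched as below. This is a reconstruction under stated assumptions rather than the authors' exact display: it assumes \(0<\delta <1\) and \(g(y^*)>0\), uses Bayes' rule \(f_Y(y)f_{Y^*|Y}(y^*|Y=y)=f_{Y^*}(y^*)f_{Y|Y^*=y^*}(y)\), and exchanges the order of integration, which is permitted because the integrand is non-negative.

```latex
\begin{aligned}
E_Y\!\left(P\!\left(\frac{|g(Y^*)-Y|}{Y}<\delta \,\middle|\, Y\right)\right)
&= \int_0^\infty f_Y(y)\int_0^\infty \mathbf{1}\!\left\{\frac{|g(y^*)-y|}{y}<\delta\right\}
   f_{Y^*|Y}(y^*|Y=y)\,dy^*\,dy \\
&= \int_0^\infty f_{Y^*}(y^*)\int_0^\infty \mathbf{1}\!\left\{\frac{|g(y^*)-y|}{y}<\delta\right\}
   f_{Y|Y^*=y^*}(y)\,dy\,dy^* \\
&= \int_0^\infty f_{Y^*}(y^*)\int_{\frac{g(y^*)}{(1+\delta)}}^{\frac{g(y^*)}{(1-\delta)}}
   f_{Y|Y^*=y^*}(y)\,dy\,dy^*
\end{aligned}
```

The last line uses the equivalence \(\frac{|g(y^*)-y|}{y}<\delta \iff \frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta }\), which holds for \(y>0\) and \(0<\delta <1\).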
Therefore, \(E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y))\) is maximized if, for every \(y^*\), \(\int _{\frac{g(y^*)}{(1+\delta )}}^{\frac{g(y^*)}{(1-\delta )}} f_{Y|Y^*=y^*}(y)dy\) is maximized. The value of \(g(y^*)\) which maximizes this integral is \(z_{opt}=\arg \max _{g(y^*)} \int _{\frac{g(y^*)}{(1+\delta )}}^{\frac{g(y^*)}{(1-\delta )}} f_{Y|Y^*=y^*}(y)dy\). Therefore, the optimal estimator \(Z_{opt}\) takes the following form
\[ Z_{opt}=\arg \max _{g(Y^*)} \int _{\frac{g(Y^*)}{(1+\delta )}}^{\frac{g(Y^*)}{(1-\delta )}} f_{Y|Y^*}(y)dy. \]
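The maximization can be approximated numerically on a grid. The sketch below is an illustration only: it assumes the densities \(f_Y\) and \(f_C\) are available as functions, and the names `z_opt`, `lognormal_pdf` and `uniform_pdf`, together with the lognormal density for \(Y\) and the Uniform(0.5, 1.5) noise, are hypothetical choices rather than anything prescribed by the paper.

```python
import numpy as np

def z_opt(y_star, f_y, f_c, delta, y_grid):
    """Grid approximation of argmax_z of the mass that f_{Y|Y*=y*}
    places on the interval [z / (1 + delta), z / (1 - delta)]."""
    y_grid = np.asarray(y_grid, dtype=float)
    # Unnormalised conditional density: f_Y(y) * f_{Y*|Y}(y*|y) = f_Y(y) * f_C(y*/y) / y
    post = f_y(y_grid) * f_c(y_star / y_grid) / y_grid
    # Conditional CDF on the grid via the trapezoidal rule
    steps = 0.5 * (post[1:] + post[:-1]) * np.diff(y_grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    if cdf[-1] <= 0.0:
        raise ValueError("conditional density has no mass on y_grid")
    cdf /= cdf[-1]
    # Candidate estimates z are taken from the same grid
    lower = np.interp(y_grid / (1.0 + delta), y_grid, cdf)
    upper = np.interp(y_grid / (1.0 - delta), y_grid, cdf)
    mass = upper - lower
    k = int(np.argmax(mass))                   # first maximiser if several tie
    return y_grid[k], mass[k]


def lognormal_pdf(y, mu=3.0, sigma=0.5):
    # Assumed density of the original variable Y
    return np.exp(-(np.log(y) - mu) ** 2 / (2 * sigma ** 2)) / (y * sigma * np.sqrt(2 * np.pi))


def uniform_pdf(c, a=0.5, b=1.5):
    # Assumed noise density: C ~ Uniform(a, b)
    return ((c >= a) & (c <= b)) / (b - a)


grid = np.linspace(0.1, 200.0, 40001)
z, mass = z_opt(y_star=20.0, f_y=lognormal_pdf, f_c=uniform_pdf, delta=0.1, y_grid=grid)
print(f"z_opt ~ {z:.2f}, conditional mass within tolerance ~ {mass:.3f}")
```

The maximiser need not be unique. With a flat density for \(Y\) and uniform noise, for example, the conditional density is proportional to \(1/y\) on its support and every interior candidate attains the same mass, in which case `np.argmax` simply returns the first one on the grid.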
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, Y., Lin, YX., Krivitsky, P.N., Wakefield, B. (2018). Quantifying the Protection Level of a Noise Candidate for Noise Multiplication Masking Scheme. In: Domingo-Ferrer, J., Montes, F. (eds) Privacy in Statistical Databases. PSD 2018. Lecture Notes in Computer Science, vol 11126. Springer, Cham. https://doi.org/10.1007/978-3-319-99771-1_19
DOI: https://doi.org/10.1007/978-3-319-99771-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99770-4
Online ISBN: 978-3-319-99771-1
eBook Packages: Computer Science, Computer Science (R0)