Quantifying the Protection Level of a Noise Candidate for Noise Multiplication Masking Scheme

Yue Ma, Yan-Xia Lin, Pavel N. Krivitsky & Bradley Wakefield

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11126)

Included in the following conference series:

International Conference on Privacy in Statistical Databases

Abstract

When multiplicative noises are used to perturb a set of original data, the data provider needs to ensure that the original values are not likely to be learned by data intruders from the noise-multiplied data. Different attacking strategies for unveiling the original values have been recognised in the literature, and the data provider needs to ensure that the noise-multiplied data is protected against these attacking strategies by selecting an appropriate noise generating variable. However, there are many potential attacking strategies, which makes the quantification of the protection level of a noise candidate difficult. In this paper, we argue that, to quantify the protection level a noise candidate offers to the original data against an attacking strategy, the data provider might look at the average value disclosure risk it produces. Correspondingly, we propose an optimal estimator which maximizes the average value disclosure risk. As a result, the data provider could use the maximized average value disclosure risk as a single measure for quantifying the protection level a noise candidate offers to the original data. The measure could help the data provider with the process of noise generating variable selection in practice.

Acknowledgements

We thank the anonymous reviewers for their constructive comments on the paper. This research has been conducted with the support of the Australian Government Research Training Program Scholarship.

Author information

Authors and Affiliations

National Institute for Applied Statistics Research Australia, School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, 2500, Australia
Yue Ma, Yan-Xia Lin, Pavel N. Krivitsky & Bradley Wakefield

Corresponding author

Correspondence to Yue Ma.

Editor information

Editors and Affiliations

Rovira i Virgili University, Tarragona, Spain
Josep Domingo-Ferrer
University of Valencia, Burjassot, Spain
Francisco Montes

Appendix

In this section we show how $z_i^{opt}$ is derived. Suppose that, for a set of original data $\{y_i\}_{i=1}^n$, the data provider uses the following probabilistic disclosure risk measure:

$$\begin{aligned} P(\frac{|\tilde{Y}_i-Y_i|}{Y_i}<\delta )=P(\frac{|\tilde{Y}-Y|}{Y}<\delta ), i=1,\cdots , n \end{aligned}$$

In the following we assume $Y>0$, $C>0$ and $\tilde{Y}=g(Y^*)$, where $Y^*=CY$. The disclosure risk of an observation $y$ is then given by:

$$\begin{aligned} P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y=y)=\int _{0}^{\infty }I(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta })f_{Y^*|Y}(y^*|Y=y)dy^* \end{aligned}$$

Therefore $P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y)$ is a function of the random variable $Y$.

The average disclosure risk for the original data is

$$\begin{aligned} R_{overall}=\frac{\sum _{i=1}^n P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y=y_i)}{n} \end{aligned}$$

Suppose $E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y))$ exists. If the $y_i$ are independent realizations of $Y$, the law of large numbers gives

$$\begin{aligned} R_{overall}\overset{P}{\rightarrow }E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y)) \end{aligned}$$

as $n\rightarrow \infty $.
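The per-observation risk and its average $R_{overall}$ can be approximated by simulation. The following is a minimal sketch, not the authors' implementation: the function names, the Uniform(0.5, 1.5) noise, the log-normal original data and the naive estimator $g(y^*)=y^*$ are all illustrative assumptions.

```python
import numpy as np

def value_disclosure_risk(y, g, sample_noise, delta, n_draws, rng):
    """Monte Carlo estimate of P(|g(Y*) - Y| / Y < delta | Y = y), with Y* = C * y."""
    c = sample_noise(n_draws, rng)        # draws of the noise C > 0
    y_star = c * y                        # noise-multiplied values for this y
    rel_err = np.abs(g(y_star) - y) / y   # relative estimation error
    return float(np.mean(rel_err < delta))

def overall_risk(ys, g, sample_noise, delta, n_draws, rng):
    """R_overall: the average of the per-observation risks over the data set."""
    risks = [value_disclosure_risk(y, g, sample_noise, delta, n_draws, rng)
             for y in ys]
    return float(np.mean(risks))

# Hypothetical setup (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
uniform_noise = lambda size, rng: rng.uniform(0.5, 1.5, size)  # assumed noise C
naive_g = lambda y_star: y_star                                # assumed estimator
ys = rng.lognormal(mean=0.0, sigma=0.5, size=200)              # assumed original data

r_overall = overall_risk(ys, naive_g, uniform_noise,
                         delta=0.1, n_draws=20_000, rng=rng)
print(r_overall)
```

With this naive estimator the relative error equals $|C-1|$ for every $y$, so under the assumed noise the estimate should be close to $P(|C-1|<0.1)=0.2$.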

The objective is to find an expression of $g(Y^*)$ which maximizes $R_{overall}$ as $n\rightarrow \infty $. Because $\{Y^*|Y=y\}=yC$, the conditional density of $Y^*$ is $f_{Y^*|Y}(y^*|Y=y)=\frac{1}{y}f_C(\frac{y^*}{y})$. We observe the following:

$$\begin{aligned} E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y))&=\int _{0}^{\infty }\int _{0}^{\infty } I(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta })f_{Y^*|Y}(y^*|Y=y)dy^* f_Y(y)dy\\&=\int _{0}^{\infty }\int _{0}^{\infty } I(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta })\frac{1}{y} f_C(\frac{y^*}{y})f_Y(y)dydy^*\\&=\int _{0}^{\infty }f_{Y^*}(y^*)\int _{0}^{\infty } I(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta })f_{Y|Y^*=y^*}(y)dydy^*\\&=\int _{0}^{\infty }f_{Y^*}(y^*)\int _{\frac{g(y^*)}{(1+\delta )}}^{\frac{g(y^*)}{(1-\delta )}} f_{Y|Y^*=y^*}(y)dydy^*\end{aligned}$$

(2)
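The two substitutions used in (2) can be spelled out as a short worked derivation. It assumes that $C$ is independent of $Y$ and has density $f_C$, as the expression for $f_{Y^*|Y}$ implies; the distribution function $F_C$ and the joint density $f_{Y,Y^*}$ are notation introduced here for illustration only.

$$\begin{aligned} P(Y^*\le y^*|Y=y)&=P(C\le \frac{y^*}{y})=F_C(\frac{y^*}{y})\ \Rightarrow \ f_{Y^*|Y}(y^*|Y=y)=\frac{1}{y}f_C(\frac{y^*}{y}),\\ \frac{1}{y}f_C(\frac{y^*}{y})f_Y(y)&=f_{Y,Y^*}(y,y^*)=f_{Y^*}(y^*)f_{Y|Y^*=y^*}(y). \end{aligned}$$

The first line differentiates the conditional distribution function with respect to $y^*$. The second line is Bayes' rule, which turns the second line of (2) into the third.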

Because $f_{Y^*}(y^*)\ge 0$, $E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y))$ is maximized if, for every $y^*$, the inner integral $\int _{\frac{g(y^*)}{(1+\delta )}}^{\frac{g(y^*)}{(1-\delta )}} f_{Y|Y^*=y^*}(y)dy$ is maximized. The value of $g(y^*)$ that achieves this is $z_{opt}=\mathop{\mathrm{argmax}}_{g(y^*)} \int _{\frac{g(y^*)}{(1+\delta )}}^{\frac{g(y^*)}{(1-\delta )}} f_{Y|Y^*=y^*}(y)dy$. Therefore, the optimal estimator $Z_{opt}$ takes the following form

$$\begin{aligned} Z_{opt}=\mathop{\mathrm{argmax}}_{g(Y^*)} \int _{\frac{g(Y^*)}{(1+\delta )}}^{\frac{g(Y^*)}{(1-\delta )}} f_{Y|Y^*}(y)dy \end{aligned}$$
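For a single observed $y^*$, the maximization above can be approximated on a grid. The sketch below is one possible numerical approach under stated assumptions, not the procedure used in the paper: the helper names, the grids, the log-normal density for $Y$ and the Uniform(0.5, 1.5) density for $C$ are all hypothetical choices, and it takes $0<\delta <1$.

```python
import numpy as np

def posterior_cdf(y_star, y_grid, f_Y, f_C):
    """CDF of Y given Y* = y_star on y_grid, using
    f_{Y|Y*}(y) proportional to (1/y) f_C(y*/y) f_Y(y)."""
    dens = f_C(y_star / y_grid) * f_Y(y_grid) / y_grid
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(y_grid)  # trapezoid rule
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    return cdf / cdf[-1]                                    # normalise to 1

def window_mass(g, y_grid, cdf, delta):
    """Posterior probability that Y lies in (g/(1+delta), g/(1-delta))."""
    upper = np.interp(g / (1 - delta), y_grid, cdf)
    lower = np.interp(g / (1 + delta), y_grid, cdf)
    return upper - lower

def optimal_estimate(y_star, y_grid, f_Y, f_C, delta, g_grid):
    """Grid search for the g that maximises the posterior window mass."""
    cdf = posterior_cdf(y_star, y_grid, f_Y, f_C)
    mass = window_mass(g_grid, y_grid, cdf, delta)
    k = int(np.argmax(mass))
    return float(g_grid[k]), float(mass[k])

# Assumed densities, for illustration only.
def f_Y(y, s=0.5):
    """Log-normal(0, s^2) density of the original variable Y."""
    return np.exp(-np.log(y) ** 2 / (2 * s ** 2)) / (y * s * np.sqrt(2 * np.pi))

def f_C(c):
    """Uniform(0.5, 1.5) density of the noise C."""
    return np.where((c >= 0.5) & (c <= 1.5), 1.0, 0.0)

y_grid = np.linspace(0.01, 20.0, 20_000)   # integration grid for y
g_grid = np.linspace(0.5, 6.0, 1101)       # candidate values of g(y*)
delta, y_star = 0.1, 2.0

z_opt, risk_opt = optimal_estimate(y_star, y_grid, f_Y, f_C, delta, g_grid)
naive_risk = float(window_mass(y_star, y_grid,
                               posterior_cdf(y_star, y_grid, f_Y, f_C), delta))
print(z_opt, risk_opt, naive_risk)
```

Comparing `risk_opt` with `naive_risk` indicates how much an intruder using the optimal estimator could gain over simply taking $g(y^*)=y^*$ for this noise candidate. The grid search only approximates the maximizer, and the maximizer need not be unique.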

Cite this paper

Ma, Y., Lin, YX., Krivitsky, P.N., Wakefield, B. (2018). Quantifying the Protection Level of a Noise Candidate for Noise Multiplication Masking Scheme. In: Domingo-Ferrer, J., Montes, F. (eds) Privacy in Statistical Databases. PSD 2018. Lecture Notes in Computer Science(), vol 11126. Springer, Cham. https://doi.org/10.1007/978-3-319-99771-1_19

DOI: https://doi.org/10.1007/978-3-319-99771-1_19
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99770-4
Online ISBN: 978-3-319-99771-1
eBook Packages: Computer Science, Computer Science (R0)
