Quantifying the Protection Level of a Noise Candidate for Noise Multiplication Masking Scheme

  • Conference paper
Privacy in Statistical Databases (PSD 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11126)

Abstract

When multiplicative noises are used to perturb a set of original data, the data provider needs to ensure that the original values are not likely to be learned by data intruders from the noise-multiplied data. Different attacking strategies for unveiling the original values have been recognised in the literature, and the data provider needs to ensure that the noise-multiplied data is protected against these attacking strategies by selecting an appropriate noise generating variable. However, there are many potential attacking strategies, which makes the quantification of the protection level of a noise candidate difficult. In this paper, we argue that, to quantify the protection level a noise candidate offers to the original data against an attacking strategy, the data provider might look at the average value disclosure risk it produces. Correspondingly, we propose an optimal estimator which maximizes the average value disclosure risk. As a result, the data provider could use the maximized average value disclosure risk as a single measure for quantifying the protection level a noise candidate offers to the original data. The measure could help the data provider with the process of noise generating variable selection in practice.

Acknowledgements

We thank the anonymous reviewers for their constructive comments on the paper. This research has been conducted with the support of the Australian Government Research Training Program Scholarship.

Author information

Corresponding author

Correspondence to Yue Ma.

Appendix

In this section we show how \(z_i^{opt}\) is derived. Suppose that, for a set of original data \(\{y_i\}_{i=1}^n\), the data provider uses the following probabilistic disclosure risk measure:

$$\begin{aligned} P(\frac{|\tilde{Y}_i-Y_i|}{Y_i}<\delta )=P(\frac{|\tilde{Y}-Y|}{Y}<\delta ), i=1,\cdots , n \end{aligned}$$

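In the following we assume \(Y>0\), \(C>0\), and \(\tilde{Y}=g(Y^*)\), where \(Y^*=CY\) is the noise-multiplied value. For \(0<\delta <1\), the event \(\frac{|g(y^*)-y|}{y}<\delta \) is equivalent to \(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta }\), so the disclosure risk of an observation y is given as: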

$$\begin{aligned} P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y=y)=\int _{0}^{\infty }I(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta })f_{Y^*|Y}(y^*|Y=y)dy^* \end{aligned}$$

Therefore, \(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y)\) is a function of the random variable Y.

The average disclosure risk for the original data is

$$\begin{aligned} R_{overall}=\frac{\sum _{i=1}^n P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y=y_i)}{n} \end{aligned}$$

Suppose \(E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y))\) exists. Then, treating \(y_1,\cdots ,y_n\) as independent draws of \(Y\), the law of large numbers gives

$$\begin{aligned} R_{overall}\overset{P}{\rightarrow }E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y)) \end{aligned}$$

as \(n\rightarrow \infty \).
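As an illustration of the limit above, the following minimal Python sketch estimates \(R_{overall}\) by simulation for a given estimator g. The lognormal original data, the uniform noise on (0.5, 1.5), the naive estimator \(g(y^*)=y^*\), and all function names are assumptions made for this example; none of them come from the paper.

```python
import numpy as np


def average_disclosure_risk(g, y, noise_sampler, delta, rng, n_rep=200):
    """Monte Carlo estimate of R_overall for an estimator g.

    For each original value y_i, draw n_rep noise values C, form
    y* = C * y_i, and record how often |g(y*) - y_i| / y_i < delta.
    """
    y = np.asarray(y, dtype=float)
    c = noise_sampler(rng, (n_rep, y.size))   # noise draws, shape (n_rep, n)
    y_star = c * y                            # noise-multiplied data
    hit = np.abs(g(y_star) - y) / y < delta   # indicator of a "close" estimate
    per_record_risk = hit.mean(axis=0)        # approximates P(. | Y = y_i)
    return per_record_risk.mean()             # average over the n records


def uniform_noise(rng, shape):
    # Assumed noise candidate: C ~ Uniform(0.5, 1.5), so E[C] = 1.
    return rng.uniform(0.5, 1.5, shape)


def naive_estimator(y_star):
    # Assumed intruder strategy: take the masked value itself as the estimate.
    return y_star


rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=2000)  # assumed original data
risk = average_disclosure_risk(naive_estimator, y, uniform_noise, delta=0.1, rng=rng)
print(f"estimated average disclosure risk: {risk:.3f}")
```

Under these assumptions the naive estimate falls within relative distance \(\delta =0.1\) of y exactly when \(0.9<C<1.1\), which has probability 0.2 for this noise, so the simulated value should be close to 0.2.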

The objective is to find the form of \(g(Y^*)\) that maximizes \(R_{overall}\) as \(n\rightarrow \infty \). Conditional on \(Y=y\), \(Y^*=yC\); hence, with \(C\) independent of \(Y\), \(f_{Y^*|Y}(y^*|Y=y)=\frac{1}{y}f_C(\frac{y^*}{y})\). We observe the following:

$$\begin{aligned} \begin{aligned} E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y))&=\int _{0}^{\infty }\int _{0}^{\infty } I(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta })f_{Y^*|Y=y}(y^*)dy^* f_Y(y)dy\\&=\int _{0}^{\infty }\int _{0}^{\infty } I(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta })\frac{1}{y} f_C(\frac{y^*}{y})f_Y(y)dydy^*\\&=\int _{0}^{\infty }f_{Y^*}(y^*)\int _{0}^{\infty } I(\frac{g(y^*)}{1+\delta }<y<\frac{g(y^*)}{1-\delta })f_{Y|Y^*=y^*}(y)dydy^*\\&=\int _{0}^{\infty }f_{Y^*}(y^*)\int _{\frac{g(y^*)}{(1+\delta )}}^{\frac{g(y^*)}{(1-\delta )}} f_{Y|Y^*=y^*}(y)dydy^*\end{aligned}\end{aligned}$$
(2)

Because \(f_{Y^*}(y^*)\ge 0\), \(E_Y(P(\frac{|g(Y^*)-Y|}{Y}<\delta |Y))\) is maximized if the inner integral \(\int _{\frac{g(y^*)}{(1+\delta )}}^{\frac{g(y^*)}{(1-\delta )}} f_{Y|Y^*=y^*}(y)dy\) is maximized for every \(y^*\). The value of \(g(y^*)\) that maximizes this integral is \(z_{opt}=argmax_{g(y^*)} \int _{\frac{g(y^*)}{(1+\delta )}}^{\frac{g(y^*)}{(1-\delta )}} f_{Y|Y^*=y^*}(y)dy\). Therefore, the optimal estimator \(Z_{opt}\) takes the following form:

$$\begin{aligned} Z_{opt}=argmax_{g(Y^*)} \int _{\frac{g(Y^*)}{(1+\delta )}}^{\frac{g(Y^*)}{(1-\delta )}} f_{Y|Y^*}(y)dy \end{aligned}$$
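The maximization above can be approximated numerically. The sketch below is one possible grid-based approach, not the authors' implementation: it builds the conditional density \(f_{Y|Y^*=y^*}(y)\propto \frac{1}{y}f_C(\frac{y^*}{y})f_Y(y)\) on a grid and picks the candidate value whose interval \((\frac{g}{1+\delta },\frac{g}{1-\delta })\) carries the most conditional probability. The lognormal density for Y, the uniform noise on (0.5, 1.5), the grid, and the function names are all assumptions made for illustration.

```python
import numpy as np


def optimal_estimate(y_star, f_y, f_c, delta, grid):
    """Grid approximation of z_opt for a single observed value y*.

    grid: increasing positive values, used both as the support of Y and as
    the candidate values of g(y*).
    Returns the best candidate and the conditional mass of its interval.
    """
    # Conditional density of Y given Y* = y*, up to a constant.
    post = f_c(y_star / grid) * f_y(grid) / grid
    # Cumulative conditional mass by the trapezoidal rule, then normalize.
    steps = np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (post[1:] + post[:-1]) * steps)))
    cdf /= cdf[-1]
    # Mass of (g / (1 + delta), g / (1 - delta)) for every candidate g.
    lower = np.interp(grid / (1 + delta), grid, cdf)
    upper = np.interp(grid / (1 - delta), grid, cdf)
    mass = upper - lower
    best = np.argmax(mass)
    return grid[best], mass[best]


def lognormal_pdf(y):
    # Assumed density of the original data: log Y ~ N(0, 1).
    return np.exp(-0.5 * np.log(y) ** 2) / (y * np.sqrt(2 * np.pi))


def uniform_noise_pdf(c):
    # Assumed noise density: C ~ Uniform(0.5, 1.5).
    return ((c >= 0.5) & (c <= 1.5)).astype(float)


grid = np.linspace(0.01, 10.0, 20001)
z, mass = optimal_estimate(1.0, lognormal_pdf, uniform_noise_pdf, delta=0.1, grid=grid)
print(f"z_opt for y* = 1: {z:.3f}, conditional mass of its interval: {mass:.3f}")
```

In this assumed setting the conditional density is supported on \([y^*/1.5,\, y^*/0.5]\) and is decreasing there, so the best interval starts at the lower end of the support and the estimate lands near \(1.1/1.5\approx 0.733\) rather than at the masked value \(y^*=1\).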

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ma, Y., Lin, YX., Krivitsky, P.N., Wakefield, B. (2018). Quantifying the Protection Level of a Noise Candidate for Noise Multiplication Masking Scheme. In: Domingo-Ferrer, J., Montes, F. (eds) Privacy in Statistical Databases. PSD 2018. Lecture Notes in Computer Science, vol 11126. Springer, Cham. https://doi.org/10.1007/978-3-319-99771-1_19

  • DOI: https://doi.org/10.1007/978-3-319-99771-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99770-4

  • Online ISBN: 978-3-319-99771-1

  • eBook Packages: Computer Science, Computer Science (R0)
