DRAPE: optimizing private data release under adjustable privacy-utility equilibrium

Qingyue Xiong¹,
Qiujun Lan¹,
Jiaqi Ma¹,
Huiling Zhou¹,
Gang Li² &
Zheng Yang³


Abstract

Data release and sharing across fields has become an inevitable trend in the era of big data. Unfortunately, this trend has also exposed enormous amounts of sensitive and private information. Massive privacy breaches have brought privacy-preservation issues into sharp focus, and privacy concerns may discourage people from providing their personal data. Meeting the requirements of privacy protection has therefore been studied extensively. However, protecting sensitive information should not prevent data users from conducting valid analyses of the released data. In this paper, we propose a novel algorithm, Data Release under Adjustable Privacy-utility Equilibrium (DRAPE), to address this problem. We handle the privacy-utility tradeoff in data release by breaking sensitive associations among variables while maintaining the correlations among nonsensitive variables. Furthermore, we quantify the impact of the proposed privacy-preserving method in terms of correlation preservation and privacy level, and use these measures to develop an optimization model that satisfies both data privacy and data utility constraints. The proposed approach not only gives data publishers a better scheme for controlling privacy levels, but also provides personalized service to data requesters with different utility requirements. We conduct experiments on one simulated dataset and two real datasets, and the results show that DRAPE efficiently achieves a guaranteed privacy level while effectively preserving data utility.
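The abstract does not spell out DRAPE's mechanism, so the following is only a minimal sketch of the general idea it describes, built on our own assumptions: a single sensitive column is perturbed with additive Gaussian noise, absolute Pearson correlation with the nonsensitive columns serves as the leakage measure, correlation between the released and original sensitive column serves as the utility measure, and a simple grid search stands in for the optimization model. The function names `pearson`, `perturb_sensitive`, and `release`, the threshold `max_leak`, and the synthetic data are all hypothetical; this is not the authors' algorithm.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def perturb_sensitive(s, noise_scale, rng):
    """Hypothetical mechanism: add zero-mean Gaussian noise whose standard
    deviation is noise_scale times the (population) std of the sensitive column."""
    n = len(s)
    mean = sum(s) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in s) / n)
    return [v + rng.gauss(0.0, noise_scale * std) for v in s]


def release(nonsensitive, s, max_leak, scales, seed=0):
    """Grid search standing in for an optimization model (assumption).

    Nonsensitive columns are released unchanged, so their pairwise
    correlations are preserved exactly. Among the candidate noise scales,
    return the smallest one whose largest |corr(S', X_j)| is at most
    max_leak, together with the perturbed column, that leakage, and a
    utility proxy corr(S', S)."""
    for scale in sorted(scales):
        rng = random.Random(seed)  # same seed per candidate for reproducibility
        s_rel = perturb_sensitive(s, scale, rng)
        leak = max(abs(pearson(s_rel, col)) for col in nonsensitive)
        if leak <= max_leak:
            return scale, s_rel, leak, pearson(s_rel, s)
    raise ValueError("no candidate noise scale meets the privacy threshold")


# Synthetic example: two correlated nonsensitive columns and one sensitive
# column that depends on both (made-up data, for illustration only).
data_rng = random.Random(42)
n = 2000
x1 = [data_rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [0.6 * a + 0.8 * data_rng.gauss(0.0, 1.0) for a in x1]
s = [0.7 * a + 0.5 * b + 0.3 * data_rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]

scale, s_rel, leak, util = release(
    [x1, x2], s, max_leak=0.3, scales=[0.5 * k for k in range(21)]
)
print(f"noise scale={scale}, max leakage={leak:.3f}, utility={util:.3f}")
```

In this sketch, loosening `max_leak` typically lets the search stop at a smaller noise scale, so the released sensitive column stays closer to the original (higher utility); tightening it does the reverse. That single knob is one simple way to picture an adjustable privacy-utility equilibrium, not a description of DRAPE's actual constraints.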


Availability of data and material

Some or all data generated or used during the study are available online.

Code Availability

The model required to reproduce these findings cannot be shared at this time, as it also forms part of an ongoing study.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 71871090), the Science & Technology Innovation Leading Project of Hunan High-tech Industry (No. 2020GK2005) and the Natural Science Foundation of Hunan Province of China (No. 2021JJ30158).

Funding

Author information

Authors and Affiliations

School of Business, Hunan University, Changsha, 410082, China
Qingyue Xiong, Qiujun Lan, Jiaqi Ma & Huiling Zhou
School of Information Technology, Deakin University, Geelong, VIC, 3216, Australia
Gang Li
Tianheguoyun Technology Co., Ltd., Hunan Tianhe Blockchain Research Institute, Changsha, 410019, China
Zheng Yang


Corresponding author

Correspondence to Qiujun Lan.

Ethics declarations

Conflict of interest

No conflict of interest exists with respect to the submission of this manuscript.

Consent for publication

Not applicable.

Consent to Participate

Not applicable.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Xiong, Q., Lan, Q., Ma, J. et al. DRAPE: optimizing private data release under adjustable privacy-utility equilibrium. Inf Technol Manag 25, 199–217 (2024). https://doi.org/10.1007/s10799-022-00378-4


Accepted: 26 August 2022
Published: 02 October 2022
Issue Date: June 2024
DOI: https://doi.org/10.1007/s10799-022-00378-4
