Abstract
Data release and sharing across fields has become an inevitable trend in the era of big data. Unfortunately, this trend has exposed large amounts of sensitive and private information. Following massive privacy breaches, privacy preservation has come into sharp focus, and privacy concerns may deter people from providing their personal data. Privacy protection has therefore been studied extensively. However, protecting sensitive information should not prevent data users from conducting valid analyses of the released data. To address this problem, we propose a novel algorithm named Data Release under Adjustable Privacy-utility Equilibrium (DRAPE). We handle the privacy-utility tradeoff in data release by breaking sensitive associations among variables while maintaining the correlations among nonsensitive variables. Furthermore, we quantify the impact of the proposed privacy-preserving method in terms of correlation preservation and privacy level, and on that basis develop an optimization model that satisfies both data privacy and data utility constraints. The proposed approach not only gives data publishers a better scheme for controlling privacy levels but also provides personalized service to data requesters with different utility requirements. We conduct experiments on one simulated dataset and two real datasets, and the results show that DRAPE efficiently achieves a guaranteed privacy level while effectively preserving data utility.
Availability of data and material
Some or all data generated or used during the study are available online.
Code Availability
The model required to reproduce these findings cannot be shared at this time, as it also forms part of an ongoing study.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 71871090), the Science & Technology Innovation Leading Project of Hunan High-tech Industry (No. 2020GK2005) and the Natural Science Foundation of Hunan Province of China (No. 2021JJ30158).
Ethics declarations
Conflict of interest
No conflict of interest exists in the submission of this manuscript.
Consent for publication
Not applicable.
Consent to Participate
Not applicable.
Ethical Approval
Not applicable.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiong, Q., Lan, Q., Ma, J. et al. DRAPE: optimizing private data release under adjustable privacy-utility equilibrium. Inf Technol Manag 25, 199–217 (2024). https://doi.org/10.1007/s10799-022-00378-4
DOI: https://doi.org/10.1007/s10799-022-00378-4