research-article

Differentially private data release for data mining

Authors:

Noman Mohammed,

Benjamin C.M. Fung,

Philip S. YuAuthors Info & Claims

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 493 - 501

https://doi.org/10.1145/2020408.2020487

Published: 21 August 2011 Publication History

Abstract

Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ∈-differential privacy provides one of the strongest privacy guarantees and has no assumptions about an adversary's background knowledge. Most of the existing solutions that ensure ∈-differential privacy are based on an interactive model, where the data miner is only allowed to pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee ∈-differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and performs better than the existing solutions for classification analysis.

References

[1]

B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In PODS, 2007.

Digital Library

[2]

R. J. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In ICDE, 2005.

Digital Library

[3]

R. Bhaskar, S. Laxman, A. Smith, and A. Thakurta. Discovering frequent patterns in sensitive data. In SIGKDD, 2010.

Digital Library

[4]

A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC, 2008.

Digital Library

[5]

G. Cormode, D. Srivastava, N. Li, and T. Li. Minimizing minimality and maximizing utility: Analyzing methodbased attacks on anonymized data. In VLDB, 2010.

Digital Library

[6]

I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, 2003.

Digital Library

[7]

C. Dwork. Differential privacy. In ICALP, 2006.

Digital Library

[8]

C. Dwork. A firm foundation for private data analysis. Commun. ACM, 54(1):86--95, 2011.

Digital Library

[9]

C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.

Digital Library

[10]

A. Frank and A. Asuncion. UCI machine learning repository, 2010.

[11]

A. Friedman and A. Schuster. Data mining with differential privacy. In SIGKDD, 2010.

Digital Library

[12]

B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys, 42(4):1--53, June 2010.

Digital Library

[13]

B. C. M. Fung, K. Wang, and P. S. Yu. Anonymizing classification data for privacy preservation. IEEE TKDE, 19(5):711--725, May 2007.

Digital Library

[14]

S. R. Ganta, S. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In SIGKDD, 2008.

Digital Library

[15]

M. Hardt and K. Talwar. On the geometry of differential privacy. In STOC, 2010.

Digital Library

[16]

M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. In VLDB, 2010.

Digital Library

[17]

A. Inan, M. Kantarcioglu, G. Ghinita, and E. Bertino. Private record matching using differential privacy. In EDBT, 2010.

Digital Library

[18]

V. S. Iyengar. Transforming data to satisfy privacy constraints. In SIGKDD, 2002.

Digital Library

[19]

X. Jin, N. Zhang, and G. Das. Algorithm-safe privacy-preserving data publishing. In EDBT, 2010.

Digital Library

[20]

S. P. Kasiviswanathan, M. Rudelson, A. Smith, and J. Ullman. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In STOC, 2010.

Digital Library

[21]

D. Kifer. Attacks on privacy and de finetti's theorem. In SIGMOD, 2009.

Digital Library

[22]

D. Kifer and B. Lin. Towards an axiomatization of statistical privacy and utility. In PODS, 2010.

Digital Library

[23]

K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In ICDE, 2006.

Digital Library

[24]

K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware anonymization techniques for large-scale data sets. ACM TODS, 33(3), 2008.

Digital Library

[25]

C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor. Optimizing linear counting queries under differential privacy. In PODS, 2010.

Digital Library

[26]

N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, 2007.

[27]

A. Machanavajjhala, J. Gehrke, and M. Gotz. Data publishing against realistic adversaries. In VLDB, 2009.

Digital Library

[28]

A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM TKDD, 2007.

Digital Library

[29]

D. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, and J. Halpern. Worst-case background knowledge in privacy-preserving data publishing. In ICDE, 2007.

[30]

F. McSherry. Privacy integrated queries. In SIGMOD, 2009.

Digital Library

[31]

F. McSherry and I. Mironov. Differentially private recommender systems: building privacy into the net. In SIGKDD, 2009.

Digital Library

[32]

F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.

Digital Library

[33]

N. Mohammed, B. C. M. Fung, P. C. K. Hung, and C. Lee. Anonymizing healthcare data: A case study on the blood transfusion service. In SIGKDD, 2009.

Digital Library

[34]

J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

Digital Library

[35]

A. Roth and T. Roughgarden. Interactive privacy via the median mechanism. In STOC, 2010.

Digital Library

[36]

P. Samarati. Protecting respondents' identities in microdata release. IEEE TKDE, 2001.

Digital Library

[37]

L. Sweeney. k-anonymity: A model for protecting privacy. In International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002.

Digital Library

[38]

K. Wang, B. C. M. Fung, and P. S. Yu. Handicapping attacker's confidence: An alternative to k-anonymization. KAIS, 11(3):345--368, April 2007.

Digital Library

[39]

R. C. W. Wong, A. W. C. Fu, K. Wang, and J. Pei. Minimality attack in privacy preserving data publishing. In VLDB, 2007.

Digital Library

[40]

R. C. W. Wong, A. W. C. Fu, K. Wang, Y. Xu, and P. S. Yu. Can the utility of anonymized data be used for privacy breaches? ACM TKDD, to appear.

Digital Library

[41]

R. C. W. Wong, J. Li., A. W. C. Fu, and K. Wang. (a,k)-anonymity: An enhanced k-anonymity model for privacy preserving data publishing. In SIGKDD, 2006.

Digital Library

[42]

X. Xiao and Y. Tao. Personalized privacy preservation. In SIGMOD, 2006.

Digital Library

[43]

X. Xiao, Y. Tao, and N. Koudas. Transparent anonymization: Thwarting adversaries who know the algorithm. ACM TODS, 35(2), 2010.

Digital Library

[44]

X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transforms. In ICDE, 2010.

[45]

L. Zhang, S. Jajodia, and A. Brodsky. Information disclosure under realistic assumptions: Privacy versus optimality. In CCS, 2007.

Digital Library

Cited By

Mudassar BTahir SKhan FShah SShah SAbbasi Q(2024)Privacy-Preserving Data Analytics in Internet of Medical ThingsFuture Internet10.3390/fi1611040716:11(407)Online publication date: 5-Nov-2024
https://doi.org/10.3390/fi16110407
Di Fazio A(2024)Enhancing Privacy in Recommender Systems through Differential Privacy TechniquesProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688019(1348-1352)Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3640457.3688019
Hu YWu FLi QLong YGarrido GGe CDing BForsyth DLi BSong D(2024)SoK: Privacy-Preserving Data Synthesis2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00002(4696-4713)Online publication date: 19-May-2024
https://doi.org/10.1109/SP54263.2024.00002
Show More Cited By

Index Terms

Differentially private data release for data mining

Recommendations

A differentially private algorithm for location data release

The rise of mobile technologies in recent years has led to large volumes of location information, which are valuable resources for knowledge discovery such as travel patterns mining and traffic analysis. However, location dataset has been confronted ...
Differentially private data publishing via optimal univariate microaggregation and record perturbation
Abstract
We present an approach to generate differentially private data sets that consists in adding noise to a microaggregated version of the original data set. While this idea has already been pursued in the literature to reduce the ...
IMR based Anonymization for Privacy Preservation in Data Mining
KMO '16: Proceedings of the The 11th International Knowledge Management in Organizations Conference on The changing face of Knowledge Management Impacting Society

Privacy Preserving Data Mining (PPDM) is a data mining research area that aims to protect individual's personal information from unsolicited or unauthorized disclosure. Privacy relates to personal information that a person would not wish others to know ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2011

1446 pages

ISBN:9781450308137

DOI:10.1145/2020408

General Chair:
Chid Apte
IBM Research
,
Program Chairs:
Joydeep Ghosh
UT Austin
,
Padhraic Smyth
UC Irvine

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 August 2011

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

KDD '11

Sponsor:

KDD '11: The 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 21 - 24, 2011

California, San Diego, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

224
Total Citations
View Citations
2,376
Total Downloads

Downloads (Last 12 months)66
Downloads (Last 6 weeks)2

Reflects downloads up to 20 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Mudassar BTahir SKhan FShah SShah SAbbasi Q(2024)Privacy-Preserving Data Analytics in Internet of Medical ThingsFuture Internet10.3390/fi1611040716:11(407)Online publication date: 5-Nov-2024
https://doi.org/10.3390/fi16110407
Di Fazio A(2024)Enhancing Privacy in Recommender Systems through Differential Privacy TechniquesProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688019(1348-1352)Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3640457.3688019
Hu YWu FLi QLong YGarrido GGe CDing BForsyth DLi BSong D(2024)SoK: Privacy-Preserving Data Synthesis2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00002(4696-4713)Online publication date: 19-May-2024
https://doi.org/10.1109/SP54263.2024.00002
Bojkovic NLoh P(2024)Differentially Private Synthetic Data with Private Density Estimation2024 IEEE International Symposium on Information Theory (ISIT)10.1109/ISIT57864.2024.10619641(599-604)Online publication date: 7-Jul-2024
https://doi.org/10.1109/ISIT57864.2024.10619641
Hu KGong SZhang QSeng CXia MJiang S(2024)An overview of implementing security and privacy in federated learningArtificial Intelligence Review10.1007/s10462-024-10846-857:8Online publication date: 11-Jul-2024
https://doi.org/10.1007/s10462-024-10846-8
Kreačić ENouri NPotluru VBalch TVeloso MEvans RShpitser I(2023)Differentially private synthetic data using KD-treesProceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence10.5555/3625834.3625942(1143-1153)Online publication date: 31-Jul-2023
https://dl.acm.org/doi/10.5555/3625834.3625942
Utaliyeva AShin JChoi Y(2023)Task-Specific Adaptive Differential Privacy Method for Structured DataSensors10.3390/s2304198023:4(1980)Online publication date: 10-Feb-2023
https://doi.org/10.3390/s23041980
Majeed AHwang S(2023)A Generic Approach towards Enhancing Utility and Privacy in Person-Specific Data Publishing Based on Attribute Usefulness and UncertaintyElectronics10.3390/electronics1209197812:9(1978)Online publication date: 24-Apr-2023
https://doi.org/10.3390/electronics12091978
Khatir RIzadkhah HRazmara J(2023)Designing a Novel Approach Using a Greedy and Information-Theoretic Clustering-Based Algorithm for Anonymizing Microdata SetsEntropy10.3390/e2512161325:12(1613)Online publication date: 1-Dec-2023
https://doi.org/10.3390/e25121613
Cai KXiao XCormode G(2023)PrivLava: Synthesizing Relational Data with Foreign Keys under Differential PrivacyProceedings of the ACM on Management of Data10.1145/35892871:2(1-25)Online publication date: 20-Jun-2023
https://dl.acm.org/doi/10.1145/3589287
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents