Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2020408.2020487acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Differentially private data release for data mining

Published: 21 August 2011 Publication History

Abstract

Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among the existing privacy models, ∈-differential privacy provides one of the strongest privacy guarantees and has no assumptions about an adversary's background knowledge. Most of the existing solutions that ensure ∈-differential privacy are based on an interactive model, where the data miner is only allowed to pose aggregate queries to the database. In this paper, we propose the first anonymization algorithm for the non-interactive setting based on the generalization technique. The proposed solution first probabilistically generalizes the raw data and then adds noise to guarantee ∈-differential privacy. As a sample application, we show that the anonymized data can be used effectively to build a decision tree induction classifier. Experimental results demonstrate that the proposed non-interactive anonymization algorithm is scalable and performs better than the existing solutions for classification analysis.

References

[1]
B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In PODS, 2007.
[2]
R. J. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In ICDE, 2005.
[3]
R. Bhaskar, S. Laxman, A. Smith, and A. Thakurta. Discovering frequent patterns in sensitive data. In SIGKDD, 2010.
[4]
A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC, 2008.
[5]
G. Cormode, D. Srivastava, N. Li, and T. Li. Minimizing minimality and maximizing utility: Analyzing methodbased attacks on anonymized data. In VLDB, 2010.
[6]
I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, 2003.
[7]
C. Dwork. Differential privacy. In ICALP, 2006.
[8]
C. Dwork. A firm foundation for private data analysis. Commun. ACM, 54(1):86--95, 2011.
[9]
C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[10]
A. Frank and A. Asuncion. UCI machine learning repository, 2010.
[11]
A. Friedman and A. Schuster. Data mining with differential privacy. In SIGKDD, 2010.
[12]
B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys, 42(4):1--53, June 2010.
[13]
B. C. M. Fung, K. Wang, and P. S. Yu. Anonymizing classification data for privacy preservation. IEEE TKDE, 19(5):711--725, May 2007.
[14]
S. R. Ganta, S. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In SIGKDD, 2008.
[15]
M. Hardt and K. Talwar. On the geometry of differential privacy. In STOC, 2010.
[16]
M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. In VLDB, 2010.
[17]
A. Inan, M. Kantarcioglu, G. Ghinita, and E. Bertino. Private record matching using differential privacy. In EDBT, 2010.
[18]
V. S. Iyengar. Transforming data to satisfy privacy constraints. In SIGKDD, 2002.
[19]
X. Jin, N. Zhang, and G. Das. Algorithm-safe privacy-preserving data publishing. In EDBT, 2010.
[20]
S. P. Kasiviswanathan, M. Rudelson, A. Smith, and J. Ullman. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows. In STOC, 2010.
[21]
D. Kifer. Attacks on privacy and de finetti's theorem. In SIGMOD, 2009.
[22]
D. Kifer and B. Lin. Towards an axiomatization of statistical privacy and utility. In PODS, 2010.
[23]
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Mondrian multidimensional k-anonymity. In ICDE, 2006.
[24]
K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware anonymization techniques for large-scale data sets. ACM TODS, 33(3), 2008.
[25]
C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor. Optimizing linear counting queries under differential privacy. In PODS, 2010.
[26]
N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, 2007.
[27]
A. Machanavajjhala, J. Gehrke, and M. Gotz. Data publishing against realistic adversaries. In VLDB, 2009.
[28]
A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM TKDD, 2007.
[29]
D. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, and J. Halpern. Worst-case background knowledge in privacy-preserving data publishing. In ICDE, 2007.
[30]
F. McSherry. Privacy integrated queries. In SIGMOD, 2009.
[31]
F. McSherry and I. Mironov. Differentially private recommender systems: building privacy into the net. In SIGKDD, 2009.
[32]
F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.
[33]
N. Mohammed, B. C. M. Fung, P. C. K. Hung, and C. Lee. Anonymizing healthcare data: A case study on the blood transfusion service. In SIGKDD, 2009.
[34]
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[35]
A. Roth and T. Roughgarden. Interactive privacy via the median mechanism. In STOC, 2010.
[36]
P. Samarati. Protecting respondents' identities in microdata release. IEEE TKDE, 2001.
[37]
L. Sweeney. k-anonymity: A model for protecting privacy. In International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002.
[38]
K. Wang, B. C. M. Fung, and P. S. Yu. Handicapping attacker's confidence: An alternative to k-anonymization. KAIS, 11(3):345--368, April 2007.
[39]
R. C. W. Wong, A. W. C. Fu, K. Wang, and J. Pei. Minimality attack in privacy preserving data publishing. In VLDB, 2007.
[40]
R. C. W. Wong, A. W. C. Fu, K. Wang, Y. Xu, and P. S. Yu. Can the utility of anonymized data be used for privacy breaches? ACM TKDD, to appear.
[41]
R. C. W. Wong, J. Li., A. W. C. Fu, and K. Wang. (a,k)-anonymity: An enhanced k-anonymity model for privacy preserving data publishing. In SIGKDD, 2006.
[42]
X. Xiao and Y. Tao. Personalized privacy preservation. In SIGMOD, 2006.
[43]
X. Xiao, Y. Tao, and N. Koudas. Transparent anonymization: Thwarting adversaries who know the algorithm. ACM TODS, 35(2), 2010.
[44]
X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transforms. In ICDE, 2010.
[45]
L. Zhang, S. Jajodia, and A. Brodsky. Information disclosure under realistic assumptions: Privacy versus optimality. In CCS, 2007.

Cited By

View all
  • (2024)Privacy-Preserving Data Analytics in Internet of Medical ThingsFuture Internet10.3390/fi1611040716:11(407)Online publication date: 5-Nov-2024
  • (2024)Enhancing Privacy in Recommender Systems through Differential Privacy TechniquesProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688019(1348-1352)Online publication date: 8-Oct-2024
  • (2024)SoK: Privacy-Preserving Data Synthesis2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00002(4696-4713)Online publication date: 19-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN:9781450308137
DOI:10.1145/2020408
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 August 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. anonymization
  2. data mining
  3. differential privacy

Qualifiers

  • Research-article

Conference

KDD '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)66
  • Downloads (Last 6 weeks)2
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Privacy-Preserving Data Analytics in Internet of Medical ThingsFuture Internet10.3390/fi1611040716:11(407)Online publication date: 5-Nov-2024
  • (2024)Enhancing Privacy in Recommender Systems through Differential Privacy TechniquesProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688019(1348-1352)Online publication date: 8-Oct-2024
  • (2024)SoK: Privacy-Preserving Data Synthesis2024 IEEE Symposium on Security and Privacy (SP)10.1109/SP54263.2024.00002(4696-4713)Online publication date: 19-May-2024
  • (2024)Differentially Private Synthetic Data with Private Density Estimation2024 IEEE International Symposium on Information Theory (ISIT)10.1109/ISIT57864.2024.10619641(599-604)Online publication date: 7-Jul-2024
  • (2024)An overview of implementing security and privacy in federated learningArtificial Intelligence Review10.1007/s10462-024-10846-857:8Online publication date: 11-Jul-2024
  • (2023)Differentially private synthetic data using KD-treesProceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence10.5555/3625834.3625942(1143-1153)Online publication date: 31-Jul-2023
  • (2023)Task-Specific Adaptive Differential Privacy Method for Structured DataSensors10.3390/s2304198023:4(1980)Online publication date: 10-Feb-2023
  • (2023)A Generic Approach towards Enhancing Utility and Privacy in Person-Specific Data Publishing Based on Attribute Usefulness and UncertaintyElectronics10.3390/electronics1209197812:9(1978)Online publication date: 24-Apr-2023
  • (2023)Designing a Novel Approach Using a Greedy and Information-Theoretic Clustering-Based Algorithm for Anonymizing Microdata SetsEntropy10.3390/e2512161325:12(1613)Online publication date: 1-Dec-2023
  • (2023)PrivLava: Synthesizing Relational Data with Foreign Keys under Differential PrivacyProceedings of the ACM on Management of Data10.1145/35892871:2(1-25)Online publication date: 20-Jun-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media