Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

Differentially private histogram publication

Published: 01 December 2013 Publication History

Abstract

Differential privacy (DP) is a promising scheme for releasing the results of statistical queries on sensitive data, with strong privacy guarantees against adversaries with arbitrary background knowledge. Existing studies on differential privacy mostly focus on simple aggregations such as counts. This paper investigates the publication of DP-compliant histograms, which is an important analytical tool for showing the distribution of a random variable, e.g., hospital bill size for certain patients. Compared to simple aggregations whose results are purely numerical, a histogram query is inherently more complex, since it must also determine its structure, i.e., the ranges of the bins. As we demonstrate in the paper, a DP-compliant histogram with finer bins may actually lead to significantly lower accuracy than a coarser one, since the former requires stronger perturbations in order to satisfy DP. Moreover, the histogram structure itself may reveal sensitive information, which further complicates the problem. Motivated by this, we propose two novel mechanisms, namely NoiseFirst and StructureFirst, for computing DP-compliant histograms. Their main difference lies in the relative order of the noise injection and the histogram structure computation steps. NoiseFirst has the additional benefit that it can improve the accuracy of an already published DP-compliant histogram computed using a naive method. For each of proposed mechanisms, we design algorithms for computing the optimal histogram structure with two different objectives: minimizing the mean square error and the mean absolute error, respectively. Going one step further, we extend both mechanisms to answer arbitrary range queries. Extensive experiments, using several real datasets, confirm that our two proposals output highly accurate query answers and consistently outperform existing competitors.

References

[1]
Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: PODS, pp. 273---282 (2007)
[2]
Bhaskar, R., Laxman, S., Smith, A., Thakurta, A.: Discovering frequent patterns in sensitive data. In: KDD, pp. 503---512 (2010)
[3]
Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: STOC, pp. 609---618 (2008)
[4]
Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms, 2nd edn., pp. 185---192. MIT Press and McGraw-Hill, New York (2001)
[5]
Cormode, G., Procopiuc, C.M., Srivastava, D., Tran, T.T.L.: Differentially private publication of sparse data. In: ICDT (2012)
[6]
Cormode, G., Procopiuc, M., Shen, E., Srivastava, D., Yu, T.: Differentially private spatial decompositions. In: ICDE (2012)
[7]
Ding, B., Winslett, M., Han, J., Li, Z.: Differentially private data cubes: optimizing noise sources and consistency. In: SIGMOD, pp. 217---228 (2011)
[8]
Dwork, C.: Differential privacy: a survey of results. In: TAMC, pp. 1---19 (2008)
[9]
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: TCC, pp. 265---284 (2006)
[10]
Dwork, C., McSherry, F., Talwar, K.: The price of privacy and the limits of LP decoding. In: STOC, pp. 85---94 (2007)
[11]
Dwork, C., Rothblum, G.N., Vadhan, S.P.: Boosting and differential privacy. In: FOCS, pp. 51---60 (2010)
[12]
Friedman, A., Schuster, A.: Data mining with differential privacy. In: KDD, pp. 493---502 (2010)
[13]
Götz, M., Machanavajjhala, A., Wang, G., Xiao, X., Gehrke, J.: Publishing search logs--a comparative study of privacy guarantees. IEEE TKDE 24(3): 520---532 (2012)
[14]
Guha, S., Koudas, N., Shim, K.: Approximation and streaming algorithms for histogram construction problems. ACM TODS 31(1), 396---438 (2006)
[15]
Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. PVLDB 3(1), 1021---1032 (2010)
[16]
Homer N., Szelinger S., Redman M., Duggan D., Tembe W., Muehling J., Pearson J.V., Stephan D.A., Nelson S.F., Craig, D.W.: Resolving individuals contributing trace amounts of dna to highly complex mixtures using high-density snp genotyping microarrays. PLoS Genet. 4(8), e100167 (2008)
[17]
Jagadish, H.V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K.C., Suel, T.: Optimal histograms with quality guarantees. In: VLDB, pp. 275---286 (1998)
[18]
Jagadish H.V., Koudas N., Muthukrishnan S., Poosala V., Sevcik K.C., Suel T. Optimal histograms with quality guarantees. In: VLDB, pp. 275---286 (1998)
[19]
Korolova, A., Kenthapadi, K., Mishra, N., Ntoulas, A.: Releasing search queries and clicks privately. In: WWW, pp. 171---180 (2009)
[20]
Kotz, S., Kozubowski, T., Podgórski, K.: The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Birkhäuser Publication, Boston (2001)
[21]
Li, C., Hay, M., Rastogi, V., Miklau, G., McGregor, A.: Optimizing linear counting queries under differential privacy. In: PODS, pp. 123---134 (2010)
[22]
Li, C., Miklau, G.: An adaptive mechanism for accurate query answering under differential privacy. PVLDB 5(6), 514---525 (2012)
[23]
McSherry, F., Mahajan R. Differentially-private network trace analysis. In: SIGCOMM, pp. 123---134 (2010)
[24]
Mohan, P., Thakurta, A., Shi, E., Song, D., Culler, D.E.: Gupt: privacy preserving data analysis made easy. In: SIGMOD, pp. 349---360 (2012)
[25]
Rastogi V., Nath S.: Differentially private aggregation of distributed time-series with transformation and encryption. In: SIGMOD, pp. 735---746 (2010)
[26]
Wang, R., Li, Y., Wang, X., Tang, H., Zhou, X.: Learning your identity and disease from research papers: Information leaks in genome wide association study. In: ACM CCS (2009)
[27]
Xiao, X., Bender, G., Hay, M., Gehrke, J.: ireduct: differential privacy with reduced relative errors. In: SIGMOD, pp. 229---240 (2011)
[28]
Xiao, X., Wang, G., Gehrke, J.: Differential privacy via wavelet transforms. In: ICDE, pp. 225---236 (2010)
[29]
Xiao, Y., Xiong, L., Yuan, C.: Differentially private data release through multidimensional partitioning. In: Secure Data Management, pp. 150---168 (2010)
[30]
Yuan, G., Zhang, Z., Winslett, M., Xiao, X., Yang, Y., Hao, Z.: Low-rank mechanism: optimizing batch queries under differential privacy. PVLDB 5(11), 1352---1363 (2012)
[31]
Zhang, J., Zhang, Z., Xiao, X., Yang, Y., Winslett, M.: Functional mechanism: regression analysis under differential privacy. PVLDB 5(11), 1364---1375 (2012)

Cited By

View all
  • (2024)LLM-PBE: Assessing Data Privacy in Large Language ModelsProceedings of the VLDB Endowment10.14778/3681954.368199417:11(3201-3214)Online publication date: 1-Jul-2024
  • (2024)Publishing Common Neighbors Histograms of Social Networks under Edge Differential PrivacyProceedings of the 19th ACM Asia Conference on Computer and Communications Security10.1145/3634737.3637646(1099-1113)Online publication date: 1-Jul-2024
  • (2024)STDA: Secure Time Series Data Analytics With Practical Efficiency in Wide-Area NetworkIEEE Transactions on Information Forensics and Security10.1109/TIFS.2023.333651219(1440-1454)Online publication date: 1-Jan-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases  Volume 22, Issue 6
December 2013
147 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 December 2013

Author Tags

  1. Database query processing
  2. Differential privacy
  3. Histogram

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)71
  • Downloads (Last 6 weeks)21
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)LLM-PBE: Assessing Data Privacy in Large Language ModelsProceedings of the VLDB Endowment10.14778/3681954.368199417:11(3201-3214)Online publication date: 1-Jul-2024
  • (2024)Publishing Common Neighbors Histograms of Social Networks under Edge Differential PrivacyProceedings of the 19th ACM Asia Conference on Computer and Communications Security10.1145/3634737.3637646(1099-1113)Online publication date: 1-Jul-2024
  • (2024)STDA: Secure Time Series Data Analytics With Practical Efficiency in Wide-Area NetworkIEEE Transactions on Information Forensics and Security10.1109/TIFS.2023.333651219(1440-1454)Online publication date: 1-Jan-2024
  • (2023)Algorithms for bounding contribution for histogram estimation under user-level privacyAlgorithms for bounding contribution for histogram estimation under user-level privacyProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619319(21969-21996)Online publication date: 23-Jul-2023
  • (2023)Saibot: A Differentially Private Data Search PlatformProceedings of the VLDB Endowment10.14778/3611479.361150816:11(3057-3070)Online publication date: 24-Aug-2023
  • (2023)Answering Private Linear Queries Adaptively Using the Common MechanismProceedings of the VLDB Endowment10.14778/3594512.359451916:8(1883-1896)Online publication date: 1-Apr-2023
  • (2023)DP-starJ: A Differential Private Scheme towards Analytical Star-Join QueriesProceedings of the ACM on Management of Data10.1145/36267251:4(1-24)Online publication date: 12-Dec-2023
  • (2023)Data Anonymization With Diversity ConstraintsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2021.313152835:4(3603-3618)Online publication date: 1-Apr-2023
  • (2023)Synthetic Data Generation for Differential Privacy Using Maximum Weight MatchingAlgorithms and Architectures for Parallel Processing10.1007/978-981-97-0798-0_8(121-138)Online publication date: 20-Oct-2023
  • (2022)Differential Privacy via Haar Wavelet Transform and Gaussian Mechanism for Range QueryComputational Intelligence and Neuroscience10.1155/2022/81398132022Online publication date: 1-Jan-2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media