
Calibrating Noise for Group Privacy in Subsampled Mechanisms

Published: 28 February 2025, in Proceedings of the VLDB Endowment, Volume 18, Issue 2 (October 2024). Publisher: VLDB Endowment.

Abstract

Given a group size m and a sensitive dataset D, group privacy (GP) releases information about D (e.g., weights of a neural network trained on D) with the guarantee that the adversary cannot infer with high confidence whether the underlying data is D or a neighboring dataset D′ that differs from D by m records. GP generalizes the well-established notion of differential privacy (DP) for protecting individuals' privacy; in particular, when m = 1, GP reduces to DP. Compared to DP, GP is capable of protecting the sensitive aggregate information of a group of up to m individuals, e.g., the average annual income among members of a yacht club. Despite its longstanding presence in the research literature and its promising applications, GP is often treated as an afterthought: most approaches first develop a DP mechanism and then use a generic conversion to adapt it for GP, treating the DP solution as a black box. As we point out in this paper, this methodology is suboptimal when the underlying DP solution involves subsampling, e.g., in the classic DP-SGD method for training deep learning models. In this case, the DP-to-GP conversion is overly pessimistic in its analysis, leading to excessive noise and low utility in the results released under GP.
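For reference, the generic DP-to-GP conversion mentioned above is the standard group-privacy property of DP (a well-known fact in the literature, not this paper's contribution): if a mechanism $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-DP, then for any datasets $D$ and $D'$ differing in at most $m$ records and any output set $S$,

$$\Pr[\mathcal{M}(D) \in S] \;\le\; e^{m\varepsilon} \Pr[\mathcal{M}(D') \in S] \;+\; m\, e^{(m-1)\varepsilon}\, \delta,$$

i.e., $\mathcal{M}$ satisfies $(m\varepsilon,\; m\, e^{(m-1)\varepsilon}\, \delta)$-GP. Note that the additive term grows exponentially in $m$, which is one source of the pessimism just discussed.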
Motivated by this, we propose a novel analysis framework that provides tight privacy accounting for subsampled GP mechanisms. Instead of converting a black-box DP mechanism to GP, our solution carefully analyzes and exploits the inherent randomness in subsampled mechanisms, leading to a substantially improved bound on the privacy loss with respect to GP. The proposed solution applies to a wide variety of foundational mechanisms with subsampling. Extensive experiments on real datasets demonstrate that, compared to the baseline convert-from-black-box-DP approach, our GP mechanisms achieve noise reductions of over an order of magnitude in several practical settings, including deep neural network training.
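To make the baseline concrete, below is a minimal sketch (in Python) of the black-box conversion above, applied to a mechanism whose accountant reports a given (ε, δ)-DP guarantee. The example numbers are hypothetical, and the paper's tighter subsampled analysis is not reproduced here.

    import math

    def blackbox_gp(eps: float, delta: float, m: int) -> tuple[float, float]:
        # Standard group-privacy property of DP: an (eps, delta)-DP
        # mechanism satisfies (m*eps, m*exp((m-1)*eps)*delta)-GP
        # for groups of size m.
        return m * eps, m * math.exp((m - 1) * eps) * delta

    # Hypothetical example: a subsampled mechanism accounted to (1.0, 1e-5)-DP.
    for m in (1, 2, 4, 8):
        eps_gp, delta_gp = blackbox_gp(1.0, 1e-5, m)
        print(f"m={m}: eps_GP = {eps_gp:.1f}, delta_GP = {delta_gp:.2e}")

Already at m = 8, δ inflates by a factor of m·e^{(m-1)ε} ≈ 8.8 × 10^3; to compensate, the black-box approach must start from a much smaller ε, i.e., inject much more noise, which is exactly the overhead our direct analysis avoids.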

