Calibrating Noise for Group Privacy in Subsampled Mechanisms

Published: 28 February 2025


Given a group size m and a sensitive dataset D, group privacy (GP) releases information about D (e.g., the weights of a neural network trained on D) with the guarantee that an adversary cannot infer with high confidence whether the underlying data is D or a neighboring dataset D′ that differs from D by m records. GP generalizes the well-established notion of differential privacy (DP), which protects individuals' privacy; in particular, when m = 1, GP reduces to DP. Compared to DP, GP can protect sensitive aggregate information about a group of up to m individuals, e.g., the average annual income among members of a yacht club. Despite its longstanding presence in the research literature and its promising applications, GP is often treated as an afterthought: most approaches first develop a DP mechanism and then apply a generic conversion to adapt it for GP, treating the DP solution as a black box. As we point out in the paper, this methodology is suboptimal when the underlying DP solution involves subsampling, e.g., in the classic DP-SGD method for training deep learning models. In this case, the analysis behind the DP-to-GP conversion is overly pessimistic, leading to high error and low utility in the results published under GP.
Motivated by this, we propose a novel analysis framework that provides tight privacy accounting for subsampled GP mechanisms. Instead of converting a black-box DP mechanism to GP, our solution carefully analyzes and exploits the inherent randomness in subsampled mechanisms, leading to a substantially improved bound on the privacy loss with respect to GP. The proposed solution applies to a wide variety of foundational mechanisms with subsampling. Extensive experiments with real datasets demonstrate that, compared to the baseline convert-from-blackbox-DP approach, our GP mechanisms achieve noise reductions of over an order of magnitude in several practical settings, including deep neural network training.
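For context, the kind of generic black-box conversion discussed above can be illustrated with the textbook group-privacy lemma for approximate DP: a mechanism that satisfies (ε, δ)-DP also protects groups of size m with parameters (mε, δ·Σ_{i=0}^{m−1} e^{iε}). The sketch below is a minimal illustration under that assumption only. The function name `generic_dp_to_gp` is hypothetical, and the baseline and accounting actually used in the paper may rely on a different privacy notion (e.g., Rényi DP); this is not the paper's proposed method.

```python
import math


def generic_dp_to_gp(eps: float, delta: float, m: int) -> tuple[float, float]:
    """Black-box conversion of an (eps, delta)-DP guarantee to group size m.

    Uses the textbook group-privacy lemma (an assumption of this sketch, not
    necessarily the paper's baseline): applying the DP guarantee along a chain
    of m single-record changes gives (m * eps, delta * sum_{i<m} e^{i * eps}).
    """
    if m < 1:
        raise ValueError("group size m must be at least 1")
    if eps < 0 or not 0 <= delta <= 1:
        raise ValueError("need eps >= 0 and 0 <= delta <= 1")
    eps_group = m * eps
    # Each hop along the chain inflates the accumulated delta by a factor e^eps.
    delta_group = delta * sum(math.exp(i * eps) for i in range(m))
    # A failure probability above 1 carries no information, so cap it.
    return eps_group, min(delta_group, 1.0)


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    for m in (1, 2, 4, 8):
        eps_g, delta_g = generic_dp_to_gp(eps=1.0, delta=1e-6, m=m)
        print(f"m={m}: eps={eps_g:.1f}, delta={delta_g:.3e}")
```

Under this lemma, ε grows linearly in m and δ grows roughly like e^{(m−1)ε}, and the conversion never looks inside the mechanism. That is one way to see why a convert-from-black-box approach can require far more noise than an analysis that accounts for the subsampling randomness directly.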






Published In

Proceedings of the VLDB Endowment, Volume 18, Issue 2
October 2024
436 pages

VLDB Endowment

