
DOI: 10.1145/3580305.3599279 · KDD Conference Proceedings
Research article · Open access

Communication Efficient and Differentially Private Logistic Regression under the Distributed Setting

Published: 04 August 2023

Abstract

We study the classic machine learning problem of logistic regression with differential privacy (DP), under the distributed setting. While logistic regression with DP has been extensively studied in the literature, most of this research focuses on the centralized setting, where a central server is trusted with the entire private training dataset. However, in many real-world scenarios (e.g., federated learning), the data is distributed among multiple clients who may not trust one another or the server. While the server tries to learn a model from the clients' private datasets, each individual record in the clients' local datasets must be protected with a formal privacy guarantee.
To this end, we propose a general mechanism for logistic regression with DP under the distributed setting, based on output perturbation. We show that our solution satisfies differential privacy and enjoys privacy amplification by secure aggregation, a recent technique for DP under the distributed setting. In addition, our solution incurs much lower communication costs (a major overhead in federated learning) than existing ones. In particular, our solution requires each client to communicate only once throughout the entire FL process. Finally, we provide experimental results on real-world datasets to demonstrate the effectiveness of our solution.
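The abstract does not spell out the mechanism, but the output-perturbation idea it builds on can be sketched as follows: each client trains a regularized logistic regression locally, perturbs the resulting weight vector with Gaussian noise once, and sends only that single noisy vector; the server averages the (securely aggregated) vectors. The sketch below is a minimal illustration under assumed simplifications, not the paper's actual mechanism: the noise scale `sigma`, the gradient-descent training routine, and the plain averaging stand in for the calibrated noise, real secure aggregation, and privacy accounting that the paper develops.

```python
import numpy as np

def train_logreg(X, y, l2=0.01, lr=0.1, iters=500):
    """L2-regularized logistic regression fitted by plain gradient descent."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))  # sigmoid
        w -= lr * (X.T @ (p - y) / n + l2 * w)               # gradient step
    return w

def client_update(X, y, sigma, l2=0.01, rng=None):
    """Output perturbation: train once locally, add Gaussian noise, send once."""
    if rng is None:
        rng = np.random.default_rng()
    w = train_logreg(X, y, l2=l2)
    return w + rng.normal(0.0, sigma, size=w.shape)

def aggregate(noisy_models):
    """Server side. In a real deployment the server would only ever see the
    secure-aggregation sum of the noisy vectors; here it is computed in the
    clear for illustration."""
    return np.mean(noisy_models, axis=0)
```

With n clients each adding independent N(0, sigma^2) noise, each coordinate of the average carries noise of variance only sigma^2/n, which is, loosely, the effect that "privacy amplification by secure aggregation" refers to; calibrating sigma to the sensitivity of the trained weights is what yields the formal DP guarantee.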

Supplementary Material

MP4 file (rtfp0450-2min-promo.mp4): 2-minute promotional video


Cited By

  • (2024) NeurDB: an AI-powered autonomous data system. Science China Information Sciences 67(10). DOI: 10.1007/s11432-024-4125-9. Online publication date: 13 September 2024.


Published In
    KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2023
    5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. differential privacy
    2. federated learning
    3. privacy


    Funding Sources

    • National Research Foundation Singapore

    Conference

KDD '23

    Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)




Access

The conference sponsors are committed to making content openly accessible in a timely manner. This article is provided by ACM and the conference through the ACM OpenTOC service.