
DOI: 10.1145/3580305.3599279 · KDD Conference Proceedings
Research article · Open access

Communication Efficient and Differentially Private Logistic Regression under the Distributed Setting

Published: 04 August 2023

Abstract

We study the classic machine learning problem of logistic regression with differential privacy (DP), under the distributed setting. While logistic regression with DP has been extensively studied in the literature, most of this research focuses on the centralized setting, where a central server is trusted with the entire private training dataset. However, in many real-world scenarios (e.g., federated learning), the data is distributed among multiple clients who may not trust one another or the server. While the server tries to learn a model from the clients' private datasets, each individual record in the clients' local datasets must be protected with a formal privacy guarantee.
To this end, we propose a general mechanism for logistic regression with DP under the distributed setting, based on output perturbation. We show that our solution satisfies differential privacy and enjoys privacy amplification by secure aggregation, a recent technique for DP under the distributed setting. In addition, our solution incurs much lower communication costs (a major overhead in federated learning) than existing ones. In particular, our solution requires each client to communicate only once throughout the entire FL process. Finally, we provide experimental results on real-world datasets to demonstrate the effectiveness of our solution.
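The abstract does not spell out the mechanism, but the output-perturbation idea it builds on can be sketched as follows: each client trains a regularized logistic regression locally, perturbs the resulting weight vector with Gaussian noise once, and sends only that single noisy vector; the server averages the (securely aggregated) vectors. The sketch below is a minimal illustration under assumed simplifications, not the paper's actual mechanism: the noise scale `sigma`, the gradient-descent training routine, and the plain averaging stand in for the calibrated noise, real secure aggregation, and privacy accounting that the paper develops.

```python
import numpy as np

def train_logreg(X, y, l2=0.01, lr=0.1, iters=500):
    """L2-regularized logistic regression fitted by plain gradient descent."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))  # sigmoid
        w -= lr * (X.T @ (p - y) / n + l2 * w)               # gradient step
    return w

def client_update(X, y, sigma, l2=0.01, rng=None):
    """Output perturbation: train once locally, add Gaussian noise, send once."""
    if rng is None:
        rng = np.random.default_rng()
    w = train_logreg(X, y, l2=l2)
    return w + rng.normal(0.0, sigma, size=w.shape)

def aggregate(noisy_models):
    """Server side. In a real deployment the server would only ever see the
    secure-aggregation sum of the noisy vectors; here it is computed in the
    clear for illustration."""
    return np.mean(noisy_models, axis=0)
```

With n clients each adding independent N(0, sigma^2) noise, each coordinate of the average carries noise of variance only sigma^2/n, which is, loosely, the effect that "privacy amplification by secure aggregation" refers to; calibrating sigma to the sensitivity of the trained weights is what yields the formal DP guarantee.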

Supplementary Material

MP4 file (rtfp0450-2min-promo.mp4): 2-minute promotional video


Cited By

  • (2024) NeurDB: an AI-powered autonomous data system. Science China Information Sciences 67(10). DOI: 10.1007/s11432-024-4125-9. Online publication date: 13 September 2024.


Published In
    KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2023
    5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. differential privacy
    2. federated learning
    3. privacy


    Funding Sources

    • National Research Foundation Singapore

    Conference

KDD '23

    Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)




Access

The conference sponsors are committed to making content openly accessible in a timely manner. This article is provided by ACM and the conference through the ACM OpenTOC service.