
Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

Published: 19 December 2017

Abstract

We consider the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks. This setup arises in many practical applications, including Google's Federated Learning. Formally, we focus on a decentralized system that consists of a parameter server and m working machines; each working machine keeps N/m data samples, where N is the total number of samples. In each iteration, up to q of the m working machines suffer Byzantine faults -- a faulty machine in a given iteration behaves arbitrarily badly against the system and has complete knowledge of the system. Additionally, the sets of faulty machines may differ across iterations. Our goal is to design robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the disruption caused by the Byzantine attacks.
In this paper, based on the geometric median of means of the gradients, we propose a simple variant of the classical gradient descent method. We show that our method can tolerate q Byzantine failures up to 2(1+ε)q ≤ m for an arbitrarily small but fixed constant ε > 0. The parameter estimate converges in O(log N) rounds with an estimation error on the order of max{√(dq/N), √(d/N)}, which is larger than the minimax-optimal error rate √(d/N) in the centralized and failure-free setting by at most a factor of √q. The total computational complexity of our algorithm is O((Nd/m) log N) at each working machine and O(md + kd log³ N) at the central server, and the total communication cost is O(md log N). We further provide an application of our general results to the linear regression problem.
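For intuition only, below is a minimal sketch of the approach the abstract describes: each working machine reports a gradient, the server averages the reports within k batches, takes the geometric median of the batch means (computed here with a plain Weiszfeld iteration), and performs a gradient step. All names (geometric_median, byzantine_gradient_descent, grad_fns, k, step, rounds) and the fixed iteration counts are illustrative assumptions, not the paper's exact algorithm or notation.

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    """Weiszfeld iteration for the geometric median of the rows of `points`.

    points : (k, d) array; returns a d-vector approximately minimizing the
    sum of Euclidean distances to the rows.
    """
    y = points.mean(axis=0)
    for _ in range(iters):
        dists = np.maximum(np.linalg.norm(points - y, axis=1), eps)  # avoid /0
        weights = 1.0 / dists
        y_new = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

def byzantine_gradient_descent(grad_fns, theta0, k, step, rounds):
    """Gradient descent with geometric-median-of-means aggregation (sketch).

    grad_fns : list of m callables; grad_fns[j](theta) returns machine j's
               reported gradient (arbitrary if machine j is Byzantine).
    k        : number of batches used in the median-of-means step.
    """
    theta = np.array(theta0, dtype=float)
    m = len(grad_fns)
    groups = np.array_split(np.arange(m), k)
    for _ in range(rounds):
        grads = np.stack([f(theta) for f in grad_fns])                # (m, d)
        batch_means = np.stack([grads[g].mean(axis=0) for g in groups])
        agg = geometric_median(batch_means)                           # robust aggregate
        theta = theta - step * agg
    return theta
```

The batch count k in this sketch mirrors the k appearing in the server-side complexity quoted above; how it should be chosen relative to the fault bound q is governed by the paper's analysis and is not reproduced here.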
A key challenge in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. To handle this issue in the analysis, we prove that the aggregated gradient, as a function of the model parameter, converges uniformly to the true gradient function.
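Schematically, the uniform-convergence claim can be written as the display below, where G(θ) denotes the geometric-median-of-means aggregate of the reported gradients, F(θ) the population risk, Θ the parameter space, and Δ an error term of the same order as the estimation error quoted above; this notation is ours, and the precise conditions and constants are stated in the paper:

\[
\sup_{\theta \in \Theta} \bigl\| G(\theta) - \nabla F(\theta) \bigr\| \le \Delta \quad \text{with high probability},
\]

so that each gradient step, regardless of which machines are faulty in that iteration, moves the iterate roughly as a true gradient step would.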




Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 1, Issue 2
December 2017
480 pages
EISSN:2476-1249
DOI:10.1145/3175501
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 December 2017
Published in POMACS Volume 1, Issue 2


Author Tags

  1. byzantine adversaries
  2. distributed systems
  3. learning
  4. security

Qualifiers

  • Research-article
