
Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

Published: 19 December 2017

Abstract

We consider the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks. This setup arises in many practical applications, including Google's Federated Learning. Formally, we focus on a decentralized system that consists of a parameter server and m working machines; each working machine keeps N/m data samples, where N is the total number of samples. In each iteration, up to q of the m working machines suffer Byzantine faults -- a faulty machine in a given iteration behaves arbitrarily badly against the system and has complete knowledge of the system. Additionally, the sets of faulty machines may differ across iterations. Our goal is to design robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the disruption caused by the Byzantine attacks.
In this paper, based on the geometric median of means of the gradients, we propose a simple variant of the classical gradient descent method. We show that our method can tolerate q Byzantine failures up to 2(1+ε)q ≤ m for an arbitrarily small but fixed constant ε > 0. The parameter estimate converges in O(log N) rounds with an estimation error on the order of max{√(dq/N), √(d/N)}, which is larger than the minimax-optimal error rate √(d/N) in the centralized and failure-free setting by at most a factor of √q. The total computational complexity of our algorithm is O((Nd/m) log N) at each working machine and O(md + kd log³ N) at the central server, and the total communication cost is O(md log N). We further provide an application of our general results to the linear regression problem.
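For intuition only, below is a minimal sketch of the approach the abstract describes: each working machine reports a gradient, the server averages the reports within k batches, takes the geometric median of the batch means (computed here with a plain Weiszfeld iteration), and performs a gradient step. All names (geometric_median, byzantine_gradient_descent, grad_fns, k, step, rounds) and the fixed iteration counts are illustrative assumptions, not the paper's exact algorithm or notation.

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    """Weiszfeld iteration for the geometric median of the rows of `points`.

    points : (k, d) array; returns a d-vector approximately minimizing the
    sum of Euclidean distances to the rows.
    """
    y = points.mean(axis=0)
    for _ in range(iters):
        dists = np.maximum(np.linalg.norm(points - y, axis=1), eps)  # avoid /0
        weights = 1.0 / dists
        y_new = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

def byzantine_gradient_descent(grad_fns, theta0, k, step, rounds):
    """Gradient descent with geometric-median-of-means aggregation (sketch).

    grad_fns : list of m callables; grad_fns[j](theta) returns machine j's
               reported gradient (arbitrary if machine j is Byzantine).
    k        : number of batches used in the median-of-means step.
    """
    theta = np.array(theta0, dtype=float)
    m = len(grad_fns)
    groups = np.array_split(np.arange(m), k)
    for _ in range(rounds):
        grads = np.stack([f(theta) for f in grad_fns])                # (m, d)
        batch_means = np.stack([grads[g].mean(axis=0) for g in groups])
        agg = geometric_median(batch_means)                           # robust aggregate
        theta = theta - step * agg
    return theta
```

The batch count k in this sketch mirrors the k appearing in the server-side complexity quoted above; how it should be chosen relative to the fault bound q is governed by the paper's analysis and is not reproduced here.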
A key challenge in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. To handle this issue in the analysis, we prove that the aggregated gradient, as a function of the model parameter, converges uniformly to the true gradient function.
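Schematically, the uniform-convergence claim can be written as the display below, where G(θ) denotes the geometric-median-of-means aggregate of the reported gradients, F(θ) the population risk, Θ the parameter space, and Δ an error term of the same order as the estimation error quoted above; this notation is ours, and the precise conditions and constants are stated in the paper:

\[
\sup_{\theta \in \Theta} \bigl\| G(\theta) - \nabla F(\theta) \bigr\| \le \Delta \quad \text{with high probability},
\]

so that each gradient step, regardless of which machines are faulty in that iteration, moves the iterate roughly as a true gradient step would.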




Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 1, Issue 2
December 2017
480 pages
EISSN:2476-1249
DOI:10.1145/3175501
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 December 2017
Published in POMACS Volume 1, Issue 2


Author Tags

  1. byzantine adversaries
  2. distributed systems
  3. learning
  4. security

Qualifiers

  • Research-article
