Open access

Robust Multi-Agent Bandits Over Undirected Graphs

Published: 08 December 2022

Abstract

We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) log(T)/Δ) regret in this setting, where K is the number of arms and Δ is the arm gap. For m ≪ K, this improves over the single-agent baseline regret of O(K log(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O((d_mal(i) + K/n) log(T)/Δ) on any connected and undirected graph, where d_mal(i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where d_mal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the d_mal(i) malicious agents directly connected to i affect its long-term regret).
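To make the stated bounds concrete, the following is a minimal Python sketch (illustrative only, not the paper's algorithm) that computes each honest agent's malicious degree d_mal(i) on a small line graph and evaluates the leading term of the per-agent bound (d_mal(i) + K/n) log(T)/Δ against the single-agent baseline K log(T)/Δ. The graph, arm count K, gap Δ, and horizon T are hypothetical values chosen for illustration.

    import math

    # Illustrative parameters (hypothetical; not taken from the paper).
    K, gap, T = 100, 0.1, 10**6   # number of arms, arm gap, horizon

    # Undirected line graph on 6 agents; agents 2 and 5 are malicious.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    malicious = {2, 5}
    honest = [i for i in range(6) if i not in malicious]
    n = len(honest)

    # Build adjacency lists for the undirected graph.
    nbrs = {i: set() for i in range(6)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    baseline = K * math.log(T) / gap  # single-agent O(K log(T)/gap) baseline
    for i in honest:
        d_mal = len(nbrs[i] & malicious)              # malicious neighbors of i
        bound = (d_mal + K / n) * math.log(T) / gap   # per-agent bound, up to constants
        print(f"agent {i}: d_mal = {d_mal}, bound ~ {bound:.0f}, baseline ~ {baseline:.0f}")

On this line graph, only the honest agents adjacent to a malicious agent pay the d_mal(i) penalty; agent 0, with no malicious neighbors, achieves the same leading term as in the fully honest case, illustrating the locality claim in the abstract.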


Cited By

  • (2024) Linear Bandits With Side Observations on Networks. IEEE/ACM Transactions on Networking, 32(5), 4222–4237. DOI: 10.1109/TNET.2024.3422323. Online publication date: October 2024.
  • (2023) Distributed Robust Bandits With Efficient Communication. IEEE Transactions on Network Science and Engineering, 10(3), 1586–1598. DOI: 10.1109/TNSE.2022.3231320. Online publication date: 1 May 2023.



Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 3
POMACS
December 2022
534 pages
EISSN: 2476-1249
DOI: 10.1145/3576048
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2022
Published in POMACS Volume 6, Issue 3


Author Tags

  1. malicious agents
  2. multi-armed bandits

Qualifiers

  • Research-article

Funding Sources

  • ONR
  • NSF



Article Metrics

  • Downloads (last 12 months): 146
  • Downloads (last 6 weeks): 7
Reflects downloads up to 18 Nov 2024

