Open access

Robust Multi-Agent Bandits Over Undirected Graphs

Published: 08 December 2022

Abstract

We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) log(T)/Δ) regret in this setting, where K is the number of arms and Δ is the arm gap. For m ≪ K, this improves over the single-agent baseline regret of O(K log(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n. In light of this negative result, we propose a new algorithm for which the i-th agent has regret O((d_mal(i) + K/n) log(T)/Δ) on any connected and undirected graph, where d_mal(i) is the number of i's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where d_mal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the d_mal(i) malicious agents directly connected to i affect its long-term regret).
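To make the stated bounds concrete, the following is a minimal Python sketch (illustrative only, not the paper's algorithm) that computes each honest agent's malicious degree d_mal(i) on a small line graph and evaluates the leading term of the per-agent bound (d_mal(i) + K/n) log(T)/Δ against the single-agent baseline K log(T)/Δ. The graph, arm count K, gap Δ, and horizon T are hypothetical values chosen for illustration.

    import math

    # Illustrative parameters (hypothetical; not taken from the paper).
    K, gap, T = 100, 0.1, 10**6   # number of arms, arm gap, horizon

    # Undirected line graph on 6 agents; agents 2 and 5 are malicious.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    malicious = {2, 5}
    honest = [i for i in range(6) if i not in malicious]
    n = len(honest)

    # Build adjacency lists for the undirected graph.
    nbrs = {i: set() for i in range(6)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    baseline = K * math.log(T) / gap  # single-agent O(K log(T)/gap) baseline
    for i in honest:
        d_mal = len(nbrs[i] & malicious)              # malicious neighbors of i
        bound = (d_mal + K / n) * math.log(T) / gap   # per-agent bound, up to constants
        print(f"agent {i}: d_mal = {d_mal}, bound ~ {bound:.0f}, baseline ~ {baseline:.0f}")

On this line graph, only the honest agents adjacent to a malicious agent pay the d_mal(i) penalty; agent 0, with no malicious neighbors, achieves the same leading term as in the fully honest case, illustrating the locality claim in the abstract.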


Cited By

  • (2024) Linear Bandits With Side Observations on Networks. IEEE/ACM Transactions on Networking, 32(5), 4222–4237. DOI: 10.1109/TNET.2024.3422323. Online publication date: October 2024.
  • (2023) Distributed Robust Bandits With Efficient Communication. IEEE Transactions on Network Science and Engineering, 10(3), 1586–1598. DOI: 10.1109/TNSE.2022.3231320. Online publication date: 1 May 2023.



Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 3
POMACS
December 2022
534 pages
EISSN: 2476-1249
DOI: 10.1145/3576048
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2022
Published in POMACS Volume 6, Issue 3


Author Tags

  1. malicious agents
  2. multi-armed bandits

Qualifiers

  • Research-article

Funding Sources

  • ONR
  • NSF



Article Metrics

  • Downloads (last 12 months): 146
  • Downloads (last 6 weeks): 7
Reflects downloads up to 18 Nov 2024

