
DOI: 10.1145/3641512.3686368
research-article
Open access

Risk-Aware Multi-Agent Multi-Armed Bandits

Published: 01 October 2024

Abstract

The multi-armed bandit (MAB) is an online learning and decision-making model under uncertainty. Rather than maximizing only the expected utility (or reward), as in the classical MAB setting, risk-aware decisions must also account for the variance of the utility. In this paper, we propose a risk-aware multi-agent MAB (MAMAB) model that considers both "independent" and "correlated" risks when multiple agents make arm-pulling decisions. Specifically, the system includes a platform that owns a number of tasks (or arms) awaiting a group of agents to accomplish them. We show how to calculate the arm-pulling strategies of agents with potentially different eligible arm sets at a Nash equilibrium point. From the perspective of the platform, each arm has a maximal capacity of arm-pulling agents it can accommodate. We design the platform's optimal payment algorithms for risk-aware revenue maximization (i.e., regret minimization) under both independent and correlated risks. We prove that our algorithms achieve sublinear regret under independent risks, whether or not the platform can differentiate the utility on each arm, and that our algorithm achieves sublinear regret under correlated risks. Finally, we carry out experiments to quantify the merits of our algorithms in various networking applications, such as crowdsourcing and edge computing.
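
This page does not reproduce the paper's algorithms, so the sketch below is only a rough illustration of the risk-aware objective the abstract describes, not the authors' method: a minimal single-agent mean-variance bandit in Python. The index form rho * variance - mean, the risk weight rho, and the UCB-style exploration bonus are illustrative assumptions of ours; the paper's multi-agent, capacity-constrained setting is substantially richer.

```python
import numpy as np

def mean_variance_bandit(arms, horizon, rho=1.0, seed=0):
    """Illustrative risk-aware bandit: choose the arm minimizing an
    empirical mean-variance index (rho * variance - mean) with a
    UCB-style exploration bonus. This is a sketch, not the paper's
    MAMAB algorithm."""
    rng = np.random.default_rng(seed)
    k = len(arms)
    rewards = [[] for _ in range(k)]
    # Pull each arm twice so empirical variances are well defined.
    for a in range(k):
        rewards[a] += [arms[a](rng), arms[a](rng)]
    for t in range(2 * k, horizon):
        index = []
        for a in range(k):
            n = len(rewards[a])
            mv = rho * np.var(rewards[a]) - np.mean(rewards[a])
            bonus = np.sqrt(2.0 * np.log(t + 1) / n)  # exploration term
            index.append(mv - bonus)  # lower index = more attractive arm
        a = int(np.argmin(index))
        rewards[a].append(arms[a](rng))
    return [len(r) for r in rewards]  # pull counts per arm

# Two arms with equal mean but different variance: a risk-aware learner
# should concentrate its pulls on the low-variance arm, which a purely
# mean-maximizing learner has no reason to prefer.
arms = [lambda rng: rng.normal(1.0, 0.1), lambda rng: rng.normal(1.0, 1.0)]
print(mean_variance_bandit(arms, horizon=2000))
```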




Published In

MobiHoc '24: Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
October 2024
511 pages
ISBN:9798400705212
DOI:10.1145/3641512
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. multi-armed bandits
  2. multi-agent systems
  3. risk-aware bandits

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong Research Grants Council

Conference

MobiHoc '24

Acceptance Rates

Overall acceptance rate: 296 of 1,843 submissions (16%)

