
DOI: 10.1145/3641512.3686368
research-article
Open access

Risk-Aware Multi-Agent Multi-Armed Bandits

Published: 01 October 2024

Abstract

The multi-armed bandit (MAB) is an online learning and decision-making model under uncertainty. Rather than maximizing only the expected utility (or reward), as in the classical MAB setting, risk-aware decisions must also account for the variance of the utility. In this paper, we propose a risk-aware multi-agent MAB (MAMAB) model that considers both "independent" and "correlated" risks when multiple agents make arm-pulling decisions. Specifically, the system includes a platform that owns a number of tasks (or arms) awaiting a group of agents to accomplish them. We show how to calculate the arm-pulling strategies of agents with potentially different eligible arm sets at a Nash equilibrium point. From the perspective of the platform, each arm has a maximal capacity of arm-pulling agents it can accommodate. We design the platform's optimal payment algorithms for risk-aware revenue maximization (i.e., regret minimization) under both independent and correlated risks. We prove that our algorithms achieve sublinear regret under independent risks, whether or not the platform can differentiate the utility on each arm, and that our algorithm achieves sublinear regret under correlated risks. Finally, we carry out experiments to quantify the merits of our algorithms in various networking applications, such as crowdsourcing and edge computing.
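
This page does not reproduce the paper's algorithms, so the sketch below is only a rough illustration of the risk-aware objective the abstract describes, not the authors' method: a minimal single-agent mean-variance bandit in Python. The index form rho * variance - mean, the risk weight rho, and the UCB-style exploration bonus are illustrative assumptions of ours; the paper's multi-agent, capacity-constrained setting is substantially richer.

```python
import numpy as np

def mean_variance_bandit(arms, horizon, rho=1.0, seed=0):
    """Illustrative risk-aware bandit: choose the arm minimizing an
    empirical mean-variance index (rho * variance - mean) with a
    UCB-style exploration bonus. This is a sketch, not the paper's
    MAMAB algorithm."""
    rng = np.random.default_rng(seed)
    k = len(arms)
    rewards = [[] for _ in range(k)]
    # Pull each arm twice so empirical variances are well defined.
    for a in range(k):
        rewards[a] += [arms[a](rng), arms[a](rng)]
    for t in range(2 * k, horizon):
        index = []
        for a in range(k):
            n = len(rewards[a])
            mv = rho * np.var(rewards[a]) - np.mean(rewards[a])
            bonus = np.sqrt(2.0 * np.log(t + 1) / n)  # exploration term
            index.append(mv - bonus)  # lower index = more attractive arm
        a = int(np.argmin(index))
        rewards[a].append(arms[a](rng))
    return [len(r) for r in rewards]  # pull counts per arm

# Two arms with equal mean but different variance: a risk-aware learner
# should concentrate its pulls on the low-variance arm, which a purely
# mean-maximizing learner has no reason to prefer.
arms = [lambda rng: rng.normal(1.0, 0.1), lambda rng: rng.normal(1.0, 1.0)]
print(mean_variance_bandit(arms, horizon=2000))
```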




Published In

MobiHoc '24: Proceedings of the Twenty-fifth International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
October 2024
511 pages
ISBN:9798400705212
DOI:10.1145/3641512
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. multi-armed bandits
  2. multi-agent systems
  3. risk-aware bandits

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong Research Grants Council

Conference

MobiHoc '24

Acceptance Rates

Overall acceptance rate: 296 of 1,843 submissions (16%)

