
A Tutorial on Multi-Armed Bandit Applications for Large Language Models

Published: 24 August 2024 · DOI: 10.1145/3637528.3671440

Abstract

This tutorial offers a comprehensive guide to using multi-armed bandit (MAB) algorithms to improve Large Language Models (LLMs). As Natural Language Processing (NLP) tasks grow in scale and diversity, the need for efficient and adaptive language generation systems becomes more pressing. MAB algorithms, which balance exploration and exploitation under uncertainty, are a promising tool for enhancing LLMs.
The tutorial covers foundational MAB concepts, including the exploration-exploitation trade-off and strategies like epsilon-greedy, UCB (Upper Confidence Bound), and Thompson Sampling. It then explores integrating MAB with LLMs, focusing on designing architectures that treat text generation options as arms in a bandit problem. Practical aspects like reward design, exploration policies, and scalability are discussed.
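To make the strategies named above concrete, the following is a minimal, self-contained sketch (not taken from the tutorial itself) in which each "arm" stands for one way of invoking an LLM, such as a prompt template or decoding setting, and user feedback is simulated as a Bernoulli reward. The arm names and reward probabilities are illustrative assumptions only; a real system would replace the simulated reward with observed feedback such as clicks or ratings.

```python
import math
import random

# Hypothetical "arms": each arm is one way of invoking the LLM
# (e.g., a prompt template or decoding temperature). The Bernoulli
# success probabilities stand in for unknown user-feedback rates.
ARMS = ["concise_prompt", "detailed_prompt", "chain_of_thought_prompt"]
TRUE_REWARD_PROB = {
    "concise_prompt": 0.45,
    "detailed_prompt": 0.55,
    "chain_of_thought_prompt": 0.62,
}

class EpsilonGreedy:
    """Explore a random arm with probability epsilon, otherwise exploit the best empirical mean."""
    def __init__(self, arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

class UCB1:
    """Optimism in the face of uncertainty: empirical mean plus sqrt(2 ln t / n)."""
    def __init__(self, arms):
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}
        self.t = 0

    def select(self):
        self.t += 1
        for arm, n in self.counts.items():  # play every arm once before using the bonus
            if n == 0:
                return arm
        return max(
            self.counts,
            key=lambda a: self.values[a] + math.sqrt(2 * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

class ThompsonSampling:
    """Beta-Bernoulli posterior sampling for binary (success/failure) feedback."""
    def __init__(self, arms):
        self.alpha = {a: 1.0 for a in arms}
        self.beta = {a: 1.0 for a in arms}

    def select(self):
        return max(self.alpha, key=lambda a: random.betavariate(self.alpha[a], self.beta[a]))

    def update(self, arm, reward):
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward

def simulate(policy, rounds=2000):
    """Run one policy against the simulated feedback and return total reward."""
    total = 0
    for _ in range(rounds):
        arm = policy.select()
        # Stand-in for real feedback (clicks, ratings) on the LLM's output for this arm.
        reward = 1 if random.random() < TRUE_REWARD_PROB[arm] else 0
        policy.update(arm, reward)
        total += reward
    return total

if __name__ == "__main__":
    random.seed(0)
    for policy in (EpsilonGreedy(ARMS), UCB1(ARMS), ThompsonSampling(ARMS)):
        print(type(policy).__name__, simulate(policy))
```

In this framing, reward design reduces to deciding what signal the `update` call receives; the selection rules themselves are unchanged whether the arms are prompts, decoding strategies, or candidate models.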
Real-world case studies demonstrate the benefits of MAB-augmented LLMs in content recommendation, dialogue generation, and personalized content creation, showing how these techniques improve relevance, diversity, and user engagement.



Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6,901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 August 2024


Author Tags

  1. large language models
  2. multi-armed bandit

Qualifiers

  • Abstract

Conference

KDD '24

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
