Non-stationary bandits with auto-regressive temporal dependency
Abstract
Information
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Qualifiers
- Research-article
- Research
- Refereed limited