Maximal Objectives in the Multiarmed Bandit with Applications
Risk-Sensitive and Risk-Neutral Multiarmed Bandits
For the multiarmed bandit, the classic result is probabilistic: each state of each bandit (Markov chain with rewards) has an index that is determined by an optimal stopping time for that state's bandit, and expected discounted income is maximized by ...
Partially Observed Markov Decision Process Multiarmed Bandits---Structural Results
This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby ...
PAC-Bayes-Bernstein inequality for martingales and its application to multiarmed bandits
OTEAE'11: Proceedings of the 2011 International Conference on On-line Trading of Exploration and Exploitation 2 - Volume 26
We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The first ingredient is a new concentration inequality that makes it possible to ...
Information & Contributors
Published In
Linthicum, MD, United States
Author Tags
- Research-article