Partially Observed Markov Decision Process Multiarmed Bandits---Structural Results

Published: 01 May 2009

Abstract

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably reducing the computational cost. A suboptimal value iteration algorithm based on Lovejoy's approximation is presented. We then show that, for totally positive of order 2 (TP2) transition probability matrices and monotone likelihood ratio (MLR) ordered observation probabilities, the Gittins index is MLR increasing in the information state. Algorithms that exploit this structure are then presented.
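To make the per-arm value iteration concrete, the following is a minimal sketch for the fully observed special case, where each arm is a finite-state Markov chain and the Gittins index of a state follows from the classical restart formulation: the player either continues from the current state or restarts the arm from the target state, and the index of state i is (1 - beta) times the restart value at i. The paper's POMDP setting replaces the discrete state with an information state (belief vector) and runs the analogous iteration over a discretized belief simplex, in the spirit of Lovejoy's approximation [13]. The function names, the two-state example, and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gittins_indices(P, r, beta=0.9, tol=1e-9, max_iter=10_000):
    """Gittins index of every state of one arm, via value iteration
    on the restart formulation: from state j, either continue
    (earn r[j], transition by row P[j]) or restart the arm in the
    target state i (earn r[i], transition by row P[i]).
    The index of state i is (1 - beta) * V_i(i)."""
    n = len(r)
    g = np.empty(n)
    for i in range(n):
        V = np.zeros(n)
        for _ in range(max_iter):
            cont = r + beta * P @ V            # continue from each state j
            restart = r[i] + beta * P[i] @ V   # restart the arm in state i
            V_new = np.maximum(cont, restart)
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        g[i] = (1.0 - beta) * V[i]
    return g

def is_tp2(P):
    """True iff all 2x2 minors of P are nonnegative (TP2), i.e.
    P[i,k]*P[j,l] >= P[i,l]*P[j,k] for all i < j, k < l."""
    n, m = P.shape
    return all(P[i, k] * P[j, l] >= P[i, l] * P[j, k]
               for i in range(n) for j in range(i + 1, n)
               for k in range(m) for l in range(k + 1, m))

# Illustrative two-state arm: TP2 transition matrix, increasing rewards.
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])
r = np.array([0.0, 1.0])
print(is_tp2(P))              # True
print(gittins_indices(P, r))  # index increases with the state here
```

Under the paper's structural assumptions (a TP2 transition matrix, as checked by is_tp2 above, together with MLR-ordered observation probabilities, which this sketch does not model), the Gittins index is monotone in the MLR order on information states, and that monotonicity is what the structure-exploiting algorithms in the paper build on.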

References

[1] Bertsimas, D. and Niño-Mora, J., "Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems," Math. Oper. Res., v21, pp. 257-305, 1996.
[2] Cassandra, A. R., "Tony's POMDP page."
[3] Cassandra, A. R., "Exact and approximate algorithms for partially observable Markov decision processes," Ph.D. thesis, Brown University, Providence, RI, 1998.
[4] Cassandra, A. R., Littman, M. L. and Zhang, N. L., "Incremental pruning: A simple, fast, exact method for partially observed Markov decision processes," Proc. 13th Annual Conf. Uncertainty in Artificial Intelligence (UAI-97), Morgan Kaufmann, San Francisco, 1997.
[5] Gittins, J. C., Multi-Armed Bandit Allocation Indices, John Wiley & Sons, New York, 1989.
[6] Kijima, M., Markov Processes for Stochastic Modelling, Chapman and Hall, London, 1997.
[7] Krishnamurthy, V. and Djonin, D., "Structured threshold policies for dynamic sensor scheduling---A partially observed Markov decision process approach," IEEE Trans. Signal Processing, v55, pp. 4938-4957, 2007.
[8] Krishnamurthy, V., Vázquez-Abad, F. J. and Martin, K., "Implementation of gradient estimation to a constrained Markov decision problem," Proc. 42nd IEEE Conf. Decision and Control, IEEE Press, Piscataway, NJ, 2003.
[9] Kumar, P. R. and Varaiya, P., Stochastic Systems---Estimation, Identification and Adaptive Control, Prentice-Hall, Upper Saddle River, NJ, 1986.
[10] Kushner, H. J. and Yin, G., "Stochastic approximation algorithms for parallel and distributed processing," Stochastics, v22, pp. 219-250, 1987.
[11] Le Cadre, J. P. and Trémois, O., "Bearings-only tracking for maneuvering sources," IEEE Trans. Aerospace Electronic Systems, v34, pp. 179-193, 1998.
[12] Lovejoy, W. S., "Some monotonicity results for partially observed Markov decision processes," Oper. Res., v35, pp. 736-743, 1987.
[13] Lovejoy, W. S., "Computationally feasible bounds for partially observed Markov decision processes," Oper. Res., v39, pp. 162-175, 1991.
[14] Müller, A. and Stoyan, D., Comparison Methods for Stochastic Models and Risks, John Wiley & Sons, Chichester, UK, 2002.
[15] Papadimitriou, C. H., Computational Complexity, Addison-Wesley, Reading, MA, 1995.
[16] Rabiner, L. R., "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, v77, pp. 257-285, 1989.
[17] Ross, S., Introduction to Stochastic Dynamic Programming, Academic Press, San Diego, 1983.
[18] Smallwood, R. D. and Sondik, E. J., "Optimal control of partially observable Markov processes over a finite horizon," Oper. Res., v21, pp. 1071-1088, 1973.
[19] Spall, J., Introduction to Stochastic Search and Optimization, John Wiley & Sons, New York, 2003.
[20] Whittle, P., "Multi-armed bandits and the Gittins index," J. R. Statist. Soc. B, v42, pp. 143-149, 1980.


    Published In

    Mathematics of Operations Research, Volume 34, Issue 2
    May 2009
    256 pages

    Publisher

    INFORMS

    Linthicum, MD, United States

    Publication History

    Published: 01 May 2009
    Received: 24 July 2008

    Author Tags

    1. likelihood ratio ordering
    2. monotone policies
    3. multiarmed bandits
    4. opportunistic scheduling
    5. partially observed Markov decision process
    6. stochastic approximation algorithm

    Cited By

    • (2023) "Statistical inference on multi-armed bandits with delayed feedback," Proceedings of the 40th International Conference on Machine Learning, pp. 31328-31352. DOI: 10.5555/3618408.3619705. Online publication date: 23-Jul-2023.
    • (2022) "Morphing for Consumer Dynamics," Marketing Science, v41(4), pp. 769-794. DOI: 10.1287/mksc.2021.1346. Online publication date: 1-Jul-2022.
    • (2021) "Game of Thrones," Mathematics of Operations Research, v46(1), pp. 159-178. DOI: 10.1287/moor.2020.1051. Online publication date: 1-Feb-2021.
    • (2020) "Restless Hidden Markov Bandit with Linear Rewards," 2020 59th IEEE Conference on Decision and Control (CDC), pp. 1183-1189. DOI: 10.1109/CDC42340.2020.9304511. Online publication date: 14-Dec-2020.
    • (2017) "Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments," Marketing Science, v36(4), pp. 500-522. DOI: 10.1287/mksc.2016.1023. Online publication date: 1-Jul-2017.
    • (2015) "Towards a secure medium access control protocol for cluster-based underwater wireless sensor networks," International Journal of Distributed Sensor Networks, v2015, pp. 40-40. DOI: 10.1155/2015/325474. Online publication date: 1-Jan-2015.
    • (2010) "Distributed node selection for threshold key management with intrusion detection in mobile ad hoc networks," Wireless Networks, v16(8), pp. 2169-2178. DOI: 10.1007/s11276-010-0250-6. Online publication date: 1-Nov-2010.
