
Solution Procedures for Partially Observed Markov Decision Processes

Published: 01 October 1989

Abstract

We present three algorithms for solving the infinite horizon, expected discounted total reward partially observed Markov decision process (POMDP). Each algorithm combines a successive approximations algorithm for the POMDP due to R. Smallwood and E. Sondik with a suitably generalized numerical technique that has been shown to reduce CPU time to convergence in the completely observed case. The first technique is reward revision; the second is reward revision combined with modified policy iteration; the third is a standard extrapolation. A numerical study indicates that these algorithms can yield significant computational savings.
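The paper itself operates on the belief simplex, but the two speed-up techniques it generalizes are easiest to see in the completely observed case. The following is a minimal sketch, assuming a small hypothetical 2-state, 2-action discounted MDP (the transition and reward data, function names, and the sweep count `m` are illustrative, not from the paper): modified policy iteration in the style of Puterman and Shin [4] interleaves one greedy improvement sweep with a few cheap fixed-policy evaluation sweeps, and a standard extrapolation in the style of Porteus [3] shifts the iterate by the midpoint of the discounted span of the last increment.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP (not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[a, s, s']: transition probabilities
              [[0.5, 0.5], [0.7, 0.3]]])
R = np.array([[1.0, 0.0],                  # R[a, s]: expected one-step reward
              [0.5, 2.0]])
beta = 0.9                                 # discount factor

def modified_policy_iteration(P, R, beta, m=5, tol=1e-8, max_iter=1000):
    """Successive approximations with m extra fixed-policy evaluation
    sweeps per iteration (Puterman & Shin style, sketched)."""
    n_actions, n_states = R.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Greedy improvement step (one successive-approximation sweep).
        q = R + beta * P @ v                      # q[a, s]
        policy = q.argmax(axis=0)                 # greedy action per state
        v_new = q.max(axis=0)
        # Partial evaluation: m sweeps under the fixed greedy policy
        # (m = 0 recovers plain value iteration).
        idx = np.arange(n_states)
        for _ in range(m):
            v_new = R[policy, idx] + beta * (P[policy, idx] @ v_new)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, policy
        v = v_new
    return v, policy

def extrapolate(v_old, v_new, beta):
    """Standard error-bound extrapolation (cf. Porteus [3]): shift v_new
    by the midpoint of the discounted span of the last increment."""
    d = v_new - v_old
    return v_new + (beta / (1.0 - beta)) * (d.max() + d.min()) / 2.0
```

Setting `m = 0` recovers plain successive approximations; larger `m` trades cheap fixed-policy sweeps for fewer of the expensive improvement steps, which is the trade the paper carries over to the POMDP setting.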

References

[1]
BERTSEKAS, D. P. 1976. Dynamic Programming and Stochastic Control. Academic Press, New York.
[2]
MONAHAN, G. E. 1982. A Survey of Partially Observable Markov Decision Processes. Mgmt. Sci. 28, 1-16.
[3]
PORTEUS, E. L. 1971. Some Bounds for Discounted Sequential Decision Processes. Mgmt. Sci. 18, 7-11.
[4]
PUTERMAN, M. L., AND M. C. SHIN. 1978. Modified Policy Iteration Algorithms for Discounted Markov Decision Processes. Mgmt. Sci. 24, 1127-1138.
[5]
PUTERMAN, M. L., AND M. C. SHIN. 1982. Action Elimination Procedure for Modified Policy Iteration Algorithms. Opns. Res. 30, 301-308.
[6]
SMALLWOOD, R., AND E. J. SONDIK. 1973. The Optimal Control of Partially Observable Markov Processes Over a Finite Horizon. Opns. Res. 21, 1071-1088.
[7]
SONDIK, E. J. 1978. The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs. Opns. Res. 26, 282-304.
[8]
VAN NUNEN, J. A. E. E. 1976. A Set of Successive Approximation Methods for Discounted Markovian Decision Problems. Z. Oper. Res. 20, 203-208.
[9]
WHITE, C. C., AND D. HARRINGTON. 1980. Application of Jensen's Inequality for Adaptive Suboptimal Design. J. Optim. Theory Appl. 32, 89-100.
[10]
WHITE, C. C., L. C. THOMAS AND W. T. SCHERER. 1985. Successive Approximations Based on Reward Revision. Opns. Res. 33, 1299-1315.


Published In

Operations Research  Volume 37, Issue 5
October 1989
163 pages

Publisher

INFORMS

Linthicum, MD, United States


Author Tags

  1. dynamic programming: Markov
  2. finite state



