
Solution Procedures for Partially Observed Markov Decision Processes

Published: 01 October 1989

Abstract

We present three algorithms for solving the infinite horizon, expected discounted total reward partially observed Markov decision process (POMDP). Each algorithm combines a successive approximations algorithm for the POMDP due to R. Smallwood and E. Sondik with a suitably generalized numerical technique that has been shown to reduce CPU time to convergence in the completely observed case. The first technique is reward revision; the second is reward revision combined with modified policy iteration; the third is a standard extrapolation. A numerical study indicates that these algorithms can yield significant computational savings.
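The paper itself operates on the belief simplex, but the two speed-up techniques it generalizes are easiest to see in the completely observed case. The following is a minimal sketch, assuming a small hypothetical 2-state, 2-action discounted MDP (the transition and reward data, function names, and the sweep count `m` are illustrative, not from the paper): modified policy iteration in the style of Puterman and Shin [4] interleaves one greedy improvement sweep with a few cheap fixed-policy evaluation sweeps, and a standard extrapolation in the style of Porteus [3] shifts the iterate by the midpoint of the discounted span of the last increment.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP (not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[a, s, s']: transition probabilities
              [[0.5, 0.5], [0.7, 0.3]]])
R = np.array([[1.0, 0.0],                  # R[a, s]: expected one-step reward
              [0.5, 2.0]])
beta = 0.9                                 # discount factor

def modified_policy_iteration(P, R, beta, m=5, tol=1e-8, max_iter=1000):
    """Successive approximations with m extra fixed-policy evaluation
    sweeps per iteration (Puterman & Shin style, sketched)."""
    n_actions, n_states = R.shape
    v = np.zeros(n_states)
    for _ in range(max_iter):
        # Greedy improvement step (one successive-approximation sweep).
        q = R + beta * P @ v                      # q[a, s]
        policy = q.argmax(axis=0)                 # greedy action per state
        v_new = q.max(axis=0)
        # Partial evaluation: m sweeps under the fixed greedy policy
        # (m = 0 recovers plain value iteration).
        idx = np.arange(n_states)
        for _ in range(m):
            v_new = R[policy, idx] + beta * (P[policy, idx] @ v_new)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, policy
        v = v_new
    return v, policy

def extrapolate(v_old, v_new, beta):
    """Standard error-bound extrapolation (cf. Porteus [3]): shift v_new
    by the midpoint of the discounted span of the last increment."""
    d = v_new - v_old
    return v_new + (beta / (1.0 - beta)) * (d.max() + d.min()) / 2.0
```

Setting `m = 0` recovers plain successive approximations; larger `m` trades cheap fixed-policy sweeps for fewer of the expensive improvement steps, which is the trade the paper carries over to the POMDP setting.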

References

[1]
BERTSEKAS, D. P. 1976. Dynamic Programming and Stochastic Control. Academic Press, New York.
[2]
MONAHAN, G. E. 1982. A Survey of Partially Observable Markov Decision Processes. Mgmt. Sci. 28, 1-16.
[3]
PORTEUS, E. L. 1971. Some Bounds for Discounted Sequential Decision Processes. Mgmt. Sci. 18, 7-11.
[4]
PUTERMAN, M. L., AND M. C. SHIN. 1978. Modified Policy Iteration Algorithms for Discounted Markov Decision Processes. Mgmt. Sci. 24, 1127-1138.
[5]
PUTERMAN, M. L., AND M. C. SHIN. 1982. Action Elimination Procedure for Modified Policy Iteration Algorithms. Opns. Res. 30, 301-308.
[6]
SMALLWOOD, R., AND E. J. SONDIK. 1973. The Optimal Control of Partially Observable Markov Processes Over a Finite Horizon. Opns. Res. 21, 1071-1088.
[7]
SONDIK, E. J. 1978. The Optimal Control of Partially Observable Markov Processes Over the Infinite Horizon: Discounted Costs. Opns. Res. 26, 282-304.
[8]
VAN NUNEN, J. A. E. E. 1976. A Set of Successive Approximation Methods for Discounted Markovian Decision Problems. Z. Oper. Res. 20, 203-208.
[9]
WHITE, C. C., AND D. HARRINGTON. 1980. Application of Jensen's Inequality for Adaptive Suboptimal Design. J. Optim. Theory Appl. 32, 89-100.
[10]
WHITE, C. C., L. C. THOMAS AND W. T. SCHERER. 1985. Successive Approximations Based on Reward Revision. Opns. Res. 33, 1299-1315.


Published In

Operations Research  Volume 37, Issue 5
October 1989
163 pages

Publisher

INFORMS

Linthicum, MD, United States


Author Tags

  1. dynamic programming: Markov
  2. finite state



