Abstract
This paper investigates the computation of transient-optimal policies in discrete dynamic programming. The model, is quite general: it may contain transient as well as nontransient policies. and the transition matrices are not necessarily substochastic.
A functional equation for the so-called transient-value-vector is derived and the concept of superharmonicity is introduced. This concept provides the linear program to compute the transientvalue-vector and a transient-optimal policy.
We also discuss the elimination of suboptimal actions, the solution of problems with additional constraints, and the computation of an efficient policy for a multiple objective dynamic programming problem.
Similar content being viewed by others
References
D. Blackwell, “Discrete dynamic programming”,Annals of Mathematical Statistics 33 (1962) 719–726.
E. V. Denardo, “Contraction mappings in the theory underlying dynamic programming”.SIAM Review 9 (1967) 165–177.
E.V. Denardo and U.G. Rothblum, “Optimal stopping, exponential utility and linear programming”,Mathematical Programming 16 (1979) 228–244.
F. D'Epenoux “Sur un problème de production et de stockage dans l'aléatoire”,Revue Française de Recherche Operationelle 14 (1960) 3–16.
C Derman,Finite state Markovian decision processes (Academic Press, New York, 1970).
D. K. Faddeev and V.N. Faddeeva,Computational methods of linear algebra (Freeman, San Francisco, 1963).
K. M. van Hee, A. Hordijk and J. van der Wal, “Successive approximations for convergent dynamic programming”, in: H.C. Tijms and J. Wessels, eds.,Markov decision theory (Mathematical Centre Tract no. 93) (Mathematical Centre. Amsterdam, 1977) pp. 183–211.
A. Hordijk,Dynamic programming and Markov potential theory (Mathematical Centre Tract. 51, Mathematical Centre, Amsterdam, 1974).
A. Hordijk and L.C.M. Kallenberg, “Linear programming and Markov decision chains.Management Science 25 (1979) 352–362.
A. Hordijk and L.C.M. Kallenberg, “Linear programming and Markov games”, in O. Moeschlin and D. Pallaschke, eds,Game theory and mathematical economics (North-Holland, Amsterdam, 1981) pi. 291–320.
L.C.M. Kalenberg, “Linear programming and finite Markovian control problems”, Thesis, University of Leiden (Mathematical Centre, Amsterdam, 1980).
S. Karlin,Mathematical methods and theory in games, programming and economics, volume I (Addison-Wesley, Reading, Massachusetts, 1959).
T. Kato,Perturbartion theory for linear operators (Springer, New York, 1966).
M. Loève,Probability theory (Van Nostrand, New York, 1955).
J MacQueen, “A test for suboptimal actions in Markovian decision problems”,Operations Research 15 (1967) 559–561.
B.L. Miller and A.F. Veinott Jr., “Discrete dynamic programming with a small interest rate”,Annals of Mathematical Statistics 40 (1969) 366–370.
J.A.E.E. van Nunen and J. Wessels. “Markov decision processes with unbounded rewards”, in: H.C. Tijms and J. Wessels, eds.,Markov decision theory (Mathematical Centre Tract no. 93) (Mathematical Centre, Amsterdam, 1977) pp. 1–24.
U.G. Rothblum, “Normalized Markov decision chains. I: Sensitive discount optimality”.Operations Research 23 (1975) 785–795.
U.G. Rothblum, “Normalized Markov decision chains. II: Optimality of nonstationary policies”,SIAM Journal of Control and Optimization 15 (1977) 221–232.
A.F. Veinott Jr., Discrete dynamic programming with sensitive discount optimality criteria”,Annals of Mathematical Statistics 40 (1969) 1636–1660.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Hordijk, A., Kallenberg, L.C.M. Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints. Mathematical Programming 30, 46–70 (1984). https://doi.org/10.1007/BF02591798
Received:
Revised:
Issue Date:
DOI: https://doi.org/10.1007/BF02591798