Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints

  • Published:
Mathematical Programming Submit manuscript

Abstract

This paper investigates the computation of transient-optimal policies in discrete dynamic programming. The model, is quite general: it may contain transient as well as nontransient policies. and the transition matrices are not necessarily substochastic.

A functional equation for the so-called transient-value-vector is derived and the concept of superharmonicity is introduced. This concept provides the linear program to compute the transientvalue-vector and a transient-optimal policy.

We also discuss the elimination of suboptimal actions, the solution of problems with additional constraints, and the computation of an efficient policy for a multiple objective dynamic programming problem.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • D. Blackwell, “Discrete dynamic programming”,Annals of Mathematical Statistics 33 (1962) 719–726.

    MathSciNet  MATH  Google Scholar 

  • E. V. Denardo, “Contraction mappings in the theory underlying dynamic programming”.SIAM Review 9 (1967) 165–177.

    Article  MATH  MathSciNet  Google Scholar 

  • E.V. Denardo and U.G. Rothblum, “Optimal stopping, exponential utility and linear programming”,Mathematical Programming 16 (1979) 228–244.

    Article  MathSciNet  Google Scholar 

  • F. D'Epenoux “Sur un problème de production et de stockage dans l'aléatoire”,Revue Française de Recherche Operationelle 14 (1960) 3–16.

    Google Scholar 

  • C Derman,Finite state Markovian decision processes (Academic Press, New York, 1970).

    MATH  Google Scholar 

  • D. K. Faddeev and V.N. Faddeeva,Computational methods of linear algebra (Freeman, San Francisco, 1963).

    Google Scholar 

  • K. M. van Hee, A. Hordijk and J. van der Wal, “Successive approximations for convergent dynamic programming”, in: H.C. Tijms and J. Wessels, eds.,Markov decision theory (Mathematical Centre Tract no. 93) (Mathematical Centre. Amsterdam, 1977) pp. 183–211.

    Google Scholar 

  • A. Hordijk,Dynamic programming and Markov potential theory (Mathematical Centre Tract. 51, Mathematical Centre, Amsterdam, 1974).

    MATH  Google Scholar 

  • A. Hordijk and L.C.M. Kallenberg, “Linear programming and Markov decision chains.Management Science 25 (1979) 352–362.

    MATH  MathSciNet  Google Scholar 

  • A. Hordijk and L.C.M. Kallenberg, “Linear programming and Markov games”, in O. Moeschlin and D. Pallaschke, eds,Game theory and mathematical economics (North-Holland, Amsterdam, 1981) pi. 291–320.

    Google Scholar 

  • L.C.M. Kalenberg, “Linear programming and finite Markovian control problems”, Thesis, University of Leiden (Mathematical Centre, Amsterdam, 1980).

    Google Scholar 

  • S. Karlin,Mathematical methods and theory in games, programming and economics, volume I (Addison-Wesley, Reading, Massachusetts, 1959).

    Google Scholar 

  • T. Kato,Perturbartion theory for linear operators (Springer, New York, 1966).

    Google Scholar 

  • M. Loève,Probability theory (Van Nostrand, New York, 1955).

    MATH  Google Scholar 

  • J MacQueen, “A test for suboptimal actions in Markovian decision problems”,Operations Research 15 (1967) 559–561.

    MATH  Google Scholar 

  • B.L. Miller and A.F. Veinott Jr., “Discrete dynamic programming with a small interest rate”,Annals of Mathematical Statistics 40 (1969) 366–370.

    MathSciNet  MATH  Google Scholar 

  • J.A.E.E. van Nunen and J. Wessels. “Markov decision processes with unbounded rewards”, in: H.C. Tijms and J. Wessels, eds.,Markov decision theory (Mathematical Centre Tract no. 93) (Mathematical Centre, Amsterdam, 1977) pp. 1–24.

    Google Scholar 

  • U.G. Rothblum, “Normalized Markov decision chains. I: Sensitive discount optimality”.Operations Research 23 (1975) 785–795.

    Article  MATH  MathSciNet  Google Scholar 

  • U.G. Rothblum, “Normalized Markov decision chains. II: Optimality of nonstationary policies”,SIAM Journal of Control and Optimization 15 (1977) 221–232.

    Article  MATH  MathSciNet  Google Scholar 

  • A.F. Veinott Jr., Discrete dynamic programming with sensitive discount optimality criteria”,Annals of Mathematical Statistics 40 (1969) 1636–1660.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Hordijk, A., Kallenberg, L.C.M. Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints. Mathematical Programming 30, 46–70 (1984). https://doi.org/10.1007/BF02591798

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02591798

Key words

Navigation