Abstract
It is known that the value function of an unconstrained Markov decision process with finitely many states and actions is a piecewise rational function of the discount factor α, and that it can be expressed as a Laurent series expansion about α = 1 for α close enough to 1. We show in this paper that these properties also hold for the value function of Markov decision processes with additional constraints. More precisely, we show by a constructive proof that there are numbers 0 = α_0 < α_1 < ... < α_{m−1} < α_m = 1 such that for every j = 1, 2, ..., m either the problem is not feasible for all discount factors α in the open interval (α_{j−1}, α_j) or the value function is a rational function of α on the closed interval [α_{j−1}, α_j]. As a consequence, if the constrained problem is feasible in a neighborhood of α = 1, then the value function has a Laurent series expansion about α = 1. Our proof technique for the constrained case also provides a new proof for the unconstrained case.
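The expansion referred to in the last claim can be made concrete. As a minimal sketch in standard MDP notation (not the paper's own): for a fixed stationary policy with transition matrix P and reward vector r, the discounted value v_α = (I − αP)^{−1} r admits the classical Laurent expansion about α = 1, going back to Miller and Veinott (1969),

\[
v_\alpha \;=\; (1+\rho)\Bigl[\rho^{-1} P^{*} \;+\; \sum_{n=0}^{\infty} (-\rho)^{n} H^{n+1}\Bigr] r,
\qquad \rho = \frac{1-\alpha}{\alpha},
\]

where \(P^{*} = \lim_{N\to\infty} N^{-1}\sum_{t=0}^{N-1} P^{t}\) is the stationary matrix of P and \(H = (I - P + P^{*})^{-1}(I - P^{*})\) its deviation matrix; the series converges for α close enough to 1. Since every entry of (I − αP)^{−1} is rational in α by Cramer's rule, the value of each fixed policy is rational in α; the theorem stated above extends this piecewise-rational structure to the optimal value of the constrained problem.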
Cite this article
Altman, E., Hordijk, A. & Kallenberg, L.C.M. On the value function in constrained control of Markov chains. Mathematical Methods of Operations Research 44, 387–399 (1996). https://doi.org/10.1007/BF01193938