Abstract
It is known that the value function of an unconstrained Markov decision process with finitely many states and actions is a piecewise rational function of the discount factor α, and that it can be expressed as a Laurent series expansion about α = 1 for α close enough to 1. We show in this paper that these properties also hold for the value function of Markov decision processes with additional constraints. More precisely, we show by a constructive proof that there are numbers 0 = α_0 < α_1 < ... < α_{m−1} < α_m = 1 such that for every j = 1, 2, ..., m either the problem is not feasible for all discount factors α in the open interval (α_{j−1}, α_j) or the value function is a rational function of α on the closed interval [α_{j−1}, α_j]. As a consequence, if the constrained problem is feasible in a neighborhood of α = 1, then the value function has a Laurent series expansion about α = 1. Our proof technique for the constrained case also provides a new proof for the unconstrained case.
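The expansion referred to in the last claim can be made concrete. As a minimal sketch in standard MDP notation (not the paper's own): for a fixed stationary policy with transition matrix P and reward vector r, the discounted value v_α = (I − αP)^{−1} r admits the classical Laurent expansion about α = 1, going back to Miller and Veinott (1969),

\[
v_\alpha \;=\; (1+\rho)\Bigl[\rho^{-1} P^{*} \;+\; \sum_{n=0}^{\infty} (-\rho)^{n} H^{n+1}\Bigr] r,
\qquad \rho = \frac{1-\alpha}{\alpha},
\]

where \(P^{*} = \lim_{N\to\infty} N^{-1}\sum_{t=0}^{N-1} P^{t}\) is the stationary matrix of P and \(H = (I - P + P^{*})^{-1}(I - P^{*})\) its deviation matrix; the series converges for α close enough to 1. Since every entry of (I − αP)^{−1} is rational in α by Cramer's rule, the value of each fixed policy is rational in α; the theorem stated above extends this piecewise-rational structure to the optimal value of the constrained problem.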
Cite this article
Altman, E., Hordijk, A. & Kallenberg, L.C.M. On the value function in constrained control of Markov chains. Mathematical Methods of Operations Research 44, 387–399 (1996). https://doi.org/10.1007/BF01193938