
Continuous-Time Markov Decision Processes with Discounted Rewards: The Case of Polish Spaces

Published: 01 February 2007

Abstract

This paper deals with continuous-time Markov decision processes in Polish spaces, under an expected discounted reward criterion. The transition rates of underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We first give conditions on the controlled system's primitive data. Under these conditions we prove that the transition functions of possibly nonhomogeneous continuous-time Markov processes are regular by using Feller's construction approach to such transition functions. Then, under additional continuity and compactness conditions, we ensure the existence of optimal stationary policies by using the technique of extended infinitesimal operators associated with the transition functions, and also provide a recursive way to compute (or at least to approximate) the optimal reward values. Finally, we use examples to illustrate our results and the gap between our conditions and those in the previous literature.
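The recursive computation of the optimal reward values mentioned in the abstract can be illustrated, in a setting far simpler than the paper's (a finite state space with bounded transition rates rather than a Polish space with unbounded rates), by uniformizing the continuous-time model and running discounted value iteration. The two-state, two-action model below is purely hypothetical; `q`, `r`, and `alpha` are made-up data, and the uniformization reduction is a standard technique, not the paper's extended-infinitesimal-operator construction.

```python
import numpy as np

# Hypothetical 2-state, 2-action CTMDP (finite and bounded, unlike the
# paper's general setting). q[a][x][y] is the transition rate from state x
# to state y under action a; each row of q[a] sums to zero.
q = np.array([
    [[-1.0, 1.0], [2.0, -2.0]],   # rates under action 0
    [[-3.0, 3.0], [0.5, -0.5]],   # rates under action 1
])
r = np.array([
    [5.0, -1.0],   # reward rates r(x, a=0) for x = 0, 1
    [8.0, -3.0],   # reward rates r(x, a=1) for x = 0, 1
])
alpha = 0.1        # discount rate

# Uniformization: choose Lam at least as large as every total exit rate,
# then build an equivalent discrete-time discounted MDP.
Lam = max(-q[a][x][x] for a in range(2) for x in range(2))
P = q / Lam + np.eye(2)        # P[a] is a stochastic transition matrix
beta = Lam / (alpha + Lam)     # effective discount factor, strictly < 1
rew = r / (alpha + Lam)        # per-step reward of the discrete-time MDP

# Value iteration: V(x) <- max_a [ rew(x,a) + beta * sum_y P(y|x,a) V(y) ].
V = np.zeros(2)
for _ in range(2000):
    V = np.max(rew + beta * (P @ V), axis=0)

policy = np.argmax(rew + beta * (P @ V), axis=0)
print(V, policy)
```

Each pass of the loop is one step of the recursion; since the Bellman operator is a `beta`-contraction, the iterates converge geometrically to the optimal discounted value, and the greedy `policy` extracted at the fixed point is stationary.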




Published In

Mathematics of Operations Research, Volume 32, Issue 1
February 2007
256 pages

Publisher

INFORMS

Linthicum, MD, United States


Author Tags

  1. Q-process
  2. discounted reward
  3. general state space
  4. optimal stationary policy


Citations

Cited By

  • An Approximation Approach for the Deviation Matrix of Continuous-Time Markov Processes with Application to Markov Decision Theory. Operations Research 58(4, Part 1):918-932. DOI: 10.1287/opre.1090.0786. Online publication date: 5-Jan-2019.
  • Discounted Continuous-Time Markov Decision Processes with Constraints. Mathematics of Operations Research 36(1):105-132. DOI: 10.1287/moor.1100.0477. Online publication date: 1-Jan-2019.
  • The risk probability criterion for discounted continuous-time Markov decision processes. Discrete Event Dynamic Systems 27(4):675-699. DOI: 10.1007/s10626-017-0257-6. Online publication date: 1-Dec-2017.
  • The Transformation Method for Continuous-Time Markov Decision Processes. Journal of Optimization Theory and Applications 154(2):691-712. DOI: 10.1007/s10957-012-0015-8. Online publication date: 1-Aug-2012.
  • Continuous-Time Markov Decision Processes with State-Dependent Discount Factors. Acta Applicandae Mathematicae 121(1):5-27. DOI: 10.1007/s10440-012-9669-3. Online publication date: 1-Oct-2012.
