DOI: 10.5555/3666122.3668376

On dynamic programming decompositions of static risk measures in Markov decision processes

Published: 30 May 2024

Abstract

Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit the standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal when the risk level is discretized sufficiently finely. However, we show that these popular decompositions for Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level. In particular, we show that a saddle point property assumed to hold in prior literature may be violated. In contrast, a decomposition does hold for Value-at-Risk, and our proof demonstrates how this risk measure differs from CVaR and EVaR. Our findings are significant because risk-averse algorithms are used in high-stakes environments, making their correctness especially critical.
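
For concreteness, here is a minimal sketch of the objects under discussion, stated in standard notation rather than the paper's own: the variational (Rockafellar-Uryasev) form of CVaR for a random cost X at level α ∈ (0, 1], followed by the first-transition decomposition of Pflug and Pichler (2016) that underlies the risk-level-augmented dynamic programs of Chow et al. (2015). The symbols Z (total cost), X_1 (next state), and ξ (risk-envelope weight) are illustrative choices, not the paper's notation.

% Variational form of CVaR for a random cost X at level alpha in (0,1]:
\mathrm{CVaR}_\alpha(X) \;=\; \min_{z \in \mathbb{R}} \Big\{ z + \tfrac{1}{\alpha}\, \mathbb{E}\big[(X - z)_+\big] \Big\}

% First-transition (temporal) decomposition over the next state X_1:
\mathrm{CVaR}_\alpha(Z) \;=\; \max_{\xi \in \mathcal{U}_\alpha} \mathbb{E}\big[ \xi(X_1)\, \mathrm{CVaR}_{\alpha \xi(X_1)}(Z \mid X_1) \big],
\qquad \mathcal{U}_\alpha \;=\; \big\{ \xi \ge 0 \;:\; \alpha \xi \le 1,\ \mathbb{E}[\xi(X_1)] = 1 \big\}

Risk-level-augmented dynamic programs track the inner level α·ξ(X_1) as an extra state variable and discretize it to a finite grid. The abstract's claim is that, for CVaR and EVaR, exchanging this maximization over ξ with the minimization over policies (the saddle point assumed in prior work) can fail, so no grid resolution recovers an optimal policy.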

Supplementary Material

Supplemental material: 3666122.3668376_supp.pdf



Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023, 80772 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States

