DOI: 10.5555/3666122.3668376

On dynamic programming decompositions of static risk measures in Markov decision processes

Published: 30 May 2024

Abstract

Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit the standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal when the risk level is discretized sufficiently finely. However, we show that these popular decompositions for Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level. In particular, we show that a saddle point property assumed to hold in prior literature may be violated. In contrast, a decomposition does hold for Value-at-Risk, and our proof demonstrates how this risk measure differs from CVaR and EVaR. Our findings are significant because risk-averse algorithms are used in high-stakes environments, making their correctness especially critical.
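
For concreteness, here is a minimal sketch of the objects under discussion, stated in standard notation rather than the paper's own: the variational (Rockafellar-Uryasev) form of CVaR for a random cost X at level α ∈ (0, 1], followed by the first-transition decomposition of Pflug and Pichler (2016) that underlies the risk-level-augmented dynamic programs of Chow et al. (2015). The symbols Z (total cost), X_1 (next state), and ξ (risk-envelope weight) are illustrative choices, not the paper's notation.

% Variational form of CVaR for a random cost X at level alpha in (0,1]:
\mathrm{CVaR}_\alpha(X) \;=\; \min_{z \in \mathbb{R}} \Big\{ z + \tfrac{1}{\alpha}\, \mathbb{E}\big[(X - z)_+\big] \Big\}

% First-transition (temporal) decomposition over the next state X_1:
\mathrm{CVaR}_\alpha(Z) \;=\; \max_{\xi \in \mathcal{U}_\alpha} \mathbb{E}\big[ \xi(X_1)\, \mathrm{CVaR}_{\alpha \xi(X_1)}(Z \mid X_1) \big],
\qquad \mathcal{U}_\alpha \;=\; \big\{ \xi \ge 0 \;:\; \alpha \xi \le 1,\ \mathbb{E}[\xi(X_1)] = 1 \big\}

Risk-level-augmented dynamic programs track the inner level α·ξ(X_1) as an extra state variable and discretize it to a finite grid. The abstract's claim is that, for CVaR and EVaR, exchanging this maximization over ξ with the minimization over policies (the saddle point assumed in prior work) can fail, so no grid resolution recovers an optimal policy.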

Supplementary Material

Supplemental material: 3666122.3668376_supp.pdf



Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023, 80772 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States

