
DOI: 10.5220/0008175604120423
Article

Risk-averse Distributional Reinforcement Learning: A CVaR Optimization Approach

Published: 17 September 2019

Abstract

Conditional Value-at-Risk (CVaR) is a well-known measure of risk that has been directly equated to robustness, an important component of Artificial Intelligence (AI) safety. In this paper, we focus on optimizing CVaR in the context of Reinforcement Learning (RL), as opposed to the usual risk-neutral expectation. As a first original contribution, we improve the CVaR Value Iteration algorithm (Chow et al., 2015) in a way that reduces the computational complexity of the original algorithm from polynomial to linear time. Secondly, we propose a sampling version of CVaR Value Iteration, which we call CVaR Q-learning. We also derive a distributional policy improvement algorithm and later use it as a heuristic for extracting the optimal policy from the converged CVaR Q-learning algorithm. Finally, to show the scalability of our method, we propose an approximate Q-learning algorithm by reformulating the CVaR Temporal Difference update rule as a loss function, which we later use in a deep learning context. All proposed methods are experimentally analyzed, including the Deep CVaR Q-learning agent, which learns how to avoid risk from raw pixels.
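For readers less familiar with the risk measure, the short Python sketch below shows how CVaR is commonly estimated from samples: the alpha-CVaR of a return distribution is the expected return over the worst alpha-fraction of outcomes. This is an illustrative sketch only; the function name empirical_cvar and the sorting-based estimator are assumptions for the example and do not reproduce the paper's CVaR Value Iteration, CVaR Q-learning, or Deep CVaR Q-learning algorithms.

import numpy as np

def empirical_cvar(returns, alpha):
    """Average return over the worst alpha-fraction of samples.

    For a return variable Z this approximates CVaR_alpha(Z) =
    E[Z | Z <= VaR_alpha(Z)], i.e. the mean of the lower alpha-tail.
    """
    z = np.sort(np.asarray(returns, dtype=float))  # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * len(z))))       # number of samples in the alpha-tail
    return z[:k].mean()

# Hypothetical usage: 10,000 simulated episode returns from some policy.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=10_000)
print(empirical_cvar(samples, alpha=0.05))  # mean return of the worst 5% of episodes

A risk-averse agent in the sense of the abstract maximizes this tail expectation rather than the ordinary risk-neutral mean of the return distribution.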

Cited By

  • (2023) On dynamic programming decompositions of static risk measures in Markov decision processes. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3668376, pp. 51734-51757. Online publication date: 10-Dec-2023.
  • (2023) Percentile criterion optimization in offline reinforcement learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3666531, pp. 9322-9352. Online publication date: 10-Dec-2023.
  • (2022) Distributional reinforcement learning for risk-sensitive policies. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3602516, pp. 30977-30989. Online publication date: 28-Nov-2022.



Information & Contributors

Information

Published In

Guide Proceedings
IJCCI 2019: Proceedings of the 11th International Joint Conference on Computational Intelligence
September 2019
563 pages
ISBN:9789897583841

Publisher

SCITEPRESS - Science and Technology Publications, Lda

Setubal, Portugal

Publication History

Published: 17 September 2019

Author Tags

  1. AI Safety
  2. CVaR
  3. Conditional Value-at-Risk
  4. Deep Learning
  5. Deep Q-learning
  6. Distributional Reinforcement Learning
  7. Q-learning
  8. Reinforcement Learning
  9. Risk
  10. Value Iteration

Qualifiers

  • Article

Contributors


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 19 Nov 2024.

