Article

Risk-averse Distributional Reinforcement Learning: A CVaR Optimization Approach

Authors:

Silvestr Stanko,

Karel Macek

IJCCI 2019: Proceedings of the 11th International Joint Conference on Computational Intelligence

Pages 412 - 423

https://doi.org/10.5220/0008175604120423

Published: 17 September 2019

Abstract

Conditional Value-at-Risk (CVaR) is a well-known measure of risk that has been directly equated to robustness, an important component of Artificial Intelligence (AI) safety. In this paper we focus on optimizing CVaR in the context of Reinforcement Learning (RL), as opposed to the usual risk-neutral expectation. As a first original contribution, we improve the CVaR Value Iteration algorithm (Chow et al., 2015) in a way that reduces computational complexity of the original algorithm from polynomial to linear time. Secondly, we propose a sampling version of CVaR Value Iteration we call CVaR Q-learning. We also derive a distributional policy improvement algorithm, and later use it as a heuristic for extracting the optimal policy from the converged CVaR Q-learning algorithm. Finally, to show the scalability of our method, we propose an approximate Q-learning algorithm by reformulating the CVaR Temporal Difference update rule as a loss function which we later use in a deep learning context. All proposed methods are experimentally analyzed, including the Deep CVaR Q-learning agent which learns how to avoid risk from raw pixels.
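To make the contrast between CVaR and the risk-neutral expectation concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it only estimates CVaR from a finite sample of returns as the mean of the worst alpha-fraction of outcomes, and then compares a greedy choice under the expectation (alpha = 1) with a greedy choice under CVaR. The function names, the sample returns, and the convention that CVaR is taken over the lower tail of rewards are assumptions made for illustration.

```python
import numpy as np


def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail).

    Assumes higher returns are better and 0 < alpha <= 1.
    With alpha = 1 this reduces to the ordinary sample mean.
    """
    z = np.sort(np.asarray(returns, dtype=float))
    # Number of tail samples; keep at least one so the mean is defined.
    k = max(1, int(np.ceil(alpha * len(z))))
    return float(z[:k].mean())


def greedy_action(samples_by_action, alpha):
    """Pick the action whose sampled returns have the highest CVaR_alpha."""
    return max(
        samples_by_action,
        key=lambda a: empirical_cvar(samples_by_action[a], alpha),
    )


# Hypothetical return samples for two actions (illustrative data only).
samples = {
    "safe": [1, 1, 1, 1, 1, 1, 1, 1],
    "risky": [-10, 2, 2, 3, 3, 4, 4, 4],
}

risk_neutral_choice = greedy_action(samples, alpha=1.0)
risk_averse_choice = greedy_action(samples, alpha=0.25)
```

In this toy data the "risky" action has the higher mean (1.5 versus 1.0), but its worst quarter of outcomes averages -4.0, so the CVaR criterion at alpha = 0.25 prefers the "safe" action. Smaller alpha values put more weight on the worst outcomes.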


Index Terms

Risk-averse Distributional Reinforcement Learning: A CVaR Optimization Approach
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Vagueness and fuzzy logic
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks



Published In

IJCCI 2019: Proceedings of the 11th International Joint Conference on Computational Intelligence

September 2019

563 pages

ISBN:9789897583841

Editors:
Juan Julian Merelo, University of Granada
Jonathan Garibaldi, University of Nottingham
Alejandro Linares Barranco, ETSI Informática
Kurosh Madani, University of Paris-EST Créteil (UPEC)
Kevin Warwick, University of Reading and Coventry University

Publisher

SCITEPRESS - Science and Technology Publications, Lda

Setubal, Portugal


Cited By

Kim J, Min S (2024). Risk-sensitive policy optimization via predictive CVaR policy gradient. In: Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (eds.), Proceedings of the 41st International Conference on Machine Learning, pp. 24354-24369. Online publication date: 21-Jul-2024. https://dl.acm.org/doi/10.5555/3692070.3693046

Chen Y, Zhang X, Wang S, Huang L (2024). Provable risk-sensitive distributional reinforcement learning with general function approximation. In: Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (eds.), Proceedings of the 41st International Conference on Machine Learning, pp. 7748-7791. Online publication date: 21-Jul-2024. https://dl.acm.org/doi/10.5555/3692070.3692374

Hau J, Delage E, Ghavamzadeh M, Petrik M (2023). On dynamic programming decompositions of static risk measures in Markov decision processes. In: Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (eds.), Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 51734-51757. Online publication date: 10-Dec-2023. https://dl.acm.org/doi/10.5555/3666122.3668376

Lobo E, Cousins C, Zick Y, Petrik M (2023). Percentile criterion optimization in offline reinforcement learning. In: Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (eds.), Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 9322-9352. Online publication date: 10-Dec-2023. https://dl.acm.org/doi/10.5555/3666122.3666531

Lim S, Malik I (2022). Distributional reinforcement learning for risk-sensitive policies. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (eds.), Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 30977-30989. Online publication date: 28-Nov-2022. https://dl.acm.org/doi/10.5555/3600270.3602516
