Deterministic policy gradient algorithms
International Conference on Machine Learning, 2014 · proceedings.mlr.press
Abstract
In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. Deterministic policy gradient algorithms outperformed their stochastic counterparts in several benchmark problems, particularly in high-dimensional action spaces.
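The core claim above is that the deterministic policy gradient takes the form of an expected gradient of the action-value function, ∇θJ = E[∇θμθ(s) ∇a Q(s, a)|a=μθ(s)], so the actor can be improved by gradient ascent along the critic's action gradient. The following is a minimal sketch of that actor update only, not the paper's full off-policy actor-critic algorithm; the linear policy μθ(s) = θ·s, the closed-form critic Q(s, a) = −(a − w·s)², and the target weights w are illustrative assumptions chosen so the gradients are available analytically.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3
w = np.array([0.5, -1.0, 2.0])   # assumed critic optimum: best action is a* = w.s
theta = np.zeros(dim)            # deterministic policy parameters
alpha = 0.1                      # actor step size

for _ in range(500):
    s = rng.normal(size=dim)     # state sampled from an exploratory distribution
    a = theta @ s                # deterministic action mu_theta(s)
    dq_da = -2.0 * (a - w @ s)   # grad_a Q(s, a) for the assumed quadratic critic
    grad_mu = s                  # grad_theta mu_theta(s) for a linear policy
    # DPG actor update: ascend E[grad_theta mu_theta(s) * grad_a Q(s, a)]
    theta += alpha * grad_mu * dq_da
```

Under these assumptions theta converges toward w, the parameters whose actions maximize the assumed critic; in the paper's actual algorithm the critic's action gradient would itself be learned off-policy rather than given in closed form.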