Model-Free Trajectory Optimization for Reinforcement Learning

Riad Akrour, Gerhard Neumann, Hany Abdulsamad, Abbas Abdolmaleki

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2961-2970, 2016.

Abstract

Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-akrour16,
  title = 	 {Model-Free Trajectory Optimization for Reinforcement Learning},
  author = 	 {Akrour, Riad and Neumann, Gerhard and Abdulsamad, Hany and Abdolmaleki, Abbas},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {2961--2970},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/akrour16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/akrour16.html},
  abstract = 	 {Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.}
}

Endnote

%0 Conference Paper
%T Model-Free Trajectory Optimization for Reinforcement Learning
%A Riad Akrour
%A Gerhard Neumann
%A Hany Abdulsamad
%A Abbas Abdolmaleki
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-akrour16
%I PMLR
%P 2961--2970
%U https://proceedings.mlr.press/v48/akrour16.html
%V 48
%X Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.

RIS


TY  - CPAPER
TI  - Model-Free Trajectory Optimization for Reinforcement Learning
AU  - Riad Akrour
AU  - Gerhard Neumann
AU  - Hany Abdulsamad
AU  - Abbas Abdolmaleki
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-akrour16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2961
EP  - 2970
L1  - http://proceedings.mlr.press/v48/akrour16.pdf
UR  - https://proceedings.mlr.press/v48/akrour16.html
AB  - Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.
ER  -

APA


Akrour, R., Neumann, G., Abdulsamad, H. & Abdolmaleki, A. (2016). Model-Free Trajectory Optimization for Reinforcement Learning. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2961-2970. Available from https://proceedings.mlr.press/v48/akrour16.html.
