On the Expressivity of Neural Networks for Deep Reinforcement Learning

Kefan Dong, Yuping Luo, Tianhe Yu, Chelsea Finn, Tengyu Ma

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2627-2637, 2020.

Abstract

We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and model-free or model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the test time improves the performance on benchmark tasks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-dong20d,
  title = 	 {On the Expressivity of Neural Networks for Deep Reinforcement Learning},
  author =       {Dong, Kefan and Luo, Yuping and Yu, Tianhe and Finn, Chelsea and Ma, Tengyu},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {2627--2637},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/dong20d/dong20d.pdf},
  url = 	 {https://proceedings.mlr.press/v119/dong20d.html},
  abstract = 	 {We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and model-free or model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the test time improves the performance on benchmark tasks.}
}

Endnote

%0 Conference Paper
%T On the Expressivity of Neural Networks for Deep Reinforcement Learning
%A Kefan Dong
%A Yuping Luo
%A Tianhe Yu
%A Chelsea Finn
%A Tengyu Ma
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh	
%F pmlr-v119-dong20d
%I PMLR
%P 2627--2637
%U https://proceedings.mlr.press/v119/dong20d.html
%V 119
%X We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and model-free or model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the test time improves the performance on benchmark tasks.

APA


Dong, K., Luo, Y., Yu, T., Finn, C. & Ma, T.. (2020). On the Expressivity of Neural Networks for Deep Reinforcement Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2627-2637 Available from https://proceedings.mlr.press/v119/dong20d.html.

On the Expressivity of Neural Networks for Deep Reinforcement Learning

Abstract

Cite this Paper

Related Material