

PRAG: Periodic Regularized Action Gradient for Efficient Continuous Control

  • Conference paper
PRICAI 2022: Trends in Artificial Intelligence (PRICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13631)

Abstract

For actor-critic methods in reinforcement learning, it is vital to learn a useful critic so that the actor can be guided efficiently and properly. Previous methods mainly seek to estimate more accurate Q-values. However, in continuous control settings where the actor is updated via the deterministic policy gradient, only the action gradient (AG) of the critic is actually used to update the actor. Leveraging the action gradient of the Q-function for policy guidance is therefore a promising route to higher sample efficiency. Nevertheless, we empirically find that directly incorporating the action gradient into critic learning degrades the agent's performance, as it is easily trapped in local maxima. To fully exploit the benefits of the action gradient while escaping such local optima, we propose Periodic Regularized Action Gradient (PRAG), which periodically incorporates the action gradient into critic learning and additionally maximizes the target value. On a set of MuJoCo continuous control tasks, we show that PRAG achieves higher sample efficiency and better final performance than common model-free baselines, without much extra training cost. Our code is available at: https://github.com/Galaxy-Li/PRAG.

X. Li and Z. Qiao—Equal contribution.
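As a rough illustration of the idea described in the abstract, the sketch below shows how a periodic action-gradient term might be attached to a standard TD critic update in PyTorch. The update period, the penalty weight, and the specific form of the regularizer (a squared-norm penalty on dQ/da) are illustrative assumptions for exposition only; they are not the exact objective of PRAG, which also maximizes the target value as described in the paper.

    # Illustrative sketch only: a TD critic update with a *periodic* action-gradient
    # term. The period, weight, and form of the penalty are assumptions for
    # exposition, not the exact PRAG objective.
    import torch
    import torch.nn.functional as F

    def critic_update(critic, critic_target, actor_target, batch, optimizer,
                      step, gamma=0.99, ag_period=2, ag_weight=0.1):
        state, action, reward, next_state, done = batch

        # Standard one-step TD target (clipped double-Q and target smoothing omitted).
        with torch.no_grad():
            next_action = actor_target(next_state)
            target_q = reward + gamma * (1.0 - done) * critic_target(next_state, next_action)

        current_q = critic(state, action)
        loss = F.mse_loss(current_q, target_q)

        # Every `ag_period` steps, add a term built from the critic's action
        # gradient dQ/da, so the gradient signal the actor relies on is shaped
        # explicitly rather than left as a by-product of value fitting.
        if step % ag_period == 0:
            action_ag = action.detach().requires_grad_(True)
            q = critic(state, action_ag)
            dq_da = torch.autograd.grad(q.sum(), action_ag, create_graph=True)[0]
            loss = loss + ag_weight * dq_da.pow(2).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The actor update itself is unchanged from DDPG/TD3-style training: the actor ascends the critic's action gradient at its own output, which is why the quality of dQ/da, not only the accuracy of the Q-values, matters for policy guidance.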

Acknowledgement

The authors would like to thank the anonymous reviewers for their insightful comments. This work was supported in part by the Science and Technology Innovation 2030-Key Project under Grant 2021ZD0201404.

Author information

Corresponding author

Correspondence to Xiu Li.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, X. et al. (2022). PRAG: Periodic Regularized Action Gradient for Efficient Continuous Control. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13631. Springer, Cham. https://doi.org/10.1007/978-3-031-20868-3_8

  • DOI: https://doi.org/10.1007/978-3-031-20868-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20867-6

  • Online ISBN: 978-3-031-20868-3

  • eBook Packages: Computer Science, Computer Science (R0)
