Abstract
Asynchronous deep reinforcement learning is a recent class of reinforcement learning methods that uses multithreading to let multiple agents update shared parameters asynchronously while exploring different parts of the state space. Because the parallel agents decorrelate the training data, experience replay is no longer required and parameters can be updated online. The asynchronous scheme also greatly improves both the convergence speed and the convergence performance of the algorithms. Asynchronous deep reinforcement learning algorithms, especially the asynchronous advantage actor-critic (A3C) algorithm, are highly effective on practical problems and have been widely applied. However, in existing asynchronous deep reinforcement learning algorithms, every thread pushes its updates to the global network with a uniform learning rate, ignoring the fact that different threads carry different information at each update. When an agent's update to the global network is dominated by failure experience, it contributes little to improving the parameters of the learning system. We therefore introduce dynamic weights into asynchronous deep reinforcement learning and propose a new algorithm, asynchronous advantage actor-critic with dynamic updating weights (DWA3C). When the information pushed by an agent clearly helps to improve system performance, we enlarge the update step; otherwise, we shrink it. In this way, we significantly improve the convergence efficiency and convergence performance of asynchronous deep reinforcement learning algorithms. We also validate the algorithm experimentally: given the same running time, the proposed algorithm converges significantly faster and to better performance than existing algorithms.
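To make the weighting idea concrete, the following minimal sketch (our illustration, not the authors' published code) shows one plausible way a worker thread's push to the global parameters could be scaled by a dynamic weight instead of a uniform learning rate. The function names (dynamic_weight, push_update) and the sigmoid mapping of a worker's recent returns against a running baseline are assumptions for illustration only; the paper's actual weighting rule may differ.

```python
import numpy as np

def dynamic_weight(recent_returns, baseline_return, lo=0.5, hi=1.5):
    """Map a worker's recent improvement over a running baseline into a
    multiplicative weight in (lo, hi). Pushes from threads whose recent
    episodes beat the baseline are amplified (weight > 1); pushes
    dominated by failure experience are damped (weight < 1). The
    sigmoid keeps the weight bounded and smooth."""
    improvement = float(np.mean(recent_returns) - baseline_return)
    return lo + (hi - lo) / (1.0 + np.exp(-improvement))

def push_update(global_params, grads, lr, weight):
    """Apply a worker's accumulated gradients to the shared (global)
    parameters. Unlike plain A3C, the step size is lr * weight, so the
    effective learning rate varies per push."""
    for p, g in zip(global_params, grads):
        p -= lr * weight * g  # in-place update of the shared arrays

# Toy usage: one worker pushing a gradient with a dynamic weight.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_params = [rng.standard_normal((4, 4)), rng.standard_normal(4)]
    grads = [rng.standard_normal((4, 4)), rng.standard_normal(4)]
    w = dynamic_weight(recent_returns=[12.0, 15.0, 11.0],
                       baseline_return=10.0)
    push_update(global_params, grads, lr=1e-3, weight=w)
    print(f"dynamic weight = {w:.3f}")  # > 1: this push is amplified
```

In a full implementation the baseline would be a running average shared across threads and the bounds (lo, hi) would be tuned; the essential point is only that the magnitude of each push varies with how useful that thread's update appears to be.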
Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities (No. 2017XKZD03).
Cite this article
Zhao, X., Ding, S., An, Y. et al. Applications of asynchronous deep reinforcement learning based on dynamic updating weights. Appl Intell 49, 581–591 (2019). https://doi.org/10.1007/s10489-018-1296-x