Abstract
Asynchronous deep reinforcement learning is a recent class of reinforcement learning methods that uses multithreading to let multiple agents update shared parameters asynchronously while exploring different parts of the state space. Because the parallel agents decorrelate the training data, experience replay is no longer required and parameters can be updated online. The asynchronous scheme also greatly improves both the convergence speed and the convergence performance of the algorithms. Asynchronous deep reinforcement learning algorithms, especially the asynchronous advantage actor-critic (A3C) algorithm, are highly effective on practical problems and have been widely applied. However, in existing asynchronous deep reinforcement learning algorithms, every thread pushes its updates to the global network with a uniform learning rate, ignoring the fact that different threads carry different information at each update. When an agent's update to the global network is dominated by failure experience, it contributes little to improving the parameters of the learning system. We therefore introduce dynamic weights into asynchronous deep reinforcement learning and propose a new algorithm, asynchronous advantage actor-critic with dynamic updating weights (DWA3C). When the information pushed by an agent clearly helps to improve system performance, we enlarge the update step; otherwise, we shrink it. In this way, we significantly improve the convergence efficiency and convergence performance of asynchronous deep reinforcement learning algorithms. We also validate the algorithm experimentally: given the same running time, the proposed algorithm converges significantly faster and to better performance than existing algorithms.
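To make the weighting idea concrete, the following minimal sketch (our illustration, not the authors' published code) shows one plausible way a worker thread's push to the global parameters could be scaled by a dynamic weight instead of a uniform learning rate. The function names (dynamic_weight, push_update) and the sigmoid mapping of a worker's recent returns against a running baseline are assumptions for illustration only; the paper's actual weighting rule may differ.

```python
import numpy as np

def dynamic_weight(recent_returns, baseline_return, lo=0.5, hi=1.5):
    """Map a worker's recent improvement over a running baseline into a
    multiplicative weight in (lo, hi). Pushes from threads whose recent
    episodes beat the baseline are amplified (weight > 1); pushes
    dominated by failure experience are damped (weight < 1). The
    sigmoid keeps the weight bounded and smooth."""
    improvement = float(np.mean(recent_returns) - baseline_return)
    return lo + (hi - lo) / (1.0 + np.exp(-improvement))

def push_update(global_params, grads, lr, weight):
    """Apply a worker's accumulated gradients to the shared (global)
    parameters. Unlike plain A3C, the step size is lr * weight, so the
    effective learning rate varies per push."""
    for p, g in zip(global_params, grads):
        p -= lr * weight * g  # in-place update of the shared arrays

# Toy usage: one worker pushing a gradient with a dynamic weight.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    global_params = [rng.standard_normal((4, 4)), rng.standard_normal(4)]
    grads = [rng.standard_normal((4, 4)), rng.standard_normal(4)]
    w = dynamic_weight(recent_returns=[12.0, 15.0, 11.0],
                       baseline_return=10.0)
    push_update(global_params, grads, lr=1e-3, weight=w)
    print(f"dynamic weight = {w:.3f}")  # > 1: this push is amplified
```

In a full implementation the baseline would be a running average shared across threads and the bounds (lo, hi) would be tuned; the essential point is only that the magnitude of each push varies with how useful that thread's update appears to be.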
Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities (No. 2017XKZD03).
Cite this article
Zhao, X., Ding, S., An, Y. et al. Applications of asynchronous deep reinforcement learning based on dynamic updating weights. Appl Intell 49, 581–591 (2019). https://doi.org/10.1007/s10489-018-1296-x