
Asynchronous reinforcement learning algorithms for solving discrete space path planning problems


Abstract

Reinforcement learning has great potential for solving practical problems, but when it is combined with neural networks to solve small-scale discrete-space problems, it can easily become trapped in a local minimum. Traditional reinforcement learning also relies on the continual updates of a single agent to learn a policy, which often leads to slow convergence. To address these problems, we combine asynchronous methods with existing tabular reinforcement learning algorithms, propose a parallel architecture for solving discrete-space path planning problems, and present several new variants of asynchronous reinforcement learning algorithms. We apply these algorithms to standard reinforcement learning benchmark problems, and the experimental results show that they solve discrete-space path planning problems efficiently. One of these algorithms, Asynchronous Phased Dyna-Q, surpasses existing asynchronous reinforcement learning algorithms and strikes a good balance between exploration and exploitation.
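To make the asynchronous tabular setting concrete, the sketch below shows one way several worker threads can share a single Q-table while exploring a toy gridworld. It is a minimal illustration of asynchronous tabular Q-learning, not the paper's Asynchronous Phased Dyna-Q: the gridworld, reward values, hyper-parameters, and thread count are assumptions made purely for the example.

```python
# Minimal sketch of asynchronous tabular Q-learning (illustrative only).
# Several worker threads explore a toy gridworld and update one shared
# Q-table. In CPython the GIL serializes these threads, but the structure
# mirrors the asynchronous-learner setup described in the abstract.
import threading
import random
from collections import defaultdict

GRID_W, GRID_H = 5, 5                          # assumed toy grid size
GOAL = (4, 4)                                  # assumed goal cell
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # the four grid moves

Q = defaultdict(float)       # shared table: Q[(state, action)] -> value
q_lock = threading.Lock()    # guards concurrent updates to the shared table


def step(state, action):
    """One environment transition: move within the grid, reward 1 at the goal."""
    x, y = state
    dx, dy = action
    nx = min(max(x + dx, 0), GRID_W - 1)
    ny = min(max(y + dy, 0), GRID_H - 1)
    next_state = (nx, ny)
    done = next_state == GOAL
    return next_state, (1.0 if done else -0.01), done


def worker(episodes, alpha=0.1, gamma=0.95, epsilon=0.1):
    """One asynchronous learner: epsilon-greedy exploration, Q-learning updates."""
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                with q_lock:
                    action = max(ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = step(state, action)
            with q_lock:
                best_next = max(Q[(next_state, a)] for a in ACTIONS)
                target = reward + (0.0 if done else gamma * best_next)
                Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state


# Launch several asynchronous workers that share the same Q-table.
threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Greedy value of the start state:",
      max(Q[((0, 0), a)] for a in ACTIONS))
```

In a Dyna-style variant such as the one the paper builds on, each worker would additionally maintain a learned model of the environment and interleave simulated planning updates with the real-experience updates shown above.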




Acknowledgments

This work is supported by the Fundamental Research Funds for the Central Universities (No. 2017XKZD03).

Author information

Corresponding author

Correspondence to Shifei Ding.


About this article


Cite this article

Zhao, X., Ding, S., An, Y. et al. Asynchronous reinforcement learning algorithms for solving discrete space path planning problems. Appl Intell 48, 4889–4904 (2018). https://doi.org/10.1007/s10489-018-1241-z
