ETQ-learning: an improved Q-learning algorithm for path planning

  • Original Research Paper
  • Published in: Intelligent Service Robotics

Abstract

Path planning algorithms have always been at the core of intelligent robot research; a good path planning algorithm can significantly enhance the efficiency with which robots execute tasks. As the application scenarios for intelligent robots continue to diversify, adaptability to the environment has become a key focus of current path planning research. As one of the classic reinforcement learning algorithms, the Q-learning (QL) algorithm has inherent advantages in adapting to the environment, but it also faces various challenges and shortcomings. These issues are primarily centered on suboptimal path planning, slow convergence, weak generalization capability and poor obstacle avoidance performance. To address these issues in the QL algorithm, we carry out the following work. (1) We redesign the reward mechanism of the QL algorithm. The traditional Q-learning reward mechanism is simple to implement but lacks directionality. We propose a combined reward mechanism of "static assignment + dynamic adjustment." This mechanism addresses the issue of random path selection and ultimately leads to optimal path planning. (2) We redesign the greedy strategy of the QL algorithm. In the traditional Q-learning algorithm, the greedy factor in the strategy is either randomly generated or set manually, which limits its applicability: it is difficult to apply effectively to different physical environments and scenarios, and this is the fundamental reason for the algorithm's poor generalization capability. We propose a dynamic adjustment of the greedy factor, the \(\varepsilon\)-acc-increasing greedy strategy, which significantly improves the efficiency of the Q-learning algorithm and enhances its generalization capability so that the algorithm has a wider range of application scenarios. (3) We introduce the expansion distance to enhance the algorithm's obstacle avoidance performance: it pre-sets a "collision buffer" between obstacles and the agent.
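The exact formulas for the combined reward and the \(\varepsilon\)-acc-increasing schedule are given in the body of the paper rather than in this abstract, so the sketch below only illustrates the three ideas in minimal Python; the function names, constants, and the linear form of the schedule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def eps_acc_increasing(episode, total_episodes, eps_min=0.1, eps_max=0.95):
    """Illustrative epsilon-acc-increasing schedule (assumed linear form):
    the probability of taking the greedy action grows as training
    progresses, so early episodes explore and later episodes exploit."""
    progress = episode / max(total_episodes - 1, 1)
    return eps_min + (eps_max - eps_min) * progress

def select_action(Q, state, n_actions, eps):
    """Greedy action with probability eps, random action otherwise."""
    if np.random.rand() < eps:
        return int(np.argmax(Q[state]))
    return np.random.randint(n_actions)

def combined_reward(pos, goal, obstacles, prev_goal_dist, expansion=1.0):
    """Sketch of the 'static assignment + dynamic adjustment' reward.
    Static part: fixed reward/penalty for reaching the goal or entering the
    expansion-distance buffer around an obstacle (the 'collision buffer').
    Dynamic part: a shaping term proportional to progress toward the goal,
    which gives the agent directional guidance. All constants are assumed."""
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    goal_dist = np.linalg.norm(goal - pos)
    if goal_dist == 0.0:
        return 100.0, goal_dist                      # static: goal reached
    for obs in obstacles:
        if np.linalg.norm(np.asarray(obs, float) - pos) <= expansion:
            return -50.0, goal_dist                  # static: inside buffer
    return prev_goal_dist - goal_dist, goal_dist     # dynamic: progress term
```

In a grid-world training loop, `eps` would be recomputed from `eps_acc_increasing` at the start of each episode, and `combined_reward` would replace the usual constant step penalty; this is only a sketch under the stated assumptions, and the authors' actual parameter choices are in the full paper and repository.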


Data availability

The data supporting this study are available on GitHub at https://github.com/wanghw1003/ETQlearning.


Funding

This research was supported by the government of Henan Province, China, through the Henan Provincial Key R&D Special Funds (221111210300).

Author information


Contributions

Conceptualization was performed by H.W.; methodology by H.W. and J.J.; software by Q.W.; validation by Q.W. and R.L.; formal analysis by H.H.; writing (original draft) by H.W.; writing (review and editing) by J.J. and R.L.; funding acquisition by X.Q.

Corresponding author

Correspondence to Rui Lou.

Ethics declarations

Conflict of interest

The authors declare that they do not have any conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to Participate

All authors agreed to participate in the research.

Consent for Publication

All authors read and approved the final manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, H., Jing, J., Wang, Q. et al. ETQ-learning: an improved Q-learning algorithm for path planning. Intel Serv Robotics 17, 915–929 (2024). https://doi.org/10.1007/s11370-024-00544-3


Keywords

Navigation