Abstract
Path planning algorithms have always been at the core of intelligent robot research; a good path planning algorithm can significantly improve the efficiency with which robots execute tasks. As the application scenarios for intelligent robots continue to diversify, adaptability to the environment has become a key focus of current path planning research. As a classic reinforcement learning algorithm, Q-learning (QL) has inherent advantages in adapting to its environment, but it also faces various challenges and shortcomings, chiefly suboptimal path planning, slow convergence, weak generalization capability, and poor obstacle avoidance. To address these issues, we carry out the following work. (1) We redesign the reward mechanism of the QL algorithm. The traditional reward mechanism is simple to implement but lacks directionality; we propose a combined reward mechanism of "static assignment + dynamic adjustment." This mechanism resolves the issue of random path selection and ultimately leads to optimal path planning. (2) We redesign the greedy strategy of the QL algorithm. In the traditional algorithm, the greedy factor is either randomly generated or set manually, which limits its applicability: it is difficult to apply effectively across different physical environments and scenarios, and this is the fundamental reason for the algorithm's poor generalization capability. We propose dynamically adjusting the greedy factor, a scheme we call the \(\varepsilon\)-acc-increasing greedy strategy, which significantly improves the efficiency of the Q-learning algorithm and enhances its generalization capability, broadening its range of application scenarios. (3) We introduce a new concept, the expansion distance, which pre-sets a "collision buffer" between obstacles and the agent to enhance the algorithm's obstacle avoidance performance.
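To make the three mechanisms concrete, the sketch below shows one plausible form of each in Python. It is a minimal illustration only: the reward constants, the distance-based shaping term, the quadratic \(\varepsilon\) ramp, and the inflation margin are assumptions made for exposition, not the exact formulas of the paper (the authors' implementation is in the GitHub repository listed under Data availability).

```python
import numpy as np

# Illustrative sketch of the abstract's three ideas. All constants and
# functional forms here are assumptions for exposition; the paper's
# exact definitions live in the full text and the linked repository.

GOAL_REWARD, OBSTACLE_PENALTY, STEP_COST = 100.0, -100.0, -1.0

def combined_reward(pos, next_pos, goal, obstacles):
    """(1) "Static assignment + dynamic adjustment": fixed rewards for
    goal/obstacle/step, plus a shaping term for moving toward the goal."""
    if next_pos in obstacles:
        return OBSTACLE_PENALTY
    if next_pos == goal:
        return GOAL_REWARD
    d_old = abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])
    d_new = abs(next_pos[0] - goal[0]) + abs(next_pos[1] - goal[1])
    return STEP_COST + (d_old - d_new)  # dynamic adjustment term

def greedy_factor(episode, total_episodes, eps_min=0.1, eps_max=0.95):
    """(2) Dynamically increasing greedy factor: explore early, exploit
    late. The quadratic ramp is an assumed "accelerating" profile."""
    frac = min(episode / max(total_episodes - 1, 1), 1.0)
    return eps_min + (eps_max - eps_min) * frac ** 2

def select_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over one row of the Q-table."""
    if rng.random() < epsilon:
        return int(np.argmax(q_row))      # exploit best known action
    return int(rng.integers(len(q_row)))  # explore a random action

def inflate_obstacles(obstacles, margin=1):
    """(3) Expansion distance: grow each obstacle cell by `margin` grid
    cells, pre-setting a collision buffer between obstacle and agent."""
    return {(x + dx, y + dy)
            for (x, y) in obstacles
            for dx in range(-margin, margin + 1)
            for dy in range(-margin, margin + 1)}
```

For example, `select_action(Q[state], greedy_factor(ep, 500), np.random.default_rng(0))` would pick an action during episode `ep` of a 500-episode run, exploring heavily at the start and almost always exploiting near the end.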
Data availability
Available on GitHub: https://github.com/wanghw1003/ETQlearning.
Funding
This research was supported by the Government of Henan Province, China, through the Henan Provincial Key R&D Special Funds (221111210300).
Author information
Contributions
Conceptualization was performed by H.W.; methodology by H.W. and J.J.; software by Q.W.; validation by Q.W. and R.L.; formal analysis by H.H.; writing (original draft) by H.W.; writing (review and editing) by J.J. and R.L.; funding acquisition by X.Q.
Ethics declarations
Conflict of interest
The authors declare that they do not have any conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All authors agreed to participate in the research.
Consent for publication
All authors read and approved the final manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, H., Jing, J., Wang, Q. et al. ETQ-learning: an improved Q-learning algorithm for path planning. Intel Serv Robotics 17, 915–929 (2024). https://doi.org/10.1007/s11370-024-00544-3