Abstract
Path planning algorithms have always been at the core of intelligent robot research; a good path planning algorithm can significantly improve the efficiency with which robots execute tasks. As the application scenarios for intelligent robots continue to diversify, adaptability to the environment has become a key focus of current path planning research. As a classic reinforcement learning algorithm, Q-learning (QL) has inherent advantages in adapting to its environment, but it also faces various challenges and shortcomings, chiefly suboptimal path planning, slow convergence, weak generalization capability, and poor obstacle avoidance. To address these issues, we carry out the following work. (1) We redesign the reward mechanism of the QL algorithm. The traditional reward mechanism is simple to implement but lacks directionality; we propose a combined reward mechanism of "static assignment + dynamic adjustment." This mechanism resolves the issue of random path selection and ultimately leads to optimal path planning. (2) We redesign the greedy strategy of the QL algorithm. In the traditional algorithm, the greedy factor is either randomly generated or set manually, which limits its applicability: it is difficult to apply effectively across different physical environments and scenarios, and this is the fundamental reason for the algorithm's poor generalization capability. We propose dynamically adjusting the greedy factor, a scheme we call the \(\varepsilon\)-acc-increasing greedy strategy, which significantly improves the efficiency of the Q-learning algorithm and enhances its generalization capability, broadening its range of application scenarios. (3) We introduce a new concept, the expansion distance, which pre-sets a "collision buffer" between obstacles and the agent to enhance the algorithm's obstacle avoidance performance.
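To make the three mechanisms concrete, the sketch below shows one plausible form of each in Python. It is a minimal illustration only: the reward constants, the distance-based shaping term, the quadratic \(\varepsilon\) ramp, and the inflation margin are assumptions made for exposition, not the exact formulas of the paper (the authors' implementation is in the GitHub repository listed under Data availability).

```python
import numpy as np

# Illustrative sketch of the abstract's three ideas. All constants and
# functional forms here are assumptions for exposition; the paper's
# exact definitions live in the full text and the linked repository.

GOAL_REWARD, OBSTACLE_PENALTY, STEP_COST = 100.0, -100.0, -1.0

def combined_reward(pos, next_pos, goal, obstacles):
    """(1) "Static assignment + dynamic adjustment": fixed rewards for
    goal/obstacle/step, plus a shaping term for moving toward the goal."""
    if next_pos in obstacles:
        return OBSTACLE_PENALTY
    if next_pos == goal:
        return GOAL_REWARD
    d_old = abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])
    d_new = abs(next_pos[0] - goal[0]) + abs(next_pos[1] - goal[1])
    return STEP_COST + (d_old - d_new)  # dynamic adjustment term

def greedy_factor(episode, total_episodes, eps_min=0.1, eps_max=0.95):
    """(2) Dynamically increasing greedy factor: explore early, exploit
    late. The quadratic ramp is an assumed "accelerating" profile."""
    frac = min(episode / max(total_episodes - 1, 1), 1.0)
    return eps_min + (eps_max - eps_min) * frac ** 2

def select_action(q_row, epsilon, rng):
    """Epsilon-greedy choice over one row of the Q-table."""
    if rng.random() < epsilon:
        return int(np.argmax(q_row))      # exploit best known action
    return int(rng.integers(len(q_row)))  # explore a random action

def inflate_obstacles(obstacles, margin=1):
    """(3) Expansion distance: grow each obstacle cell by `margin` grid
    cells, pre-setting a collision buffer between obstacle and agent."""
    return {(x + dx, y + dy)
            for (x, y) in obstacles
            for dx in range(-margin, margin + 1)
            for dy in range(-margin, margin + 1)}
```

For example, `select_action(Q[state], greedy_factor(ep, 500), np.random.default_rng(0))` would pick an action during episode `ep` of a 500-episode run, exploring heavily at the start and almost always exploiting near the end.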
Data availability
Available on GitHub: https://github.com/wanghw1003/ETQlearning.
Funding
This research was supported by the Government of Henan Province, China, through the Henan Provincial Key R&D Special Funds (221111210300).
Author information
Contributions
Conceptualization was performed by H.W.; methodology by H.W. and J.J.; software by Q.W.; validation by Q.W. and R.L.; formal analysis by H.H.; writing (original draft) by H.W.; writing (review and editing) by J.J. and R.L.; funding acquisition by X.Q.
Ethics declarations
Conflict of interest
The authors declare that they do not have any conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All authors agreed to participate in the research.
Consent for publication
All authors read and approved the final manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, H., Jing, J., Wang, Q. et al. ETQ-learning: an improved Q-learning algorithm for path planning. Intel Serv Robotics 17, 915–929 (2024). https://doi.org/10.1007/s11370-024-00544-3