Path Planning of Coastal Ships Based on Optimized DQN Reward Function
Figure 1. Reinforcement learning schematic.
Figure 2. Gridding of the marine environment: (a) before gridding; (b) after gridding.
Figure 3. Marine environmental information extraction and processing.
Figure 4. Ship encounter situations based on COLREGS.
Figure 5. Schematic diagram of the ship state space.
Figure 6. Schematic diagram of coastal ship path planning based on the optimized deep Q-network (DQN) algorithm.
Figure 7. Two-dimensional simulation training marine environment: (a) real marine environment; (b) simulation experiment environment.
Figure 8. Path planning results under different iterations: (a) 500th iteration; (b) 1000th iteration; (c) 2000th iteration; (d) 3000th iteration.
Figure 9. Path planning results in case 1 of the marine environment: (a) initial marine environmental information; (b) 3000th iteration in case 1.
Figure 10. Path planning results in case 2 of the marine environment: (a) initial marine environmental information; (b) 3000th iteration in case 2.
Figure 11. Combination of obstacle potential and DQN-based ship path planning: (a) initial marine environment of obstacle potential; (b) initial positions of the starting point and target point; (c) model-predicted path in progress; (d) model-predicted path at completion.
Figure 12. The path optimized by the Douglas–Peucker (DP) algorithm.
Figure 13. Results of path planning in the electronic chart.
Figure 14. Convergence trend comparison results of step iteration.
Figure 15. Comparison of path planning results under different algorithms: (a) traditional DQN; (b) Q-learning; (c) DDPG; (d) A*; (e) BUG2; (f) APF.
Abstract
1. Introduction
1. Processing of environmental information
2. Path search
3. Path smoothing
2. Related Research
3. Model of Coastal Ship Path Planning
3.1. Environmental Status Information Processing
3.1.1. Gridding of Marine Environment
3.1.2. Ship Navigation in Accordance with COLREGS
3.2. Design of Coastal Ship Path Planning Model
3.2.1. Optimized Design of the Reward Function
1. The potential-function reward exerted by the target point on the ship is designed as follows:
2. The reward area near the target point is designed as follows:
3. The dangerous area near the obstacle is designed as follows (an illustrative sketch of all three components is given after this list):
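The snippet below is a minimal Python sketch of how these three reward components might be combined, assuming a distance-based potential term and fixed bonus/penalty zones around the target and the obstacles; the coefficients, radii, and function names are illustrative assumptions, not the values or notation used in the paper.

```python
import math

def reward(ship_pos, target_pos, obstacles,
           reach_eps=1.0, goal_radius=5.0, danger_radius=3.0):
    """Illustrative reward with the three components described above.

    ship_pos, target_pos: (x, y) tuples; obstacles: list of (x, y) tuples.
    All radii and weights are assumed values, not those of the paper.
    """
    dist_goal = math.dist(ship_pos, target_pos)

    # 1. Potential-function reward: the closer the ship is to the target,
    #    the less negative (i.e., the larger) this term becomes.
    r_potential = -0.1 * dist_goal

    # 2. Reward area near the target point: a bonus inside a circle around
    #    the target, and a large terminal reward on arrival.
    if dist_goal < reach_eps:
        r_goal = 100.0
    elif dist_goal < goal_radius:
        r_goal = 10.0
    else:
        r_goal = 0.0

    # 3. Dangerous area near the obstacle: a penalty that grows as the ship
    #    approaches an obstacle, and a large penalty on collision.
    r_obstacle = 0.0
    for obs in obstacles:
        d = math.dist(ship_pos, obs)
        if d < reach_eps:
            r_obstacle -= 100.0
        elif d < danger_radius:
            r_obstacle -= 10.0 * (danger_radius - d)

    return r_potential + r_goal + r_obstacle
```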
3.2.2. Description of State Space and Action Space
3.2.3. Action Exploration Strategy
3.2.4. Training and Learning Process of Model
3.3. Execution Process of Coastal Ship Path Planning Model
- First, the marine environment is processed, and the state information of the ship and its surroundings is acquired from the sensing equipment.
- The model is then called with this state information as input; after the model computes, it outputs the action the ship should take in the current state.
- The ship controller receives the action and executes it from the current ship state.
- After the action is executed, the state information at the next moment is obtained and checked to determine whether the resulting state is the terminal state.
- If the new state is not terminal, it is handed back to the model for the next decision; if it is terminal, the ship has reached the target point and the model invocation ends. A compact sketch of this loop is given after the list.
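The call sequence above reduces to a simple sense–decide–act loop. In the sketch below, `model.predict`, `env.get_state`, `env.execute`, and `env.is_terminal` are placeholder names for the sensing, inference, and control interfaces described in the steps; they are not identifiers defined in the paper.

```python
def run_path_planning(model, env, max_steps=500):
    """Run the trained path-planning model until the target point is reached."""
    state = env.get_state()                # step 1: sense ship and environment
    path = [state]
    for _ in range(max_steps):
        action = model.predict(state)      # step 2: model maps state to action
        env.execute(action)                # step 3: controller executes the action
        state = env.get_state()            # step 4: observe the next state
        path.append(state)
        if env.is_terminal(state):         # step 5: stop once the target is reached
            break
    return path
```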
Algorithm 1. DQN algorithm for path planning of coastal ships.
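As a companion to Algorithm 1, the following is a minimal PyTorch sketch of a DQN training loop with ε-greedy exploration and experience replay, wired up with the hyperparameter values from the parameter table in Section 4.1. The `env.reset()`/`env.step()` interface, the state dimension, and the minimum exploration rate are assumptions, and refinements of the paper's algorithm such as a separate target network are omitted.

```python
import random
from collections import deque

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Three hidden layers of 128 ReLU units each (state_dim is an assumption)."""

    def __init__(self, state_dim=4, n_actions=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def train_dqn(env, episodes=3000, state_dim=4, n_actions=8,
              gamma=0.9, lr=0.01, eps=1.0, eps_decay=0.995, eps_min=0.01,
              memory_size=10_000, batch_size=512):
    """Sketch of DQN training with experience replay and epsilon-greedy actions.

    `env.reset()` returns the initial state; `env.step(action)` returns
    (next_state, reward, done) -- an assumed interface, not the paper's code.
    """
    q_net = QNetwork(state_dim, n_actions)
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    memory = deque(maxlen=memory_size)

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    q = q_net(torch.tensor(state, dtype=torch.float32))
                action = int(q.argmax())

            next_state, reward, done = env.step(action)
            memory.append((state, action, reward, next_state, done))
            state = next_state

            # learn from a random mini-batch of stored experience
            if len(memory) >= batch_size:
                batch = random.sample(memory, batch_size)
                s, a, r, s2, d = map(list, zip(*batch))
                s = torch.tensor(s, dtype=torch.float32)
                a = torch.tensor(a).unsqueeze(1)
                r = torch.tensor(r, dtype=torch.float32)
                s2 = torch.tensor(s2, dtype=torch.float32)
                d = torch.tensor(d, dtype=torch.float32)

                q_sa = q_net(s).gather(1, a).squeeze(1)
                with torch.no_grad():
                    target = r + gamma * q_net(s2).max(1).values * (1 - d)
                loss = nn.functional.mse_loss(q_sa, target)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

        eps = max(eps_min, eps * eps_decay)  # decay the exploration rate

    return q_net
```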
4. Experimental Verification and Result Analysis
4.1. Experimental Environment Construction and Algorithm Parameter Setting
4.2. Model Validation Results
4.3. Path Smoothing
4.3.1. Douglas–Peucker Algorithm
- Connect a straight line AB between the start point A and the end point B of the trajectory curve; this line is the chord of the curve. Traverse all other points on the curve, compute the distance from each point to the line AB, find the point C with the maximum distance, and denote this maximum distance as d_max.
- Compare d_max with the pre-defined threshold ε. If d_max < ε, the curve segment is approximated by the straight line AB and that segment is considered processed.
- If d_max ≥ ε, the curve AB is divided at point C into two sections AC and CB, and steps (1) and (2) are applied to each section respectively.
- When all segments have been processed, the polyline formed by connecting the retained division points in turn is the simplified path of the original curve (a compact implementation sketch follows this list).
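A compact recursive implementation of steps (1)–(4) is sketched below; the representation of points as (x, y) tuples and the threshold value are illustrative assumptions.

```python
import math

def point_to_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                       # degenerate chord: a == b
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Simplify a polyline (list of (x, y) tuples) with the DP algorithm."""
    if len(points) < 3:
        return list(points)

    # Step (1): find the point C farthest from the chord AB.
    a, b = points[0], points[-1]
    d_max, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_to_line_distance(points[i], a, b)
        if d > d_max:
            d_max, index = d, i

    # Step (2): if d_max < epsilon, approximate the whole segment by AB.
    if d_max < epsilon:
        return [a, b]

    # Step (3): otherwise split at C and process both halves recursively.
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)

    # Step (4): merge the two halves without duplicating point C.
    return left[:-1] + right
```

For example, `douglas_peucker([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], epsilon=0.5)` returns `[(0, 0), (2, 0), (2, 2)]`, discarding the collinear intermediate points and keeping only the corner.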
4.3.2. Path Smoothing Based on DP Algorithm
4.4. Comparative Analysis of Experimental Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Parameter Name | Value | Description
---|---|---
Action space size | 8 | Optional actions available to the ship
Learning rate | 0.01 | Learning rate of the neural network
Decay factor | 0.9 | Discount factor for the cumulative reward
Exploration rate | 1 | Initial exploration rate for action selection
Exploration decay rate | 0.995 | Decay rate of the exploration rate
Experience replay memory | 10,000 | Capacity for storing historical experience data
Sample size | 512 | Size of the mini-batch sampled from the replay memory
Hidden layers | 3 | Number of hidden layers in the network
Neurons per hidden layer | 128 | Number of neurons in each hidden layer
Activation function | ReLU | Neuron activation function
Metric | Before Optimization | After Optimization
---|---|---
Number of path corners | 6 | 1
Path length (m) | 137.612 | 111.318
Method | Path Length (m) | Time (s) | Number of Path Corners
---|---|---|---
Optimized DQN | 137.612 | 0.6105 | 6 |
Traditional DQN | 154.241 | 0.9254 | 16 |
Q-learning | 146.617 | 0.7613 | 6 |
DDPG | 172.315 | 0.8251 | - |
A* | 116.421 | 1.2253 | 28 |
BUG2 | 92.728 | 2.3268 | 6 |
APF | 135.513 | 1.6216 | - |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).