This research addresses the motion planning problem faced by underactuated autonomous underwater vehicles (AUVs) in mapless environments. An end-to-end motion planning system based on deep reinforcement learning is proposed; it optimizes the policy directly, taking sensor information as input and producing continuous surge force and yaw moment as output. The system can reach multiple target points in sequence while avoiding obstacles. In addition, this study proposes a reward curriculum training method to address the problem that the number of samples required for random exploration grows exponentially with the number of steps needed to obtain a reward, while avoiding the negative impact of intermediate rewards. The proposed system demonstrates good planning ability in mapless environments, transfers well to other unknown environments, and is robust to current disturbances. Simulation results show that the proposed mapless motion planning system can guide an underactuated AUV to its desired targets without colliding with obstacles.
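The exponential sample growth mentioned in the abstract can be illustrated with a toy model. The sketch below is not the authors' method: it assumes a discrete action set (the proposed system uses continuous actions) and a single rewarding action sequence, and it pairs that with one hypothetical curriculum in which the radius that counts as "reaching the target" shrinks across training stages. All function names, radii, and stage counts are illustrative assumptions.

```python
def expected_random_episodes(num_actions: int, steps_to_reward: int) -> int:
    """Toy model: if exactly one action sequence of length `steps_to_reward`
    earns the reward and actions are drawn uniformly at random, the success
    probability per episode is num_actions ** -steps_to_reward, so the
    expected number of episodes until the first reward is its inverse."""
    return num_actions ** steps_to_reward


def curriculum_target_radius(stage: int,
                             start_radius: float = 8.0,
                             final_radius: float = 1.0,
                             num_stages: int = 4) -> float:
    """Hypothetical curriculum schedule (an assumption, not from the paper):
    linearly shrink the success radius from `start_radius` to `final_radius`."""
    stage = max(0, min(stage, num_stages - 1))  # clamp to a valid stage index
    frac = stage / (num_stages - 1)
    return start_radius + frac * (final_radius - start_radius)


def sparse_reward(distance_to_target: float, collided: bool, radius: float) -> float:
    """Sparse reward with no intermediate shaping terms:
    -1 on collision, +1 inside the success radius, 0 otherwise."""
    if collided:
        return -1.0
    return 1.0 if distance_to_target <= radius else 0.0


if __name__ == "__main__":
    # Random exploration cost grows exponentially with the reward horizon.
    for steps in (5, 10, 20):
        print(steps, expected_random_episodes(3, steps))
    # Early stages reward coarse success; later stages demand precision.
    for stage in range(4):
        print(stage, curriculum_target_radius(stage))
```

In this sketch, early stages make the reward reachable in fewer steps, so random exploration finds it sooner, and later stages tighten the criterion without adding intermediate reward terms.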
Sun, Y., Cheng, J., Zhang, G. et al. Mapless Motion Planning System for an Autonomous Underwater Vehicle Using Policy Gradient-based Deep Reinforcement Learning. J Intell Robot Syst 96, 591–601 (2019). https://doi.org/10.1007/s10846-019-01004-2
DOI: https://doi.org/10.1007/s10846-019-01004-2