Goal-Oriented Obstacle Avoidance with Deep Reinforcement Learning in Continuous Action Space
Figure 1. Architecture of the proposed actor network. Depth-wise separable convolution is performed on a stack of depth images, and max pooling is performed on the output. Positional information is concatenated with the depth information, followed by two fully connected layers and an output layer. ⊛ and ⊕ denote convolution and concatenation, respectively.
Figure 2. Full architecture of the proposed network, including the actor and critic parts. Both parts of the network use the same depth and position information as state inputs. The actions calculated by the actor are sent to the critic to update the network parameters. ⊛ and ⊕ denote convolution and concatenation, respectively.
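For concreteness, the following is a minimal PyTorch sketch of an actor with the structure these captions describe. The frame count, channel width, kernel size, hidden-layer sizes, and output squashing (sigmoid for v, tanh for ω) are illustrative assumptions, not the values used in the paper:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv filters each input channel; a 1x1 pointwise conv mixes channels."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   stride=stride, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class Actor(nn.Module):
    """Actor as in Figure 1: conv front end for depth, concatenation with position."""
    def __init__(self, n_frames=4):                 # assumed stack of 4 depth frames
        super().__init__()
        self.features = nn.Sequential(
            DepthwiseSeparableConv(n_frames, 32, kernel_size=5, stride=2),  # convolution
            nn.ReLU(),
            nn.MaxPool2d(2),                        # max pooling on the conv output
            nn.Flatten(),
        )
        self.fc1 = nn.LazyLinear(512)               # two fully connected layers...
        self.fc2 = nn.Linear(512, 512)
        self.out = nn.Linear(512, 2)                # ...and an output layer: v and omega

    def forward(self, depth_stack, goal_pos):
        x = self.features(depth_stack)
        x = torch.cat([x, goal_pos], dim=1)         # concatenate depth and position info
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        v, w = self.out(x).split(1, dim=1)
        # Assumed squashing: v in [0, 1], omega in [-1, 1].
        return torch.cat([torch.sigmoid(v), torch.tanh(w)], dim=1)
```

With these assumptions, `Actor()(torch.zeros(1, 4, 64, 64), torch.zeros(1, 2))` produces a (1, 2) action tensor. The depthwise separable convolution factorizes a standard convolution into a per-channel spatial filter and a 1×1 channel mixer, which keeps the image-processing front end cheap.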
Figure 3. Environment in the Gazebo simulator used for training.
Figure 4. Simulation results with shapes of different geometrical compositions. The green path designates successful motion, the red path indicates motion resulting in a collision, and the orange path designates a situation where the robot encounters a deadlock. (a) Cube, (b) sphere, (c) three thin walls, (d) goal point between two cubes, (e) coffee table with a surface on a pole, (f) person, (g) empty bookshelf, (h) concave corner, (i) room, (j) long wall.
Figure 5. Validation odometry information in a simulated environment. Numbers depict the sequence in which the robot needs to navigate to the designated points. The green path visualizes the trajectory of each respective method. (a) Experimental results of the Erle-rover laser-based method; (b) experimental results of the ADDPG sparse laser-based method; (c) experimental results of the CDDPG depth image-based method.
Figure 6. (a) Robot setup for experiments in a real environment. (b) Example of network outputs in a real environment. The RGB image is shown for visualization purposes only; the proposed network uses only the provided depth images. v and ω denote the linear and angular velocities, respectively, and a value of 1 represents the maximal possible value of the respective velocity.
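A small, hypothetical post-processing step illustrates the normalization convention in this caption, where an output of 1 corresponds to the maximal velocity; the limits `MAX_LINEAR_V` and `MAX_ANGULAR_W` below are made-up robot-specific values, not from the paper:

```python
# Hypothetical mapping from normalized network outputs to velocity commands.
MAX_LINEAR_V = 0.5    # m/s, assumed maximum linear velocity
MAX_ANGULAR_W = 1.0   # rad/s, assumed maximum angular velocity

def to_velocity_command(v_norm, w_norm):
    """An output of 1 means 'maximal possible value of the respective velocity'."""
    return v_norm * MAX_LINEAR_V, w_norm * MAX_ANGULAR_W
```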
Figure 7. Results in the real environment. The green path depicts the robot's movements from odometry data. Red parts of the path show human intervention after a crash, and the orange part of the path shows human intervention after the robot encountered a deadlock. Blue shapes depict the locations and shapes of the obstacles. Numbers from 0 to 6 describe the locations and sequence of the target goals. Images show the robot's view of the environment at each location. (a) CDDPG performance in an environment without added obstacles; (b) CDDPG performance in an environment with added obstacles; (c) ADDPG performance in an environment without added obstacles; (d) ADDPG performance in an environment with added obstacles.
Figure 8. Results in the dynamic environment. The green path depicts the robot's movements from odometry data. Blue shapes depict obstacles already in the scene, and orange shapes depict newly introduced obstacles. A human obstacle was introduced into the scene; if it was in motion, its opaque shape designates the starting position of the motion, and the motion direction is visualized by an orange arrow. Numbers from 0 to 3 describe the locations and sequence of the target goals. (a) Layout of the experiment environment. (b) Experiment without obstacles. (c,d) Experiments with new static obstacles. (e,f) Experiments with relocated static and dynamic human obstacles. (g,h) Experiments with dynamic human obstacles.
Abstract
1. Introduction
- Creation of a convolutional deep deterministic policy gradient (CDDPG) network for handling large amounts of input data.
- Development of a deep deterministic policy gradient network with mixed inputs for goal-oriented collision avoidance (one plausible encoding of the positional input is sketched after this list).
- Transfer of a network trained in simulation to the real environment for map-less vector navigation with depth image inputs.
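The contributions above do not restate how the positional input for map-less vector navigation is encoded. A common choice, sketched below purely as an assumption, is the goal expressed in polar coordinates (distance and heading error) relative to the robot's odometry pose:

```python
import math

def goal_polar(robot_x, robot_y, robot_yaw, goal_x, goal_y):
    """Relative goal as (distance, heading error), one plausible positional input."""
    dx, dy = goal_x - robot_x, goal_y - robot_y
    distance = math.hypot(dx, dy)
    # Angle to the goal relative to the robot's heading, wrapped to [-pi, pi].
    heading_error = math.atan2(dy, dx) - robot_yaw
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))
    return distance, heading_error
```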
2. Related Works
3. Deep Learning Network for Collision Avoidance in a Continuous Action Space
3.1. Convolutional Deep Deterministic Policy Gradient
3.2. Reward
4. Training
5. Experiments
5.1. Experiments in the Simulated Environment
5.2. Experiments in a Real Environment
6. Summary and Discussion
Author Contributions
Funding
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
RRT* | Rapidly-exploring Random Tree Star |
SLAM | Simultaneous Localization and Mapping |
D3QN | Deep Double Q Network |
DDPG | Deep Deterministic Policy Gradient |
ADDPG | Asynchronous Deep Deterministic Policy Gradient |
CDDPG | Convolutional Deep Deterministic Policy Gradient |
ReLU | Rectified Linear Unit |
ROS | Robot Operating System |
References
- Sifre, L.; Mallat, S. Rigid-motion scattering for texture classification. arXiv 2014, arXiv:1403.1687.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Sariff, N.; Buniyamin, N. An overview of autonomous mobile robot path planning algorithms. In Proceedings of the 2006 4th Student Conference on Research and Development, Selangor, Malaysia, 27–28 June 2006; pp. 183–188.
- Radmanesh, M.; Kumar, M.; Guentert, P.H.; Sarim, M. Overview of path-planning and obstacle avoidance algorithms for UAVs: A comparative study. Unmanned Syst. 2018, 6, 95–118.
- Noreen, I.; Khan, A.; Habib, Z. A comparison of RRT, RRT* and RRT*-smart path planning algorithms. Int. J. Comput. Sci. Netw. Secur. (IJCSNS) 2016, 16, 20.
- Kim, Y.N.; Ko, D.W.; Suh, I.H. Confidence random tree-based algorithm for mobile robot path planning considering the path length and safety. Int. J. Adv. Rob. Syst. 2019, 16, 1729881419838179.
- Cimurs, R.; Suh, I.H. Time-optimized 3D path smoothing with kinematic constraints. Int. J. Control Autom. Syst. 2020.
- Ribeiro, J.; Silva, M.; Santos, M.; Vidal, V.; Honório, L.; Silva, L.; Rezende, H.; Neto, A.S.; Mercorelli, P.; Pancoti, A. Ant colony optimization algorithm and artificial immune system applied to a robot route. In Proceedings of the 2019 20th International Carpathian Control Conference (ICCC), Krakow-Wieliczka, Poland, 26–29 May 2019; pp. 1–6.
- Lamini, C.; Benhlima, S.; Elbekri, A. Genetic algorithm based approach for autonomous mobile robot path planning. Procedia Comput. Sci. 2018, 127, 180–189.
- Cimurs, R.; Hwang, J.; Suh, I.H. Bezier curve-based smoothing for path planner with curvature constraint. In Proceedings of the 2017 First IEEE International Conference on Robotic Computing (IRC), Taichung, Taiwan, 10–12 April 2017; pp. 241–248.
- Ferguson, D.; Stentz, A. Field D*: An interpolation-based path planner and replanner. In Robotics Research; Springer: Berlin/Heidelberg, Germany, 2007; pp. 239–253.
- Ferguson, D.; Stentz, A. The Field D* Algorithm for Improved Path Planning and Replanning in Uniform and Non-Uniform Cost Environments; Tech. Rep. CMU-RI-TR-05-19; Robotics Institute, Carnegie Mellon University: Pittsburgh, PA, USA, 2005.
- Dolgov, D.; Thrun, S.; Montemerlo, M.; Diebel, J. Path planning for autonomous vehicles in unknown semi-structured environments. Int. J. Robot. Res. 2010, 29, 485–501.
- Taketomi, T.; Uchiyama, H.; Ikeda, S. Visual SLAM algorithms: A survey from 2010 to 2016. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 16.
- Fuentes-Pacheco, J.; Ruiz-Ascencio, J.; Rendón-Mancha, J.M. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. 2015, 43, 55–81.
- Ko, D.W.; Kim, Y.N.; Lee, J.H.; Suh, I.H. A scene-based dependable indoor navigation system. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016; pp. 1530–1537.
- Lin, J.; Wang, W.J.; Huang, S.K.; Chen, H.C. Learning based semantic segmentation for robot navigation in outdoor environment. In Proceedings of the 2017 Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS), Otsu, Japan, 27–30 June 2017; pp. 1–5.
- Zhang, Y.; Chen, H.; He, Y.; Ye, M.; Cai, X.; Zhang, D. Road segmentation for all-day outdoor robot navigation. Neurocomputing 2018, 314, 316–325.
- Niijima, S.; Sasaki, Y.; Mizoguchi, H. Real-time autonomous navigation of an electric wheelchair in large-scale urban area with 3D map. Adv. Robot. 2019, 33, 1006–1018.
- Pham, H.; Smolka, S.A.; Stoller, S.D.; Phan, D.; Yang, J. A survey on unmanned aerial vehicle collision avoidance systems. arXiv 2015, arXiv:1508.07723.
- Hoy, M.; Matveev, A.S.; Savkin, A.V. Algorithms for collision-free navigation of mobile robots in complex cluttered environments: A survey. Robotica 2015, 33, 463–497.
- Garcia-Cruz, X.; Sergiyenko, O.Y.; Tyrsa, V.; Rivas-Lopez, M.; Hernandez-Balbuena, D.; Rodriguez-Quiñonez, J.; Basaca-Preciado, L.; Mercorelli, P. Optimization of 3D laser scanning speed by use of combined variable step. Opt. Lasers Eng. 2014, 54, 141–151.
- Ivanov, M.; Sergiyenko, O.; Tyrsa, V.; Mercorelli, P.; Kartashov, V.; Hernandez, W.; Sheiko, S.; Kolendovska, M. Individual scans fusion in virtual knowledge base for navigation of mobile robotic group with 3D TVS. In Proceedings of the IECON 2018-44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA, 21–23 October 2018; pp. 3187–3192.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529.
- Dann, M.; Zambetta, F.; Thangarajah, J. Integrating skills and simulation to solve complex navigation tasks in Infinite Mario. IEEE Trans. Games 2018, 10, 101–106.
- Tampuu, A.; Matiisen, T.; Kodelja, D.; Kuzovkin, I.; Korjus, K.; Aru, J.; Aru, J.; Vicente, R. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 2017, 12, e0172395.
- Ding, X.; Zhang, Y.; Liu, T.; Duan, J. Deep learning for event-driven stock prediction. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
- Akita, R.; Yoshihara, A.; Matsubara, T.; Uehara, K. Deep learning for stock prediction using numerical and textual information. In Proceedings of the 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), Okayama, Japan, 26–29 June 2016; pp. 1–6.
- Chong, E.; Han, C.; Park, F.C. Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Syst. Appl. 2017, 83, 187–205.
- Sünderhauf, N.; Brock, O.; Scheirer, W.; Hadsell, R.; Fox, D.; Leitner, J.; Upcroft, B.; Abbeel, P.; Burgard, W.; Milford, M.; et al. The limits and potentials of deep learning for robotics. Int. J. Robot. Res. 2018, 37, 405–420.
- Tai, L.; Li, S.; Liu, M. A deep-network solution towards model-less obstacle avoidance. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016; pp. 2759–2764.
- Tai, L.; Paolo, G.; Liu, M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 31–36.
- Zhu, Y.; Mottaghi, R.; Kolve, E.; Lim, J.J.; Gupta, A.; Fei-Fei, L.; Farhadi, A. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3357–3364.
- Richter, C.; Roy, N. Safe visual navigation via deep learning and novelty detection. In Proceedings of the Robotics: Science and Systems XIII, Cambridge, MA, USA, 12–16 July 2017.
- Zhang, J.; Springenberg, J.T.; Boedecker, J.; Burgard, W. Deep reinforcement learning with successor features for navigation across similar environments. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 2371–2378.
- Giusti, A.; Guzzi, J.; Cireşan, D.C.; He, F.L.; Rodríguez, J.P.; Fontana, F.; Faessler, M.; Forster, C.; Schmidhuber, J.; Di Caro, G.; et al. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robot. Autom. Lett. 2016, 1, 661–667.
- Kahn, G.; Villaflor, A.; Pong, V.; Abbeel, P.; Levine, S. Uncertainty-aware reinforcement learning for collision avoidance. arXiv 2017, arXiv:1702.01182.
- Xie, L.; Wang, S.; Markham, A.; Trigoni, N. Towards monocular vision based obstacle avoidance through deep reinforcement learning. arXiv 2017, arXiv:1706.09829.
- Wang, Y.; He, H.; Sun, C. Learning to navigate through complex dynamic environment with modular deep reinforcement learning. IEEE Trans. Games 2018, 10, 400–412.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Rusu, A.A.; Vecerik, M.; Rothörl, T.; Heess, N.; Pascanu, R.; Hadsell, R. Sim-to-real robot learning from pixels with progressive nets. arXiv 2016, arXiv:1610.04286.
- James, S.; Johns, E. 3D simulation for robot arm control with deep Q-learning. arXiv 2016, arXiv:1609.03759.
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. In Proceedings of the International Conference on Machine Learning (ICML), Beijing, China, 21–26 June 2014; pp. 387–395.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- ROS. Erle-Rover. 2016. Available online: http://wiki.ros.org/Robots/Erle-Rover (accessed on 4 July 2019).
Parameter | Value |
---|---|
Actor Network Learning Rate | 0.0001 |
Critic Network Learning Rate | 0.001 |
Critic Network Discount Factor | 0.99 |
Soft Target Update Parameter | 0.001 |
Buffer Size | 80,000 |
Mini-Batch Size | 10 |
Random Seed Value | 1234 |
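The following sketch shows one way the hyperparameters in this table could be wired into a DDPG-style training setup. Only the numeric values come from the table; the stand-in nn.Linear networks and the Polyak-averaging helper are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

torch.manual_seed(1234)        # random seed value

GAMMA = 0.99                   # critic network discount factor
TAU = 0.001                    # soft target update parameter
BUFFER_SIZE = 80_000           # replay buffer size
BATCH_SIZE = 10                # mini-batch size

# Stand-in networks; the real ones are the actor and critic of Figures 1 and 2.
actor, actor_target = nn.Linear(8, 2), nn.Linear(8, 2)
critic, critic_target = nn.Linear(10, 1), nn.Linear(10, 1)
actor_target.load_state_dict(actor.state_dict())      # targets start as copies
critic_target.load_state_dict(critic.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=0.0001)   # actor learning rate
critic_opt = torch.optim.Adam(critic.parameters(), lr=0.001)  # critic learning rate

def soft_update(target, source, tau=TAU):
    """Polyak averaging: theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1.0 - tau).add_(tau * s)
```

After each gradient step, `soft_update(actor_target, actor)` and `soft_update(critic_target, critic)` nudge the target networks toward the learned ones, which stabilizes the bootstrapped critic targets.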
Lap | Erle-rover Dist. (m) | ADDPG Dist. (m) | CDDPG Dist. (m) | Erle-rover Time (s) | ADDPG Time (s) | CDDPG Time (s) |
---|---|---|---|---|---|---|
1 | 62.01 | 46.24 | 49.78 | 168 | 128 | 123 |
2 | 63.25 | 46.74 | 49.97 | 173 | 126 | 129 |
3 | 63.41 | 46.47 | 49.64 | 171 | 127 | 125 |
4 | 63.69 | 46.66 | 49.87 | 172 | 125 | 123 |
5 | 63.84 | 46.61 | 50.05 | 172 | 122 | 126 |
Average | 63.24 | 46.54 | 49.86 | 171.2 | 125.6 | 125.2 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cimurs, R.; Lee, J.H.; Suh, I.H. Goal-Oriented Obstacle Avoidance with Deep Reinforcement Learning in Continuous Action Space. Electronics 2020, 9, 411. https://doi.org/10.3390/electronics9030411