Reinforcement-Learning-Based Route Generation for Heavy-Traffic Autonomous Mobile Robot Systems
Figure 1. Input layout and the corresponding occupancy grid: (a) layout with pick-up/drop-off locations (yellow circles); white represents the driving area and black represents walls or other static obstacles; (b) the corresponding occupancy grid with the reinforcement-learning agent searching for routes between the pick-up and drop-off locations (the light blue arrows around the agent illustrate the four possible actions).
Figure 2. Example layouts to demonstrate computational complexity: (a) 2 × 2, four pick-up/drop-off locations (four pick-up/drop-off locations means there are 12 directed routes to be found between them); (b) 3 × 3, four pick-up/drop-off locations; (c) 4 × 4, four pick-up/drop-off locations. Transitions between cells are possible only in the directions up, down, left, and right.
Figure 3. The expected result.
Figure 4. (a) Layout for the experiments and (b) the corresponding occupancy grid with a cell size of 1 m.
Figure 5. Physics-based simulation developed using the Robot Operating System (ROS) and the Gazebo simulator.
Figure 6. Simulation experiments for validation of the proposed method: (a) proposed approach; (b) baseline approach.
Figure 7. Other layouts: (a) four pick-up/drop-off locations, one wall, 12 × 12 m; (b) four pick-up/drop-off locations, multiple walls, 12 × 12 m; (c) eight pick-up/drop-off locations, 30 × 20 m; and (d) seventeen pick-up/drop-off locations, 30 × 20 m. Yellow circles are pick-up/drop-off locations, white represents the driving area, and black represents walls or other static obstacles.
Figure 8. Episode reward during training of the RL agent.
Figure 9. Routes generated by the trained RL agent. The coloured squares (blue, red, green, and yellow) indicate pick-up/drop-off locations, and the colour of a route indicates the pick-up location to which it belongs. All four locations represent both pick-up and drop-off locations (robots may travel in both directions for any pair of these locations). When a robot needs to travel from a pick-up location to a drop-off location, it follows the route of the same colour as the pick-up location.
Figure 10. Performance of the proposed and the baseline approach. Above: the performance calculated according to Equation (5). Below: the proportion of tests in which the multi-AMR system failed to finish the given tasks due to severe conflicts between the robots.
Figure 11. Results on different layouts: (a) four pick-up/drop-off locations, one wall, 12 × 12 m; (b) four pick-up/drop-off locations, multiple walls, 12 × 12 m; (c) eight pick-up/drop-off locations, 30 × 20 m; and (d) seventeen pick-up/drop-off locations, 30 × 20 m. The coloured squares (blue, red, green, yellow, etc.) indicate pick-up/drop-off locations, the colour of a route indicates the pick-up location to which it belongs, walls/obstacles are shown in light grey, and the driving surface in dark grey.
Figure 12. Episode reward during training and training times for the different layouts. Tests were run on a mobile workstation with an Intel Core i7-9750H processor (12 × 2.6 GHz) and 32 GB RAM.
Figure 13. Typical failures of the shortest-path-based (baseline) approach with a large number of AMRs in the system.
Figure 14. Circular traffic in a counter-clockwise direction around the wall.
Abstract
1. Introduction
2. Reinforcement-Learning-Based Route Generation
3. Experimental Validation
3.1. Case Study Description
3.2. Reinforcement-Learning Parameters
3.3. Simulation, Baseline Method, and Other Parameters
- Right-of-way rule (since there may still be some intersections);
- Leave enough space in front of the robot (stop and reverse very slowly if another robot is closer than 2.7 m in front of the robot on the same route segment);
- Stop (and reverse slowly) if the front of the mobile robot gets too close to another mobile robot (a minimal sketch of these checks is given after this list).
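The sketch below illustrates how these rules might be checked on board each robot. Only the 2.7 m headway value comes from the rules above; the `RobotState` structure, the 45° forward cone, and the `STOP_DISTANCE_M` threshold are illustrative assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass

HEADWAY_M = 2.7        # minimum free space in front of the robot (from the rules above)
STOP_DISTANCE_M = 0.8  # illustrative "too close" threshold, not taken from the paper

@dataclass
class RobotState:
    x: float          # position in the map frame (m)
    y: float
    yaw: float        # heading (rad)
    segment_id: int   # ID of the route segment the robot currently follows

def is_in_front(me: RobotState, other: RobotState) -> bool:
    """True if `other` lies roughly ahead of `me` (within a 45-degree cone)."""
    bearing = math.atan2(other.y - me.y, other.x - me.x)
    diff = math.atan2(math.sin(bearing - me.yaw), math.cos(bearing - me.yaw))
    return abs(diff) < math.pi / 4

def traffic_action(me: RobotState, others: list[RobotState]) -> str:
    """Return 'go', 'stop', or 'reverse_slowly' according to the simple rules."""
    for other in others:
        if not is_in_front(me, other):
            continue                       # right-of-way at intersections handled elsewhere
        dist = math.hypot(other.x - me.x, other.y - me.y)
        if dist < STOP_DISTANCE_M:
            return "reverse_slowly"        # front of the robot too close to another robot
        if other.segment_id == me.segment_id and dist < HEADWAY_M:
            return "stop"                  # keep enough space on the same route segment
    return "go"
```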
3.4. Different Layouts
4. Results
4.1. Generating Routes
4.2. Comparison of the Proposed to Baseline Approach
4.3. Different Layouts
5. Discussion
- automatic generation of routes;
- scalability in terms of the number of mobile robots (the number of vehicles in the system does not affect the difficulty and time of calculating routes/actions);
- in the generation of routes, the relative importance of each criterion can be adjusted for the specific application of the proposed method;
- a good system performance can be achieved with only a few simple traffic rules;
- the use of the proposed navigation method allows a more accurate (and easier) prediction of the future states of mobile robots (this property could also be exploited in the extension of the method by adding the functionality offered by state-of-the-art on-line path planning methods);
- there is only a small number of possible ways in which mobile robots can meet (e.g., at an intersection, on a straight line, or at a turn), which makes it easier to increase the robustness of the system;
- additional communication between the robots is minimal (the robots only need to share their current location and orientation);
- compared to the baseline approach, the use of computational resources is minimal (in addition to following the route, the robot uses a simple algorithm to check only the current geometric relations with other mobile robots, i.e., relative position and orientation);
- the proposed method allows for a relatively simple decentralisation of a multi-AMR system.
- the relative importance of each route can be adjusted (some routes may be more important or frequent than others);
- the relative importance of each robot or task could be set by dynamically adjusting the traffic rules;
- the difficulty and time of route computation depend on the number of routes to be found;
- pick-up/drop-off locations must be determined in advance (e.g., if a pick-up/drop-off location later appears elsewhere, the default navigation system must be used to go to that location);
- the routes, once found, are static.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
| Example Layout | Total Nr. of Combinations of Routes | Estimated Time * |
|---|---|---|
| 2 × 2 (Figure 2a) | 4096 | ≈1 ms |
| 3 × 3 (Figure 2b) | 4.444945756 × 10¹² | ≈13 years |
| 4 × 4 (Figure 2c) | 1.15513119 × 10²⁷ | ≈4.4 × 10¹⁵ years |
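As a sanity check on the first row: if all four cells of the 2 × 2 layout act as pick-up/drop-off locations, each of the 12 directed pairs admits exactly two simple (non-revisiting) paths, so the number of route combinations is 2¹² = 4096. The brute-force sketch below reproduces this count; it is only illustrative, since the exact counting convention used for the larger layouts (which cells are locations, whether paths may pass through other locations, any maximum path length) is not restated here.

```python
from itertools import permutations

def simple_paths(grid_w, grid_h, start, goal):
    """Count simple (non-revisiting) 4-connected paths from start to goal."""
    def dfs(cell, visited):
        if cell == goal:
            return 1
        x, y = cell
        total = 0
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h and nxt not in visited:
                total += dfs(nxt, visited | {nxt})
        return total
    return dfs(start, {start})

# 2 x 2 layout, all four cells are pick-up/drop-off locations
locations = [(0, 0), (0, 1), (1, 0), (1, 1)]
combinations = 1
for a, b in permutations(locations, 2):      # 12 directed pairs
    combinations *= simple_paths(2, 2, a, b)
print(combinations)                          # -> 4096
```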
| Variable | Case 1 | Value in Case 1 | Case 2 | Value in Case 2 |
|---|---|---|---|---|
| r_step | In each new step. | R_step * | / | / |
| r_sp | The shortest path length decreased. | R_sp | The shortest path length increased. | −R_sp |
| r_position | The agent hits a wall or other static obstacle, or is at a wrong drop-off location. | −1 | Otherwise. | 0 |
| r_visited | The agent is at a location already visited in the current episode. | R_visited | Otherwise. | 0 |
| r_visit.c.r. | The agent is at a location it already visited while searching for the current route. | −1 | Otherwise. | 0 |
| r_opp | The just-generated part of the route is directed opposite to another part of the already existing routes at the agent's current position. | −1 | Otherwise. | 0 |
| r_cross | Crossing another existing route. | n_cross · R_cross | Otherwise. | 0 |
| r_turn | Making a turn. | R_turn | Otherwise. | 0 |
| r_max.stp. | The number of steps for the current route exceeded the predefined maximum allowed number of steps for a single route (n_stp.c.r. > N_stp.max). | −1 | Otherwise. | 0 |
| r_dropoff | The agent arrives at the drop-off location of the current route. | 1 | Otherwise. | 0 |
| Value Range |
|---|
| −1 ≪ R_step < 0 |
| 0 < R_sp ≪ 1 |
| −1 ≪ R_visited < 0 |
| −1 ≪ R_cross < 0 |
| −1 ≪ R_turn < 0 |
| Parameter | Value |
|---|---|
| R_step | −0.001 |
| R_sp | 0.05 |
| R_visited | −0.005 |
| R_cross | −0.05 |
| R_turn | −0.005 |
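Read together, the tables above describe a per-step reward for the route-generating agent. The sketch below encodes one plausible reading in which the individual components are simply summed; the event flags (hit_or_wrong_dropoff, n_cross, etc.) would be provided by the route-generation environment, and both the flag names and the summation are our assumptions rather than the authors' implementation.

```python
# Reward shaping constants taken from the parameter table above
R_STEP, R_SP, R_VISITED, R_CROSS, R_TURN = -0.001, 0.05, -0.005, -0.05, -0.005

def step_reward(sp_delta, hit_or_wrong_dropoff, revisited_episode,
                revisited_current_route, opposite_direction, n_cross,
                made_turn, exceeded_max_steps, reached_dropoff):
    """Sum of the reward components for one agent step (one reading of the tables)."""
    r = R_STEP                                   # r_step: paid on every new step
    if sp_delta < 0:                             # r_sp: shortest path to the goal decreased
        r += R_SP
    elif sp_delta > 0:                           # ... or increased
        r -= R_SP
    if hit_or_wrong_dropoff:                     # r_position
        r += -1.0
    if revisited_episode:                        # r_visited
        r += R_VISITED
    if revisited_current_route:                  # r_visit.c.r.
        r += -1.0
    if opposite_direction:                       # r_opp
        r += -1.0
    r += n_cross * R_CROSS                       # r_cross: per crossed route
    if made_turn:                                # r_turn
        r += R_TURN
    if exceeded_max_steps:                       # r_max.stp.
        r += -1.0
    if reached_dropoff:                          # r_dropoff
        r += 1.0
    return r
```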
| Method | Min. Route Length (m) | Max. Route Length (m) | Route Length Mean ± St. Dev. (m) |
|---|---|---|---|
| shortest-path-based | 9.02 | 20.44 | 13.84 ± 3.91 |
| proposed (RL-based) | 12.00 | 29.00 | 18.83 ± 5.31 |
| Layout | Total Nr. of Routes | Min. Route Length (m) | Max. Route Length (m) | Route Length Mean ± St. Dev. (m) | Nr. of Opposite Route Segments * |
|---|---|---|---|---|---|
| Four pick-up/drop-off locations, one wall, 12 × 12 m | 12 | 6.00 | 18.00 | 12.17 ± 4.84 | 0 |
| Four pick-up/drop-off locations, multiple walls, 12 × 12 m | 12 | 12.00 | 29.00 | 18.83 ± 5.31 | 0 |
| Eight pick-up/drop-off locations, 30 × 20 m | 56 | 5.00 | 48.00 | 25.07 ± 11.17 | 3 |
| Seventeen pick-up/drop-off locations, 30 × 20 m | 272 | 3.00 | 66.00 | 24.57 ± 10.83 | 8 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).