Reinforcement Learning-Based Reactive Obstacle Avoidance Method for Redundant Manipulators
Figure 1. Obstacle avoidance for a redundant manipulator.
Figure 2. Reinforcement learning framework.
Figure 3. The closest distance vector for each link.
Figure 4. Framework of reactive obstacle avoidance.
Figure 5. Methods to generate obstacles: (a) obstacles randomly generated in the workspace; (b) an obstacle generated within the safe distance.
Figure 6. The 4-DOF planar redundant manipulator.
Figure 7. Network structure of the SAC.
Figure 8. (a) Training in Stage I; (b) training in Stage II.
Figure 9. Learning curve of episode reward.
Figure 10. Case A study: (a) obstacle avoidance of GPM; (b) obstacle avoidance of our method; (c) joint angle changes of GPM; (d) joint angle changes of our method; (e) comparison of closest distance to the obstacle; (f) comparison of manipulability.
Figure 11. Manipulability of the 4-DOF planar manipulator.
Figure 12. Comparison of manipulability movement.
Figure 13. Case B study: (a) obstacle avoidance of GPM; (b) obstacle avoidance of our method; (c) joint angle changes of GPM; (d) joint angle changes of our method; (e) comparison of closest distance to the obstacle; (f) comparison of manipulability.
Figure 14. Case C study: (a) obstacle avoidance of GPM; (b) obstacle avoidance of our method; (c) joint angle changes of GPM; (d) joint angle changes of our method; (e) comparison of closest distance to the obstacle; (f) comparison of manipulability.
Figure 15. Case D study: (a) obstacle avoidance of GPM; (b) obstacle avoidance of our method; (c) joint angle changes of GPM; (d) joint angle changes of our method; (e) comparison of closest distance to the obstacle; (f) comparison of manipulability.
Abstract
1. Introduction
- (1) A general DRL framework for obstacle avoidance of redundant manipulators is established, in which multiple constraints can be integrated easily.
- (2) An improved representation of the state is given for obstacle avoidance. The dimension of the state space is independent of the distribution of obstacles, so the learned obstacle avoidance strategy generalizes well.
- (3) The self-motion of redundant manipulators is utilized to reduce the action space from the entire joint space to the null space of the Jacobian matrix, which greatly improves the learning efficiency of DRL (see the sketch after this list).
- (4) A novel reward function is designed to cover multiple constraints. The manipulability of the manipulator is introduced, so the manipulator learns to avoid obstacles while staying away from joint singularities.
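Contributions (3) and (4) rest on two standard robotics tools: the null-space projector of the Jacobian and Yoshikawa's manipulability measure. The sketch below illustrates both for an assumed 4-DOF planar arm; the function names, link lengths, and interfaces are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def jacobian_planar_4dof(q, link_lengths=(1.0, 1.0, 1.0, 1.0)):
    """Position Jacobian (2x4) of a 4-DOF planar arm (illustrative kinematics)."""
    q = np.asarray(q, dtype=float)
    cum = np.cumsum(q)                      # cumulative joint angles
    J = np.zeros((2, 4))
    for i in range(4):
        for j in range(i + 1):              # link i is moved by joints 0..i
            J[0, j] += -link_lengths[i] * np.sin(cum[i])
            J[1, j] +=  link_lengths[i] * np.cos(cum[i])
    return J

def manipulability(J):
    """Yoshikawa's manipulability measure w = sqrt(det(J J^T))."""
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

def null_space_velocity(J, xdot_task, action):
    """Joint velocity = task-space tracking term + self-motion from the RL action.

    The action is projected into the null space of J, so it reshapes the arm's
    posture (for obstacle avoidance) without disturbing the end-effector task.
    """
    J_pinv = np.linalg.pinv(J)
    N = np.eye(J.shape[1]) - J_pinv @ J     # null-space projector
    return J_pinv @ xdot_task + N @ action
```

Because (I − J⁺J) maps any joint velocity into the Jacobian's null space, the end-effector velocity is unaffected by the policy's action, which is what allows the action space to shrink to the self-motion manifold.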
2. Problem Setup
3. Method
3.1. Reinforcement Learning
3.2. State Definition
3.3. Action Definition
3.4. Reward Function Design
3.5. Learning for Reactive Obstacle Avoidance
3.5.1. SAC Algorithm
Algorithm 1. Soft Actor-Critic (SAC)
1. Initialize policy network π_φ, Q networks Q_θ1, Q_θ2, and target Q networks Q_θ1′, Q_θ2′
2. Initialize replay buffer D
3. for each epoch do
4.   for each environment step do
5.     Sample a_t ~ π_φ(·|s_t), collect (s_t, a_t, r_t, s_{t+1})
6.     D ← D ∪ {(s_t, a_t, r_t, s_{t+1})}
7.   end for
8.   for each gradient step do
9.     θ_i ← θ_i − λ_Q ∇_{θ_i} J_Q(θ_i), for i ∈ {1, 2}
10.    φ ← φ − λ_π ∇_φ J_π(φ)
11.    α ← α − λ_α ∇_α J(α)
12.    θ_i′ ← ρ θ_i′ + (1 − ρ) θ_i, for i ∈ {1, 2}
13.  end for
14. end for
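For readers who prefer code to pseudocode, the following is a minimal PyTorch sketch of one SAC gradient step (steps 9–12 of Algorithm 1). It is not the authors' implementation: the `actor.sample(s)` and `q(s, a)` interfaces, the clipped double-Q target, and the optimizer layout are assumptions, chosen to match the hyperparameters reported later (discount 0.99, Polyak factor 0.995, entropy target −4).

```python
import torch
import torch.nn.functional as F

def sac_update(batch, actor, q1, q2, q1_targ, q2_targ, log_alpha,
               q_optim, pi_optim, alpha_optim,
               gamma=0.99, polyak=0.995, target_entropy=-4.0):
    """One SAC gradient step (steps 9-12 of Algorithm 1).

    Assumed interfaces: actor.sample(s) returns (action, log_prob); each Q
    network is called as q(s, a); q_optim holds the parameters of both Q
    networks; batch tensors are float, with `done` in {0, 1}.
    """
    s, a, r, s2, done = batch
    alpha = log_alpha.exp()

    # Critic update: soft Bellman backup with a clipped double-Q target
    with torch.no_grad():
        a2, logp_a2 = actor.sample(s2)
        q_targ = torch.min(q1_targ(s2, a2), q2_targ(s2, a2))
        backup = r + gamma * (1.0 - done) * (q_targ - alpha * logp_a2)
    q_loss = F.mse_loss(q1(s, a), backup) + F.mse_loss(q2(s, a), backup)
    q_optim.zero_grad(); q_loss.backward(); q_optim.step()

    # Actor update: maximize the entropy-regularized Q value
    a_new, logp = actor.sample(s)
    pi_loss = (alpha.detach() * logp - torch.min(q1(s, a_new), q2(s, a_new))).mean()
    pi_optim.zero_grad(); pi_loss.backward(); pi_optim.step()

    # Temperature update toward the entropy target
    alpha_loss = -(log_alpha * (logp.detach() + target_entropy)).mean()
    alpha_optim.zero_grad(); alpha_loss.backward(); alpha_optim.step()

    # Polyak averaging of the target Q networks
    with torch.no_grad():
        for q, q_t in ((q1, q1_targ), (q2, q2_targ)):
            for p, p_t in zip(q.parameters(), q_t.parameters()):
                p_t.mul_(polyak).add_((1.0 - polyak) * p)
```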
3.5.2. RL-Based Reactive Obstacle Avoidance Algorithm for Redundant Manipulators
Algorithm 2. Proposed Obstacle Avoidance Algorithm for Redundant Manipulators
1. Obtain the state s_t
2. Calculate the minimum distance d_min between the manipulator and the obstacles
3. while the task is not finished do
4.   Obtain the self-motion action a_t from the trained policy π_φ(s_t)
5.   Compute the joint velocity command by projecting a_t into the null space of the Jacobian
6.   if the joint velocity command is out of the joint velocity range, then
7.     Scale the command back into the allowed range
8.   end if
9.   Execute the command for one control step
10.  Obtain the new state s_{t+1}
11.  Update the minimum distance d_min
12. end while
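A compact Python sketch of the control loop in Algorithm 2 is given below. The `env`/`policy` interface, the time step, and the uniform scaling used to respect the joint-velocity limits are all assumptions for illustration; the paper's exact state vector, stopping condition, and limit-handling rule should be taken from the text above.

```python
import numpy as np

def reactive_avoidance_loop(env, policy, dt=0.02, qdot_max=np.deg2rad(20)):
    """Control loop in the spirit of Algorithm 2.

    `env` and `policy` are hypothetical interfaces: env exposes the state,
    Jacobian, desired task-space velocity, and closest obstacle distance;
    policy is the trained SAC actor returning a self-motion action.
    """
    s = env.get_state()
    d_min = env.closest_obstacle_distance()
    while not env.task_finished():
        a = policy(s)                                   # self-motion action
        J = env.jacobian()
        J_pinv = np.linalg.pinv(J)
        # Task-space tracking plus self-motion projected into the null space of J
        qdot = J_pinv @ env.task_velocity() + (np.eye(J.shape[1]) - J_pinv @ J) @ a
        # Respect joint velocity limits by uniform scaling (preserves direction)
        scale = np.max(np.abs(qdot)) / qdot_max
        if scale > 1.0:
            qdot = qdot / scale
        env.step(qdot * dt)                             # integrate one control step
        s = env.get_state()
        d_min = env.closest_obstacle_distance()         # monitored each step
    return d_min
```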
3.5.3. Training Strategy
4. Results and Discussion
4.1. System Description
4.2. Parameters Selection
4.3. Training
4.4. Simulation and Discussion
- Case study in Scenario I
- Case study in Scenario II
- More comparisons
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Joint | Min Angle (deg) | Max Angle (deg) | Min Velocity (deg/s) | Max Velocity (deg/s) |
|---|---|---|---|---|
| Joint 1 | −120 | 120 | −20 | 20 |
| Joint 2 | −160 | 160 | −20 | 20 |
| Joint 3 | −160 | 160 | −20 | 20 |
| Joint 4 | −160 | 160 | −20 | 20 |
| Parameter | Value |
|---|---|
| Optimizer | Adam |
| Learning rate | 0.001 |
| Discount factor | 0.99 |
| Polyak update factor | 0.995 |
| Entropy target | −4 |
| Replay buffer size | 1 × 10^5 |
| Mini-batch size | 100 |
| Max episode length | 400 |
| | 1 |
| | 0.2 |
| | 0.05 |
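For convenience, the named hyperparameters in the table can be collected into a single configuration object. The dictionary below covers only the entries whose names appear in the table, and the key names are our own, not the paper's notation.

```python
# Illustrative configuration dictionary for the SAC training setup above
SAC_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "discount_factor": 0.99,        # gamma
    "polyak_update_factor": 0.995,  # target-network averaging coefficient
    "entropy_target": -4,           # commonly set to -dim(action space)
    "replay_buffer_size": 100_000,
    "mini_batch_size": 100,
    "max_episode_length": 400,
}
```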
| Comparison | GPM | Ours |
|---|---|---|
| Success rate in Scenario I | 100% | 100% |
| Success rate in Scenario II | 77.4% | 96.8% |
| Manipulability in Scenario I | 3.78 | 3.95 |
| Manipulability in Scenario II | 3.63 | 3.72 |
| Time to calculate the joint velocity in Scenario I | 1.484 ms | 1.155 ms |
| Time to calculate the joint velocity in Scenario II | 2.048 ms | 1.372 ms |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shen, Y.; Jia, Q.; Huang, Z.; Wang, R.; Fei, J.; Chen, G. Reinforcement Learning-Based Reactive Obstacle Avoidance Method for Redundant Manipulators. Entropy 2022, 24, 279. https://doi.org/10.3390/e24020279