DOI: 10.1109/CASE49997.2022.9926540
Research Article

Expert Initialized Reinforcement Learning with Application to Robotic Assembly

Published: 20 August 2022

Abstract

This paper investigates the advantages and limitations of actor-critic reinforcement learning algorithms in an industrial setting. We compare and discuss Cycle of Learning, Deep Deterministic Policy Gradient, and Twin Delayed Deep Deterministic Policy Gradient with respect to performance both in simulation and on a real robot setup. Furthermore, we emphasize the importance and potential of combining demonstrated expert behavior with actor-critic reinforcement learning, used together with an admittance controller, to solve an industrial assembly task. Cycle of Learning and Twin Delayed Deep Deterministic Policy Gradient proved equally usable in simulation, while Cycle of Learning performed best on the real-world application owing to its behavior cloning loss, which enables the agent to learn rapidly. The results also demonstrate that incorporating an admittance controller is necessary to transfer the learned behavior to a real robot.
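
The combination described above can be made concrete with a short sketch. The snippet below is not the authors' implementation; it is a minimal PyTorch illustration, with assumed network sizes and an assumed weighting parameter bc_weight, of how a DDPG/TD3-style actor update can be augmented with a behavior cloning term on expert demonstrations, which is the kind of loss that Cycle of Learning combines with the reinforcement learning objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy: maps a state to an action in [-1, 1]."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Q-function: maps a state-action pair to a scalar value."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def actor_loss_with_bc(actor, critic, obs, demo_obs, demo_act, bc_weight=1.0):
    """Deterministic policy-gradient loss plus a behavior-cloning penalty.

    obs:      states sampled from the replay buffer
    demo_obs: states taken from expert demonstrations
    demo_act: the expert's actions in those states
    """
    # Standard DDPG/TD3 actor objective: maximize Q(s, pi(s)).
    rl_loss = -critic(obs, actor(obs)).mean()
    # Behavior cloning: pull the policy toward the demonstrated actions.
    bc_loss = F.mse_loss(actor(demo_obs), demo_act)
    return rl_loss + bc_weight * bc_loss

# Example usage with random tensors standing in for real transitions.
actor, critic = Actor(12, 6), Critic(12, 6)
obs = torch.randn(64, 12)
demo_obs, demo_act = torch.randn(32, 12), torch.rand(32, 6) * 2 - 1
loss = actor_loss_with_bc(actor, critic, obs, demo_obs, demo_act, bc_weight=0.5)
loss.backward()
```

In approaches of this kind, the weight on the behavior cloning term is often reduced as the agent improves, so that the expert data dominates early training and the reinforcement learning objective dominates later; the fixed bc_weight above is only for illustration.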



Published In

2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), August 2022, 1894 pages.
Publisher: IEEE Press
