DOI: 10.1109/CASE49997.2022.9926540
Research Article

Expert Initialized Reinforcement Learning with Application to Robotic Assembly

Published: 20 August 2022

Abstract

This paper investigates the advantages and limitations of actor-critic reinforcement learning algorithms in an industrial setting. We compare and discuss Cycle of Learning, Deep Deterministic Policy Gradient, and Twin Delayed Deep Deterministic Policy Gradient with respect to performance both in simulation and on a real robot setup. Furthermore, we emphasize the importance and potential of combining demonstrated expert behavior with actor-critic reinforcement learning, used together with an admittance controller, to solve an industrial assembly task. Cycle of Learning and Twin Delayed Deep Deterministic Policy Gradient proved equally usable in simulation, while Cycle of Learning performed best on the real-world application owing to its behavior cloning loss, which enables the agent to learn rapidly. The results also demonstrate that incorporating an admittance controller is necessary to transfer the learned behavior to a real robot.
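
The combination described above can be made concrete with a short sketch. The snippet below is not the authors' implementation; it is a minimal PyTorch illustration, with assumed network sizes and an assumed weighting parameter bc_weight, of how a DDPG/TD3-style actor update can be augmented with a behavior cloning term on expert demonstrations, which is the kind of loss that Cycle of Learning combines with the reinforcement learning objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy: maps a state to an action in [-1, 1]."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Q-function: maps a state-action pair to a scalar value."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def actor_loss_with_bc(actor, critic, obs, demo_obs, demo_act, bc_weight=1.0):
    """Deterministic policy-gradient loss plus a behavior-cloning penalty.

    obs:      states sampled from the replay buffer
    demo_obs: states taken from expert demonstrations
    demo_act: the expert's actions in those states
    """
    # Standard DDPG/TD3 actor objective: maximize Q(s, pi(s)).
    rl_loss = -critic(obs, actor(obs)).mean()
    # Behavior cloning: pull the policy toward the demonstrated actions.
    bc_loss = F.mse_loss(actor(demo_obs), demo_act)
    return rl_loss + bc_weight * bc_loss

# Example usage with random tensors standing in for real transitions.
actor, critic = Actor(12, 6), Critic(12, 6)
obs = torch.randn(64, 12)
demo_obs, demo_act = torch.randn(32, 12), torch.rand(32, 6) * 2 - 1
loss = actor_loss_with_bc(actor, critic, obs, demo_obs, demo_act, bc_weight=0.5)
loss.backward()
```

In approaches of this kind, the weight on the behavior cloning term is often reduced as the agent improves, so that the expert data dominates early training and the reinforcement learning objective dominates later; the fixed bc_weight above is only for illustration.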



Published In

2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), August 2022, 1894 pages.
Publisher: IEEE Press
