Imitation learning from observations by minimizing inverse dynamics disagreement
December 2019
Article No.: 22, Pages 239–249
Abstract
This paper studies Learning from Observations (LfO), i.e., imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical because it can leverage previously inapplicable resources (e.g., videos), yet it is more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its difference from LfD from both theoretical and practical perspectives. We first prove that, under the modeling approach of GAIL [15], the gap between LfD and LfO lies in the disagreement between the inverse dynamics models of the imitator and the expert. More importantly, we show that the upper bound of this gap is given by a negative causal entropy, which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances the conventional LfO method by further bridging the gap to LfD. Extensive empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.
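To make the recipe in the abstract concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of the two ingredients it describes: a GAIL-style discriminator trained on state-only transitions (s, s'), and an imitation reward augmented with a causal-entropy-style bonus, reflecting the result that the LfO-to-LfD gap is upper-bounded by a negative causal entropy and can therefore be tightened model-free. All class/function names, network sizes, and the coefficient `entropy_coef` are illustrative assumptions.

```python
# Illustrative sketch only; names, shapes, and coefficients are assumptions.
import torch
import torch.nn as nn


class TransitionDiscriminator(nn.Module):
    """D(s, s'): probability that a state transition comes from the expert."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(torch.cat([s, s_next], dim=-1)))


def discriminator_loss(disc, expert_s, expert_s_next, agent_s, agent_s_next):
    """GAN-style loss on state-only transitions (no expert actions needed)."""
    expert_p = disc(expert_s, expert_s_next)
    agent_p = disc(agent_s, agent_s_next)
    return -(torch.log(expert_p + 1e-8).mean()
             + torch.log(1.0 - agent_p + 1e-8).mean())


def imitation_reward(disc, s, s_next, log_prob_action, entropy_coef=1e-2):
    """Adversarial reward plus a causal-entropy-style bonus (-log pi(a|s))."""
    with torch.no_grad():
        adversarial = -torch.log(1.0 - disc(s, s_next) + 1e-8).squeeze(-1)
    # The -log pi(a|s) term raises policy entropy; per the abstract, shrinking
    # the gap to LfD reduces to such a model-free entropy objective.
    return adversarial + entropy_coef * (-log_prob_action)
```

In practice, a reward of this form would be fed to a standard policy-gradient learner in place of the environment reward, with the discriminator and policy updated alternately as in GAIL.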
References
[1]
Pieter Abbeel and Andrew Y Ng. Apprenticeship learning via inverse reinforcement learning. In International Conference on Machine Learning (ICML), 2004.
[2]
Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177, 2018.
[3]
Christopher G Atkeson and Stefan Schaal. Robot learning from demonstration. In International Conference on Machine Learning (ICML), 1997.
[4]
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and Devon Hjelm. Mutual information neural estimation. In International Conference on Machine Learning (ICML), 2018.
[5]
Darrin C Bentivegna, Christopher G Atkeson, and Gordon Cheng. Learning tasks from observation and practice. Robotics and Autonomous Systems, 47(2-3):163-169, 2004.
[6]
Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
[7]
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
[8]
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning (ICML), 2016.
[9]
Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
[10]
Justin Fu, Katie Luo, and Sergey Levine. Learning robust rewards with adversarial inverse reinforcement learning. In International Conference on Learning Representations (ICLR), 2018.
[11]
Alessandro Giusti, Jérôme Guzzi, Dan C Ciresan, Fang-Lin He, Juan P Rodríguez, Flavio Fontana, Matthias Faessler, Christian Forster, Jürgen Schmidhuber, Gianni Di Caro, et al. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters (RA-L), 2016.
[12]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), 2014.
[13]
Tuomas Haarnoja, Aurick Zhou, Sehoon Ha, Jie Tan, George Tucker, and Sergey Levine. Learning to walk via deep reinforcement learning. In Robotics: Science and Systems (RSS), 2019.
[14]
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations (ICLR), 2019.
[15]
Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
[16]
Mingxuan Jing, Xiaojian Ma, Wenbing Huang, Fuchun Sun, and Huaping Liu. Task transfer by preference-based cost learning. In AAAI Conference on Artificial Intelligence (AAAI), 2019.
[17]
Bingyi Kang, Zequn Jie, and Jiashi Feng. Policy optimization with demonstrations. In International Conference on Machine Learning (ICML), 2018.
[18]
Beomjoon Kim and Joelle Pineau. Maximum mean discrepancy imitation learning. In Robotics: Science and Systems (RSS), 2013.
[19]
Kee-Eung Kim and Hyun Soo Park. Imitation learning via kernel mean embedding. In AAAI Conference on Artificial Intelligence (AAAI), 2018.
[20]
Rithesh Kumar, Anirudh Goyal, Aaron Courville, and Yoshua Bengio. Maximum entropy generators for energy-based models. arXiv preprint arXiv:1901.08508, 2019.
[21]
YuXuan Liu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. Imitation from observation: Learning to imitate behaviors from raw video via context translation. In IEEE International Conference on Robotics and Automation (ICRA), 2018.
[22]
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (ICML), 2016.
[23]
Andrew Y Ng, Stuart J Russell, et al. Algorithms for inverse reinforcement learning. In International Conference on Machine Learning (ICML), 2000.
[24]
Duy Nguyen-Tuong and Jan Peters. Model learning for robot control: a survey. Cognitive Processing, 12(4):319-340, 2011.
[25]
Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
[26]
Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics (TOG), 2018.
[27]
Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, and Sergey Levine. Variational discriminator bottleneck: Improving imitation learning, inverse RL, and GANs by constraining information flow. In International Conference on Learning Representations (ICLR), 2019.
[28]
Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 1994.
[29]
Stéphane Ross and Drew Bagnell. Efficient reductions for imitation learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[30]
Stephane Ross and J Andrew Bagnell. Agnostic system identification for model-based reinforcement learning. In International Conference on Machine Learning (ICML), 2012.
[31]
Stefan Schaal. Learning from demonstration. In Advances in Neural Information Processing Systems (NeurIPS), 1997.
[32]
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[33]
Bruno Siciliano and Oussama Khatib. Springer handbook of robotics. Springer, 2008.
[34]
Mark W Spong and Romeo Ortega. On adaptive inverse dynamics control of rigid robots. IEEE Transactions on Automatic Control (T-AC), 1990.
[35]
Bradly C Stadie, Pieter Abbeel, and Ilya Sutskever. Third-person imitation learning. In International Conference on Learning Representations (ICLR), 2017.
[36]
Wen Sun, Anirudh Vemula, Byron Boots, and Drew Bagnell. Provably efficient imitation learning from observation alone. In International Conference on Machine Learning (ICML), 2019.
[37]
Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 1998.
[38]
Umar Syed, Michael Bowling, and Robert E Schapire. Apprenticeship learning using linear programming. In International Conference on Machine Learning (ICML), 2008.
[39]
Faraz Torabi, Garrett Warnell, and Peter Stone. Behavioral cloning from observation. In International Joint Conference on Artificial Intelligence (IJCAI), 2018.
[40]
Faraz Torabi, Garrett Warnell, and Peter Stone. Generative adversarial imitation from observation. arXiv preprint arXiv:1807.06158, 2018.
[41]
Faraz Torabi, Garrett Warnell, and Peter Stone. Imitation learning from video by leveraging proprioception. In International Joint Conference on Artificial Intelligence (IJCAI), 2019.
Information
Published In
December 2019
15947 pages
Copyright © 2019 Neural Information Processing Systems Foundation, Inc.
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Publication History
Published: 08 December 2019
Qualifiers
- Chapter
- Research
- Refereed limited
Bibliometrics & Citations
Article Metrics
- Total Citations: 0
- Total Downloads: 76
- Downloads (last 12 months): 40
- Downloads (last 6 weeks): 13
Reflects downloads up to 19 Nov 2024