Imitation learning from observations by minimizing inverse dynamics disagreement
December 2019
Article No.: 22, Pages 239–249
Abstract
This paper studies Learning from Observations (LfO), i.e., imitation learning with access to state-only demonstrations. In contrast to Learning from Demonstration (LfD), which involves both action and state supervision, LfO is more practical because it can leverage previously inapplicable resources (e.g., videos), yet it is more challenging due to the incomplete expert guidance. In this paper, we investigate LfO and its difference from LfD from both theoretical and practical perspectives. We first prove that, under the modeling approach of GAIL [15], the gap between LfD and LfO lies in the disagreement between the inverse dynamics models of the imitator and the expert. More importantly, we show that the upper bound of this gap is given by a negative causal entropy, which can be minimized in a model-free way. We term our method Inverse-Dynamics-Disagreement-Minimization (IDDM); it enhances the conventional LfO method by further bridging the gap to LfD. Extensive empirical results on challenging benchmarks indicate that our method attains consistent improvements over other LfO counterparts.
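To make the recipe in the abstract concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of the two ingredients it describes: a GAIL-style discriminator trained on state-only transitions (s, s'), and an imitation reward augmented with a causal-entropy-style bonus, reflecting the result that the LfO-to-LfD gap is upper-bounded by a negative causal entropy and can therefore be tightened model-free. All class/function names, network sizes, and the coefficient `entropy_coef` are illustrative assumptions.

```python
# Illustrative sketch only; names, shapes, and coefficients are assumptions.
import torch
import torch.nn as nn


class TransitionDiscriminator(nn.Module):
    """D(s, s'): probability that a state transition comes from the expert."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(torch.cat([s, s_next], dim=-1)))


def discriminator_loss(disc, expert_s, expert_s_next, agent_s, agent_s_next):
    """GAN-style loss on state-only transitions (no expert actions needed)."""
    expert_p = disc(expert_s, expert_s_next)
    agent_p = disc(agent_s, agent_s_next)
    return -(torch.log(expert_p + 1e-8).mean()
             + torch.log(1.0 - agent_p + 1e-8).mean())


def imitation_reward(disc, s, s_next, log_prob_action, entropy_coef=1e-2):
    """Adversarial reward plus a causal-entropy-style bonus (-log pi(a|s))."""
    with torch.no_grad():
        adversarial = -torch.log(1.0 - disc(s, s_next) + 1e-8).squeeze(-1)
    # The -log pi(a|s) term raises policy entropy; per the abstract, shrinking
    # the gap to LfD reduces to such a model-free entropy objective.
    return adversarial + entropy_coef * (-log_prob_action)
```

In practice, a reward of this form would be fed to a standard policy-gradient learner in place of the environment reward, with the discriminator and policy updated alternately as in GAIL.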
References
[1]
Pieter Abbeel and Andrew Y Ng. Apprenticeship learning via inverse reinforcement learning. In International Conference on Machine Learning (ICML), 2004.
[2]
Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. arXiv preprint arXiv:1808.00177, 2018.
[3]
Christopher G Atkeson and Stefan Schaal. Robot learning from demonstration. In International Conference on Machine Learning (ICML), 1997.
[4]
Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and Devon Hjelm. Mutual information neural estimation. In International Conference on Machine Learning (ICML), 2018.
[5]
Darrin C Bentivegna, Christopher G Atkeson, and Gordon Cheng. Learning tasks from observation and practice. Robotics and Autonomous Systems, 47(2-3):163-169, 2004.
[6]
Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
[7]
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
[8]
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning (ICML), 2016.
[9]
Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
[10]
Justin Fu, Katie Luo, and Sergey Levine. Learning robust rewards with adversarial inverse reinforcement learning. In International Conference on Learning Representations (ICLR), 2018.
[11]
Alessandro Giusti, Jérôme Guzzi, Dan C Ciresan, Fang-Lin He, Juan P Rodríguez, Flavio Fontana, Matthias Faessler, Christian Forster, Jürgen Schmidhuber, Gianni Di Caro, et al. A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters (RA-L), 2016.
[12]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), 2014.
[13]
Tuomas Haarnoja, Aurick Zhou, Sehoon Ha, Jie Tan, George Tucker, and Sergey Levine. Learning to walk via deep reinforcement learning. In Robotics: Science and Systems (RSS), 2019.
[14]
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations (ICLR), 2019.
[15]
Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
[16]
Mingxuan Jing, Xiaojian Ma, Wenbing Huang, Fuchun Sun, and Huaping Liu. Task transfer by preference-based cost learning. In AAAI Conference on Artificial Intelligence (AAAI), 2019.
[17]
Bingyi Kang, Zequn Jie, and Jiashi Feng. Policy optimization with demonstrations. In International Conference on Machine Learning (ICML), 2018.
[18]
Beomjoon Kim and Joelle Pineau. Maximum mean discrepancy imitation learning. In Robotics: Science and Systems (RSS), 2013.
[19]
Kee-Eung Kim and Hyun Soo Park. Imitation learning via kernel mean embedding. In AAAI Conference on Artificial Intelligence (AAAI), 2018.
[20]
Rithesh Kumar, Anirudh Goyal, Aaron Courville, and Yoshua Bengio. Maximum entropy generators for energy-based models. arXiv preprint arXiv:1901.08508, 2019.
[21]
YuXuan Liu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. Imitation from observation: Learning to imitate behaviors from raw video via context translation. In IEEE International Conference on Robotics and Automation (ICRA), 2018.
[22]
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (ICML), 2016.
[23]
Andrew Y Ng, Stuart J Russell, et al. Algorithms for inverse reinforcement learning. In International Conference on Machine Learning (ICML), 2000.
[24]
Duy Nguyen-Tuong and Jan Peters. Model learning for robot control: a survey. Cognitive Processing, 12(4):319-340, 2011.
[25]
Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems (NeurIPS), 2016.
[26]
Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics (TOG), 2018.
[27]
Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, and Sergey Levine. Variational discriminator bottleneck: Improving imitation learning, inverse RL, and GANs by constraining information flow. In International Conference on Learning Representations (ICLR), 2019.
[28]
Martin L Puterman. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons, 1994.
[29]
Stéphane Ross and Drew Bagnell. Efficient reductions for imitation learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[30]
Stephane Ross and J Andrew Bagnell. Agnostic system identification for model-based reinforcement learning. In International Conference on Machine Learning (ICML), 2012.
[31]
Stefan Schaal. Learning from demonstration. In Advances in Neural Information Processing Systems (NeurIPS), 1997.
[32]
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[33]
Bruno Siciliano and Oussama Khatib. Springer handbook of robotics. Springer, 2008.
[34]
Mark W Spong and Romeo Ortega. On adaptive inverse dynamics control of rigid robots. IEEE Transactions on Automatic Control (T-AC), 1990.
[35]
Bradly C Stadie, Pieter Abbeel, and Ilya Sutskever. Third-person imitation learning. In International Conference on Learning Representations (ICLR), 2017.
[36]
Wen Sun, Anirudh Vemula, Byron Boots, and Drew Bagnell. Provably efficient imitation learning from observation alone. In International Conference on Machine Learning (ICML), 2019.
[37]
Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT Press, 1998.
[38]
Umar Syed, Michael Bowling, and Robert E Schapire. Apprenticeship learning using linear programming. In International Conference on Machine Learning (ICML), 2008.
[39]
Faraz Torabi, Garrett Warnell, and Peter Stone. Behavioral cloning from observation. In International Joint Conference on Artificial Intelligence (IJCAI), 2018.
[40]
Faraz Torabi, Garrett Warnell, and Peter Stone. Generative adversarial imitation from observation. arXiv preprint arXiv:1807.06158, 2018.
[41]
Faraz Torabi, Garrett Warnell, and Peter Stone. Imitation learning from video by leveraging proprioception. In International Joint Conference on Artificial Intelligence (IJCAI), 2019.
Information
Published In
December 2019
15947 pages
Copyright © 2019 Neural Information Processing Systems Foundation, Inc.
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Publication History
Published: 08 December 2019
Qualifiers
- Chapter
- Research
- Refereed limited
Bibliometrics & Citations
Article Metrics
- Total Citations: 0
- Total Downloads: 76
- Downloads (last 12 months): 40
- Downloads (last 6 weeks): 13
Reflects downloads up to 19 Nov 2024