DOI: 10.5555/3523760.3523785

MIND MELD: Personalized Meta-Learning for Robot-Centric Imitation Learning

Published: 07 March 2022

Abstract

Learning from demonstration (LfD) techniques seek to enable users without computer programming experience to teach robots novel tasks. LfD approaches are generally either human-centric or robot-centric. While human-centric learning is intuitive, it suffers from performance degradation due to covariate shift. Robot-centric approaches, such as Dataset Aggregation (DAgger), address covariate shift but can struggle to learn from suboptimal human teachers. To create a more human-aware version of robot-centric LfD, we present Mutual Information-driven Meta-learning from Demonstration (MIND MELD). MIND MELD meta-learns a mapping from suboptimal, heterogeneous human feedback to optimal labels, thereby improving the learning signal for robot-centric LfD. The key to our approach is learning an informative, personalized embedding via mutual information maximization through variational inference. The embedding then informs a mapping from human-provided labels to optimal labels. We evaluate our framework in a human-subjects experiment, demonstrating that our approach improves the corrective labels provided by human demonstrators. Our framework outperforms baselines in ability to reach the goal (p < .001), average distance from the goal (p = .006), and various subjective ratings (p = .008).
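To make the core idea concrete, the following is a deliberately simplified sketch, not the paper's actual model: MIND MELD uses a learned variational embedding, whereas here each person's "embedding" is reduced to two affine parameters (gain and offset) fit from a few calibration pairs. All function names and the affine-distortion assumption are hypothetical, chosen only to illustrate how a personalized mapping can translate suboptimal, heterogeneous human labels toward optimal ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_person_embedding(human_labels, optimal_labels):
    """Fit a per-person 'embedding': least-squares solution of
    optimal ~= gain * human + offset (toy stand-in for the learned
    personalized embedding in the paper)."""
    A = np.stack([human_labels, np.ones_like(human_labels)], axis=1)
    gain, offset = np.linalg.lstsq(A, optimal_labels, rcond=None)[0]
    return gain, offset

def correct_labels(human_labels, embedding):
    """Map this person's suboptimal labels toward optimal ones."""
    gain, offset = embedding
    return gain * np.asarray(human_labels) + offset

# Simulated teacher who systematically over-corrects:
# their label = 1.5 * optimal + 0.2, plus small noise.
true_optimal = rng.uniform(-1.0, 1.0, size=20)
human = 1.5 * true_optimal + 0.2 + 0.01 * rng.standard_normal(20)

emb = fit_person_embedding(human, true_optimal)
corrected = correct_labels(human, emb)

# Personalized correction should shrink the label error.
print(np.abs(corrected - true_optimal).mean()
      < np.abs(human - true_optimal).mean())  # prints True
```

In the paper the calibration signal comes from a meta-learning phase over many teachers; the sketch above only shows the per-person "map feedback to better labels" step that the learned embedding enables.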

Supplemental Material: ZIP file


Cited By

  • (2024) Enhancing Safety in Learning from Demonstration Algorithms via Control Barrier Function Shielding. Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 10.1145/3610977.3635002, pages 820-829. Online publication date: 11-Mar-2024.
  • (2024) Towards Balancing Preference and Performance through Adaptive Personalized Explainability. Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 10.1145/3610977.3635000, pages 658-668. Online publication date: 11-Mar-2024.
  • (2023) Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction. ACM Transactions on Human-Robot Interaction, 10.1145/3623384. Online publication date: 22-Sep-2023.


    Published In

    HRI '22: Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction
    March 2022, 1353 pages

    Publisher

    IEEE Press

    Author Tags

    1. learning from demonstration
    2. personalization
    3. meta-learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation
    • NASA Early Career Fellowship
    • MIT Lincoln Laboratory
    • Georgia Institute of Technology State Funding
    • Konica Minolta

    Conference

    HRI '22

    Acceptance Rates

    Overall Acceptance Rate 268 of 1,124 submissions, 24%

    Article Metrics

    • Downloads (Last 12 months): 60
    • Downloads (Last 6 weeks): 4

    Reflects downloads up to 30 Nov 2024
