DOI: 10.5555/3523760.3523785

MIND MELD: Personalized Meta-Learning for Robot-Centric Imitation Learning

Published: 07 March 2022

Abstract

Learning from demonstration (LfD) techniques seek to enable users without computer programming experience to teach robots novel tasks. LfD approaches are generally either human-centric or robot-centric. While human-centric learning is intuitive, it suffers from performance degradation due to covariate shift. Robot-centric approaches, such as Dataset Aggregation (DAgger), address covariate shift but can struggle to learn from suboptimal human teachers. To create a more human-aware version of robot-centric LfD, we present Mutual Information-driven Meta-learning from Demonstration (MIND MELD). MIND MELD meta-learns a mapping from suboptimal, heterogeneous human feedback to optimal labels, thereby improving the learning signal for robot-centric LfD. The key to our approach is learning an informative, personalized embedding via mutual information maximization through variational inference. The embedding then informs a mapping from human-provided labels to optimal labels. We evaluate our framework in a human-subjects experiment, demonstrating that our approach improves the corrective labels provided by human demonstrators. Our framework outperforms baselines in ability to reach the goal (p < .001), average distance from the goal (p = .006), and various subjective ratings (p = .008).
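To make the core idea concrete, the following is a deliberately simplified sketch, not the paper's actual model: MIND MELD uses a learned variational embedding, whereas here each person's "embedding" is reduced to two affine parameters (gain and offset) fit from a few calibration pairs. All function names and the affine-distortion assumption are hypothetical, chosen only to illustrate how a personalized mapping can translate suboptimal, heterogeneous human labels toward optimal ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_person_embedding(human_labels, optimal_labels):
    """Fit a per-person 'embedding': least-squares solution of
    optimal ~= gain * human + offset (toy stand-in for the learned
    personalized embedding in the paper)."""
    A = np.stack([human_labels, np.ones_like(human_labels)], axis=1)
    gain, offset = np.linalg.lstsq(A, optimal_labels, rcond=None)[0]
    return gain, offset

def correct_labels(human_labels, embedding):
    """Map this person's suboptimal labels toward optimal ones."""
    gain, offset = embedding
    return gain * np.asarray(human_labels) + offset

# Simulated teacher who systematically over-corrects:
# their label = 1.5 * optimal + 0.2, plus small noise.
true_optimal = rng.uniform(-1.0, 1.0, size=20)
human = 1.5 * true_optimal + 0.2 + 0.01 * rng.standard_normal(20)

emb = fit_person_embedding(human, true_optimal)
corrected = correct_labels(human, emb)

# Personalized correction should shrink the label error.
print(np.abs(corrected - true_optimal).mean()
      < np.abs(human - true_optimal).mean())  # prints True
```

In the paper the calibration signal comes from a meta-learning phase over many teachers; the sketch above only shows the per-person "map feedback to better labels" step that the learned embedding enables.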

Supplemental Material: ZIP file


Cited By

  • (2024) Enhancing Safety in Learning from Demonstration Algorithms via Control Barrier Function Shielding. Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 10.1145/3610977.3635002, pages 820-829. Online publication date: 11-Mar-2024.
  • (2024) Towards Balancing Preference and Performance through Adaptive Personalized Explainability. Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 10.1145/3610977.3635000, pages 658-668. Online publication date: 11-Mar-2024.
  • (2023) Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction. ACM Transactions on Human-Robot Interaction, 10.1145/3623384. Online publication date: 22-Sep-2023.


    Published In

    HRI '22: Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction
    March 2022, 1353 pages

    Publisher

    IEEE Press

    Author Tags

    1. learning from demonstration
    2. personalization
    3. meta-learning

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation
    • NASA Early Career Fellowship
    • MIT Lincoln Laboratory
    • Georgia Institute of Technology State Funding
    • Konica Minolta

    Conference

    HRI '22

    Acceptance Rates

    Overall Acceptance Rate 268 of 1,124 submissions, 24%

    Article Metrics

    • Downloads (Last 12 months): 60
    • Downloads (Last 6 weeks): 4

    Reflects downloads up to 30 Nov 2024
