Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration
Pages 1463–1472
Abstract
Agents that assist people need well-initialized policies that can adapt quickly to align with their partners' reward functions. Policies can be initialized to maximize performance with unknown partners by bootstrapping nonlinear models with imitation learning over large, offline datasets. Such policies can require prohibitive computation to fine-tune in situ and may therefore miss critical run-time information about a partner's reward function as expressed through their immediate behavior. In contrast, online logistic regression using low-capacity models performs rapid inference and fine-tuning updates and can thus make effective use of immediate in-task behavior for reward function alignment. However, these low-capacity models cannot be bootstrapped as effectively by offline datasets and thus have poor initializations. We propose BLR-HAC, Bootstrapped Logistic Regression for Human-Agent Collaboration, which bootstraps a large nonlinear model to learn the parameters of a low-capacity model, which then uses online logistic regression for updates during collaboration. We test BLR-HAC in a simulated surface rearrangement task and demonstrate that it achieves higher zero-shot accuracy than shallow methods and takes far less computation to adapt online, while achieving performance similar to fine-tuned, large nonlinear models. For code, please see our project page: https://sites.google.com/view/blr-hac.
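To make the online-adaptation idea concrete, below is a minimal sketch of a multinomial logistic-regression update of the kind the abstract describes: a low-capacity linear model scores candidate actions, and each observed partner action triggers a single gradient step on the weights. This is not the authors' released code (see the project page for that); all concrete names and values here (OnlineLinearPolicy, feature_dim, the learning rate) are illustrative assumptions, and in BLR-HAC the weights would be initialized by the offline-bootstrapped nonlinear model rather than at zero.

```python
# Sketch only: online multinomial logistic regression for adapting a
# low-capacity linear policy to an observed partner. Names and values are
# illustrative assumptions, not the BLR-HAC implementation.
import numpy as np

class OnlineLinearPolicy:
    def __init__(self, feature_dim: int, lr: float = 0.1):
        # In BLR-HAC these weights would come from the offline-bootstrapped
        # nonlinear model; zeros stand in for that initialization here.
        self.w = np.zeros(feature_dim)
        self.lr = lr

    def action_probs(self, candidate_features: np.ndarray) -> np.ndarray:
        """Softmax over linear scores, one row of features per candidate action."""
        logits = candidate_features @ self.w
        logits = logits - logits.max()  # numerical stability
        p = np.exp(logits)
        return p / p.sum()

    def update(self, candidate_features: np.ndarray, chosen: int) -> None:
        """One gradient step on the negative log-likelihood of the partner's choice."""
        p = self.action_probs(candidate_features)
        # d(-log p_chosen)/dw = sum_k p_k * x_k - x_chosen
        grad = candidate_features.T @ p - candidate_features[chosen]
        self.w -= self.lr * grad

# Usage: observe which of 4 candidate actions the partner takes, then adapt.
rng = np.random.default_rng(0)
policy = OnlineLinearPolicy(feature_dim=8)
for _ in range(100):
    feats = rng.normal(size=(4, 8))   # features of 4 candidate actions
    observed = int(rng.integers(4))   # partner's action (stand-in for real data)
    policy.update(feats, observed)
```

Because each update is a single matrix-vector product and gradient step over the current candidates, this kind of model can be refit after every partner action, which is the computational advantage the abstract contrasts against fine-tuning a large nonlinear model in situ.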
Information
Published In
AAMAS '24: International Conference on Autonomous Agents and Multiagent Systems, May 2024, 2898 pages
ISBN: 9798400704864
General Chairs: Mehdi Dastani, Jaime Simão Sichman
Program Chairs: Natasha Alechina, Virginia Dignum
Publisher
International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC
Publication History
Published: 06 May 2024
Qualifiers
- Research-article
Conference
AAMAS '24: International Conference on Autonomous Agents and Multiagent Systems, May 6–10, 2024, Auckland, New Zealand
Acceptance Rates
Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%