DOI: 10.1145/3610977.3634930
Research Article | Open Access

Preference-Conditioned Language-Guided Abstraction

Published: 11 March 2024

Abstract

Learning from demonstrations is a common way for users to teach robots, but it is prone to spurious feature correlations. Recent work constructs state abstractions, i.e., visual representations containing task-relevant features, from language as a way to perform more generalizable learning. However, these abstractions also depend on a user's preference for what matters in a task, which may be hard to describe or infeasible to exhaustively specify using language alone. How do we construct abstractions to capture these latent preferences? We observe that how humans behave reveals how they see the world. Our key insight is that changes in human behavior inform us of differences in preferences for how humans see the world, i.e., their state abstractions. In this work, we propose using language models (LMs) to query for those preferences directly, given knowledge that a change in behavior has occurred. Our framework uses the LM in two ways: first, given a text description of the task and knowledge of behavioral change between states, we query the LM for possible hidden preferences; second, given the most likely preference, we query the LM to construct the state abstraction. The LM can also ask the human directly when uncertain about its own estimate. We demonstrate our framework's ability to construct effective preference-conditioned abstractions in simulated experiments, in a user study, and on a real Spot robot performing mobile manipulation tasks.
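As a rough illustration of the two-stage querying loop the abstract describes, consider the Python sketch below. The `query_lm`, `ask_human`, and `parse_preference` names, the prompt wording, the reply format, and the confidence threshold are all hypothetical stand-ins introduced for this sketch, not the paper's actual implementation:

```python
from typing import Callable

def parse_preference(reply: str) -> tuple[str, float]:
    """Illustrative parser: expects replies of the form
    '<preference> | <confidence>'; falls back to zero confidence."""
    text, _, conf = reply.rpartition("|")
    try:
        return text.strip(), float(conf)
    except ValueError:
        return reply.strip(), 0.0

def infer_abstraction(
    task_description: str,
    behavior_change: str,
    query_lm: Callable[[str], str],    # any text-in/text-out LM interface
    ask_human: Callable[[str], str],   # fallback query to the user
    confidence_threshold: float = 0.8, # illustrative uncertainty cutoff
) -> list[str]:
    """Two-stage loop: (1) hypothesize the hidden preference behind an
    observed behavior change; (2) turn that preference into a state
    abstraction, here a list of task-relevant feature names."""
    # Stage 1: ask the LM which hidden preference explains the change.
    reply = query_lm(
        f"Task: {task_description}\n"
        f"Observed behavior change: {behavior_change}\n"
        "What hidden user preference most likely explains this change? "
        "Reply as '<preference> | <confidence in [0, 1]>'."
    )
    preference, confidence = parse_preference(reply)
    # If the LM is uncertain about its own estimate, ask the human directly.
    if confidence < confidence_threshold:
        preference = ask_human(
            f"I noticed a change: {behavior_change}. "
            "Which preference of yours explains it?"
        )
    # Stage 2: given the most likely preference, query the LM for the
    # abstraction: the subset of state features that matter for this task.
    features = query_lm(
        f"Task: {task_description}\n"
        f"User preference: {preference}\n"
        "List the task-relevant state features, comma-separated."
    )
    return [f.strip() for f in features.split(",")]
```

Plugging in a stubbed `query_lm` that returns canned strings is enough to exercise the control flow; in practice both callbacks would wrap an actual LM endpoint and a user interface.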

Supplemental Material

ZIP File
Appendix (1-page PDF)




Published In

HRI '24: Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
March 2024, 982 pages
ISBN: 9798400703225
DOI: 10.1145/3610977
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. human preferences
  2. learning from human input
  3. state abstraction


Conference

HRI '24, March 11--14, 2024, Boulder, CO, USA

Acceptance Rates

Overall acceptance rate: 268 of 1,124 submissions (24%)

