DOI: 10.1145/3610978.3640671
Short paper · Open access

Language, Camera, Autonomy! Prompt-engineered Robot Control for Rapidly Evolving Deployment

Published: 11 March 2024

Abstract

The Context-observant LLM-Enabled Autonomous Robots (CLEAR) platform offers a general solution for large language model (LLM)-enabled robot autonomy. CLEAR-controlled robots use natural language to perceive and interact with their environment: contextual descriptions derived from computer vision, together with optional human commands, prompt LLM responses that map to robotic actions. Because behavior is specified through prompts, the system can be reprogrammed without modifying code, and unlike other LLM-based robot control methods, CLEAR requires no model fine-tuning. CLEAR employs off-the-shelf pre-trained machine learning models to control robots ranging from simulated quadcopters to terrestrial quadrupeds. We provide the open-source CLEAR platform, along with sample implementations for a Unity-based quadcopter and the Boston Dynamics Spot® robot. Each LLM used (GPT-3.5, GPT-4, and LLaMA2) exhibited distinct behavior when embodied by CLEAR, differing in actuation preference, ability to apply new knowledge, and receptivity to human instruction. GPT-4 performed best, executing tasks successfully 97% of the time. The CLEAR platform contributes to HRI by increasing the usability of robots for natural human interaction.
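
To make the control loop concrete, the sketch below shows one plausible reading of the abstract's pipeline: an off-the-shelf detector turns a camera frame into a natural-language scene description, that description (plus an optional human command) is sent to an LLM, and the reply is mapped onto a fixed robot action set. All names here (ACTIONS, describe_scene, choose_action), the prompt wording, and the model choices are illustrative assumptions, not the released CLEAR code.

# Hypothetical sketch of the perceive -> prompt -> act loop described above.
# Action names, prompt wording, and model choices are illustrative assumptions,
# not the released CLEAR implementation.
from ultralytics import YOLO   # off-the-shelf, pre-trained object detector
from openai import OpenAI      # LLM client; reads OPENAI_API_KEY from the env

ACTIONS = {"move_forward", "turn_left", "turn_right", "stop"}  # assumed action set

detector = YOLO("yolov8n.pt")  # pre-trained weights, no fine-tuning
llm = OpenAI()

def describe_scene(image_path: str) -> str:
    """Turn raw detections into a natural-language context description."""
    result = detector(image_path)[0]
    labels = [result.names[int(c)] for c in result.boxes.cls]
    return "You see: " + (", ".join(labels) if labels else "nothing notable")

def choose_action(scene: str, command: str | None = None) -> str:
    """Prompt the LLM with scene context plus an optional human command,
    then map its reply onto one action from the fixed set."""
    prompt = scene + "\n"
    if command:
        prompt += "Human command: " + command + "\n"
    prompt += "Reply with exactly one of: " + ", ".join(sorted(ACTIONS))
    reply = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You control a mobile robot."},
            {"role": "user", "content": prompt},
        ],
    ).choices[0].message.content.strip()
    return reply if reply in ACTIONS else "stop"  # fail safe on off-menu replies

In a deployment, each control cycle would feed choose_action(describe_scene(frame), user_command) to the robot's motion layer; retargeting the system to a new robot or task then amounts to editing the prompt and the action vocabulary rather than the code.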

Supplemental Material

MP4 File: Supplemental video


Cited By

  • Prompting Robotic Modalities (PRM): A structured architecture for centralizing language models in complex systems. Future Generation Computer Systems 166 (2025), 107723. https://doi.org/10.1016/j.future.2025.107723
  • VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (2024), 1–18. https://doi.org/10.1145/3654777.3676401
  • Robot Control via Natural Instructions Empowered by Large Language Model. In Discovering the Frontiers of Human-Robot Interaction (2024), 437–457. https://doi.org/10.1007/978-3-031-66656-8_19


Information

Published In

HRI '24: Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
March 2024
1408 pages
ISBN: 9798400703232
DOI: 10.1145/3610978
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 March 2024

Author Tags

  1. computer vision
  2. large language models
  3. robotics
  4. software

Qualifiers

  • Short-paper

Funding Sources

  • Under Secretary of Defense for Research and Engineering

Conference

HRI '24

Acceptance Rates

Overall acceptance rate: 268 of 1,124 submissions (24%)

Article Metrics

  • Downloads (last 12 months): 1,841
  • Downloads (last 6 weeks): 124
Reflects downloads up to 15 Feb 2025

