How to Prompt Your Robot: A PromptBook for Manipulation Skills with Code as Policies

Montserrat Gonzalez Arenas, Ted Xiao, Sumeet Singh, Vidhi Jain, Allen Z. Ren, Quan Vuong, Jake Varley, Alexander Herzog, Isabel Leal, Sean Kirmani, Dorsa Sadigh, Vikas Sindhwani, Kanishka Rao, Jacky Liang, Andy Zeng

Published: 21 Oct 2023, Last Modified: 21 Oct 2023LangRob @ CoRL 2023 PosterEveryoneRevisionsBibTeX

Keywords: zero-shot manipulation skills, prompting, language models, in-context-learning, code as policies

TL;DR: Prompting LLMs to write code for low-level manipulation primitives zero-shot.

Abstract: Large Language Models (LLMs) have demonstrated the ability to perform semantic reasoning, planning and code writing for robotics tasks. However, most methods rely on pre-existing primitives (i.e. pick, open drawer), which heavily limits their scalability to new scenarios. Additionally, existing approaches like Code as Policies (CaP) rely on examples of robot code in the prompt to write code for new tasks, and assume that LLMs can infer task information, constraints, and API usage from examples alone. But examples can be costly, and too few or too many can bias the LLM in the wrong direction. Recent research has demonstrated prompting LLMs with APIs and documentation enables code writing for successful zero-shot tool use. However, documenting robotics tasks and naively providing full robot APIs presents a challenge to context-length limits in LLMs. In this work, we introduce PromptBook, a recipe that combines LLM prompting paradigms - examples, APIs, documentation and chain of thought, to generate code for planning a sorting task with higher success rate than previous works. We further demonstrate PromptBook enables LLMs to write code for new low-level manipulation primitives in a zero-shot manner: from picking diverse objects, opening/closing drawers, to whisking, and waving hello. We evaluate the new skills on a mobile manipulator with 83\% success rate at picking, 50-71% at opening drawers and 100\% at closing them. Notably, the LLM is able to infer gripper orientation for grasping a drawer handle (z-axis aligned) vs. a top-down grasp (x-axis aligned). Finally, we provide guidelines to leverage human feedback and LLMs to write PromptBook prompts.

Submission Number: 42