Computer Science > Machine Learning

arXiv:2409.19479 (cs)

[Submitted on 28 Sep 2024]

Title:Spatial Reasoning and Planning for Deep Embodied Agents

View PDF

Abstract:Humans can perform complex tasks with long-term objectives by planning, reasoning, and forecasting outcomes of actions. For embodied agents to achieve similar capabilities, they must gain knowledge of the environment transferable to novel scenarios with a limited budget of additional trial and error. Learning-based approaches, such as deep RL, can discover and take advantage of inherent regularities and characteristics of the application domain from data, and continuously improve their performances, however at a cost of large amounts of training data. This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks, focusing on enhancing learning efficiency, interpretability, and transferability across novel scenarios. Four key contributions are made. 1) CALVIN, a differential planner that learns interpretable models of the world for long-term planning. It successfully navigated partially observable 3D environments, such as mazes and indoor rooms, by learning the rewards and state transitions from expert demonstrations. 2) SOAP, an RL algorithm that discovers options unsupervised for long-horizon tasks. Options segment a task into subtasks and enable consistent execution of the subtask. SOAP showed robust performances on history-conditional corridor tasks as well as classical benchmarks such as Atari. 3) LangProp, a code optimisation framework using LLMs to solve embodied agent problems that require reasoning by treating code as learnable policies. The framework successfully generated interpretable code with comparable or superior performance to human-written experts in the CARLA autonomous driving benchmark. 4) Voggite, an embodied agent with a vision-to-action transformer backend that solves complex tasks in Minecraft. It achieved third place in the MineRL BASALT Competition by identifying action triggers to segment tasks into multiple stages.

Comments:	DPhil Thesis - Engineering Science, University of Oxford. Original copy available at this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2409.19479 [cs.LG]
	(or arXiv:2409.19479v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.19479
Related DOI:	https://doi.org/10.5287/ora-0nexmdyo0

Submission history

From: Shu Ishida [view email]
[v1] Sat, 28 Sep 2024 23:05:56 UTC (23,708 KB)

Computer Science > Machine Learning

Title:Spatial Reasoning and Planning for Deep Embodied Agents

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:Spatial Reasoning and Planning for Deep Embodied Agents

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators