Computer Science > Machine Learning

arXiv:2211.15144 (cs)

[Submitted on 28 Nov 2022 (v1), last revised 17 Apr 2023 (this version, v2)]

Title:Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes

Authors:Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine

Abstract: The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the lessons from these works, we re-examine previous design choices and find that with appropriate choices (ResNets, cross-entropy-based distributional backups, and feature normalization), offline Q-learning algorithms exhibit strong performance that scales with model capacity. Using multi-task Atari as a testbed for scaling and generalization, we train a single policy on 40 games with near-human performance using networks of up to 80 million parameters, finding that model performance scales favorably with capacity. In contrast to prior work, we extrapolate beyond dataset performance even when trained entirely on a large (400M transitions) but highly suboptimal dataset (51% human-level performance). Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal. Finally, we show that offline Q-learning with a diverse dataset is sufficient to learn powerful representations that facilitate rapid transfer to novel games and fast online learning on new variations of a training game, improving over existing state-of-the-art representation learning approaches.
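
As a rough illustration of two of the design choices the abstract names, feature normalization and a cross-entropy loss over a categorical return distribution, a minimal sketch follows. It is not the authors' implementation: the function names (`l2_normalize`, `softmax`, `cross_entropy`), the use of plain Python lists, and the three-atom example are all assumptions made for illustration, and the projection of a Bellman target onto the atoms is omitted.

```python
import math

def l2_normalize(features, eps=1e-8):
    """Scale a feature vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in features))
    return [x / (norm + eps) for x in features]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution over return atoms
    and the distribution predicted by the logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)

# Hypothetical example: a 2-d feature vector and a 3-atom return distribution.
features = l2_normalize([3.0, 4.0])
uniform_loss = cross_entropy([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
```

Here `features` has unit norm, and `uniform_loss` is the loss when the predicted distribution is uniform over three atoms while the target puts all of its mass on one of them.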

Comments:	Accepted at ICLR 2023. Project website: this https URL
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2211.15144 [cs.LG]
	(or arXiv:2211.15144v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2211.15144

Submission history

From: Aviral Kumar [view email]
[v1] Mon, 28 Nov 2022 08:56:42 UTC (10,477 KB)
[v2] Mon, 17 Apr 2023 18:45:23 UTC (10,477 KB)
