Computer Science > Machine Learning

arXiv:2307.13824 (cs)

[Submitted on 25 Jul 2023]

Title: Offline Reinforcement Learning with On-Policy Q-Function Regularization

Authors: Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

Abstract: The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the historical dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly or explicitly regularizing the learned policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly. We propose two algorithms that take advantage of the estimated Q-function through regularization, and demonstrate that they exhibit strong performance on the D4RL benchmarks.
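
A SARSA-style estimate of the behavior policy's Q-function can be illustrated with a minimal tabular sketch. It is not the authors' implementation: it assumes a hypothetical dataset of logged tuples (s, a, r, s_next, a_next, done), where a_next is the action the behavior policy actually took at s_next, and the function name `sarsa_behavior_q` and its parameters (`gamma`, `lr`, `epochs`) are illustrative. The key point is that the bootstrap target uses the logged next action rather than a maximum over all actions, so the estimate only ever queries state-action pairs that appear in the data.

```python
import numpy as np


def sarsa_behavior_q(transitions, n_states, n_actions, gamma=0.9, lr=0.5, epochs=200):
    """Tabular SARSA-style evaluation of the behavior policy's Q-function.

    transitions: list of (s, a, r, s_next, a_next, done) tuples, where a_next
    is the action logged in the dataset at s_next (hypothetical layout).
    s_next and a_next are ignored when done is True.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(epochs):  # repeated sweeps over the fixed offline dataset
        for s, a, r, s_next, a_next, done in transitions:
            # Bootstrap from the logged next action instead of a max over actions,
            # so only state-action pairs present in the data are ever queried.
            target = r if done else r + gamma * q[s_next, a_next]
            q[s, a] += lr * (target - q[s, a])
    return q


if __name__ == "__main__":
    # Toy two-state dataset: 0 --(a=1, r=1)--> 1 --(a=0, r=2)--> terminal
    data = [(0, 1, 1.0, 1, 0, False), (1, 0, 2.0, 0, 0, True)]
    q_beta = sarsa_behavior_q(data, n_states=2, n_actions=2)
    print(q_beta)
```

In this toy example the estimate for (0, 1) approaches 1 + 0.9 × 2 = 2.8, while pairs absent from the dataset, such as (0, 0), keep their initial value and are never used as bootstrap targets.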

Comments:	Published at European Conference on Machine Learning (ECML), 2023
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2307.13824 [cs.LG]
	(or arXiv:2307.13824v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.13824

Submission history

From: Laixi Shi
[v1] Tue, 25 Jul 2023 21:38:08 UTC (5,792 KB)