Statistics > Machine Learning

arXiv:2212.02125 (stat)

[Submitted on 5 Dec 2022]

Title: TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets

Authors: Yuanying Cai, Chuheng Zhang, Li Zhao, Wei Shen, Xuyun Zhang, Lei Song, Jiang Bian, Tao Qin, Tieyan Liu

Abstract: We consider an offline reinforcement learning (RL) setting in which the agent needs to learn from a dataset collected by rolling out multiple behavior policies. This setting poses two challenges: 1) The optimal trade-off between optimizing the RL signal and the behavior cloning (BC) signal varies across states, because the action coverage induced by the different behavior policies varies. Previous methods fail to handle this because they control only a global trade-off. 2) For a given state, the action distribution generated by different behavior policies may have multiple modes. The BC regularizers in many previous methods are mean-seeking, resulting in policies that select out-of-distribution (OOD) actions between the modes. In this paper, we address both challenges by using an adaptively weighted reverse Kullback-Leibler (KL) divergence as the BC regularizer, building on the TD3 algorithm. Our method not only trades off the RL and BC signals with per-state weights (i.e., strong BC regularization on states with narrow action coverage, and weaker regularization on states with broad coverage) but also avoids selecting OOD actions thanks to the mode-seeking property of the reverse KL divergence. Empirically, our algorithm can outperform existing offline RL algorithms on the MuJoCo locomotion tasks, both with the standard D4RL datasets and with mixed datasets that combine the standard datasets.
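The contrast between mean-seeking and mode-seeking regularizers can be illustrated with a toy one-dimensional sketch. This is not the paper's algorithm: the two-mode behavior distribution (modes at -2 and 2, standard deviation 0.3), the single-Gaussian policy family, the grid search, and every function name below are assumptions chosen only for illustration. The sketch fits a Gaussian q to a bimodal behavior distribution p by minimizing either the forward KL, KL(p || q), or the reverse KL, KL(q || p).

```python
import math

def log_normal(x, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_behavior(x):
    """Log-density of a hypothetical two-mode behavior distribution
    (equal mixture of N(-2, 0.3^2) and N(2, 0.3^2)), via log-sum-exp."""
    a = math.log(0.5) + log_normal(x, -2.0, 0.3)
    b = math.log(0.5) + log_normal(x, 2.0, 0.3)
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Integration grid over the action axis: 401 points on [-6, 6].
DX = 0.03
XS = [-6.0 + DX * i for i in range(401)]

def kl(log_a, log_b):
    """Riemann-sum approximation of KL(a || b) = integral of a * (log a - log b)."""
    total = 0.0
    for x in XS:
        la = log_a(x)
        total += math.exp(la) * (la - log_b(x)) * DX
    return total

def fit(direction):
    """Grid-search the Gaussian (mu, sigma) that minimizes the chosen KL.
    Returns a tuple (kl_value, mu, sigma)."""
    best = None
    for i in range(25):  # mu in {-3.0, -2.75, ..., 3.0}
        mu = -3.0 + 0.25 * i
        for sigma in (0.3, 0.5, 1.0, 1.5, 2.0, 2.5):
            log_q = lambda x, mu=mu, sigma=sigma: log_normal(x, mu, sigma)
            if direction == "forward":
                value = kl(log_behavior, log_q)   # KL(p || q): mean-seeking
            else:
                value = kl(log_q, log_behavior)   # KL(q || p): mode-seeking
            if best is None or value < best[0]:
                best = (value, mu, sigma)
    return best

forward = fit("forward")
reverse = fit("reverse")
print("forward KL fit: mu=%.2f sigma=%.2f" % (forward[1], forward[2]))
print("reverse KL fit: mu=%.2f sigma=%.2f" % (reverse[1], reverse[2]))
```

Under these assumptions, the forward-KL fit centers a wide Gaussian between the two modes, where the behavior data has almost no support, while the reverse-KL fit collapses onto a single mode. The two modes are symmetric here, so which one the reverse-KL search returns is arbitrary.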

Comments:	Accepted by ICDM-22 (Best Student Paper Runner-Up Awards)
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2212.02125 [stat.ML]
	(or arXiv:2212.02125v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2212.02125

Submission history

From: Chuheng Zhang
[v1] Mon, 5 Dec 2022 09:36:23 UTC (687 KB)
