Computer Science > Machine Learning

arXiv:2311.13294 (cs)

[Submitted on 22 Nov 2023]

Title: Probabilistic Inference in Reinforcement Learning Done Right

Authors: Jean Tarbouriech, Tor Lattimore, Brendan O'Donoghue

Abstract: A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximating this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statistical inference and consequently do not perform well in challenging problems. In this work, we undertake a rigorous Bayesian treatment of the posterior probability of state-action optimality and clarify how it flows through the MDP. We first reveal that this quantity can indeed be used to generate a policy that explores efficiently, as measured by regret. Unfortunately, computing it is intractable, so we derive a new variational Bayesian approximation yielding a tractable convex optimization problem and establish that the resulting policy also explores efficiently. We call our approach VAPOR and show that it has strong connections to Thompson sampling, K-learning, and maximum entropy exploration. We conclude with some experiments demonstrating the performance advantage of a deep RL version of VAPOR.
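
To make the central quantity concrete, the following is a minimal sketch of "the probability of each state-action pair being visited under the optimal policy" for a small tabular, finite-horizon MDP. It is not VAPOR and not the authors' code: the function names (`optimal_occupancy`, `posterior_occupancy`), the finite-horizon setting, the toy two-state MDP, and the naive Monte Carlo average over two hypothetical posterior samples are all assumptions made for illustration. The averaging step only suggests what a posterior probability of state-action optimality could look like when the MDP is uncertain. The paper's variational approximation exists because this quantity cannot in general be computed exactly.

```python
def optimal_occupancy(P, R, rho, H):
    """Visitation probability of each (step, state, action) under an optimal policy.

    P[s][a][s2]: transition probability, R[s][a]: expected reward,
    rho[s]: initial state distribution, H: horizon (all illustrative inputs).
    """
    S, A = len(P), len(P[0])

    # Backward induction: greedy policy for each step of the finite horizon.
    V = [0.0] * S
    policy = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        new_V = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(S))
                 for a in range(A)]
            best = max(range(A), key=lambda a: q[a])  # ties go to the lowest index
            policy[h][s] = best
            new_V[s] = q[best]
        V = new_V

    # Forward pass: push the state distribution through the greedy policy.
    occ = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    state_dist = list(rho)
    for h in range(H):
        next_dist = [0.0] * S
        for s in range(S):
            a = policy[h][s]
            occ[h][s][a] = state_dist[s]
            for s2 in range(S):
                next_dist[s2] += state_dist[s] * P[s][a][s2]
        state_dist = next_dist
    return occ


def posterior_occupancy(mdp_samples, rho, H):
    """Naive Monte Carlo average of optimal occupancies over sampled MDPs."""
    S, A = len(mdp_samples[0][0]), len(mdp_samples[0][0][0])
    avg = [[[0.0] * A for _ in range(S)] for _ in range(H)]
    for P, R in mdp_samples:
        occ = optimal_occupancy(P, R, rho, H)
        for h in range(H):
            for s in range(S):
                for a in range(A):
                    avg[h][s][a] += occ[h][s][a] / len(mdp_samples)
    return avg


# Toy dynamics: in state 0, action 0 stays and action 1 moves to state 1;
# state 1 is absorbing under both actions.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
# Two hypothetical posterior samples that disagree on where the reward is.
R_reward_in_state_1 = [[0.0, 0.0], [1.0, 0.0]]
R_reward_in_state_0 = [[1.0, 0.0], [0.0, 0.0]]
samples = [(P, R_reward_in_state_1), (P, R_reward_in_state_0)]

avg = posterior_occupancy(samples, rho=[1.0, 0.0], H=2)
print(avg[0][0])  # probabilities of each action in state 0 at the first step
```

In this toy setup the two samples disagree about which first action is optimal, so the averaged occupancy splits its mass between the two actions in state 0. Drawing one sample and acting greedily on it is the basic idea behind Thompson sampling, one of the methods the abstract relates VAPOR to.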

Comments:	NeurIPS 2023
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2311.13294 [cs.LG]
	(or arXiv:2311.13294v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.13294

Submission history

From: Jean Tarbouriech
[v1] Wed, 22 Nov 2023 10:23:14 UTC (7,561 KB)
