Model-Value Inconsistency as a Signal for Epistemic Uncertainty

Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, Andre Barreto, Simon Osindero
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6474-6498, 2022.

Abstract

Using a model of the environment and a value function, an agent can construct many estimates of a state’s value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). Consequently, the discrepancy between these estimates can be used as a proxy for the agent’s epistemic uncertainty; we term this signal model-value inconsistency or self-inconsistency for short. Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms. We provide empirical evidence in both tabular and function approximation settings from pixels that self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a learned model.
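The construction in the abstract is easy to state concretely. The k-step estimate of a state s is the model's predicted discounted reward for k imagined steps plus a bootstrapped value at the k-th imagined state, v_k(s) = sum_{t=0}^{k-1} gamma^t r_t + gamma^k V(s_k), with v_0(s) = V(s), and self-inconsistency is the spread of the set {v_0, ..., v_K}, for example its standard deviation. Below is a minimal Python sketch of this idea under assumed interfaces: a learned deterministic model model.step(s, a) -> (next_state, reward), a value function value_fn, and a policy; all names are illustrative placeholders, not the paper's API.

import numpy as np

def k_step_estimate(s, k, model, value_fn, policy, gamma=0.99):
    # k-step value estimate: accumulate model-predicted discounted rewards
    # for k steps, then bootstrap with the value function at the k-th state.
    total, discount = 0.0, 1.0
    for _ in range(k):
        s, r = model.step(s, policy(s))  # imagined transition (assumed interface)
        total += discount * r
        discount *= gamma
    return total + discount * value_fn(s)  # k = 0 reduces to value_fn(s)

def self_inconsistency(s, model, value_fn, policy, K=5, gamma=0.99):
    # Standard deviation of the implicit value ensemble {v_0, ..., v_K},
    # used as a proxy for the agent's epistemic uncertainty at state s.
    estimates = [k_step_estimate(s, k, model, value_fn, policy, gamma)
                 for k in range(K + 1)]
    return float(np.std(estimates))

Where the model and value function are accurate and consistent with each other, the estimates agree and the signal is near zero; where either is wrong, the k-step estimates diverge, which is what makes the spread usable for exploration, cautious acting under distribution shift, and down-weighting unreliable plans.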

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-filos22a,
  title     = {Model-Value Inconsistency as a Signal for Epistemic Uncertainty},
  author    = {Filos, Angelos and V{\'e}rtes, Eszter and Marinho, Zita and Farquhar, Gregory and Borsa, Diana and Friesen, Abram and Behbahani, Feryal and Schaul, Tom and Barreto, Andre and Osindero, Simon},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {6474--6498},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/filos22a/filos22a.pdf},
  url       = {https://proceedings.mlr.press/v162/filos22a.html}
}
Endnote
%0 Conference Paper
%T Model-Value Inconsistency as a Signal for Epistemic Uncertainty
%A Angelos Filos
%A Eszter Vértes
%A Zita Marinho
%A Gregory Farquhar
%A Diana Borsa
%A Abram Friesen
%A Feryal Behbahani
%A Tom Schaul
%A Andre Barreto
%A Simon Osindero
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-filos22a
%I PMLR
%P 6474--6498
%U https://proceedings.mlr.press/v162/filos22a.html
%V 162
APA
Filos, A., Vértes, E., Marinho, Z., Farquhar, G., Borsa, D., Friesen, A., Behbahani, F., Schaul, T., Barreto, A. & Osindero, S. (2022). Model-Value Inconsistency as a Signal for Epistemic Uncertainty. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6474-6498. Available from https://proceedings.mlr.press/v162/filos22a.html.