DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck

Jiameng Fan, Wenchao Li
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6074-6102, 2022.

Abstract

Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations based on the unsupervised multi-view setting. Specif- ically, we introduce a novel contrastive version of the Multi-View Information Bottleneck (MIB) objective for temporal data. We train RL agents from pixels with this auxiliary objective to learn robust representations that can compress away task-irrelevant information and are predictive of task-relevant dynamics. This approach enables us to train high-performance policies that are robust to visual distractions and can generalize well to unseen environments. We demonstrate that our approach can achieve SOTA performance on a di- verse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established base- lines for generalization to unseen environments on the Procgen benchmark. Our code is open- sourced and available at https://github. com/BU-DEPEND-Lab/DRIBO.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-fan22b, title = {{DRIBO}: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck}, author = {Fan, Jiameng and Li, Wenchao}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {6074--6102}, year = {2022}, editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume = {162}, series = {Proceedings of Machine Learning Research}, month = {17--23 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v162/fan22b/fan22b.pdf}, url = {https://proceedings.mlr.press/v162/fan22b.html}, abstract = {Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations based on the unsupervised multi-view setting. Specif- ically, we introduce a novel contrastive version of the Multi-View Information Bottleneck (MIB) objective for temporal data. We train RL agents from pixels with this auxiliary objective to learn robust representations that can compress away task-irrelevant information and are predictive of task-relevant dynamics. This approach enables us to train high-performance policies that are robust to visual distractions and can generalize well to unseen environments. We demonstrate that our approach can achieve SOTA performance on a di- verse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established base- lines for generalization to unseen environments on the Procgen benchmark. Our code is open- sourced and available at https://github. com/BU-DEPEND-Lab/DRIBO.} }
Endnote
%0 Conference Paper %T DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck %A Jiameng Fan %A Wenchao Li %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-fan22b %I PMLR %P 6074--6102 %U https://proceedings.mlr.press/v162/fan22b.html %V 162 %X Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations based on the unsupervised multi-view setting. Specif- ically, we introduce a novel contrastive version of the Multi-View Information Bottleneck (MIB) objective for temporal data. We train RL agents from pixels with this auxiliary objective to learn robust representations that can compress away task-irrelevant information and are predictive of task-relevant dynamics. This approach enables us to train high-performance policies that are robust to visual distractions and can generalize well to unseen environments. We demonstrate that our approach can achieve SOTA performance on a di- verse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established base- lines for generalization to unseen environments on the Procgen benchmark. Our code is open- sourced and available at https://github. com/BU-DEPEND-Lab/DRIBO.
APA
Fan, J. & Li, W.. (2022). DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6074-6102 Available from https://proceedings.mlr.press/v162/fan22b.html.

Related Material