
DOI: 10.5555/3635637.3663243

Extended Abstract

ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles

Published: 06 May 2024

Abstract

Offline reinforcement learning (RL) is a learning paradigm in which an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit performance due to the lack of exploration. To overcome this limitation, offline-to-online RL combines offline pre-training with online fine-tuning, enabling the agent to further refine its policy by interacting with the environment in real time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance improvement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that ENOTO substantially improves the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion tasks, significantly outperforming existing offline-to-online RL methods.
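
The abstract describes the mechanism only at a high level. Below is a minimal sketch, in PyTorch, of the Q-ensemble ideas it summarizes: an ensemble of Q-networks whose aggregation is pessimistic (minimum over members) during offline training and loosened (mean over members, plus an ensemble-disagreement bonus for exploration) during online fine-tuning. The class and function names, network sizes, and the specific aggregation rules are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: how a Q-ensemble can be aggregated pessimistically
# offline and more optimistically online. Not the ENOTO reference code.
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    """N independent Q-networks evaluated on the same (state, action) input."""

    def __init__(self, state_dim: int, action_dim: int, n_members: int = 10, hidden: int = 256):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_members)
        ])

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, action], dim=-1)
        # Shape: (n_members, batch, 1)
        return torch.stack([q(x) for q in self.members], dim=0)


def q_target(q_values: torch.Tensor, phase: str) -> torch.Tensor:
    """Aggregate ensemble predictions into a single value estimate.

    Offline: pessimistic (minimum over members) to avoid overestimating
    out-of-distribution actions. Online: loosened (mean over members), one
    simple way to relax pessimism; the paper's exact rule may differ.
    """
    if phase == "offline":
        return q_values.min(dim=0).values
    return q_values.mean(dim=0)


def exploration_bonus(q_values: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Ensemble disagreement (std over members) as an optimistic exploration
    signal for online action selection; beta is an assumed hyperparameter."""
    return beta * q_values.std(dim=0)


if __name__ == "__main__":
    ensemble = QEnsemble(state_dim=17, action_dim=6)
    s, a = torch.randn(32, 17), torch.randn(32, 6)
    qs = ensemble(s, a)                                          # (10, 32, 1)
    offline_value = q_target(qs, "offline")                      # pessimistic
    online_value = q_target(qs, "online") + exploration_bonus(qs)  # loosened + optimistic
    print(offline_value.shape, online_value.shape)
```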

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
    May 2024
    2898 pages
    ISBN: 9798400704864

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Author Tags

    1. ensemble methods
    2. offline reinforcement learning
    3. offline-to-online rl
    4. online fine-tuning

    Qualifiers

    • Extended-abstract

    Funding Sources

    • National Natural Science Foundation of China
    • Xiaomi Young Talents Program of Xiaomi Foundation
    • National Key R&D Program of China

    Conference

    AAMAS '24

    Acceptance Rates

    Overall Acceptance Rate 1,155 of 5,036 submissions, 23%
