ORACLE: End-to-End Model Based Reinforcement Learning

  • Conference paper
  • First Online:

Artificial Intelligence XXXVIII (SGAI-AI 2021)

Abstract

Reinforcement Learning (RL) algorithms seek to maximize some notion of reward. RL agents fall into two categories: model-based and model-free. A model-free agent learns through trial and error in the target environment, whereas a model-based agent trains in a learned or known model of the environment instead.

Model-free reinforcement learning shows promising results in simulated environments but falls short in real-world environments, because trial and error is a poor fit for settings where errors carry an economic cost. Model-based reinforcement learning (MBRL), on the other hand, aims to exploit a known or learned dynamics model, which substantially increases sample efficiency. This paper focuses on learning a dynamics model and using the learned model to train several model-free algorithms by directly sampling from it. However, it is challenging to achieve good accuracy in dynamics models for highly complex domains due to stochasticity and compounding noise in the system. Most model-based RL work focuses on dynamics models that derive policies directly from the observation space, which is problematic because the observation space is often high-dimensional and complex.
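
To make the recipe described above concrete, the sketch below fits a dynamics model on previously logged transitions and then trains a model-free learner entirely by sampling the learned model rather than the real environment. This is a minimal tabular illustration of that Dyna-style loop, not the paper's architecture; the toy dynamics, the empirical transition model, and the use of Q-learning are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Logged real transitions (s, a, r, s'); here generated from a toy
# stand-in for the unknown real dynamics, purely for illustration.
S, A = 8, 4                       # small discrete state and action spaces
counts = np.zeros((S, A, S))      # visit counts for (s, a, s')
reward_sum = np.zeros((S, A))     # accumulated rewards per (s, a)
for _ in range(5000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = (s + a) % S          # toy "real" dynamics
    counts[s, a, s_next] += 1
    reward_sum[s, a] += float(s_next == S - 1)

# Learned dynamics model: empirical next-state distribution and mean reward.
visits = counts.sum(axis=2)
model = counts / np.maximum(visits[..., None], 1)
reward_model = reward_sum / np.maximum(visits, 1)

# Train a model-free learner (tabular Q-learning) purely inside the learned model.
Q = np.zeros((S, A))
for _ in range(20000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=model[s, a])          # sample the model, not the env
    target = reward_model[s, a] + 0.95 * Q[s_next].max()
    Q[s, a] += 0.1 * (target - Q[s, a])

print("Greedy policy per state:", Q.argmax(axis=1))
```

In the setting targeted by the paper, the logged transitions would come from a system that is already running, and the tabular model and Q-learning above would be replaced by a learned dynamics model and an arbitrary model-free algorithm.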

This paper proposes an end-to-end model-based reinforcement learning algorithm that trains model-free agents to act in an environment without trial and error in the real environment. This method is beneficial for existing installations that already employ a decision-making system, such as an expert system. The proposed algorithm shares the fundamental learning principles of the Dreaming Variational Autoencoder but is substantially different architecturally. We show that the algorithm is more sample efficient and performs comparably with existing model-free approaches. We also demonstrate that the algorithm is actor-agnostic, enabling existing model-free algorithms to operate in a model-based context.
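
The actor-agnostic property can be pictured as wrapping the learned dynamics model behind the same step/reset interface the real environment exposes, so that any existing model-free algorithm can be trained against it unchanged. The sketch below shows one way such a wrapper could look; the class and method names are assumptions for illustration and not the authors' implementation (see the repository linked in the notes for that).

```python
import numpy as np


class LearnedModelEnv:
    """Exposes a learned dynamics model through a Gym-like step/reset API."""

    def __init__(self, transition_model, reward_model, initial_states, horizon=200):
        self.p = transition_model       # p[s, a]: distribution over next states
        self.r = reward_model           # r[s, a]: estimated immediate reward
        self.initial_states = initial_states
        self.horizon = horizon
        self.rng = np.random.default_rng()

    def reset(self):
        self.t = 0
        self.s = int(self.rng.choice(self.initial_states))
        return self.s

    def step(self, action):
        probs = self.p[self.s, action]
        next_s = int(self.rng.choice(len(probs), p=probs))  # imagined transition
        reward = float(self.r[self.s, action])
        self.t += 1
        self.s = next_s
        done = self.t >= self.horizon
        return next_s, reward, done, {}
```

Because the wrapper behaves like an ordinary environment, off-the-shelf model-free agents (for example a DQN- or PPO-style learner) can be trained on it directly, and the real system is only needed to collect the transitions used to fit the model.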

Notes

  1. Here, ‘sufficient’ means training until a satisfactory performance is reached in terms of average return.

  2. We refer the reader to https://github.com/perara/oracle for a detailed implementation in Python.

  3. We take this opportunity to invite the RL community to consider open-source benchmarks for easier comparison of scientific results.

  4. We make the reader aware that the experiments are compute-heavy, hence the small number of experiment iterations. In total, the experiments take approximately 5 days of wall-clock time to train on consumer-level hardware.

References

  1. Andersen, P., Goodwin, M., Granmo, O.: Deep RTS: a game environment for deep reinforcement learning in real-time strategy games. In: 2018 IEEE Conference on Computational Intelligence and Games (CIG), pp. 1–8 (2018). https://doi.org/10.1109/CIG.2018.8490409

  2. Andersen, P.-A., Goodwin, M., Granmo, O.-C.: The dreaming variational autoencoder for reinforcement learning environments. In: Bramer, M., Petridis, M. (eds.) SGAI 2018. LNCS (LNAI), vol. 11311, pp. 143–155. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04191-5_11

  3. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017). https://doi.org/10.1109/MSP.2017.2743240

  4. Chua, K., Calandra, R., McAllister, R., Levine, S.: Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 4754–4765. Curran Associates, Inc. (2018)

  5. Coumans, E., Bai, Y.: PyBullet, a Python module for physics simulation for games, robotics and machine learning. http://pybullet.org

  6. Deisenroth, M., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: Proceedings of the 28th International Conference on Machine Learning, ICML’11, pp. 465–472. Citeseer (2011)

  7. Doerr, A., et al.: Probabilistic recurrent state-space models. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1280–1289. PMLR (2018). http://proceedings.mlr.press/v80/doerr18a.html

  8. Draganjac, I., Miklic, D., Kovacic, Z., Vasiljevic, G., Bogdan, S.: Decentralized control of multi-AGV systems in autonomous warehousing applications. IEEE Trans. Autom. Sci. Eng. 13(4), 1433–1447 (2016). https://doi.org/10.1109/TASE.2016.2603781

  9. Fraccaro, M.: Deep latent variable models for sequential data (2018). https://orbit.dtu.dk/en/publications/deep-latent-variable-models-for-sequential-data

  10. Fuchs, A., Heider, Y., Wang, K., Sun, W.C., Kaliske, M.: DNN2: a hyper-parameter reinforcement learning game for self-design of neural network based elasto-plastic constitutive descriptions. Comput. Struct. 249, 106505 (2021). https://doi.org/10.1016/j.compstruc.2021.106505

  11. García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16, 1437–1480 (2015)

  12. Hafner, D., Lillicrap, T., Ba, J., Norouzi, M.: Dream to control: learning behaviors by latent imagination. In: Proceedings 8th International Conference on Learning Representations, ICLR’20 (2020). https://openreview.net/forum?id=S1lOTC4tDS

  13. Hafner, D., et al.: Learning latent dynamics for planning from pixels. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings 36th International Conference on Machine Learning, ICML’19, vol. 97, pp. 2555–2565. PMLR, Long Beach (2019). http://proceedings.mlr.press/v97/hafner19a/hafner19a.pdf

  14. Hafner, D., Lillicrap, T.P., Norouzi, M., Ba, J.: Mastering atari with discrete world models. In: Proceedings 9th International Conference on Learning Representations, ICLR’21 (2021). https://openreview.net/forum?id=0oabwyZbOu

  15. Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: Proc. 32nd Conference on Artificial Intelligence, AAAI’18, pp. 3215–3222. AAAI Press, New Orleans (2018). https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/download/17204/16680

  16. Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., Wilson, A.G.: Averaging weights leads to wider optima and better generalization. In: Silva, R., Globerson, A. (eds.) 34th Conference on Uncertainty in Artificial Intelligence 2018, pp. 876–885. Association for Uncertainty in Artificial Intelligence (2018). http://arxiv.org/abs/1803.05407

  17. Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: Introduction to variational methods for graphical models. Mach. Learn. 37(2), 183–233 (1999). https://doi.org/10.1023/A:1007665907178

  18. Kingma, D.P., Welling, M.: Auto-encoding variational bayes. In: Proceedings of the 2nd International Conference on Learning Representations (2013). https://doi.org/10.1051/0004-6361/201527329, http://arxiv.org/abs/1312.6114

  19. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: Proceedings 7th International Conference on Learning Representations, ICLR’19 (2019). https://openreview.net/forum?id=Bkg6RiCqY7

  20. Mallozzi, P., Pelliccione, P., Knauss, A., Berger, C., Mohammadiha, N.: Autonomous vehicles: state of the art, future trends, and challenges. In: Automotive Systems and Software Engineering, pp. 347–367. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-12157-0_16

  21. Moerland, T.M., Broekens, J., Jonker, C.M.: Model-based reinforcement learning: a survey (2020). arXiv preprint arXiv:2006.16712

  22. Ozair, S., Li, Y., Razavi, A., Antonoglou, I., van den Oord, A., Vinyals, O.: Vector quantized models for planning. In: Proceedings 38th International Conference on Machine Learning, ICML’21 (2021). http://arxiv.org/abs/2106.04615

  23. Razavi, A., van den Oord, A., Poole, B., Vinyals, O.: Preventing posterior collapse with delta-VAEs. In: Proceedings 7th International Conference on Learning Representations, ICLR’19 (2019). https://openreview.net/forum?id=BJe0Gn0cY7

  24. Razavi, A., van den Oord, A., Vinyals, O.: Generating diverse high-fidelity images with VQ-VAE-2. In: Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 14837–14847. Curran Associates Inc., Vancouver (2019). http://papers.nips.cc/paper/9625-generating-diverse-high-fidelity-images-with-vq-vae-2

  25. Schrittwieser, J., et al.: Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588(7839), 604–609 (2020). https://doi.org/10.1038/s41586-020-03051-4

  26. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017). arXiv preprint arXiv:1707.06347

  27. Seetharaman, P., Wichern, G., Pardo, B., Roux, J.L.: Autoclip: adaptive gradient clipping for source separation networks. In: IEEE International Workshop on Machine Learning for Signal Processing, MLSP, vol. 2020-September. IEEE Computer Society (2020). https://doi.org/10.1109/MLSP49062.2020.9231926

  28. Sutton, R.S.: Dyna, an integrated architecture for learning, planning, and reacting. ACM SIGART Bull. 2(4), 160–163 (1991). https://doi.org/10.1145/122344.122377

  29. Varghese, N.V., Mahmoud, Q.H.: A survey of multi-task deep reinforcement learning. Electronics 9(9) (2020). https://doi.org/10.3390/electronics9091363

  30. Yu, C., Liu, J., Nemati, S.: Reinforcement learning in healthcare: a survey (2019). arXiv preprint arXiv:1908.08796

Author information

Correspondence to Per-Arne Andersen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Andersen, P.-A., Goodwin, M., Granmo, O.-C. (2021). ORACLE: End-to-End Model Based Reinforcement Learning. In: Bramer, M., Ellis, R. (eds.) Artificial Intelligence XXXVIII. SGAI-AI 2021. Lecture Notes in Computer Science, vol. 13101. Springer, Cham. https://doi.org/10.1007/978-3-030-91100-3_4

  • DOI: https://doi.org/10.1007/978-3-030-91100-3_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91099-0

  • Online ISBN: 978-3-030-91100-3
