
\(\mathsf {SafePILCO}\): A Software Tool for Safe and Data-Efficient Policy Synthesis

  • Conference paper
  • Published in: Quantitative Evaluation of Systems (QEST 2020)
  • Part of the book series: Lecture Notes in Computer Science, vol. 12289

Abstract

\(\mathsf {SafePILCO}\) is a software tool for safe and data-efficient policy search with reinforcement learning. It extends the well-known \(\mathsf {PILCO}\) algorithm, originally written in MATLAB, to support safe learning. \(\mathsf {SafePILCO}\) is implemented in Python and leverages existing libraries, which keeps the codebase short and modular and encourages wider use by the verification, reinforcement learning, and control communities.
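
To illustrate what "data-efficient policy search" means in the \(\mathsf {PILCO}\) family, the following is a minimal sketch of the model-based loop: a small amount of real data is collected, a probabilistic dynamics model is fitted to it, and the policy is improved against the learned model rather than the real system. All names here (`rollout`, `fit_model`, `optimise_policy`) are hypothetical placeholders, not the SafePILCO API.

```python
# Minimal sketch of a PILCO-style model-based policy-search loop.
# All names are hypothetical placeholders, not the SafePILCO API.
import numpy as np

def rollout(env_step, policy, x0, horizon):
    """Run the policy on the real system (or a simulator) and record transitions."""
    X, U, Y = [], [], []
    x = x0
    for _ in range(horizon):
        u = policy(x)
        x_next = env_step(x, u)          # one step of the real system
        X.append(x); U.append(u); Y.append(x_next)
        x = x_next
    return np.array(X), np.array(U), np.array(Y)

def policy_search(env_step, fit_model, optimise_policy, policy, x0,
                  horizon=40, iterations=5):
    """Alternate between gathering a little real data, refitting the
    probabilistic dynamics model, and improving the policy on that model."""
    X_all = U_all = Y_all = None
    for _ in range(iterations):
        X, U, Y = rollout(env_step, policy, x0, horizon)
        if X_all is None:
            X_all, U_all, Y_all = X, U, Y
        else:
            X_all = np.vstack([X_all, X])
            U_all = np.vstack([U_all, U])
            Y_all = np.vstack([Y_all, Y])
        model = fit_model(X_all, U_all, Y_all)        # e.g. GP regression on transitions
        policy = optimise_policy(model, policy, horizon)
    return policy
```

Because each policy improvement is carried out against the learned model, only a handful of real rollouts are needed per iteration; this is the data efficiency the abstract refers to.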


Notes

  1. An extended version of this paper is available at: http://arxiv.org/abs/2008.03273.

  2. Main package repository: https://github.com/nrontsis/PILCO.

  3. Experiments and figures reproduction repository: https://github.com/kyr-pol/SafePILCO_Tool-Reproducibility.

  4. By way of comparison, all gradient calculations in the \(\mathsf {PILCO}\) MATLAB implementation are hand-coded, so extensions are laborious: any additional user-defined controller or reward function must also include these gradient calculations (see the sketch after this list).

  5. Code for the BAS simulator: https://gitlab.com/natchi92/BASBenchmarks.
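
As a minimal illustration of the difference footnote 4 points to, the sketch below shows how a user-defined reward function can be differentiated automatically with TensorFlow, so no hand-coded derivatives are required. This is an assumed, generic example of the technique, not SafePILCO's internal code.

```python
# Sketch: automatic differentiation of a user-defined reward (not SafePILCO internals).
import tensorflow as tf

def user_reward(state, target):
    """Example user-defined reward: negative squared distance to a target state."""
    return -tf.reduce_sum(tf.square(state - target))

state = tf.Variable([0.5, -0.3])
target = tf.constant([1.0, 0.0])

with tf.GradientTape() as tape:
    r = user_reward(state, target)

# Gradient of the reward w.r.t. the state, obtained automatically; in the
# MATLAB PILCO code this derivative would have to be written out by hand.
dr_dstate = tape.gradient(r, state)
print(dr_dstate.numpy())   # [1.0, 0.6], i.e. -2 * (state - target)
```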


Author information

Correspondence to Kyriakos Polymenakos.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 242 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Polymenakos, K., Rontsis, N., Abate, A., Roberts, S. (2020). \(\mathsf {SafePILCO}\): A Software Tool for Safe and Data-Efficient Policy Synthesis. In: Gribaudo, M., Jansen, D.N., Remke, A. (eds) Quantitative Evaluation of Systems. QEST 2020. Lecture Notes in Computer Science, vol. 12289. Springer, Cham. https://doi.org/10.1007/978-3-030-59854-9_3


  • DOI: https://doi.org/10.1007/978-3-030-59854-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59853-2

  • Online ISBN: 978-3-030-59854-9

  • eBook Packages: Computer Science, Computer Science (R0)
