Abstract
\(\mathsf {SafePILCO}\) is a software tool for safe and data-efficient policy search with reinforcement learning. It extends the well-known \(\mathsf {PILCO}\) algorithm, originally written in MATLAB, to support safe learning. \(\mathsf {SafePILCO}\) is implemented in Python and leverages existing libraries, which keeps the codebase short and modular and promotes wider use by the verification, reinforcement learning, and control communities.
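To illustrate the intended workflow, the sketch below outlines a \(\mathsf {PILCO}\)-style policy-search loop in Python. The imports and method names follow the layout of the package repository linked in the notes below, but they should be read as an assumed illustration rather than a verified API reference.

```python
# Illustrative PILCO-style policy-search loop. The module and method names
# (pilco.models.PILCO, RbfController, ExponentialReward, optimize_models,
# optimize_policy) follow the layout of the linked package repository, but
# are assumptions here, not a verified API reference.
import numpy as np
import gym

from pilco.models import PILCO
from pilco.controllers import RbfController
from pilco.rewards import ExponentialReward


def rollout(env, policy, horizon):
    """Run one episode; return (state, action) inputs and state-difference targets."""
    X, Y = [], []
    x = env.reset()  # classic gym API: reset() returns the initial observation
    for _ in range(horizon):
        u = policy(x)
        x_next, _, done, _ = env.step(u)
        X.append(np.hstack((x, u)))
        Y.append(x_next - x)  # the GPs model one-step state differences
        x = x_next
        if done:
            break
    return np.stack(X), np.stack(Y)


env = gym.make("Pendulum-v0")
state_dim, control_dim, horizon = 3, 1, 40

# Seed the Gaussian-process dynamics model with a random rollout.
X, Y = rollout(env, lambda x: env.action_space.sample(), horizon)

controller = RbfController(state_dim=state_dim, control_dim=control_dim,
                           num_basis_functions=10)
reward = ExponentialReward(state_dim=state_dim)  # saturating reward around a target state
pilco = PILCO((X, Y), controller=controller, reward=reward, horizon=horizon)

for _ in range(5):           # alternate model fitting and policy improvement
    pilco.optimize_models()  # fit GP hyperparameters to the collected data
    pilco.optimize_policy()  # maximise the predicted long-term reward
    # A new on-policy rollout would then be appended to (X, Y) and the loop repeated.
```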
Notes
1. An extended version of this paper is available at http://arxiv.org/abs/2008.03273.
2. Main package repository: https://github.com/nrontsis/PILCO.
3. Repository for reproducing the experiments and figures: https://github.com/kyr-pol/SafePILCO_Tool-Reproducibility.
4. By way of comparison, all gradient calculations in the \(\mathsf {PILCO}\) MATLAB implementation are hand-coded, so extensions are laborious: any additional user-defined controller or reward function must include these gradient calculations as well (see the sketch following these notes).
5. Code for the BAS simulator: https://gitlab.com/natchi92/BASBenchmarks.
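To make note 4 concrete: in a Python implementation built on TensorFlow, the gradient of a user-defined reward is produced by automatic differentiation, so no hand-coded derivatives are required. A minimal sketch follows; the quadratic reward below is a made-up example, not part of the package.

```python
# Sketch: with TensorFlow's automatic differentiation, a user-defined reward
# needs no hand-coded gradients. The negative quadratic reward below is a
# made-up example, not part of the SafePILCO package.
import tensorflow as tf

def my_reward(state, target):
    """An example user-defined reward: negative squared distance to a target."""
    return -tf.reduce_sum(tf.square(state - target))

state = tf.Variable([0.5, -0.2, 1.0])
target = tf.constant([0.0, 0.0, 0.0])

with tf.GradientTape() as tape:
    r = my_reward(state, target)

# d(reward)/d(state), computed automatically; in the MATLAB implementation
# this derivative would have to be coded by hand for every new reward.
grad = tape.gradient(r, state)
print(grad.numpy())  # -> [-1.   0.4 -2. ]
```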
References
Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). Software available from https://www.tensorflow.org/
Berkenkamp, F., Krause, A., Schoellig, A.P.: Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics. CoRR abs/1602.04450 (2016). http://arxiv.org/abs/1602.04450
Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
Cauchi, N., Abate, A.: Benchmarks for cyber-physical systems: a modular model library for building automation systems. In: Proceedings of ADHS, pp. 49–54 (2018)
Chatzilygeroudis, K., Rama, R., Kaushik, R., Goepp, D., Vassiliades, V., Mouret, J.B.: Black-box data-efficient policy search for robotics. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 51–58. IEEE (2017)
Chua, K., Calandra, R., McAllister, R., Levine, S.: Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In: Advances in Neural Information Processing Systems, pp. 4754–4765 (2018)
Deisenroth, M.P.: Efficient reinforcement learning using Gaussian processes. Ph.D. thesis, Karlsruhe Institute of Technology (2010)
Deisenroth, M.P., Englert, P., Peters, J., Fox, D.: Multi-task policy search for robotics. In: 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 3876–3881. IEEE (2014)
Deisenroth, M.P., Neumann, G., Peters, J., et al.: A survey on policy search for robotics. Found. Trends® Robot. 2(1–2), 1–142 (2013)
Deisenroth, M.P., Rasmussen, C.E., Fox, D.: Learning to control a low-cost manipulator using data-efficient reinforcement learning. In: Robotics: Science and Systems (2011)
Deisenroth, M.P., Rasmussen, C.E.: PILCO: a model-based and data-efficient approach to policy search. In: Proceedings of the International Conference on Machine Learning (2011)
Duan, Y., Chen, X., Houthooft, R., Schulman, J., Abbeel, P.: Benchmarking deep reinforcement learning for continuous control. In: International Conference on Machine Learning (ICML), pp. 1329–1338 (2016)
Duivenvoorden, R.R., Berkenkamp, F., Carion, N., Krause, A., Schoellig, A.P.: Constrained Bayesian optimization with particle swarms for safe adaptive controller tuning. In: Proceedings of the IFAC (International Federation of Automatic Control) World Congress, pp. 12306–12313 (2017)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290 (2018)
Koller, T., Berkenkamp, F., Turchetta, M., Krause, A.: Learning-based model predictive control for safe exploration. In: 2018 IEEE Conference on Decision and Control (CDC), pp. 6059–6066. IEEE (2018)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(1), 1334–1373 (2016)
Mataric, M.J.: Reward functions for accelerated learning. In: Machine Learning Proceedings 1994, pp. 181–189. Elsevier (1994)
Matthews, A.G.d.G., et al.: GPflow: a Gaussian process library using TensorFlow. J. Mach. Learn. Res. 18(40), 1–6 (2017). http://jmlr.org/papers/v18/16-537.html
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Ng, A.Y., Jordan, M.I.: Shaping and policy search in reinforcement learning. Ph.D. thesis, University of California, Berkeley (2003)
Polymenakos, K., Abate, A., Roberts, S.: Safe policy search using Gaussian process models. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1565–1573. International Foundation for Autonomous Agents and Multiagent Systems (2019)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
Sui, Y., Gotovos, A., Burdick, J., Krause, A.: Safe exploration for optimization with Gaussian processes. In: Proceedings of The 32nd International Conference on Machine Learning, pp. 997–1005 (2015)
Vinogradska, J., Bischoff, B., Achterhold, J., Koller, T., Peters, J.: Numerical quadrature for probabilistic policy search. IEEE Trans. Pattern Anal. Mach. Intell. 42, 164–175 (2018)
Vinogradska, J., Bischoff, B., Nguyen-Tuong, D., Romer, A., Schmidt, H., Peters, J.: Stability of controllers for Gaussian process forward models. In: Proceedings of The 33rd International Conference on Machine Learning, pp. 545–554 (2016)
Vuong, T.L., Tran, K.: Uncertainty-aware model-based policy optimization. arXiv preprint arXiv:1906.10717 (2019)
Wang, T., et al.: Benchmarking model-based reinforcement learning. arXiv preprint arXiv:1907.02057 (2019)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Polymenakos, K., Rontsis, N., Abate, A., Roberts, S. (2020). \(\mathsf {SafePILCO}\): A Software Tool for Safe and Data-Efficient Policy Synthesis. In: Gribaudo, M., Jansen, D.N., Remke, A. (eds) Quantitative Evaluation of Systems. QEST 2020. Lecture Notes in Computer Science, vol. 12289. Springer, Cham. https://doi.org/10.1007/978-3-030-59854-9_3
DOI: https://doi.org/10.1007/978-3-030-59854-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59853-2
Online ISBN: 978-3-030-59854-9