Article
DOI: 10.1007/978-3-031-02056-8_18

Multi-objective Genetic Programming for Explainable Reinforcement Learning

Published: 20 April 2022

Abstract

Deep reinforcement learning has recently achieved notable successes on a wide range of control problems. However, the learned policies typically rely on thousands of weights and non-linearities, making them complex, hard to reproduce, hard to interpret, and computationally heavy. This paper presents genetic programming approaches for building symbolic controllers. The results are competitive, in particular in the case of delayed rewards, and the evolved solutions are lighter by orders of magnitude and far easier to understand.
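
To make the approach concrete, below is a minimal sketch of the kind of pipeline the abstract describes: a genetic-programming search over arithmetic expressions that serve directly as a control policy, with episode return and expression size treated as two competing objectives. This is not the authors' implementation: the choice of DEAP and OpenAI Gym, the CartPole-v1 task, the primitive set, NSGA-II selection, and every hyperparameter below are illustrative assumptions, and the snippet assumes the classic Gym API (gym < 0.26), where reset() returns only the observation.

```python
# Minimal sketch (not the paper's code): evolving a symbolic CartPole policy
# with DEAP, scoring each expression on two objectives -- average episode
# return (maximized) and expression size (minimized) -- under NSGA-II.
import operator
import random

import gym
from deap import algorithms, base, creator, gp, tools


def pdiv(a, b):
    """Protected division: fall back to 1.0 near a zero denominator."""
    return a / b if abs(b) > 1e-6 else 1.0


# Four inputs: CartPole's observation (position, velocity, angle, angular velocity).
pset = gp.PrimitiveSet("MAIN", 4)
pset.renameArguments(ARG0="x", ARG1="v", ARG2="theta", ARG3="omega")
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(pdiv, 2)
pset.addEphemeralConstant("const", lambda: random.uniform(-1.0, 1.0))

# Two objectives: maximize return, minimize tree size (a crude interpretability proxy).
creator.create("FitnessMulti", base.Fitness, weights=(1.0, -1.0))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMulti)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

env = gym.make("CartPole-v1")


def evaluate(individual, episodes=3):
    """Run the compiled expression as a bang-bang policy; return (return, size)."""
    policy = toolbox.compile(expr=individual)
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = 1 if policy(*obs) > 0.0 else 0  # threshold the expression
            obs, reward, done, _ = env.step(action)
            total += reward
    return total / episodes, len(individual)


toolbox.register("evaluate", evaluate)
toolbox.register("select", tools.selNSGA2)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
# Cap tree height so evolved controllers stay small and readable.
for op in ("mate", "mutate"):
    toolbox.decorate(op, gp.staticLimit(key=operator.attrgetter("height"), max_value=8))

pop = toolbox.population(n=100)
pop, _ = algorithms.eaMuPlusLambda(
    pop, toolbox, mu=100, lambda_=100, cxpb=0.5, mutpb=0.3, ngen=30, verbose=False
)
print(str(tools.selBest(pop, 1)[0]))  # e.g. something like mul(add(theta, omega), 0.84)
```

The printed individual is an ordinary arithmetic expression over the four state variables: exactly the kind of artifact, lighter by orders of magnitude, that the abstract contrasts with a neural policy's thousands of weights.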

Published In

Genetic Programming: 25th European Conference, EuroGP 2022, Held as Part of EvoStar 2022, Madrid, Spain, April 20–22, 2022, Proceedings
April 2022, 316 pages
ISBN: 978-3-031-02055-1
DOI: 10.1007/978-3-031-02056-8
Editors: Eric Medvet, Gisele Pappa, Bing Xue
Publisher: Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Genetic Programming
2. Reinforcement Learning
3. Explainable Reinforcement Learning (XRL)
4. Genetic Programming Reinforcement Learning (GPRL)

Cited By

• (2024) An Analysis of the Ingredients for Learning Interpretable Symbolic Regression Models with Human-in-the-loop and Genetic Programming. ACM Transactions on Evolutionary Learning and Optimization 4(1), 1–30. DOI: 10.1145/3643688 (23 Feb 2024)
• (2024) Interpretable Control Competition. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 11–12. DOI: 10.1145/3638530.3664051 (14 Jul 2024)
• (2024) Reinforcement Learning-Assisted Genetic Programming Hyper Heuristic Approach to Location-Aware Dynamic Online Application Deployment in Clouds. Proceedings of the Genetic and Evolutionary Computation Conference, 988–997. DOI: 10.1145/3638529.3654058 (14 Jul 2024)
• (2024) Large Language Model-based Test Case Generation for GP Agents. Proceedings of the Genetic and Evolutionary Computation Conference, 914–923. DOI: 10.1145/3638529.3654056 (14 Jul 2024)
• (2024) Searching for a Diversity of Interpretable Graph Control Policies. Proceedings of the Genetic and Evolutionary Computation Conference, 933–941. DOI: 10.1145/3638529.3653987 (14 Jul 2024)
• (2024) Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming. Advances in Swarm Intelligence, 349–365. DOI: 10.1007/978-981-97-7181-3_28 (22 Aug 2024)
• (2024) Naturally Interpretable Control Policies via Graph-Based Genetic Programming. Genetic Programming, 73–89. DOI: 10.1007/978-3-031-56957-9_5 (3 Apr 2024)
• (2022) Improving Nevergrad’s Algorithm Selection Wizard NGOpt Through Automated Algorithm Configuration. Parallel Problem Solving from Nature – PPSN XVII, 18–31. DOI: 10.1007/978-3-031-14714-2_2 (10 Sep 2022)
