Article
DOI: 10.1007/978-3-031-02056-8_18

Multi-objective Genetic Programming for Explainable Reinforcement Learning

Published: 20 April 2022

Abstract

Deep reinforcement learning has recently achieved notable successes on a wide range of control problems. However, the learned policies typically rely on thousands of weights and non-linearities, making them complex, hard to reproduce, hard to interpret, and computationally heavy. This paper presents genetic programming approaches for building symbolic controllers. The results are competitive, in particular in the case of delayed rewards, and the evolved solutions are lighter by orders of magnitude and far easier to understand.
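
To make the approach concrete, below is a minimal sketch of the kind of pipeline the abstract describes: a genetic-programming search over arithmetic expressions that serve directly as a control policy, with episode return and expression size treated as two competing objectives. This is not the authors' implementation: the choice of DEAP and OpenAI Gym, the CartPole-v1 task, the primitive set, NSGA-II selection, and every hyperparameter below are illustrative assumptions, and the snippet assumes the classic Gym API (gym < 0.26), where reset() returns only the observation.

```python
# Minimal sketch (not the paper's code): evolving a symbolic CartPole policy
# with DEAP, scoring each expression on two objectives -- average episode
# return (maximized) and expression size (minimized) -- under NSGA-II.
import operator
import random

import gym
from deap import algorithms, base, creator, gp, tools


def pdiv(a, b):
    """Protected division: fall back to 1.0 near a zero denominator."""
    return a / b if abs(b) > 1e-6 else 1.0


# Four inputs: CartPole's observation (position, velocity, angle, angular velocity).
pset = gp.PrimitiveSet("MAIN", 4)
pset.renameArguments(ARG0="x", ARG1="v", ARG2="theta", ARG3="omega")
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(pdiv, 2)
pset.addEphemeralConstant("const", lambda: random.uniform(-1.0, 1.0))

# Two objectives: maximize return, minimize tree size (a crude interpretability proxy).
creator.create("FitnessMulti", base.Fitness, weights=(1.0, -1.0))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMulti)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

env = gym.make("CartPole-v1")


def evaluate(individual, episodes=3):
    """Run the compiled expression as a bang-bang policy; return (return, size)."""
    policy = toolbox.compile(expr=individual)
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            action = 1 if policy(*obs) > 0.0 else 0  # threshold the expression
            obs, reward, done, _ = env.step(action)
            total += reward
    return total / episodes, len(individual)


toolbox.register("evaluate", evaluate)
toolbox.register("select", tools.selNSGA2)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
# Cap tree height so evolved controllers stay small and readable.
for op in ("mate", "mutate"):
    toolbox.decorate(op, gp.staticLimit(key=operator.attrgetter("height"), max_value=8))

pop = toolbox.population(n=100)
pop, _ = algorithms.eaMuPlusLambda(
    pop, toolbox, mu=100, lambda_=100, cxpb=0.5, mutpb=0.3, ngen=30, verbose=False
)
print(str(tools.selBest(pop, 1)[0]))  # e.g. something like mul(add(theta, omega), 0.84)
```

The printed individual is an ordinary arithmetic expression over the four state variables: exactly the kind of artifact, lighter by orders of magnitude, that the abstract contrasts with a neural policy's thousands of weights.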

Published In

Genetic Programming: 25th European Conference, EuroGP 2022, Held as Part of EvoStar 2022, Madrid, Spain, April 20–22, 2022, Proceedings
April 2022, 316 pages
ISBN: 978-3-031-02055-1
DOI: 10.1007/978-3-031-02056-8
Editors: Eric Medvet, Gisele Pappa, Bing Xue
Publisher: Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Genetic Programming
2. Reinforcement Learning
3. Explainable Reinforcement Learning (XRL)
4. Genetic Programming Reinforcement Learning (GPRL)

Cited By

• (2024) An Analysis of the Ingredients for Learning Interpretable Symbolic Regression Models with Human-in-the-loop and Genetic Programming. ACM Transactions on Evolutionary Learning and Optimization 4(1), 1–30. DOI: 10.1145/3643688 (23 Feb 2024)
• (2024) Interpretable Control Competition. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 11–12. DOI: 10.1145/3638530.3664051 (14 Jul 2024)
• (2024) Reinforcement Learning-Assisted Genetic Programming Hyper Heuristic Approach to Location-Aware Dynamic Online Application Deployment in Clouds. Proceedings of the Genetic and Evolutionary Computation Conference, 988–997. DOI: 10.1145/3638529.3654058 (14 Jul 2024)
• (2024) Large Language Model-based Test Case Generation for GP Agents. Proceedings of the Genetic and Evolutionary Computation Conference, 914–923. DOI: 10.1145/3638529.3654056 (14 Jul 2024)
• (2024) Searching for a Diversity of Interpretable Graph Control Policies. Proceedings of the Genetic and Evolutionary Computation Conference, 933–941. DOI: 10.1145/3638529.3653987 (14 Jul 2024)
• (2024) Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming. Advances in Swarm Intelligence, 349–365. DOI: 10.1007/978-981-97-7181-3_28 (22 Aug 2024)
• (2024) Naturally Interpretable Control Policies via Graph-Based Genetic Programming. Genetic Programming, 73–89. DOI: 10.1007/978-3-031-56957-9_5 (3 Apr 2024)
• (2022) Improving Nevergrad’s Algorithm Selection Wizard NGOpt Through Automated Algorithm Configuration. Parallel Problem Solving from Nature – PPSN XVII, 18–31. DOI: 10.1007/978-3-031-14714-2_2 (10 Sep 2022)
