Abstract
A fundamental aspect of intelligent agent behavior is the ability to encode salient features of experience in memory and to use these memories, in combination with current sensory information, to predict the best action for each situation such that long-term objectives are maximized. The world is highly dynamic, and behavioral agents must generalize across a variety of environments and objectives over time. This scenario can be modeled as a partially observable multi-task reinforcement learning problem. We use genetic programming to evolve highly generalized agents capable of operating in six unique environments from the control literature, including OpenAI’s entire Classic Control suite. This requires the agent to support discrete and continuous actions simultaneously. No task-identification sensor inputs are provided, so agents must identify tasks from the dynamics of state variables alone and define control policies for each task. We show that emergent hierarchical structure in the evolving programs leads to multi-task agents that succeed by performing a temporal decomposition and encoding of the problem environments in memory. The resulting agents are competitive with task-specific agents in all six environments. Furthermore, the hierarchical structure of the evolved programs allows for dynamic run-time complexity, which results in relatively efficient operation.
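As a minimal illustration of the evaluation protocol the abstract describes, the sketch below scores one agent across Gym's Classic Control tasks with no task-identification input, mapping the agent's output to either a discrete or a continuous action depending on the active task. This is not the authors' code: the `agent.act` interface and episode count are assumptions, the paper's sixth environment (drawn from the control literature rather than Gym) is omitted, and environment IDs follow the 2021-era Gym API.

```python
# Minimal multi-task evaluation sketch (assumed interfaces, not the
# authors' implementation). Requires the classic Gym API (gym <= 0.25).
import gym
import numpy as np

ENV_IDS = [  # Gym's Classic Control suite; the paper's sixth task is omitted
    "CartPole-v1", "Acrobot-v1", "MountainCar-v0",
    "MountainCarContinuous-v0", "Pendulum-v0",
]

def evaluate(agent, episodes=5):
    """Return each task's mean episode reward for one multi-task agent."""
    scores = {}
    for env_id in ENV_IDS:
        env = gym.make(env_id)
        total = 0.0
        for _ in range(episodes):
            obs = env.reset()
            done = False
            while not done:
                # The agent sees raw state variables only -- no task-ID input.
                a = agent.act(obs)  # hypothetical interface: one scalar out
                if isinstance(env.action_space, gym.spaces.Discrete):
                    action = int(a) % env.action_space.n  # discrete tasks
                else:
                    action = np.clip(  # continuous tasks
                        [float(a)], env.action_space.low, env.action_space.high
                    )
                obs, reward, done, _ = env.step(action)
                total += reward
        env.close()
        scores[env_id] = total / episodes
    return scores
```

Because the same `act` call serves every task, any task identification must happen inside the agent, from the dynamics of the observed state variables alone.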
Notes
With the parameters listed in Table 4, the team generation process creates 1575 new agents in each generation.
The population at any given generation includes 1575 new agents and 1575 elite agents from previous generations. The initial population size (\(R_{size}\) in Table 4) is 1000. Thus, after two generations the 63 bins of elites remain full, and their contents are recalculated each generation based on the fitness of new agents.
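A toy sketch of this bookkeeping follows, under stated assumptions rather than as the authors' implementation: the 63 bins plausibly correspond to the \(2^6 - 1\) non-empty combinations of the six tasks, the per-bin capacity of 25 is inferred from \(63 \times 25 = 1575\) elites, and the `fitness` callback is hypothetical.

```python
# Toy sketch of per-bin elite preservation (assumed structure).
NUM_BINS = 63        # bins of elites, per the note above
ELITES_PER_BIN = 25  # inferred: 63 * 25 = 1575 elite agents
NEW_PER_GEN = 1575   # new agents from team generation, per the note above

def next_generation(bins, new_agents, fitness):
    """Re-rank every bin over last generation's elites plus the new
    agents, so elite slots are recalculated each generation."""
    for b in range(NUM_BINS):
        candidates = bins[b] + new_agents
        # fitness(agent, b): hypothetical score of an agent on the task
        # combination associated with bin b.
        candidates.sort(key=lambda agent: fitness(agent, b), reverse=True)
        bins[b] = candidates[:ELITES_PER_BIN]
    return bins
```

Because each bin is re-ranked over the union of old elites and new agents, one agent may hold elite slots in several bins at once, which is consistent with the population totals quoted in the note.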
Acknowledgements
S.K. gratefully acknowledges support through the NSERC Postdoctoral Scholarship program. This material is based in part upon work supported by the National Science Foundation under Cooperative Agreement No. DBI-0939454 to the BEACON Center for Evolution in Action at Michigan State University. W.B. acknowledges support from the John R. Koza Endowment fund for part of this work. Michigan State University provided computational resources through the Institute for Cyber-Enabled Research. Additional support provided by ACENET, Calcul Québec, Compute Ontario and WestGrid, and Compute Canada (www.computecanada.ca). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Cite this article
Kelly, S., Voegerl, T., Banzhaf, W. et al. Evolving hierarchical memory-prediction machines in multi-task reinforcement learning. Genet Program Evolvable Mach 22, 573–605 (2021). https://doi.org/10.1007/s10710-021-09418-4