Abstract
A fundamental aspect of intelligent agent behavior is the ability to encode salient features of experience in memory and to use these memories, in combination with current sensory information, to predict the best action for each situation such that long-term objectives are maximized. The world is highly dynamic, and behavioral agents must generalize across a variety of environments and objectives over time. This scenario can be modeled as a partially observable multi-task reinforcement learning problem. We use genetic programming to evolve highly generalized agents capable of operating in six unique environments from the control literature, including OpenAI’s entire Classic Control suite. This requires the agent to support discrete and continuous actions simultaneously. No task-identification sensor inputs are provided, so agents must identify tasks from the dynamics of state variables alone and define control policies for each task. We show that emergent hierarchical structure in the evolving programs leads to multi-task agents that succeed by performing a temporal decomposition and encoding of the problem environments in memory. The resulting agents are competitive with task-specific agents in all six environments. Furthermore, the hierarchical structure of the evolved programs allows for dynamic run-time complexity, which results in relatively efficient operation.
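As a minimal illustration of the evaluation protocol the abstract describes, the sketch below scores one agent across Gym's Classic Control tasks with no task-identification input, mapping the agent's output to either a discrete or a continuous action depending on the active task. This is not the authors' code: the `agent.act` interface and episode count are assumptions, the paper's sixth environment (drawn from the control literature rather than Gym) is omitted, and environment IDs follow the 2021-era Gym API.

```python
# Minimal multi-task evaluation sketch (assumed interfaces, not the
# authors' implementation). Requires the classic Gym API (gym <= 0.25).
import gym
import numpy as np

ENV_IDS = [  # Gym's Classic Control suite; the paper's sixth task is omitted
    "CartPole-v1", "Acrobot-v1", "MountainCar-v0",
    "MountainCarContinuous-v0", "Pendulum-v0",
]

def evaluate(agent, episodes=5):
    """Return each task's mean episode reward for one multi-task agent."""
    scores = {}
    for env_id in ENV_IDS:
        env = gym.make(env_id)
        total = 0.0
        for _ in range(episodes):
            obs = env.reset()
            done = False
            while not done:
                # The agent sees raw state variables only -- no task-ID input.
                a = agent.act(obs)  # hypothetical interface: one scalar out
                if isinstance(env.action_space, gym.spaces.Discrete):
                    action = int(a) % env.action_space.n  # discrete tasks
                else:
                    action = np.clip(  # continuous tasks
                        [float(a)], env.action_space.low, env.action_space.high
                    )
                obs, reward, done, _ = env.step(action)
                total += reward
        env.close()
        scores[env_id] = total / episodes
    return scores
```

Because the same `act` call serves every task, any task identification must happen inside the agent, from the dynamics of the observed state variables alone.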
Notes
With the parameters listed in Table 4, the team generation process creates 1575 new agents in each generation.
The population at any given generation includes 1575 new agents and 1575 elite agents from previous generations. The initial population size (\(R_{size}\) in Table 4) is 1000. Thus, after two generations the 63 bins of elites remain full, and their contents are recalculated each generation based on the fitness of new agents.
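A toy sketch of this bookkeeping follows, under stated assumptions rather than as the authors' implementation: the 63 bins plausibly correspond to the \(2^6 - 1\) non-empty combinations of the six tasks, the per-bin capacity of 25 is inferred from \(63 \times 25 = 1575\) elites, and the `fitness` callback is hypothetical.

```python
# Toy sketch of per-bin elite preservation (assumed structure).
NUM_BINS = 63        # bins of elites, per the note above
ELITES_PER_BIN = 25  # inferred: 63 * 25 = 1575 elite agents
NEW_PER_GEN = 1575   # new agents from team generation, per the note above

def next_generation(bins, new_agents, fitness):
    """Re-rank every bin over last generation's elites plus the new
    agents, so elite slots are recalculated each generation."""
    for b in range(NUM_BINS):
        candidates = bins[b] + new_agents
        # fitness(agent, b): hypothetical score of an agent on the task
        # combination associated with bin b.
        candidates.sort(key=lambda agent: fitness(agent, b), reverse=True)
        bins[b] = candidates[:ELITES_PER_BIN]
    return bins
```

Because each bin is re-ranked over the union of old elites and new agents, one agent may hold elite slots in several bins at once, which is consistent with the population totals quoted in the note.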
Acknowledgements
S.K. gratefully acknowledges support through the NSERC Postdoctoral Scholarship program. This material is based in part upon work supported by the National Science Foundation under Cooperative Agreement No. DBI-0939454 to the BEACON Center for Evolution in Action at Michigan State University. W.B. acknowledges support from the John R. Koza Endowment fund for part of this work. Michigan State University provided computational resources through the Institute for Cyber-Enabled Research. Additional support provided by ACENET, Calcul Québec, Compute Ontario and WestGrid, and Compute Canada (www.computecanada.ca). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Cite this article
Kelly, S., Voegerl, T., Banzhaf, W. et al. Evolving hierarchical memory-prediction machines in multi-task reinforcement learning. Genet Program Evolvable Mach 22, 573–605 (2021). https://doi.org/10.1007/s10710-021-09418-4