Abstract
We present an emotion-based hierarchical reinforcement learning (HRL) algorithm for environments with multiple sources of reward. The architecture of the system is inspired by the neurobiology of the brain, in particular the areas responsible for emotion, decision making, and behaviour execution: the amygdala, the orbitofrontal cortex, and the basal ganglia, respectively. The learning problem is decomposed according to sources of reward, with each reward source serving as the goal of a subtask. Each subtask is assigned an artificial emotion indication (AEI) that predicts the reward component associated with that subtask. The AEIs are learned simultaneously with the top-level policy and are used to interrupt subtask execution whenever they change significantly. The algorithm is tested in a simulated gridworld that has two sources of reward and is partially observable. Experiments compare the emotion-based algorithm with other HRL algorithms under the same learning conditions. The biologically inspired architecture significantly accelerates learning and achieves higher long-term reward than a human-designed policy and a restricted form of the MAXQ algorithm.
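The AEI-driven interruption mechanism described above lends itself to a short sketch. The following is a minimal illustration of the idea, not the authors' implementation: since only the abstract is available here, the tabular TD(0) update, the per-subtask value table, and the fixed interruption threshold are all assumptions.

```python
import numpy as np

class ArtificialEmotionIndication:
    """Sketch of an AEI: predicts the reward component tied to one
    subtask via a tabular TD(0) update (update rule assumed)."""

    def __init__(self, n_states, alpha=0.1, gamma=0.9):
        self.v = np.zeros(n_states)  # predicted reward component per state
        self.alpha = alpha           # learning rate (assumed value)
        self.gamma = gamma           # discount factor (assumed value)

    def update(self, s, r_component, s_next):
        # TD(0) update on this subtask's reward component only.
        td_error = r_component + self.gamma * self.v[s_next] - self.v[s]
        self.v[s] += self.alpha * td_error
        return td_error

def should_interrupt(aei, s_prev, s_now, threshold=0.5):
    # Interrupt the running subtask when its AEI changes significantly,
    # so the top-level policy can reconsider which goal to pursue.
    # The fixed threshold is an assumption for illustration.
    return abs(aei.v[s_now] - aei.v[s_prev]) > threshold
```

In use, the top level would maintain one such AEI per reward source, update each with its own reward component at every step, and poll `should_interrupt` to decide whether to terminate the current subtask early.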
References
Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13, 227–303 (2000)
Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1–2), 181–211 (1999)
Hengst, B.: Discovering hierarchy in reinforcement learning with HEXQ. In: Sammut, C., Hoffmann, A. (eds.) Proceedings of the Nineteenth International Conference on Machine Learning, Sydney, Australia, pp. 243–250 (2002)
Shelton, C.R.: Balancing multiple sources of reward in reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 13, pp. 1082–1088. MIT Press, Cambridge (2001)
Rolls, E.T.: The brain and emotion. Oxford University Press, Oxford (1999)
Schultz, W., Tremblay, L., Hollerman, J.R.: Reward processing in primate orbitofrontal cortex and basal ganglia. Cerebral Cortex 10, 272–283 (2000)
Zhou, W., Coggins, R.: A biologically inspired hierarchical reinforcement learning system. Cybernetics and Systems (2004) (to appear)
LeDoux, J.E.: Brain mechanisms of emotion and emotional learning. Current Opinion in Neurobiology 2, 191–197 (1992)
Barto, A.G.: Adaptive critics and the basal ganglia. In: Houk, J.C., Davis, J.L., Beiser, D.G. (eds.) Models of Information Processing in the Basal Ganglia, pp. 215–232. MIT Press, Cambridge (1995)
Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9–44 (1988)
Bradtke, S.J., Duff, M.O.: Reinforcement learning methods for continuous-time Markov decision problems. In: Advances in Neural Information Processing Systems, vol. 7, pp. 393–400. MIT Press, Cambridge (1995)
Gadanho, S.C., Hallam, J.: Emotion-triggered learning in autonomous robot control. Cybernetics and Systems 32(5), 531–559 (2001)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhou, W., Coggins, R. (2004). Biologically Inspired Reinforcement Learning: Reward-Based Decomposition for Multi-goal Environments. In: Ijspeert, A.J., Murata, M., Wakamiya, N. (eds) Biologically Inspired Approaches to Advanced Information Technology. BioADIT 2004. Lecture Notes in Computer Science, vol 3141. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27835-1_7