Biologically Inspired Reinforcement Learning: Reward-Based Decomposition for Multi-goal Environments

  • Conference paper
Biologically Inspired Approaches to Advanced Information Technology (BioADIT 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3141)

Abstract

We present an emotion-based hierarchical reinforcement learning (HRL) algorithm for environments with multiple sources of reward. The architecture of the system is inspired by the neurobiology of the brain, in particular the areas responsible for emotions, decision making and behaviour execution: the amygdala, the orbitofrontal cortex and the basal ganglia, respectively. The learning problem is decomposed according to sources of reward: each reward source serves as the goal of a subtask, and each subtask is assigned an artificial emotion indication (AEI) that predicts the reward component associated with that subtask. The AEIs are learned simultaneously with the top-level policy and are used to interrupt subtask execution when they change significantly. The algorithm is tested in a simulated gridworld that has two sources of reward and is partially observable. Experiments compare the emotion-based algorithm with other HRL algorithms under the same learning conditions. The biologically inspired architecture significantly accelerates learning and achieves higher long-term reward than both a human-designed policy and a restricted form of the MAXQ algorithm.
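
To make the decomposition concrete, the sketch below illustrates the scheme in Python. It is a minimal toy, not the authors' implementation: the TwoSourceGrid environment, the Subtask class, the AEI_THRESHOLD interruption test and all hyperparameters are hypothetical stand-ins for what the abstract describes (one subtask per reward source, a per-subtask AEI learned by temporal-difference updates alongside the subtask and top-level policies, and interruption of subtask execution when an AEI changes significantly).

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # hypothetical learning rate, discount, exploration
AEI_THRESHOLD = 0.5                   # hypothetical interruption threshold

class TwoSourceGrid:
    """Toy 1-D corridor with a reward source at each end; a stand-in for the
    paper's partially observable two-source gridworld."""
    def __init__(self, size=7):
        self.size = size
        self.pos = size // 2

    def reset(self):
        self.pos = self.size // 2
        return self.pos

    def step(self, a):                          # a is -1 (left) or +1 (right)
        self.pos = max(0, min(self.size - 1, self.pos + a))
        rewards = [1.0 if self.pos == 0 else 0.0,              # source 0
                   1.0 if self.pos == self.size - 1 else 0.0]  # source 1
        done = self.pos in (0, self.size - 1)
        return self.pos, rewards, done

class Subtask:
    """One subtask per reward source; its AEI predicts that source's reward."""
    def __init__(self, source_id, actions):
        self.source_id = source_id
        self.actions = actions
        self.q = defaultdict(float)    # subtask action values Q(s, a)
        self.aei = defaultdict(float)  # AEI: predicted discounted reward, per state

    def act(self, state):
        if random.random() < EPS:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s2):
        # TD(0) updates for the subtask's action values and for its AEI
        best = max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += ALPHA * (r + GAMMA * best - self.q[(s, a)])
        self.aei[s] += ALPHA * (r + GAMMA * self.aei[s2] - self.aei[s])

def run_episode(env, subtasks, top_q, max_steps=100):
    s, t, done = env.reset(), 0, False
    while t < max_steps and not done:
        # epsilon-greedy top-level choice of which subtask to pursue
        if random.random() < EPS:
            k = random.randrange(len(subtasks))
        else:
            k = max(range(len(subtasks)), key=lambda i: top_q[(s, i)])
        sub, s0, ret, tau = subtasks[k], s, 0.0, 0
        baseline = [st.aei[s] for st in subtasks]  # AEIs when the subtask began
        while t < max_steps and not done:
            a = sub.act(s)
            s2, rewards, done = env.step(a)
            for st in subtasks:                    # every AEI learns from its
                st.update(s, a, rewards[st.source_id], s2)  # own reward component
            ret += (GAMMA ** tau) * sum(rewards)   # top level sees total reward
            s, t, tau = s2, t + 1, tau + 1
            if any(abs(st.aei[s] - b) > AEI_THRESHOLD
                   for st, b in zip(subtasks, baseline)):
                break                              # significant AEI change interrupts
        # SMDP-style update of the top-level policy over subtask choices
        best_next = max(top_q[(s, i)] for i in range(len(subtasks)))
        top_q[(s0, k)] += ALPHA * (ret + (GAMMA ** tau) * best_next - top_q[(s0, k)])

env = TwoSourceGrid()
subtasks = [Subtask(0, [-1, 1]), Subtask(1, [-1, 1])]
top_q = defaultdict(float)
for _ in range(500):
    run_episode(env, subtasks, top_q)
```

Interrupting when an AEI drifts from its value at subtask selection is one plausible reading of "changes significantly"; the paper's actual criterion, its partially observable state representation and its neurobiological mapping are not captured by this toy.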

References

  1. Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13, 227–303 (2000)

  2. Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1), 181–211 (1999)

  3. Hengst, B.: Discovering hierarchy in reinforcement learning with HEXQ. In: Sammut, C., Hoffmann, A. (eds.) Proceedings of the Nineteenth International Conference on Machine Learning, Sydney, Australia, pp. 243–250 (2002)

  4. Shelton, C.R.: Balancing multiple sources of reward in reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 13, pp. 1082–1088. MIT Press, Cambridge (2001)

  5. Rolls, E.T.: The brain and emotion. Oxford University Press, Oxford (1999)

  6. Schultz, W., Tremblay, L., Hollerman, J.R.: Reward processing in primate orbitofrontal cortex and basal ganglia. Cerebral Cortex 10, 272–283 (2000)

  7. Zhou, W., Coggins, R.: A biologically inspired hierarchical reinforcement learning system. Cybernetics and Systems (2004) (to appear)

  8. LeDoux, J.E.: Brain mechanisms of emotion and emotional learning. Current Opinion in Neurobiology 2, 191–197 (1992)

  9. Barto, A.G.: Adaptive critics and the basal ganglia. In: Houk, J.C., Davis, J.L., Beiser, D.G. (eds.) Models of information processing in the basal ganglia, pp. 215–232. MIT Press, Cambridge (1995)

  10. Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9–44 (1988)

  11. Bradtke, S.J., Duff, M.O.: Reinforcement learning methods for continuous-time Markov decision problems. In: Advances in Neural Information Processing Systems, vol. 7, pp. 393–400. MIT Press, Cambridge (1995)

  12. Gadanho, S.C., Hallam, J.: Emotion-triggered learning in autonomous robot control. Cybernetics and Systems 32(5), 531–559 (2001)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, W., Coggins, R. (2004). Biologically Inspired Reinforcement Learning: Reward-Based Decomposition for Multi-goal Environments. In: Ijspeert, A.J., Murata, M., Wakamiya, N. (eds) Biologically Inspired Approaches to Advanced Information Technology. BioADIT 2004. Lecture Notes in Computer Science, vol 3141. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27835-1_7

  • DOI: https://doi.org/10.1007/978-3-540-27835-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23339-8

  • Online ISBN: 978-3-540-27835-1

  • eBook Packages: Springer Book Archive
