Coordinated Behavior for Sequential Cooperative Task Using Two-Stage Reward Assignment with Decay

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Recently, multi-agent deep reinforcement learning (MADRL) has been studied as a way to learn actions that achieve complicated tasks and to generate the coordination structure among agents. Reward assignment in MADRL is a crucial factor in guiding agents, through their individual learning, toward both behaviors for their own tasks and coordinated behaviors. However, the effect of reward assignment in MADRL on the learned coordinated behavior has not been sufficiently clarified. To address this issue, we use a sequential task, the coordinated delivery and execution problem with expiration time, to analyze the effect of various ratios of the reward given for the task that an agent is responsible for to the reward given for the whole task. We then propose a two-stage reward assignment with decay for learning both the actions for the tasks that the agent is responsible for and the coordinated actions that facilitate other agents’ tasks. We show experimentally that the proposed method enables agents to learn both kinds of actions in a balanced manner, so that they achieve effective coordination by reducing the number of tasks ignored by other agents. We also analyze the mechanism behind the emergence of different coordinated behaviors.

Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Numbers 17KT0044 and 20H04245.

Author information

Authors and Affiliations

Computer Science and Engineering, Waseda University, Tokyo, 1698555, Japan
Yuki Miyashita & Toshiharu Sugawara
Shimizu Corporation, Tokyo, 1040031, Japan
Yuki Miyashita

Authors

Yuki Miyashita
Toshiharu Sugawara

Corresponding author

Correspondence to Yuki Miyashita.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, China
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

About this paper

Cite this paper

Miyashita, Y., Sugawara, T. (2020). Coordinated Behavior for Sequential Cooperative Task Using Two-Stage Reward Assignment with Decay. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_22

DOI: https://doi.org/10.1007/978-3-030-63833-7_22
Published: 20 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7
eBook Packages: Computer Science, Computer Science (R0)
