research-article

Simultaneously Learning and Advising in Multiagent Reinforcement Learning

Authors: Felipe Leno da Silva, Anna Helena Reali Costa

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems

Pages 1100 - 1108

Published: 08 May 2017

Abstract

Reinforcement Learning has long been employed to solve sequential decision-making problems with minimal input data. However, the classical approach requires a large number of interactions with an environment to learn a suitable policy. This problem is further intensified when multiple autonomous agents are simultaneously learning in the same environment. The teacher-student approach aims to alleviate this problem by integrating an advising procedure into the learning process, in which an experienced agent (human or not) can advise a student to guide her exploration. Even though previous works reported that an agent can learn faster when receiving advice, their proposals require the teacher to be an expert in the learning task. Sharing successful episodes can also accelerate learning, but this procedure requires a lot of communication between agents, which is infeasible in domains where communication is limited. Thus, we propose a multiagent advising framework in which multiple agents can advise each other while learning in a shared environment. If an agent is unsure about what to do in a given state, it can ask other agents for advice and may receive answers from agents that are more confident about how to act in that state. We perform experiments in a simulated Robot Soccer environment and show that the learning process is improved by incorporating this kind of advice.
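The ask-when-unsure idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not say how confidence is measured or how several answers are combined, so the sketch assumes a state-visit count as the confidence proxy, a fixed `ask_threshold`, and a majority vote over the advice received. The `Agent` class and all of its method names are hypothetical.

```python
from collections import Counter, defaultdict


class Agent:
    """Tabular learner that can ask peers for advice (illustrative only)."""

    def __init__(self, name, actions, ask_threshold=5):
        self.name = name
        self.actions = list(actions)
        # Assumption: below this many visits to a state, the agent is "unsure".
        self.ask_threshold = ask_threshold
        self.q = defaultdict(float)     # (state, action) -> estimated value
        self.visits = defaultdict(int)  # state -> number of visits

    def confidence(self, state):
        # Assumption: visit count stands in for confidence in a state.
        return self.visits[state]

    def greedy_action(self, state):
        # Best action under the agent's own current value estimates.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def choose_action(self, state, peers):
        action = self.greedy_action(state)
        own_confidence = self.confidence(state)
        if own_confidence < self.ask_threshold:
            # Only peers that are more confident than the asker answer.
            advice = [
                peer.greedy_action(state)
                for peer in peers
                if peer.confidence(state) > own_confidence
            ]
            if advice:
                # Assumption: combine the answers by majority vote.
                action = Counter(advice).most_common(1)[0][0]
        self.visits[state] += 1
        return action


if __name__ == "__main__":
    expert = Agent("expert", ["pass", "shoot"])
    expert.visits["near_goal"] = 10
    expert.q[("near_goal", "shoot")] = 1.0

    novice = Agent("novice", ["pass", "shoot"])
    print(novice.choose_action("near_goal", [expert]))  # prints "shoot"
```

In this sketch no agent has to be an expert in the task as a whole: who advises whom is decided state by state, so an agent that advises in one state may be the one asking in another.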

Index Terms

Simultaneously Learning and Advising in Multiagent Reinforcement Learning
1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
      1. Cooperation and coordination
      2. Multi-agent systems
  2. Machine learning
    1. Learning paradigms
      1. Multi-task learning
        Transfer learning
      2. Reinforcement learning
        Multi-agent reinforcement learning

Information & Contributors

Information

Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems

May 2017

1914 pages

General Chairs: Kate Larson (University of Waterloo, Canada), Michael Winikoff (University of Otago, New Zealand)

Program Chairs: Sanmay Das (Washington University in St. Louis, USA), Edmund Durfee (University of Michigan, USA)

Sponsors

IFAAMAS

In-Cooperation

ACM: Association for Computing Machinery

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Funding Sources

CNPq
CAPES
Google Latin America Research Award
São Paulo Research Foundation (FAPESP)

Acceptance Rates

AAMAS '17 Paper Acceptance Rate 127 of 457 submissions, 28%;

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

Bibliometrics

Total Citations: 12
Total Downloads: 273
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Cited By

Zhou Z, Hu B, Zhao C, Zhang P, Liu B, Larson K (2024). Large language model as a policy teacher for training reinforcement learning agents. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. 10.24963/ijcai.2024/627 (5671-5679). Online publication date: 3-Aug-2024
https://dl.acm.org/doi/10.24963/ijcai.2024/627
Jin Y, Wei S, Yuan J, Zhang X, Pelachaud C, Taylor M, Faliszewski P, Mascardi V (2022). Learning to Advise and Learning from Advice in Cooperative Multiagent Reinforcement Learning. Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems. 10.5555/3535850.3536063 (1645-1647). Online publication date: 9-May-2022
https://dl.acm.org/doi/10.5555/3535850.3536063
Zhu C, Leung H, Hu S, Cai Y (2021). A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint. ACM Transactions on Autonomous and Adaptive Systems 15:2. 10.1145/3447268 (1-28). Online publication date: 19-Apr-2021
https://dl.acm.org/doi/10.1145/3447268
Zhu C, Cai Y, Leung H, Hu S, El Fallah Seghrouchni A, Sukthankar G, An B, Yorke-Smith N (2020). Learning by Reusing Previous Advice in Teacher-Student Paradigm. Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. 10.5555/3398761.3398953 (1674-1682). Online publication date: 5-May-2020
https://dl.acm.org/doi/10.5555/3398761.3398953
Kim D, Liu M, Omidshafiei S, Lopez-Cot S, Riemer M, Habibi G, Tesauro G, Mourad S, Campbell M, How J, El Fallah Seghrouchni A, Sukthankar G, An B, Yorke-Smith N (2020). Learning Hierarchical Teaching Policies for Cooperative Agents. Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. 10.5555/3398761.3398836 (620-628). Online publication date: 5-May-2020
https://dl.acm.org/doi/10.5555/3398761.3398836
Silva F, Elkind E, Veloso M, Agmon N, Taylor M (2019). Integrating Agent Advice and Previous Task Solutions in Multiagent Reinforcement Learning. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 10.5555/3306127.3332142 (2447-2448). Online publication date: 8-May-2019
https://dl.acm.org/doi/10.5555/3306127.3332142
Zhu C, Leung H, Hu S, Cai Y, Elkind E, Veloso M, Agmon N, Taylor M (2019). A Q-values Sharing Framework for Multiple Independent Q-learners. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 10.5555/3306127.3332099 (2324-2326). Online publication date: 8-May-2019
https://dl.acm.org/doi/10.5555/3306127.3332099
Da Silva F, Costa A (2019). A survey on transfer learning for multiagent reinforcement learning systems. Journal of Artificial Intelligence Research 64:1. 10.1613/jair.1.11396 (645-703). Online publication date: 1-Jan-2019
https://dl.acm.org/doi/10.1613/jair.1.11396
Da Silva F, Taylor M, Costa A (2018). Autonomously reusing knowledge in multiagent reinforcement learning. Proceedings of the 27th International Joint Conference on Artificial Intelligence. 10.5555/3304652.3304788 (5487-5493). Online publication date: 13-Jul-2018
https://dl.acm.org/doi/10.5555/3304652.3304788
Silva F, Costa A, Andre E, Koenig S, Dastani M, Sukthankar G (2018). Object-Oriented Curriculum Generation for Reinforcement Learning. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. 10.5555/3237383.3237850 (1026-1034). Online publication date: 9-Jul-2018
https://dl.acm.org/doi/10.5555/3237383.3237850
