DOI: 10.5555/3495724.3497173
Research article · Free access

Succinct and robust multi-agent communication with temporal message control

Published: 06 December 2020

Abstract

Recent studies have shown that introducing communication between agents can significantly improve overall performance in cooperative multi-agent reinforcement learning (MARL). However, existing communication schemes often require agents to exchange an excessive number of messages at run-time under a reliable communication channel, which hinders their practicality in many real-world situations. In this paper, we present Temporal Message Control (TMC), a simple yet effective approach for achieving succinct and robust communication in MARL. TMC applies a temporal smoothing technique to drastically reduce the amount of information exchanged between agents. Experiments show that TMC can significantly reduce inter-agent communication overhead without impacting accuracy. Furthermore, TMC demonstrates much better robustness against transmission loss than existing approaches in lossy networking environments.
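The abstract describes the core idea (temporal smoothing to suppress redundant inter-agent messages) but gives no implementation details. The following is a minimal, hypothetical sketch of one way such a gate could look: each agent keeps a smoothed running estimate of its previously transmitted messages and only broadcasts a new message when the current one deviates from that estimate by more than a threshold, while receivers simply reuse the last received message otherwise. All names here (TemporalMessageGate, alpha, threshold) are illustrative assumptions, not the paper's actual API.

```python
import numpy as np


class TemporalMessageGate:
    """Hypothetical sketch of temporal message smoothing/gating.

    The sender tracks an exponentially smoothed copy of its transmitted
    messages and suppresses any new message that stays close to that
    estimate. Receivers fall back to the last received message, which
    also gives some tolerance to occasional transmission loss.
    """

    def __init__(self, dim: int, alpha: float = 0.9, threshold: float = 0.1):
        self.alpha = alpha             # smoothing factor for the running estimate
        self.threshold = threshold     # minimum deviation required to transmit
        self.smoothed = np.zeros(dim)  # smoothed estimate of sent messages
        self.initialized = False

    def step(self, message: np.ndarray):
        """Return (send_flag, payload); payload is None when nothing is sent."""
        if not self.initialized:
            self.initialized = True
            self.smoothed = message.copy()
            return True, message
        deviation = np.linalg.norm(message - self.smoothed)
        if deviation > self.threshold:
            # Message changed enough: update the running estimate and transmit.
            self.smoothed = self.alpha * self.smoothed + (1 - self.alpha) * message
            return True, message
        return False, None  # suppress the redundant message


# Usage sketch: gate each agent's outgoing message before the channel.
gate = TemporalMessageGate(dim=8)
for t in range(100):
    msg = np.random.randn(8) * 0.01  # e.g. output of the agent's message head
    send, payload = gate.step(msg)
    if send:
        pass  # broadcast `payload` to the other agents; otherwise send nothing
```

Under this assumed scheme, communication volume falls whenever messages are temporally redundant, and a dropped packet only costs the receiver a slightly stale message rather than a missing one, which is consistent with the robustness behavior the abstract reports.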



        Information & Contributors

        Information

        Published In

        cover image Guide Proceedings
        NIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems
        December 2020
        22651 pages
ISBN: 9781713829546

        Publisher

        Curran Associates Inc.

        Red Hook, NY, United States


        Qualifiers

        • Research-article
        • Research
        • Refereed limited
