Q-value functions for decentralized POMDPs

Authors: Frans A. Oliehoek, Nikos Vlassis

AAMAS '07: Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems

Article No.: 220, Pages 1 - 8

https://doi.org/10.1145/1329125.1329390

Published: 14 May 2007

Abstract

Planning in single-agent models like MDPs and POMDPs can be carried out by resorting to Q-value functions: a (near-) optimal Q-value function is computed in a recursive manner by dynamic programming, and then a policy is extracted from this value function. In this paper we study whether similar Q-value functions can be defined in decentralized POMDP models (Dec-POMDPs), what the cost of computing such value functions is, and how policies can be extracted from such value functions. Using the framework of Bayesian games, we argue that searching for the optimal Q-value function may be as costly as exhaustive policy search. Then we analyze various approximate Q-value functions that allow efficient computation. Finally, we describe a family of algorithms for extracting policies from such Q-value functions.
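The single-agent procedure the abstract starts from (compute a Q-value function recursively by dynamic programming, then extract a policy from it) can be sketched for a finite-horizon MDP as follows. This is only an illustration under stated assumptions: the two-state, two-action model and the names `STATES`, `ACTIONS`, `T`, `R`, `finite_horizon_q`, and `greedy_policy` are hypothetical and do not come from the paper, and the sketch covers the single-agent MDP case only, not the Dec-POMDP setting the paper studies.

```python
# Hypothetical toy MDP (an assumption for illustration, not from the paper).
STATES = [0, 1]
ACTIONS = [0, 1]

# T[s][a] is a list of (next_state, probability) pairs.
T = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}

# R[s][a] is the immediate reward for taking action a in state s.
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}


def finite_horizon_q(horizon):
    """Return q[t][s][a] for t = 0..horizon-1, computed by backward induction."""
    q = [None] * horizon
    next_v = {s: 0.0 for s in STATES}  # no reward is collected after the last stage
    for t in reversed(range(horizon)):
        q[t] = {
            s: {
                a: R[s][a] + sum(p * next_v[s2] for s2, p in T[s][a])
                for a in ACTIONS
            }
            for s in STATES
        }
        # The stage-t value of a state is the best Q-value available in it.
        next_v = {s: max(q[t][s].values()) for s in STATES}
    return q


def greedy_policy(q):
    """Extract a policy by picking, per stage and state, an action maximizing Q."""
    return [
        {s: max(ACTIONS, key=lambda a: q_t[s][a]) for s in STATES}
        for q_t in q
    ]
```

Calling `greedy_policy(finite_horizon_q(2))` returns one state-to-action mapping per stage. In the decentralized setting no single agent observes the state or the joint history, which is why the paper asks whether comparable Q-value functions can be defined and at what cost.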

Index Terms

Q-value functions for decentralized POMDPs
1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
      1. Multi-agent systems

Information & Contributors

Information

Published In

AAMAS '07: Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems

May 2007

1585 pages

ISBN:9788190426275

DOI:10.1145/1329125

Conference Chairs: Edmund Durfee (University of Michigan), Makoto Yokoo (Kyushu University)

Program Chairs: Michael Huhns (University of South Carolina), Onn Shehory (IBM Haifa Research Lab, Israel)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IFAAMAS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

ICIS

Conference

AAMAS07: International Conference on Autonomous Agents and Multiagent Systems

May 14 - 18, 2007

Honolulu, Hawaii

Acceptance Rates

Overall Acceptance Rate 1,155 of 5,036 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 10
Total Downloads: 261
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 0

Reflects downloads up to 18 Nov 2024

Citations

Cited By

Qu S, Guo R, Cao Z, Liu J, Su B, Liu M (2024). An Effective Training Method for Counterfactual Multi-Agent Policy Network Based on Differential Evolution Algorithm. Applied Sciences 14:18 (8383). Online publication date: 18-Sep-2024.
https://doi.org/10.3390/app14188383

Mostaani A, Vu T, Sharma S, Nguyen V, Liao Q, Chatzinotas S (2022). Task-Oriented Communication Design in Cyber-Physical Systems: A Survey on Theory and Applications. IEEE Access 10 (133842-133868). Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2022.3231039

Pesce E, Montana G (2020). Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication. Machine Learning. Online publication date: 23-Jan-2020.
https://doi.org/10.1007/s10994-019-05864-5

Kang B, Kim K (2012). Exploiting symmetries for single- and multi-agent Partially Observable Stochastic Domains. Artificial Intelligence 182-183 (32-57). Online publication date: 1-May-2012.
https://dl.acm.org/doi/10.1016/j.artint.2012.01.003

Oliehoek F (2012). Decentralized POMDPs. Reinforcement Learning (471-503). Online publication date: 2012.
https://doi.org/10.1007/978-3-642-27645-3_15

Wu F, Zilberstein S, Chen X (2011). Online planning for multi-agent systems with bounded communication. Artificial Intelligence 175:2 (487-511). Online publication date: 1-Feb-2011.
https://dl.acm.org/doi/10.1016/j.artint.2010.09.008

Wu F, Zilberstein S, Chen X, Luck M, Sen S (2010). Point-based policy generation for decentralized POMDPs. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1 - Volume 1 (1307-1314). Online publication date: 10-May-2010.
https://dl.acm.org/doi/10.5555/1838206.1838377

Oliehoek F, Spaan M, Vlassis N (2008). Optimal and approximate Q-value functions for decentralized POMDPs. Journal of Artificial Intelligence Research 32:1 (289-353). Online publication date: 1-May-2008.
https://dl.acm.org/doi/10.5555/1622673.1622680

Oliehoek F, Spaan M, Whiteson S, Vlassis N, Padgham L, Parkes D (2008). Exploiting locality of interaction in factored Dec-POMDPs. Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems - Volume 1 (517-524). Online publication date: 12-May-2008.
https://dl.acm.org/doi/10.5555/1402383.1402457

Oliehoek F, Kooij J, Vlassis N (2008). A Cross-Entropy Approach to Solving Dec-POMDPs. Advances in Intelligent and Distributed Computing (145-154). Online publication date: 2008.
https://doi.org/10.1007/978-3-540-74930-1_15
