DOI: 10.5555/2484920.2485010

Approximate solutions for factored Dec-POMDPs with many agents

Published: 06 May 2013

Abstract

Dec-POMDPs are a powerful framework for planning in multiagent systems, but are provably intractable to solve. Despite recent work on scaling to more agents by exploiting weak couplings in factored models, scalability for unrestricted subclasses remains limited. This paper proposes a factored forward-sweep policy computation method that tackles the stages of the problem one by one, exploiting weakly coupled structure at each of these stages. To enable the method to scale to many agents, we propose a set of approximations: approximation of stages using a sparse interaction structure, bootstrapping off smaller tasks to compute heuristic payoff functions, and employing approximate inference to estimate required probabilities at each stage and to compute the best decision rules. An empirical evaluation shows that the loss in solution quality due to these approximations is small and that the proposed method achieves unprecedented scalability, solving Dec-POMDPs with hundreds of agents.
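
For intuition, here is a minimal, hypothetical sketch of the stage-by-stage ("forward-sweep") idea described above. It is not the authors' implementation: it assumes a toy line-shaped interaction structure, a hand-coded pairwise heuristic payoff standing in for payoff functions bootstrapped from smaller tasks, and plain coordinate ascent in place of max-sum and the approximate-inference machinery; every name in it (heuristic_q, solve_stage, forward_sweep) is invented for illustration.

import random

random.seed(0)

N_AGENTS = 4      # agents on a line: agent i only interacts with agent i + 1
HORIZON = 3       # number of decision stages
ACTIONS = [0, 1]  # local actions available to every agent


def heuristic_q(stage, a_i, a_j):
    """Hypothetical pairwise heuristic payoff for two neighbouring agents.

    In the paper this role is played by heuristic payoff functions
    bootstrapped from smaller source tasks; here it is a fixed toy rule.
    """
    return 1.0 if a_i == a_j else -0.5 * (stage + 1)


def factored_payoff(stage, joint_action):
    """Sum of pairwise payoffs over the sparse (line-shaped) interaction graph."""
    return sum(heuristic_q(stage, joint_action[i], joint_action[i + 1])
               for i in range(N_AGENTS - 1))


def solve_stage(stage, n_sweeps=10):
    """Choose one action per agent by coordinate ascent on the factored payoff.

    This stands in for (approximately) solving the collaborative game of a
    single stage, e.g. with max-sum on the interaction graph; a real
    implementation optimises decision rules over observation histories.
    """
    joint = [random.choice(ACTIONS) for _ in range(N_AGENTS)]
    for _ in range(n_sweeps):
        for i in range(N_AGENTS):
            joint[i] = max(
                ACTIONS,
                key=lambda a: factored_payoff(stage, joint[:i] + [a] + joint[i + 1:]),
            )
    return joint


def forward_sweep():
    """Forward-sweep policy computation: tackle the stages one by one."""
    policy = []
    for t in range(HORIZON):
        policy.append(solve_stage(t))
        # A full implementation would now advance an approximate, factored
        # belief over states and histories (factored-frontier style
        # inference) before moving on to stage t + 1.
    return policy


if __name__ == "__main__":
    for t, rules in enumerate(forward_sweep()):
        print(f"stage {t}: joint decision rule {rules}")

Running the script prints one joint decision per stage; in the actual method each stage's game is defined over observation histories and solved with approximate inference and max-sum rather than this toy coordinate ascent.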

    Published In

    AAMAS '13: Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
    May 2013
    1500 pages
    ISBN: 9781450319935

    Sponsors

    • IFAAMAS

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Publication History

    Published: 06 May 2013

    Author Tags

    1. factored decentralized partially observable Markov decision processes
    2. multi-agent planning

    Qualifiers

    • Research-article

    Conference

    AAMAS '13

    Acceptance Rates

    AAMAS '13 Paper Acceptance Rate: 140 of 599 submissions, 23%
    Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%

    Cited By

    • A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research, 70:789-870 (2021). DOI: 10.1613/jair.1.12136
    • Multiagent Learning and Coordination with Clustered Deep Q-Network. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2156-2158 (2019). DOI: 10.5555/3306127.3332042
    • Modeling and planning with macro-actions in decentralized POMDPs. Journal of Artificial Intelligence Research, 64(1):817-859 (2019). DOI: 10.1613/jair.1.11418
    • Cooperative Heterogeneous Multi-Robot Systems. ACM Computing Surveys, 52(2):1-31 (2019). DOI: 10.1145/3303848
    • A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 33(6):750-797 (2019). DOI: 10.1007/s10458-019-09421-1
    • Interactive learning and decision making. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 5703-5708 (2018). DOI: 10.5555/3304652.3304829
    • Approximating value equivalence in interactive dynamic influence diagrams using behavioral coverage. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 201-207 (2016). DOI: 10.5555/3060621.3060650
    • Solving transition-independent multi-agent MDPs with sparse interactions. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 3174-3180 (2016). DOI: 10.5555/3016100.3016347
    • A Value Equivalence Approach for Solving Interactive Dynamic Influence Diagrams. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 1162-1170 (2016). DOI: 10.5555/2936924.2937094
    • Approximating behavioral equivalence for scaling solutions of I-DIDs. Knowledge and Information Systems, 49(2):511-552 (2016). DOI: 10.1007/s10115-015-0912-x