DOI: 10.5555/2484920.2485010

Approximate solutions for factored Dec-POMDPs with many agents

Published: 06 May 2013

Abstract

Dec-POMDPs are a powerful framework for planning in multiagent systems, but are provably intractable to solve. Despite recent work on scaling to more agents by exploiting weak couplings in factored models, scalability for unrestricted subclasses remains limited. This paper proposes a factored forward-sweep policy computation method that tackles the stages of the problem one by one, exploiting weakly coupled structure at each of these stages. To enable the method to scale to many agents, we propose a set of approximations: approximation of stages using a sparse interaction structure, bootstrapping off smaller tasks to compute heuristic payoff functions, and employing approximate inference to estimate required probabilities at each stage and to compute the best decision rules. An empirical evaluation shows that the loss in solution quality due to these approximations is small and that the proposed method achieves unprecedented scalability, solving Dec-POMDPs with hundreds of agents.
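
For intuition, here is a minimal, hypothetical sketch of the stage-by-stage ("forward-sweep") idea described above. It is not the authors' implementation: it assumes a toy line-shaped interaction structure, a hand-coded pairwise heuristic payoff standing in for payoff functions bootstrapped from smaller tasks, and plain coordinate ascent in place of max-sum and the approximate-inference machinery; every name in it (heuristic_q, solve_stage, forward_sweep) is invented for illustration.

import random

random.seed(0)

N_AGENTS = 4      # agents on a line: agent i only interacts with agent i + 1
HORIZON = 3       # number of decision stages
ACTIONS = [0, 1]  # local actions available to every agent


def heuristic_q(stage, a_i, a_j):
    """Hypothetical pairwise heuristic payoff for two neighbouring agents.

    In the paper this role is played by heuristic payoff functions
    bootstrapped from smaller source tasks; here it is a fixed toy rule.
    """
    return 1.0 if a_i == a_j else -0.5 * (stage + 1)


def factored_payoff(stage, joint_action):
    """Sum of pairwise payoffs over the sparse (line-shaped) interaction graph."""
    return sum(heuristic_q(stage, joint_action[i], joint_action[i + 1])
               for i in range(N_AGENTS - 1))


def solve_stage(stage, n_sweeps=10):
    """Choose one action per agent by coordinate ascent on the factored payoff.

    This stands in for (approximately) solving the collaborative game of a
    single stage, e.g. with max-sum on the interaction graph; a real
    implementation optimises decision rules over observation histories.
    """
    joint = [random.choice(ACTIONS) for _ in range(N_AGENTS)]
    for _ in range(n_sweeps):
        for i in range(N_AGENTS):
            joint[i] = max(
                ACTIONS,
                key=lambda a: factored_payoff(stage, joint[:i] + [a] + joint[i + 1:]),
            )
    return joint


def forward_sweep():
    """Forward-sweep policy computation: tackle the stages one by one."""
    policy = []
    for t in range(HORIZON):
        policy.append(solve_stage(t))
        # A full implementation would now advance an approximate, factored
        # belief over states and histories (factored-frontier style
        # inference) before moving on to stage t + 1.
    return policy


if __name__ == "__main__":
    for t, rules in enumerate(forward_sweep()):
        print(f"stage {t}: joint decision rule {rules}")

Running the script prints one joint decision per stage; in the actual method each stage's game is defined over observation histories and solved with approximate inference and max-sum rather than this toy coordinate ascent.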

    Published In

    AAMAS '13: Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
    May 2013
    1500 pages
    ISBN: 9781450319935

    Sponsors

    • IFAAMAS

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Publication History

    Published: 06 May 2013

    Author Tags

    1. factored decentralized partially observable Markov decision processes
    2. multi-agent planning

    Qualifiers

    • Research-article

    Conference

    AAMAS '13

    Acceptance Rates

    AAMAS '13 Paper Acceptance Rate: 140 of 599 submissions, 23%
    Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%

    Cited By

    • A Sufficient Statistic for Influence in Structured Multiagent Environments. Journal of Artificial Intelligence Research, 70:789-870 (2021). DOI: 10.1613/jair.1.12136
    • Multiagent Learning and Coordination with Clustered Deep Q-Network. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2156-2158 (2019). DOI: 10.5555/3306127.3332042
    • Modeling and planning with macro-actions in decentralized POMDPs. Journal of Artificial Intelligence Research, 64(1):817-859 (2019). DOI: 10.1613/jair.1.11418
    • Cooperative Heterogeneous Multi-Robot Systems. ACM Computing Surveys, 52(2):1-31 (2019). DOI: 10.1145/3303848
    • A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 33(6):750-797 (2019). DOI: 10.1007/s10458-019-09421-1
    • Interactive learning and decision making. Proceedings of the 27th International Joint Conference on Artificial Intelligence, 5703-5708 (2018). DOI: 10.5555/3304652.3304829
    • Approximating value equivalence in interactive dynamic influence diagrams using behavioral coverage. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 201-207 (2016). DOI: 10.5555/3060621.3060650
    • Solving transition-independent multi-agent MDPs with sparse interactions. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 3174-3180 (2016). DOI: 10.5555/3016100.3016347
    • A Value Equivalence Approach for Solving Interactive Dynamic Influence Diagrams. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 1162-1170 (2016). DOI: 10.5555/2936924.2937094
    • Approximating behavioral equivalence for scaling solutions of I-DIDs. Knowledge and Information Systems, 49(2):511-552 (2016). DOI: 10.1007/s10115-015-0912-x