
DOI: 10.5555/2343576.2343638 · AAMAS Conference Proceedings · Research article

Dynamic potential-based reward shaping

Published: 04 June 2012

Abstract

Potential-based reward shaping can significantly reduce the time needed to learn an optimal policy and, in multi-agent systems, improve the performance of the final joint policy. It has been proven not to alter the optimal policy of an agent learning alone, nor the Nash equilibria of multiple agents learning together.
However, existing proofs assume that the potential of a state does not change while the agent is learning. This assumption is often broken, especially when the reward-shaping function is generated automatically.
In this paper we prove, and demonstrate empirically, that potential-based reward shaping can be extended to allow dynamic potentials while maintaining the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
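The extension the abstract describes can be made concrete with a small sketch. In standard potential-based shaping the extra reward is F(s, s') = γΦ(s') − Φ(s); the dynamic form adds a time argument, F(s, t, s', t') = γΦ(s', t') − Φ(s, t), so the potential may change while the agent learns. The sketch below (function and variable names are ours, not the paper's) checks the telescoping property that underlies the invariance guarantees: over any trajectory, the discounted shaping rewards sum to a quantity that depends only on the trajectory's endpoints.

```python
def dynamic_shaping_reward(phi, s, t, s_next, t_next, gamma=0.9):
    """Dynamic shaping term F(s, t, s', t') = gamma*Phi(s', t') - Phi(s, t).

    phi(state, time) is a (possibly time-varying) potential function.
    """
    return gamma * phi(s_next, t_next) - phi(s, t)


def discounted_shaping_sum(phi, trajectory, gamma=0.9):
    """Sum of gamma^k * F(s_k, t_k, s_{k+1}, t_{k+1}) along a trajectory.

    trajectory is a list of (state, time) pairs. The sum telescopes to
    gamma^T * Phi(s_T, t_T) - Phi(s_0, t_0), i.e. the total extra reward
    over an episode depends only on the endpoints -- the intuition behind
    policy invariance.
    """
    total = 0.0
    for k in range(len(trajectory) - 1):
        (s, t), (s2, t2) = trajectory[k], trajectory[k + 1]
        total += gamma ** k * dynamic_shaping_reward(phi, s, t, s2, t2, gamma)
    return total


# Example: a potential that grows with time, i.e. one that changes
# while learning is in progress (a hypothetical choice for illustration).
phi = lambda s, t: s + 0.1 * t
traj = [(0, 0), (1, 1), (2, 2)]
total = discounted_shaping_sum(phi, traj, gamma=0.9)
endpoints = 0.9 ** 2 * phi(2, 2) - phi(0, 0)  # telescoped form; matches total
```

Even though `phi` here varies with time, the stepwise sum and the telescoped endpoint expression agree, which is the property the paper's proofs build on.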




Published In

AAMAS '12: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
June 2012, 592 pages
ISBN: 0981738117

Sponsor: The International Foundation for Autonomous Agents and Multiagent Systems

Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

  1. reinforcement learning
  2. reward shaping


Conference

AAMAS '12
Sponsor: The International Foundation for Autonomous Agents and Multiagent Systems
Overall acceptance rate: 1,155 of 5,036 submissions, 23%


Cited By

  • (2024) Potential-Based Reward Shaping for Intrinsic Motivation. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 589-597. DOI: 10.5555/3635637.3662910. Online: 6 May 2024.
  • (2023) Efficient potential-based exploration in reinforcement learning using inverse dynamic bisimulation metric. Proceedings of the 37th International Conference on Neural Information Processing Systems, 38786-38797. DOI: 10.5555/3666122.3667805. Online: 10 Dec 2023.
  • (2023) Sample efficient model-free reinforcement learning from LTL specifications with optimality guarantees. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 4180-4189. DOI: 10.24963/ijcai.2023/465. Online: 19 Aug 2023.
  • (2022) Towards trustworthy automatic diagnosis systems by emulating doctors' reasoning with deep reinforcement learning. Proceedings of the 36th International Conference on Neural Information Processing Systems, 24502-24515. DOI: 10.5555/3600270.3602049. Online: 28 Nov 2022.
  • (2021) Online Learning of Shaping Reward with Subgoal Knowledge. Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems, 1613-1615. DOI: 10.5555/3463952.3464177. Online: 3 May 2021.
  • (2020) Adaptive reward-poisoning attacks against reinforcement learning. Proceedings of the 37th International Conference on Machine Learning, 11225-11234. DOI: 10.5555/3524938.3525979. Online: 13 Jul 2020.
  • (2019) Policy poisoning in batch reinforcement learning and control. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 14570-14580. DOI: 10.5555/3454287.3455592. Online: 8 Dec 2019.
  • (2019) Hierarchical reinforcement learning with advantage-based auxiliary rewards. Proceedings of the 33rd International Conference on Neural Information Processing Systems, 1409-1419. DOI: 10.5555/3454287.3454413. Online: 8 Dec 2019.
  • (2018) Reinforcement learning with multiple experts. Proceedings of the 32nd International Conference on Neural Information Processing Systems, 9549-9559. DOI: 10.5555/3327546.3327623. Online: 3 Dec 2018.
  • (2018) Keeping intelligence under control. Proceedings of the 1st International Workshop on Software Engineering for Cognitive Services, 37-40. DOI: 10.1145/3195555.3195558. Online: 28 May 2018.
