Using a logarithmic mapping to enable lower discount factors in reinforcement learning

Published: 08 December 2019

Abstract

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception, which attributes the poor performance of low discount factors to (too) small action-gaps, requires revision. We propose an alternative hypothesis that identifies the size difference of the action-gap across the state space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
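
To make the general idea concrete, the sketch below shows tabular Q-learning on a toy deterministic chain in which the Q-table is stored under a logarithmic mapping f(q) = log(q + eps); TD targets are computed in the ordinary value space and mapped back before the update. This is a minimal illustration only: the mapping, the constants, and the toy environment are assumptions chosen for exposition, not the authors' exact algorithm, which covers the general case and comes with a convergence proof.

```python
import numpy as np

# Minimal, assumption-laden sketch (not the paper's exact algorithm):
# tabular Q-learning on a toy deterministic chain, with the Q-table stored
# under a logarithmic mapping f(q) = log(q + eps). TD targets are formed in
# the ordinary value space and mapped back before the update, so large
# values are compressed and small values are stretched, making action-gaps
# more homogeneous in the mapped space even for a low discount factor.

n_states, n_actions = 6, 2
gamma, alpha, eps = 0.2, 0.5, 1e-6

f = lambda q: np.log(q + eps)        # value space -> logarithmic space
f_inv = lambda y: np.exp(y) - eps    # logarithmic space -> value space

Q_log = np.full((n_states, n_actions), f(0.0))   # Q-values stored in log space

def step(s, a):
    """Toy chain: action 1 moves right (reward 1 from the last state), action 0 resets."""
    if a == 1:
        return (0, 1.0) if s == n_states - 1 else (s + 1, 0.0)
    return 0, 0.0

rng = np.random.default_rng(0)
for _ in range(50_000):
    s, a = rng.integers(n_states), rng.integers(n_actions)   # uniform off-policy samples
    s2, r = step(s, a)
    target = r + gamma * f_inv(Q_log[s2].max())              # TD target in value space
    Q_log[s, a] += alpha * (f(target) - Q_log[s, a])         # interpolate in log space

print(np.round(f_inv(Q_log), 6))  # gaps between actions span a much narrower range in Q_log
```

The point of the sketch is that estimates are stored and interpolated in the mapped space rather than merely having their targets transformed, which changes how errors of very different magnitudes are traded off; handling negative or zero values and function approximation requires the additional machinery described in the paper.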

Information

Published In

NIPS'19: Proceedings of the 33rd International Conference on Neural Information Processing Systems
December 2019, 15947 pages
DOI: 10.5555/3454287.3455555

Publisher

Curran Associates Inc.
Red Hook, NY, United States