Abstract
In this paper we derive convergence rates for Q-learning. We show an interesting relationship between the convergence rate and the learning rate used in Q-learning. For a polynomial learning rate, one which is 1/t^ω at time t where ω ∈ (1/2, 1), we show that the convergence rate is polynomial in 1/(1 − γ), where γ is the discount factor. In contrast, we show that for a linear learning rate, one which is 1/t at time t, the convergence rate has an exponential dependence on 1/(1 − γ). In addition, we give a simple example proving that this exponential behavior is inherent to a linear learning rate.
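The contrast between the two learning-rate schedules can be illustrated numerically. The sketch below is not the paper's construction: it runs tabular Q-learning on an assumed toy MDP with a single state and a single action, whose noisy reward has mean 1, so the optimal value is Q* = 1/(1 − γ). Setting ω = 1 gives the linear learning rate 1/t; ω ∈ (1/2, 1) gives a polynomial one.

```python
import random

def q_learning(num_steps, omega, gamma=0.9, seed=0):
    """Toy Q-learning on a hypothetical one-state, one-action MDP.

    The reward is uniform on [0.5, 1.5] (mean 1), so the fixed point
    of the update is Q* = 1 / (1 - gamma).  The learning rate at
    step t is alpha_t = 1 / t**omega.
    """
    rng = random.Random(seed)
    q = 0.0
    for t in range(1, num_steps + 1):
        alpha = 1.0 / t ** omega
        r = rng.uniform(0.5, 1.5)  # noisy reward with mean 1
        # Standard Q-learning update; the next state is the same
        # single state, so max_a Q(s', a) is just q itself.
        q += alpha * (r + gamma * q - q)
    return q

q_star = 1.0 / (1.0 - 0.9)  # = 10
print("polynomial (omega=0.6):", q_learning(200_000, 0.6))
print("linear     (omega=1.0):", q_learning(200_000, 1.0))
```

With the linear rate, the effective contraction per step is only (1 − γ)/t, so the initial error decays roughly like t^{−(1−γ)} and the estimate is still far from Q* after 200,000 steps, while the polynomial rate closes the gap; this matches the exponential-in-1/(1 − γ) behavior described above.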
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Even-Dar, E., Mansour, Y. (2001). Learning Rates for Q-Learning. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42343-0
Online ISBN: 978-3-540-44581-4