DOI: 10.1145/1553374.1553501

Fast gradient-descent methods for temporal-difference learning with linear function approximation

Published: 14 June 2009

Abstract

Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose complexity scales only linearly in the size of the function approximator. Although their gradient temporal difference (GTD) algorithm converges reliably, it can be very slow compared to conventional linear TD (on on-policy problems where TD is convergent), calling into question its practical utility. In this paper we introduce two new related algorithms with better convergence rates. The first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD). The second new algorithm, linear TD with gradient correction, or TDC, uses the same update rule as conventional TD except for an additional term which is initially zero. In our experiments on small test problems and in a Computer Go application with a million features, the learning rate of this algorithm was comparable to that of conventional TD. This algorithm appears to extend linear TD to off-policy learning with no penalty in performance while only doubling computational requirements.
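As a concrete illustration of the update described in the abstract, the sketch below contrasts conventional linear TD(0) with the TDC update: TDC performs the same update as TD plus a correction term that is zero at initialization, because the secondary weight vector w starts at zero, and both updates cost O(n) per transition. This is a minimal sketch based on the abstract's description, not the authors' reference implementation; the variable names (theta, w, alpha, beta) and the exact arrangement of the correction term are assumptions consistent with the paper's presentation.

```python
import numpy as np

def td0_update(theta, phi, phi_next, reward, alpha, gamma):
    """One step of conventional linear TD(0) (for comparison)."""
    delta = reward + gamma * phi_next.dot(theta) - phi.dot(theta)  # TD error
    return theta + alpha * delta * phi

def tdc_update(theta, w, phi, phi_next, reward, alpha, beta, gamma):
    """One step of linear TD with gradient correction (TDC), O(n) per step.

    Illustrative sketch: the first term matches conventional TD; the second
    term is the gradient correction, which vanishes while w == 0.
    """
    delta = reward + gamma * phi_next.dot(theta) - phi.dot(theta)  # TD error
    theta = theta + alpha * (delta * phi - gamma * phi_next * phi.dot(w))
    # Secondary weights track the expected TD error projected onto the
    # features; maintaining them is the source of the doubled computation.
    w = w + beta * (delta - phi.dot(w)) * phi
    return theta, w

# Toy usage with random features, just to show the O(n) per-step cost.
rng = np.random.default_rng(0)
n = 8
theta, w = np.zeros(n), np.zeros(n)
for _ in range(100):
    phi, phi_next = rng.random(n), rng.random(n)
    theta, w = tdc_update(theta, w, phi, phi_next, reward=rng.random(),
                          alpha=0.01, beta=0.05, gamma=0.9)
```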

References

[1] Antos, A., Szepesvári, Cs., Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71:89--129.
[2] Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Machine Learning, pp. 30--37.
[3] Baird, L. C. (1999). Reinforcement Learning Through Gradient Descent. PhD thesis, Carnegie Mellon University.
[4] Barnard, E. (1993). Temporal-difference methods and Markov models. IEEE Transactions on Systems, Man, and Cybernetics 23(2):357--365.
[5] Borkar, V. S. (1997). Stochastic approximation with two timescales. Systems and Control Letters 29:291--294.
[6] Borkar, V. S., Meyn, S. P. (2000). The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization 38(2):447--469.
[7] Boyan, J. (2002). Technical update: Least-squares temporal difference learning. Machine Learning 49:233--246.
[8] Bradtke, S., Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning 22:33--57.
[9] Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning 8:341--362.
[10] Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-squares temporal difference learning. In Proceedings of AAAI, pp. 356--361.
[11] Hirsch, M. W. (1989). Convergent activation dynamics in continuous time networks. Neural Networks 2:331--349.
[12] Precup, D., Sutton, R. S., Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In Proceedings of the 18th International Conference on Machine Learning, pp. 417--424.
[13] Precup, D., Sutton, R. S., Paduraru, C., Koop, A., Singh, S. (2006). Off-policy learning with recognizers. Advances in Neural Information Processing Systems 18.
[14] Silver, D., Sutton, R. S., Müller, M. (2007). Reinforcement learning of local shape in the game of Go. In Proceedings of the 20th IJCAI, pp. 1053--1058.
[15] Sturtevant, N. R., White, A. M. (2006). Feature construction for reinforcement learning in hearts. In Proceedings of the 5th International Conference on Computers and Games.
[16] Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning 3:9--44.
[17] Sutton, R. S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112:181--211.
[18] Sutton, R. S., Szepesvári, Cs., Maei, H. R. (2009). A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Advances in Neural Information Processing Systems 21.




Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2009


Qualifiers

  • Research-article

Conference

ICML '09
Sponsor:
  • Microsoft Research

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Article Metrics

  • Downloads (last 12 months): 142
  • Downloads (last 6 weeks): 18

Reflects downloads up to 12 Nov 2024

Cited By

  • (2024) On Convergence Rate of MRetrace. Mathematics 12(18):2930. DOI: 10.3390/math12182930. Online publication date: 20-Sep-2024.
  • (2024) Distributed Multi-Agent Gradient Based Q-Learning with Linear Function Approximation. 2024 European Control Conference (ECC), pp. 2500--2505. DOI: 10.23919/ECC64448.2024.10590764. Online publication date: 25-Jun-2024.
  • (2024) Joint Principal Component Analysis and Supervised k Means Algorithm via Non-Iterative Analytic Optimization Approach. IEEE Transactions on Signal Processing 72:1348--1360. DOI: 10.1109/TSP.2024.3365944. Online publication date: 1-Jan-2024.
  • (2024) High-Probability Sample Complexities for Policy Evaluation With Linear Function Approximation. IEEE Transactions on Information Theory 70(8):5969--5999. DOI: 10.1109/TIT.2024.3394685. Online publication date: Aug-2024.
  • (2024) Continuous-Time Distributed Dynamic Programming for Networked Multi-Agent Markov Decision Processes. 2024 IEEE 18th International Conference on Control & Automation (ICCA), pp. 960--967. DOI: 10.1109/ICCA62789.2024.10591854. Online publication date: 18-Jun-2024.
  • (2024) Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook. Expert Systems with Applications 238:122254. DOI: 10.1016/j.eswa.2023.122254. Online publication date: Mar-2024.
  • (2024) Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning. International Journal of Computational Intelligence Systems 17(1). DOI: 10.1007/s44196-024-00560-2. Online publication date: 24-Jun-2024.
  • (2024) On the Analysis of Model-Free Methods for the Linear Quadratic Regulator. Journal of the Operations Research Society of China. DOI: 10.1007/s40305-024-00546-z. Online publication date: 9-Jul-2024.
  • (2024) Finite-time error bounds for Greedy-GQ. Machine Learning 113(9):5981--6018. DOI: 10.1007/s10994-024-06542-x. Online publication date: 30-Apr-2024.
  • (2024) Human operator decision support for highly transient industrial processes: a reinforcement learning approach. Journal of Intelligent Manufacturing. DOI: 10.1007/s10845-023-02295-x. Online publication date: 11-Jan-2024.
