research-article

Preconditioned temporal difference learning

Authors: Hengshuai Yao, Zhi-Qiang Liu

ICML '08: Proceedings of the 25th international conference on Machine learning

Pages 1208 - 1215

https://doi.org/10.1145/1390156.1390308

Published: 05 July 2008

Abstract

This paper generalizes several popular recent policy evaluation algorithms into a single framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE), and a variant of incremental LSTD (iLSTD). The extension rests on a preconditioning technique for solving a stochastic model equation. The paper also addresses three issues in the new framework: it presents a new step-size rule that can be computed online, provides an iterative way to apply preconditioning, and reduces the complexity of the related algorithms to near that of temporal difference (TD) learning.
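The preconditioning idea in the abstract can be illustrated with a small sketch. It assumes the model equation has the standard linear form A θ + b = 0, with A = Φᵀ D (γP − I) Φ and b = Φᵀ D r, and that the framework iterates θ ← θ + α C⁻¹ (A θ + b) for some preconditioner C. This form, the toy 3-state chain, the features, and all names below are assumptions made for illustration; they are not taken from the paper, and A and b are computed exactly here rather than estimated from samples. Under these assumptions, C = −A behaves like LSTD (one step with α = 1), C = Φᵀ D Φ like LSPE, and C = I like a plain gradient-style update.

```python
# Toy setup (hypothetical, for illustration only): 3 states, 2 features.
GAMMA = 0.9
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]   # doubly stochastic, so the stationary distribution is uniform
D = [1 / 3, 1 / 3, 1 / 3]
PHI = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
R = [0.0, 0.0, 1.0]


def build_model():
    """Return A = Phi^T D (gamma P - I) Phi, b = Phi^T D r, and M = Phi^T D Phi."""
    n, k = len(PHI), len(PHI[0])
    A = [[0.0] * k for _ in range(k)]
    M = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s in range(n):
        # Expected next-state feature vector under P.
        nxt = [sum(P[s][t] * PHI[t][j] for t in range(n)) for j in range(k)]
        for i in range(k):
            b[i] += D[s] * PHI[s][i] * R[s]
            for j in range(k):
                A[i][j] += D[s] * PHI[s][i] * (GAMMA * nxt[j] - PHI[s][j])
                M[i][j] += D[s] * PHI[s][i] * PHI[s][j]
    return A, b, M


def inv2(m):
    """Invert a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def solve(A, b, C, alpha, tol=1e-10, max_iter=100000):
    """Iterate theta <- theta + alpha * C^{-1} (A theta + b) until the residual is small.

    Returns the final theta and the number of updates performed.
    """
    C_inv = inv2(C)
    theta = [0.0, 0.0]
    for it in range(max_iter):
        resid = [x + y for x, y in zip(matvec(A, theta), b)]
        if max(abs(x) for x in resid) < tol:
            return theta, it
        step = matvec(C_inv, resid)
        theta = [t + alpha * s for t, s in zip(theta, step)]
    return theta, max_iter


A, b, M = build_model()
identity = [[1.0, 0.0], [0.0, 1.0]]
neg_A = [[-x for x in row] for row in A]

theta_identity, iters_identity = solve(A, b, identity, alpha=0.5)  # gradient-style
theta_lspe, iters_lspe = solve(A, b, M, alpha=1.0)                 # LSPE-like
theta_lstd, iters_lstd = solve(A, b, neg_A, alpha=1.0)             # LSTD-like
print(iters_identity, iters_lspe, iters_lstd)
```

All three choices reach the same solution on this toy problem; what changes is the number of updates needed, which is the usual motivation for preconditioning. The step sizes are hand-picked for this example and do not reflect the online step-size rule the paper proposes.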

Published In

ICML '08: Proceedings of the 25th international conference on Machine learning

July 2008, 1310 pages

ISBN: 9781605582054

DOI: 10.1145/1390156

General Chair: William Cohen (Carnegie Mellon University)

Program Chairs: Andrew McCallum (University of Massachusetts Amherst), Sam Roweis (University of Toronto and Google)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Research Grants Council, University Grants Committee, Hong Kong

Conference

ICML '08

Sponsor:

Microsoft Research
Intel
IBM

ICML '08: The 25th Annual International Conference on Machine Learning held in conjunction with the 2007 International Conference on Inductive Logic Programming

July 5 - 9, 2008

Helsinki, Finland

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Cited By

Chen X, Ma X, Li Y, Yang G, Yang S, Gao Y, Evans R, Shpitser I (2023). Modified retrace for off-policy temporal difference learning. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 303-312. Online publication date: 31-Jul-2023.
https://dl.acm.org/doi/10.5555/3625834.3625863

Fellows M, Smith M, Whiteson S, Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (2023). Why target networks stabilise temporal difference methods. Proceedings of the 40th International Conference on Machine Learning, pp. 9886-9909. Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3618803

Devraj A, Meyn S (2017). Zap Q-learning. Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 2232-2241. Online publication date: 4-Dec-2017.
https://dl.acm.org/doi/10.5555/3294771.3294984

Bertsekas D (2011). Approximate policy iteration: a survey and some new methods. Journal of Control Theory and Applications, 9:3, pp. 310-335. Online publication date: 19-Jul-2011.
https://doi.org/10.1007/s11768-011-1005-3

Yu H (2010). Convergence of least squares temporal difference methods under general conditions. Proceedings of the 27th International Conference on Machine Learning, pp. 1207-1214. Online publication date: 21-Jun-2010.
https://dl.acm.org/doi/10.5555/3104322.3104475
