Abstract
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
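The abstract concerns convergence of Q-learning viewed as a stochastic approximation scheme. As a concrete illustration (not taken from the paper), the sketch below implements tabular Q-learning with state-action-dependent step sizes α = 1/n(s,a), which satisfy the standard stochastic-approximation conditions Σα = ∞, Σα² < ∞. The toy two-state MDP, function names, and parameters are all hypothetical choices for the example.

```python
import random

def q_learning(step, reward, n_states, n_actions, gamma=0.9,
               episodes=500, horizon=50, epsilon=0.2, seed=0):
    """Tabular Q-learning sketch.

    step(s, a, rng) -> next state; reward(s, a) -> immediate reward.
    Uses alpha = 1 / visits(s, a), a Robbins-Monro step-size schedule.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # epsilon-greedy exploration keeps every (s, a) pair visited
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2 = step(s, a, rng)
            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]  # sum(alpha) = inf, sum(alpha^2) < inf
            # standard Q-learning update toward the one-step Bellman target
            Q[s][a] += alpha * (reward(s, a) + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

# Hypothetical two-state chain: action 1 toggles the state, action 0 stays;
# occupying state 1 pays reward 1, so moving to (and staying in) state 1
# should come to dominate the learned Q-values.
Q = q_learning(step=lambda s, a, rng: 1 - s if a == 1 else s,
               reward=lambda s, a: 1.0 if s == 1 else 0.0,
               n_states=2, n_actions=2)
```

With enough visits the iterates approach the Bellman fixed point; for this chain the learned values should rank "move toward state 1" above "stay in state 0", and "stay in state 1" above "leave it".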
Tsitsiklis, J.N. Asynchronous Stochastic Approximation and Q-Learning. Machine Learning 16, 185–202 (1994). https://doi.org/10.1023/A:1022689125041