Abstract
The parameter space of a statistical learning machine has a Riemannian metric structure determined by its objective function [1]. Amari proposed the concept of the “natural gradient,” which takes this Riemannian metric of the parameter space into account. Kakade [2] applied it to policy-gradient reinforcement learning, yielding the natural policy gradient (NPG). Although NPGs evidently depend on the underlying Riemannian metric, previous studies paid little attention to alternative choices of the metric. In this paper, we propose a Riemannian metric on the joint distribution of states and actions, which is directly linked to the average reward, and derive a new NPG named the “Natural State-action Gradient” (NSG). We then prove that the NSG can be computed by fitting a certain linear model to the immediate reward function. In numerical experiments, we verify that NSG learning can handle MDPs with large numbers of states, for which the performance of existing (N)PG methods degrades.
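For background, the natural gradient of [1] replaces the ordinary gradient of an objective with a metric-corrected direction, and the NPG of [2] instantiates the metric with the Fisher information of the policy. The following LaTeX block is a standard background formulation, not an equation reproduced from this paper's body; it uses conventional notation not defined in this excerpt: η(θ) for the average reward, π(a|s;θ) for the policy, and d^π(s) for the stationary state distribution.

% Natural gradient ascent direction under a Riemannian metric G(theta):
% the ordinary gradient is premultiplied by the inverse metric [1].
\[
  \widetilde{\nabla}_{\theta}\,\eta(\theta) \;=\; G(\theta)^{-1}\,\nabla_{\theta}\,\eta(\theta)
\]
% Kakade's NPG [2] takes G(theta) to be the Fisher information of the
% policy, averaged over the stationary state distribution:
\[
  G(\theta) \;=\; \sum_{s} d^{\pi}(s) \sum_{a} \pi(a \mid s;\theta)\,
    \nabla_{\theta}\log\pi(a \mid s;\theta)\,
    \nabla_{\theta}\log\pi(a \mid s;\theta)^{\top}
\]
% The proposed NSG instead derives the metric from the joint state-action
% distribution d^{\pi}(s)\,\pi(a \mid s;\theta); the exact definition is
% given in the paper body, not in this abstract.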
References
Amari, S.: Natural gradient works efficiently in learning. Neural Computation 10(2), 251–276 (1998)
Kakade, S.: A natural policy gradient. In: Advances in Neural Information Processing Systems, vol. 14. MIT Press, Cambridge (2002)
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
Kimura, H., Miyazaki, K., Kobayashi, S.: Reinforcement learning in POMDPs with function approximation. In: International Conference on Machine Learning, pp. 152–160 (1997)
Baxter, J., Bartlett, P.: Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research 15, 319–350 (2001)
Fukumizu, K., Amari, S.: Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks 13(3), 317–327 (2000)
Morimura, T., Uchibe, E., Doya, K.: Utilizing natural gradient in temporal difference reinforcement learning with eligibility traces. In: International Symposium on Information Geometry and its Applications, pp. 256–263 (2005)
Peters, J., Vijayakumar, S., Schaal, S.: Natural actor-critic. In: European Conference on Machine Learning (2005)
Richter, S., Aberdeen, D., Yu, J.: Natural actor-critic for road traffic optimisation. In: Advances in Neural Information Processing Systems. MIT Press, Cambridge (2007)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. 1, 2. Athena Scientific (1995)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Amari, S., Nagaoka, H.: Methods of Information Geometry. Oxford University Press, Oxford (2000)
Bagnell, D., Schneider, J.: Covariant policy search. In: Proceedings of the International Joint Conference on Artificial Intelligence (July 2003)
Peters, J., Vijayakumar, S., Schaal, S.: Reinforcement learning for humanoid robotics. In: IEEE-RAS International Conference on Humanoid Robots (2003)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, Heidelberg (2006)
Morimura, T., Uchibe, E., Yoshimoto, J., Doya, K.: Reinforcement learning with log stationary distribution gradient. Technical report, Nara Institute of Science and Technology (2007)
Amari, S., Park, H., Fukumizu, K.: Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation 12(6), 1399–1409 (2000)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Morimura, T., Uchibe, E., Yoshimoto, J., Doya, K. (2008). A New Natural Policy Gradient by Stationary Distribution Metric. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science(), vol 5212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87481-2_6
Print ISBN: 978-3-540-87480-5
Online ISBN: 978-3-540-87481-2