Abstract
The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient methods in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.
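The abstract only sketches the approach, so the following is a minimal illustrative JAX sketch of the general idea rather than the authors' algorithm: on each mini-batch, build a small Krylov basis {g, Hg, H²g, ...} using Pearlmutter-style Hessian-vector products (which never form the Hessian), then minimize the local quadratic model restricted to that subspace. The names `loss`, `hvp`, `krylov_step`, and the `dim` and `damping` parameters are assumptions introduced here for illustration.

```python
import jax
import jax.numpy as jnp

def loss(params, inputs, targets):
    # Toy model: linear least squares; any twice-differentiable loss works.
    preds = inputs @ params
    return 0.5 * jnp.mean((preds - targets) ** 2)

def hvp(params, batch, v):
    """Hessian-vector product H v via Pearlmutter's trick:
    forward-mode differentiation through the gradient, so the
    Hessian is never formed explicitly."""
    inputs, targets = batch
    grad_fn = lambda p: jax.grad(loss)(p, inputs, targets)
    return jax.jvp(grad_fn, (params,), (v,))[1]

def krylov_step(params, batch, dim=3, damping=1e-3):
    """One online update: build the Krylov basis {g, Hg, ..., H^{dim-1} g}
    from a single mini-batch, solve the damped quadratic model restricted
    to that subspace, and move along the solution."""
    inputs, targets = batch
    g = jax.grad(loss)(params, inputs, targets)
    vs = [g]
    for _ in range(dim - 1):                       # Krylov recursion: v_{k+1} = H v_k
        vs.append(hvp(params, batch, vs[-1]))
    Q, _ = jnp.linalg.qr(jnp.stack(vs, axis=1))    # orthonormal n x dim basis
    HQ = jnp.stack([hvp(params, batch, Q[:, i])
                    for i in range(Q.shape[1])], axis=1)
    A = Q.T @ HQ + damping * jnp.eye(Q.shape[1])   # reduced, damped Hessian
    b = Q.T @ g                                    # reduced gradient
    return params - Q @ jnp.linalg.solve(A, b)     # Newton step within the subspace

# Usage: linear regression, one Krylov step per (here, fixed) mini-batch.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 10))
y = X @ jnp.arange(10.0)
params = jnp.zeros(10)
for _ in range(20):
    params = krylov_step(params, (X, y))
```

With `dim=1` and the curvature terms removed, the update collapses to plain stochastic gradient descent; the extra Hessian-vector products per mini-batch are what buy the conjugate-direction behavior described in the abstract.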