Abstract
The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore ideas from conjugate gradient methods in the stochastic (online) setting, using fast Hessian-gradient products to set up low-dimensional Krylov subspaces within individual mini-batches. In our benchmark experiments the resulting online learning algorithms converge orders of magnitude faster than ordinary stochastic gradient descent.
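The abstract only sketches the approach, so the following is a minimal illustrative JAX sketch of the general idea rather than the authors' algorithm: on each mini-batch, build a small Krylov basis {g, Hg, H²g, ...} using Pearlmutter-style Hessian-vector products (which never form the Hessian), then minimize the local quadratic model restricted to that subspace. The names `loss`, `hvp`, `krylov_step`, and the `dim` and `damping` parameters are assumptions introduced here for illustration.

```python
import jax
import jax.numpy as jnp

def loss(params, inputs, targets):
    # Toy model: linear least squares; any twice-differentiable loss works.
    preds = inputs @ params
    return 0.5 * jnp.mean((preds - targets) ** 2)

def hvp(params, batch, v):
    """Hessian-vector product H v via Pearlmutter's trick:
    forward-mode differentiation through the gradient, so the
    Hessian is never formed explicitly."""
    inputs, targets = batch
    grad_fn = lambda p: jax.grad(loss)(p, inputs, targets)
    return jax.jvp(grad_fn, (params,), (v,))[1]

def krylov_step(params, batch, dim=3, damping=1e-3):
    """One online update: build the Krylov basis {g, Hg, ..., H^{dim-1} g}
    from a single mini-batch, solve the damped quadratic model restricted
    to that subspace, and move along the solution."""
    inputs, targets = batch
    g = jax.grad(loss)(params, inputs, targets)
    vs = [g]
    for _ in range(dim - 1):                       # Krylov recursion: v_{k+1} = H v_k
        vs.append(hvp(params, batch, vs[-1]))
    Q, _ = jnp.linalg.qr(jnp.stack(vs, axis=1))    # orthonormal n x dim basis
    HQ = jnp.stack([hvp(params, batch, Q[:, i])
                    for i in range(Q.shape[1])], axis=1)
    A = Q.T @ HQ + damping * jnp.eye(Q.shape[1])   # reduced, damped Hessian
    b = Q.T @ g                                    # reduced gradient
    return params - Q @ jnp.linalg.solve(A, b)     # Newton step within the subspace

# Usage: linear regression, one Krylov step per (here, fixed) mini-batch.
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 10))
y = X @ jnp.arange(10.0)
params = jnp.zeros(10)
for _ in range(20):
    params = krylov_step(params, (X, y))
```

With `dim=1` and the curvature terms removed, the update collapses to plain stochastic gradient descent; the extra Hessian-vector products per mini-batch are what buy the conjugate-direction behavior described in the abstract.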