Abstract
The convergence of back-propagation learning is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that most “classical” second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations.
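The impracticality claim rests on a simple scaling argument: for a network with n weights, a full Newton step needs the n × n Hessian (O(n²) memory) and a linear solve (O(n³) time), while a plain gradient step costs only O(n). The sketch below is illustrative only and not taken from the chapter; the quadratic stand-in loss, the problem size, and the damping constant mu are all assumptions. A diagonal Hessian approximation, in the spirit of the cheap second-order methods the chapter advocates, keeps the per-step cost at O(n).

import numpy as np

# Illustrative sketch (assumptions, not the chapter's code): compare the cost
# of first-order, diagonal second-order, and full Newton updates on a
# quadratic stand-in loss L(w) = 0.5 * w^T H w, whose gradient is H w.
rng = np.random.default_rng(0)
n = 1000                          # number of weights; real nets have millions
w = rng.normal(size=n)
A = rng.normal(size=(n, n))
H = A @ A.T / n + np.eye(n)       # positive-definite stand-in Hessian
g = H @ w                         # gradient of the quadratic loss at w

eta, mu = 0.01, 0.1               # assumed learning rate and damping constant
w_sgd = w - eta * g                            # first-order step: O(n)
w_diag = w - (eta / (np.diag(H) + mu)) * g     # per-weight rates: still O(n)
w_newton = w - np.linalg.solve(H, g)           # Newton: O(n^2) memory, O(n^3) solve

print(np.linalg.norm(w_newton))   # ~0: Newton solves the quadratic in one step

For n in the millions, storing the Hessian alone would take terabytes of memory, which is why structured approximations such as the diagonal one are the practical route.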
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
Cite this chapter
LeCun, Y., Bottou, L., Orr, G.B., Müller, K.-R. (1998). Efficient BackProp. In: Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_2
DOI: https://doi.org/10.1007/3-540-49430-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65311-0
Online ISBN: 978-3-540-49430-0