Learning rate schedules for faster stochastic gradient search

C Darken, J Chang, J Moody - Neural networks for signal processing, 1992
Abstract
Stochastic gradient descent is a general algorithm that includes LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed class of "search then converge" (STC) learning rate schedules (Darken and Moody, 1990b, 1991) displays the theoretically optimal asymptotic convergence rate and a superior ability to escape from poor local minima. However, the user is responsible for setting a key parameter. We propose here a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence.
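
A minimal sketch of a search-then-converge schedule of the kind described above, assuming the commonly cited Darken-Moody form eta(t) = eta0 / (1 + t/tau): the rate stays near eta0 while t << tau (the "search" phase) and decays roughly like eta0 * tau / t afterwards (the "converge" phase), which yields the 1/t asymptotic rate. The exact parameterization in the paper may differ, and the parameter values and toy SGD loop below are illustrative assumptions, not taken from the paper.

import numpy as np

def stc_learning_rate(t, eta0=0.1, tau=100.0):
    # Search-then-converge schedule: approximately constant (eta0) for t << tau,
    # decaying like eta0 * tau / t for t >> tau (1/t asymptotic rate).
    return eta0 / (1.0 + t / tau)

# Illustrative use in plain SGD on a noisy linear regression problem
# (all names and values here are assumptions made for this sketch).
rng = np.random.default_rng(0)
w = np.zeros(2)
w_true = np.array([1.0, -2.0])
for t in range(1, 10_001):
    x = rng.normal(size=2)              # random input sample
    y = x @ w_true + rng.normal(scale=0.1)  # noisy target
    grad = (x @ w - y) * x              # stochastic gradient of squared error
    w -= stc_learning_rate(t) * grad
print(w)  # should approach w_true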