Sourabh Medapati

Research Engineer at Google Brain
Authored Publications
    Adaptive Gradient Methods at the Edge of Stability
    Behrooz Ghorbani
    David Cardoze
    Jeremy Cohen
    Justin Gilmer
    Shankar Krishnan
    NeurIPS 2022 (to appear)
    Abstract: Little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specifically, we show that during full-batch training, the maximum eigenvalue of the \emph{preconditioned} Hessian typically equilibrates at the stability threshold of a related non-adaptive algorithm. For Adam with step size $\eta$ and $\beta_1 = 0.9$, this stability threshold is $38/\eta$. Similar effects occur during minibatch training, especially as the batch size grows. Yet, even though adaptive methods train at the “Edge of Stability,” their behavior in this regime differs in a crucial way from that of their non-adaptive counterparts. Whereas non-adaptive algorithms are forced to remain in low-curvature regions of the loss landscape, we demonstrate that adaptive gradient methods often advance into high-curvature regions, while adapting the preconditioner to compensate. We believe that our findings will serve as a foundation for the community’s future understanding of adaptive gradient methods in deep learning.
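
    For intuition about the quantity the abstract tracks, here is a minimal JAX sketch (not code from the paper) that estimates the top eigenvalue of a preconditioned Hessian via power iteration on Hessian-vector products. The toy loss, the parameters, and the second-moment estimate `nu` standing in for Adam's $v_t$ are all illustrative assumptions; it uses the symmetrized form $P^{-1/2} H P^{-1/2}$, which shares its eigenvalues with $P^{-1} H$.

    import jax
    import jax.numpy as jnp

    # Hypothetical toy loss; replace with your model's loss over flat params.
    def loss_fn(params):
        return jnp.sum(jnp.cos(params) + 0.5 * params ** 2)

    def hvp(f, params, v):
        # Hessian-vector product via forward-over-reverse differentiation.
        return jax.jvp(jax.grad(f), (params,), (v,))[1]

    def top_preconditioned_eig(f, params, nu, eps=1e-8, iters=100, seed=0):
        # Power iteration on P^{-1/2} H P^{-1/2}, where P = diag(sqrt(nu) + eps)
        # is Adam's preconditioner built from the second-moment estimate nu.
        p_inv_sqrt = 1.0 / jnp.sqrt(jnp.sqrt(nu) + eps)
        v = jax.random.normal(jax.random.PRNGKey(seed), params.shape)
        v = v / jnp.linalg.norm(v)
        eig = 0.0
        for _ in range(iters):
            w = p_inv_sqrt * hvp(f, params, p_inv_sqrt * v)
            eig = jnp.vdot(v, w)          # Rayleigh quotient estimate
            v = w / jnp.linalg.norm(w)
        return eig

    params = jnp.array([0.3, -1.2, 2.0])
    nu = 0.1 * jnp.ones_like(params)      # stand-in for Adam's v_t
    eta = 1e-3
    lam = top_preconditioned_eig(loss_fn, params, nu)
    print(f"top eigenvalue ~ {lam:.3f}; stability threshold 38/eta = {38 / eta:.1f}")

    Tracking this eigenvalue over training steps, rather than at a single point as above, is what reveals the equilibration at $38/\eta$ that the abstract describes.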