Computer Science > Machine Learning

arXiv:2401.10809 (cs)

[Submitted on 19 Jan 2024 (v1), last revised 24 Jan 2024 (this version, v2)]

Title: Neglected Hessian component explains mysteries in Sharpness regularization

Authors: Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

Abstract: Recent work has shown that methods like SAM, which either explicitly or implicitly penalize second-order information, can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We show that these differences can be explained by the structure of the Hessian of the loss. First, we show that a common decomposition of the Hessian can be quantitatively interpreted as separating feature exploitation from feature exploration. The feature exploration, which can be described by the Nonlinear Modeling Error matrix (NME), is commonly neglected in the literature since it vanishes at interpolation. Our work shows that the NME is in fact important, as it can explain why gradient penalties are sensitive to the choice of activation function. Using this insight, we design interventions to improve performance. We also provide evidence that challenges the long-held equivalence of weight noise and gradient penalties. This equivalence relies on the assumption that the NME can be ignored, which we find does not hold for modern networks since they involve significant feature learning. We find that regularizing feature exploitation but not feature exploration yields performance similar to gradient penalties.
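The abstract does not spell out the "common decomposition". A reasonable assumption is the standard Gauss-Newton split of the loss Hessian: for model outputs z(θ) and loss ℓ(z), H = Jᵀ (∇²_z ℓ) J + Σ_c (∂ℓ/∂z_c) ∇²_θ z_c. The first term is the Gauss-Newton matrix; the second, weighted by the loss gradient with respect to the outputs, plays the role of the NME. It vanishes at interpolation because ∂ℓ/∂z = 0 there. The sketch below checks this numerically on a hypothetical two-parameter model, z = b·tanh(a·x) with squared loss. The model, data, and all function names (`decompose`, `fd_hessian`, etc.) are illustrative assumptions, not code from the paper.

```python
import math

# Hypothetical toy model (not from the paper): z = b * tanh(a * x), squared loss.
def model(theta, x):
    a, b = theta
    return b * math.tanh(a * x)

def loss(theta, data):
    # Mean over examples of 0.5 * (z - y)^2
    return sum(0.5 * (model(theta, x) - y) ** 2 for x, y in data) / len(data)

def jacobian(theta, x):
    # dz/dtheta = [dz/da, dz/db]
    a, b = theta
    t = math.tanh(a * x)
    return [b * x * (1.0 - t * t), t]

def model_hessian(theta, x):
    # Second derivatives of the model output z w.r.t. (a, b)
    a, b = theta
    t = math.tanh(a * x)
    sech2 = 1.0 - t * t
    d_aa = -2.0 * b * x * x * sech2 * t
    d_ab = x * sech2
    return [[d_aa, d_ab], [d_ab, 0.0]]

def decompose(theta, data):
    """Split the loss Hessian into a Gauss-Newton term and an NME-style term."""
    n = len(data)
    gn = [[0.0, 0.0], [0.0, 0.0]]
    nme = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in data:
        residual = model(theta, x) - y  # dl/dz for squared loss
        j = jacobian(theta, x)
        hz = model_hessian(theta, x)
        for i in range(2):
            for k in range(2):
                # Gauss-Newton: J^T (d2l/dz2) J, with d2l/dz2 = 1 here
                gn[i][k] += j[i] * j[k] / n
                # NME-style term: (dl/dz) * d2z/dtheta2
                nme[i][k] += residual * hz[i][k] / n
    return gn, nme

def fd_hessian(theta, data, h=1e-4):
    # Central finite-difference Hessian of the loss, used only as a check
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for k in range(2):
            def shifted(di, dk):
                t = list(theta)
                t[i] += di
                t[k] += dk
                return loss(t, data)
            H[i][k] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

data = [(0.5, 0.3), (-1.0, -0.8), (2.0, 1.1)]
theta = [0.7, 1.3]
gn, nme = decompose(theta, data)
H = fd_hessian(theta, data)
print("max |H - (GN + NME)|:",
      max(abs(H[i][k] - gn[i][k] - nme[i][k]) for i in range(2) for k in range(2)))

# At interpolation (targets equal the model's own outputs) the NME term is zero.
data_interp = [(x, model(theta, x)) for x, _ in data]
gn_i, nme_i = decompose(theta, data_interp)
print("NME at interpolation:", nme_i)
```

Away from interpolation the NME term is nonzero, and dropping it leaves only the Gauss-Newton part. This toy check cannot reproduce the paper's empirical findings about modern networks.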

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2401.10809 [cs.LG]
	(or arXiv:2401.10809v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.10809

Submission history

From: Yann Dauphin
[v1] Fri, 19 Jan 2024 16:52:53 UTC (1,174 KB)
[v2] Wed, 24 Jan 2024 19:09:06 UTC (1,174 KB)
