Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent

KL Pavasovic, A Durmus, U Simsekli - arXiv preprint arXiv:2310.18455, 2023 - arxiv.org
… of offline (also called multi-pass) SGD exhibits ‘approximate’ power-law tails and the
approximation … Our main takeaway is that, as the number of data points increases, offline SGD will …
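
The setting this snippet describes is offline (multi-pass) SGD: the optimizer cycles repeatedly over a fixed dataset of n points rather than drawing a fresh sample at every step, as online SGD does. Below is a minimal sketch of that setting on a toy least-squares problem, with a crude heavy-tail diagnostic; the model, step size, and kurtosis check are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed finite dataset: offline (multi-pass) SGD revisits these n points
# every epoch, unlike online SGD which draws fresh samples.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def sgd_multipass(step=0.05, epochs=500):
    """Single-sample SGD over the same dataset for many epochs."""
    w = np.zeros(d)
    iterates = []
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = (X[i] @ w - y[i]) * X[i]  # squared-loss gradient at one point
            w = w - step * g
        iterates.append(w.copy())
    return np.array(iterates)

W = sgd_multipass()
# Crude tail diagnostic: excess kurtosis of centered iterate norms.
# Gaussian fluctuations give roughly 0; power-law-like tails inflate it.
r = np.linalg.norm(W - W.mean(axis=0), axis=1)
print("excess kurtosis:", ((r - r.mean()) ** 4).mean() / r.var() ** 2 - 3)
```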

Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise

V Kothapalli, T Pang, S Deng, Z Liu, Y Yang - arXiv preprint arXiv …, 2024 - arxiv.org
… and benefits of analyzing multi-pass training, we present a …, we show that “stochastic gradient
noise” during optimization is … rank-1 approximation matrix A to state the following lemma. …

SGD with Clipping is Secretly Estimating the Median Gradient

F Schaipp, G Garrigos, U Simsekli, R Gower - arXiv preprint arXiv …, 2024 - arxiv.org
… an (approximate) sample median is robust to heavy-tailed noise, … That is, we sample one
stochastic gradient g_t per iteration … heavy tails in offline (multi-pass) stochastic gradient descent. …
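
The mechanism this snippet points at is that clipping the stochastic gradient caps the influence of any single heavy-tailed sample, much as a median does, which is why clipped SGD behaves like an implicit median estimator. Here is a hedged sketch contrasting the plain sample mean with an explicit median-of-means aggregate under Cauchy noise; the toy gradients and function names are illustrative, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def clip_grad(g, radius=1.0):
    """Project a stochastic gradient onto a norm ball before the update;
    one heavy-tailed outlier can then move the iterate by at most `radius`."""
    norm = np.linalg.norm(g)
    return g if norm <= radius else g * (radius / norm)

def median_of_means(grads, k=5):
    """Explicit robust aggregate: average within k buckets, then take the
    coordinate-wise median across bucket means."""
    buckets = np.array_split(np.asarray(grads), k)
    return np.median(np.stack([b.mean(axis=0) for b in buckets]), axis=0)

# Toy heavy-tailed gradient noise: true gradient plus standard Cauchy samples.
true_grad = np.array([1.0, -2.0])
noisy = true_grad + rng.standard_cauchy(size=(50, 2))
print("plain mean:     ", noisy.mean(axis=0))        # wrecked by outliers
print("median of means:", median_of_means(noisy))    # stays near true_grad
print("clipped sample: ", clip_grad(noisy[0]))       # bounded influence
```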

Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation

B Dupuis, U Şimşekli - arXiv preprint arXiv:2402.07723, 2024 - arxiv.org
… Recently, several studies have provided empirical and theoretical evidence that stochastic
gradient descent (SGD) can exhibit heavy tails when the step-size is chosen large, or the …
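
For orientation, the heavy-tailed SDEs in this line of work are typically driven by an α-stable Lévy process rather than Brownian motion, and their density evolves under a fractional Fokker-Planck equation. Schematically, under notation assumed here rather than copied from the paper:

```latex
% Heavy-tailed SDE: Brownian motion replaced by an alpha-stable Levy process
\mathrm{d}\theta_t = -\nabla f(\theta_t)\,\mathrm{d}t
                   + \sigma\,\mathrm{d}L^{\alpha}_t,
\qquad \alpha \in (1, 2)

% The density p_t of \theta_t then solves a fractional Fokker-Planck
% equation, with the Laplacian of the Brownian case replaced by the
% fractional Laplacian:
\partial_t p_t = \nabla \cdot \bigl(\nabla f \, p_t\bigr)
               - \sigma^{\alpha} (-\Delta)^{\alpha/2} p_t
```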

Near-Optimality of Contrastive Divergence Algorithms

P Glaser, KH Huang, A Gretton - The Thirty-eighth Annual Conference on … - openreview.net
… α close to 1, very heavy-tailed distributions may not satisfy the … used within an offline stochastic
gradient descent (SGD) … work on convergence guarantees for offline multi-pass SGD [41, 42, …

Tighter generalisation bounds via interpolation

P Viallard, M Haddouche, U Şimşekli… - arXiv preprint arXiv …, 2024 - arxiv.org
… Algorithmic stability of heavy-tailed stochastic gradient descent on least squares. In …
However, ρ_α remains mainly theoretical, as it is the continuous approximation of a discrete …

Combinatorial Tasks as Model Systems of Deep Learning

BL Edelman - 2024 - search.proquest.com
… This paper explores the more complicated setting of offline training of neural networks (in …
to use stochastic gradient descent (SGD), which only requires access to the gradient of the …

A Robust Treatment Planning Framework that Accounts for Weekly Tumor Shrinkage Using Cone Beam Computed Tomography Images Using Deep Learning-Based …

R Li - 2022 - search.proquest.com
… categorized into three major groups of offline, online and real-time, as shown in Figure I.3. …
A commonly used optimizer is mini-batch stochastic gradient descent, and it can be explained …
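
Since the snippet cuts off mid-explanation: the mini-batch SGD update averages gradients over a small random batch B_t and steps against that average, w_{t+1} = w_t - η · (1/|B_t|) Σ_{i∈B_t} ∇ℓ_i(w_t). A generic sketch follows; nothing here is taken from the cited thesis.

```python
import numpy as np

def minibatch_sgd(grad_fn, w0, data, step=0.1, batch_size=32, epochs=10, seed=0):
    """Generic mini-batch SGD loop; grad_fn(w, batch) must return the
    average gradient of the loss over `batch`."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    n = len(data)
    for _ in range(epochs):
        order = rng.permutation(n)            # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = data[order[start:start + batch_size]]
            w = w - step * grad_fn(w, batch)  # step against the batch gradient
    return w

# Usage: least squares on synthetic rows (x1, x2, x3, y).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
data = np.hstack([X, (X @ np.array([1.0, -1.0, 2.0]))[:, None]])
grad = lambda w, b: b[:, :3].T @ (b[:, :3] @ w - b[:, 3]) / len(b)
print(minibatch_sgd(grad, np.zeros(3), data))  # converges toward [1, -1, 2]
```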