Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent

KL Pavasovic, A Durmus, U Simsekli - arXiv preprint arXiv:2310.18455, 2023 - arxiv.org
… of offline (also called multi-pass) SGD exhibits ‘approximate’ power-law tails and the
approximation … Our main takeaway is that, as the number of data points increases, offline SGD will …
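
The setting this snippet describes is offline (multi-pass) SGD: the optimizer cycles repeatedly over a fixed dataset of n points rather than drawing a fresh sample at every step, as online SGD does. Below is a minimal sketch of that setting on a toy least-squares problem, with a crude heavy-tail diagnostic; the model, step size, and kurtosis check are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed finite dataset: offline (multi-pass) SGD revisits these n points
# every epoch, unlike online SGD which draws fresh samples.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def sgd_multipass(step=0.05, epochs=500):
    """Single-sample SGD over the same dataset for many epochs."""
    w = np.zeros(d)
    iterates = []
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = (X[i] @ w - y[i]) * X[i]  # squared-loss gradient at one point
            w = w - step * g
        iterates.append(w.copy())
    return np.array(iterates)

W = sgd_multipass()
# Crude tail diagnostic: excess kurtosis of centered iterate norms.
# Gaussian fluctuations give roughly 0; power-law-like tails inflate it.
r = np.linalg.norm(W - W.mean(axis=0), axis=1)
print("excess kurtosis:", ((r - r.mean()) ** 4).mean() / r.var() ** 2 - 3)
```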

Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise

V Kothapalli, T Pang, S Deng, Z Liu, Y Yang - arXiv preprint arXiv …, 2024 - arxiv.org
… and benefits of analyzing multi-pass training, we present a …, we show that “stochastic gradient
noise” during optimization is … rank-1 approximation matrix A to state the following lemma. …

SGD with Clipping is Secretly Estimating the Median Gradient

F Schaipp, G Garrigos, U Simsekli, R Gower - arXiv preprint arXiv …, 2024 - arxiv.org
… an (approximate) sample median is robust to heavy-tailed noise, … That is, we sample one
stochastic gradient g_t per iteration … heavy tails in offline (multi-pass) stochastic gradient descent. …
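
The mechanism this snippet points at is that clipping the stochastic gradient caps the influence of any single heavy-tailed sample, much as a median does, which is why clipped SGD behaves like an implicit median estimator. Here is a hedged sketch contrasting the plain sample mean with an explicit median-of-means aggregate under Cauchy noise; the toy gradients and function names are illustrative, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def clip_grad(g, radius=1.0):
    """Project a stochastic gradient onto a norm ball before the update;
    one heavy-tailed outlier can then move the iterate by at most `radius`."""
    norm = np.linalg.norm(g)
    return g if norm <= radius else g * (radius / norm)

def median_of_means(grads, k=5):
    """Explicit robust aggregate: average within k buckets, then take the
    coordinate-wise median across bucket means."""
    buckets = np.array_split(np.asarray(grads), k)
    return np.median(np.stack([b.mean(axis=0) for b in buckets]), axis=0)

# Toy heavy-tailed gradient noise: true gradient plus standard Cauchy samples.
true_grad = np.array([1.0, -2.0])
noisy = true_grad + rng.standard_cauchy(size=(50, 2))
print("plain mean:     ", noisy.mean(axis=0))        # wrecked by outliers
print("median of means:", median_of_means(noisy))    # stays near true_grad
print("clipped sample: ", clip_grad(noisy[0]))       # bounded influence
```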

Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation

B Dupuis, U Şimşekli - arXiv preprint arXiv:2402.07723, 2024 - arxiv.org
… Recently, several studies have provided empirical and theoretical evidence that stochastic
gradient descent (SGD) can exhibit heavy tails when the step-size is chosen large, or the …
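
For orientation, the heavy-tailed SDEs in this line of work are typically driven by an α-stable Lévy process rather than Brownian motion, and their density evolves under a fractional Fokker-Planck equation. Schematically, under notation assumed here rather than copied from the paper:

```latex
% Heavy-tailed SDE: Brownian motion replaced by an alpha-stable Levy process
\mathrm{d}\theta_t = -\nabla f(\theta_t)\,\mathrm{d}t
                   + \sigma\,\mathrm{d}L^{\alpha}_t,
\qquad \alpha \in (1, 2)

% The density p_t of \theta_t then solves a fractional Fokker-Planck
% equation, with the Laplacian of the Brownian case replaced by the
% fractional Laplacian:
\partial_t p_t = \nabla \cdot \bigl(\nabla f \, p_t\bigr)
               - \sigma^{\alpha} (-\Delta)^{\alpha/2} p_t
```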

Near-Optimality of Contrastive Divergence Algorithms

P Glaser, KH Huang, A Gretton - The Thirty-eighth Annual Conference on … - openreview.net
… α close to 1, very heavy-tailed distributions may not satisfy the … used within an offline stochastic
gradient descent (SGD) … work on convergence guarantees for offline multi-pass SGD [41, 42, …

Tighter generalisation bounds via interpolation

P Viallard, M Haddouche, U Şimşekli… - arXiv preprint arXiv …, 2024 - arxiv.org
… Algorithmic stability of heavy-tailed stochastic gradient descent on least squares. In …
However, ρ_α remains mainly theoretical, as it is the continuous approximation of a discrete …

Combinatorial Tasks as Model Systems of Deep Learning

BL Edelman - 2024 - search.proquest.com
… This paper explores the more complicated setting of offline training of neural networks (in …
to use stochastic gradient descent (SGD), which only requires access to the gradient of the …

A Robust Treatment Planning Framework that Accounts for Weekly Tumor Shrinkage Using Cone Beam Computed Tomography Images Using Deep Learning-Based …

R Li - 2022 - search.proquest.com
… categorized into three major groups of offline, online and real-time, as shown in Figure I.3. …
A commonly used optimizer is mini-batch stochastic gradient descent, and it can be explained …
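
Since the snippet cuts off mid-explanation: the mini-batch SGD update averages gradients over a small random batch B_t and steps against that average, w_{t+1} = w_t - η · (1/|B_t|) Σ_{i∈B_t} ∇ℓ_i(w_t). A generic sketch follows; nothing here is taken from the cited thesis.

```python
import numpy as np

def minibatch_sgd(grad_fn, w0, data, step=0.1, batch_size=32, epochs=10, seed=0):
    """Generic mini-batch SGD loop; grad_fn(w, batch) must return the
    average gradient of the loss over `batch`."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    n = len(data)
    for _ in range(epochs):
        order = rng.permutation(n)            # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = data[order[start:start + batch_size]]
            w = w - step * grad_fn(w, batch)  # step against the batch gradient
    return w

# Usage: least squares on synthetic rows (x1, x2, x3, y).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
data = np.hstack([X, (X @ np.array([1.0, -1.0, 2.0]))[:, None]])
grad = lambda w, b: b[:, :3].T @ (b[:, :3] @ w - b[:, 3]) / len(b)
print(minibatch_sgd(grad, np.zeros(3), data))  # converges toward [1, -1, 2]
```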