Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent
K Lehman Pavasovic, A Durmus, U Simsekli - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
… of offline (also called multi-pass) SGD exhibits ‘approximate’ power-law tails and the
approximation … Our main takeaway is that, as the number of data points increases, offline SGD will …
Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent
K Lehman Pavasovic, A Durmus, U Simsekli - arXiv e-prints, 2023 - ui.adsabs.harvard.edu
… of offline (also called multi-pass) SGD exhibits 'approximate' power-law tails and the
approximation … Our main takeaway is that, as the number of data points increases, offline SGD will …
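The two records above point to the same paper (NeurIPS version and arXiv preprint). To make the snippet's claim concrete, here is a small hypothetical sketch, not the authors' code: multi-pass SGD is run on a fixed finite least-squares dataset with a large step size, and the tail index of the iterates is estimated with a Hill estimator. The function name `hill_estimator` and all constants (`eta`, `k`, `iters`) are illustrative assumptions.

```python
import numpy as np

def hill_estimator(samples, k=100):
    """Hill estimator of the power-law tail index from the k largest |samples|."""
    x = np.sort(np.abs(samples))[::-1]          # descending order statistics
    return 1.0 / np.mean(np.log(x[:k]) - np.log(x[k]))

rng = np.random.default_rng(0)
n, eta, iters = 1000, 0.9, 200_000              # n data points, large step size

a = rng.normal(size=n)                          # fixed finite dataset (offline setting)
y = rng.normal(size=n)

w, samples = 0.0, []
for t in range(iters):
    i = rng.integers(n)                         # multi-pass: resample the SAME n points
    w -= eta * a[i] * (a[i] * w - y[i])         # SGD step on 0.5 * (a_i * w - y_i)^2
    if t > iters // 2:                          # discard burn-in
        samples.append(w)

print("estimated tail index:", hill_estimator(np.array(samples)))
```

With a large step size the iterates develop heavy tails; the paper's takeaway, per the snippet, is that with more data points these tails are only an approximate power law.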
Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise
… and benefits of analyzing multi-pass training, we present a …, we show that “stochastic gradient
noise” during optimization is … rank-1 approximation matrix A to state the following lemma. …
SGD with Clipping is Secretly Estimating the Median Gradient
… an (approximate) sample median is robust to heavy-tailed noise, … That is, we sample one
stochastic gradient gt per iteration … heavy tails in offline (multi-pass) stochastic gradient descent. …
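As a one-dimensional toy illustration of the clipping-estimates-the-median idea in this snippet (assumed constants, not the paper's algorithm): iterating m ← m + η·clip(g − m, c) with a small clip level c behaves like sign descent on E|g − m|, whose minimizer is the median, so the iterate tracks the median even under Cauchy noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Heavy-tailed "gradient" samples: a Cauchy distribution shifted by 3.
# Its mean does not exist, but its median is exactly 3.
g = 3.0 + rng.standard_cauchy(200_000)

m, eta, c = 0.0, 0.05, 0.1                      # running estimate, step size, clip level
for x in g:
    step = np.clip(x - m, -c, c)                # clipped innovation
    m += eta * step                             # ≈ sign descent on E|x - m| for small c

print("clipped iterate:", round(m, 3), " sample median:", round(np.median(g), 3))
```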
Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation
B Dupuis, U Şimşekli - arXiv preprint arXiv:2402.07723, 2024 - arxiv.org
… Recently, several studies have provided empirical and theoretical evidence that stochastic
gradient descent (SGD) can exhibit heavy tails when the step-size is chosen large, or the …
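The SDEs in this entry are driven by α-stable Lévy noise. A minimal Euler-Maruyama sketch (assumed quadratic objective, step size, and α, not the paper's construction) shows the iterates inheriting a power-law tail with index α:

```python
import numpy as np
from scipy.stats import levy_stable

alpha, eta, sigma, iters = 1.6, 1e-3, 0.5, 50_000

# Pre-sample alpha-stable increments; over a step of size eta they scale
# like eta**(1/alpha), unlike Brownian increments which scale like sqrt(eta).
dL = levy_stable.rvs(alpha, 0.0, size=iters, random_state=0) * eta ** (1 / alpha)

w, traj = 5.0, np.empty(iters)
for t in range(iters):
    w = w - eta * w + sigma * dL[t]             # Euler step for dW = -W dt + sigma dL_t^alpha
    traj[t] = w

# The stationary law inherits a power-law tail with the same index alpha.
print("alpha:", alpha, " empirical 99.9% quantile of |W|:", np.quantile(np.abs(traj), 0.999))
```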
Near-Optimality of Contrastive Divergence Algorithms
… α close to 1, very heavy-tailed distributions may not satisfy the … used within an offline stochastic
gradient descent (SGD) … work on convergence guarantees for offline multi-pass SGD [41, 42, …
Tighter generalisation bounds via interpolation
… Algorithmic stability of heavy-tailed stochastic gradient descent on least squares. In …
However, ρα remains mainly theoretical, as it is the continuous approximation of a discrete …
Combinatorial Tasks as Model Systems of Deep Learning
BL Edelman - 2024 - search.proquest.com
… This paper explores the more complicated setting of offline training of neural networks (in …
to use stochastic gradient descent (SGD), which only requires access to the gradient of the …
A Robust Treatment Planning Framework that Accounts for Weekly Tumor Shrinkage Using Cone Beam Computed Tomography Images Using Deep Learning-Based …
R Li - 2022 - search.proquest.com
… categorized into three major groups of offline, online and real-time, as shown in Figure I.3. …
A commonly used optimizer is mini-batch stochastic gradient descent, and it can be explained …
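Since this snippet introduces mini-batch SGD only in passing, a minimal self-contained sketch may help. The function name, learning rate, and batch size are illustrative assumptions, and least squares stands in for the thesis's deep-learning objective.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.1, batch_size=32, epochs=10, seed=0):
    """Plain mini-batch SGD on least squares (illustrative stand-in objective)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)               # one shuffled pass over the data
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            w -= lr * X[idx].T @ residual / len(idx)   # averaged mini-batch gradient
    return w

X = np.random.default_rng(1).normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5])
print(minibatch_sgd(X, y))                      # ≈ [1.0, -2.0, 0.5]
```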