cloneofsimo/scaling-guide

Almost Comprehensive Scaling Guide for Transformer Training

WIP

Features

Activation Tracking:

Weight Update Tracking:

Scope:

  • Kaplan vs Chinchilla
  • SP vs muP vs layerwise SP
  • AdamW weight decay
  • infinite lr scheduler
  • batch_size vs lr (sqrt BS law)
  • agd-muP (spectral initialization vs classic muP)
  • adam-atan2
  • data-dependent lr transfer
  • embedding lr transfer
  • u-muP
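
The "sqrt BS law" item refers to the heuristic, often used with Adam-style optimizers, that when the batch size B changes, the learning rate should scale roughly with sqrt(B). Below is a minimal sketch, assuming a learning rate already tuned at some reference batch size. The function name `sqrt_scaled_lr` and the example numbers are hypothetical illustrations, not values or code from this repo's experiments.

```python
import math


def sqrt_scaled_lr(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Transfer a tuned lr to a new batch size via lr_new = lr_base * sqrt(B_new / B_base).

    Hypothetical helper illustrating the square-root batch-size heuristic;
    it is not taken from this repository.
    """
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * math.sqrt(batch_size / base_batch_size)


# Example: an lr tuned at batch size 256, moved to batch size 1024 (4x larger),
# is doubled, since sqrt(4) = 2.
new_lr = sqrt_scaled_lr(3e-4, 256, 1024)
print(new_lr)
```

Compared with linear scaling (lr proportional to B), the square-root rule grows the learning rate more conservatively as the batch size increases. Whether it holds for a given setup is one of the things an lr-transfer sweep would check.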

Shoutout

Thanks to Fal.ai for providing compute to run these experiments.
