Towards understanding why Lookahead generalizes better than SGD and beyond
Abstract
Supplementary Material
- Download (319.22 KB)
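For context, the Lookahead optimizer analyzed in this paper wraps an inner optimizer such as SGD: the fast weights take k inner steps, after which the slow weights move a fraction alpha toward them. Below is a minimal illustrative sketch of that standard update (Zhang et al., 2019) with plain SGD as the inner optimizer; the toy quadratic objective, step sizes, and function names are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def lookahead_sgd(grad, w0, inner_lr=0.1, alpha=0.5, k=5, outer_steps=100):
    """Minimal Lookahead-wrapped SGD sketch (Zhang et al., 2019): after k fast
    SGD steps, the slow weights move a fraction alpha toward the fast weights."""
    phi = np.array(w0, dtype=float)            # slow weights
    for _ in range(outer_steps):
        theta = phi.copy()                     # fast weights start from slow weights
        for _ in range(k):
            theta -= inner_lr * grad(theta)    # inner SGD step
        phi += alpha * (theta - phi)           # slow-weight (lookahead) interpolation
    return phi

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w; values are illustrative.
w_final = lookahead_sgd(grad=lambda w: w, w0=[3.0, -2.0])
print(w_final)  # converges toward the minimizer at the origin
```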
Recommendations
Towards theoretically understanding why SGD generalizes better than ADAM in deep learning
NIPS '20: Proceedings of the 34th International Conference on Neural Information Processing Systems. It is not clear yet why ADAM-alike adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to provide understandings on this generalization gap by analyzing their local ...
Understanding generalization error of SGD in nonconvex optimization
Abstract: The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing generalization bounds based on stability do not ...
Is local SGD better than minibatch SGD?
ICML'20: Proceedings of the 37th International Conference on Machine Learning. We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the ...
Information
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Qualifiers
- Research-article
- Research
- Refereed limited