Computer Science > Machine Learning

arXiv:2305.15013 (cs)

[Submitted on 24 May 2023 (v1), last revised 26 May 2023 (this version, v2)]

Title: Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function

Abstract: By performing multiple local update iterations, local stochastic gradient descent (L-SGD) has proven very effective in distributed machine learning schemes such as federated learning. In fact, many works have shown that L-SGD with independent and identically distributed (IID) data can even outperform SGD. As a result, extensive efforts have been made to explain the power of L-SGD. However, existing analyses fail to explain why multiple local updates with small mini-batches of data (L-SGD) cannot be replaced by a single update with one large batch of data and a larger learning rate (SGD). In this paper, we offer a new perspective on the strength of L-SGD. We theoretically prove that, with IID data, L-SGD can effectively exploit the second-order information of the loss function. In particular, compared with SGD, the updates of L-SGD have a much larger projection onto the eigenvectors of the Hessian matrix with small eigenvalues, which leads to faster convergence. Under certain conditions, L-SGD can even approach Newton's method. Experimental results on two popular datasets validate the theoretical results.
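The projection claim in the abstract can be illustrated with a minimal sketch. This is a toy setting assumed for illustration only, not the paper's proof or experiments: a noise-free quadratic loss with a diagonal Hessian, so the eigenvectors are the coordinate axes and each coordinate of an update is its projection onto one eigenvector. The function names, eigenvalues, learning rate, and step counts below are all hypothetical choices.

```python
# Toy illustration (an assumption, not the paper's setup): noise-free gradient
# descent on f(w) = 0.5 * sum(lam_i * w_i**2), whose Hessian is diag(lams).
# The gradient at w is [lam_i * w_i], and the eigenvectors are the axes.

def local_update(w0, lams, lr, steps):
    """Total displacement after `steps` consecutive gradient steps of size `lr`."""
    w = list(w0)
    for _ in range(steps):
        w = [wi - lr * lam * wi for wi, lam in zip(w, lams)]
    return [w0i - wi for w0i, wi in zip(w0, w)]

def single_update(w0, lams, lr):
    """Displacement of one gradient step of size `lr`."""
    return [lr * lam * wi for wi, lam in zip(w0, lams)]

def small_to_large_ratio(update):
    """Projection on the small-eigenvalue axis relative to the large-eigenvalue axis."""
    return abs(update[1]) / abs(update[0])

lams = [1.0, 0.01]   # one large and one small Hessian eigenvalue (hypothetical)
w0 = [1.0, 1.0]      # starting point
lr = 0.5             # local learning rate (hypothetical)
K = 50               # number of local steps (hypothetical)

local = local_update(w0, lams, lr, K)
single = single_update(w0, lams, lr * K)   # one step with a K-times larger rate
many_local = local_update(w0, lams, lr, 5000)
newton = list(w0)    # Newton step H^{-1} * grad equals w0 for this quadratic

print("local  ratio:", small_to_large_ratio(local))
print("single ratio:", small_to_large_ratio(single))
print("many local steps:", many_local, "Newton step:", newton)
```

In this toy case the single large step always splits its update across the two axes in proportion to the eigenvalues, whatever the learning rate, while the accumulated local steps saturate along the large-eigenvalue axis and keep growing along the small-eigenvalue one. With many local steps the accumulated update approaches the Newton step for this quadratic. Stochastic gradients and mini-batching, which the paper analyzes, are deliberately left out here.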

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2305.15013 [cs.LG]
	(or arXiv:2305.15013v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2305.15013

Submission history

From: Linxuan Pan
[v1] Wed, 24 May 2023 10:54:45 UTC (487 KB)
[v2] Fri, 26 May 2023 05:18:28 UTC (342 KB)
