
Research article · Free access

Stochastic nested variance reduction for nonconvex optimization

Published: 01 January 2020

Abstract

We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, namely, Stochastic Nested Variance-Reduced Gradient descent (SNVRG). Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses K + 1 nested reference points to build a semi-stochastic gradient whose variance is further reduced in each iteration. For smooth nonconvex functions, SNVRG converges to an ε-approximate first-order stationary point within Õ(n ∧ ε^{-2} + ε^{-3} ∧ n^{1/2}ε^{-2}) stochastic gradient evaluations, where Õ hides logarithmic factors and a ∧ b denotes min{a, b}. This improves the best known gradient complexity of SVRG, O(n + n^{2/3}ε^{-2}), and that of SCSG, O(n ∧ ε^{-2} + ε^{-10/3} ∧ n^{2/3}ε^{-2}). For gradient dominated functions, SNVRG also achieves better gradient complexity than the state-of-the-art algorithms.
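To make the first-order bounds above concrete, here is a rough back-of-the-envelope comparison that plugs an illustrative problem size n and accuracy ε into the three complexity expressions, ignoring constants and logarithmic factors; the values n = 10^6 and ε = 10^-3 are hypothetical choices, not taken from the paper.

# Rough comparison of the first-order gradient-complexity bounds quoted above,
# ignoring constants and logarithmic factors. n and eps are illustrative only.
n, eps = 1e6, 1e-3
svrg  = n + n ** (2 / 3) * eps ** -2                                    # O(n + n^{2/3} eps^{-2})
scsg  = min(n, eps ** -2) + min(eps ** (-10 / 3), n ** (2 / 3) * eps ** -2)
snvrg = min(n, eps ** -2) + min(eps ** -3, n ** (1 / 2) * eps ** -2)
print(f"SVRG  ~ {svrg:.1e}")   # ~1.0e+10
print(f"SCSG  ~ {scsg:.1e}")   # ~1.0e+10
print(f"SNVRG ~ {snvrg:.1e}")  # ~1.0e+09, about an order of magnitude fewer evaluations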
Based on SNVRG, we further propose two algorithms that can find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. In particular, for finite-sum optimization problems, the proposed SNVRG + Neon2^finite algorithm achieves Õ(n^{1/2}ε^{-2} + nε_H^{-3} + n^{3/4}ε_H^{-7/2}) gradient complexity to converge to an (ε, ε_H)-second-order stationary point, which outperforms SVRG + Neon2^finite (Allen-Zhu and Li, 2018), the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed SNVRG + Neon2^online achieves Õ(ε^{-3} + ε_H^{-5} + ε^{-2}ε_H^{-3}) gradient complexity, which is better than both SVRG + Neon2^online (Allen-Zhu and Li, 2018) and Natasha2 (Allen-Zhu, 2018a) in certain regimes. Thorough experimental results on different nonconvex optimization problems back up our theory.
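The following minimal Python sketch illustrates the nested variance-reduction construction described above. It is an illustration only, not the paper's algorithm: the toy least-squares objective, the geometric refresh schedule, the batch sizes, and the step size are assumptions of this sketch.

# Minimal sketch of nested variance reduction in the spirit of SNVRG.
# NOT the paper's algorithm or parameters: the least-squares objective,
# refresh schedule, batch sizes, and step size are assumptions of this sketch.
import numpy as np

def stoch_grad(w, A, b, idx):
    # Mini-batch gradient of f(w) = (1/2m) * ||A[idx] w - b[idx]||^2.
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) / len(idx)

def snvrg_sketch(A, b, K=2, interval=5, T=50, eta=0.1, seed=0):
    n, d = A.shape
    rng = np.random.default_rng(seed)
    w = np.zeros(d)
    x = [w.copy() for _ in range(K + 1)]      # K + 1 nested reference points
    g = [np.zeros(d) for _ in range(K + 1)]   # per-level gradient corrections
    # Level l is refreshed every interval**(K - l) iterations: the outermost
    # level changes rarely (large batch), the innermost level every step.
    period = [interval ** (K - l) for l in range(K + 1)]
    batch = [max(8, (n // 4) // interval ** l) for l in range(K + 1)]
    for t in range(T):
        for l in range(K + 1):
            if t % period[l] == 0:
                x[l] = w.copy()
                idx = rng.choice(n, size=min(batch[l], n), replace=False)
                if l == 0:
                    g[0] = stoch_grad(x[0], A, b, idx)  # base gradient estimate
                else:
                    # Correction term: estimates grad f(x[l]) - grad f(x[l-1]).
                    g[l] = stoch_grad(x[l], A, b, idx) - stoch_grad(x[l - 1], A, b, idx)
        v = sum(g)          # semi-stochastic gradient built from the nested levels
        w = w - eta * v
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2048, 20))
    w_true = rng.standard_normal(20)
    b = A @ w_true + 0.01 * rng.standard_normal(2048)
    w_hat = snvrg_sketch(A, b)
    print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))

In the paper, the loop lengths and batch sizes of the K + 1 levels are chosen jointly so that the variance of the semi-stochastic gradient v keeps shrinking; the geometric schedule above is only a stand-in for that coupling.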

References

[1]
Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, and Tengyu Ma. Finding approximate local minima faster than gradient descent. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1195-1199. ACM, 2017.
[2]
Zeyuan Allen-Zhu. Katyusha: The first direct acceleration of stochastic gradient methods. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1200-1205. ACM, 2017.
[3]
Zeyuan Allen-Zhu. Natasha 2: Faster non-convex optimization than SGD. In Advances in Neural Information Processing Systems, pages 2676-2687, 2018a.
[4]
Zeyuan Allen-Zhu. Katyusha x: Simple momentum method for stochastic sum-of-nonconvex optimization. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 179-185. PMLR, 2018b.
[5]
Zeyuan Allen-Zhu and Elad Hazan. Variance reduction for faster non-convex optimization. In International Conference on Machine Learning, pages 699-707, 2016.
[6]
Zeyuan Allen-Zhu and Yuanzhi Li. Neon2: Finding local minima via first-order oracles. In Advances in Neural Information Processing Systems, pages 3720-3730, 2018.
[7]
Animashree Anandkumar and Rong Ge. Efficient approaches for escaping higher order saddle points in non-convex optimization. In Conference on Learning Theory, pages 81-102, 2016.
[8]
Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro. Global optimality of local search for low rank matrix recovery. In Advances in Neural Information Processing Systems, pages 3873-3881, 2016.
[9]
Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. "Convex until proven guilty": Dimension-free acceleration of gradient descent on non-convex functions. In International Conference on Machine Learning, pages 654-663, 2017.
[10]
Yair Carmon, John C Duchi, Oliver Hinder, and Aaron Sidford. Accelerated methods for nonconvex optimization. SIAM Journal on Optimization, 28(2):1751-1772, 2018.
[11]
Xi Chen, Simon S Du, and Xin T Tong. On stationary-point hitting time and ergodicity of stochastic gradient langevin dynamics. Journal of Machine Learning Research, 21(68): 1-41, 2020.
[12]
Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. In Artificial Intelligence and Statistics, pages 192-204, 2015.
[13]
Frank E Curtis, Daniel P Robinson, and Mohammadreza Samadi. A trust region algorithm with a worst-case iteration complexity of O(ε^{-3/2}) for nonconvex optimization. Mathematical Programming, 162(1-2):1-32, 2017.
[14]
Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, and Thomas Hofmann. Escaping saddles with stochastic gradients. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 1155-1164. PMLR, 2018.
[15]
Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems, 2014.
[16]
Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems, pages 1646-1654, 2014a.
[17]
Aaron Defazio, Justin Domke, et al. Finito: A faster, permutable incremental gradient method for big data problems. In International Conference on Machine Learning, pages 1125-1133, 2014b.
[18]
Cong Fang, Chris Junchi Li, Zhouchen Lin, and Tong Zhang. Spider: Near-optimal nonconvex optimization via stochastic path-integrated differential estimator. In Advances in Neural Information Processing Systems, pages 686-696, 2018.
[19]
Cong Fang, Zhouchen Lin, and Tong Zhang. Sharp analysis for nonconvex sgd escaping from saddle points. In Conference on Learning Theory, pages 1192-1234, 2019.
[20]
Dan Garber and Elad Hazan. Fast and simple PCA via convex optimization. arXiv preprint arXiv:1509.05647, 2015.
[21]
Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle points - online stochastic gradient for tensor decomposition. In Conference on Learning Theory, pages 797-842, 2015.
[22]
Rong Ge, Jason D Lee, and Tengyu Ma. Matrix completion has no spurious local minimum. In Advances in Neural Information Processing Systems, pages 2973-2981, 2016.
[23]
Saeed Ghadimi and Guanghui Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341-2368, 2013.
[24]
Saeed Ghadimi and Guanghui Lan. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156(1-2):59-99, 2016.
[25]
Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming, 155(1-2):267-305, 2016.
[26]
Reza Harikandeh, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, and Scott Sallinen. Stop wasting my gradients: Practical SVRG. In Advances in Neural Information Processing Systems, pages 2251-2259, 2015.
[27]
Christopher J Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. Journal of the ACM (JACM), 60(6):45, 2013.
[28]
Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M Kakade, and Michael I Jordan. How to escape saddle points efficiently. In International Conference on Machine Learning, pages 1724-1732, 2017.
[29]
Chi Jin, Praneeth Netrapalli, and Michael I. Jordan. Accelerated gradient descent escapes saddle points faster than gradient descent. In Proceedings of the 31st Conference On Learning Theory, volume 75, pages 1042-1085. PMLR, 2018.
[30]
Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, pages 315-323, 2013.
[31]
Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 795-811. Springer, 2016.
[32]
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
[33]
Jonas Moritz Kohler and Aurelien Lucchi. Sub-sampled cubic regularization for nonconvex optimization. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 1895-1904. PMLR, 2017.
[34]
Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[35]
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[36]
Jason D Lee, Max Simchowitz, Michael I Jordan, and Benjamin Recht. Gradient descent only converges to minimizers. In Conference on learning theory, pages 1246-1257, 2016.
[37]
Jason D Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I Jordan, and Benjamin Recht. First-order methods almost always avoid strict saddle points. Mathematical programming, 176(1-2):311-337, 2019.
[38]
Lihua Lei, Cheng Ju, Jianbo Chen, and Michael I Jordan. Non-convex finite-sum optimization via SCSG methods. In Advances in Neural Information Processing Systems, pages 2348-2358, 2017.
[39]
Kfir Y Levy. The power of normalization: Faster evasion of saddle points. arXiv preprint arXiv:1611.04831, 2016.
[40]
Hongzhou Lin, Julien Mairal, and Zaid Harchaoui. A universal catalyst for first-order optimization. In Advances in Neural Information Processing Systems, pages 3384-3392, 2015.
[41]
Yurii Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013.
[42]
Yurii Nesterov and Boris T Polyak. Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1):177-205, 2006.
[43]
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, 2011.
[44]
Lam M Nguyen, Jie Liu, Katya Scheinberg, and Martin Takáč. SARAH: A novel method for machine learning problems using stochastic recursive gradient. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 2613-2621. JMLR.org, 2017a.
[45]
Lam M Nguyen, Jie Liu, Katya Scheinberg, and Martin Takáč. Stochastic recursive gradient algorithm for nonconvex optimization. arXiv preprint arXiv:1705.07261, 2017b.
[46]
Lam M Nguyen, Marten van Dijk, Dzung T Phan, Phuong Ha Nguyen, Tsui-Wei Weng, and Jayant R Kalagnanam. Finite-sum smooth optimization with SARAH. arXiv preprint arXiv:1901.07648, 2019.
[47]
Erkki Oja. Simplified neuron model as a principal component analyzer. Journal of mathematical biology, 15(3):267-273, 1982.
[48]
Boris Teodorovich Polyak. Gradient methods for minimizing functionals. Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki, 3(4):643-653, 1963.
[49]
Ning Qian. On the momentum term in gradient descent learning algorithms. Neural networks, 12(1):145-151, 1999.
[50]
Maxim Raginsky, Alexander Rakhlin, and Matus Telgarsky. Non-convex learning via stochastic gradient langevin dynamics: a nonasymptotic analysis. In Conference on Learning Theory, pages 1674-1703, 2017.
[51]
Sashank Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, and Alex Smola. A generic approach for escaping saddle points. In International Conference on Artificial Intelligence and Statistics, pages 1233-1242, 2018.
[52]
Sashank J Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, and Alex Smola. Stochastic variance reduction for nonconvex optimization. In International Conference on Machine Learning, pages 314-323, 2016a.
[53]
Sashank J Reddi, Suvrit Sra, Barnabás Póczos, and Alex Smola. Fast incremental method for smooth nonconvex optimization. In Decision and Control (CDC), 2016 IEEE 55th Conference on, pages 1971-1977. IEEE, 2016b.
[54]
Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400-407, 1951.
[55]
Nicolas L Roux, Mark Schmidt, and Francis R Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems, pages 2663-2671, 2012.
[56]
Bernhard Schölkopf and Alexander J Smola. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press, 2002.
[57]
Shai Shalev-Shwartz. SDCA without duality, regularization, and individual convexity. In International Conference on Machine Learning, pages 747-754, 2016.
[58]
Shai Shalev-Shwartz and Tong Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14(Feb):567-599, 2013.
[59]
Quoc Tran-Dinh, Nhan H Pham, Dzung T Phan, and Lam M Nguyen. A hybrid stochastic optimization framework for stochastic composite nonconvex optimization. arXiv preprint arXiv:1907.03793, 2019.
[60]
Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027, 2010.
[61]
Zhe Wang, Kaiyi Ji, Yi Zhou, Yingbin Liang, and Vahid Tarokh. Spiderboost and momentum: Faster variance reduction algorithms. In Advances in Neural Information Processing Systems, pages 2403-2413, 2019.
[62]
Martin Weiser, Peter Deuflhard, and Bodo Erdmann. Affine conjugate adaptive Newton methods for nonlinear elastomechanics. Optimization Methods and Software, 22(3):413-431, 2007.
[63]
Lin Xiao and Tong Zhang. A proximal stochastic gradient method with progressive variance reduction. SIAM Journal on Optimization, 24(4):2057-2075, 2014.
[64]
Pan Xu, Jinghui Chen, Difan Zou, and Quanquan Gu. Global convergence of langevin dynamics based algorithms for nonconvex optimization. In Advances in Neural Information Processing Systems, pages 3122-3133, 2018a.
[65]
Peng Xu, Fred Roosta, and Michael W Mahoney. Newton-type methods for nonconvex optimization under inexact hessian information. Mathematical Programming, pages 1-36, 2019.
[66]
Yi Xu, Jing Rong, and Tianbao Yang. First-order stochastic algorithms for escaping from saddle points in almost linear time. In Advances in Neural Information Processing Systems, pages 5531-5541, 2018b.
[67]
Yaodong Yu, Difan Zou, and Quanquan Gu. Saving gradient and negative curvature computations: Finding local minima more efficiently. arXiv preprint arXiv:1712.03950, 2017.
[68]
Yaodong Yu, Pan Xu, and Quanquan Gu. Third-order smoothness helps: Faster stochastic optimization algorithms for finding local minima. In Advances in Neural Information Processing Systems, pages 4526-4536, 2018.
[69]
Xiao Zhang, Lingxiao Wang, Yaodong Yu, and Quanquan Gu. A primal-dual analysis of global optimality in nonconvex low-rank matrix recovery. In International conference on machine learning, 2018.
[70]
Yuchen Zhang, Percy Liang, and Moses Charikar. A hitting time analysis of stochastic gradient langevin dynamics. In Conference on Learning Theory, pages 1980-2022, 2017.
[71]
Dongruo Zhou and Quanquan Gu. Lower bounds for smooth nonconvex finite-sum optimization. In International Conference on Machine Learning, pages 7574-7583, 2019.
[72]
Dongruo Zhou, Pan Xu, and Quanquan Gu. Stochastic variance-reduced cubic regularized Newton methods. In Proceedings of the 35th International Conference on Machine Learning, pages 5990-5999. PMLR, 2018a.
[73]
Dongruo Zhou, Pan Xu, and Quanquan Gu. Stochastic nested variance reduced gradient descent for nonconvex optimization. In Advances in Neural Information Processing Systems, pages 3922-3933, 2018b.




      Published In

The Journal of Machine Learning Research, Volume 21, Issue 1
January 2020
10260 pages
ISSN: 1532-4435
EISSN: 1533-7928
License: CC BY 4.0

      Publisher

      JMLR.org

      Publication History

Received: 01 July 2018
Revised: 01 April 2020
Accepted: 01 May 2020
Published: 01 January 2020
      Published in JMLR Volume 21, Issue 1

      Author Tags

      1. nonconvex optimization
      2. finding local minima
      3. variance reduction

      Qualifiers

      • Research-article
