Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter

Zeyuan Allen-Zhu
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:89-97, 2017.

Abstract

Given a non-convex function $f(x)$ that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depends on the smallest (negative) eigenvalue $-\sigma$ of the Hessian. This parameter $\sigma$ captures how strongly non-convex $f(x)$ is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known results for a range of the parameter $\sigma$, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $\sigma_0$ so that the (currently) fastest methods for $\sigma>\sigma_0$ and for $\sigma<\sigma_0$ have different behaviors: the former scales with $n^{2/3}$ and the latter scales with $n^{3/4}$.
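For reference, the finite-sum setting described above can be written out as follows (a sketch using standard notation; the component functions $f_i$, the dimension $d$, and the accuracy parameter $\varepsilon$ are not named on this page and are introduced only for illustration):

$$\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad \nabla^2 f(x) \succeq -\sigma I \ \text{ for all } x,$$

where $\sigma$-strong non-convexity means every eigenvalue of the Hessian is at least $-\sigma$, and the goal is to find a point $x$ with $\|\nabla f(x)\| \le \varepsilon$, i.e., an $\varepsilon$-approximate stationary point.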

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-allen-zhu17a,
  title     = {Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter},
  author    = {Zeyuan Allen-Zhu},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {89--97},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/allen-zhu17a/allen-zhu17a.pdf},
  url       = {https://proceedings.mlr.press/v70/allen-zhu17a.html},
  abstract  = {Given a non-convex function $f(x)$ that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depends on the smallest (negative) eigenvalue $-\sigma$ of the Hessian. This parameter $\sigma$ captures how strongly non-convex $f(x)$ is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known results for a range of the parameter $\sigma$, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $\sigma_0$ so that the (currently) fastest methods for $\sigma>\sigma_0$ and for $\sigma<\sigma_0$ have different behaviors: the former scales with $n^{2/3}$ and the latter scales with $n^{3/4}$.}
}
Endnote
%0 Conference Paper
%T Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter
%A Zeyuan Allen-Zhu
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-allen-zhu17a
%I PMLR
%P 89--97
%U https://proceedings.mlr.press/v70/allen-zhu17a.html
%V 70
%X Given a non-convex function $f(x)$ that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depends on the smallest (negative) eigenvalue $-\sigma$ of the Hessian. This parameter $\sigma$ captures how strongly non-convex $f(x)$ is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known results for a range of the parameter $\sigma$, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $\sigma_0$ so that the (currently) fastest methods for $\sigma>\sigma_0$ and for $\sigma<\sigma_0$ have different behaviors: the former scales with $n^{2/3}$ and the latter scales with $n^{3/4}$.
APA
Allen-Zhu, Z. (2017). Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:89-97. Available from https://proceedings.mlr.press/v70/allen-zhu17a.html.