Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter

Zeyuan Allen-Zhu

Proceedings of the 34th International Conference on Machine Learning, PMLR 70:89-97, 2017.

Abstract

Given a non-convex function $f(x)$ that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depend on the smallest (negative) eigenvalue $-\sigma$ of the Hessian. This parameter $\sigma$ captures how strongly non-convex $f(x)$ is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known results for a range of parameter $\sigma$, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $\sigma_0$ so that the (currently) fastest methods for $\sigma>\sigma_0$ and for $\sigma<\sigma_0$ have different behaviors: the former scales with $n^{2/3}$ and the latter scales with $n^{3/4}$.

Cite this Paper

BibTeX


@InProceedings{pmlr-v70-allen-zhu17a,
  title = 	 {Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter},
  author =       {Zeyuan Allen-Zhu},
  booktitle = 	 {Proceedings of the 34th International Conference on Machine Learning},
  pages = 	 {89--97},
  year = 	 {2017},
  editor = 	 {Precup, Doina and Teh, Yee Whye},
  volume = 	 {70},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {06--11 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v70/allen-zhu17a/allen-zhu17a.pdf},
  url = 	 {https://proceedings.mlr.press/v70/allen-zhu17a.html},
  abstract = 	 {Given a non-convex function $f(x)$ that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depend on the smallest (negative) eigenvalue $-\sigma$ of the Hessian. This parameter $\sigma$ captures how strongly non-convex $f(x)$ is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known results for a range of parameter $\sigma$, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $\sigma_0$ so that the (currently) fastest methods for $\sigma>\sigma_0$ and for $\sigma<\sigma_0$ have different behaviors: the former scales with $n^{2/3}$ and the latter scales with $n^{3/4}$.}
}

Endnote

%0 Conference Paper
%T Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter
%A Zeyuan Allen-Zhu
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh	
%F pmlr-v70-allen-zhu17a
%I PMLR
%P 89--97
%U https://proceedings.mlr.press/v70/allen-zhu17a.html
%V 70
%X Given a non-convex function $f(x)$ that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depend on the smallest (negative) eigenvalue $-\sigma$ of the Hessian. This parameter $\sigma$ captures how strongly non-convex $f(x)$ is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known results for a range of parameter $\sigma$, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $\sigma_0$ so that the (currently) fastest methods for $\sigma>\sigma_0$ and for $\sigma<\sigma_0$ have different behaviors: the former scales with $n^{2/3}$ and the latter scales with $n^{3/4}$.

APA


Allen-Zhu, Z.. (2017). Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:89-97 Available from https://proceedings.mlr.press/v70/allen-zhu17a.html.

Related Material

Download PDF