Gradient Descent Finds the Cubic-Regularized Nonconvex Newton Step

Published: 01 January 2019

Abstract

We consider the minimization of a nonconvex quadratic form regularized by a cubic term, which may exhibit saddle points and a suboptimal local minimum. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the global minimum to within $\varepsilon$ accuracy in $O(\varepsilon^{-1}\log(1/\varepsilon))$ steps for large $\varepsilon$ and $O(\log(1/\varepsilon))$ steps for small $\varepsilon$ (compared to a condition number we define), with at most logarithmic dependence on the problem dimension. When we use gradient descent to approximate the cubic-regularized Newton step, our result implies a rate of convergence to second-order stationary points of general smooth nonconvex functions.
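
The problem in question is the cubic-regularized Newton subproblem of Nesterov and Polyak: minimize $f(x) = \tfrac{1}{2}x^\top A x + b^\top x + \tfrac{\rho}{3}\|x\|^3$, where the symmetric matrix $A$ may be indefinite. As a concrete illustration (not the paper's algorithm), the sketch below runs plain fixed-step gradient descent on a small random instance; the instance, the step size `eta`, the origin initialization, and the iteration budget are all illustrative assumptions, and the paper's analysis prescribes the step size and initialization far more carefully.

```python
import numpy as np

def objective(x, A, b, rho):
    # f(x) = (1/2) x'Ax + b'x + (rho/3) ||x||^3
    return 0.5 * x @ A @ x + b @ x + (rho / 3.0) * np.linalg.norm(x) ** 3

def gradient(x, A, b, rho):
    # grad f(x) = A x + b + rho ||x|| x
    return A @ x + b + rho * np.linalg.norm(x) * x

def gradient_descent(A, b, rho, eta, steps=5000):
    # Plain fixed-step gradient descent, started at the origin for
    # simplicity; the paper treats initialization more carefully.
    x = np.zeros_like(b)
    for _ in range(steps):
        x -= eta * gradient(x, A, b, rho)
    return x

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 5))
A = (Q + Q.T) / 2.0        # symmetric and, in general, indefinite
b = rng.standard_normal(5)
rho = 1.0

# Crude, conservative step size for the toy run; an illustrative
# choice, not the step size the paper prescribes.
eta = 0.25 / (np.linalg.norm(A, 2) + rho * np.sqrt(np.linalg.norm(b)))

x_hat = gradient_descent(A, b, rho, eta)
print("f(x_hat)   =", objective(x_hat, A, b, rho))
print("||grad f|| =", np.linalg.norm(gradient(x_hat, A, b, rho)))
```

Because the cubic term dominates for large $\|x\|$, the sublevel sets of $f$ are bounded, which is what makes a fixed step size tied to $\|A\|$ and a rough bound on the solution norm workable in this toy setting.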

        Published In

        SIAM Journal on Optimization, Volume 29, Issue 3
        DOI: 10.1137/sjope8.29.3

        Publisher

        Society for Industrial and Applied Mathematics, United States

        Author Tags

        1. gradient descent
        2. nonconvex quadratics
        3. cubic regularization
        4. global optimization
        5. Newton's method
        6. nonasymptotic rate of convergence
        7. power method
        8. trust region methods

        MSC Codes

        1. 65K05
        2. 90C06
        3. 90C20
        4. 90C26
        5. 90C30

        Qualifiers

        • Research-article

        Cited By

        • (2024) A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization. Computational Optimization and Applications, 89(3), pp. 843-894. https://doi.org/10.1007/s10589-024-00603-6. Online publication date: 1-Dec-2024.
        • (2024) Scalable adaptive cubic regularization methods. Mathematical Programming: Series A and B, 207(1-2), pp. 191-225. https://doi.org/10.1007/s10107-023-02007-6. Online publication date: 1-Sep-2024.
        • (2023) Stochastic Adaptive Regularization Method with Cubics: A High Probability Complexity Bound. Proceedings of the Winter Simulation Conference, pp. 3520-3531. https://doi.org/10.5555/3643142.3643436. Online publication date: 10-Dec-2023.
        • (2023) Riemannian Stochastic Variance-Reduced Cubic Regularized Newton Method for Submanifold Optimization. Journal of Optimization Theory and Applications, 196(1), pp. 324-361. https://doi.org/10.1007/s10957-022-02137-5. Online publication date: 1-Jan-2023.
        • (2022) Finding second-order stationary points in nonconvex-strongly-concave minimax optimization. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 36667-36679. https://doi.org/10.5555/3600270.3602927. Online publication date: 28-Nov-2022.
        • (2022) Approximate secular equations for the cubic regularization subproblem. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 14250-14260. https://doi.org/10.5555/3600270.3601306. Online publication date: 28-Nov-2022.
        • (2021) Adaptive regularization with cubics on manifolds. Mathematical Programming: Series A and B, 188(1), pp. 85-134. https://doi.org/10.1007/s10107-020-01505-1. Online publication date: 1-Jul-2021.
