Gradient Descent Finds the Cubic-Regularized Nonconvex Newton Step

Published: 01 January 2019

Abstract

We consider the minimization of a nonconvex quadratic form regularized by a cubic term, which may exhibit saddle points and a suboptimal local minimum. Nonetheless, we prove that, under mild assumptions, gradient descent approximates the global minimum to within $\varepsilon$ accuracy in $O(\varepsilon^{-1}\log(1/\varepsilon))$ steps for large $\varepsilon$ and $O(\log(1/\varepsilon))$ steps for small $\varepsilon$ (compared to a condition number we define), with at most logarithmic dependence on the problem dimension. When we use gradient descent to approximate the cubic-regularized Newton step, our result implies a rate of convergence to second-order stationary points of general smooth nonconvex functions.
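
The problem in question is the cubic-regularized Newton subproblem of Nesterov and Polyak: minimize $f(x) = \tfrac{1}{2}x^\top A x + b^\top x + \tfrac{\rho}{3}\|x\|^3$, where the symmetric matrix $A$ may be indefinite. As a concrete illustration (not the paper's algorithm), the sketch below runs plain fixed-step gradient descent on a small random instance; the instance, the step size `eta`, the origin initialization, and the iteration budget are all illustrative assumptions, and the paper's analysis prescribes the step size and initialization far more carefully.

```python
import numpy as np

def objective(x, A, b, rho):
    # f(x) = (1/2) x'Ax + b'x + (rho/3) ||x||^3
    return 0.5 * x @ A @ x + b @ x + (rho / 3.0) * np.linalg.norm(x) ** 3

def gradient(x, A, b, rho):
    # grad f(x) = A x + b + rho ||x|| x
    return A @ x + b + rho * np.linalg.norm(x) * x

def gradient_descent(A, b, rho, eta, steps=5000):
    # Plain fixed-step gradient descent, started at the origin for
    # simplicity; the paper treats initialization more carefully.
    x = np.zeros_like(b)
    for _ in range(steps):
        x -= eta * gradient(x, A, b, rho)
    return x

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 5))
A = (Q + Q.T) / 2.0        # symmetric and, in general, indefinite
b = rng.standard_normal(5)
rho = 1.0

# Crude, conservative step size for the toy run; an illustrative
# choice, not the step size the paper prescribes.
eta = 0.25 / (np.linalg.norm(A, 2) + rho * np.sqrt(np.linalg.norm(b)))

x_hat = gradient_descent(A, b, rho, eta)
print("f(x_hat)   =", objective(x_hat, A, b, rho))
print("||grad f|| =", np.linalg.norm(gradient(x_hat, A, b, rho)))
```

Because the cubic term dominates for large $\|x\|$, the sublevel sets of $f$ are bounded, which is what makes a fixed step size tied to $\|A\|$ and a rough bound on the solution norm workable in this toy setting.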

        Published In

        SIAM Journal on Optimization, Volume 29, Issue 3
        DOI: 10.1137/sjope8.29.3

        Publisher

        Society for Industrial and Applied Mathematics, United States

        Author Tags

        1. gradient descent
        2. nonconvex quadratics
        3. cubic regularization
        4. global optimization
        5. Newton's method
        6. nonasymptotic rate of convergence
        7. power method
        8. trust region methods

        MSC Codes

        1. 65K05
        2. 90C06
        3. 90C20
        4. 90C26
        5. 90C30

        Qualifiers

        • Research-article

        Cited By

        • (2024) A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization. Computational Optimization and Applications, 89(3), pp. 843-894. https://doi.org/10.1007/s10589-024-00603-6. Online publication date: 1-Dec-2024.
        • (2024) Scalable adaptive cubic regularization methods. Mathematical Programming: Series A and B, 207(1-2), pp. 191-225. https://doi.org/10.1007/s10107-023-02007-6. Online publication date: 1-Sep-2024.
        • (2023) Stochastic Adaptive Regularization Method with Cubics: A High Probability Complexity Bound. Proceedings of the Winter Simulation Conference, pp. 3520-3531. https://doi.org/10.5555/3643142.3643436. Online publication date: 10-Dec-2023.
        • (2023) Riemannian Stochastic Variance-Reduced Cubic Regularized Newton Method for Submanifold Optimization. Journal of Optimization Theory and Applications, 196(1), pp. 324-361. https://doi.org/10.1007/s10957-022-02137-5. Online publication date: 1-Jan-2023.
        • (2022) Finding second-order stationary points in nonconvex-strongly-concave minimax optimization. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 36667-36679. https://doi.org/10.5555/3600270.3602927. Online publication date: 28-Nov-2022.
        • (2022) Approximate secular equations for the cubic regularization subproblem. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 14250-14260. https://doi.org/10.5555/3600270.3601306. Online publication date: 28-Nov-2022.
        • (2021) Adaptive regularization with cubics on manifolds. Mathematical Programming: Series A and B, 188(1), pp. 85-134. https://doi.org/10.1007/s10107-020-01505-1. Online publication date: 1-Jul-2021.
