Unified Acceleration of High-Order Algorithms under General Hölder Continuity

Published: 01 January 2021

Abstract

In this paper, through an intuitive vanilla proximal method perspective, we derive a concise unified acceleration framework (UAF) for minimizing a convex function that has Hölder continuous derivatives with respect to general (non-Euclidean) norms. The UAF reconciles two different high-order acceleration approaches, one by Nesterov [Math. Program., 112 (2008), pp. 159--181] and one by Monteiro and Svaiter [SIAM J. Optim., 23 (2013), pp. 1092--1125]. As a result, the UAF unifies the high-order acceleration instances of the two approaches using only two problem-related parameters and two additional framework-design parameters. Moreover, the UAF (and its analysis) is the first approach to make high-order methods applicable under high-order smoothness conditions with respect to non-Euclidean norms, and the first that can match the existing lower bound on iteration complexity for minimizing a convex function with Hölder continuous derivatives. For practical implementation, we introduce a new and effective heuristic that significantly simplifies the binary search procedure required by the framework, and we verify its effectiveness with experiments on logistic regression. Finally, the UAF is proposed directly in the general composite convex setting, showing that existing high-order algorithms extend naturally to this setting.
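
As background for the abstract, the smoothness condition in question can be sketched as follows, in notation assumed here rather than taken from the paper: a function $f$ has a $\nu$-Hölder continuous $p$-th derivative with respect to a norm $\|\cdot\|$ if there exists $H \ge 0$ such that, for all $x, y$,

$\|\nabla^p f(x) - \nabla^p f(y)\|_{*} \le H \, \|x - y\|^{\nu}, \qquad \nu \in (0, 1],$

where $\|\cdot\|_{*}$ denotes the induced norm on $p$-th derivatives. The case $p = 1$, $\nu = 1$ recovers the classical Lipschitz-gradient setting of [30, 32], while $p = 2$, $\nu = 1$ recovers the Lipschitz-Hessian setting of cubic regularization [36] and its acceleration [31].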

References

[1]
Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, J. Mach. Learn. Res., 18 (2017), pp. 8194--8244.
[2]
Z. Allen-Zhu and L. Orecchia, Linear coupling: An ultimate unification of gradient and mirror descent, in 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, Germany, 2017, 3.
[3]
Y. Arjevani, O. Shamir, and R. Shiff, Oracle complexity of second-order methods for smooth convex optimization, Math. Program., 178 (2019), pp. 327--360.
[4]
M. Baes, Estimate sequence methods: Extensions and approximations, Optimization Online, 2009 (2009), 2372.
[5]
K. Ball, E. A. Carlen, and E. H. Lieb, Sharp uniform convexity and smoothness inequalities for trace norms, Invent. Math., 115 (1994), pp. 463--482.
[6]
A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183--202, https://doi.org/10.1137/080716542.
[7]
S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford, Near-optimal method for highly smooth convex optimization, in Proceedings of the Thirty-Second Conference on Learning Theory (Phoenix, AZ), Proc. Mach. Learn. Res. 99, PMLR, 2019, pp. 492--507, http://proceedings.mlr.press/v99/bubeck19a.html.
[8]
B. Bullins, Fast Minimization of Structured Convex Quartics, preprint, https://arxiv.org/abs/1812.10349, 2018.
[9]
Y. Carmon and J. Duchi, Gradient descent finds the cubic-regularized nonconvex Newton step, SIAM J. Optim., 29 (2019), pp. 2146--2178, https://doi.org/10.1137/17M1113898.
[10]
Y. Carmon and J. C. Duchi, Analysis of Krylov subspace solutions of regularized nonconvex quadratic problems, in Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS), Montreal, Canada, 2018, pp. 10728--10738, https://proceedings.neurips.cc/paper/2018/file/349f36aa789af083b8e26839bd498af9-Paper.pdf.
[11]
Y. Carmon, A. Jambulapati, Q. Jiang, Y. Jin, Y. T. Lee, A. Sidford, and K. Tian, Acceleration with a Ball Optimization Oracle, preprint, https://arxiv.org/abs/2003.08078, 2020.
[12]
C. Cartis, N. I. Gould, and P. L. Toint, Universal regularization methods: Varying the power, the smoothness and the accuracy, SIAM J. Optim., 29 (2019), pp. 595--615, https://doi.org/10.1137/16M1106316.
[13]
C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol., 2 (2011), 27; software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[14]
J. Diakonikolas and L. Orecchia, Accelerated extra-gradient descent: A novel accelerated first-order method, in 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, Germany, 2018, 23.
[15]
J. Diakonikolas and L. Orecchia, The approximate duality gap technique: A unified theory of first-order methods, SIAM J. Optim., 29 (2019), pp. 660--689, https://doi.org/10.1137/18M1172314.
[16]
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, and C. A. Uribe, Optimal tensor methods in smooth convex and uniformly convex optimization, in Proceedings of the Thirty-Second Conference on Learning Theory (Phoenix, AZ), Proc. Mach. Learn. Res. 99, PMLR, 2019, pp. 1374--1391.
[17]
G. N. Grapiglia and Y. Nesterov, Accelerated regularized Newton methods for minimizing composite convex functions, SIAM J. Optim., 29 (2019), pp. 77--99, https://doi.org/10.1137/17M1142077.
[18]
G. N. Grapiglia and Y. Nesterov, Tensor Methods for Minimizing Convex Functions with Hölder Continuous Higher-Order Derivatives, preprint, https://arxiv.org/abs/1904.12559, 2019.
[19]
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, New York, 2009.
[20]
B. Jiang, H. Wang, and S. Zhang, An optimal high-order tensor method for convex optimization, in Proceedings of the Thirty-Second Conference on Learning Theory (Phoenix, AZ), Proc. Mach. Learn. Res. 99, PMLR, 2019, pp. 1799--1801, http://proceedings.mlr.press/v99/jiang19a.html.
[21]
A. Juditsky and A. Nemirovski, First order methods for nonsmooth convex large-scale optimization, II: Utilizing problems structure, Optim. Mach. Learn., 30 (2011), pp. 149--183.
[22]
S. P. Karimireddy, S. U. Stich, and M. Jaggi, Global Linear Convergence of Newton's Method without Strong-Convexity or Lipschitz Gradients, preprint, https://arxiv.org/abs/1806.00413, 2018.
[23]
J. M. Kohler and A. Lucchi, Sub-sampled cubic regularization for non-convex optimization, in Proceedings of the 34th International Conference on Machine Learning (ICML), Proc. Mach. Learn. Res. 70, PMLR, 2017, pp. 1895--1904.
[24]
W. Krichene, A. Bayen, and P. L. Bartlett, Accelerated mirror descent in continuous and discrete time, in Advances in Neural Information Processing Systems, NeurIPS, San Diego, CA, 2015, pp. 2845--2853.
[25]
W. Krichene, A. Bayen, and P. L. Bartlett, Adaptive averaging in accelerated descent dynamics, in Advances in Neural Information Processing Systems, NeurIPS, San Diego, CA, 2016, pp. 2991--2999.
[26]
U. Marteau-Ferey, F. Bach, and A. Rudi, Globally convergent Newton methods for ill-conditioned generalized self-concordant losses, in Advances in Neural Information Processing Systems, NeurIPS, San Diego, CA, 2019, pp. 7636--7646.
[27]
R. D. C. Monteiro and B. F. Svaiter, An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods, SIAM J. Optim., 23 (2013), pp. 1092--1125, https://doi.org/10.1137/110833786.
[28]
A. S. Nemirovskii and Y. Nesterov, Optimal methods of smooth convex minimization, USSR Computational Mathematics and Mathematical Physics, 25 (1985), pp. 21--30.
[29]
A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, John Wiley & Sons, New York, 1983.
[30]
Y. Nesterov, A method for solving the convex programming problem with convergence rate $O(1/k^2)$, Dokl. Akad. Nauk SSSR, 269 (1983), pp. 543--547 (in Russian).
[31]
Y. Nesterov, Accelerating the cubic regularization of Newton's method on convex problems, Math. Program., 112 (2008), pp. 159--181.
[32]
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, Boston, MA, 2004.
[33]
Y. Nesterov, Universal gradient methods for convex optimization problems, Math. Program., 152 (2015), pp. 381--404.
[34]
Y. Nesterov, Complexity bounds for primal-dual methods minimizing the model of objective function, Math. Program., 171 (2018), pp. 311--330.
[35]
Y. Nesterov, Implementable Tensor Methods in Unconstrained Convex Optimization, Tech. report, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), Louvain-la-Neuve, Belgium, 2018.
[36]
Y. Nesterov and B. T. Polyak, Cubic regularization of Newton method and its global performance, Math. Program., 108 (2006), pp. 177--205.
[37]
N. Parikh and S. Boyd, Proximal algorithms, Found. Trends Optim., 1 (2014), pp. 127--239.
[38]
V. Roulet and A. d'Aspremont, Sharpness, restart and acceleration, in Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, 2017, pp. 1119--1129, https://papers.nips.cc/paper/2017/hash/2ca65f58e35d9ad45bf7f3ae5cfd08f1-Abstract.html.
[39]
C. Song, S. Cui, Y. Jiang, and S.-T. Xia, Accelerated stochastic greedy coordinate descent by soft thresholding projection onto simplex, in Advances in Neural Information Processing Systems, NeurIPS, San Diego, CA, 2017, pp. 4838--4847.
[40]
C. Song and J. Diakonikolas, Fast Cyclic Coordinate Dual Averaging with Extrapolation for Generalized Variational Inequalities, preprint, https://arxiv.org/abs/2102.13244, 2021.
[41]
C. Song, Y. Jiang, and Y. Ma, Breaking the $O(1/\varepsilon)$ Optimal Rate for a Class of Minimax Problems, preprint, https://arxiv.org/abs/2003.11758, 2020.
[42]
C. Song, Y. Jiang, and Y. Ma, Newton dual extrapolation for non-monotone variational inequality, in OPT-ML Workshop of the 37th International Conference on Machine Learning, 2020, https://sites.google.com/view/optml-icml2020/accepted-papers?authuser=0.
[43]
C. Song, Y. Jiang, and Y. Ma, Variance reduction via accelerated dual averaging for finite-sum optimization, in Proceedings of the 34th International Conference on Neural Information Processing Systems, NeurIPS, San Diego, CA, 2020, https://proceedings.neurips.cc/paper/2020/hash/093b60fd0557804c8ba0cbf1453da22f-Abstract.html.
[44]
C. Song, S. J. Wright, and J. Diakonikolas, Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums, preprint, https://arxiv.org/abs/2102.13643, 2021.
[45]
C. Song, Y. Zhou, Z. Zhou, Y. Jiang, and Y. Ma, Optimistic dual extrapolation for coherent non-monotone variational inequalities, in Proceedings of the 34th International Conference on Neural Information Processing Systems, NeurIPS, San Diego, CA, 2020.
[46]
W. Su, S. Boyd, and E. J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, in Proceedings of the 27th International Conference on Neural Information Processing Systems (NeurIPS), Montreal, Canada, 2014, pp. 2510--2518, https://proceedings.neurips.cc/paper/2014/file/f09696910bdd874a99cd74c8f05b5c44-Paper.pdf.
[47]
M. Wang, K. D. Duc, J. Fischer, and Y. S. Song, Operator norm inequalities between tensor unfoldings on the partition lattice, Linear Algebra Appl., 520 (2017), pp. 44--66.
[48]
A. Wibisono, A. C. Wilson, and M. I. Jordan, A variational perspective on accelerated methods in optimization, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. E7351--E7358.
[49]
A. Yurtsever, Q. Tran-Dinh, and V. Cevher, A universal primal-dual convex optimization framework, in Proceedings of the 28th International Conference on Neural Information Processing Systems (NeurIPS), Montreal, Canada, 2015, pp. 3150--3158, https://proceedings.neurips.cc/paper/2015/file/df9028fcb6b065e000ffe8a4f03eeb38-Paper.pdf.

Cited By

  • Accelerated cyclic coordinate dual averaging with extrapolation for composite convex optimization, in Proceedings of the 40th International Conference on Machine Learning, 2023, pp. 21101--21126, 10.5555/3618408.3619278 (online publication date: 23 July 2023).

Published In

SIAM Journal on Optimization, Volume 31, Issue 3
DOI: 10.1137/sjope8.31.3

Publisher

Society for Industrial and Applied Mathematics, United States

Author Tags

  1. high-order algorithms
  2. Nesterov's acceleration
  3. proximal method
  4. non-Euclidean norm

MSC Codes

  1. 49M15
  2. 49M37
  3. 65K05
  4. 68Q25
  5. 90C25
  6. 90C30

Qualifiers

  • Research-article
