Unified Acceleration of High-Order Algorithms under General Hölder Continuity

Published: 01 January 2021

Abstract

In this paper, through an intuitive vanilla proximal method perspective, we derive a concise unified acceleration framework (UAF) for minimizing a convex function that has Hölder continuous derivatives with respect to general (non-Euclidean) norms. The UAF reconciles two different high-order acceleration approaches, one by Nesterov [Math. Program., 112 (2008), pp. 159--181] and one by Monteiro and Svaiter [SIAM J. Optim., 23 (2013), pp. 1092--1125]. As a result, the UAF unifies the high-order acceleration instances of the two approaches using only two problem-related parameters and two additional framework-design parameters. Moreover, the UAF (and its analysis) is the first approach to make high-order methods applicable under high-order smoothness conditions with respect to non-Euclidean norms, and the first that can match the existing lower bound on iteration complexity for minimizing a convex function with Hölder continuous derivatives. For practical implementation, we introduce a new and effective heuristic that significantly simplifies the binary search procedure required by the framework, and we verify its effectiveness with experiments on logistic regression. Finally, the UAF is proposed directly in the general composite convex setting, showing that existing high-order algorithms extend naturally to this setting.
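
As background for the abstract, the smoothness condition in question can be sketched as follows, in notation assumed here rather than taken from the paper: a function $f$ has a $\nu$-Hölder continuous $p$-th derivative with respect to a norm $\|\cdot\|$ if there exists $H \ge 0$ such that, for all $x, y$,

$\|\nabla^p f(x) - \nabla^p f(y)\|_{*} \le H \, \|x - y\|^{\nu}, \qquad \nu \in (0, 1],$

where $\|\cdot\|_{*}$ denotes the induced norm on $p$-th derivatives. The case $p = 1$, $\nu = 1$ recovers the classical Lipschitz-gradient setting of [30, 32], while $p = 2$, $\nu = 1$ recovers the Lipschitz-Hessian setting of cubic regularization [36] and its acceleration [31].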

References

[1]
Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, J. Mach. Learn. Res., 18 (2017), pp. 8194--8244.
[2]
Z. Allen-Zhu and L. Orecchia, Linear coupling: An ultimate unification of gradient and mirror descent, in 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, Germany, 2017, 3.
[3]
Y. Arjevani, O. Shamir, and R. Shiff, Oracle complexity of second-order methods for smooth convex optimization, Math. Program., 178 (2019), pp. 327--360.
[4]
M. Baes, Estimate sequence methods: Extensions and approximations, Optimization Online, 2009 (2009), 2372.
[5]
K. Ball, E. A. Carlen, and E. H. Lieb, Sharp uniform convexity and smoothness inequalities for trace norms, Invent. Math., 115 (1994), pp. 463--482.
[6]
A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183--202, https://doi.org/10.1137/080716542.
[7]
S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford, Near-optimal method for highly smooth convex optimization, in Proceedings of the Thirty-Second Conference on Learning Theory (Phoenix, AZ), Proc. Mach. Learn. Res. 99, PMLR, 2019, pp. 492--507, http://proceedings.mlr.press/v99/bubeck19a.html.
[8]
B. Bullins, Fast Minimization of Structured Convex Quartics, preprint, https://arxiv.org/abs/1812.10349, 2018.
[9]
Y. Carmon and J. Duchi, Gradient descent finds the cubic-regularized nonconvex Newton step, SIAM J. Optim., 29 (2019), pp. 2146--2178, https://doi.org/10.1137/17M1113898.
[10]
Y. Carmon and J. C. Duchi, Analysis of Krylov subspace solutions of regularized nonconvex quadratic problems, in Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS), Montreal, Canada, 2018, pp. 10728--10738, https://proceedings.neurips.cc/paper/2018/file/349f36aa789af083b8e26839bd498af9-Paper.pdf.
[11]
Y. Carmon, A. Jambulapati, Q. Jiang, Y. Jin, Y. T. Lee, A. Sidford, and K. Tian, Acceleration with a Ball Optimization Oracle, preprint, https://arxiv.org/abs/2003.08078, 2020.
[12]
C. Cartis, N. I. Gould, and P. L. Toint, Universal regularization methods: Varying the power, the smoothness and the accuracy, SIAM J. Optim., 29 (2019), pp. 595--615, https://doi.org/10.1137/16M1106316.
[13]
C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, ACM Trans. Intell. Syst. Technol., 2 (2011), 27; software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[14]
J. Diakonikolas and L. Orecchia, Accelerated extra-gradient descent: A novel accelerated first-order method, in 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, Germany, 2018, 23.
[15]
J. Diakonikolas and L. Orecchia, The approximate duality gap technique: A unified theory of first-order methods, SIAM J. Optim., 29 (2019), pp. 660--689, https://doi.org/10.1137/18M1172314.
[16]
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, and C. A. Uribe, Optimal tensor methods in smooth convex and uniformly convex optimization, in Proceedings of the Thirty-Second Conference on Learning Theory (Phoenix, AZ), Proc. Mach. Learn. Res. 99, PMLR, 2019, pp. 1374--1391.
[17]
G. N. Grapiglia and Y. Nesterov, Accelerated regularized Newton methods for minimizing composite convex functions, SIAM J. Optim., 29 (2019), pp. 77--99, https://doi.org/10.1137/17M1142077.
[18]
G. N. Grapiglia and Y. Nesterov, Tensor Methods for Minimizing Convex Functions with Hölder Continuous Higher-Order Derivatives, preprint, https://arxiv.org/abs/1904.12559, 2019.
[19]
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, New York, 2009.
[20]
B. Jiang, H. Wang, and S. Zhang, An optimal high-order tensor method for convex optimization, in Proceedings of the Thirty-Second Conference on Learning Theory (Phoenix, AZ), Proc. Mach. Learn. Res. 99, PMLR, 2019, pp. 1799--1801, http://proceedings.mlr.press/v99/jiang19a.html.
[21]
A. Juditsky and A. Nemirovski, First order methods for nonsmooth convex large-scale optimization, II: Utilizing problems structure, Optim. Mach. Learn., 30 (2011), pp. 149--183.
[22]
S. P. Karimireddy, S. U. Stich, and M. Jaggi, Global Linear Convergence of Newton's Method without Strong-Convexity or Lipschitz Gradients, preprint, https://arxiv.org/abs/1806.00413, 2018.
[23]
J. M. Kohler and A. Lucchi, Sub-sampled cubic regularization for non-convex optimization, in Proceedings of the 34th International Conference on Machine Learning (ICML), Proc. Mach. Learn. Res. 70, PMLR, 2017, pp. 1895--1904.
[24]
W. Krichene, A. Bayen, and P. L. Bartlett, Accelerated mirror descent in continuous and discrete time, in Advances in Neural Information Processing Systems, NeurIPS, San Diego, CA, 2015, pp. 2845--2853.
[25]
W. Krichene, A. Bayen, and P. L. Bartlett, Adaptive averaging in accelerated descent dynamics, in Advances in Neural Information Processing Systems, NeurIPS, San Diego, CA, 2016, pp. 2991--2999.
[26]
U. Marteau-Ferey, F. Bach, and A. Rudi, Globally convergent Newton methods for ill-conditioned generalized self-concordant losses, in Advances in Neural Information Processing Systems, NeurIPS, San Diego, CA, 2019, pp. 7636--7646.
[27]
R. D. C. Monteiro and B. F. Svaiter, An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods, SIAM J. Optim., 23 (2013), pp. 1092--1125, https://doi.org/10.1137/110833786.
[28]
A. S. Nemirovskii and Y. Nesterov, Optimal methods of smooth convex minimization, USSR Computational Mathematics and Mathematical Physics, 25 (1985), pp. 21--30.
[29]
A. S. Nemirovsky and D. B. Yudin, Problem Complexity and Method Efficiency in Optimization, John Wiley & Sons, New York, 1983.
[30]
Y. Nesterov, A method for solving the convex programming problem with convergence rate $O(1/k^2)$, Dokl. Akad. Nauk SSSR, 269 (1983), pp. 543--547 (in Russian).
[31]
Y. Nesterov, Accelerating the cubic regularization of Newton's method on convex problems, Math. Program., 112 (2008), pp. 159--181.
[32]
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, Boston, MA, 2004.
[33]
Y. Nesterov, Universal gradient methods for convex optimization problems, Math. Program., 152 (2015), pp. 381--404.
[34]
Y. Nesterov, Complexity bounds for primal-dual methods minimizing the model of objective function, Math. Program., 171 (2018), pp. 311--330.
[35]
Y. Nesterov, Implementable Tensor Methods in Unconstrained Convex Optimization, Tech. report, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), Louvain-la-Neuve, Belgium, 2018.
[36]
Y. Nesterov and B. T. Polyak, Cubic regularization of Newton method and its global performance, Math. Program., 108 (2006), pp. 177--205.
[37]
N. Parikh and S. Boyd, Proximal algorithms, Found. Trends Optim., 1 (2014), pp. 127--239.
[38]
V. Roulet and A. d'Aspremont, Sharpness, restart and acceleration, in Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, 2017, pp. 1119--1129, https://papers.nips.cc/paper/2017/hash/2ca65f58e35d9ad45bf7f3ae5cfd08f1-Abstract.html.
[39]
C. Song, S. Cui, Y. Jiang, and S.-T. Xia, Accelerated stochastic greedy coordinate descent by soft thresholding projection onto simplex, in Advances in Neural Information Processing Systems, NeurIPS, San Diego, CA, 2017, pp. 4838--4847.
[40]
C. Song and J. Diakonikolas, Fast Cyclic Coordinate Dual Averaging with Extrapolation for Generalized Variational Inequalities, preprint, https://arxiv.org/abs/2102.13244, 2021.
[41]
C. Song, Y. Jiang, and Y. Ma, Breaking the $O(1/\varepsilon)$ Optimal Rate for a Class of Minimax Problems, preprint, https://arxiv.org/abs/2003.11758, 2020.
[42]
C. Song, Y. Jiang, and Y. Ma, Newton dual extrapolation for non-monotone variational inequality, in OPT-ML Workshop of the 37th International Conference on Machine Learning, 2020, https://sites.google.com/view/optml-icml2020/accepted-papers?authuser=0.
[43]
C. Song, Y. Jiang, and Y. Ma, Variance reduction via accelerated dual averaging for finite-sum optimization, in Proceedings of the 34th International Conference on Neural Information Processing Systems, NeurIPS, San Diego, CA, 2020, https://proceedings.neurips.cc/paper/2020/hash/093b60fd0557804c8ba0cbf1453da22f-Abstract.html.
[44]
C. Song, S. J. Wright, and J. Diakonikolas, Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums, preprint, https://arxiv.org/abs/2102.13643, 2021.
[45]
C. Song, Y. Zhou, Z. Zhou, Y. Jiang, and Y. Ma, Optimistic dual extrapolation for coherent non-monotone variational inequalities, in Proceedings of the 34th International Conference on Neural Information Processing Systems, NeurIPS, San Diego, CA, 2020.
[46]
W. Su, S. Boyd, and E. J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, in Proceedings of the 27th International Conference on Neural Information Processing Systems (NeurIPS), Montreal, Canada, 2014, pp. 2510--2518, https://proceedings.neurips.cc/paper/2014/file/f09696910bdd874a99cd74c8f05b5c44-Paper.pdf.
[47]
M. Wang, K. D. Duc, J. Fischer, and Y. S. Song, Operator norm inequalities between tensor unfoldings on the partition lattice, Linear Algebra Appl., 520 (2017), pp. 44--66.
[48]
A. Wibisono, A. C. Wilson, and M. I. Jordan, A variational perspective on accelerated methods in optimization, Proc. Natl. Acad. Sci. USA, 113 (2016), pp. E7351--E7358.
[49]
A. Yurtsever, Q. Tran-Dinh, and V. Cevher, A universal primal-dual convex optimization framework, in Proceedings of the 28th International Conference on Neural Information Processing Systems (NeurIPS), Montreal, Canada, 2015, pp. 3150--3158, https://proceedings.neurips.cc/paper/2015/file/df9028fcb6b065e000ffe8a4f03eeb38-Paper.pdf.

Cited By

  • Accelerated cyclic coordinate dual averaging with extrapolation for composite convex optimization, in Proceedings of the 40th International Conference on Machine Learning, 2023, pp. 21101--21126, 10.5555/3618408.3619278 (online publication date: 23 July 2023).

Published In

SIAM Journal on Optimization, Volume 31, Issue 3
DOI: 10.1137/sjope8.31.3

Publisher

Society for Industrial and Applied Mathematics, United States

Author Tags

  1. high-order algorithms
  2. Nesterov's acceleration
  3. proximal method
  4. non-Euclidean norm

MSC Codes

  1. 49M15
  2. 49M37
  3. 65K05
  4. 68Q25
  5. 90C25
  6. 90C30

Qualifiers

  • Research-article
