Abstract
In this paper we introduce a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes. The analysis of the technique relies on the observation that these schemes exhibit two modes of behavior depending on how much momentum is applied at each iteration. In what we refer to as the ‘high momentum’ regime the iterates generated by an accelerated gradient scheme exhibit a periodic behavior, where the period is proportional to the square root of the local condition number of the objective function. Separately, it is known that the optimal restart interval is proportional to this same quantity. This suggests a restart technique whereby we reset the momentum whenever we observe periodic behavior. We provide a heuristic analysis that suggests that in many cases adaptively restarting allows us to recover the optimal rate of convergence with no prior knowledge of function parameters.
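To make the idea concrete, the sketch below implements a Nesterov-type accelerated gradient iteration with a gradient-based adaptive restart test: momentum is reset whenever the gradient at the extrapolated point makes an acute angle with the step just taken, i.e. whenever ∇f(y_{k−1})ᵀ(x_k − x_{k−1}) > 0, which signals the oscillatory 'high momentum' regime. This is a minimal illustration under stated assumptions, not the paper's reference implementation; the names grad_f, step, and x0 are ours, and step should be at most 1/L for an L-smooth objective.

```python
import numpy as np

def agd_adaptive_restart(grad_f, x0, step, max_iter=5000, tol=1e-8):
    """Accelerated gradient descent with gradient-based adaptive restart.

    Sketch only: grad_f evaluates the gradient of a smooth convex f,
    step is a fixed step size (e.g. 1/L for an L-smooth objective).
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()                                 # extrapolated point
    theta = 1.0                                  # Nesterov momentum bookkeeping
    for _ in range(max_iter):
        g = grad_f(y)
        if np.linalg.norm(g) < tol:
            break
        x_next = y - step * g                    # gradient step from extrapolated point
        # Restart test: if grad_f(y)^T (x_next - x) > 0, momentum is carrying
        # the iterates in a direction of ascent, so wipe it and restart the
        # extrapolation from the current iterate.
        if g @ (x_next - x) > 0:
            theta = 1.0                          # reset momentum
            y = x_next
        else:
            theta_next = (1.0 + np.sqrt(1.0 + 4.0 * theta**2)) / 2.0
            beta = (theta - 1.0) / theta_next    # momentum coefficient
            y = x_next + beta * (x_next - x)
            theta = theta_next
        x = x_next
    return x

# Example: an ill-conditioned quadratic f(x) = 0.5 * x^T A x with L = 100.
A = np.diag([1.0, 100.0])
x_min = agd_adaptive_restart(lambda x: A @ x, x0=np.ones(2), step=1.0 / 100.0)
```

One appeal of the gradient-based test is that it costs nothing extra: it reuses the gradient already computed for the step, whereas a function-value test would require an additional objective evaluation per iteration.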
Acknowledgements
We are very grateful to Stephen Boyd for his help and encouragement. We would also like to thank Stephen Wright for his advice and feedback, and Stephen Becker and Michael Grant for useful discussions. E. C. would like to thank the ONR (grant N00014-09-1-0258) and the Broadcom Foundation for their support. We also thank two anonymous reviewers for their constructive feedback.
Communicated by Felipe Cucker.
Cite this article
O’Donoghue, B., Candès, E. Adaptive Restart for Accelerated Gradient Schemes. Found Comput Math 15, 715–732 (2015). https://doi.org/10.1007/s10208-013-9150-3