Abstract
This paper presents a new accelerated variant of Nesterov's method for solving composite convex optimization problems in which certain acceleration parameters are chosen adaptively (and aggressively) so as to substantially improve its practical performance over existing accelerated variants while preserving the optimal iteration-complexity that these methods share. Computational results demonstrate that the proposed adaptive accelerated method, endowed with a restarting scheme, outperforms other existing accelerated variants.
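The comparisons reported in the paper involve accelerated (proximal) gradient schemes combined with restarting. As a point of reference only, the minimal sketch below shows a generic FISTA-type iteration (Beck and Teboulle) with the gradient-based adaptive restart of O'Donoghue and Candès applied to an illustrative l1-regularized least-squares instance; it is not the adaptive method proposed in this paper, and the problem data, parameter values and restart test are assumptions made for the example.

# A minimal sketch, not the authors' adaptive method: a FISTA-type accelerated
# proximal gradient iteration with a gradient-based restart, applied to an
# illustrative problem  min_x 0.5*||A x - b||^2 + rho*||x||_1.
# All problem data and parameter choices below are assumptions for the example.
import numpy as np

def prox_l1(z, t):
    # Proximal map of t*||.||_1 (soft-thresholding).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista_with_restart(A, b, rho, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient of f
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = prox_l1(y - grad / L, rho / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        y_new = x_new + ((t - 1.0) / t_new) * (x_new - x)
        # Gradient-based restart: if the momentum points "uphill", reset it.
        if np.dot(y - x_new, x_new - x) > 0:
            t_new, y_new = 1.0, x_new.copy()
        x, y, t = x_new, y_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
x_hat = fista_with_restart(A, b, rho=0.1)
print(0.5 * np.linalg.norm(A @ x_hat - b) ** 2 + 0.1 * np.abs(x_hat).sum())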
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). http://dx.doi.org/10.1137/080716542
Burer, S., Monteiro, R.D.C., Zhang, Y.: A computational study of a gradient-based log-barrier algorithm for a class of large-scale SDPs. Math. Program. 95, 359–379 (2003). http://dx.doi.org/10.1007/s10107-002-0353-7
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. CORE Discussion Paper 2011/02, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (2010)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002). http://dx.doi.org/10.1007/s101070100263
Goldfarb, D., Scheinberg, K.: Fast first-order methods for composite convex optimization with line search. Optimization Online Preprint 3004 (2011). http://www.optimization-online.org/DB_HTML/2011/04/3004.html
Gonzaga, C.C., Karas, E.W.: Fine tuning Nesterov’s steepest descent algorithm for differentiable convex programming. Math. Program., 1–26 (2012). http://dx.doi.org/10.1007/s10107-012-0541-z
Gonzaga, C.C., Karas, E.W., Rossetto, D.R.: An optimal algorithm for constrained differentiable convex optimization. SIAM J. Optim. 23(4), 1939–1955 (2013). http://dx.doi.org/10.1137/110836602
Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with \(O(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126(1), 1–29 (2011)
Merritt, M., Zhang, Y.: Interior-point gradient method for large-scale totally nonnegative least squares problems. J. Optim. Theory Appl. 126, 191–202 (2005). http://dx.doi.org/10.1007/s10957-005-2668-z
Monteiro, R.D.C., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013). http://epubs.siam.org/doi/abs/10.1137/110833786
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Doklady AN SSSR 269(3), 543–547 (1983)
Nesterov, Y.: Introductory lectures on convex optimization: a basic course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103(1), 127–152 (2005)
Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109(2–3, Ser. B), 319–344 (2007)
Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper, Catholic University of Louvain (2007)
O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math., 1–18 (2013). http://dx.doi.org/10.1007/s10208-013-9150-3
Povh, J., Rendl, F., Wiegele, A.: A boundary point method to solve semidefinite programs. Computing 78, 277–286 (2006). http://dx.doi.org/10.1007/s00607-006-0182-2
Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Submitted to SIAM J. Optim. (2008)
Zhao, X.-Y., Sun, D., Toh, K.-C.: A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4), 1737–1765 (2010). http://link.aip.org/link/?SJE/20/1737/1
Acknowledgments
We want to thank Elizabeth W. Karas for generously providing us with the code of the algorithms in [7]. Renato D.C. Monteiro: The work of this author was partially supported by NSF Grants CMMI-0900094 and CMMI-1300221, and ONR Grant N00014-11-1-0062. Benar F. Svaiter: The work of this author was partially supported by CNPq Grants No. 303583/2008-8, 302962/2011-5, 480101/2008-6, 474944/2010-7, and FAPERJ Grants E-26/102.821/2008, E-26/102.940/2011.
Appendices
Appendix 1: Technical results used in Section 2
This section establishes a technical result which is used in the proof of Proposition 1 of Sect. 2, namely, Lemma 4.
This section assumes that the functions \(\phi \), f and h, and the set \(\Omega \), satisfy conditions C.1–C.5 of Sect. 2. Before stating and giving the proof of Lemma 4, we establish two technical results.
Lemma 2
Let \(A\ge 0\), \(\lambda >0\) and \(x^{0}\), \(y\in \mathcal {X}\) be given and assume that \(\Gamma :\mathcal {X}\rightarrow \bar{\mathbb {R}}\) is a proper closed convex function such that \(A\Gamma \) is \(A\mu \)-strongly convex. Assume also that the triple \((y,A,\Gamma )\) satisfies
Also, define \(x\in \mathcal {X}\), \(a>0\) and \(\tilde{x}\in \mathcal {X}\) as
Then, for any proper closed convex function \(\gamma :\mathcal {X}\rightarrow \bar{\mathbb {R}}\) minorizing \(\phi \), we have
where
Proof
Note also that the definition of a in (21) implies that
Now, let an arbitrary \(u\in \mathcal {X}\) be given and define
Clearly, in view of the second identity in (21), we have
In view of (19), (20), the last two relations, the fact that \(\gamma \) is a convex function minorizing \(\phi \), and the fact that the function \(A\Gamma (\cdot )+\Vert \cdot -x^{0}\Vert ^{2}/2\) is \((A\mu +1)\)-strongly convex, we conclude that
where the last inequality is due to (23) and the definition of \(\theta \) in (22). Since the above inequality holds for every \(u\in \mathcal {X}\), the conclusion of the lemma follows. \(\square \)
Lemma 3
Let \(\lambda >0\), \(\xi \in \mathcal {X}\) and a proper closed \(\mu \)-strongly convex function \(g:\mathcal {X}\rightarrow \bar{\mathbb {R}}\) be given and denote the optimal solution of
by \(\bar{x}\). Then, the function \(\gamma :\mathcal {X}\rightarrow \mathbb {R}\) defined as
has the property that \(\gamma \le g\) and \(\bar{x}\) is the optimal solution of
As a consequence, the optimal value of (25) is the same as that of (24).
Proof
The optimality condition for (24) implies that \((\xi -\bar{x})/\lambda \in \partial g(\bar{x})\), and hence that \(\gamma \le g\) in view of the definition of \(\gamma \) and the fact that g is a proper closed \(\mu \)-strongly convex function. Moreover, \(\bar{x}\) clearly satisfies the optimality condition for (25), from which the second claim of the lemma follows. Finally, the latter conclusion of the lemma follows immediately from its second claim. \(\square \)
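To make the content of Lemma 3 concrete, the following is a hedged reading of it; the displayed problems (24) and (25) are not reproduced in this excerpt, so the formulas below are reconstructions consistent with the optimality condition \((\xi -\bar{x})/\lambda \in \partial g(\bar{x})\) quoted in the proof and may differ from the paper's exact statement. Assuming (24) is the prox-type problem on the left below and defining \(\gamma \) as on the right,
\[
\bar{x}=\mathop {\mathrm {argmin}}_{x\in \mathcal {X}}\Big \{\lambda g(x)+\tfrac{1}{2}\Vert x-\xi \Vert ^{2}\Big \},\qquad \gamma (x)=g(\bar{x})+\Big \langle \tfrac{\xi -\bar{x}}{\lambda },\,x-\bar{x}\Big \rangle +\tfrac{\mu }{2}\Vert x-\bar{x}\Vert ^{2},
\]
the \(\mu \)-strong convexity of \(g\) gives \(\gamma \le g\), the identity \(\nabla \big (\lambda \gamma (\cdot )+\tfrac{1}{2}\Vert \cdot -\xi \Vert ^{2}\big )(\bar{x})=(\xi -\bar{x})+(\bar{x}-\xi )=0\) shows that \(\bar{x}\) also minimizes \(\lambda \gamma (x)+\tfrac{1}{2}\Vert x-\xi \Vert ^{2}\), and \(\gamma (\bar{x})=g(\bar{x})\) yields the equality of the two optimal values.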
The main result of this appendix is as follows.
Lemma 4
Let \(A\ge 0\), \(0<\lambda \le 1/(L-\mu _{f})\), \(x^{0},y\in \mathcal {X}\) and a proper closed \(\mu \)-strongly convex function \(\Gamma :\mathcal {X}\rightarrow \mathbb {R}\) be given, and assume that the triple \((y,A,\Gamma )\) satisfies (19). Let x, a and \(\tilde{x}\) be as in (20) and (21), and define
where
Also, define the functions \(\gamma ,\Gamma ^{+}:\mathcal {X}\rightarrow \bar{\mathbb {R}}\) as
Then, \(\Gamma ^{+}\) is a proper closed \(\mu \)-strongly convex function and the triple \((y^{+},A^{+},\Gamma ^{+})\) satisfies
Moreover, if \(\Gamma \) is a quadratic function with Hessian equal to \(\mu I\), then the (unique) optimal solution \(x^{+}\) of the minimization problem in (31) can be obtained as
Proof
We first claim that
To prove the above claim, note that conditions C.2 and C.3, the non-expansiveness property of \(\Pi _{\Omega }\), and the assumption that \(\lambda \in (0,1/(L-\mu _{f})]\), imply that
Then, in view of (27) and the above relation with \(u=y^{+}\), we have
The claim now follows from the first inequality in (35), the above relation, the definitions of \(\gamma \) and p, and Lemma 3 with \(g=p\) and \(\xi =\tilde{x}\) (and hence \(\bar{x}=y^{+}\)).
Now, using the assumption that (19) holds, the second relation in (34) and Lemma 2, we conclude that
and hence that the first inequality in (31) holds in view of (26) and (30). Moreover, the first inequality in (34), relations (26) and (30), and the first inequality of (19), clearly imply the second inequality in (31).
To show (32), let \(\lambda _{\mu }:=\lambda (A\mu +1)\) and note that the definition of a in (21) implies that
This inequality together with the definition of \(A^{+}\) in (26) then yields
showing that (32) holds. Finally, to show (33), first observe that the optimality conditions for (20) and (31) imply that \(x=x^{0}-A\nabla \Gamma (x)\) and \(x^{+}=x^{0}-A^{+}\nabla \Gamma ^{+}(x^{+})\). These two identities together with relations (26), (29) and (30), and the assumption that \(\Gamma \) is a quadratic function with Hessian equal to \(\mu I\), then imply that
and hence that (33) holds. \(\square \)
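Since the closed-form expression (33) is not reproduced in this excerpt, the following sketch only indicates, under the stated quadratic assumption, why such a formula exists; it takes the minimization problem in (31) to be \(\min _{u\in \mathcal {X}}\{A^{+}\Gamma ^{+}(u)+\tfrac{1}{2}\Vert u-x^{0}\Vert ^{2}\}\), which is consistent with the optimality condition \(x^{+}=x^{0}-A^{+}\nabla \Gamma ^{+}(x^{+})\) used above. If \(\Gamma ^{+}\) is quadratic with Hessian \(\mu I\), then \(\nabla \Gamma ^{+}(u)=\nabla \Gamma ^{+}(z)+\mu (u-z)\) for any anchor point \(z\in \mathcal {X}\), so the first-order condition
\[
A^{+}\big (\nabla \Gamma ^{+}(z)+\mu (x^{+}-z)\big )+x^{+}-x^{0}=0
\]
is linear in \(x^{+}\) and gives
\[
x^{+}=\frac{x^{0}+A^{+}\big (\mu z-\nabla \Gamma ^{+}(z)\big )}{1+A^{+}\mu },
\]
so that \(x^{+}\) is obtained from a single gradient evaluation and a scalar division rather than by solving an inner optimization problem.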
Appendix 2: Proof of Proposition 3
First note that the equivalence between (a) and (b) of Lemma 1 with \(\chi =\sum _{i=1}^{m}\alpha _{i}\phi (y)\), \(u_{0}=y\) and \(q(\cdot )=\sum _{i=1}^{m}\alpha _{i}\gamma _{i}(\cdot )+\Vert \cdot -x_{0}\Vert ^{2}/2\) (and hence \(Q=I+\sum _{i=1}^{m}\alpha _{i}\nabla ^{2}\gamma _{i}\)) implies that the hard constraint in problem (12) is equivalent to
Now, assume that (b) holds. Then, in view of (13), the point \(\alpha (t):=(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m})\) clearly satisfies (36), and hence is feasible for (12) for every \(t>0\). Thus, for a fixed \(x^{*}\in X^{*}\), this conclusion implies that
where the last inequality follows from the assumption that \(\gamma _{i}\le \phi \) and the non-negativity of \(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m}\). Dividing this expression by \(t(\sum _{i=1}^{m}\bar{\alpha }_{i})>0\), and letting \(t\rightarrow \infty \), we then conclude that \(\phi (y)-\phi (x^{*})\le 0\), and hence that \(y\in X^{*}\). Also, the objective function of (12) at the point \(\alpha (t)\) converges to infinity as \(t\rightarrow \infty \). We have thus shown that (b) implies (a).
Now assume that (a) holds. Since the feasible set of (12) is closed convex, it must have a nonzero direction of recession \(\bar{\alpha }:=(\bar{\alpha }_{1},\ldots ,\bar{\alpha }_{m})\). Moreover, since this set obviously contains the zero vector, it follows that \(\alpha (t):=(t\bar{\alpha }_{1},\ldots ,t\bar{\alpha }_{m})\) is feasible for (12) for every \(t>0\). This implies that \(\alpha (t)\) satisfies (36) for every \(t>0\) and \(\bar{\alpha }_{i}\ge 0\) for every i. Using the latter fact, the assumption that \(\gamma _{i}\le \phi \) (and hence \(\gamma _{i}(y)\le \phi (y)\)) for every i, and the assumption that either \(\nabla ^{2}\gamma _{i}\) is zero or is positive definite for every i (and hence, \(\sum _{i=1}^{m}\bar{\alpha }_{i}\nabla ^{2}\gamma _{i}\) is zero or is positive definite), we easily see that (13) holds. \(\square \)
Cite this article
Monteiro, R.D.C., Ortiz, C. & Svaiter, B.F. An adaptive accelerated first-order method for convex optimization. Comput Optim Appl 64, 31–73 (2016). https://doi.org/10.1007/s10589-015-9802-0