Abstract
In this paper, we study the interplay between acceleration and structure identification for the proximal gradient algorithm. While acceleration is generally beneficial in terms of functional decrease, we report and analyze several cases where its interplay with identification has negative effects on the behavior of the algorithm (oscillation of the iterates, loss of structure, etc.). We then present a generic method that tames acceleration when structure identification may be at stake; it enjoys a convergence rate matching that of the accelerated proximal gradient under some qualifying condition. We show empirically that the proposed method is much more stable in terms of subspace identification than the accelerated proximal gradient method, while achieving a similar functional decrease.
Notes
For instance, if g is the absolute value on \(\mathbb {R}\), the proximity operator of g is equal to 0 for all \(x\in [-1,1]\) (and \(x-\mathrm {sign}(x)\) elsewhere). Thus, if some sequence \(x_k\) converges to a point within \((-1,1)\), then the output of the proximity operator of g at \(x_k\) will be equal to 0 for all k greater than some finite, but unknown, K.
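As a minimal numerical sketch of this footnote (ours, in Python with NumPy; the sequence and threshold are illustrative), the proximity operator of the absolute value is soft-thresholding, and its output is exactly 0 once the input enters \([-1,1]\):

```python
import numpy as np

def prox_abs(u, gamma=1.0):
    # Proximity operator of gamma*|.| (soft-thresholding):
    # equals 0 on [-gamma, gamma] and u - gamma*sign(u) elsewhere.
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

# A sequence x_k converging to 0.4, a point inside (-1, 1):
xs = [0.4 + 0.5 ** k for k in range(8)]
print([prox_abs(x) for x in xs])
# prints 0.4 for k = 0 (since x_0 = 1.4), then exactly 0.0 for all k >= 1
```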
That is, \(\Vert x_{k+1}-x^\star \Vert \le \Vert x_k-x^\star \Vert\) for all iterations \(k>0\) and any minimizer \(x^\star \in {{\,\mathrm{argmin}\,}}_x F(x)\).
Other examples include problems regularized with nuclear norm, for which the target structure is the rank of the current (matrix) iterate \(x\in \mathbb {R}^{m_1\times m_2}\). Manifolds of interest are defined as \(\mathsf {M}_i =\{x\in \mathbb {R}^{m_1\times m_2} : {\text {rank}}(x)=i\}\), and the collection is \(\mathsf {M}=\{\mathsf {M}_0, \ldots , \mathsf {M}_{\min (m_1,m_2)}\}\).
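For illustration (our sketch, assuming the regularizer \(\gamma \Vert \cdot \Vert _*\) and using NumPy), the proximity operator of the nuclear norm soft-thresholds the singular values, so the rank of its output directly reveals which manifold \(\mathsf {M}_i\) the iterate lies on:

```python
import numpy as np

def prox_nuclear(U_mat, gamma):
    # Prox of gamma*||.||_* : soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(U_mat, full_matrices=False)
    return U @ np.diag(np.maximum(s - gamma, 0.0)) @ Vt

rng = np.random.default_rng(0)
X = prox_nuclear(rng.standard_normal((6, 4)), gamma=2.0)
rank = int(np.sum(np.linalg.svd(X, compute_uv=False) > 1e-10))
print(rank)  # X lies on the manifold M_rank = {x : rank(x) = rank}
```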
Note that we are only interested here in the transient phase of identification, when the iterates are in the process of reaching the optimal manifold. The behavior of the accelerated proximal gradient is usually better than that of the vanilla version before this phase and after identification; see e.g. the recent analyses of [2, 29] and references therein.
\(g = \max (\Vert \cdot \Vert _{p}-1, 0)\) for p in \(\{1.3, 2.6\}\), whose proximity operator is \(\mathbf {prox}_{\gamma g}(u) = u (1-\gamma / \Vert u\Vert _p)\) if \(\Vert u\Vert _p > 1+\gamma\), \(u/\Vert u\Vert _p\) if \(1 \le \Vert u\Vert _p \le 1+\gamma\), and u otherwise.
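A direct transcription of this formula (our sketch in Python/NumPy; the test vector is arbitrary):

```python
import numpy as np

def prox_g(u, gamma, p=1.3):
    # Prox of gamma*g with g = max(||.||_p - 1, 0), following the cases above.
    nrm = np.linalg.norm(u, ord=p)
    if nrm > 1 + gamma:
        return u * (1 - gamma / nrm)
    if nrm >= 1:
        return u / nrm  # output lies on the p-norm unit sphere
    return u            # already inside the unit ball: left untouched

u = np.array([0.9, 0.6, 0.3])
print(np.linalg.norm(prox_g(u, gamma=0.5), ord=1.3))  # 1.0 (up to rounding)
```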
This functional evaluation is actually only needed if the function is sharp with \(p=1\) in Assumption 3. For instance, this is the case when \(f\equiv 0\) and \(g(x) = \Vert x\Vert _1\), in which case \(F(x)=\Vert x\Vert _1\) verifies Assumption 3 with \(\beta =1, p=1\). In these rather degenerate cases, the proximal gradient converges in a finite number of steps.
These properties are often supposed to hold only locally but can be globalized easily [11, Sec. 2.4, Cor. 6].
An example of such a computation is performed for the lasso problem \(F(x) = \frac{1}{2} \left\| Ax-b\right\| _2^2 + \lambda _1 \Vert x\Vert _1\) in [11, Lem. 10].
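For concreteness, a minimal proximal-gradient (ISTA) sketch for this lasso objective (ours; the random data and the value of \(\lambda _1\) are illustrative):

```python
import numpy as np

def prox_grad_lasso(A, b, lam, n_iter=500):
    # Proximal gradient on F(x) = 0.5*||Ax - b||_2^2 + lam*||x||_1.
    gamma = 1.0 / np.linalg.norm(A, 2) ** 2          # stepsize 1/L, L = ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        u = x - gamma * A.T @ (A @ x - b)            # gradient step on the smooth part
        x = np.sign(u) * np.maximum(np.abs(u) - gamma * lam, 0.0)  # prox of gamma*lam*||.||_1
    return x

rng = np.random.default_rng(1)
A, b = rng.standard_normal((20, 50)), rng.standard_normal(20)
x = prox_grad_lasso(A, b, lam=0.5)
print(np.count_nonzero(x))  # the support of x encodes the identified structure
```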
References
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9(1–2), 3–11 (2001)
Apidopoulos, V., Aujol, J.F., Dossal, C.: Convergence rate of inertial forward-backward algorithm beyond Nesterov's rule. Math. Prog. 180(1), 137–156 (2020)
Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov's accelerated forward-backward method is actually faster than \(1/k^2\). SIAM J. Optim. 26(3), 1824–1834 (2016)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Beck, A.: First-Order Methods in Optimization, vol. 25. SIAM, Philadelphia (2017)
Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434 (2009)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bertsekas, D.: On the Goldstein–Levitin–Polyak gradient projection method. IEEE Trans. Automat. Control 21(2), 174–184 (1976)
Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: a fresh approach to numerical computing. SIAM Rev. 59(1), 65–98 (2017)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Prog. 165(2), 471–507 (2017)
Burke, J.V., Moré, J.J.: On the identification of active constraints. SIAM J. Numer. Anal. 25(5), 1197–1211 (1988)
Candes, E.J., Romberg, J.K., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. J. Issued Courant Inst. Math. Sci. 59(8), 1207–1223 (2006)
Catalina, A., Alaíz, C.M., Dorronsoro, J.R.: Revisiting FISTA for lasso: acceleration strategies over the regularization path. In: ESANN (2018)
Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)
Daniilidis, A., Hare, W., Malick, J.: Geometrical interpretation of the predictor-corrector type algorithms in structured optimization problems. Optimization 55(5&6), 481–503 (2006)
Donoho, D.L.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41(3), 613–627 (1995)
Drusvyatskiy, D., Lewis, A.S.: Optimality, identifiability, and sensitivity. Math. Prog. 147(1–2), 467–498 (2014)
Fadili, J., Malick, J., Peyré, G.: Sensitivity analysis for mirror-stratifiable convex functions. SIAM J. Optim. 28(4), 2975–3000 (2018)
Giselsson, P., Boyd, S.: Monotonicity and restart in fast gradient methods. In: 53rd IEEE Conference on Decision and Control, pp. 5058–5063. IEEE (2014)
Hare, W.L., Lewis, A.S.: Identifying active constraints via partial smoothness and prox-regularity. J. Convex Anal. 11(2), 251–266 (2004)
Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (2012)
Ito, N., Takeda, A., Toh, K.C.: A unified formulation and fast accelerated proximal gradient method for classification. J. Mach. Learn. Res. 18(1), 510–558 (2017)
Iutzeler, F., Hendrickx, J.M.: A generic online acceleration scheme for optimization algorithms via relaxation and inertia. Optim. Methods Softw. 34(2), 383–405 (2019)
Iutzeler, F., Malick, J.: On the proximal gradient algorithm with alternated inertia. J. Optim. Theory Appl. 176(3), 688–710 (2018)
Lewis, A.S.: Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13(3), 702–725 (2002)
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)
Liang, J., Fadili, J., Peyré, G., Luke, R.: Activity identification and local linear convergence of Douglas–Rachford/ADMM under partial smoothness. In: International Conference on Scale Space and Variational Methods in Computer Vision, pp. 642–653. Springer (2015)
Liang, J., Fadili, J., Peyré, G.: Activity identification and local linear convergence of forward-backward-type methods. SIAM J. Optim. 27(1), 408–437 (2017)
Liang, J., Schönlieb, C.B.: Improving “fast iterative shrinkage-thresholding algorithm”: Faster, smarter and greedier. arXiv preprint arXiv:1811.01430 (2018)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate \(\mathcal {O}(1/k^{2})\). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Nguyen, T.P.: Kurdyka–Łojasiewicz and convexity: algorithms and applications. Ph.D. thesis, Toulouse University (2017)
O'Donoghue, B., Candes, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)
Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
Poon, C., Liang, J.: Trajectory of alternating direction method of multipliers and adaptive acceleration. In: Advances in Neural Information Processing Systems, pp. 7355–7363 (2019)
Poon, C., Liang, J., Schoenlieb, C.: Local convergence properties of SAGA/Prox-SVRG and acceleration. In: International Conference on Machine Learning, pp. 4124–4132 (2018)
Scheinberg, K., Goldfarb, D., Bai, X.: Fast first-order methods for composite convex optimization with backtracking. Found. Comput. Math. 14(3), 389–417 (2014)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodological) 58(1), 267–288 (1996)
Vaiter, S., Peyré, G., Fadili, J.: Low Complexity Regularization of Linear Inverse Problems, chap. Sampling Theory, a Renaissance, pp. 103–153. Springer-Birkhäuser (2015)
Vaiter, S., Peyré, G., Fadili, J.: Model consistency of partly smooth regularizers. IEEE Trans. Inf. Theory 64(3), 1725–1737 (2017)
Acknowledgements
The authors would like to thank the AE and the two anonymous reviewers for their indications about the tone of the article and their suggestions for future works. FI benefited from the support of the ANR JCJC project STROLL (ANR-19-CE23-0008).
Appendices
Appendix 1: Obtaining the qualifying condition
The general qualifying condition (QC) can be recovered under the setting of partial smoothness of g and a usual non-degeneracy condition at the optimal point \(x^\star\) and the associated point \(u^\star\) such that \(x^\star = \mathbf {prox}_{\gamma g}(u^\star )\).
Definition 1
(Relative interior) The relative interior of a subset S, denoted \(\mathrm {ri}~S\), is defined as
\[ \mathrm {ri}~S = \left\{ x\in S : \exists \varepsilon >0, ~ {\mathcal {B}}(x,\varepsilon )\cap {\text {Aff}}~S \subseteq S \right\} , \]
where \({\text {Aff}}\) denotes the affine hull of a set.
Definition 2
(Parallel space) For a convex subset A, the parallel space of A, denoted \(\mathrm {par}~A\), is defined as the vector space parallel to the affine space generated by A.
Definition 3
(Manifold) A subset \(\mathsf {M}\) of \(\mathbb {R}^n\) is said to be a p-dimensional \(\mathcal C^k\)-submanifold of \(\mathbb {R}^n\) around \(x \in \mathsf {M}\) (\(1 \le k \le +\infty\)) if there exists a local parameterization of \(\mathsf {M}\) around x, that is, a \({\mathcal {C}}^k\) function \(\varphi : \mathbb {R}^p \rightarrow \mathbb {R}^n\) such that \(\varphi\) realizes a local homeomorphism between a neighborhood of \(0 \in \mathbb {R}^p\) and a neighborhood of \(x \in \mathsf {M}\) and the derivative of \(\varphi\) at \(\varphi ^{-1}(x)=0\) is injective.
A p-dimensional \({\mathcal {C}}^k\)-submanifold of \(\mathbb {R}^n\) can alternatively be defined via a local equation, that is, a \(\mathcal C^k\) function \(\varPhi : \mathbb {R}^n \rightarrow \mathbb {R}^{n-p}\) with a surjective derivative at \({\bar{x}} \in \mathsf {M}\), that satisfies for all x close enough to \({\bar{x}}\): \(x \in \mathsf {M}\Leftrightarrow \varPhi (x) = 0\).
To lighten notations, we will only indicate the dimensionality and the smoothness degree of a manifold when it is relevant to the discussion.
Definition 4
(Tangent space) Given a point x living in a manifold \(\mathsf {M}\), defined either by a local parametrization \(\varphi\) or a local equation \(\varPhi\), the tangent space of \(\mathsf {M}\) at x, denoted \(T_\mathsf {M}(x)\), is defined as
\[ T_\mathsf {M}(x) = \mathrm {Im}~\mathrm {D}\varphi (0) = \ker \mathrm {D}\varPhi (x) . \]
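As a concrete instance of Definitions 3 and 4 (our illustration, not from the text): the unit sphere of \(\mathbb {R}^n\) is an \((n-1)\)-dimensional \({\mathcal {C}}^\infty\)-submanifold with local equation \(\varPhi (x)=\Vert x\Vert _2^2-1\), so that
\[ \mathrm {D}\varPhi (x)\,h = 2\langle x, h\rangle \quad \text {and}\quad T_\mathsf {M}(x) = \ker \mathrm {D}\varPhi (x) = \{h\in \mathbb {R}^n : \langle x, h\rangle = 0\}. \]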
Definition 5
(Partly smooth function) Let \(g:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a proper, convex, lower semi-continuous function. g is said to be partly smooth at x relative to a manifold \(\mathsf {M}\) containing x if \(\partial g(x)\ne \emptyset\), and moreover
-
Smoothness: \(\mathsf {M}\) is a \({\mathcal {C}}^2\)-manifold around x, g restricted to \(\mathsf {M}\) is \({\mathcal {C}}^2\) around x;
-
Sharpness: The tangent space \(T_\mathsf {M}(x)\) coincides with \(\mathrm {par} (\partial g(x))^\perp\);
-
Continuity: The set-valued mapping \(\partial g\) is continuous at x relative to \(\mathsf {M}\).
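As a standard worked example (added here for illustration), the \(\ell _1\) norm \(g=\Vert \cdot \Vert _1\) is partly smooth at any x relative to \(\mathsf {M}_x=\{y : {\text {supp}}(y)\subseteq {\text {supp}}(x)\}\): g is linear, hence \({\mathcal {C}}^2\), on \(\mathsf {M}_x\) around x, and, writing \(S={\text {supp}}(x)\),
\[ \partial g(x) = \{\eta : \eta _i = {\text {sign}}(x_i) \text { for } i\in S,\ |\eta _i|\le 1 \text { for } i\notin S\}, \]
so that \(\mathrm {par}~\partial g(x) = \{h : h_i = 0 \text { for } i\in S\}\) and \((\mathrm {par}~\partial g(x))^\perp = \{h : h_i = 0 \text { for } i\notin S\} = T_{\mathsf {M}_x}(x)\), which is exactly the sharpness condition.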
Lemma 4
Let \(\mathsf {M}\) be a manifold and \(u^\star\) a point such that \(x^\star = \mathbf {prox}_{\gamma g}(u^\star )\in \mathsf {M}\). If g is partly smooth at \(x^\star\) relative to \(\mathsf {M}\) and \(\frac{u^\star -x^\star }{\gamma } \in \mathrm {ri}~\partial g(x^\star )\), then (QC) holds.
Proof
The proof follows the reasoning of [16, Lem. 27, Th. 28]. Let us define the function \(\rho (u, x) = g(x) + \frac{1}{2\gamma }\Vert x-u\Vert ^2\). Since g is partly smooth at \(x^\star\) relative to \(\mathsf {M}\), the function \((u, x) \mapsto g(x)\) is also partly smooth at \((u^\star , x^\star )\) relative to \(\mathbb {R}^n\times \mathsf {M}\) [26, Prop. 4.5]. Since the addition of a smooth function does not change this property [26, Cor. 4.6], \(\rho\) is partly smooth at \((u^\star , x^\star )\) relative to \(\mathbb {R}^n\times \mathsf {M}\).
Besides, defining the parametrized function \(\rho _{u^\star }(\cdot ) = \rho (u^\star , \cdot )\), we have that
-
(i)
\(x^\star = \mathbf {prox}_{\gamma g}(u^\star ) = {{\,\mathrm{argmin}\,}}_{x\in \mathbb {R}^n} \{ g(x) + \frac{1}{2\gamma } \Vert x - u^\star \Vert ^2 \} = {{\,\mathrm{argmin}\,}}_{x\in \mathbb {R}^n} \rho _{u^\star } ( x )\) and as \(\rho _{u^\star }\) is \(1/\gamma\)-strongly convex, we have for all \(x\in \mathbb {R}^n\), \(\rho _{u^\star }(x) \ge \rho _{u^\star }(x^\star ) + \frac{1}{2\gamma }\Vert x-x^\star \Vert ^2\);
-
(ii)
the qualifying constraint \(\frac{u^\star -x^\star }{\gamma } \in \mathrm {ri}~\partial g(x^\star )\) implies that \(0\in \mathrm {ri}~\left( \partial g(x^\star ) + \frac{x^\star -u^\star }{\gamma } \right) = \mathrm {ri}~\partial \rho _{u^\star }(x^\star )\);
which enable us to use [16, Lem. 27] (or equivalently [21, Th. 3.2 and 6.2i]).
Thus, there exists a neighborhood \({\mathcal {B}}(u^\star , \varepsilon )\) of \(u^\star\) and a function \(\varPhi\) such that for all u in \({\mathcal {B}}(u^\star , \varepsilon )\), \(\varPhi (u)\in \mathsf {M}\) and is a critical point of \(\rho _u\) restricted to a neighborhood of \(x^\star\). Since \(\rho _u\) is convex, \(\varPhi (u)\) is actually the global optimum of \(\rho _u\). Therefore, we have that for all u in \(\mathcal B(u^\star , \varepsilon )\), \(\mathbf {prox}_{\gamma g}(u)\in \mathsf {M}\) which is exactly (QC). \(\square\)
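To make the non-degeneracy assumption of Lemma 4 concrete, here is a small numerical check for the \(\ell _1\) case \(g=\lambda \Vert \cdot \Vert _1\) (our illustrative helper, not the paper's code): the condition \(\frac{u^\star -x^\star }{\gamma }\in \mathrm {ri}~\partial g(x^\star )\) amounts to equality with \(\lambda \,\mathrm {sign}(x^\star _i)\) on the support of \(x^\star\) and a strict inequality on the zero coordinates.

```python
import numpy as np

def l1_nondegeneracy(x_star, u_star, gamma, lam, tol=1e-12):
    # Checks (u* - x*)/gamma in ri(partial g(x*)) for g = lam*||.||_1.
    v = (u_star - x_star) / gamma
    supp = np.abs(x_star) > tol
    on_support = np.allclose(v[supp], lam * np.sign(x_star[supp]))
    off_support = bool(np.all(np.abs(v[~supp]) < lam))  # strict: relative interior
    return on_support and off_support

x_star = np.array([0.5, 0.0, -2.0])
u_star = x_star + 0.1 * np.array([1.0, 0.3, -1.0])     # gamma = 0.1, lam = 1.0
print(l1_nondegeneracy(x_star, u_star, gamma=0.1, lam=1.0))  # True
```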
Appendix 2: Recalls on the (accelerated) proximal gradient descent
We recall here the descent lemma for the composite objective function F, which is central in the analysis of any first order method.
Lemma 5
(Descent lemma) Let \(\gamma >0\). The following inequalities hold for any \(x, y \in {\mathbb {R}}^n\):
Proof
The first inequality is directly equivalent to [15, Lem. 1]. The second is derived from the identity \(\Vert \mathcal {T}_\gamma (x)-x\Vert ^2 + \Vert y-x\Vert ^2 = 2\langle \mathcal {T}_\gamma (x)-x, y-x \rangle + \Vert y-\mathcal {T}_\gamma (x)\Vert ^2\). \(\square\)
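For completeness, this identity is obtained by expanding the square (our added step):
\[ \Vert y-\mathcal {T}_\gamma (x)\Vert ^2 = \Vert (y-x) - (\mathcal {T}_\gamma (x)-x)\Vert ^2 = \Vert y-x\Vert ^2 - 2\langle \mathcal {T}_\gamma (x)-x, y-x \rangle + \Vert \mathcal {T}_\gamma (x)-x\Vert ^2 , \]
which rearranges into the stated identity.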
We now give an accelerated descent lemma for the composite objective F, that is fundamental in the analysis of our provisional algorithm in Sect. 3.1.
Lemma 6
(Accelerated descent lemma) Let Assumptions 1 and 2 hold. For any \(\gamma >0\), any pair of points \((x_k, y_k)\), and for \(x^\star\) a fixed point of \({\mathcal {T}}_{\gamma }\) (i.e. a minimizer of F), we have (independently of the acceleration step)
where \(x_{k+1} = {\mathcal {T}}_{\gamma }(y_k)\), and \(v_k:=F(x_{k+1})-F^\star\).
Proof
The proof follows the same global layout as that of FISTA [7]. Using Lemma 5 at \((x=y_k, y=x_k)\) and \((x=y_k, y=x^\star )\),
Multiplying the first inequality by \((t_k-1)\) and adding the second yields:
Multiplying by \(t_k\), using the relation \(t_k^2-t_k \le t_{k-1}^2\) from Assumption 2(i), we get:
Applying the identity \(\Vert c-a\Vert ^2+2\langle a-b,c-a\rangle = \Vert b-c\Vert ^2 - \Vert b-a\Vert ^2\) to the inner product of the last identity yields the result. \(\square\)
Appendix 3: About error bounds
In order to connect the functional suboptimality with the behavior of the iterates, the geometry of the function can be used. For a proper convex lower-semicontinuous function F achieving its minimum \(F^\star\), two common variants of geometric inequalities are Error Bounds and Kurdyka-Łojasiewicz inequalities [10, 11, 32]. For any \(x \notin {{\,\mathrm{argmin}\,}}F\), they respectively write
-
Error Bound \(\varphi (F(x)-F^\star )\ge {{\,\mathrm{dist}\,}}(x,{{\,\mathrm{argmin}\,}}F)\)
-
Kurdyka-Łojasiewicz (KL) \({{\,\mathrm{dist}\,}}(0,\partial F(x))\ge 1/\varphi '(F(x)-F^\star )\)
where \(\varphi (t) = Ct^\theta /\theta\) with \(C>0\) and \(\theta \in (0,1]\) is called a desingularizing function. These properties are widely satisfied (e.g. by any semi-algebraic function [10]), but the desingularizing function (in particular the exponent \(\theta\)) is often hard to estimate.
It is easy to see that Assumption 3 is a (global) error bound with \(\varphi (t) = (1/\beta )^{1/p} t^{1/p}\) (matching the definition of \(\varphi\) with \(C = \frac{1}{p} (1/\beta )^{1/p}>0\) and \(\theta =1/p\in (0,1]\)), but the knowledge of these constants is not necessary.
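Two elementary one-dimensional illustrations (ours) of this correspondence:
\[ F(x)=|x|:\quad F(x)-F^\star = {{\,\mathrm{dist}\,}}(x,{{\,\mathrm{argmin}\,}}F) \quad (\beta =1,\ p=1,\ \varphi (t)=t), \]
\[ F(x)=\tfrac{1}{2} x^2:\quad F(x)-F^\star = \tfrac{1}{2}{{\,\mathrm{dist}\,}}(x,{{\,\mathrm{argmin}\,}}F)^2 \quad (\beta =\tfrac{1}{2},\ p=2,\ \varphi (t)=\sqrt{2t}). \]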
Then, an important result about Error Bounds and KL inequalities is that they are equivalent with the same desingularizing function [11, Th. 5] which allows us to get the following result for the proximal gradient operator.
Lemma 7
Let Assumptions 1 and 3 hold. Then, there exists a constant \(B>0\) such that for all x
Proof
First, let us notice that the KL and error bound properties can be combined as
Furthermore, using that \({{\,\mathrm{dist}\,}}(0,\partial F(\mathsf {T}_\gamma (x) )) \le \frac{L\gamma + 1}{\gamma } \Vert x-\mathsf {T}_\gamma (x) \Vert\) for all \(x\in \mathbb {R}^n\) (see e.g. [11, Sec. 4.1]), we get that
which, combined with the fact that \(\theta =1/p\), gives the claimed result. \(\square\)
Appendix 4: Proof of Corollary 1
Proof
First, Theorem 3.1 or Theorem 3.2 gives the convergence of the sequence \((x_k)\) to the unique minimizer \(x^\star\) as well as finite-time identification, since the partial smoothness assumption of Lemma 1 implies the qualifying condition (QC) of both theorems by Lemma 4.
In order to show the linear convergence behavior, we follow [29] and [35, Chap. 2.1.2] and apply their first-order analysis of the accelerated proximal gradient to (Prov. Alg.). We are in the same setting and have compatible assumptions, so we can apply [29, Prop. 4.5] to (Prov. Alg.) as soon as identification has occurred, since both tests select acceleration from that moment on. We have, for \(r_k=x_k-x^\star\) and \(d_k = [r_{k-1} \quad r_k]^T\):
with M a matrix of spectral radius \(\rho (M)<1\) since \(\gamma \in (0, 1/L]\); see [29, Cor. 4.9, Rem. 4.10].
Take any \(\varepsilon \in (0,(1-\rho (M))/2)\) and let \(K_2>0\) be the smallest time from which i) identification holds; and ii) \(\Vert e_k\Vert / \Vert d_k\Vert \le \varepsilon\). \(K_2\) is finite since identification happens in finite time and \(\Vert e_k\Vert = o(\Vert d_k\Vert )\).
Unrolling Eq. (4.1) yields, for \(k\ge 0\),
Taking norms and applying the triangle inequality yields
where \(\rho :=\rho (M) +\varepsilon\) and \(C_M>0\) is a constant such that \(\Vert M^k\Vert \le C_M \rho ^k\) for all k (see [22, Cor. 5.6.13]).
Let us denote \(w_{k+1} = C_M^{-1} (\rho + \varepsilon )^{-k}\Vert d_{K_2+k+1}\Vert\) (with \(\rho + \varepsilon = \rho (M) + 2\varepsilon \in (0,1)\)) and show that this sequence is uniformly bounded by a constant \(C_w\). The initialization is clear. Suppose that \(w_l\le C_w\) for all \(l\le k\); multiplying the previous equation by \(C_M^{-1} (\rho +\varepsilon )^{-k}\) yields
By recursion, we have
Therefore, for all \(k\ge 0\), \(\Vert d_{K_2+k+1}\Vert \le C_M C_w (\rho (M) + 2\varepsilon )^k \Vert d_{K_2+1}\Vert\) for any \(\varepsilon \in (0,(1-\rho (M))/2)\). \(\square\)
Additional illustrations
We provide here additional illustrations in Figs. 7 and 8, which consist of the same experiments as in Figs. 5a and 6 except for the starting point of the algorithms, which is taken here as the null vector (or matrix). The conclusions are quite similar to those of Sect. 4.2; the identification stability of the proposed methods is even more apparent.
Besides, we also report one further experiment where the solution is not qualified and the structure-encoding manifolds have non-null curvature. The smooth part f is again a least-squares function, while the non-smooth part is defined as \(g(x) = \max (0, \Vert x_{1:5}\Vert _{1.3}-1) + \ldots + \max (0, \Vert x_{46:50}\Vert _{1.3}-1)\) for \(x\in {\mathbb {R}}^{50}\). The structure-encoding manifolds of g are the Cartesian products \(\mathsf {M}= \times _{i=1}^{10} \mathsf {M}_i \subset {\mathbb {R}}^{50}\), where each manifold \(\mathsf {M}_i\) is either \({\mathbb {R}}^5\) or \(\{x\in {\mathbb {R}}^5 : \Vert x\Vert _{1.3}=1\}\) (i.e. the 1.3-norm unit sphere). Figure 9 displays the performance of the four algorithms of interest (stopped as soon as suboptimality gets below \(10^{-12}\)). None is able to recover the complete structure of the solution, which lives in \(\mathsf {M}^\star = \times _{i=1}^{10} \{x\in {\mathbb {R}}^5 : \Vert x\Vert _{1.3}=1\}\); this is not surprising for a non-qualified problem. The vanilla proximal gradient recovers half of the elementary manifolds, and \(\mathsf {T}^2\) recovers 4 of them, but much faster. Interestingly, the iterates of the accelerated proximal gradient and of \(\mathsf {T}^1\) are not able to retain even one of the manifolds they encountered. This behavior is consistent with the one observed in Fig. 1b and suggests that the proximal gradient may have more robust identification properties than the accelerated proximal gradient on non-qualified problems. The method based on the second test (\(\mathsf {T}^2\)) thus behaves comparably to the proximal gradient in terms of identification, while converging in about the same number of iterations as the accelerated proximal gradient.
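For reference, a sketch of the block-wise proximity operator used in this experiment (our transcription, reusing the prox formula recalled in the footnotes above; blocks of 5 consecutive coordinates):

```python
import numpy as np

def prox_block_g(u, gamma, p=1.3, block=5):
    # Prox of gamma*g with g(x) = sum_i max(0, ||x_block_i||_p - 1), block by block.
    out = u.copy()
    for s in range(0, u.size, block):
        blk = u[s:s + block]
        nrm = np.linalg.norm(blk, ord=p)
        if nrm > 1 + gamma:
            out[s:s + block] = blk * (1 - gamma / nrm)
        elif nrm >= 1:
            out[s:s + block] = blk / nrm   # this block lands on the 1.3-norm unit sphere
    return out

x = np.random.default_rng(2).standard_normal(50)
px = prox_block_g(x, gamma=0.3)
spheres = [abs(np.linalg.norm(px[s:s + 5], ord=1.3) - 1) < 1e-10 for s in range(0, 50, 5)]
print(spheres)  # which elementary manifolds the output lies on
```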