Abstract
Convergence analysis is carried out for a forward-backward splitting/generalized gradient projection method for the minimization of a special class of non-smooth and genuinely non-convex minimization problems in infinite-dimensional Hilbert spaces. The functionals under consideration are the sum of a smooth, possibly non-convex and non-smooth, necessarily non-convex functional. For separable constraints in the sequence space, we show that the generalized gradient projection method amounts to a discontinuous iterative thresholding procedure, which can easily be implemented. In this case we prove strong subsequential convergence and moreover show that the limit satisfies strengthened necessary conditions for a global minimizer, i.e., it avoids a certain set of non-global minimizers. Eventually, the method is applied to problems arising in the recovery of sparse data, where strong convergence of the whole sequence is shown, and numerical tests are presented.
Similar content being viewed by others
References
Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)
Pytlak, R.: Conjugate Gradient Algorithms in Nonconvex Optimization, Nonconvex Optimization and Its Applications, vol. 89. Springer, Berlin (2009)
Nikolova, M.: Analysis of the recovery of edges in images and signals by minimizing nonconvex regularized least-squares. Multiscale Modell. Simul. 4(3), 960–991 (2005)
Nikolova, M.: Markovian reconstruction using a GNC approach. IEEE Trans. Image Process. 8(9), 1204–1220 (1999)
Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge (1987)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Bredies, K., Lorenz, D.A.: Linear convergence of iterative soft-thresholding. J. Fourier Anal. Appl. 14(5–6), 813–837 (2008). doi:10.1007/s00041-008-9041-1
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. Ser. B. (2011). doi:10.1007/s10107-011-0484-9.
Ito, K., Kunisch, K.: A variational approach to sparsity optimization based on Lagrange multiplier theory. MOBIS SFB-Report 2011–2014 (2011). http://math.uni-graz.at/mobis/publications/SFB-Report-2011-014.pdf.
Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward–backward algorithm for minimizing the sum of a differentiable function and a convex function. To appear J. Optim. Theory Appl. (2013). http://www.optimization-online.org/DB_HTML/2013/01/3749.html
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: Inertial proximal algorithm for non-convex optimization. (2013). http://lmb.informatik.uni-freiburg.de/Publications/2013/OB13
Kaplan, A., Tichatschke, R.: Proximal point methods and nonconvex optimization. J. Global Optim. 13, 389–406 (1998)
Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9, 485–513 (2009)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, Grundlehren der Mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)
Penot, J.P.: Proximal mappings. J. Approx. Theory 94, 203–221 (1998)
Hare, W., Sagastizábal, C.: Computing proximal points of nonconvex functions. Math. Program. Ser. B 116(1–2), 221–258 (2009)
Demyanov, V.F., Rubinov, A.M.: Approximate Methods in Optimization Problems. Modern Analytic and Computational Methods in Science and Mathematics, vol. 32. American Elsevier, New York (1970)
Dunn, J.C.: Global and asymptotic convergence rate estimates for a class of projected gradient processes. SIAM J. Control Optim. 19(3), 368–400 (1981)
Clarke, F.H.: Optimization and Nonsmooth Analysis. CRM Université de Montréal, Montréal (1989)
Bredies, K., Lorenz, D.A.: Regularization with non-convex separable constraints. Inverse Problems, 25(8), 085011 (2009). doi:10.1088/0266-5611/25/8/085011
Blumensath, T., Yaghoobi, M., Davies, M.: Iterative hard thresholding and \(l^0\) regularisation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (2007)
Moulin, P., Liu, J.: Analysis of multiresolution image denoising schemes using generalized Gaussian and complexity priors. IEEE Trans. Inform. Theory 45(3), 909–919 (1999)
Lorenz, D.A.: Non-convex variational denoising of images: interpolation between hard and soft wavelet shrinkage. Curr. Dev. Theory Appl. Wavelets 1(1), 31–56 (2007)
Lorenz, D.A.: Convergence rates and source conditions for Tikhonov regularization with sparsity constraints. J. Inverse Ill-Posed Problems 16(5), 463–478 (2008). doi:10.1515/JIIP.2008.025
Grasmair, M.: Well-posedness classes for sparse regularisation. Reports of FSP S092-“Industrial Geometry”. University of Vienna, Austria (2010)
Lai, M.J., Wang, J.: An unconstrained \(\ell _q\) minimization with \(0 < q\le 1\) for sparse solution of underdetermined linear systems. SIAM J. Optim. 21(1), 82–101 (2011)
Sturmfels, B.: What is a Gröbner basis? Notices Am. Math. Soc. 52(10), 1199–1200 (2005)
Buchberger, B.: Introduction to Gröbner bases. In: Buchberger, B., Winkler, F. (eds.) Gröbner Basis and Applications. London Mathematical Society Lecture Notes Series, vol. 251, pp. 3–31. Cambridge University Press, Cambridge (1998)
Kreuzer, M., Robbiano, L.: Computational Commutative Algebra, vol. 1. Springer, New York (2000)
Acknowledgments
Kristian Bredies acknowledges support by the SFB Research Center “Mathematical Optimization and Applications in Biomedical Sciences” at the University of Graz. Dirk Lorenz acknowledges support by the DFG under Grant LO 1436/2-1.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Hedy Attouch.
Appendix
Appendix
1.1 8.1 Proof of Lemma 3.2
As \(\phi \) is non-decreasing, \(\phi '(y) \ge 0\) for all \(y > 0\), hence \(\lim _{y \rightarrow \infty } \phi '(y) \ge 0\) (with \(\infty \) allowed). Thus, each \(\rho _s\) is a strictly convex function with \(\rho _s(y) \rightarrow \infty \) for \(y \rightarrow 0\) and \(y \rightarrow \infty \), which implies that a unique minimizer \(y_s > 0\) exists.
Next, note that since \(\rho _s\) is strictly convex for some \(s > 0\), it is locally Lipschitz. Consequently, \(\phi '\) is locally Lipschitz and in particular absolutely continuous on compact intervals contained in \({]{0,\infty }[}\). This implies that \(\phi '\) is differentiable almost everywhere in \({]{0,\infty }[}\), with \(\phi '(y_1) - \phi '(y_0) = \int _{y_0}^{y_1} \phi ''(t) \,\mathrm {d}{t}\) for each \(y_1 > y_0 > 0\). Moreover, by assumption \(t \mapsto t\phi ''(t)\) is Lebesgue integrable on \([0,y]\) for each \(y > 0\), hence Fubini’s theorem is applicable. We compute
Since \((\phi ')'_-\) is strictly monotonically increasing and coincides with \(\phi ''\) where the latter exists, i.e., almost everywhere in \({]{0,\infty }[}\), we get for \(y > 0\) and any \(z \ge (\phi ')'_-(y)\) that
This already proves \(\psi (y) > -z\). For \(\psi \) being strictly decreasing, consider \(y > 0\) where \(\phi ''(y)\) exists and deduce
where the latter has already been established. Hence, \(\psi ' < 0\) almost everywhere in \({]{0,\infty }[}\), and as \(\psi \) is also locally Lipschitz in \({]{0,\infty }[}\), it is absolutely continuous in each compact interval contained in \({]{0,\infty }[}\). Consequently, one can conclude that \(\psi \) is strictly monotonically decreasing. Moreover, \(\psi (y) > -(\phi ')'_-(y)\) also implies \(\psi (y) \rightarrow \infty \) as \(y \rightarrow 0\), since \(\phi '(y)\) is bounded around \(0\) whenever \((\phi ')'_-(y)\) is bounded around \(0\). Finally, from the assumptions \(\phi (y) \rightarrow \infty \) and \(\phi '(y)/y \rightarrow 0\) as \(y \rightarrow \infty \), it follows with L’Hôpital’s rule that \(\phi (y)/y^2 \rightarrow 0\). Consequently, \(\psi (y) \rightarrow 0\) as \(y \rightarrow \infty \) and, together with the above, \(\psi : {]{0,\infty }[} \rightarrow {]{0,\infty }[}\) is onto. \(\square \)
1.2 8.2 Tools from Algebraic Geometry
This section contains some technical lemmas for algebraic equations in order to prove Proposition 5.1. We make here use of the theory of Gröbner bases for solving algebraic equations.
We present the most important notations and definitions, which are needed for the theory here for the sake of convenience and in order to provide a uniform notation. A brief introduction to Gröbner bases can also be found in [28].
Definition 8.1
(Some notations and definitions in polynomial rings)
-
We denote by \(\mathbf {P}\) the polynomial ring \(\mathbf {C}[X_1,\ldots ,X_n]\) in \(n\) variables with complex coefficients.
-
A subgroup \(I\) of \(\mathbf {P}\) with respect to \(+\) is called an ideal of \(\mathbf {P}\), iff for all \(f \in \mathbf {P}\) and all \(g \in I\) the product \(f g\) is also in \(I\). We write \(I \vartriangleleft \mathbf {P}\) if \(I\) is an ideal of \(\mathbf {P}\).
-
Let \(F\) be a subset of \(\mathbf {P}\). The set
$$\begin{aligned} \langle F \rangle = \bigcap \limits _{F \subset I \vartriangleleft \mathbf {P}} I \end{aligned}$$is called the ideal generated by \(F\). It holds that
$$\begin{aligned} \langle f_1,\ldots ,f_m\rangle = \left\{ \sum \limits _{i=1}^m {g_i f_i}:{f_i \in F, \,g_i \in \mathbf {P}\text { for } i = 1,\ldots ,m}\right\} . \end{aligned}$$ -
For \(\alpha \in \mathbf {N}^n_0\) the power product \(X^{\alpha }\) is defined by \(X^{\alpha } := X_1^{\alpha _1} \cdots X^{\alpha _n}_n\).
-
We denote with \(\mathbf {T}\subset \mathbf {P}\) the set \(\{{ X^{\alpha }}\ : \ {\alpha \in \mathbf {N}_0^n}\}\) of power products. A power product \(X^\alpha \) is mixed if \(\alpha _i > 0\) for at least two \(i\).
-
The lexicographical order \(<_{{{\mathrm{lex}}}}\) of multi-indices \(\alpha , \beta \in \mathbf {N}^n_0\) is defined as
$$\begin{aligned} \alpha <_{{{\mathrm{lex}}}} \beta \quad \Leftrightarrow \quad \exists m > 0: (\forall i < m: \alpha _i = \beta _i) \wedge \alpha _m < \beta _m. \end{aligned}$$ -
The total degree lexicographical order \(\prec \) on the set \(\mathbf {T}\) of power products is defined by
$$\begin{aligned}&t_1 \prec t_2 \Leftrightarrow t_1:=X^{\alpha }, t_2 := X^{\beta }, \quad \alpha , \beta \in \mathbf {N}^n_0, \\&\quad |{\alpha }| < |{\beta }| \vee \left( |{\alpha }| = |{\beta }| \wedge \alpha <_{{{\mathrm{lex}}}} \beta \right) \end{aligned}$$with \(|{\alpha }| = \sum _{j=1}^n \alpha _j\).
-
For a polynomial \(f \in \mathbf {P}\) we define the coefficient of the power product \(t\) of \(f\) as \(C(f,t)\).
-
We define the support of the polynomial \(f\) as \({{\mathrm{supp}}}(f) := \{{ t \in \mathbf {T}}\ : \ {C(f,t) \ne 0}\}\).
-
The degree of the polynomial \(f\) is defined by \(\deg (f) := \max \ \{{ |{\alpha }|}\ : \ {X^\alpha \in {{\mathrm{supp}}}(f)}\}\), while the degree with respect to \(X_i\) is given by \(\deg _{X_i}(f) := \max \ \{{ \alpha _i}\ : \ {X^\alpha \in {{\mathrm{supp}}}(f)}\}\).
-
The leading powerproduct of \(f\) is defined by \({{\mathrm{LPP}}}(f) := \max _{\prec }\{{t}\ : \ {t \in {{\mathrm{supp}}}(f)}\}\).
-
The trailing term of \(f\) is the smallest powerproduct in \({{\mathrm{supp}}}(f)\) with respect to \(\prec \).
Definition 8.2
(Varieties) Let \(I \subset \mathbf {P}\) be a set of polynomials. Then the variety of \(I\) is defined as \( V(I) := \{{ c \in \mathbf {C}^n}\ : \ {f(c) = 0 \text { for all } f \in I}\}\).
Definition 8.3
Let \(F \subset \mathbf {P}\). The initial ideal with respect to the order \(\prec \) is defined by \({{\mathrm{In}}}_{\prec }(F) := \left\langle \{{{{\mathrm{LPP}}}(f)}\ : \ {f \in F}\} \right\rangle \).
Definition 8.4
(Gröbner basis) Let \(I \vartriangleleft \mathbf {P}\) be an ideal and \(\prec \) the lexicographical total degree order. A finite \(G \subset I\) such that \({{\mathrm{In}}}_{\prec }(I) = {{\mathrm{In}}}_{\prec }(G)\) is called a Gröbner basis for \(I\) with respect to \(\prec \).
We call \(G\) a reduced Gröbner basis iff additionally
-
(i)
each \(g \in G\) is normed, i.e., the leading coefficient, \(C(g,{{\mathrm{LPP}}}(g))\), is \(1\),
-
(ii)
the set \(\{{{{\mathrm{LPP}}}(g)}\ : \ {g \in G}\}\) minimally generates \({{\mathrm{In}}}_{\prec }(I)\), i.e., for all \(g\in G\) it holds that \({{\mathrm{In}}}_{\prec }(G{\setminus }\{g\}) \ne {{\mathrm{In}}}_{\prec }(G)\), and
-
(iii)
no trailing term of any \(g\in G\) lies in \({{\mathrm{In}}}_{\prec }(I)\).
One can show that a reduced Gröbner basis exists for each ideal \(I\) and that it is unique. We will use the following characterization of Gröbner bases (the proof can be found in [29, Sect. 3.5] or [30, Theorem 2.4.1]).
Theorem 8.1
Let \(I \vartriangleleft \mathbf {P}\) be an ideal and let \(G \subset I\) be a finite set of normed polynomials. Then, the following statements are equivalent:
-
(i)
\(G\) is a Gröbner basis for \(I\) with respect to \(\prec \).
-
(ii)
It holds that \( \{{{{\mathrm{LPP}}}(f)}\ : \ {f \in I}\} = \{{t {{\mathrm{LPP}}}(g)}\ : \ {g \in G, \ t \in \mathbf {T}}\}\).
Theorem 8.2 is also from the theory of Gröbner bases and gives a connection between the finiteness of varieties and the structure of the reduced Gröbner basis (the proof can be found in [29, Sect. 3.6], [30, Proposition 3.7.1]).
Theorem 8.2
Let \(I \vartriangleleft \mathbf {P}\) be an ideal, and \(G\) the reduced Gröbner basis with respect to \(\prec \). Then, the following statements are equivalent:
-
(i)
\(V(I)\) is finite.
-
(ii)
There exists \(g_1,\ldots ,g_n \in G\), and \(m_1,\ldots ,m_n \in \mathbf {N}_0\), such that \({{\mathrm{LPP}}}(g_i) = X_i^{m_i}\).
-
(iii)
The quotient space \(\mathbf {P}/I\) is a finite-dimensional \(\mathbf {C}\) vector space.
Finally, we have all the tools to obtain the desired results about the solutions of algebraic equations. They are formulated in the following two lemmas.
Lemma 8.1
Let \(n \in \mathbf {N}\), \(A \in \mathbf {C}^{n \times n}\) be a Hermitian positive definite matrix. The system of quadratic equations \({{\mathrm{diag}}}(X) A X = 0\), has only the trivial solution \(X = 0\).
Proof
Let \(X\) be a solution of \({{\mathrm{diag}}}(X)AX = 0\). Then, for each \(i\), we have \(X_i =0\) or \((AX)_i = 0\). Hence, we can rearrange the indices \(1,\ldots ,n\) in such a way that for a \(1 \le k \le n\), that \(X_1,\ldots ,X_k \not = 0\), and \(X_{k+1},\ldots ,X_n = 0\). In order to exchange the indices \(i\) and \(j\), we have to exchange the \(i\)-th and the \(j\)-th rows as well as the \(i\)-th and \(j\)-th columns of \(A\). Since the eigenvalues of \(A\) are invariant under those operations, the resulting new matrix \(\tilde{A}\) is also positive definite and Hermitian. Since \(X_{k+1},\ldots , X_n = 0\), we can eliminate the corresponding rows and columns of the system. Denoting the resulting matrix by \(\tilde{\tilde{A}}\) and \(\tilde{X} = (X_1,\ldots ,X_k)\), it follows that
From the Sylvester criterion we know that \(\tilde{\tilde{A}}\) is positive definite as it is a submatrix of \(\tilde{A}\). Therefore, \(\tilde{X} = 0\), and hence, \(X = 0\). \(\square \)
Lemma 8.2
Let \(A \in \mathbf {C}^{n\times n}\) be a positive definite, Hermitian matrix and \(b,c,d \in \mathbf {C}^n\). Then, the system
with \(r > q\) has only finitely many solutions.
Proof
Define the polynomials
and
Furthermore set \(\tilde{I}= \langle \tilde{f}_1,\ldots ,\tilde{f}_n \rangle \).
Let \(\tilde{G}\) be the reduced Gröbner basis for \(\tilde{I}\) with respect to the total degree lexicographical order \(\prec \). From Lemma 8.1 and Theorem 8.2 we know that there are \(m_1,\ldots ,m_n \in \mathbf {N}\) such that \({{\mathrm{LPP}}}(\tilde{g}_i) = X_i^{m_i}\) for \(i=1,\ldots ,n\). Since \(\tilde{G}\subseteq \tilde{I}\) there are \(r_{ij} \in \mathbf {P}\), such that
for all \(i = 1,\ldots ,n\). Let us now fix some \(i \in \{1,\ldots ,n\}\).
Next note that \(\tilde{f}_i\) only contains quadratic power products and that \(X_i^2 \in {{\mathrm{supp}}}(\tilde{f}_i)\) since \(A_{ii} > 0\) by positive definiteness of \(A\). Therefore, \(\deg (\tilde{f}_i) = 2\). Now split each \(r_{ij}\) according to
We claim that \(\sum _{j=1}^n t_{ij} \tilde{f}_j = 0\). Assume the opposite and observe that for each \(t \in \mathbf {T}\) with \(\deg (t) > m_i-2\) and each \(j=1,\ldots ,n\) it follows that \(C(t\tilde{f}_j, s) = 0\) for each \(s \in \mathbf {T}\) with \(\deg (s) \le m_i\). Consequently, \(C(\sum _{j=1}^n t_{ij} \tilde{f}_j, s) = 0\) for each \({s \in \mathbf {T}}\) with \(\deg (s) \le m_i\) implying that \(\deg (\sum _{j=1}^n t_{ij} \tilde{f}_j) > m_i\) since \(\sum _{j=1}^n t_{ij} \tilde{f}_j \ne 0\). But we also have that
and \(\deg \Bigl (\sum _j s_{ij} \tilde{f}_j \Bigr ) \le m_i\), from which the contradiction \(\deg (\sum _{j=1}^n t_{ij} \tilde{f}_j) \le m_i\) follows. Hence, \(\sum _{j=1}^n t_{ij} \tilde{f}_j = 0\). In particular, \(\tilde{g}_i = \sum _{j=1}^n s_{ij} \tilde{f}_j\), so, by possibly replacing \(r_{ij}\) with \(s_{ij}\), we can achieve in (22) that \(\deg (r_{ij}) \le m_i -2\) for each \(j=1,\ldots ,n\).
We define the polynomials \(h_i := \sum _j r_{ij} f_j\) and consider them elements of the polynomial ring \(\tilde{\mathbf {P}} := \mathbf {C}[a_1,\ldots ,a_n,X_1,\ldots ,X_n]\) in \(2n\) variables, and extend the order \(\prec \) to \(\tilde{\mathbf {P}}\), i.e., we consider \(a_1\prec \cdots \prec a_n \prec X_1\prec \cdots \prec X_n\). Additionally, we denote by \(\tilde{\mathbf {T}}\) the set of power products of \(\tilde{\mathbf {P}}\). It follows that
Since we assumed that the \(b_j\), \(c_j\), and \(d_j\) are constants, \(\deg (r_{ij}) \le m_i - 2\) implies that
and since \(a_j \prec X_i\) for all \(j\), we have
Now we define the ideal \(J \vartriangleleft \tilde{\mathbf {P}}\) generated by \(f_1,\ldots ,f_n\) and the polynomials \(k_j = a_j^r - X_j^{q}\), \(j = 1,\ldots ,n\). Clearly, \(h_1,\ldots ,h_n \in J\). Also, \({{\mathrm{LPP}}}(h_i) = X_i^{m_i}\) and \({{\mathrm{LPP}}}(k_i) = a_i^r\) for \(i=1,\ldots ,n\), the latter since \(r > q\).
In order to prove that \(V(J)\) is finite we consider the reduced Gröbner basis \(G\) for \(J\) with respect to \(\prec \). By the above, we have \(X_i^{m_i},a_i^r \in \{{{\mathrm{LPP}}}(f)\,|\,\,f \in J \}\), and by Theorem 8.1 \(\{{{{\mathrm{LPP}}}(f)}\ : \ {f \in J}\} = \{{t g}\ : \ {g \in G, \ t\in \tilde{\mathbf {T}}}\}\) . Hence, for each \(i=1,\ldots ,n\), \(X_i^{m_i} = t_i {{\mathrm{LPP}}}(g_i)\) and \(a_i^r = t_{i-n} {{\mathrm{LPP}}}(g_{i-n})\) for some \(g_1,\ldots ,g_{2n}\in G\), and \(t_1,\ldots ,t_{2n} \in \tilde{\mathbf {T}}\). This is only possible if there exist \(\hat{m}_1,\ldots ,\hat{m}_{2n}\) such that \({{\mathrm{LPP}}}(g_i) = X_i^{\hat{m}_i}\) and \({{\mathrm{LPP}}}(g_{i+n}) = a_i^{\hat{m}_{i+n}}\) for \(i=1,\ldots ,n\). This implies by Theorem 8.2 that \(V(J) = V(\{{ f_1,\ldots ,f_n,k_1,\ldots ,k_n }\})\) is finite, which is what we wanted to show. \(\square \)
Rights and permissions
About this article
Cite this article
Bredies, K., Lorenz, D.A. & Reiterer, S. Minimization of Non-smooth, Non-convex Functionals by Iterative Thresholding. J Optim Theory Appl 165, 78–112 (2015). https://doi.org/10.1007/s10957-014-0614-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10957-014-0614-7