Abstract
We consider a class of difference-of-convex (DC) optimization problems whose objective is level-bounded and is the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous concave function. While this kind of problem can be solved by the classical difference-of-convex algorithm (DCA) (Pham et al. Acta Math Vietnam 22:289–355, 1997), the difficulty of the subproblems of this algorithm depends heavily on the choice of DC decomposition. Simpler subproblems can be obtained by using a specific DC decomposition described in Pham et al. (SIAM J Optim 8:476–505, 1998). This decomposition has been adopted in numerous works such as Gotoh et al. (DC formulations and algorithms for sparse optimization problems, 2017), and we refer to the resulting DCA as the proximal DCA. Although the subproblems are simpler, the proximal DCA reduces to the proximal gradient algorithm when the concave part of the objective is void, and hence is potentially slow in practice. In this paper, motivated by the extrapolation techniques for accelerating the proximal gradient algorithm in the convex setting, we consider a proximal difference-of-convex algorithm with extrapolation to possibly accelerate the proximal DCA. We show that any cluster point of the sequence generated by our algorithm is a stationary point of the DC optimization problem for a fairly general choice of extrapolation parameters: in particular, the parameters can be chosen as in FISTA with fixed restart (O'Donoghue and Candès in Found Comput Math 15:715–732, 2015). In addition, by assuming the Kurdyka-Łojasiewicz property of the objective and the differentiability of the concave part, we establish global convergence of the sequence generated by our algorithm and analyze its convergence rate. Our numerical experiments on two difference-of-convex regularized least squares models show that our algorithm usually outperforms the proximal DCA and the general iterative shrinkage and thresholding algorithm proposed in Gong et al. (A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems, 2013).
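For concreteness, the following is a minimal MATLAB sketch of the proposed iteration (\(\mathrm{pDCA}_e\)) on one instance of the problem class, the \(\ell _{1-2}\) regularized least squares model \(\min _x \frac{1}{2}\Vert Ax-b\Vert ^2 + \lambda (\Vert x\Vert _1 - \Vert x\Vert _2)\) of [35]; the function name, the zero initialization and the restart period K are illustrative choices, not the authors' exact implementation.

function x = pdcae_l1_l2(A, b, lambda, maxit)
% Sketch of pDCA_e for min 0.5*||A*x-b||^2 + lambda*(||x||_1 - ||x||_2):
% smooth part f = 0.5*||A*x-b||^2, P1 = lambda*||.||_1, P2 = lambda*||.||_2.
  L = norm(A*A');                      % Lipschitz constant of grad f, cf. the Notes
  n = size(A,2);
  x = zeros(n,1); xold = x;
  theta = 1; thetaold = 1; K = 200;    % restart period K is an illustrative choice
  for k = 1:maxit
    beta = (thetaold - 1)/theta;       % extrapolation weight, as in FISTA
    y = x + beta*(x - xold);           % extrapolated point
    if norm(x) > 0
      xi = lambda*x/norm(x);           % xi in the subdifferential of P2 at x
    else
      xi = zeros(n,1);                 % 0 is a valid subgradient of lambda*||.||_2 at 0
    end
    u = y - (A'*(A*y - b) - xi)/L;     % gradient step using grad f(y) - xi
    xold = x;
    x = sign(u).*max(abs(u) - lambda/L, 0);  % prox of (lambda/L)*||.||_1
    thetaold = theta;
    theta = (1 + sqrt(1 + 4*theta^2))/2;
    if mod(k, K) == 0                  % fixed restart of the momentum
      theta = 1; thetaold = 1;
    end
  end
end

Setting beta = 0 throughout recovers the proximal DCA, which is the baseline the algorithm is compared against.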
Notes
We would also like to point to the article “DC programming and DCA” on the personal webpage of Le Thi Hoai An: http://www.lita.univ-lorraine.fr/~lethi/index.php/en/research/dc-programming-and-dca.html.
As discussed at the end of the numerical section of [17], suitably incorporating extrapolation techniques into the proximal DCA can also accelerate the algorithm empirically.
Indeed, when \(P_2 = 0\), FISTA with fixed restart and FISTA with both fixed and adaptive restart are special cases of \(\mathrm{pDCA}_e\).
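To make this concrete, here is a minimal MATLAB sketch of the special case: with \(P_2 = 0\) and \(P_1 = \lambda \Vert \cdot \Vert _1\), the \(\mathrm{pDCA}_e\) update becomes the FISTA update, shown below with both a fixed restart (the period K = 200 is an illustrative choice) and the gradient-free adaptive restart test of [27].

function x = fista_with_restart(A, b, lambda, maxit)
% FISTA for min 0.5*||A*x-b||^2 + lambda*||x||_1, with fixed and adaptive restart.
  L = norm(A*A'); n = size(A,2); K = 200;
  x = zeros(n,1); xold = x;
  theta = 1; thetaold = 1;
  for k = 1:maxit
    beta = (thetaold - 1)/theta;
    y = x + beta*(x - xold);                   % extrapolated point
    u = y - A'*(A*y - b)/L;                    % gradient step at y
    xnew = sign(u).*max(abs(u) - lambda/L, 0); % soft-thresholding step
    if mod(k, K) == 0 || (y - xnew)'*(xnew - x) > 0
      theta = 1; thetaold = 1;                 % restart: fixed period or uphill test
    else
      thetaold = theta;
      theta = (1 + sqrt(1 + 4*theta^2))/2;
    end
    xold = x; x = xnew;
  end
end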
\(\lambda _{\max }(A^TA)\) is computed via the MATLAB code lambda = norm(A*A'); when \(m\le 2000\), and by opts.issym = 1; lambda = eigs(A*A',1,'LM',opts); otherwise.
These \(\lambda \) satisfy \(\lambda < \frac{1}{2}\Vert A^{T}b\Vert _{\infty }\) for all our random instances.
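Gathering the two notes above, the corresponding MATLAB fragment reads as follows (with m, A, b and lambda as in the experiments):

if m <= 2000
  lambda_max = norm(A*A');               % spectral norm of A*A' = lambda_max(A'*A)
else
  opts.issym = 1;
  lambda_max = eigs(A*A', 1, 'LM', opts);  % largest eigenvalue, cheaper for large m
end
assert(lambda < 0.5*norm(A'*b, inf));    % the condition quoted in the preceding note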
In the tables, “max” means the number of iterations hits 5000.
The CPU time reported for \(\mathrm{pDCA}_e\) does not include the time for computing \(\lambda _{\max }(A^TA)\).
We set \(\epsilon = 0.5\) in (5.4).
References
Ahn, M., Pang, J.S., Xin, J.: Difference-of-convex learning: directional stationarity, optimality, and sparsity. SIAM J. Optim. 27, 1637–1665 (2017)
Alvarado, A., Scutari, G., Pang, J.S.: A new decomposition method for multiuser DC-programming and its applications. IEEE Trans. Signal Process. 62, 2984–2998 (2014)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Progr. Ser. B 116, 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss-Seidel methods. Math. Progr. Ser. A 137, 91–129 (2013)
Banert, S., Boţ, R.I.: A general double-proximal gradient algorithm for d.c. programming. arXiv preprint arXiv:1610.06538v1
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Becker, S., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Progr. Comput. 3, 165–218 (2011)
Bian, W., Chen, X.: Optimality and complexity for constrained optimization problems with nonconvex regularization. Math. Oper. Res. (2017). doi:10.1287/moor.2016.0837
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2007)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Progr. Ser. A 146, 459–494 (2014)
Candès, E.J., Wakin, M., Boyd, S.: Enhancing sparsity by reweighted \(\ell _{1}\) minimization. J. Fourier Anal. Appl. 14, 877–905 (2008)
Chen, X., Lu, Z., Pong, T.K.: Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 26, 1465–1492 (2016)
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer, New York (2011)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: ICML (2013)
Gotoh, J., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Progr. Ser. B (2017). doi:10.1007/s10107-017-1181-0
Le Thi, H.A., Pham, D.T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133, 23–46 (2005)
Le Thi, H.A., Pham, D.T., Le, D.M.: Exact penalty in D.C. programming. Vietnam J. Math. 27, 169–178 (1999)
Le Thi, H.A., Pham, D.T., Huynh, V.N.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52, 509–535 (2012)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. (2017). doi:10.1007/s10208-017-9366-8
Liu, T., Pong, T.K.: Further properties of the forward-backward envelope with applications to difference-of-convex programming. Comput. Optim. Appl. 67, 489–520 (2017)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(\frac{1}{k^2})\). Sov. Math. Dokl. 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Progr. Ser. B 109, 319–344 (2007)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Progr. Ser. B 140, 125–161 (2013)
O’Donoghue, B., Candès, E.J.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15, 715–732 (2015)
Pham, D.T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Math. Vietnam. 22, 289–355 (1997)
Pham, D.T., Le Thi, H.A.: A D.C. optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8, 476–505 (1998)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
Sanjabi, M., Razaviyayn, M., Luo, Z.-Q.: Optimal joint base station assignment and beamforming for heterogeneous networks. IEEE Trans. Signal Process. 62, 1950–1961 (2014)
Tuy, H.: Convex Analysis and Global Optimization, 2nd edn. Springer, Berlin (2016)
Wright, S.J., Nowak, R., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37, A536–A563 (2015)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhang, S., Xin, J.: Minimization of transformed \(L_1\) penalty: theory, difference of convex function algorithm, and robust application in compressed sensing. arXiv preprint arXiv:1411.5735v3
Acknowledgements
The authors would like to thank the two anonymous referees for their helpful comments.
Additional information
Xiaojun Chen’s work was supported in part by Hong Kong Research Grants Council PolyU153000/15p. Ting Kei Pong’s work was supported in part by Hong Kong Research Grants Council PolyU253008/15p.
Cite this article
Wen, B., Chen, X. & Pong, T.K. A proximal difference-of-convex algorithm with extrapolation. Comput Optim Appl 69, 297–324 (2018). https://doi.org/10.1007/s10589-017-9954-1