Abstract
We consider a class of difference-of-convex (DC) optimization problems whose objective is level-bounded and is the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous concave function. While this kind of problem can be solved by the classical difference-of-convex algorithm (DCA) (Pham et al. Acta Math Vietnam 22:289–355, 1997), the difficulty of the subproblems of this algorithm depends heavily on the choice of DC decomposition. Simpler subproblems can be obtained by using a specific DC decomposition described in Pham et al. (SIAM J Optim 8:476–505, 1998). This decomposition has been adopted in numerous works such as Gotoh et al. (DC formulations and algorithms for sparse optimization problems, 2017), and we refer to the resulting DCA as the proximal DCA. Although the subproblems are simpler, the proximal DCA reduces to the proximal gradient algorithm when the concave part of the objective is void, and hence is potentially slow in practice. In this paper, motivated by the extrapolation techniques for accelerating the proximal gradient algorithm in the convex setting, we consider a proximal difference-of-convex algorithm with extrapolation to possibly accelerate the proximal DCA. We show that any cluster point of the sequence generated by our algorithm is a stationary point of the DC optimization problem for a fairly general choice of extrapolation parameters: in particular, the parameters can be chosen as in FISTA with fixed restart (O'Donoghue and Candès in Found Comput Math 15:715–732, 2015). In addition, by assuming the Kurdyka-Łojasiewicz property of the objective and the differentiability of the concave part, we establish global convergence of the sequence generated by our algorithm and analyze its convergence rate. Our numerical experiments on two difference-of-convex regularized least squares models show that our algorithm usually outperforms the proximal DCA and the general iterative shrinkage and thresholding algorithm proposed in Gong et al. (A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems, 2013).
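For concreteness, the following is a minimal MATLAB sketch of the proposed iteration (\(\mathrm{pDCA}_e\)) on one instance of the problem class, the \(\ell _{1-2}\) regularized least squares model \(\min _x \frac{1}{2}\Vert Ax-b\Vert ^2 + \lambda (\Vert x\Vert _1 - \Vert x\Vert _2)\) of [35]; the function name, the zero initialization and the restart period K are illustrative choices, not the authors' exact implementation.

function x = pdcae_l1_l2(A, b, lambda, maxit)
% Sketch of pDCA_e for min 0.5*||A*x-b||^2 + lambda*(||x||_1 - ||x||_2):
% smooth part f = 0.5*||A*x-b||^2, P1 = lambda*||.||_1, P2 = lambda*||.||_2.
  L = norm(A*A');                      % Lipschitz constant of grad f, cf. the Notes
  n = size(A,2);
  x = zeros(n,1); xold = x;
  theta = 1; thetaold = 1; K = 200;    % restart period K is an illustrative choice
  for k = 1:maxit
    beta = (thetaold - 1)/theta;       % extrapolation weight, as in FISTA
    y = x + beta*(x - xold);           % extrapolated point
    if norm(x) > 0
      xi = lambda*x/norm(x);           % xi in the subdifferential of P2 at x
    else
      xi = zeros(n,1);                 % 0 is a valid subgradient of lambda*||.||_2 at 0
    end
    u = y - (A'*(A*y - b) - xi)/L;     % gradient step using grad f(y) - xi
    xold = x;
    x = sign(u).*max(abs(u) - lambda/L, 0);  % prox of (lambda/L)*||.||_1
    thetaold = theta;
    theta = (1 + sqrt(1 + 4*theta^2))/2;
    if mod(k, K) == 0                  % fixed restart of the momentum
      theta = 1; thetaold = 1;
    end
  end
end

Setting beta = 0 throughout recovers the proximal DCA, which is the baseline the algorithm is compared against.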
Notes
We would also like to point to the article “DC programming and DCA” on the personal webpage of Le Thi Hoai An: http://www.lita.univ-lorraine.fr/~lethi/index.php/en/research/dc-programming-and-dca.html.
As discussed at the end of the numerical section of [17], suitably incorporating extrapolation techniques into the proximal DCA can also accelerate the algorithm empirically.
Indeed, when \(P_2 = 0\), FISTA with fixed restart and FISTA with both fixed and adaptive restart are special cases of \(\mathrm{pDCA}_e\).
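To make this concrete, here is a minimal MATLAB sketch of the special case: with \(P_2 = 0\) and \(P_1 = \lambda \Vert \cdot \Vert _1\), the \(\mathrm{pDCA}_e\) update becomes the FISTA update, shown below with both a fixed restart (the period K = 200 is an illustrative choice) and the gradient-free adaptive restart test of [27].

function x = fista_with_restart(A, b, lambda, maxit)
% FISTA for min 0.5*||A*x-b||^2 + lambda*||x||_1, with fixed and adaptive restart.
  L = norm(A*A'); n = size(A,2); K = 200;
  x = zeros(n,1); xold = x;
  theta = 1; thetaold = 1;
  for k = 1:maxit
    beta = (thetaold - 1)/theta;
    y = x + beta*(x - xold);                   % extrapolated point
    u = y - A'*(A*y - b)/L;                    % gradient step at y
    xnew = sign(u).*max(abs(u) - lambda/L, 0); % soft-thresholding step
    if mod(k, K) == 0 || (y - xnew)'*(xnew - x) > 0
      theta = 1; thetaold = 1;                 % restart: fixed period or uphill test
    else
      thetaold = theta;
      theta = (1 + sqrt(1 + 4*theta^2))/2;
    end
    xold = x; x = xnew;
  end
end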
\(\lambda _{\max }(A^TA)\) is computed via the MATLAB code lambda = norm(A*A'); when \(m\le 2000\), and by opts.issym = 1; lambda = eigs(A*A',1,'LM',opts); otherwise.
These \(\lambda \) satisfy \(\lambda < \frac{1}{2}\Vert A^{T}b\Vert _{\infty }\) for all our random instances.
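Gathering the two notes above, the corresponding MATLAB fragment reads as follows (with m, A, b and lambda as in the experiments):

if m <= 2000
  lambda_max = norm(A*A');               % spectral norm of A*A' = lambda_max(A'*A)
else
  opts.issym = 1;
  lambda_max = eigs(A*A', 1, 'LM', opts);  % largest eigenvalue, cheaper for large m
end
assert(lambda < 0.5*norm(A'*b, inf));    % the condition quoted in the preceding note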
In the tables, “max” means the number of iterations hits 5000.
The CPU time reported for \(\mathrm{pDCA}_e\) does not include the time for computing \(\lambda _{\max }(A^TA)\).
We set \(\epsilon = 0.5\) in (5.4).
References
Ahn, M., Pang, J.S., Xin, J.: Difference-of-convex learning: directional stationarity, optimality, and sparsity. SIAM J. Optim. 27, 1637–1665 (2017)
Alvarado, A., Scutari, G., Pang, J.S.: A new decomposition method for multiuser DC-programming and its applications. IEEE Trans. Signal Process. 62, 2984–2998 (2014)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Progr. Ser. B 116, 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss-Seidel methods. Math. Progr. Ser. A 137, 91–129 (2013)
Banert, S., Boţ, R.I.: A general double-proximal gradient algorithm for d.c. programming. arXiv preprint arXiv:1610.06538v1
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Becker, S., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Progr. Comput. 3, 165–218 (2011)
Bian, W., Chen, X.: Optimality and complexity for constrained optimization problems with nonconvex regularization. Math. Oper. Res. (2017). doi:10.1287/moor.2016.0837
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2007)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Progr. Ser. A 146, 459–494 (2014)
Candès, E.J., Wakin, M., Boyd, S.: Enhancing sparsity by reweighted \(\ell _{1}\) minimization. J. Fourier Anal. Appl. 14, 877–905 (2008)
Chen, X., Lu, Z., Pong, T.K.: Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 26, 1465–1492 (2016)
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer, New York (2011)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: ICML (2013)
Gotoh, J., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Progr. Ser. B (2017). doi:10.1007/s10107-017-1181-0
Le Thi, H.A., Pham, D.T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133, 23–46 (2005)
Le Thi, H.A., Pham, D.T., Le, D.M.: Exact penalty in D.C. programming. Vietnam J. Math. 27, 169–178 (1999)
Le Thi, H.A., Pham, D.T., Huynh, V.N.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52, 509–535 (2012)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. (2017). doi:10.1007/s10208-017-9366-8
Liu, T., Pong, T.K.: Further properties of the forward-backward envelope with applications to difference-of-convex programming. Comput. Optim. Appl. 67, 489–520 (2017)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(\frac{1}{k^2})\). Sov. Math. Dokl. 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Progr. Ser. B 109, 319–344 (2007)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Progr. Ser. B 140, 125–161 (2013)
O’Donoghue, B., Candès, E.J.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15, 715–732 (2015)
Pham, D.T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Math. Vietnam. 22, 289–355 (1997)
Pham, D.T., Le Thi, H.A.: A D.C. optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8, 476–505 (1998)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
Sanjabi, M., Razaviyayn, M., Luo, Z.-Q.: Optimal joint base station assignment and beamforming for heterogeneous networks. IEEE Trans. Signal Process. 62, 1950–1961 (2014)
Tuy, H.: Convex Analysis and Global Optimization, 2nd edn. Springer, Berlin (2016)
Wright, S.J., Nowak, R., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37, A536–A563 (2015)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhang, S., Xin, J.: Minimization of transformed \(L_1\) penalty: theory, difference of convex function algorithm, and robust application in compressed sensing. arXiv preprint arXiv:1411.5735v3
Acknowledgements
The authors would like to thank the two anonymous referees for their helpful comments.
Additional information
Xiaojun Chen’s work was supported in part by Hong Kong Research Grants Council PolyU153000/15p. Ting Kei Pong’s work was supported in part by Hong Kong Research Grants Council PolyU253008/15p.
Cite this article
Wen, B., Chen, X. & Pong, T.K. A proximal difference-of-convex algorithm with extrapolation. Comput Optim Appl 69, 297–324 (2018). https://doi.org/10.1007/s10589-017-9954-1