
A proximal difference-of-convex algorithm with extrapolation


Abstract

We consider a class of difference-of-convex (DC) optimization problems whose objective is level-bounded and is the sum of a smooth convex function with Lipschitz gradient, a proper closed convex function and a continuous concave function. While problems of this kind can be solved by the classical difference-of-convex algorithm (DCA) (Pham et al. Acta Math Vietnam 22:289–355, 1997), the difficulty of the subproblems of this algorithm depends heavily on the choice of DC decomposition. Simpler subproblems can be obtained by using a specific DC decomposition described in Pham et al. (SIAM J Optim 8:476–505, 1998). This decomposition has been used in numerous works, such as Gotoh et al. (DC formulations and algorithms for sparse optimization problems, 2017), and we refer to the resulting DCA as the proximal DCA. Although the subproblems are simpler, the proximal DCA reduces to the proximal gradient algorithm when the concave part of the objective is void, and hence is potentially slow in practice. In this paper, motivated by the extrapolation techniques for accelerating the proximal gradient algorithm in the convex setting, we consider a proximal difference-of-convex algorithm with extrapolation (\(\mathrm{pDCA}_e\)) to possibly accelerate the proximal DCA. We show that any cluster point of the sequence generated by our algorithm is a stationary point of the DC optimization problem for a fairly general choice of extrapolation parameters: in particular, the parameters can be chosen as in FISTA with fixed restart (O’Donoghue and Candès in Found Comput Math 15:715–732, 2015). In addition, by assuming the Kurdyka-Łojasiewicz property of the objective and the differentiability of the concave part, we establish global convergence of the sequence generated by our algorithm and analyze its convergence rate. Our numerical experiments on two difference-of-convex regularized least squares models show that our algorithm usually outperforms the proximal DCA and the general iterative shrinkage and thresholding algorithm proposed in Gong et al. (A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems, 2013).
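To make the iteration concrete, the following MATLAB sketch applies \(\mathrm{pDCA}_e\) to the \(\ell _{1-2}\) regularized least squares model \(\min _x \frac{1}{2}\Vert Ax-b\Vert ^2 + \lambda (\Vert x\Vert _1 - \Vert x\Vert _2)\) of Yin et al. (SIAM J Sci Comput 37:A536–A563, 2015), with smooth part \(\frac{1}{2}\Vert Ax-b\Vert ^2\), convex nonsmooth part \(\lambda \Vert x\Vert _1\) and concave part \(-\lambda \Vert x\Vert _2\). It is an illustrative reading of the algorithm described above, not code from the paper; the fixed restart period T, the stopping test and all variable names are assumptions.

    % Sketch of pDCA_e for min 0.5*||A*x-b||^2 + lam*(||x||_1 - ||x||_2).
    % The restart period T, the stopping test and all names are assumptions.
    function x = pdca_e_l1l2(A, b, lam, maxit, tol)
      L = norm(A*A');                        % Lipschitz constant of the gradient
      n = size(A, 2);
      x = zeros(n, 1); xold = x;
      theta = 1; thetaold = 1; T = 200;      % momentum parameters; restart period
      for k = 1:maxit
        beta = (thetaold - 1)/theta;         % FISTA-type extrapolation weight
        y = x + beta*(x - xold);             % extrapolated point
        if norm(x) > 0
          xi = lam*x/norm(x);                % subgradient of lam*||.||_2 at x
        else
          xi = zeros(n, 1);                  % 0 is a valid subgradient at the origin
        end
        g = A'*(A*y - b) - xi;               % gradient of smooth part minus subgradient
        xold = x;
        u = y - g/L;
        x = sign(u).*max(abs(u) - lam/L, 0); % proximal step: soft-thresholding at lam/L
        thetaold = theta;
        theta = (1 + sqrt(1 + 4*theta^2))/2;
        if mod(k, T) == 0                    % fixed restart: reset the momentum
          theta = 1; thetaold = 1;
        end
        if norm(x - xold) <= tol*max(norm(x), 1), break; end
      end
    end

Note that the subgradient of the concave part is evaluated at the current iterate rather than at the extrapolated point; setting beta to zero throughout recovers the proximal DCA, and dropping the concave part recovers the proximal gradient algorithm with extrapolation.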


Notes

  1. We would also like to point to the article “DC programming and DCA” on the personal webpage of Le Thi Hoai An: http://www.lita.univ-lorraine.fr/~lethi/index.php/en/research/dc-programming-and-dca.html.

  2. This algorithm was called “the proximal difference-of-convex decomposition algorithm” in [17]. As noted in [17], their algorithm is the DCA applied to a specific DC decomposition.

  3. It is also discussed at the end of the numerical section of [17] that suitably incorporating extrapolation techniques into the proximal DCA can accelerate the algorithm empirically.

  4. Indeed, when \(P_2 = 0\), FISTA with fixed restart and FISTA with both fixed and adaptive restart are special cases of \(\mathrm{pDCA}_e\); see the sketch of the restart scheme following these notes.

  5. \(\lambda _{\max }(A^TA)\) is computed via the MATLAB code lambda = norm(A*A'); when \(m\le 2000\), and by opts.issym = 1; lambda = eigs(A*A',1,'LM',opts); otherwise.

  6. These \(\lambda \) satisfy \(\lambda < \frac{1}{2}\Vert A^{T}b\Vert _{\infty }\) for all our random instances.

  7. In the tables, “max” means the number of iterations hits 5000.

  8. The CPU time reported for \(\mathrm{pDCA}_e\) does not include the time for computing \(\lambda _{\max }(A^TA)\).

  9. We set \(\epsilon = 0.5\) in (5.4).

  10. In the tables, “max” means the number of iterations hits 5000.

  11. The CPU time reported for \(\mathrm{pDCA}_e\) does not include the time for computing \(\lambda _{\max }(A^TA)\).
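Footnote 4 refers to the restart schemes of O’Donoghue and Candès [27]. Assuming x, xold and y denote the current iterate, the previous iterate and the extrapolated point of \(\mathrm{pDCA}_e\), the MATLAB sketch below shows how the momentum parameters would be reset under both the fixed restart (every T iterations) and the adaptive, gradient-based restart test of [27]; T and all variable names are illustrative assumptions, not specifications from the paper.

    % Momentum update with fixed and adaptive restart in the sense of [27].
    % x is the iterate just produced by the proximal step from y; xold is
    % the previous iterate; T is an assumed fixed restart period.
    thetaold = theta;
    theta = (1 + sqrt(1 + 4*theta^2))/2;           % standard FISTA update
    restart = mod(k, T) == 0;                      % fixed restart every T steps
    restart = restart || (y - x)'*(x - xold) > 0;  % adaptive test of [27]:
                                                   % momentum opposes progress
    if restart
      theta = 1; thetaold = 1;                     % next extrapolation weight is 0
    end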

References

  1. Ahn, M., Pang, J.S., Xin, J.: Difference-of-convex learning: directional stationarity, optimality, and sparsity. SIAM J. Optim. 27, 1637–1665 (2017)

  2. Alvarado, A., Scutari, G., Pang, J.S.: A new decomposition method for multiuser DC-programming and its applications. IEEE Trans. Signal Process. 62, 2984–2998 (2014)

  3. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Progr. Ser. B 116, 5–16 (2009)

  4. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)

  5. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss-Seidel methods. Math. Progr. Ser. A 137, 91–129 (2013)

  6. Banert, S., Boţ, R.I.: A general double-proximal gradient algorithm for d.c. programming. arXiv preprint arXiv:1610.06538v1

  7. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  8. Becker, S., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Progr. Comput. 3, 165–218 (2011)

  9. Bian, W., Chen, X.: Optimality and complexity for constrained optimization problems with nonconvex regularization. Math. Oper. Res. (2017). doi:10.1287/moor.2016.0837

  10. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2007)

  11. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Progr. Ser. A 146, 459–494 (2014)

  12. Candès, E.J., Wakin, M., Boyd, S.: Enhancing sparsity by reweighted \(\ell _{1}\) minimization. J. Fourier Anal. Appl. 14, 877–905 (2008)

  13. Chen, X., Lu, Z., Pong, T.K.: Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim. 26, 1465–1492 (2016)

  14. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. Fixed-Point Algorithms Inverse Probl. Sci. Eng. 49, 185–212 (2011)

  15. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  16. Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: ICML (2013)

  17. Gotoh, J., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Progr. Ser. B. (2017). doi:10.1007/s10107-017-1181-0

  18. Le Thi, H.A., Pham, D.T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133, 23–46 (2005)

  19. Le Thi, H.A., Pham, D.T., Le, D.M.: Exact penalty in D.C. programming. Vietnam J. Math. 27, 169–178 (1999)

  20. Le Thi, H.A., Pham, D.T., Huynh, V.N.: Exact penalty and error bounds in DC programming. J. Glob. Optim. 52, 509–535 (2012)

  21. Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. (2017). doi:10.1007/s10208-017-9366-8

  22. Liu, T., Pong, T.K.: Further properties of the forward-backward envelope with applications to difference-of-convex programming. Comput. Optim. Appl. 67, 489–520 (2017)

  23. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(\frac{1}{k^2})\). Sov. Math. Dokl. 27, 372–376 (1983)

  24. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston (2004)

  25. Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Progr. Ser. B 109, 319–344 (2007)

  26. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Progr. Ser. B 140, 125–161 (2013)

  27. O’Donoghue, B., Candès, E.J.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15, 715–732 (2015)

  28. Pham, D.T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Math. Vietnam. 22, 289–355 (1997)

  29. Pham, D.T., Le Thi, H.A.: A D.C. optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8, 476–505 (1998)

  30. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)

  31. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)

  32. Sanjabi, M., Razaviyayn, M., Luo, Z.-Q.: Optimal joint base station assignment and beamforming for heterogeneous networks. IEEE Trans. Signal Process. 62, 1950–1961 (2014)

  33. Tuy, H.: Convex Analysis and Global Optimization, 2nd edn. Springer, Berlin (2016)

  34. Wright, S.J., Nowak, R., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)

  35. Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37, A536–A563 (2015)

  36. Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)

  37. Zhang, S., Xin, J.: Minimization of transformed \(L_1\) penalty: theory, difference of convex function algorithm, and robust application in compressed sensing. arXiv preprint arXiv:1411.5735v3


Acknowledgements

The authors would like to thank the two anonymous referees for their helpful comments.

Author information

Correspondence to Xiaojun Chen.

Additional information

Xiaojun Chen’s work was supported in part by Hong Kong Research Grants Council PolyU153000/15p. Ting Kei Pong’s work was supported in part by Hong Kong Research Grants Council PolyU253008/15p.


About this article


Cite this article

Wen, B., Chen, X. & Pong, T.K. A proximal difference-of-convex algorithm with extrapolation. Comput Optim Appl 69, 297–324 (2018). https://doi.org/10.1007/s10589-017-9954-1
