
On first-order algorithms for l1/nuclear norm minimization

Published online by Cambridge University Press:  02 April 2013

Yurii Nesterov
Affiliation:
Center for Operations Research and Econometrics, 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium. E-mail: yurii.nesterov@uclouvain.be
Arkadi Nemirovski
Affiliation:
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA. E-mail: arkadi.nemirovski@isye.gatech.edu

Abstract

In the past decade, problems related to l1/nuclear norm minimization have attracted much attention in the signal processing, machine learning and optimization communities. In this paper, devoted to l1/nuclear norm minimization as ‘optimization beasts’, we give a detailed description of two attractive first-order optimization techniques for solving problems of this type. The first one, aimed primarily at lasso-type problems, comprises fast gradient methods applied to composite minimization formulations. The second approach, aimed at Dantzig-selector-type problems, utilizes saddle-point first-order algorithms and reformulation of the problem of interest as a generalized bilinear saddle-point problem. For both approaches, we give complete and detailed complexity analyses and discuss the application domains.
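As a rough illustration of the first approach, the sketch below applies a FISTA-style fast gradient method to the composite formulation of a lasso-type problem, min_x (1/2)||Ax - b||_2^2 + lam*||x||_1. This is a minimal sketch under standard assumptions, not the authors' exact scheme; the name fista_lasso and its parameters are illustrative.

```python
import numpy as np

def fista_lasso(A, b, lam, n_iters=500):
    """FISTA-style fast gradient method for the composite problem
    min_x 0.5*||A x - b||_2^2 + lam*||x||_1 (a lasso-type problem)."""
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    y = x.copy()                    # extrapolated point
    t = 1.0
    for _ in range(n_iters):
        grad = A.T @ (A @ y - b)    # gradient of the smooth term at y
        z = y - grad / L            # plain gradient step
        # proximal step: soft-thresholding, the prox operator of (lam/L)*||.||_1
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # Nesterov extrapolation
        x, t = x_new, t_new
    return x
```

Schemes of this kind carry the O(1/k^2) accuracy guarantee whose complexity analysis the paper develops in detail; the second approach instead reformulates Dantzig-selector-type problems as generalized bilinear saddle-point problems and solves them with saddle-point first-order algorithms.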

Type: Research Article
Copyright © Cambridge University Press 2013

