
On first-order algorithms for l1/nuclear norm minimization

Published online by Cambridge University Press:  02 April 2013

Yurii Nesterov
Affiliation:
Center for Operations Research and Econometrics, 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium. E-mail: yurii.nesterov@uclouvain.be
Arkadi Nemirovski
Affiliation:
School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA. E-mail: arkadi.nemirovski@isye.gatech.edu

Abstract

In the past decade, problems related to l1/nuclear norm minimization have attracted much attention in the signal processing, machine learning and optimization communities. In this paper, devoted to l1/nuclear norm minimization as ‘optimization beasts’, we give a detailed description of two attractive first-order optimization techniques for solving problems of this type. The first one, aimed primarily at lasso-type problems, comprises fast gradient methods applied to composite minimization formulations. The second approach, aimed at Dantzig-selector-type problems, utilizes saddle-point first-order algorithms and reformulation of the problem of interest as a generalized bilinear saddle-point problem. For both approaches, we give complete and detailed complexity analyses and discuss the application domains.
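As a rough illustration of the first approach, the sketch below applies a FISTA-style fast gradient method to the composite formulation of a lasso-type problem, min_x (1/2)||Ax - b||_2^2 + lam*||x||_1. This is a minimal sketch under standard assumptions, not the authors' exact scheme; the name fista_lasso and its parameters are illustrative.

```python
import numpy as np

def fista_lasso(A, b, lam, n_iters=500):
    """FISTA-style fast gradient method for the composite problem
    min_x 0.5*||A x - b||_2^2 + lam*||x||_1 (a lasso-type problem)."""
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    y = x.copy()                    # extrapolated point
    t = 1.0
    for _ in range(n_iters):
        grad = A.T @ (A @ y - b)    # gradient of the smooth term at y
        z = y - grad / L            # plain gradient step
        # proximal step: soft-thresholding, the prox operator of (lam/L)*||.||_1
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # Nesterov extrapolation
        x, t = x_new, t_new
    return x
```

Schemes of this kind carry the O(1/k^2) accuracy guarantee whose complexity analysis the paper develops in detail; the second approach instead reformulates Dantzig-selector-type problems as generalized bilinear saddle-point problems and solves them with saddle-point first-order algorithms.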

Type: Research Article
Copyright © Cambridge University Press 2013

