Abstract
In applications throughout science and engineering one is often faced with the challenge of solving an ill-posed inverse problem, where the number of available measurements is smaller than the dimension of the model to be estimated. However, in many practical situations of interest, models are constrained structurally so that they only have a few degrees of freedom relative to their ambient dimension. This paper provides a general framework to convert notions of simplicity into convex penalty functions, resulting in convex optimization solutions to linear, underdetermined inverse problems. The class of simple models considered includes those formed as the sum of a few atoms from some (possibly infinite) elementary atomic set; examples include well-studied cases from many technical fields such as sparse vectors (signal processing, statistics) and low-rank matrices (control, statistics), as well as several others including sums of a few permutation matrices (ranked elections, multiobject tracking), low-rank tensors (computer vision, neuroscience), orthogonal matrices (machine learning), and atomic measures (system identification). The convex programming formulation is based on minimizing the norm induced by the convex hull of the atomic set; this norm is referred to as the atomic norm. The facial structure of the atomic norm ball carries a number of favorable properties that are useful for recovering simple models, and an analysis of the underlying convex geometry provides sharp estimates of the number of generic measurements required for exact and robust recovery of models from partial information. These estimates are based on computing the Gaussian widths of tangent cones to the atomic norm ball. When the atomic set has algebraic structure, the resulting optimization problems can be solved or approximated via semidefinite programming. The quality of these approximations affects the number of measurements required for recovery, and this tradeoff is characterized via some examples. Thus this work extends the catalog of simple models (beyond sparse vectors and low-rank matrices) that can be recovered from limited linear information via tractable convex programming.
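To make the recipe concrete, here is a minimal numerical sketch (our illustration, not part of the paper; numpy and cvxpy are assumed to be available) for the simplest atomic set, the signed unit coordinate vectors, whose atomic norm is the ℓ1 norm: a sparse vector is recovered from generic Gaussian measurements by atomic-norm minimization.

```python
# Minimal sketch of atomic-norm minimization (here the l1 norm, since the
# atoms are the signed unit coordinate vectors). Assumes numpy and cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
p, s = 200, 5                      # ambient dimension, sparsity
n = 80                             # comfortably above the ~2 s log(p/s) scaling

x_star = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
x_star[support] = rng.standard_normal(s)

Phi = rng.standard_normal((n, p))  # generic Gaussian measurement operator
y = Phi @ x_star

x = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm1(x)), [Phi @ x == y]).solve()
print("recovery error:", np.linalg.norm(x.value - x_star))
```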
Notes
A spherical cap is a subset of the sphere obtained by intersecting the sphere \(\mathbb{S}^{p-1}\) with a halfspace.
While Proposition 3.15 follows as a consequence of the general result in Corollary 3.14, one can remove the constant factor 9 in the statement of Proposition 3.15 by carrying out a more refined analysis of the Birkhoff polytope.
References
S. Aja-Fernandez, R. Garcia, D. Tao, X. Li, Tensors in Image Processing and Computer Vision. Advances in Pattern Recognition (Springer, Berlin, 2009).
N. Alon, A. Naor, Approximating the cut-norm via Grothendieck’s inequality, SIAM J. Comput. 35, 787–803 (2006).
A. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theory 39, 930–945 (1993).
A. Barvinok, A Course in Convexity (American Mathematical Society, Providence, 2002).
C. Beckmann, S. Smith, Tensorial extensions of independent component analysis for multisubject FMRI analysis, NeuroImage 25, 294–311 (2005).
D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods (Athena Scientific, Nashua, 2007).
D. Bertsekas, A. Nedic, A. Ozdaglar, Convex Analysis and Optimization (Athena Scientific, Nashua, 2003).
P. Bickel, Y. Ritov, A. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, Ann. Stat. 37, 1705–1732 (2009).
J. Bochnak, M. Coste, M. Roy, Real Algebraic Geometry (Springer, Berlin, 1988).
F.F. Bonsall, A general atomic decomposition theorem and Banach’s closed range theorem, Q. J. Math. 42, 9–14 (1991).
A. Brieden, P. Gritzmann, R. Kannan, V. Klee, L. Lovász, M. Simonovits, Approximation of diameters: randomization doesn’t help, in Proceedings of the 39th Annual Symposium on Foundations of Computer Science (1998), pp. 244–251.
J. Cai, E. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM J. Optim. 20, 1956–1982 (2010).
J. Cai, S. Osher, Z. Shen, Linearized Bregman iterations for compressed sensing, Math. Comput. 78, 1515–1536 (2009).
E. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58, 1–37 (2011).
E. Candès, Y. Plan, Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements, IEEE Trans. Inf. Theory 57, 2342–2359 (2011).
E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory 52, 489–509 (2006).
E.J. Candès, B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math. 9, 717–772 (2009).
E. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory 51, 4203–4215 (2005).
V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, A.S. Willsky, Rank-sparsity incoherence for matrix decomposition, SIAM J. Optim. 21, 572–596 (2011).
P. Combettes, V. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul. 4, 1168–1200 (2005).
I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57, 1413–1457 (2004).
K.R. Davidson, S.J. Szarek, Local operator theory, random matrices and Banach spaces, in Handbook of the Geometry of Banach Spaces, vol. I (2001), pp. 317–366.
V. de Silva, L. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl. 30, 1084–1127 (2008).
R. DeVore, V. Temlyakov, Some remarks on greedy algorithms, Adv. Comput. Math. 5, 173–187 (1996).
M. Deza, M. Laurent, Geometry of Cuts and Metrics (Springer, Berlin, 1997).
D.L. Donoho, High-dimensional centrally-symmetric polytopes with neighborliness proportional to dimension, Discrete Comput. Geom. (online) (2005).
D.L. Donoho, For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution, Commun. Pure Appl. Math. 59, 797–829 (2006).
D.L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory 52, 1289–1306 (2006).
D. Donoho, J. Tanner, Sparse nonnegative solution of underdetermined linear equations by linear programming, Proc. Natl. Acad. Sci. USA 102, 9446–9451 (2005).
D. Donoho, J. Tanner, Counting faces of randomly-projected polytopes when the projection radically lowers dimension, J. Am. Math. Soc. 22, 1–53 (2009).
D. Donoho, J. Tanner, Counting the faces of randomly-projected hypercubes and orthants with applications, Discrete Comput. Geom. 43, 522–541 (2010).
R.M. Dudley, The sizes of compact subsets of Hilbert space and continuity of Gaussian processes, J. Funct. Anal. 1, 290–330 (1967).
M. Dyer, A. Frieze, R. Kannan, A random polynomial-time algorithm for approximating the volume of convex bodies, J. ACM 38, 1–17 (1991).
M. Fazel, Matrix rank minimization with applications, Ph.D. thesis, Department of Electrical Engineering, Stanford University (2002).
M. Figueiredo, R. Nowak, An EM algorithm for wavelet-based image restoration, IEEE Trans. Image Process. 12, 906–916 (2003).
M. Fukushima, H. Mine, A generalized proximal point algorithm for certain non-convex minimization problems, Int. J. Syst. Sci. 12, 989–1000 (1981).
M. Goemans, D. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, J. ACM 42, 1115–1145 (1995).
Y. Gordon, On Milman’s inequality and random subspaces which escape through a mesh in \(\mathbb{R}^{n}\), in Geometric Aspects of Functional Analysis, Israel Seminar 1986–1987. Lecture Notes in Mathematics, vol. 1317 (1988), pp. 84–106.
J. Gouveia, P. Parrilo, R. Thomas, Theta bodies for polynomial ideals, SIAM J. Optim. 20, 2097–2118 (2010).
E.T. Hale, W. Yin, Y. Zhang, A fixed-point continuation method for ℓ1-regularized minimization: methodology and convergence, SIAM J. Optim. 19, 1107–1130 (2008).
J. Harris, Algebraic Geometry: A First Course (Springer, New York, 1992).
J. Haupt, W.U. Bajwa, G. Raz, R. Nowak, Toeplitz compressed sensing matrices with applications to sparse channel estimation, IEEE Trans. Inf. Theory 56, 5862–5875 (2010).
S. Jagabathula, D. Shah, Inferring rankings using constrained sensing, IEEE Trans. Inf. Theory 57, 7288–7306 (2011).
L. Jones, A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training, Ann. Stat. 20, 608–613 (1992).
D. Klain, G. Rota, Introduction to Geometric Probability (Cambridge University Press, Cambridge, 1997).
T. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23, 243–255 (2001).
T. Kolda, B. Bader, Tensor decompositions and applications, SIAM Rev. 51, 455–500 (2009).
M. Ledoux, The Concentration of Measure Phenomenon (American Mathematical Society, Providence, 2000).
M. Ledoux, M. Talagrand, Probability in Banach Spaces (Springer, Berlin, 1991).
J. Löfberg, YALMIP: A toolbox for modeling and optimization in MATLAB, in Proceedings of the CACSD Conference, Taiwan (2004). Available from http://control.ee.ethz.ch/~joloef/yalmip.php.
S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program. 128, 321–353 (2011).
O. Mangasarian, B. Recht, Probability of unique integer solution to a system of linear equations, Eur. J. Oper. Res. 214, 27–30 (2011).
J. Matoušek, Lectures on Discrete Geometry (Springer, Berlin, 2002).
S. Negahban, P. Ravikumar, M. Wainwright, B. Yu, A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers, Preprint (2010).
Y. Nesterov, Quality of semidefinite relaxation for nonconvex quadratic optimization. Technical report (1997).
Y. Nesterov, Introductory Lectures on Convex Optimization (Kluwer Academic, Amsterdam, 2004).
Y. Nesterov, Gradient methods for minimizing composite functions, CORE discussion paper 76 (2007).
P.A. Parrilo, Semidefinite programming relaxations for semialgebraic problems, Math. Program. 96, 293–320 (2003).
G. Pisier, Remarques sur un résultat non publié de B. Maurey. Séminaire d’analyse fonctionnelle (Ecole Polytechnique Centre de Mathematiques, Palaiseau, 1981).
G. Pisier, Probabilistic methods in the geometry of Banach spaces, in Probability and Analysis, pp. 167–241 (1986).
E. Polak, Optimization: Algorithms and Consistent Approximations (Springer, Berlin, 1997).
H. Rauhut, Circulant and Toeplitz matrices in compressed sensing, in Proceedings of SPARS’09 (2009).
B. Recht, M. Fazel, P.A. Parrilo, Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization, SIAM Rev. 52, 471–501 (2010).
B. Recht, W. Xu, B. Hassibi, Null space conditions and thresholds for rank minimization, Math. Program., Ser. B 127, 175–211 (2011).
R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 1970).
M. Rudelson, R. Vershynin, Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements, in CISS 2006 (40th Annual Conference on Information Sciences and Systems) (2006).
R. Sanyal, F. Sottile, B. Sturmfels, Orbitopes, Preprint, arXiv:0911.5436 (2009).
N. Srebro, A. Shraibman, Rank, trace-norm and max-norm, in 18th Annual Conference on Learning Theory (COLT) (2005).
M. Stojnic, Various thresholds for ℓ1-optimization in compressed sensing, Preprint, arXiv:0907.3666 (2009).
K. Toh, M. Todd, R. Tutuncu, SDPT3—a MATLAB software package for semidefinite-quadratic-linear programming. Available from http://www.math.nus.edu.sg/~mattohkc/sdpt3.html.
K. Toh, S. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems, Pac. J. Optim. 6, 615–640 (2010).
S. van de Geer, P. Bühlmann, On the conditions used to prove oracle results for the Lasso, Electron. J. Stat. 3, 1360–1392 (2009).
S. Wright, R. Nowak, M. Figueiredo, Sparse reconstruction by separable approximation, IEEE Trans. Signal Process. 57, 2479–2493 (2009).
W. Xu, B. Hassibi, Compressive sensing over the Grassmann manifold: a unified geometric framework, IEEE Trans. Inf. Theory 57, 6894–6919 (2011).
W. Yin, S. Osher, J. Darbon, D. Goldfarb, Bregman iterative algorithms for compressed sensing and related problems, SIAM J. Imaging Sci. 1, 143–168 (2008).
G. Ziegler, Lectures on Polytopes (Springer, Berlin, 1995).
Acknowledgements
This work was supported in part by AFOSR grant FA9550-08-1-0180, in part by a MURI through ARO grant W911NF-06-1-0076, in part by a MURI through AFOSR grant FA9550-06-1-0303, in part by NSF FRG 0757207, in part through ONR award N00014-11-1-0723, and NSF award CCF-1139953.
We gratefully acknowledge Holger Rauhut for several suggestions on how to improve the presentation in Sect. 3, and Amin Jalali for pointing out an error in a previous draft. We thank Santosh Vempala, Joel Tropp, Bill Helton, Martin Jaggi, and Jonathan Kelner for helpful discussions. Finally, we acknowledge the suggestions of the associate editor Emmanuel Candès as well as the comments and pointers to references made by the reviewers, all of which improved our paper.
Additional information
Communicated by Emmanuel Candès.
Appendices
Appendix A: Proof of Proposition 3.6
Proof
First note that the Gaussian width can be upper-bounded as follows:
$$w(\mathcal{C}) = \operatorname{\mathbb{E}}\Bigl[\,\sup_{\mathbf{z}\in\mathcal{C}\cap\mathbb{S}^{p-1}} \mathbf{g}^{T}\mathbf{z}\Bigr] \leq \operatorname{\mathbb{E}}\Bigl[\,\sup_{\mathbf{z}\in\mathcal{C}\cap\mathcal{B}(0,1)} \mathbf{g}^{T}\mathbf{z}\Bigr], \qquad(30)$$
where \(\mathcal{B}(0,1)\) denotes the unit Euclidean ball. The expression on the right-hand side inside the expected value can be expressed as the optimal value of the following convex optimization problem for each \(\mathbf{g}\in\mathbb{R}^{p}\):
$$\max_{\mathbf{z}}\ \mathbf{g}^{T}\mathbf{z} \quad\text{s.t.}\quad \mathbf{z}\in\mathcal{C},\ \|\mathbf{z}\|^{2}\leq1. \qquad(31)$$
We now proceed to form the dual problem of (31) by first introducing the Lagrangian
$$\mathcal{L}(\mathbf{z},\mathbf{u},\gamma) = \mathbf{g}^{T}\mathbf{z} - \mathbf{u}^{T}\mathbf{z} - \gamma\bigl(\|\mathbf{z}\|^{2}-1\bigr),$$
where \(\mathbf{u}\in\mathcal{C}^{\ast}\) and γ≥0 is a scalar. (Since \(\mathbf{u}^{T}\mathbf{z}\leq0\) for every \(\mathbf{z}\in\mathcal{C}\), the Lagrangian upper-bounds the objective on the feasible set.) To obtain the dual problem we maximize the Lagrangian with respect to z, which amounts to setting
$$\mathbf{z} = \frac{1}{2\gamma}(\mathbf{g}-\mathbf{u}).$$
Putting this into the Lagrangian above gives the dual problem
$$\min_{\mathbf{u}\in\mathcal{C}^{\ast},\ \gamma\geq0}\ \frac{\|\mathbf{g}-\mathbf{u}\|^{2}}{4\gamma} + \gamma.$$
Solving this optimization problem with respect to γ we find that \(\gamma= \frac{1}{2} \|\mathbf{g}-\mathbf{u}\|\), which gives the dual problem to (31):
$$\min_{\mathbf{u}\in\mathcal{C}^{\ast}}\ \|\mathbf{g}-\mathbf{u}\| = \operatorname{dist}\bigl(\mathbf{g},\mathcal{C}^{\ast}\bigr). \qquad(32)$$
Under very mild assumptions about \(\mathcal{C}\), the optimal value of (32) is equal to that of (31) (for example, as long as \(\mathcal{C}\) has a nonempty relative interior, strong duality holds). Hence we have derived
$$w(\mathcal{C}) \leq \operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(\mathbf{g},\mathcal{C}^{\ast}\bigr)\bigr].$$
This equation combined with the bound (30) gives us the desired result. □
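For a concrete feel for this bound, consider the nonnegative orthant \(\mathcal{C}=\mathbb{R}^{p}_{+}\), whose polar cone is the nonpositive orthant; by the Moreau decomposition, dist(g, C*) = ‖max(g, 0)‖. The following Monte Carlo snippet (our own illustration, assuming numpy, not part of the paper) estimates both sides of the inequality in a small dimension where the gap between the sphere and ball formulations is visible:

```python
# Monte Carlo check (ours) of w(C) <= E[dist(g, C*)] for the nonnegative
# orthant C in R^3, whose polar cone C* is the nonpositive orthant.
import numpy as np

rng = np.random.default_rng(1)
p, trials = 3, 200000
G = rng.standard_normal((trials, p))

proj = np.linalg.norm(np.maximum(G, 0.0), axis=1)   # dist(g, C*) = ||g_+||
# sup of <g, z> over unit z in C: ||g_+|| if g_+ != 0, otherwise max_i g_i
sup_sphere = np.where(proj > 0, proj, G.max(axis=1))

print("estimated w(C)          :", sup_sphere.mean())
print("estimated E[dist(g, C*)]:", proj.mean())
```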
Appendix B: Proof of Theorem 3.9
Proof
We set \(\beta= \tfrac{1}{\varTheta}\). First note that if \(\beta\geq \tfrac{1}{4} \exp\{\tfrac{p}{9}\}\) then the width bound exceeds \(\sqrt{p}\), which is the maximal possible value for the width of \(\mathcal{C}\). Thus, we will assume throughout that \(\beta\leq \tfrac{1}{4} \exp\{\tfrac{p}{9}\}\).
Using Proposition 3.6 we need to upper-bound the expected distance to the polar cone. Let \(\mathbf{g}\sim\mathcal {N}(0,I)\) be a normally distributed random vector. Then the norm of g is independent from the angle of g. That is, ∥g∥ is independent from g/∥g∥. Moreover g/∥g∥ is distributed as a uniform sample on \(\mathbb{S}^{p-1}\), and \(\operatorname{\mathbb{E}}_{\mathbf{g}}[\|\mathbf{g}\|]\leq\sqrt{p}\). Since \(\mathcal{C}^{\ast}\) is a cone, \(\operatorname{dist}(\mathbf{g},\mathcal{C}^{\ast}) = \|\mathbf{g}\|\operatorname{dist}(\mathbf{g}/\|\mathbf{g}\|,\mathcal{C}^{\ast})\), and thus we have
$$\operatorname{\mathbb{E}}_{\mathbf{g}}\bigl[\operatorname{dist}\bigl(\mathbf{g},\mathcal{C}^{\ast}\bigr)\bigr] = \operatorname{\mathbb{E}}_{\mathbf{g}}\bigl[\|\mathbf{g}\|\bigr]\,\operatorname{\mathbb{E}}_{\mathbf{u}}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\bigr)\bigr] \leq \sqrt{p}\ \operatorname{\mathbb{E}}_{\mathbf{u}}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1}\bigr)\bigr],$$
where u is sampled uniformly on \(\mathbb{S}^{p-1}\).
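As a quick illustration of these two facts (our own numpy snippet, not part of the paper), one can check empirically that for a standard Gaussian vector the radius ‖g‖ stays below √p on average and is uncorrelated with the angular part g/‖g‖:

```python
# Quick numpy illustration (ours) of the facts used above: for g ~ N(0, I_p),
# E[||g||] <= sqrt(p), and the radius is independent of the angular part
# (checked here only through the correlation with one angular coordinate).
import numpy as np

rng = np.random.default_rng(4)
p, trials = 30, 100000
G = rng.standard_normal((trials, p))
norms = np.linalg.norm(G, axis=1)
print(f"E[||g||] ~ {norms.mean():.3f} <= sqrt(p) = {np.sqrt(p):.3f}")

angular = G[:, 0] / norms
print(f"corr(||g||, u_1) ~ {np.corrcoef(norms, angular)[0, 1]:.4f}")
```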
To bound the latter quantity, we will use isoperimetry. Suppose A is a subset of \(\mathbb{S}^{p-1}\) and B is a spherical cap with the same volume as A. Let N(A,r) denote the locus of all points in the sphere of Euclidean distance at most r from the set A. Let μ denote the Haar measure on \(\mathbb{S}^{p-1}\) and let μ(A;r) denote the measure of N(A,r). Then spherical isoperimetry states that μ(A;r)≥μ(B;r) for all r≥0 (see, for example, [48, 53]).
Let B now denote a spherical cap with \(\mu(B)=\mu(\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1})\). Then we have
$$\operatorname{\mathbb{E}}_{\mathbf{u}}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1}\bigr)\bigr] = \int_{0}^{\infty}\mathbb{P}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1}\bigr)>r\bigr]\,\mathrm{d}r = \int_{0}^{\infty}\bigl(1-\mu\bigl(\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1};r\bigr)\bigr)\,\mathrm{d}r \leq \int_{0}^{\infty}\bigl(1-\mu(B;r)\bigr)\,\mathrm{d}r,$$
where the first equality is the integral form of the expected value and the last inequality follows by isoperimetry. Hence we can bound the expected distance to the polar cone intersecting the sphere using only knowledge of the volume of spherical caps on \(\mathbb{S}^{p-1}\).
To proceed let v(φ) denote the volume of a spherical cap subtending a solid angle φ. An explicit formula for v(φ) is
$$v(\varphi) = \frac{1}{z_{p}}\int_{0}^{\varphi}\sin^{p-1}(\vartheta)\,\mathrm{d}\vartheta,$$
where \(z_{p} = \int_{0}^{\pi}\sin^{p-1}(\vartheta)\,\mathrm{d}\vartheta\) [45]. Let φ(β) denote the minimal solid angle of a cap such that β copies of that cap cover \(\mathbb {S}^{p-1}\). Since the geodesic distance on the sphere is always greater than or equal to the Euclidean distance, if K is a spherical cap subtending ψ radians, μ(K;t)≥v(ψ+t). Therefore
$$\operatorname{\mathbb{E}}_{\mathbf{u}}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1}\bigr)\bigr] \leq \int_{0}^{\infty}\bigl(1-v\bigl(\varphi(\beta)+t\bigr)\bigr)\,\mathrm{d}t.$$
We can proceed to simplify the right-hand side integral:
(43) follows by switching the order of integration, and the rest of these equalities follow by straightforward integration and some algebra.
Using the inequalities \(z_{p} \geq\frac{2}{\sqrt{p-1}}\) (see [48]) and \(\sin(x)\leq\exp(-(x-\pi/2)^{2}/2)\) for x∈[0,π], we can bound the last integral as
Performing the change of variables \(a = \sqrt{p-1}(\vartheta-\tfrac {\pi}{2})\), we are left with the integral
In this final bound, we bounded the first term by dropping the upper integrand, and for the second term we used the fact that
We are now left with the task of computing a lower bound for φ(β). We need to first reparameterize the problem. Let K be a spherical cap. Without loss of generality, we may assume that
$$K = \bigl\{\mathbf{x}\in\mathbb{S}^{p-1} : x_{1}\geq h\bigr\}$$
for some h∈[0,1]. Here h is the height of the cap over the equator. Via elementary trigonometry, the solid angle that K subtends is given by \(\pi/2-\sin^{-1}(h)\). Hence, if h(β) is the largest number such that β caps of height h(β) cover \(\mathbb {S}^{p-1}\), then h(β)=sin(π/2−φ(β)).
The quantity h(β) may be estimated using the following lemma from [11]. For h∈[0,1], let γ(p,h) denote the volume of a spherical cap of \(\mathbb{S}^{p-1}\) of height h.
Lemma B.1
(See [11])
For \(1\geq h\geq\frac{2}{\sqrt{p}}\),
Note that for \(h \geq\frac{2}{\sqrt{p}}\),
So if
$$h = 3\sqrt{\frac{\log(4\beta)}{p}},$$
then h≤1 because we have assumed \(\beta\leq\tfrac{1}{4} \exp\{ \tfrac{p}{9}\}\) and p≥9. Moreover, \(h\geq\frac{2}{\sqrt{p}}\) and the volume of the cap with height h is less than or equal to 1/β. That is,
$$\gamma(p,h)\leq\frac{1}{\beta}.$$
Combining the estimate (50) with Proposition 3.6, and using our estimate for φ(β), we get the bound
This expression can be simplified by using the following bounds. First, sin−1(x)≥x lets us upper-bound the first term by \(\sqrt{\frac{p}{p-1}}\frac{1}{8\beta}\). For the second term, using the inequality \(\sin^{-1}(x)\leq\tfrac{\pi}{2}x\) results in the upper bound
For p≥9 the upper bound can be expressed simply as \(w(\mathcal{C})\leq3\sqrt{\log(4 \beta)}\). We recall that \(\beta= \tfrac{1}{\varTheta}\), which completes the proof of the theorem. □
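The two analytic inequalities invoked in this proof can be spot-checked numerically; the following snippet (ours, not from the paper, assuming numpy and scipy are available) verifies \(z_{p}\geq2/\sqrt{p-1}\) for several values of p and the pointwise bound sin(x)≤exp(−(x−π/2)²/2) on [0,π]:

```python
# Numeric spot-check (ours) of two inequalities used above:
#   z_p = int_0^pi sin^{p-1}(t) dt >= 2/sqrt(p-1)   and
#   sin(x) <= exp(-(x - pi/2)^2 / 2) on [0, pi].
import numpy as np
from scipy.integrate import quad

for p in [9, 25, 100, 1000]:
    z_p, _ = quad(lambda t: np.sin(t) ** (p - 1), 0.0, np.pi, points=[np.pi / 2])
    assert z_p >= 2.0 / np.sqrt(p - 1), (p, z_p)
    print(f"p={p:5d}: z_p={z_p:.6f} >= {2/np.sqrt(p-1):.6f}")

x = np.linspace(0.0, np.pi, 100001)
assert np.all(np.sin(x) <= np.exp(-((x - np.pi / 2) ** 2) / 2) + 1e-12)
print("sin(x) <= exp(-(x - pi/2)^2/2) verified on a grid over [0, pi]")
```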
Appendix C: Direct Width Calculations
We first give the proof of Proposition 3.10.
Proof
Let \(\mathbf{x}^{\star}\) be an s-sparse vector in \(\mathbb{R}^{p}\) with ℓ1 norm equal to 1, and let \(\mathcal{A}\) denote the set of unit-Euclidean-norm one-sparse vectors. Let Δ denote the set of coordinates where \(\mathbf{x}^{\star}\) is nonzero. The normal cone at \(\mathbf{x}^{\star}\) with respect to the ℓ1 ball is given by
$$N\bigl(\mathbf{x}^{\star}\bigr) = \bigl\{\mathbf{z}\in\mathbb{R}^{p} : z_{i} = t\operatorname{sgn}\bigl(x^{\star}_{i}\bigr)\ \text{for}\ i\in\Delta,\ |z_{i}|\leq t\ \text{for}\ i\in\Delta^{c},\ \text{for some}\ t\geq0\bigr\}.$$
Here \(\Delta^{c}\) represents the zero entries of \(\mathbf{x}^{\star}\). The minimum squared distance to the normal cone at \(\mathbf{x}^{\star}\) can be formulated as a one-dimensional convex optimization problem for arbitrary \(\mathbf{z}\in\mathbb{R}^{p}\):
$$\operatorname{dist}\bigl(\mathbf{z},N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2} = \min_{t\geq0}\ \sum_{i\in\Delta}\bigl(z_{i}-t\operatorname{sgn}\bigl(x^{\star}_{i}\bigr)\bigr)^{2} + \sum_{i\in\Delta^{c}}\operatorname{shrink}(z_{i},t)^{2},$$
where
$$\operatorname{shrink}(z,t) = \begin{cases} z+t, & z<-t,\\ 0, & -t\leq z\leq t,\\ z-t, & z>t \end{cases}$$
is the ℓ1-shrinkage (soft-thresholding) function. Hence, for any fixed t≥0 independent of g, we have
$$\operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(\mathbf{g},N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2}\bigr] \leq s\bigl(1+t^{2}\bigr) + (p-s)\operatorname{\mathbb{E}}\bigl[\operatorname{shrink}(g,t)^{2}\bigr].$$
Now we directly integrate the second term, treating each summand individually. For a zero-mean, unit-variance normal random variable g,
$$\operatorname{\mathbb{E}}\bigl[\operatorname{shrink}(g,t)^{2}\bigr] = 2\int_{t}^{\infty}(g-t)^{2}\frac{1}{\sqrt{2\pi}}e^{-g^{2}/2}\,\mathrm{d}g = 2\bigl(1+t^{2}\bigr)Q(t) - \frac{2t}{\sqrt{2\pi}}e^{-t^{2}/2} \leq \sqrt{\frac{2}{\pi}}\,\frac{e^{-t^{2}/2}}{t}.$$
The first simplification follows because the shrink function and Gaussian distributions are symmetric about the origin. The second equality follows by integrating by parts. The inequality follows by a tight bound on the Gaussian Q-function:
$$Q(t) := \int_{t}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-g^{2}/2}\,\mathrm{d}g \leq \frac{1}{\sqrt{2\pi}}\,\frac{e^{-t^{2}/2}}{t}.$$
Using this bound, we get
$$\operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(\mathbf{g},N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2}\bigr] \leq s\bigl(1+t^{2}\bigr) + (p-s)\sqrt{\frac{2}{\pi}}\,\frac{e^{-t^{2}/2}}{t}.$$
Setting \(t= \sqrt{2\log(p/s)}\) gives
$$\operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(\mathbf{g},N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2}\bigr] \leq s + 2s\log(p/s) + s\,\frac{1-s/p}{\sqrt{\pi\log(p/s)}} \leq 2s\log(p/s) + \frac{3s}{2}.$$
The last inequality follows because
$$\frac{1-s/p}{\sqrt{\pi\log(p/s)}}\leq\frac{1}{2}$$
whenever 0≤s≤p. □
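To see that the final bound is of the right order, the following Monte Carlo sketch (our own illustration; the grid minimization over t is a hypothetical stand-in for the exact one-dimensional minimization) estimates \(\operatorname{\mathbb{E}}[\operatorname{dist}(\mathbf{g},N(\mathbf{x}^{\star}))^{2}]\) and compares it with 2s log(p/s) + 3s/2 as reconstructed above:

```python
# Monte Carlo check (ours) of E[dist(g, N(x*))^2] <= 2 s log(p/s) + 3s/2
# for the normal cone of the l1 ball at an s-sparse point.
import numpy as np

rng = np.random.default_rng(2)
p, s, trials = 400, 10, 500
signs = np.ones(s)               # sgn(x*) on the support (wlog first s coords, all +1)
ts = np.linspace(0.0, 6.0, 601)  # grid over the cone parameter t

def dist_sq(g):
    # squared distance to the normal cone, minimized over t on the grid
    on = ((g[:s, None] - signs[:, None] * ts[None, :]) ** 2).sum(axis=0)
    off = np.maximum(np.abs(g[s:, None]) - ts[None, :], 0.0) ** 2  # shrink(g, t)^2
    return (on + off.sum(axis=0)).min()

est = np.mean([dist_sq(rng.standard_normal(p)) for _ in range(trials)])
bound = 2 * s * np.log(p / s) + 1.5 * s
print(f"estimated E[dist^2] = {est:.1f}  <=  bound = {bound:.1f}")
```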
Next we give the proof of Proposition 3.11.
Proof
Let \(\mathbf{x}^{\star}\) be an \(m_{1}\times m_{2}\) matrix of rank r with singular value decomposition \(U\varSigma V^{\ast}\), and let \(\mathcal{A}\) denote the set of rank-one unit-Euclidean-norm matrices of size \(m_{1}\times m_{2}\). Without loss of generality, impose the conventions \(m_{1}\leq m_{2}\), Σ is r×r, U is \(m_{1}\times r\), and V is \(m_{2}\times r\), and assume the nuclear norm of \(\mathbf{x}^{\star}\) is equal to 1.
Let \(\mathbf{u}_{k}\) (respectively \(\mathbf{v}_{k}\)) denote the kth column of U (respectively V). It is convenient to introduce the orthogonal decomposition \({\mathbb{R}}^{m_{1} \times m_{2}} = \Delta\oplus\Delta ^{\perp}\), where Δ is the linear space spanned by elements of the form \(\mathbf{u}_{k}\mathbf{z}^{T}\) and \(\mathbf{y}\mathbf{v}_{k}^{T}\), 1≤k≤r, where z and y are arbitrary, and \(\Delta^{\perp}\) is the orthogonal complement of Δ. The space \(\Delta^{\perp}\) is the subspace of matrices spanned by the family \(\mathbf{y}\mathbf{z}^{T}\), where y (respectively z) is any vector orthogonal to all the columns of U (respectively V). The normal cone of the nuclear norm ball at \(\mathbf{x}^{\star}\) is given by the cone generated by the subdifferential at \(\mathbf{x}^{\star}\):
$$N\bigl(\mathbf{x}^{\star}\bigr) = \bigl\{t\bigl(UV^{\ast} + Z\bigr) : t\geq0,\ Z\in\Delta^{\perp},\ \|Z\|_{\mathcal{A}}^{\ast}\leq1\bigr\}.$$
Note that here \(\|Z\|_{\mathcal{A}}^{\ast}\) is the operator norm, equal to the maximum singular value of Z [63].
Let G be a Gaussian random matrix with i.i.d. entries, each with mean zero and unit variance. Then the matrix
$$Z_{N} = \bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast}\,UV^{\ast} + \mathcal{P}_{\Delta^{\perp}}(G)$$
is in the normal cone at \(\mathbf{x}^{\star}\). We can then compute
$$\begin{aligned} \operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(G,N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2}\bigr] &\leq \operatorname{\mathbb{E}}\bigl[\|G-Z_{N}\|_{F}^{2}\bigr] = \operatorname{\mathbb{E}}\Bigl[\bigl\|\mathcal{P}_{\Delta}(G) - \bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast}\,UV^{\ast}\bigr\|_{F}^{2}\Bigr] \\ &= \operatorname{\mathbb{E}}\bigl[\bigl\|\mathcal{P}_{\Delta}(G)\bigr\|_{F}^{2}\bigr] + \operatorname{\mathbb{E}}\Bigl[\bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast\,2}\Bigr]\bigl\|UV^{\ast}\bigr\|_{F}^{2} \qquad(79)\\ &= r(m_{1}+m_{2}-r) + r\,\operatorname{\mathbb{E}}\Bigl[\bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast\,2}\Bigr]. \qquad(80) \end{aligned}$$
Here (79) follows because \(\mathcal{P}_{\Delta }(G)\) and \(\mathcal{P}_{\Delta^{\perp}}(G)\) are independent. The final line follows because \(\dim(\Delta)=r(m_{1}+m_{2}-r)\) and the Frobenius (i.e., Euclidean) norm of \(UV^{\ast}\) is \(\|UV^{*}\|_{F}=\sqrt{r}\). Due to the isotropy of Gaussian random matrices, \(\mathcal{P}_{\Delta^{\perp}}(G)\) is identically distributed as an \((m_{1}-r)\times(m_{2}-r)\) matrix with i.i.d. Gaussian entries each with mean zero and variance one. We thus know that
$$\operatorname{\mathbb{E}}\bigl[\bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast}\bigr] \leq \sqrt{m_{1}-r}+\sqrt{m_{2}-r}$$
(see, for example, [22]). To bound the latter expectation, we again use the integral form of the expected value. Letting \(\mu_{\Delta^{\perp}}\) denote the quantity \(\sqrt{m_{1}-r}+\sqrt{m_{2}-r}\), we have
Using this bound in (80), we obtain
where the second inequality follows from the fact that \((a+b)^{2}\leq2a^{2}+2b^{2}\). We conclude that \(3r(m_{1}+m_{2}-r)\) random measurements are sufficient to recover a rank-r \(m_{1}\times m_{2}\) matrix using the nuclear norm heuristic. □
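The spectral-norm expectation bound used in this proof is easy to probe empirically; here is a small numpy experiment (ours, not from the paper) for a Gaussian matrix:

```python
# Monte Carlo check (ours) of E[||G||_op] <= sqrt(m1) + sqrt(m2) for a Gaussian
# m1 x m2 matrix, the spectral-norm bound invoked above (see, e.g., [22]).
import numpy as np

rng = np.random.default_rng(3)
m1, m2, trials = 40, 60, 500
ops = [np.linalg.norm(rng.standard_normal((m1, m2)), ord=2) for _ in range(trials)]
print(f"E[||G||_op] ~ {np.mean(ops):.2f}  <=  {np.sqrt(m1) + np.sqrt(m2):.2f}")
```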
Cite this article
Chandrasekaran, V., Recht, B., Parrilo, P.A. et al. The Convex Geometry of Linear Inverse Problems. Found Comput Math 12, 805–849 (2012). https://doi.org/10.1007/s10208-012-9135-7
Keywords
- Convex optimization
- Semidefinite programming
- Atomic norms
- Real algebraic geometry
- Gaussian width
- Symmetry