Abstract
In applications throughout science and engineering one is often faced with the challenge of solving an ill-posed inverse problem, where the number of available measurements is smaller than the dimension of the model to be estimated. However, in many practical situations of interest, models are constrained structurally so that they only have a few degrees of freedom relative to their ambient dimension. This paper provides a general framework to convert notions of simplicity into convex penalty functions, resulting in convex optimization solutions to linear, underdetermined inverse problems. The class of simple models considered includes those formed as the sum of a few atoms from some (possibly infinite) elementary atomic set; examples include well-studied cases from many technical fields such as sparse vectors (signal processing, statistics) and low-rank matrices (control, statistics), as well as several others including sums of a few permutation matrices (ranked elections, multiobject tracking), low-rank tensors (computer vision, neuroscience), orthogonal matrices (machine learning), and atomic measures (system identification). The convex programming formulation is based on minimizing the norm induced by the convex hull of the atomic set; this norm is referred to as the atomic norm. The facial structure of the atomic norm ball carries a number of favorable properties that are useful for recovering simple models, and an analysis of the underlying convex geometry provides sharp estimates of the number of generic measurements required for exact and robust recovery of models from partial information. These estimates are based on computing the Gaussian widths of tangent cones to the atomic norm ball. When the atomic set has algebraic structure, the resulting optimization problems can be solved or approximated via semidefinite programming. The quality of these approximations affects the number of measurements required for recovery, and this tradeoff is characterized via some examples. Thus this work extends the catalog of simple models (beyond sparse vectors and low-rank matrices) that can be recovered from limited linear information via tractable convex programming.
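To make the recipe concrete, here is a minimal numerical sketch (our illustration, not part of the paper; numpy and cvxpy are assumed to be available) for the simplest atomic set, the signed unit coordinate vectors, whose atomic norm is the ℓ1 norm: a sparse vector is recovered from generic Gaussian measurements by atomic-norm minimization.

```python
# Minimal sketch of atomic-norm minimization (here the l1 norm, since the
# atoms are the signed unit coordinate vectors). Assumes numpy and cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
p, s = 200, 5                      # ambient dimension, sparsity
n = 80                             # comfortably above the ~2 s log(p/s) scaling

x_star = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
x_star[support] = rng.standard_normal(s)

Phi = rng.standard_normal((n, p))  # generic Gaussian measurement operator
y = Phi @ x_star

x = cp.Variable(p)
cp.Problem(cp.Minimize(cp.norm1(x)), [Phi @ x == y]).solve()
print("recovery error:", np.linalg.norm(x.value - x_star))
```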
Notes
A spherical cap is a subset of the sphere obtained by intersecting the sphere \(\mathbb{S}^{p-1}\) with a halfspace.
While Proposition 3.15 follows as a consequence of the general result in Corollary 3.14, one can remove the constant factor 9 in the statement of Proposition 3.15 by carrying out a more refined analysis of the Birkhoff polytope.
References
S. Aja-Fernandez, R. Garcia, D. Tao, X. Li, Tensors in Image Processing and Computer Vision. Advances in Pattern Recognition (Springer, Berlin, 2009).
N. Alon, A. Naor, Approximating the cut-norm via Grothendieck’s inequality, SIAM J. Comput. 35, 787–803 (2006).
A. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theory 39, 930–945 (1993).
A. Barvinok, A Course in Convexity (American Mathematical Society, Providence, 2002).
C. Beckmann, S. Smith, Tensorial extensions of independent component analysis for multisubject FMRI analysis, NeuroImage 25, 294–311 (2005).
D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods (Athena Scientific, Nashua, 2007).
D. Bertsekas, A. Nedic, A. Ozdaglar, Convex Analysis and Optimization (Athena Scientific, Nashua, 2003).
P. Bickel, Y. Ritov, A. Tsybakov, Simultaneous analysis of Lasso and Dantzig selector, Ann. Stat. 37, 1705–1732 (2009).
J. Bochnak, M. Coste, M. Roy, Real Algebraic Geometry (Springer, Berlin, 1988).
F.F. Bonsall, A general atomic decomposition theorem and Banach’s closed range theorem, Q. J. Math. 42, 9–14 (1991).
A. Brieden, P. Gritzmann, R. Kannan, V. Klee, L. Lovász, M. Simonovits, Approximation of diameters: randomization doesn’t help, in Proceedings of the 39th Annual Symposium on Foundations of Computer Science (1998), pp. 244–251.
J. Cai, E. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM J. Optim. 20, 1956–1982 (2010).
J. Cai, S. Osher, Z. Shen, Linearized Bregman iterations for compressed sensing, Math. Comput. 78, 1515–1536 (2009).
E. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis? J. ACM 58, 1–37 (2011).
E. Candès, Y. Plan, Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements, IEEE Trans. Inf. Theory 57, 2342–2359 (2011).
E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory 52, 489–509 (2006).
E.J. Candès, B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math. 9, 717–772 (2009).
E. Candès, T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory 51, 4203–4215 (2005).
V. Chandrasekaran, S. Sanghavi, P.A. Parrilo, A.S. Willsky, Rank-sparsity incoherence for matrix decomposition, SIAM J. Optim. 21, 572–596 (2011).
P. Combettes, V. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Model. Simul. 4, 1168–1200 (2005).
I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math. 57, 1413–1457 (2004).
K.R. Davidson, S.J. Szarek, Local operator theory, random matrices and Banach spaces, in Handbook of the Geometry of Banach Spaces, vol. I (2001), pp. 317–366.
V. de Silva, L. Lim, Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl. 30, 1084–1127 (2008).
R. DeVore, V. Temlyakov, Some remarks on greedy algorithms, Adv. Comput. Math. 5, 173–187 (1996).
M. Deza, M. Laurent, Geometry of Cuts and Metrics (Springer, Berlin, 1997).
D.L. Donoho, High-dimensional centrally-symmetric polytopes with neighborliness proportional to dimension, Discrete Comput. Geom. (online) (2005).
D.L. Donoho, For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution, Commun. Pure Appl. Math. 59, 797–829 (2006).
D.L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory 52, 1289–1306 (2006).
D. Donoho, J. Tanner, Sparse nonnegative solution of underdetermined linear equations by linear programming, Proc. Natl. Acad. Sci. USA 102, 9446–9451 (2005).
D. Donoho, J. Tanner, Counting faces of randomly-projected polytopes when the projection radically lowers dimension, J. Am. Math. Soc. 22, 1–53 (2009).
D. Donoho, J. Tanner, Counting the faces of randomly-projected hypercubes and orthants with applications, Discrete Comput. Geom. 43, 522–541 (2010).
R.M. Dudley, The sizes of compact subsets of Hilbert space and continuity of Gaussian processes, J. Funct. Anal. 1, 290–330 (1967).
M. Dyer, A. Frieze, R. Kannan, A random polynomial-time algorithm for approximating the volume of convex bodies, J. ACM 38, 1–17 (1991).
M. Fazel, Matrix rank minimization with applications, Ph.D. thesis, Department of Electrical Engineering, Stanford University (2002).
M. Figueiredo, R. Nowak, An EM algorithm for wavelet-based image restoration, IEEE Trans. Image Process. 12, 906–916 (2003).
M. Fukushima, H. Mine, A generalized proximal point algorithm for certain non-convex minimization problems, Int. J. Syst. Sci. 12, 989–1000 (1981).
M. Goemans, D. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, J. ACM 42, 1115–1145 (1995).
Y. Gordon, On Milman’s inequality and random subspaces which escape through a mesh in \(\mathbb{R}^{n}\), in Geometric Aspects of Functional Analysis, Israel Seminar 1986–1987. Lecture Notes in Mathematics, vol. 1317 (1988), pp. 84–106.
J. Gouveia, P. Parrilo, R. Thomas, Theta bodies for polynomial ideals, SIAM J. Optim. 20, 2097–2118 (2010).
E.T. Hale, W. Yin, Y. Zhang, A fixed-point continuation method for ℓ1-regularized minimization: methodology and convergence, SIAM J. Optim. 19, 1107–1130 (2008).
J. Harris, Algebraic Geometry: A First Course (Springer, New York, 1992).
J. Haupt, W.U. Bajwa, G. Raz, R. Nowak, Toeplitz compressed sensing matrices with applications to sparse channel estimation, IEEE Trans. Inf. Theory 56, 5862–5875 (2010).
S. Jagabathula, D. Shah, Inferring rankings using constrained sensing, IEEE Trans. Inf. Theory 57, 7288–7306 (2011).
L. Jones, A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training, Ann. Stat. 20, 608–613 (1992).
D. Klain, G. Rota, Introduction to Geometric Probability (Cambridge University Press, Cambridge, 1997).
T. Kolda, Orthogonal tensor decompositions, SIAM J. Matrix Anal. Appl. 23, 243–255 (2001).
T. Kolda, B. Bader, Tensor decompositions and applications, SIAM Rev. 51, 455–500 (2009).
M. Ledoux, The Concentration of Measure Phenomenon (American Mathematical Society, Providence, 2000).
M. Ledoux, M. Talagrand, Probability in Banach Spaces (Springer, Berlin, 1991).
J. Löfberg, YALMIP: A toolbox for modeling and optimization in MATLAB, in Proceedings of the CACSD Conference, Taiwan (2004). Available from http://control.ee.ethz.ch/~joloef/yalmip.php.
S. Ma, D. Goldfarb, L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Math. Program. 128, 321–353 (2011).
O. Mangasarian, B. Recht, Probability of unique integer solution to a system of linear equations, Eur. J. Oper. Res. 214, 27–30 (2011).
J. Matoušek, Lectures on Discrete Geometry (Springer, Berlin, 2002).
S. Negahban, P. Ravikumar, M. Wainwright, B. Yu, A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers, Preprint (2010).
Y. Nesterov, Quality of semidefinite relaxation for nonconvex quadratic optimization. Technical report (1997).
Y. Nesterov, Introductory Lectures on Convex Optimization (Kluwer Academic, Amsterdam, 2004).
Y. Nesterov, Gradient methods for minimizing composite functions, CORE discussion paper 76 (2007).
P.A. Parrilo, Semidefinite programming relaxations for semialgebraic problems, Math. Program. 96, 293–320 (2003).
G. Pisier, Remarques sur un résultat non publié de B. Maurey. Séminaire d’analyse fonctionnelle (Ecole Polytechnique Centre de Mathematiques, Palaiseau, 1981).
G. Pisier, Probabilistic methods in the geometry of Banach spaces, in Probability and Analysis, pp. 167–241 (1986).
E. Polak, Optimization: Algorithms and Consistent Approximations (Springer, Berlin, 1997).
H. Rauhut, Circulant and Toeplitz matrices in compressed sensing, in Proceedings of SPARS’09 (2009).
B. Recht, M. Fazel, P.A. Parrilo, Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization, SIAM Rev. 52, 471–501 (2010).
B. Recht, W. Xu, B. Hassibi, Null space conditions and thresholds for rank minimization, Math. Program., Ser. B 127, 175–211 (2011).
R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 1970).
M. Rudelson, R. Vershynin, Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements, in CISS 2006 (40th Annual Conference on Information Sciences and Systems) (2006).
R. Sanyal, F. Sottile, B. Sturmfels, Orbitopes, Preprint, arXiv:0911.5436 (2009).
N. Srebro, A. Shraibman, Rank, trace-norm and max-norm, in 18th Annual Conference on Learning Theory (COLT) (2005).
M. Stojnic, Various thresholds for ℓ1-optimization in compressed sensing, Preprint, arXiv:0907.3666 (2009).
K. Toh, M. Todd, R. Tutuncu, SDPT3—a MATLAB software package for semidefinite-quadratic-linear programming. Available from http://www.math.nus.edu.sg/~mattohkc/sdpt3.html.
K. Toh, S. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems, Pac. J. Optim. 6, 615–640 (2010).
S. van de Geer, P. Bühlmann, On the conditions used to prove oracle results for the Lasso, Electron. J. Stat. 3, 1360–1392 (2009).
S. Wright, R. Nowak, M. Figueiredo, Sparse reconstruction by separable approximation, IEEE Trans. Signal Process. 57, 2479–2493 (2009).
W. Xu, B. Hassibi, Compressive sensing over the Grassmann manifold: a unified geometric framework, IEEE Trans. Inf. Theory 57, 6894–6919 (2011).
W. Yin, S. Osher, J. Darbon, D. Goldfarb, Bregman iterative algorithms for compressed sensing and related problems, SIAM J. Imaging Sci. 1, 143–168 (2008).
G. Ziegler, Lectures on Polytopes (Springer, Berlin, 1995).
Acknowledgements
This work was supported in part by AFOSR grant FA9550-08-1-0180, in part by a MURI through ARO grant W911NF-06-1-0076, in part by a MURI through AFOSR grant FA9550-06-1-0303, in part by NSF FRG 0757207, in part through ONR award N00014-11-1-0723, and NSF award CCF-1139953.
We gratefully acknowledge Holger Rauhut for several suggestions on how to improve the presentation in Sect. 3, and Amin Jalali for pointing out an error in a previous draft. We thank Santosh Vempala, Joel Tropp, Bill Helton, Martin Jaggi, and Jonathan Kelner for helpful discussions. Finally, we acknowledge the suggestions of the associate editor Emmanuel Candès as well as the comments and pointers to references made by the reviewers, all of which improved our paper.
Additional information
Communicated by Emmanuel Candès.
Appendices
Appendix A: Proof of Proposition 3.6
Proof
First note that the Gaussian width can be upper-bounded as follows:
$$w(\mathcal{C}) = \operatorname{\mathbb{E}}\Bigl[\,\sup_{\mathbf{z}\in\mathcal{C}\cap\mathbb{S}^{p-1}} \mathbf{g}^{T}\mathbf{z}\Bigr] \leq \operatorname{\mathbb{E}}\Bigl[\,\sup_{\mathbf{z}\in\mathcal{C}\cap\mathcal{B}(0,1)} \mathbf{g}^{T}\mathbf{z}\Bigr], \qquad(30)$$
where \(\mathcal{B}(0,1)\) denotes the unit Euclidean ball. The expression on the right-hand side inside the expected value can be expressed as the optimal value of the following convex optimization problem for each \(\mathbf{g}\in\mathbb{R}^{p}\):
$$\max_{\mathbf{z}}\ \mathbf{g}^{T}\mathbf{z} \quad\text{s.t.}\quad \mathbf{z}\in\mathcal{C},\ \|\mathbf{z}\|^{2}\leq1. \qquad(31)$$
We now proceed to form the dual problem of (31) by first introducing the Lagrangian
$$\mathcal{L}(\mathbf{z},\mathbf{u},\gamma) = \mathbf{g}^{T}\mathbf{z} - \mathbf{u}^{T}\mathbf{z} - \gamma\bigl(\|\mathbf{z}\|^{2}-1\bigr),$$
where \(\mathbf{u}\in\mathcal{C}^{\ast}\) and γ≥0 is a scalar. (Since \(\mathbf{u}^{T}\mathbf{z}\leq0\) for every \(\mathbf{z}\in\mathcal{C}\), the Lagrangian upper-bounds the objective on the feasible set.) To obtain the dual problem we maximize the Lagrangian with respect to z, which amounts to setting
$$\mathbf{z} = \frac{1}{2\gamma}(\mathbf{g}-\mathbf{u}).$$
Putting this into the Lagrangian above gives the dual problem
$$\min_{\mathbf{u}\in\mathcal{C}^{\ast},\ \gamma\geq0}\ \frac{\|\mathbf{g}-\mathbf{u}\|^{2}}{4\gamma} + \gamma.$$
Solving this optimization problem with respect to γ we find that \(\gamma= \frac{1}{2} \|\mathbf{g}-\mathbf{u}\|\), which gives the dual problem to (31):
$$\min_{\mathbf{u}\in\mathcal{C}^{\ast}}\ \|\mathbf{g}-\mathbf{u}\| = \operatorname{dist}\bigl(\mathbf{g},\mathcal{C}^{\ast}\bigr). \qquad(32)$$
Under very mild assumptions about \(\mathcal{C}\), the optimal value of (32) is equal to that of (31) (for example, as long as \(\mathcal{C}\) has a nonempty relative interior, strong duality holds). Hence we have derived
$$w(\mathcal{C}) \leq \operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(\mathbf{g},\mathcal{C}^{\ast}\bigr)\bigr].$$
This equation combined with the bound (30) gives us the desired result. □
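For a concrete feel for this bound, consider the nonnegative orthant \(\mathcal{C}=\mathbb{R}^{p}_{+}\), whose polar cone is the nonpositive orthant; by the Moreau decomposition, dist(g, C*) = ‖max(g, 0)‖. The following Monte Carlo snippet (our own illustration, assuming numpy, not part of the paper) estimates both sides of the inequality in a small dimension where the gap between the sphere and ball formulations is visible:

```python
# Monte Carlo check (ours) of w(C) <= E[dist(g, C*)] for the nonnegative
# orthant C in R^3, whose polar cone C* is the nonpositive orthant.
import numpy as np

rng = np.random.default_rng(1)
p, trials = 3, 200000
G = rng.standard_normal((trials, p))

proj = np.linalg.norm(np.maximum(G, 0.0), axis=1)   # dist(g, C*) = ||g_+||
# sup of <g, z> over unit z in C: ||g_+|| if g_+ != 0, otherwise max_i g_i
sup_sphere = np.where(proj > 0, proj, G.max(axis=1))

print("estimated w(C)          :", sup_sphere.mean())
print("estimated E[dist(g, C*)]:", proj.mean())
```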
Appendix B: Proof of Theorem 3.9
Proof
We set \(\beta= \tfrac{1}{\varTheta}\). First note that if \(\beta\geq \tfrac{1}{4} \exp\{\tfrac{p}{9}\}\) then the width bound exceeds \(\sqrt{p}\), which is the maximal possible value for the width of \(\mathcal{C}\). Thus, we will assume throughout that \(\beta\leq \tfrac{1}{4} \exp\{\tfrac{p}{9}\}\).
Using Proposition 3.6 we need to upper-bound the expected distance to the polar cone. Let \(\mathbf{g}\sim\mathcal {N}(0,I)\) be a normally distributed random vector. Then the norm of g is independent from the angle of g. That is, ∥g∥ is independent from g/∥g∥. Moreover g/∥g∥ is distributed as a uniform sample on \(\mathbb{S}^{p-1}\), and \(\operatorname{\mathbb{E}}_{\mathbf{g}}[\|\mathbf{g}\|]\leq\sqrt{p}\). Since \(\mathcal{C}^{\ast}\) is a cone, \(\operatorname{dist}(\mathbf{g},\mathcal{C}^{\ast}) = \|\mathbf{g}\|\operatorname{dist}(\mathbf{g}/\|\mathbf{g}\|,\mathcal{C}^{\ast})\), and thus we have
$$\operatorname{\mathbb{E}}_{\mathbf{g}}\bigl[\operatorname{dist}\bigl(\mathbf{g},\mathcal{C}^{\ast}\bigr)\bigr] = \operatorname{\mathbb{E}}_{\mathbf{g}}\bigl[\|\mathbf{g}\|\bigr]\,\operatorname{\mathbb{E}}_{\mathbf{u}}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\bigr)\bigr] \leq \sqrt{p}\ \operatorname{\mathbb{E}}_{\mathbf{u}}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1}\bigr)\bigr],$$
where u is sampled uniformly on \(\mathbb{S}^{p-1}\).
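As a quick illustration of these two facts (our own numpy snippet, not part of the paper), one can check empirically that for a standard Gaussian vector the radius ‖g‖ stays below √p on average and is uncorrelated with the angular part g/‖g‖:

```python
# Quick numpy illustration (ours) of the facts used above: for g ~ N(0, I_p),
# E[||g||] <= sqrt(p), and the radius is independent of the angular part
# (checked here only through the correlation with one angular coordinate).
import numpy as np

rng = np.random.default_rng(4)
p, trials = 30, 100000
G = rng.standard_normal((trials, p))
norms = np.linalg.norm(G, axis=1)
print(f"E[||g||] ~ {norms.mean():.3f} <= sqrt(p) = {np.sqrt(p):.3f}")

angular = G[:, 0] / norms
print(f"corr(||g||, u_1) ~ {np.corrcoef(norms, angular)[0, 1]:.4f}")
```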
To bound the latter quantity, we will use isoperimetry. Suppose A is a subset of \(\mathbb{S}^{p-1}\) and B is a spherical cap with the same volume as A. Let N(A,r) denote the locus of all points in the sphere of Euclidean distance at most r from the set A. Let μ denote the Haar measure on \(\mathbb{S}^{p-1}\) and let μ(A;r) denote the measure of N(A,r). Then spherical isoperimetry states that μ(A;r)≥μ(B;r) for all r≥0 (see, for example, [48, 53]).
Let B now denote a spherical cap with \(\mu(B)=\mu(\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1})\). Then we have
$$\operatorname{\mathbb{E}}_{\mathbf{u}}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1}\bigr)\bigr] = \int_{0}^{\infty}\mathbb{P}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1}\bigr)>r\bigr]\,\mathrm{d}r = \int_{0}^{\infty}\bigl(1-\mu\bigl(\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1};r\bigr)\bigr)\,\mathrm{d}r \leq \int_{0}^{\infty}\bigl(1-\mu(B;r)\bigr)\,\mathrm{d}r,$$
where the first equality is the integral form of the expected value and the last inequality follows by isoperimetry. Hence we can bound the expected distance to the polar cone intersecting the sphere using only knowledge of the volume of spherical caps on \(\mathbb{S}^{p-1}\).
To proceed let v(φ) denote the volume of a spherical cap subtending a solid angle φ. An explicit formula for v(φ) is
$$v(\varphi) = \frac{1}{z_{p}}\int_{0}^{\varphi}\sin^{p-1}(\vartheta)\,\mathrm{d}\vartheta,$$
where \(z_{p} = \int_{0}^{\pi}\sin^{p-1}(\vartheta)\,\mathrm{d}\vartheta\) [45]. Let φ(β) denote the minimal solid angle of a cap such that β copies of that cap cover \(\mathbb {S}^{p-1}\). Since the geodesic distance on the sphere is always greater than or equal to the Euclidean distance, if K is a spherical cap subtending ψ radians, μ(K;t)≥v(ψ+t). Therefore
$$\operatorname{\mathbb{E}}_{\mathbf{u}}\bigl[\operatorname{dist}\bigl(\mathbf{u},\mathcal{C}^{\ast}\cap\mathbb{S}^{p-1}\bigr)\bigr] \leq \int_{0}^{\infty}\bigl(1-v\bigl(\varphi(\beta)+t\bigr)\bigr)\,\mathrm{d}t.$$
We can proceed to simplify the right-hand side integral:
(43) follows by switching the order of integration, and the rest of these equalities follow by straightforward integration and some algebra.
Using the inequalities \(z_{p} \geq\frac{2}{\sqrt{p-1}}\) (see [48]) and \(\sin(x)\leq\exp(-(x-\pi/2)^{2}/2)\) for x∈[0,π], we can bound the last integral as
Performing the change of variables \(a = \sqrt{p-1}(\vartheta-\tfrac {\pi}{2})\), we are left with the integral
In this final bound, we bounded the first term by dropping the upper integrand, and for the second term we used the fact that
We are now left with the task of computing a lower bound for φ(β). We need to first reparameterize the problem. Let K be a spherical cap. Without loss of generality, we may assume that
$$K = \bigl\{\mathbf{x}\in\mathbb{S}^{p-1} : x_{1}\geq h\bigr\}$$
for some h∈[0,1]. Here h is the height of the cap over the equator. Via elementary trigonometry, the solid angle that K subtends is given by \(\pi/2-\sin^{-1}(h)\). Hence, if h(β) is the largest number such that β caps of height h(β) cover \(\mathbb {S}^{p-1}\), then h(β)=sin(π/2−φ(β)).
The quantity h(β) may be estimated using the following lemma from [11]. For h∈[0,1], let γ(p,h) denote the volume of a spherical cap of \(\mathbb{S}^{p-1}\) of height h.
Lemma B.1
(See [11])
For \(1\geq h\geq\frac{2}{\sqrt{p}}\),
Note that for \(h \geq\frac{2}{\sqrt{p}}\),
So if
$$h = 3\sqrt{\frac{\log(4\beta)}{p}},$$
then h≤1 because we have assumed \(\beta\leq\tfrac{1}{4} \exp\{ \tfrac{p}{9}\}\) and p≥9. Moreover, \(h\geq\frac{2}{\sqrt{p}}\) and the volume of the cap with height h is less than or equal to 1/β. That is,
$$\gamma(p,h)\leq\frac{1}{\beta}.$$
Combining the estimate (50) with Proposition 3.6, and using our estimate for φ(β), we get the bound
This expression can be simplified by using the following bounds. First, sin−1(x)≥x lets us upper-bound the first term by \(\sqrt{\frac{p}{p-1}}\frac{1}{8\beta}\). For the second term, using the inequality \(\sin^{-1}(x)\leq\tfrac{\pi}{2}x\) results in the upper bound
For p≥9 the upper bound can be expressed simply as \(w(\mathcal{C})\leq3\sqrt{\log(4 \beta)}\). We recall that \(\beta= \tfrac{1}{\varTheta}\), which completes the proof of the theorem. □
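The two analytic inequalities invoked in this proof can be spot-checked numerically; the following snippet (ours, not from the paper, assuming numpy and scipy are available) verifies \(z_{p}\geq2/\sqrt{p-1}\) for several values of p and the pointwise bound sin(x)≤exp(−(x−π/2)²/2) on [0,π]:

```python
# Numeric spot-check (ours) of two inequalities used above:
#   z_p = int_0^pi sin^{p-1}(t) dt >= 2/sqrt(p-1)   and
#   sin(x) <= exp(-(x - pi/2)^2 / 2) on [0, pi].
import numpy as np
from scipy.integrate import quad

for p in [9, 25, 100, 1000]:
    z_p, _ = quad(lambda t: np.sin(t) ** (p - 1), 0.0, np.pi, points=[np.pi / 2])
    assert z_p >= 2.0 / np.sqrt(p - 1), (p, z_p)
    print(f"p={p:5d}: z_p={z_p:.6f} >= {2/np.sqrt(p-1):.6f}")

x = np.linspace(0.0, np.pi, 100001)
assert np.all(np.sin(x) <= np.exp(-((x - np.pi / 2) ** 2) / 2) + 1e-12)
print("sin(x) <= exp(-(x - pi/2)^2/2) verified on a grid over [0, pi]")
```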
Appendix C: Direct Width Calculations
We first give the proof of Proposition 3.10.
Proof
Let \(\mathbf{x}^{\star}\) be an s-sparse vector in \(\mathbb{R}^{p}\) with ℓ1 norm equal to 1, and let \(\mathcal{A}\) denote the set of unit-Euclidean-norm one-sparse vectors. Let Δ denote the set of coordinates where \(\mathbf{x}^{\star}\) is nonzero. The normal cone at \(\mathbf{x}^{\star}\) with respect to the ℓ1 ball is given by
$$N\bigl(\mathbf{x}^{\star}\bigr) = \bigl\{\mathbf{z}\in\mathbb{R}^{p} : z_{i} = t\operatorname{sgn}\bigl(x^{\star}_{i}\bigr)\ \text{for}\ i\in\Delta,\ |z_{i}|\leq t\ \text{for}\ i\in\Delta^{c},\ \text{for some}\ t\geq0\bigr\}.$$
Here \(\Delta^{c}\) represents the zero entries of \(\mathbf{x}^{\star}\). The minimum squared distance to the normal cone at \(\mathbf{x}^{\star}\) can be formulated as a one-dimensional convex optimization problem for arbitrary \(\mathbf{z}\in\mathbb{R}^{p}\):
$$\operatorname{dist}\bigl(\mathbf{z},N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2} = \min_{t\geq0}\ \sum_{i\in\Delta}\bigl(z_{i}-t\operatorname{sgn}\bigl(x^{\star}_{i}\bigr)\bigr)^{2} + \sum_{i\in\Delta^{c}}\operatorname{shrink}(z_{i},t)^{2},$$
where
$$\operatorname{shrink}(z,t) = \begin{cases} z+t, & z<-t,\\ 0, & -t\leq z\leq t,\\ z-t, & z>t \end{cases}$$
is the ℓ1-shrinkage (soft-thresholding) function. Hence, for any fixed t≥0 independent of g, we have
$$\operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(\mathbf{g},N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2}\bigr] \leq s\bigl(1+t^{2}\bigr) + (p-s)\operatorname{\mathbb{E}}\bigl[\operatorname{shrink}(g,t)^{2}\bigr].$$
Now we directly integrate the second term, treating each summand individually. For a zero-mean, unit-variance normal random variable g,
$$\operatorname{\mathbb{E}}\bigl[\operatorname{shrink}(g,t)^{2}\bigr] = 2\int_{t}^{\infty}(g-t)^{2}\frac{1}{\sqrt{2\pi}}e^{-g^{2}/2}\,\mathrm{d}g = 2\bigl(1+t^{2}\bigr)Q(t) - \frac{2t}{\sqrt{2\pi}}e^{-t^{2}/2} \leq \sqrt{\frac{2}{\pi}}\,\frac{e^{-t^{2}/2}}{t}.$$
The first simplification follows because the shrink function and Gaussian distributions are symmetric about the origin. The second equality follows by integrating by parts. The inequality follows by a tight bound on the Gaussian Q-function:
$$Q(t) := \int_{t}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-g^{2}/2}\,\mathrm{d}g \leq \frac{1}{\sqrt{2\pi}}\,\frac{e^{-t^{2}/2}}{t}.$$
Using this bound, we get
$$\operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(\mathbf{g},N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2}\bigr] \leq s\bigl(1+t^{2}\bigr) + (p-s)\sqrt{\frac{2}{\pi}}\,\frac{e^{-t^{2}/2}}{t}.$$
Setting \(t= \sqrt{2\log(p/s)}\) gives
$$\operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(\mathbf{g},N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2}\bigr] \leq s + 2s\log(p/s) + s\,\frac{1-s/p}{\sqrt{\pi\log(p/s)}} \leq 2s\log(p/s) + \frac{3s}{2}.$$
The last inequality follows because
$$\frac{1-s/p}{\sqrt{\pi\log(p/s)}}\leq\frac{1}{2}$$
whenever 0≤s≤p. □
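To see that the final bound is of the right order, the following Monte Carlo sketch (our own illustration; the grid minimization over t is a hypothetical stand-in for the exact one-dimensional minimization) estimates \(\operatorname{\mathbb{E}}[\operatorname{dist}(\mathbf{g},N(\mathbf{x}^{\star}))^{2}]\) and compares it with 2s log(p/s) + 3s/2 as reconstructed above:

```python
# Monte Carlo check (ours) of E[dist(g, N(x*))^2] <= 2 s log(p/s) + 3s/2
# for the normal cone of the l1 ball at an s-sparse point.
import numpy as np

rng = np.random.default_rng(2)
p, s, trials = 400, 10, 500
signs = np.ones(s)               # sgn(x*) on the support (wlog first s coords, all +1)
ts = np.linspace(0.0, 6.0, 601)  # grid over the cone parameter t

def dist_sq(g):
    # squared distance to the normal cone, minimized over t on the grid
    on = ((g[:s, None] - signs[:, None] * ts[None, :]) ** 2).sum(axis=0)
    off = np.maximum(np.abs(g[s:, None]) - ts[None, :], 0.0) ** 2  # shrink(g, t)^2
    return (on + off.sum(axis=0)).min()

est = np.mean([dist_sq(rng.standard_normal(p)) for _ in range(trials)])
bound = 2 * s * np.log(p / s) + 1.5 * s
print(f"estimated E[dist^2] = {est:.1f}  <=  bound = {bound:.1f}")
```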
Next we give the proof of Proposition 3.11.
Proof
Let \(\mathbf{x}^{\star}\) be an \(m_{1}\times m_{2}\) matrix of rank r with singular value decomposition \(U\varSigma V^{\ast}\), and let \(\mathcal{A}\) denote the set of rank-one unit-Euclidean-norm matrices of size \(m_{1}\times m_{2}\). Without loss of generality, impose the conventions \(m_{1}\leq m_{2}\), Σ is r×r, U is \(m_{1}\times r\), and V is \(m_{2}\times r\), and assume the nuclear norm of \(\mathbf{x}^{\star}\) is equal to 1.
Let \(\mathbf{u}_{k}\) (respectively \(\mathbf{v}_{k}\)) denote the kth column of U (respectively V). It is convenient to introduce the orthogonal decomposition \({\mathbb{R}}^{m_{1} \times m_{2}} = \Delta\oplus\Delta ^{\perp}\), where Δ is the linear space spanned by elements of the form \(\mathbf{u}_{k}\mathbf{z}^{T}\) and \(\mathbf{y}\mathbf{v}_{k}^{T}\), 1≤k≤r, where z and y are arbitrary, and \(\Delta^{\perp}\) is the orthogonal complement of Δ. The space \(\Delta^{\perp}\) is the subspace of matrices spanned by the family \(\mathbf{y}\mathbf{z}^{T}\), where y (respectively z) is any vector orthogonal to all the columns of U (respectively V). The normal cone of the nuclear norm ball at \(\mathbf{x}^{\star}\) is given by the cone generated by the subdifferential at \(\mathbf{x}^{\star}\):
$$N\bigl(\mathbf{x}^{\star}\bigr) = \bigl\{t\bigl(UV^{\ast} + Z\bigr) : t\geq0,\ Z\in\Delta^{\perp},\ \|Z\|_{\mathcal{A}}^{\ast}\leq1\bigr\}.$$
Note that here \(\|Z\|_{\mathcal{A}}^{\ast}\) is the operator norm, equal to the maximum singular value of Z [63].
Let G be a Gaussian random matrix with i.i.d. entries, each with mean zero and unit variance. Then the matrix
$$Z_{N} = \bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast}\,UV^{\ast} + \mathcal{P}_{\Delta^{\perp}}(G)$$
is in the normal cone at \(\mathbf{x}^{\star}\). We can then compute
$$\begin{aligned} \operatorname{\mathbb{E}}\bigl[\operatorname{dist}\bigl(G,N\bigl(\mathbf{x}^{\star}\bigr)\bigr)^{2}\bigr] &\leq \operatorname{\mathbb{E}}\bigl[\|G-Z_{N}\|_{F}^{2}\bigr] = \operatorname{\mathbb{E}}\Bigl[\bigl\|\mathcal{P}_{\Delta}(G) - \bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast}\,UV^{\ast}\bigr\|_{F}^{2}\Bigr] \\ &= \operatorname{\mathbb{E}}\bigl[\bigl\|\mathcal{P}_{\Delta}(G)\bigr\|_{F}^{2}\bigr] + \operatorname{\mathbb{E}}\Bigl[\bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast\,2}\Bigr]\bigl\|UV^{\ast}\bigr\|_{F}^{2} \qquad(79)\\ &= r(m_{1}+m_{2}-r) + r\,\operatorname{\mathbb{E}}\Bigl[\bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast\,2}\Bigr]. \qquad(80) \end{aligned}$$
Here (79) follows because \(\mathcal{P}_{\Delta }(G)\) and \(\mathcal{P}_{\Delta^{\perp}}(G)\) are independent. The final line follows because \(\dim(\Delta)=r(m_{1}+m_{2}-r)\) and the Frobenius (i.e., Euclidean) norm of \(UV^{\ast}\) is \(\|UV^{*}\|_{F}=\sqrt{r}\). Due to the isotropy of Gaussian random matrices, \(\mathcal{P}_{\Delta^{\perp}}(G)\) is identically distributed as an \((m_{1}-r)\times(m_{2}-r)\) matrix with i.i.d. Gaussian entries each with mean zero and variance one. We thus know that
$$\operatorname{\mathbb{E}}\bigl[\bigl\|\mathcal{P}_{\Delta^{\perp}}(G)\bigr\|_{\mathcal{A}}^{\ast}\bigr] \leq \sqrt{m_{1}-r}+\sqrt{m_{2}-r}$$
(see, for example, [22]). To bound the latter expectation, we again use the integral form of the expected value. Letting \(\mu_{\Delta^{\perp}}\) denote the quantity \(\sqrt{m_{1}-r}+\sqrt{m_{2}-r}\), we have
Using this bound in (80), we obtain
where the second inequality follows from the fact that \((a+b)^{2}\leq2a^{2}+2b^{2}\). We conclude that \(3r(m_{1}+m_{2}-r)\) random measurements are sufficient to recover a rank-r \(m_{1}\times m_{2}\) matrix using the nuclear norm heuristic. □
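The spectral-norm expectation bound used in this proof is easy to probe empirically; here is a small numpy experiment (ours, not from the paper) for a Gaussian matrix:

```python
# Monte Carlo check (ours) of E[||G||_op] <= sqrt(m1) + sqrt(m2) for a Gaussian
# m1 x m2 matrix, the spectral-norm bound invoked above (see, e.g., [22]).
import numpy as np

rng = np.random.default_rng(3)
m1, m2, trials = 40, 60, 500
ops = [np.linalg.norm(rng.standard_normal((m1, m2)), ord=2) for _ in range(trials)]
print(f"E[||G||_op] ~ {np.mean(ops):.2f}  <=  {np.sqrt(m1) + np.sqrt(m2):.2f}")
```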
Cite this article
Chandrasekaran, V., Recht, B., Parrilo, P.A. et al. The Convex Geometry of Linear Inverse Problems. Found Comput Math 12, 805–849 (2012). https://doi.org/10.1007/s10208-012-9135-7
Keywords
- Convex optimization
- Semidefinite programming
- Atomic norms
- Real algebraic geometry
- Gaussian width
- Symmetry