Towards Sample-Optimal Methods For Solving Random Quadratic Equations With Structure
Abstract—We consider the problem of estimating a structured high-dimensional parameter vector using random Gaussian quadratic samples. This problem is a generalization of the classical problem of phase retrieval and impacts numerous problems in computational imaging. We provide a generic algorithm based on alternating minimization that, if properly initialized, achieves information-theoretically optimal sample complexity. In essence, we show that solving a system of random quadratic equations with structural constraints is (nearly) as easy as solving the corresponding linear system with the same constraints, if a proper initial guess of the solution is available. As an immediate consequence, our approach improves upon the best known existing sample complexity results for phase retrieval (structured or otherwise). We support our theory via several numerical experiments.

A full version of this paper is accessible at: https://gaurijagatap.github.io/assets/ISIT18.pdf

I. INTRODUCTION

Motivation: Our focus in this paper is the following constrained estimation problem. An unknown vector of parameters, x* ∈ ℝ^n, is observed (or measured) to yield observations y ∈ ℝ^m of the form:

    y_i = |⟨a_i, x*⟩|^p,  i ∈ [m],  s.t.  x* ∈ M_s,    (1)

where M_s ⊂ ℝ^n is a model set that reflects the structural constraints on x*. We adopt the familiar setting of under-determined Gaussian observations, A = [a_1 ... a_i ... a_m]^⊤ ∈ ℝ^{m×n} with m < n. The task is to recover an estimate of x* from either absolute-value (p = 1) or quadratic (p = 2) measurements y.¹

¹ This work was supported in part by NSF grants CCF-1566281 and CCF-1750920, and a Black and Veatch Faculty Fellowship.

An important application of the aforementioned setup is the classical signal processing problem of phase retrieval. Here, the measurements correspond to the magnitudes of complex 2D Fourier (or Short-Time Fourier) transform coefficients. The sensing apparatus is incapable of detecting the phase of the complex light-field reflected or transmitted from the illuminated object source. This necessitates a phase recovery strategy, and proposed solution approaches have been explored dating back to the 1970s via several works [1], [2]. Recent, renewed interest by the statistical learning community in this problem has focused on the case of Gaussian observations, and has spawned several algorithms which are efficient as well as asymptotically (near) optimal [3]–[6]. However, even in the best case, one requires m > 2n − 1 Gaussian measurements [6], and in the case of high dimension n, the sample complexity m can be very large.

Similarly, polynomial neural networks enable applications such as classification, where the activation functions are quadratic mappings [7]. The problem in (1) then corresponds to learning the weights of a single neuron, x*, for m Gaussian-distributed training examples a_i, with y_i being the corresponding output labels. The task is to design efficient algorithms for learning weights using fewer training samples.

To reduce the sample complexity of such problems, several works introduce sparsity assumptions. Sparsity has been used to great advantage in compressive sensing and streaming algorithms [8], [9], and establishes an information-theoretically optimal [10] requirement of O(s log(n/s)) samples for stable recovery of x* from linear measurements. Sparsity assumptions for inverting quadratic (or magnitude-only) equations of the form (1) have similarly helped lower computational and memory requirements [5], [11]–[13]. Specifically, several papers consider the problem in (1), where M_s represents all s-sparse vectors x*, with the assumption of a Gaussian observation framework [14]–[16]. In previous work [16], [17], we have proposed a sparse phase retrieval algorithm called CoPRAM, which is linearly convergent and improves upon all other algorithms, obtaining a Gaussian sample complexity of O(s² log n) in general (O(s log n) if power-law decaying sparse signals are considered [17]).

A natural extension of sparsity is the notion of structured sparsity. Several works in compressed sensing and statistical learning have utilized various structures such as blocks, clusters, and trees [18], [19]. Block structures in the context of sparse phase retrieval have been studied in [16]. Tree structures are popularly found in applications where sparsity is considered in a wavelet basis [20]. While the impact of structured sparsity has been studied thoroughly for the case of linear measurements [21], recovery from quadratic or magnitude-only measurements is relatively less understood.

Our contributions: In this paper, we propose a new algorithmic framework called Model-based CoPRAM to solve the problem of phase retrieval of signals with underlying structured sparsity. Our framework is fairly generic and succeeds for parameters belonging to any structured sparsity model (defined formally below). Moreover, we provably show that if the algorithm is properly initialized, then its sample complexity is information-theoretically optimal. In particular, we analyze a special instantiation of our framework, called Tree CoPRAM,
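To make the setup concrete, the observation model in (1) is straightforward to simulate. Below is a minimal numpy sketch for the magnitude case (p = 1), with a plain s-sparse signal standing in for a general structured model M_s; all dimensions and variable names are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 256, 128, 8  # ambient dimension, measurements (m < n), sparsity

# s-sparse ground truth x* (a simple stand-in for a structured model M_s)
x_star = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_star[support] = rng.standard_normal(s)

# Gaussian measurement vectors a_i, stacked as the rows of A
A = rng.standard_normal((m, n))

# magnitude-only observations y_i = |<a_i, x*>|  (the p = 1 case of (1))
y = np.abs(A @ x_star)
```

Recovering x* from (A, y) alone, with m < n, is exactly the structured phase retrieval problem studied in the remainder of the paper.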
2018 IEEE International Symposium on Information Theory (ISIT)
Algorithm 1 Model-based CoPRAM
Input: A, y, s, t_0
 1: Compute signal power: φ² = (1/m) ∑_{i=1}^m y_i².
 2: Compute: diag(M) := M_jj = (1/m) ∑_{i=1}^m y_i² a_ij², for j ∈ [n].
 3: Set: Ŝ ← ModelApprox(diag(M)).
 4: Set: v_1 ← top s.v. of M_Ŝ = (1/m) ∑_{i=1}^m y_i² a_iŜ a_iŜ^⊤.
 5: Compute: v ∈ ℝ^n ← v_1 on Ŝ, and 0 on Ŝ^c.
 6: Compute: x^0 ← φv.
 7: for t = 0, ..., t_0 − 1 do
 8:   p^{t+1} ← sign(Ax^t).
 9:   x^{t+1} ← ModelCoSaMP((1/√m)A, (1/√m)p^{t+1} ∘ y, s, x^t).
10: end for
Output: x^{t_0} ← x^t.

minimization based descent approach, similar to our previous work in [16]. Our algorithm is largely parameter-free except for knowledge of the underlying sparsity s. Moreover, the theoretical analysis requires no extra assumptions on the parameter vector, except that its support belongs to a structured sparsity model. We call our algorithm Model-based CoPRAM, which generalizes our previous algorithm CoPRAM (Compressive Phase Retrieval with Alternating Minimization) [16], [17]. The algorithm can be broken down into three types of update stages: (i) initialization; (ii) phase estimation; and (iii) signal estimation. The full algorithm is presented in pseudo-code form as Algorithm 1. The phase and signal estimation stages are described in detail in Section III-A. Due to the simplicity of our algorithm, it can easily be extended to a general class of signals defined by any model M_s. In this paper, we focus on the special case where the model M_s corresponds to tree-sparse vectors in ℝ^n.

A. Convergence of Model-based CoPRAM

This part of the algorithm is described in Lines 7-10 of Algorithm 1. Once we obtain a good enough initial estimate x^0 ∈ M_s such that dist(x^0, x*) ≤ δ_0 ‖x*‖_2, we construct a method to accurately estimate the true x*. To achieve this, we adapt the alternating minimization approach from [5]. The observation model in (1) can be restated as follows:

    sign(⟨a_i, x*⟩) ∘ y_i = ⟨a_i, x*⟩,

for all i ∈ {1, 2, ..., m}. We denote the phase vector p ∈ ℝ^m as the vector that contains the unknown signs of the measurements, i.e., p_i = sign(⟨a_i, x⟩) for all i ∈ {1, 2, ..., m}. Let p* denote the true phase vector and let P denote the set of all phase vectors, i.e., P = {p : p_i = ±1, ∀i}. Then our measurement model gets modified as:

    p* ∘ y = Ax*.

The loss function in (3) gets modified and is composed of two variables x and p:

    min_{x ∈ M_s, p ∈ P} ‖Ax − p ∘ y‖_2².    (4)

Note that the problem above is not convex, because P is the set of all vectors with entries constrained to be in {−1, 1}. Instead, we alternate between estimating p and x. We perform two estimation steps: (i) if we fix the signal estimate x, then the minimizer p ∈ P is given in closed form as:

    p = sign(Ax),    (5)

(phase estimation, Line 8 of Algorithm 1); (ii) if we fix the phase vector p, the signal vector x ∈ M_s is obtained by solving a (structured) sparse recovery problem,

    min_{x ∈ M_s} (1/√m) ‖Ax − p ∘ y‖_2,    (6)

if m < n and A/√m satisfies the model-RIP as defined above (signal estimation, Line 9 of Algorithm 1).

Here, we employ the Model-based CoSaMP [19] algorithm to (approximately) solve (6). Note that since (6) itself is a non-convex problem, exact minimization can be hard. However, in each signal estimation step, we do not need to explicitly obtain the minimizer; we can still show a sufficient descent criterion by unpacking the analysis of the Model-based CoSaMP algorithm. For analysis reasons, we require that the entries of the input sensing matrix are distributed according to N(0, 1/m). This can be achieved by scaling down the inputs to Model-based CoSaMP, A and p^{t+1} ∘ y, by a factor of √m. We also use a "warm start" Model-based CoSaMP routine for the (t+1)th update of x, x^{t+1}, where in each iteration the initial guess of the solution to (6) is given by the current signal estimate x^t.

We now analyze our proposed descent scheme. We obtain the following theoretical result:

Theorem III.1. Given an initialization x^0 ∈ M_s satisfying dist(x^0, x*) ≤ δ_0 ‖x*‖_2, for 0 < δ_0 < 1, if the number of Gaussian measurements satisfies

    m > C (s + log card(M_{4s})),

then the iterates x^{t+1} of Algorithm 1 satisfy:

    dist(x^{t+1}, x*) ≤ ρ_0 dist(x^t, x*),    (7)

where x^t, x^{t+1}, x* ∈ M_s, and 0 < ρ_0 < 1 is a constant, with probability greater than 1 − e^{−γm}, for positive constant γ.

Proof sketch: Via an algebraic derivation similar to the one provided in [17], the per-iteration error for the tth iteration of Alg. 1, with L iterations of Model-based CoSaMP, can be derived as:

    ‖x^{t+1} − x*‖_2 ≤ (ρ_1 ρ_3)^L ‖x* − x^t‖_2 + ((ρ_1 ρ_4 + ρ_2)/(1 − ρ_1 ρ_3)) ‖E_ph‖_2,    (8)

where ρ_1, ρ_2, ρ_3, ρ_4 are appropriate constants, and E_ph is the error in estimating the phase in the tth run of Model-based CoPRAM. The second part of this proof requires a bound on the phase error term ‖E_ph‖_2, which can be analytically derived as:

    ‖E_ph‖_2² = (4/m) ∑_{i=1}^m (a_i^⊤ x*)² · 1{sign(a_i^⊤ x^t) sign(a_i^⊤ x*) = −1}.
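This phase-error term can also be evaluated directly from its definition, which is useful as a numerical sanity check on the analysis. A minimal numpy sketch (the function name is ours, not the paper's):

```python
import numpy as np

def phase_error_sq(A, x_t, x_star):
    """Compute ||E_ph||_2^2 = (4/m) * sum_i (a_i^T x*)^2, summed over the
    measurements i whose estimated and true phases disagree."""
    m = A.shape[0]
    true_proj = A @ x_star                                 # a_i^T x*
    disagree = np.sign(A @ x_t) * np.sign(true_proj) == -1
    return 4.0 / m * np.sum(true_proj[disagree] ** 2)
```

The term vanishes exactly when x^t induces the same measurement signs as x*, and equals (4/m)‖Ax*‖_2² when every sign is flipped (e.g., x^t = −x*).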
We bound this term by invoking Lemma III.2.

Lemma III.2. As long as the initial estimate is a small distance away from the true signal x* ∈ M_s, dist(x^0, x*) ≤ δ_0 ‖x*‖_2, and subsequently dist(x^t, x*) ≤ δ_0 ‖x*‖_2, where x^t is the tth update of Algorithm 1, then the following bound holds:

    (4/m) ∑_{i=1}^m (a_i^⊤ x*)² · 1{(a_i^⊤ x^t)(a_i^⊤ x*) ≤ 0} ≤ ρ_5² ‖x^t − x*‖_2²,

with probability greater than 1 − e^{−γ_2 m}, where γ_2 is a positive constant, as long as m > C (s + log card(M_{4s})) and ρ_5² = 0.0256.

We therefore achieve a per-step error reduction scheme of the form:

    ‖x^{t+1} − x*‖_2 ≤ ρ_0 ‖x^t − x*‖_2,

if the initial estimate x^0 satisfies ‖x^0 − x*‖_2 ≤ δ_0 ‖x*‖_2. This result can be trivially extended to the case where the initial estimate x^0 satisfies ‖x^0 + x*‖_2 ≤ δ_0 ‖x*‖_2, hence giving the convergence criterion of the form (for ρ_0 < 1):

    dist(x^{t+1}, x*) ≤ ρ_0 dist(x^t, x*).

The complete proofs of Theorem III.1 and Lemma III.2 can be found in Appendix A of the full paper [28]. We present a corollary of Theorem III.1 for tree-sparse signals.

Corollary III.3. As a consequence of Theorem III.1, if M_s is a model representing rooted tree-sparse signals with sparsity s, then Algorithm 1 is linearly convergent and requires a Gaussian sample complexity of m > Cs, as long as the initialization x^0 satisfies dist(x^0, x*) ≤ δ_0 ‖x*‖_2.

The proof of this corollary can be found in Appendix A of the full version of this paper [28]. Observe that m = O(s) samples are necessary for reconstructing any s-sparse parameter vector even in the linear case (where perfect phase information is available), and therefore Theorem III.1 implies information-theoretic optimality (up to constants) of our proposed approach.

B. Initialization of Model-based CoPRAM

The first stage (Lines 1-6 of Algorithm 1) of Model-based CoPRAM uses a spectral initialization approach, similar to that provided in previous sparse phase retrieval methods [5], [14]–[16], [26]. We construct a biased estimator of the squared true signal coefficients, which we call the signal marginal matrix:

    M = (1/m) ∑_{i=1}^m y_i² a_i a_i^⊤.

The jth signal coefficient can be estimated from the diagonal term M_jj = (1/m) ∑_{i=1}^m y_i² a_ij², and the set of all M_jj's can be calculated in O(mn) time. The approximate support estimate Ŝ can be extracted by performing an exact or approximate tree projection algorithm [22] on the n-dimensional diagonal of the marginal matrix M. From this we obtain the sub-matrix M_Ŝ, whose rows and columns are restricted to Ŝ. This is followed by a spectral technique ([5], [14]–[16]), which extracts the top left singular vector (s.v.) of M_Ŝ to construct a good initial estimate x^0 (Lines 4-6 of Algorithm 1).

To provide the intuition behind this strategy, we leverage the fact that the diagonal elements of the expectation matrix E[M] are given by E[M_jj] = ‖x*‖_2² + 2x*_j². The signal marginals M_jj corresponding to j ∈ S are larger, in expectation, than those corresponding to j ∈ S^c. Therefore the signal marginals M_jj serve as a good indicator for extracting an approximate support Ŝ of x*. We additionally impose the structure M_s on this sparse initial vector by utilizing an approximate model projection algorithm (such as tree projection [22]) (Line 3 of Algorithm 1). We demonstrate experimentally that this initialization strategy produces a good estimate of x*. We do not have a full theoretical characterization of the initialization stage, and intend to pursue this in future work.

IV. EXPERIMENTS

In this section, we present experimental results to demonstrate the empirical advantages of Model-based CoPRAM (specifically, for the tree-sparsity model) over standard sparse phase retrieval algorithms (specifically, CoPRAM [16], which provides the best available empirical performance for the standard case). We consider two different sizes (32 × 32 and 64 × 64) of an image of Rice University's Lovett Hall, as shown in Figure 2. This image is modeled as sparse in the Haar wavelet basis, with the number of levels of decomposition chosen to be log_2 n, where n is the number of image pixels.

Fig. 2: Image considered for simulations, resized to 32 × 32 and 64 × 64 pixels, considered to be sparse in the Haar basis.

The Tree CoPRAM and CoPRAM algorithms were run for the following experimental settings: n = 1024 and n = 4096. The original image x̂ was sparsified by fixing s and picking the top s wavelet coefficients of x̂. This sparsified image is considered to be the s-sparse tree-structured ground truth x*.

Phase transitions: We demonstrate the superior performance of the Tree CoPRAM algorithm in comparison to CoPRAM through a series of phase transition graphs and diagrams. In Figure 3, we illustrate two different settings of sparsities for n = 1024 dimensional x*: s = 10 and s = 31, and compare the performances of CoPRAM and Tree CoPRAM by plotting the number of measurements m on the horizontal axis against the probability of successful recovery (the fraction of trials in which ‖x^{t_0} − x*‖_2 / ‖x*‖_2 ≤ 0.05). It is clear that far fewer samples are required for successful recovery when Tree CoPRAM is used instead of CoPRAM.

Figure 4 shows phase transitions for image size n = 4096, at different sparsities (s = 10, 20, 31, 41, 51, 61, 72, 82, 92, 102) and
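For readers who want to experiment, the alternation of phase estimation and signal estimation at the heart of Algorithm 1 can be sketched in a few lines. This is a simplified proxy, not the paper's exact routine: plain s-sparse hard thresholding replaces the structured Model-based CoSaMP solver, the initialization x^0 is assumed given, and all names and step sizes are illustrative:

```python
import numpy as np

def hard_threshold(z, s):
    """Keep the s largest-magnitude entries of z (a plain-sparsity proxy
    for the structured model projection used by Model-based CoSaMP)."""
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]
    out[keep] = z[keep]
    return out

def alt_min(A, y, s, x0, outer=20, inner=10, step=0.5):
    """Alternate the closed-form phase update p = sign(Ax) with an
    approximate sparse least-squares solve (a stand-in for Line 9)."""
    m = A.shape[0]
    x = x0.copy()
    for _ in range(outer):
        p = np.sign(A @ x)               # phase estimation, as in (5)
        b = p * y                        # "linearized" measurements p ∘ y
        for _ in range(inner):           # thresholded gradient steps on (6)
            grad = A.T @ (A @ x - b) / m
            x = hard_threshold(x - step * grad, s)
    return x
```

With a sufficiently accurate (and correctly signed) initial estimate, the phase vector p stabilizes after a few outer iterations and the inner loop behaves like a sparse linear solver, which is the intuition behind the descent analysis of Section III-A.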