
Delft University of Technology

Faculty of Electrical Engineering, Mathematics, and Computer Science


Circuits and Systems Group

EE4C03 STATISTICAL DIGITAL SIGNAL PROCESSING AND MODELING
18 January 2021, 9:00–12:20
Block 1 (9:00-10:30)
Open book, strictly timed take-home exam. (Electronic) copies of the book and the
course slides allowed. No other tools except a basic pocket calculator permitted.
Upload answers during 10:25–10:35

This block consists of three questions (24 points); more than usual, and this will be taken into
account during grading. Answer in Dutch or English. Make clear in your answer how you reach
the final result; the road to the answer is very important. Write your name and student number
on each sheet.
Hint: Avoid losing too much time on detailed calculations, write down the general approach
first.

Question 1 (8 points)

(a) Let x(n) = A sin(ω0 n + φ), where A and ω0 are fixed constants, and φ is a random phase,
uniformly distributed over [−π, π]. The signal is filtered by an FIR filter with impulse
response
$$h(n) = \begin{cases} 1, & n = 0,\\ -\tfrac{1}{2}, & n = 1,\\ 0, & \text{otherwise.} \end{cases}$$
The output of the filter is y(n) = h(n) ∗ x(n).
Compute the autocovariance sequence $c_y(k)$.

(b) For a zero mean WSS process, let the autocorrelation sequence be given by

$$r(k) = b\, a^{|k|}, \qquad (|a| < 1)$$

and construct the N × N correlation matrix R. Take N > 2.

– Derive the rank of R.


– How can a random signal with this autocorrelation sequence be generated from white
noise?

(c) True or False: is this a valid autocorrelation matrix?


 
$$R = \begin{bmatrix} 2 & 0 & 2 & 0\\ 0 & 2 & 0 & 2\\ 2 & 0 & 2 & 0\\ 0 & 2 & 0 & 2 \end{bmatrix}$$

(Motivate your answer.)


(d) True or False: if y(n) = x1 (n) + x2 (n), and x1 (n) is a WSS process and x2 (n) is a WSS
process and not independent of x1 (n), then y(n) is a WSS process. (Motivate your answer:
give a proof or give a counterexample.)

Solution

(a) The autocovariance of $x(n)$ and the corresponding power spectrum are
$$c_x(k) = \tfrac{1}{2}A^2 \cos(\omega_0 k) \quad\Leftrightarrow\quad P_x(\omega) = \tfrac{1}{2}\pi A^2\,[u_0(\omega-\omega_0) + u_0(\omega+\omega_0)]$$
where $u_0(\omega)$ is a delta spike in the frequency domain. For the filter,
$$h(n) = \delta(n) - \tfrac{1}{2}\delta(n-1) \;\Leftrightarrow\; H(z) = 1 - \tfrac{1}{2}z^{-1} \;\Leftrightarrow\; |H(e^{j\omega})|^2 = (1 - \tfrac{1}{2}e^{-j\omega})(1 - \tfrac{1}{2}e^{j\omega}) = \tfrac{5}{4} - \cos(\omega)$$
so that
$$P_y(\omega) = \left(\tfrac{5}{4} - \cos(\omega)\right)\cdot \tfrac{1}{2}\pi A^2\,[u_0(\omega-\omega_0) + u_0(\omega+\omega_0)] = \tfrac{1}{2}\pi A^2 \left(\tfrac{5}{4} - \cos(\omega_0)\right)[u_0(\omega-\omega_0) + u_0(\omega+\omega_0)]$$
Hence
$$c_y(k) = \tfrac{1}{2} A^2 \left(\tfrac{5}{4} - \cos(\omega_0)\right)\cos(\omega_0 k).$$
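As a sanity check, this result can be verified numerically. The sketch below (Python; the values of $A$ and $\omega_0$ are illustrative) estimates $c_y(k)$ by Monte-Carlo averaging over the random phase and compares with the closed-form expression:

```python
import numpy as np

rng = np.random.default_rng(0)
A, w0, N, trials = 2.0, 0.7, 512, 2000
h = np.array([1.0, -0.5])                      # h(0) = 1, h(1) = -1/2

emp = np.zeros(4)
for _ in range(trials):
    phi = rng.uniform(-np.pi, np.pi)           # random phase, uniform over [-pi, pi]
    x = A * np.sin(w0 * np.arange(N) + phi)
    y = np.convolve(x, h)[1:N]                 # y = h * x, first (transient) sample dropped
    M = len(y)
    for k in range(4):
        emp[k] += np.mean(y[k:] * y[:M - k])   # lag-k sample autocovariance
emp /= trials

theory = 0.5 * A**2 * (1.25 - np.cos(w0)) * np.cos(w0 * np.arange(4))
print(np.round(emp, 3))                        # empirical c_y(k), k = 0..3
print(np.round(theory, 3))                     # closed-form c_y(k)
```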

(b) The correlation matrix is
$$R = b\begin{bmatrix} 1 & a & a^2 & \cdots & a^{N-1}\\ a & 1 & a & \cdots & a^{N-2}\\ a^2 & a & 1 & \cdots & a^{N-3}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a^{N-1} & a^{N-2} & \cdots & a & 1 \end{bmatrix}$$
Since $|a| < 1$, the rank of $R$ is $N$ (full rank): the corresponding power spectrum $P_x(\omega) = b(1-a^2)/|1 - ae^{-j\omega}|^2$ is strictly positive for all $\omega$, hence $R$ is positive definite.

A signal with this autocorrelation can be generated by filtering unit-variance white Gaussian noise with a first-order filter
$$H(z) = \frac{B}{1 - Az^{-1}}$$
Comparing to eqn (3.122), the resulting autocorrelation sequence is
$$r_x(k) = \frac{B^2}{1 - A^2}\, A^{|k|}$$
so we find that $A = a$ and $B = \sqrt{b(1 - a^2)}$.
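A minimal sketch of this generation step (the values of $a$ and $b$ are illustrative assumptions):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
a, b, N = 0.9, 2.0, 200_000
B = np.sqrt(b * (1 - a**2))                  # B = sqrt(b(1 - a^2))
w = rng.standard_normal(N)                   # unit-variance white Gaussian noise
x = lfilter([B], [1, -a], w)                 # H(z) = B / (1 - a z^{-1})

r_hat = [np.mean(x[k:] * x[:N - k]) for k in range(4)]
print(np.round(r_hat, 3))                    # approaches b * a^k = [2, 1.8, 1.62, 1.458]
```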

(c) True. The matrix is Hermitian, Toeplitz, and satisfies $r(0) \ge |r(k)|$. Because $r(0) = 2$ and $r(2) = 2$, the process is mean-square periodic with period 2, and the given entries of $R$ are consistent with this. With Matlab, you could also compute the eigenvalues of $R$ and note that they are all nonnegative (they are $4, 4, 0, 0$), i.e., $R$ is positive semidefinite. The rank is 2 (columns are repeated), and the correlation sequence is $r(k) = 1 + \cos(k\pi)$, so this could have been generated by a non-zero mean sinusoidal process.
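The same check in Python (a small sketch mirroring the Matlab suggestion):

```python
import numpy as np

R = np.array([[2, 0, 2, 0],
              [0, 2, 0, 2],
              [2, 0, 2, 0],
              [0, 2, 0, 2]], dtype=float)
print(np.linalg.eigvalsh(R))        # [0, 0, 4, 4]: all nonnegative, so R is PSD
print(np.linalg.matrix_rank(R))     # 2
```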

(d) False, in general. The two processes $x_1(n)$ and $x_2(n)$ need to be jointly WSS, and this was not specified. In particular, the cross-correlation $r_{x_1 x_2}(k, l)$ should depend only on the difference $k - l$. A counterexample is $x_2(n) = x_1(-n)$ for a WSS process whose autocorrelation is not constant: then $E\{x_1(n)x_2(n+k)\} = r_{x_1}(2n+k)$ depends on the absolute time $n$.

Question 2 (9 points)

A random sequence x(n) has the autocorrelation sequence

$$r_x(0) = 1, \quad r_x(1) = 0.8, \quad r_x(2) = 0.5, \quad r_x(3) = 0.1$$

(a) Use the Schur recursion to compute, step by step, the reflection coefficients $\Gamma_1, \Gamma_2, \Gamma_3$ and the modeling errors $\epsilon_1, \epsilon_2, \epsilon_3$.

(b) Draw the corresponding lattice filter schematic that filters the signal x(n) into white noise.
(Clearly indicate the inputs and outputs.)

(c) From the reflection coefficients, compute the model parameters a3 (k) of a 3rd order model.

(d) Write down the 3 × 3 Yule-Walker equations and verify your result under (c).

(e) If we had $\Gamma_3 = 0$, what conclusions could you draw from this? (What can you say about the random process, and what can you say about $\Gamma_4, \Gamma_5, \dots$ and $\epsilon_4, \epsilon_5, \dots$?)

Solution

(a) The Schur recursion starts from the generator columns $[0, r(1), r(2), r(3)]^T$ and $[r(0), r(1), r(2), r(3)]^T$; each step shifts the second column down by one and then rotates to zero the next entry of the first column:
$$\begin{bmatrix} 0 & 1\\ 0.8 & 0.8\\ 0.5 & 0.5\\ 0.1 & 0.1 \end{bmatrix}
\;\xrightarrow{\text{shift}}\;
\begin{bmatrix} 0 & 0\\ 0.8 & 1\\ 0.5 & 0.8\\ 0.1 & 0.5 \end{bmatrix}
\;\xrightarrow{\text{rotate},\ \Gamma_1 = -0.8}\;
\begin{bmatrix} 0 & 0\\ 0 & 0.36\\ -0.14 & 0.4\\ -0.3 & 0.42 \end{bmatrix}$$
$$\;\xrightarrow{\text{shift}}\;
\begin{bmatrix} 0 & 0\\ 0 & 0\\ -0.14 & 0.36\\ -0.3 & 0.4 \end{bmatrix}
\;\xrightarrow{\text{rotate},\ \Gamma_2 = \frac{0.14}{0.36} = 0.3889}\;
\begin{bmatrix} 0 & 0\\ 0 & 0\\ 0 & 0.3056\\ -0.1444 & 0.2833 \end{bmatrix}
\;\xrightarrow{\text{shift}}\;
\begin{bmatrix} 0 & 0\\ 0 & 0\\ 0 & 0\\ -0.1444 & 0.3056 \end{bmatrix}$$
$$\;\xrightarrow{\text{rotate},\ \Gamma_3 = \frac{0.1444}{0.3056} = 0.4725}\;
\begin{bmatrix} 0 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0.2373 \end{bmatrix}$$
The modeling errors are $\epsilon_1 = 0.36$, $\epsilon_2 = 0.3056$, $\epsilon_3 = 0.2373$.
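The shift/rotate steps can be automated; below is a minimal Schur-recursion sketch for this $r$. (With exact arithmetic $\Gamma_3 = 26/55 \approx 0.4727$; the hand computation above rounds intermediate values and obtains 0.4725.)

```python
import numpy as np

r = np.array([1.0, 0.8, 0.5, 0.1])
g = np.column_stack((r, r)); g[0, 0] = 0.0        # generators [0, r1, r2, r3], [r0, r1, r2, r3]
gammas, errs = [], []
for j in range(1, len(r)):
    g[:, 1] = np.concatenate(([0.0], g[:-1, 1]))  # shift the second column down
    gam = -g[j, 0] / g[j, 1]                      # reflection coefficient Gamma_j
    g = g @ np.array([[1.0, gam], [gam, 1.0]])    # rotation; zeros out g[j, 0]
    gammas.append(gam); errs.append(g[j, 1])      # modeling error eps_j
print(np.round(gammas, 4))  # [-0.8, 0.3889, 0.4727]
print(np.round(errs, 4))    # [0.36, 0.3056, 0.2373]
```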

(b) See Levinson slides, p. 18:

[Lattice filter diagram: a three-stage FIR lattice with input $x(n)$, reflection coefficients $\Gamma_1, \Gamma_2, \Gamma_3$, and a delay $z^{-1}$ in each lower branch. The upper path realizes $a_1(z) = 1 + \Gamma_1 z^{-1}$, $a_2(z) = 1 + \Gamma_1 z^{-1} + \Gamma_2 z^{-1}(\Gamma_1 + z^{-1})$, and $a_3(z)$, with output $v_3(n)$; the lower path realizes the reversed polynomials $a_1^R(z) = \Gamma_1 + z^{-1}$, $a_2^R(z) = \Gamma_2(1 + z^{-1}\Gamma_1) + z^{-1}(\Gamma_1 + z^{-1})$, and $a_3^R(z)$, with output $v_3^R(n)$.]

If the input is $x(n)$, the outputs are $v_3(n)$ (white noise, up to order 3) and $v_3^R(n)$.

(c) If the input is $[1, 0, 0, 0]$ (a delta spike), the output at the top right is $a_3(n)$. Using the diagram, this can be computed by following the same shift/rotate procedure as above:
$$\begin{bmatrix} 1 & 1\\ 0 & 0\\ 0 & 0\\ 0 & 0 \end{bmatrix}
\;\xrightarrow{\text{shift}}\;
\begin{bmatrix} 1 & 0\\ 0 & 1\\ 0 & 0\\ 0 & 0 \end{bmatrix}
\;\xrightarrow{\text{rotate},\ \Gamma_1 = -0.8}\;
\begin{bmatrix} 1 & -0.8\\ -0.8 & 1\\ 0 & 0\\ 0 & 0 \end{bmatrix}$$
$$\;\xrightarrow{\text{shift}}\;
\begin{bmatrix} 1 & 0\\ -0.8 & -0.8\\ 0 & 1\\ 0 & 0 \end{bmatrix}
\;\xrightarrow{\text{rotate},\ \Gamma_2 = \frac{0.14}{0.36} = 0.3889}\;
\begin{bmatrix} 1 & 0.3889\\ -1.1111 & -1.1111\\ 0.3889 & 1\\ 0 & 0 \end{bmatrix}
\;\xrightarrow{\text{shift}}\;
\begin{bmatrix} 1 & 0\\ -1.1111 & 0.3889\\ 0.3889 & -1.1111\\ 0 & 1 \end{bmatrix}$$
$$\;\xrightarrow{\text{rotate},\ \Gamma_3 = \frac{0.1444}{0.3056} = 0.4725}\;
\begin{bmatrix} 1 & 0.4725\\ -0.9274 & -0.1361\\ -0.1361 & -0.9274\\ 0.4725 & 1 \end{bmatrix}$$
The model parameters are $a_3(k) = [1, -0.9274, -0.1361, 0.4725]$.
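Equivalently, the step-up (Levinson) recursion maps the reflection coefficients directly to $a_3(k)$; a sketch, using the exact fractions behind the rounded values above:

```python
import numpy as np

gammas = [-0.8, 7/18, 26/55]    # Gamma_1, Gamma_2, Gamma_3 (exact values)
a = np.array([1.0])
for gam in gammas:
    # step-up: a_j = [a_{j-1}, 0] + Gamma_j * [0, reversed(a_{j-1})]
    a = np.concatenate((a, [0.0])) + gam * np.concatenate((a, [0.0]))[::-1]
print(np.round(a, 4))           # [1, -0.9273, -0.1364, 0.4727], matching (c) up to rounding
```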

(d) The Yule-Walker equations with the $a_3(k)$-coefficients filled in are
$$\begin{bmatrix} 1 & 0.8 & 0.5\\ 0.8 & 1 & 0.8\\ 0.5 & 0.8 & 1 \end{bmatrix}
\begin{bmatrix} -0.9274\\ -0.1361\\ 0.4725 \end{bmatrix}
= -\begin{bmatrix} 0.8\\ 0.5\\ 0.1 \end{bmatrix}$$
and indeed the LHS is equal to the RHS (up to rounding).
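A quick numerical check of (c) and (d) in one step (a sketch): solving the Yule-Walker system directly should return the same coefficients.

```python
import numpy as np
from scipy.linalg import toeplitz

R = toeplitz([1.0, 0.8, 0.5])                        # 3x3 autocorrelation matrix
a = np.linalg.solve(R, -np.array([0.8, 0.5, 0.1]))   # Yule-Walker solution
print(np.round(a, 4))   # [-0.9273, -0.1364, 0.4727], matching (c) up to rounding
```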

(e) If $\Gamma_3 = 0$, then the underlying process has an AR model of order 2. The prediction error then no longer decreases: $\epsilon_k = \epsilon_2$ for all $k \ge 2$. $\Gamma_3$ and all subsequent $\Gamma_k$ are zero: the recursion stops.

Question 3 (7 points)

We receive 1 second of a signal $x_a(t)$, sampled at a rate $f_s = 1000$ Hz to obtain a data sequence $x(n)$, $n = 0, \dots, N-1$. The signal is assumed to be white Gaussian noise with variance $\sigma^2$ per sample, but it may be contaminated by a narrowband interferer. To detect it, Bartlett's method is used to estimate the power spectrum of $x(n)$. At each frequency we wish to obtain a variance of the estimate that is smaller than $\frac{1}{5}\sigma^4$.

(a) How should the number of segments K and the length of each segment L be chosen?

(b) What is the best resolution (in Hz) that can be obtained using this technique?

(c) How can the resolution be improved? Give 3 possibilities. For each possibility, discuss the
effect on the variance of the spectrum estimate.

(d) Suppose the bandwidth of the interferer is B Hz. To detect the interferer, does it help to
improve the resolution? To answer this, consider the case where the resolution is smaller
than B as well as the case where it is larger.

Solution

(a) For Bartlett's method, we have $N = KL$; the resolution is
$$\Delta\omega = 0.89\,K\,\frac{2\pi}{N} = 0.89\,\frac{2\pi}{L}$$
and the variance is
$$\mathrm{Var}(\hat{P}_B(e^{j\omega})) \approx \frac{1}{K}\,P_x^2(e^{j\omega})$$
In this case, we have $N = 1000$. The condition on the variance gives $K$: at frequencies where there is no interferer, $P_x(e^{j\omega}) = \sigma^2$, so the variance is $\sigma^4/K$, and it follows that $K = 5$.
From $N = 1000$ and $K = 5$, it follows that $L = 200$.
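A small simulation sketch of this design ($K = 5$, $L = 200$; $\sigma = 1$ is an illustrative value):

```python
import numpy as np

rng = np.random.default_rng(2)
K, L = 5, 200
x = rng.standard_normal(K * L)            # N = 1000 white samples, sigma^2 = 1
segs = x.reshape(K, L)                    # K non-overlapping segments of length L
P = np.mean(np.abs(np.fft.fft(segs, axis=1))**2, axis=0) / L   # averaged periodograms
print(P.mean(), P.var())                  # mean ~ 1 (= sigma^2), variance ~ 0.2 (= sigma^4/K)
```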

(b) The resolution is (for normalized radian frequencies $\omega$)
$$\Delta\omega = 0.89\,K\,\frac{2\pi}{N} = 0.89\,\frac{2\pi}{L}$$
Translating to the original analog-domain frequencies ($F$, in Hz, where $\omega = 2\pi f = 2\pi F/f_s$), we find
$$\Delta F = 0.89\,\frac{f_s}{L} = 4.45 \text{ Hz}$$
Note that we can also write $f_s = N/T$, where $T = 1$ s is the observation time; it follows that
$$\Delta F = 0.89\,\frac{N}{LT} = 0.89\,\frac{K}{T}$$

(c) 1. Increase L; with constant N we will have a smaller K and hence a larger variance.
2. Decrease $f_s$; with a constant total measurement time we will have fewer samples N. If we keep L the same, then K will decrease, hence the variance will be larger.
3. Increase the observation time; with the same $f_s$ we will have a larger N, hence we can choose a larger L while K remains the same. With this option the variance can stay the same.

(d) The area under the peak in the spectral density estimate is the energy of the interferer.
1. Suppose the interferer bandwidth is smaller than the resolution. Then, if we halve the resolution (double L), the new frequency bin that contains the interferer will be half as wide. With the same interferer power, the 'peak' in the power spectrum will double, and we will more easily detect it. Thus, this seems useful. With constant N, the variance of the estimates will also double, visible in the plot as an increase of the standard deviation by a factor $\sqrt{2}$. So the effect will be less pronounced, but still useful.
2. If the interferer bandwidth is larger than the resolution, then further improving the resolution will not increase the peak anymore; it will not help to detect the interferer. If the improvement in resolution causes a larger variance of the estimates, the overall effect is even detrimental.

Delft University of Technology
Faculty of Electrical Engineering, Mathematics, and Computer Science
Circuits and Systems Group

EE4C03 STATISTICAL DIGITAL SIGNAL PROCESSING AND MODELING
18 January 2021, 9:00–12:20
Block 2 (10:50-12:20)
Open book, strictly timed take-home exam. (Electronic) copies of the book and the
course slides allowed. No other tools except a basic pocket calculator permitted.
Upload answers during 12:15–12:25

This block consists of three questions (26 points); more than usual, and this will be taken into
account during grading. Answer in Dutch or English. Make clear in your answer how you reach
the final result; the road to the answer is very important. Write your name and student number
on each sheet.
Hint: Avoid losing too much time on detailed calculations, write down the general approach
first.

Question 4 (9 points)

Consider a process x(n) consisting of a sinusoid plus noise. Specifically, x(n) is given by
$$x(n) = A(n)\cos(\omega n) + e(n),$$
where $\omega$ is a deterministic frequency, A(n) is Gaussian with mean zero and variance $\sigma_A^2$, and e(n) is white Gaussian with mean zero and variance $\sigma_e^2$. We assume A(n) and e(n) are independent.
We want to estimate the amplitude A(n) of this sinusoid by means of a simple zeroth order filter
w applied to x(n).

(a) What is the distribution of x(n)? What is its mean and its variance? Is x(n) stationary?

(b) Whether x(n) is stationary or not, can you think of a good filter w to estimate A(n)? In
your answer, you may assume w is time-varying.

(c) Derive the normalized least mean squares (NLMS) algorithm to adaptively track the filter
w that allows us to estimate A(n) from x(n).

(d) Suppose we now want to estimate A(n) from x(n) using a Kalman filter. In that case, we
first need to make some assumption on the dynamics of A(n). Since we have no specific
information about the dynamics, we simply assume

A(n + 1) = A(n).

Using the above equation as the state equation, derive the Kalman filter to estimate A(n) from x(n). Name at least two advantages of this Kalman filter over the NLMS filter described in (c).
Solution

(a) The process x(n) is Gaussian with zero mean and variance $\sigma_A^2\cos^2(\omega n) + \sigma_e^2$. Since the variance depends on n, the process is not stationary.

(b) Since the process is not stationary, strictly speaking there is no optimal Wiener filter. If we had to consider a time-invariant filter w, the best we can do is to take w = 0, which is not very useful. A better time-varying choice could be $w(n) = 1/\cos(\omega n)$, or a Wiener-style filter like
$$w(n) = \frac{E\{A(n)x(n)\}}{E\{|x(n)|^2\}} = \frac{\sigma_A^2\cos(\omega n)}{\sigma_A^2\cos^2(\omega n) + \sigma_e^2}.$$

(c) The NLMS filter update looks as follows:
$$\hat{w}(n+1) = \hat{w}(n) - \frac{\mu}{|x(n)|^2}\, x(n)\,\big(\hat{w}(n)\, x(n) - A(n)\big).$$
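A minimal sketch of this update in a loop ($\sigma_A$, $\sigma_e$, $\omega$, and the step size $\mu$ are illustrative assumptions; a small regularizer avoids division by zero when $x(n) \approx 0$):

```python
import numpy as np

rng = np.random.default_rng(3)
N, mu, eps = 5000, 0.5, 1e-6
omega, sigma_A, sigma_e = 0.7, 1.0, 0.1

A = sigma_A * rng.standard_normal(N)                 # amplitude process A(n)
x = A * np.cos(omega * np.arange(N)) + sigma_e * rng.standard_normal(N)

w = 0.0
for n in range(N):
    e = w * x[n] - A[n]                              # a-priori error, A(n) as reference
    w -= mu / (x[n]**2 + eps) * x[n] * e             # NLMS update
# Note: the optimal w(n) from (b) is time-varying, so w keeps adapting and never settles.
```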

(d) The Kalman filter is based on the state space model
$$A(n+1) = A(n), \qquad x(n) = A(n)\cos(\omega n) + e(n).$$
The updating equations are then given by
$$\begin{aligned}
\hat{A}(n|n-1) &= \hat{A}(n-1|n-1)\\
p(n|n-1) &= p(n-1|n-1)\\
k(n) &= p(n|n-1)\cos(\omega n)\,[\cos^2(\omega n)\,p(n|n-1) + \sigma_e^2]^{-1}\\
\hat{A}(n|n) &= \hat{A}(n|n-1) + k(n)\,[x(n) - \hat{A}(n|n-1)\cos(\omega n)]\\
p(n|n) &= [1 - k(n)\cos(\omega n)]\,p(n|n-1)
\end{aligned}$$

As far as the advantages are concerned: first of all, the Kalman filter provides the optimal estimate of the amplitude given the prior data, whereas the NLMS filter does not. Second, it explicitly takes the time-varying behavior of the measurement model into account. Third, it does not require the true reference signal A(n) in the update equations.
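A sketch of this recursion (same illustrative signal parameters as in the NLMS sketch; note that under the state equation A(n+1) = A(n) the amplitude is a constant):

```python
import numpy as np

rng = np.random.default_rng(4)
N, omega, sigma_e = 200, 0.7, 0.1
A_true = 1.3                                # with A(n+1) = A(n), A is a constant
x = A_true * np.cos(omega * np.arange(N)) + sigma_e * rng.standard_normal(N)

A_hat, p = 0.0, 1.0                         # initial estimate and error variance
for n in range(N):
    c = np.cos(omega * n)                   # scalar measurement "matrix"
    k = p * c / (c**2 * p + sigma_e**2)     # Kalman gain (prediction step is identity)
    A_hat = A_hat + k * (x[n] - A_hat * c)  # measurement update
    p = (1 - k * c) * p                     # error-variance update
print(A_hat)                                # converges towards A_true
```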

Question 5 (8 points)

Consider the noise canceling scenario depicted below.

[Figure: noise-canceling setup. The noise v(n) passes through the noise path h(n) and is added to the useful signal x(n) at the microphone, giving y(n); a second microphone at the noise source picks up v(n), which is filtered by w(n) to produce z(n); the output is e(n) = y(n) - z(n).]
The useful signal x(n) that is picked up by the microphone is corrupted by noise v(n) after
it passes through the noise path h(n). So the signal y(n) measured by the microphone is
y(n) = x(n) + h(n) ∗ v(n), where ∗ represents the convolution operator. To cancel the noise,
we apply a separate microphone at the noise source, which only picks up the noise v(n). We
let the noise v(n) pass through the filter w(n) and subtract its output z(n) from y(n) to obtain
e(n) = y(n) − z(n). Minimizing the error e(n) we hope that the filter w(n) will be close to the
noise path h(n) such that e(n) will be close to the desired signal x(n). We assume that the
noise v(n) is white Gaussian with zero mean and variance σ 2 and that it is independent from
the useful signal x(n).
Suppose we can only afford estimating a filter w(n) of a short order p, which is much smaller than the order of h(n). Hence, the filter taps of w(n) can be grouped in a short vector $w = [w(0), w(1), \dots, w(p)]^T$. To compensate for this limited order, we include a delay $\delta$ before the filter w(n), and instead of filtering v(n) we filter $v(n - \delta)$.

(a) First, express the optimal Wiener filter w as a function of rv (k) = E{v(n)v(n − k)} and
ryv (k) = E{y(n)v(n − k)}.

(b) Using the fact that v(n) is white Gaussian with zero mean and variance σ 2 , express rv (k)
and ryv (k) as a function of h(n).

(c) From the expressions in (b), write the optimal solution of w as a function of h(n) and give
the expression for the optimal error e(n).

(d) Assume that we achieve our goal and w(n) = h(n + δ), for n = 0, 1, . . . , p. In that case,
what would be a good way to select δ?

(e) Is this still a good approach when v(n) is not white? Also explain why. Describe in words
what will change.

Solution

(a) Defining the vector vδ (n) = [v(n − δ), v(n − 1 − δ), . . . , v(n − p − δ)]T , the output of the
filter w is given by z(n) = wT vδ (n). The optimal Wiener filter is this given by

w = E{vδ (n)vδT (n)}−1 E{y(n)vδ (n)}


 −1  
rv (0) . . . rv (p) ryv (δ)
=  ... . . .. ..
   
..   . 
rv (−p) ... rv (0) ryv (p + δ)

(b) Using the fact that v(n) is white Gaussian noise with zero mean and variance $\sigma^2$, we have
$$\begin{aligned}
r_v(k) &= \sigma^2\delta(k),\\
r_{yv}(k) &= E\{(x(n) + h(n)*v(n))\,v(n-k)\}\\
&= E\{(h(n)*v(n))\,v(n-k)\}\\
&= h(k)*r_v(k) = \sigma^2\, h(k)*\delta(k) = \sigma^2 h(k).
\end{aligned}$$

(c) From (b), it is easy to see that
$$w = \begin{bmatrix} r_v(0) & \cdots & r_v(p)\\ \vdots & \ddots & \vdots\\ r_v(-p) & \cdots & r_v(0) \end{bmatrix}^{-1}
\begin{bmatrix} r_{yv}(\delta)\\ \vdots\\ r_{yv}(p+\delta) \end{bmatrix}
= (\sigma^2 I)^{-1}\sigma^2 \begin{bmatrix} h(\delta)\\ \vdots\\ h(p+\delta) \end{bmatrix}
= \begin{bmatrix} h(\delta)\\ \vdots\\ h(p+\delta) \end{bmatrix}$$
The error is given by
$$\begin{aligned}
e(n) &= x(n) + h(n)*v(n) - \sum_{k=0}^{p} w(k)\,v(n-\delta-k)\\
&= x(n) + h(n)*v(n) - \sum_{k=0}^{p} h(k+\delta)\,v(n-\delta-k).
\end{aligned}$$

(d) It seems logical to select the $\delta$ for which the mean square error is the smallest, i.e., for which $E\{e^2(n)\}$ is minimal. Since we can write
$$E\{e^2(n)\} = \text{constant} - \sigma^2\sum_{k=0}^{p} w^2(k) = \text{constant} - \sigma^2\sum_{k=0}^{p} h^2(k+\delta),$$
we select the $\delta$ for which $\|w\|^2 = \sum_{k=0}^{p} w^2(k) = \sum_{k=0}^{p} h^2(k+\delta)$ is the highest. Selecting such a $\delta$ means the noise path h(n) has maximal energy over the indices $\delta$ to $\delta + p$.
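A sketch of this selection rule, for an arbitrary illustrative noise path h(n):

```python
import numpy as np

h = np.array([0.05, 0.1, 0.8, 1.0, 0.6, 0.3, 0.1, 0.05])   # assumed noise path
p = 2                                                       # short filter order
energy = np.convolve(h**2, np.ones(p + 1), mode='valid')    # sliding sums of h^2(k+delta)
delta = int(np.argmax(energy))                              # window with maximal energy
print(delta)                                                # 2: the window h(2), h(3), h(4)
```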

(e) Yes, the approach is then still valid, but w(n) will only be approximately equal to $h(n + \delta)$, since now $r_v(k)$ also plays a role: the correlation matrix of the noise is no longer a scaled identity, so the Wiener solution no longer reduces to a plain copy of the filter taps.

Question 6 (9 points)

Consider a stochastic process x(n) with zero mean and correlation function rx (k). Now assume
we sample this process with two low-rate samplers, one at rate 1/3 and the other at rate 1/5.
So the sequence obtained from the first sampler is x1 (n) = x(3n) whereas the one related to the
second sampler is x2 (n) = x(5n).

(a) Express the correlation matrix of the first 5 samples of x1(n) as a function of $r_x(k)$, i.e., use $r_x(k)$ to compute $R_{x_1} = E\{x_1 x_1^H\}$ with $x_1 = [x(0), x(3), x(6), x(9), x(12)]^T$. Similarly, express the correlation matrix of the first 3 samples of x2(n) as a function of $r_x(k)$, i.e., use $r_x(k)$ to compute $R_{x_2} = E\{x_2 x_2^H\}$ with $x_2 = [x(0), x(5), x(10)]^T$.

(b) Express the cross-correlation matrix between the first 5 samples of x1(n) and the first 3 samples of x2(n) as a function of $r_x(k)$, i.e., use $r_x(k)$ to compute $R_{x_1,x_2} = E\{x_1 x_2^H\}$.

(c) How can these three correlation matrices be computed in practice if you are given $N_1 = 50$ samples of x1(n) and $N_2 = 30$ samples of x2(n)?

(d) After computing the three correlation matrices Rx1 , Rx2 , and Rx1 ,x2 , which correlation
values rx (k) can we estimate from Rx1 , Rx2 , and Rx1 ,x2 ?

This result shows that even though we did not explicitly sample the process x(n) at rate 1, we
can still derive (part of) its correlation function from two low-rate samplers.
Now that we have estimated (part of) the correlation function $r_x(k)$, we can use this to interpolate the signals $x_1(n) = x(3n)$ and $x_2(n) = x(5n)$ in order to find the full rate-1 sequence x(n).
Assume for instance we want to estimate x(n), for n = 0, 1, 2, ..., 14, from x1 and x2 jointly (or equivalently from $x_0 = [x_1^T, x_2^T]^T$).

(e) Can you give the optimal Wiener filter expression to estimate x(n), for n = 0, 1, 2, . . . , 14,
from x1 and x2 ? Try to express this filter as a function of rx (k).

(f) Do the correlation values from (d) provide enough information to estimate the full sequence
x(n), for n = 0, 1, 2, . . . , 14, from x1 and x2 ? If so, explain why. If not, can you think
of a way to circumvent this problem and still obtain a good estimate of x(n), for n =
0, 1, 2, . . . , 14?

Solution

(a) The correlation matrix of the first 5 samples of x1(n) is
$$R_{x_1} = \begin{bmatrix}
r_x(0) & r_x(3) & r_x(6) & r_x(9) & r_x(12)\\
r_x(-3) & r_x(0) & r_x(3) & r_x(6) & r_x(9)\\
r_x(-6) & r_x(-3) & r_x(0) & r_x(3) & r_x(6)\\
r_x(-9) & r_x(-6) & r_x(-3) & r_x(0) & r_x(3)\\
r_x(-12) & r_x(-9) & r_x(-6) & r_x(-3) & r_x(0)
\end{bmatrix}$$
Similarly, the correlation matrix of the first 3 samples of x2(n) is
$$R_{x_2} = \begin{bmatrix}
r_x(0) & r_x(5) & r_x(10)\\
r_x(-5) & r_x(0) & r_x(5)\\
r_x(-10) & r_x(-5) & r_x(0)
\end{bmatrix}$$

(b) The cross-correlation matrix between the first 5 samples of x1(n) and the first 3 samples of x2(n) is
$$R_{x_1,x_2} = \begin{bmatrix}
r_x(0) & r_x(5) & r_x(10)\\
r_x(-3) & r_x(2) & r_x(7)\\
r_x(-6) & r_x(-1) & r_x(4)\\
r_x(-9) & r_x(-4) & r_x(1)\\
r_x(-12) & r_x(-7) & r_x(-2)
\end{bmatrix}$$

(c) In practice, the matrices can be computed as a temporal average. For the autocorrelation matrices, this is fairly easy. Defining $x_1(n) = [x(3n), x(3n+3), x(3n+6), x(3n+9), x(3n+12)]^T$ and $x_2(n) = [x(5n), x(5n+5), x(5n+10)]^T$, we can estimate them as
$$\hat{R}_{x_1} = \frac{1}{50}\sum_{n=0}^{49} x_1(n)x_1^H(n), \qquad
\hat{R}_{x_2} = \frac{1}{30}\sum_{n=0}^{29} x_2(n)x_2^H(n).$$
The cross-correlation matrix is a bit more tricky to compute, since you have to make sure that the first samples of every block are lined up, and this only happens every 15 samples. So in this case we have to compute the average using the vectors $\tilde{x}_1(n) = [x(15n), x(15n+3), x(15n+6), x(15n+9), x(15n+12)]^T$ and $\tilde{x}_2(n) = [x(15n), x(15n+5), x(15n+10)]^T$, leading to
$$\hat{R}_{x_1,x_2} = \frac{1}{10}\sum_{n=0}^{9} \tilde{x}_1(n)\tilde{x}_2^H(n).$$
Note that in this case we have fewer terms in the average.
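A sketch of these averages (the AR(1) full-rate process is only an illustrative way to generate the two low-rate streams):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(5)
x = lfilter([1.0], [1.0, -0.9], rng.standard_normal(5000))  # illustrative full-rate x(n)

# Blocks for the auto-correlations: 5 samples at rate 1/3, 3 samples at rate 1/5.
X1 = np.array([x[3*n : 3*n + 13 : 3] for n in range(50)])   # rows are x1(n)
X2 = np.array([x[5*n : 5*n + 11 : 5] for n in range(30)])   # rows are x2(n)
R1 = X1.T @ X1.conj() / 50
R2 = X2.T @ X2.conj() / 30

# Blocks for the cross-correlation must be aligned every 15 samples.
Xt1 = np.array([x[15*n : 15*n + 13 : 3] for n in range(10)])
Xt2 = np.array([x[15*n : 15*n + 11 : 5] for n in range(10)])
R12 = Xt1.T @ Xt2.conj() / 10
print(R1.shape, R2.shape, R12.shape)        # (5, 5) (3, 3) (5, 3)
```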

(d) First note that if you have $r_x(-k)$, then you also know $r_x(k)$, since $r_x(-k) = r_x^*(k)$. As such, we know $r_x(k)$ for the lags $|k| = 0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 12$.

(e) We look for the filter $w_n$ that estimates x(n) from $x_0$, i.e., $\hat{x}(n) = w_n^T x_0$. The Wiener filter is given by
$$w_n = E\{x_0 x_0^H\}^{-1}\, E\{x(n)x_0^*\}
= \begin{bmatrix} R_{x_1} & R_{x_1,x_2}\\ R_{x_1,x_2}^H & R_{x_2} \end{bmatrix}^{-1}
\begin{bmatrix} r_x(n)\\ r_x(n-3)\\ r_x(n-6)\\ r_x(n-9)\\ r_x(n-12)\\ r_x(n)\\ r_x(n-5)\\ r_x(n-10) \end{bmatrix}$$

(f) We always know the correlation values inside the correlation matrix that is inverted, so that is not an issue. The question is whether we have available all correlation values inside $E\{x(n)x_0^*\}$. Looking at these values, it is clear that we would need all correlation values $r_x(k)$ for $|k| = 0, 1, 2, \dots, 14$ in order to accurately estimate all samples x(n), for n = 0, 1, 2, ..., 14. So what we have from (d) is not enough. To solve this problem, we could apply a Wiener filter on a part of $x_0$ instead of on the full vector. For instance, to estimate x(1) with the full-blown filter we would need $r_x(1)$, $r_x(-2)$, $r_x(-5)$, $r_x(-8)$, $r_x(-11)$, $r_x(-4)$, and $r_x(-9)$. Since we don't have $r_x(-8)$ and $r_x(-11)$, we could drop the last two samples in $x_1$ and only apply a Wiener filter on the other samples. That Wiener filter can be accurately computed. The same reasoning can be used for the other x(n) values.
