Good Estimation Detection Notes
March 2, 2006
Contents
1 Problem Statement
5 Further problems
  5.1 Problem 1: relations for Linear Gaussian Models
    5.1.1 No a priori information: MMSE and LS
    5.1.2 Uncorrelated noise: BLU and LS
    5.1.3 Zero-noise: MMSE and LS
  5.2 Problem 2: orthogonality
  5.3 Practical example: multi-antenna communication
    5.3.1 Problem
    5.3.2 Solution
1 Problem Statement
We are interested in estimating a continuous scalar¹ parameter a ∈ A from a vector observation r. The observation and the parameter are related through a probabilistic mapping p_{R|A}(r|a). The parameter a may or may not be a sample of a random variable; this leads to Bayesian and non-Bayesian estimators, respectively.
Special cases
• Noisy observation model:
\[
r = f(a, n)
\]
where n (the noise component) is independent of a and has a pdf p_N(n).
The latter two will be covered in this section. We remind that the MAP estimate is given by
\[
\hat a_{MAP}(r) = \arg\max_{a \in A} p_{A|R}(a|r).
\]
The MMSE estimate minimizes E{(â(r) − A)²}. Taking the derivative (assuming this is possible) w.r.t. â(r) and setting the result to zero yields
\[
\hat a(r) = \int_A a \, p_{A|R}(a|r) \, da.
\]
¹ We will only consider estimation of scalars. Extension to vectors is straightforward.
2.1.2 Important special case: Gaussian model
When A and R are jointly Gaussian, we can write
\[
[A, R] \sim \mathcal{N}(m, \Sigma)
\]
where
\[
m = \begin{bmatrix} m_A \\ m_R \end{bmatrix}
\quad \text{and} \quad
\Sigma = \begin{bmatrix} \Sigma_A & \Sigma_{AR}^T \\ \Sigma_{AR} & \Sigma_R \end{bmatrix}.
\]
In that case it can be shown (after some straightforward manipulation) that
\[
A|R \sim \mathcal{N}\!\left(m_A + \Sigma_{AR}^T \Sigma_R^{-1} (r - m_R),\; \Sigma_A - \Sigma_{AR}^T \Sigma_R^{-1} \Sigma_{AR}\right)
\]
so that
\[
\hat a_{MMSE}(r) = m_A + \Sigma_{AR}^T \Sigma_R^{-1} (r - m_R). \tag{1}
\]
Note that
• A posteriori (after observing r), the uncertainty w.r.t. a is reduced from Σ_A to Σ_A − Σ_AR^T Σ_R^{-1} Σ_AR < Σ_A.
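This variance reduction can be checked numerically for the linear model R = hA + N used below. A minimal sketch (all numbers for h, Σ_A, Σ_N are illustrative assumptions, not from the notes):

```python
import numpy as np

# Illustrative linear model R = h*A + N with scalar A and vector R.
h = np.array([1.0, 0.5, -0.3])      # known observation vector (assumed)
Sigma_A = 2.0                        # a priori variance of A (assumed)
Sigma_N = 0.4 * np.eye(3)            # noise covariance (assumed)

Sigma_AR = Sigma_A * h                              # cross-covariance h * Sigma_A
Sigma_R = Sigma_A * np.outer(h, h) + Sigma_N        # h Sigma_A h^T + Sigma_N

# a posteriori variance: Sigma_A - Sigma_AR^T Sigma_R^{-1} Sigma_AR
post_var = Sigma_A - Sigma_AR @ np.linalg.solve(Sigma_R, Sigma_AR)
```

The a posteriori variance stays positive but is strictly smaller than the prior variance Σ_A.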
\[
\begin{aligned}
\Sigma_{AR} &= E\{(R - m_R)(A - m_A)\} \\
&= E\{(hA + N - h m_A)(A - m_A)\} \\
&= E\{(h (A - m_A) + N)(A - m_A)\} \\
&= h\, E\{(A - m_A)^2\} + E\{N (A - m_A)\} \\
&= h \Sigma_A
\end{aligned}
\]
and
\[
\begin{aligned}
\Sigma_R &= E\{(R - m_R)(R - m_R)^T\} \\
&= E\{(h (A - m_A) + N)(h (A - m_A) + N)^T\} \\
&= h\, E\{(A - m_A)^2\}\, h^T + E\{N N^T\} \\
&= h \Sigma_A h^T + \Sigma_N.
\end{aligned}
\]
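These two identities can be sanity-checked by Monte Carlo simulation. The numbers below are illustrative, and the tolerances are loose to absorb sampling noise:

```python
import numpy as np

# Monte Carlo check of Sigma_AR = h*Sigma_A and Sigma_R = h Sigma_A h^T + Sigma_N.
rng = np.random.default_rng(0)
h = np.array([1.0, -0.5])            # illustrative model parameters
m_A, Sigma_A = 0.7, 1.5
Sigma_N = 0.2 * np.eye(2)

n = 200_000
A = m_A + np.sqrt(Sigma_A) * rng.standard_normal(n)
N = rng.multivariate_normal(np.zeros(2), Sigma_N, size=n)
R = A[:, None] * h + N               # R = h*A + N, one row per sample

# empirical cross-covariance E{(R - m_R)(A - m_A)} and covariance of R
Sigma_AR_hat = ((R - R.mean(axis=0)) * (A - A.mean())[:, None]).mean(axis=0)
Sigma_R_hat = np.cov(R.T)
```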
This leads to
\[
\hat a(r) = b^T r + c
\]
so that â(r) is obtained through an affine transformation of the observation. Clearly, when A and R are jointly Gaussian, the MMSE estimator is a linear estimator with
\[
b^T = \Sigma_{AR}^T \Sigma_R^{-1}
\]
and
\[
c = m_A - \Sigma_{AR}^T \Sigma_R^{-1} m_R.
\]
For a zero-mean estimation error we need
\[
b^T m_R + c = m_A
\]
i.e.
\[
c = m_A - b^T m_R
\]
so that
\[
c^2 = m_A^2 + b^T m_R m_R^T b - 2 m_A b^T m_R.
\]
\[
\begin{aligned}
\Sigma_R &= E\{(R - m_R)(R - m_R)^T\} \\
&= E\{R R^T\} - m_R m_R^T
\end{aligned}
\]
and
\[
\begin{aligned}
\Sigma_{AR} &= E\{(R - m_R)(A - m_A)\} \\
&= E\{R A\} - m_R m_A.
\end{aligned}
\]
Let us now look at the variance of the estimation error (the estimation error is zero-mean):
\[
\begin{aligned}
V &= E\{(\hat a(R) - A)^2\} \\
&= E\{(b^T R + c - A)^2\} \\
&= b^T E\{R R^T\} b + E\{A^2\} + c^2 + 2c\, b^T m_R - 2c\, m_A - 2 b^T E\{R A\} \\
&= b^T \left(\Sigma_R + m_R m_R^T\right) b + \left(\Sigma_A + m_A^2\right) + c^2 + 2c \left(b^T m_R - m_A\right) - 2 b^T \left(\Sigma_{AR} + m_R m_A\right) \\
&= b^T \left(\Sigma_R + m_R m_R^T\right) b + \Sigma_A + m_A^2 - c^2 - 2 b^T \left(\Sigma_{AR} + m_R m_A\right) \\
&= b^T \Sigma_R b + \Sigma_A - 2 b^T \Sigma_{AR}
\end{aligned}
\]
where we have used the fact that c² = m_A² + b^T m_R m_R^T b − 2 m_A b^T m_R. Taking the derivative w.r.t. b and equating the result to zero yields²
\[
2 \Sigma_R b - 2 \Sigma_{AR} = 0
\]
and thus (since Σ_R^{-1} is symmetric)
\[
b = \Sigma_R^{-1} \Sigma_{AR} \tag{3}
\]
and
\[
c = m_A - b^T m_R = m_A - \Sigma_{AR}^T \Sigma_R^{-1} m_R. \tag{4}
\]
Substitution of (3) and (4) into (2) gives us the final result.
QED.
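The stationary point can be checked numerically to be a minimum: random perturbations of the optimal b should never reduce V. A sketch with illustrative numbers:

```python
import numpy as np

# Check that b = Sigma_R^{-1} Sigma_AR minimizes
# V(b) = b^T Sigma_R b + Sigma_A - 2 b^T Sigma_AR.
rng = np.random.default_rng(1)
h = np.array([1.0, 0.3, -0.8])       # illustrative linear model R = h*A + N
Sigma_A = 1.0
Sigma_N = 0.5 * np.eye(3)

Sigma_AR = Sigma_A * h
Sigma_R = Sigma_A * np.outer(h, h) + Sigma_N

def V(b):
    # error variance of a linear estimator (with the optimal c already substituted)
    return b @ Sigma_R @ b + Sigma_A - 2 * b @ Sigma_AR

b_opt = np.linalg.solve(Sigma_R, Sigma_AR)
V_opt = V(b_opt)

# no perturbed b should achieve a smaller error variance
no_improvement = all(
    V(b_opt + 0.1 * rng.standard_normal(3)) >= V_opt for _ in range(100)
)
```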
For m_A = 0 (so that m_R = 0 and E{R R^T} = Σ_R = hΣ_A h^T + Σ_N), we have
\[
\begin{aligned}
E\{R \hat a(R)\} &= E\left\{ R\, \Sigma_A h^T \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} R \right\} \\
&= E\{R R^T\} \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} \Sigma_A h \\
&= \left(h \Sigma_A h^T + \Sigma_N\right) \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} \Sigma_A h \\
&= \Sigma_A h.
\end{aligned}
\]
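Since E{RA} = Σ_AR = hΣ_A for zero means, this is exactly the orthogonality condition E{R(â(R) − A)} = 0. A quick numeric confirmation with illustrative numbers:

```python
import numpy as np

# Orthogonality check: E{R a_hat(R)} = Sigma_A h = E{R A} when means are zero.
h = np.array([0.9, -0.4])            # illustrative model parameters
Sigma_A = 1.2
Sigma_N = np.array([[0.3, 0.1],
                    [0.1, 0.5]])

M = Sigma_A * np.outer(h, h) + Sigma_N     # h Sigma_A h^T + Sigma_N
E_RRt = M                                   # E{R R^T} = Sigma_R (zero means)
E_R_ahat = E_RRt @ np.linalg.solve(M, Sigma_A * h)
E_RA = Sigma_A * h                          # Sigma_AR = h * Sigma_A

err = E_R_ahat - E_RA                       # E{R (a_hat(R) - A)}: should vanish
```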
The latter two will be covered in this section. All expectations are taken for a given value of a. We remind that the ML estimate is given by
\[
\hat a_{ML}(r) = \arg\max_{a \in A} p_{R|A}(r|a).
\]
For the BLU estimator we restrict ourselves to linear estimators of the form
\[
\hat a(r) = b^T r + c
\]
with
\[
E\{R\} = h a,
\]
which is assumed to be known to the estimator. Our goal is to find a b that leads to an unbiased estimator with minimal variance, for all a.
Unbiased
An unbiased estimator must satisfy E{â(r)} = a for all a ∈ A. Since E{â(R)} = b^T h a + c, this requires c = 0 and
\[
b^T h = 1.
\]
Variance of Estimation Error
The variance of the estimation error is given by
\[
\begin{aligned}
V_a &= E\{(\hat a(R) - a)^2\} \\
&= E\{(b^T R - a)^2\} \\
&= E\{(b^T (R - h a))^2\} \\
&= b^T E\{(R - h a)(R - h a)^T\} b \\
&= b^T \Sigma_R b.
\end{aligned}
\]
This leads us to the following optimization problem: find the b that minimizes V_a, subject to b^T h = 1. Using a Lagrange multiplier technique, we find we need to minimize
\[
b^T \Sigma_R b - \lambda \left(b^T h - 1\right).
\]
This leads to
\[
b = \Sigma_R^{-1} h \lambda.
\]
The constraint b^T h = 1 then gives
\[
\lambda\, h^T \Sigma_R^{-1} h = 1
\]
so that
\[
b = \Sigma_R^{-1} h \left(h^T \Sigma_R^{-1} h\right)^{-1}
\]
and
\[
V_a = \left(h^T \Sigma_R^{-1} h\right)^{-1} h^T \Sigma_R^{-1} \Sigma_R \Sigma_R^{-1} h \left(h^T \Sigma_R^{-1} h\right)^{-1}
= \left(h^T \Sigma_R^{-1} h\right)^{-1}.
\]
Hence, noting that for a given a we have Σ_R = Σ_N,
\[
\hat a_{BLU}(r) = \left(h^T \Sigma_N^{-1} h\right)^{-1} h^T \Sigma_N^{-1} r.
\]
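As a numeric sanity check (h and Σ_N below are illustrative), the BLU weights satisfy the unbiasedness constraint and achieve a variance no larger than that of the (also unbiased) LS-type weights b₀ = h/(hᵀh):

```python
import numpy as np

# BLU weights for an illustrative h and (diagonal) noise covariance.
h = np.array([1.0, 2.0, -1.0])
Sigma_N = np.diag([0.5, 1.0, 0.25])

w = np.linalg.solve(Sigma_N, h)      # Sigma_N^{-1} h
b = w / (h @ w)                      # (h^T Sigma_N^{-1} h)^{-1} Sigma_N^{-1} h
Va = 1.0 / (h @ w)                   # (h^T Sigma_N^{-1} h)^{-1}

# compare against the (also unbiased) LS-type weights b0 = h / (h^T h)
b0 = h / (h @ h)
Va_ls = b0 @ Sigma_N @ b0            # variance achieved by b0
```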
3.2 The LS estimator
If we refer back to our problem statement, the LS estimator tries to find the a that minimizes the distance between the observation and the ‘reconstructed’ noiseless observation:
\[
d = \left\| r - f(a, 0) \right\|^2.
\]
For the linear model f(a, 0) = h a, the residual r − h â_LS(r) is orthogonal to h:
\[
\begin{aligned}
h^T \left(r - h\, \hat a_{LS}(r)\right) a &= \left(h^T r - h^T h \left(h^T h\right)^{-1} h^T r\right) a \\
&= \left(h^T r - h^T r\right) a \\
&= 0.
\end{aligned}
\]
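The orthogonality of the LS residual can be verified for arbitrary numbers (both h and r below are illustrative):

```python
import numpy as np

# LS residual orthogonality check for an arbitrary observation.
h = np.array([2.0, 1.0, -1.0])       # illustrative observation vector
r = np.array([1.3, -0.2, 0.7])       # illustrative observation

a_ls = (h @ r) / (h @ h)             # (h^T h)^{-1} h^T r
residual = r - h * a_ls

inner = h @ residual                 # h^T (r - h a_LS): zero up to round-off
```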
For the vector-parameter case (observation r = H a + n), the estimators become
\[
\begin{aligned}
\hat a_{LMMSE}(r) &= m_A + \Sigma_A H^T \left(H \Sigma_A H^T + \Sigma_N\right)^{-1} (r - H m_A) \\
&= m_A + \left(H^T \Sigma_N^{-1} H + \Sigma_A^{-1}\right)^{-1} H^T \Sigma_N^{-1} (r - H m_A)
\end{aligned}
\]
\[
\hat a_{LS}(r) = \left(H^T H\right)^{-1} H^T r
\]
\[
\hat a_{BLU}(r) = \left(H^T \Sigma_N^{-1} H\right)^{-1} H^T \Sigma_N^{-1} r.
\]
5 Further problems
5.1 Problem 1: relations for Linear Gaussian Models
How are MMSE, LS and BLU related? Consider the scalar case. The estimators are given by
\[
\hat a_{MMSE}(r) = m_A + \Sigma_A h^T \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} (r - h m_A)
\]
\[
\hat a_{LS}(r) = \left(h^T h\right)^{-1} h^T r
\]
\[
\hat a_{BLU}(r) = \left(h^T \Sigma_N^{-1} h\right)^{-1} h^T \Sigma_N^{-1} r.
\]
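The three scalar-case estimators can be compared on a small numeric example. All numbers below are illustrative, and white noise Σ_N = σ²I is assumed, in which case BLU and LS coincide while MMSE shrinks the estimate toward the prior mean:

```python
import numpy as np

# Scalar-parameter example with white noise (illustrative numbers).
h = np.array([1.0, 0.5])
r = np.array([0.8, 0.6])
m_A, Sigma_A = 0.0, 1.0
sigma2 = 0.3
Sigma_N = sigma2 * np.eye(2)         # white noise assumed

a_mmse = m_A + Sigma_A * h @ np.linalg.solve(
    Sigma_A * np.outer(h, h) + Sigma_N, r - h * m_A)
a_ls = (h @ r) / (h @ h)
a_blu = (h @ np.linalg.solve(Sigma_N, r)) / (h @ np.linalg.solve(Sigma_N, h))
```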
5.1.1 No a priori information: MMSE and LS
When m_A = 0 and Σ_A^{-1} = 0 (i.e., no a priori information) we get
\[
\begin{aligned}
\hat a_{MMSE}(r) &= h^T \left(h h^T\right)^{-1} r \\
&= \left(h^T h\right)^{-1} h^T r \\
&= \hat a_{LS}(r)
\end{aligned}
\]
(for n_R > 1 the matrix h h^T is singular, and (h h^T)^{-1} should be read as a pseudo-inverse).
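This limit can be confirmed numerically by letting the prior variance grow (white noise σ²I is assumed here; the numbers are illustrative):

```python
import numpy as np

h = np.array([1.0, -2.0])            # illustrative observation vector
r = np.array([0.4, 1.1])             # illustrative observation
sigma2 = 0.5                         # white noise assumed

a_ls = (h @ r) / (h @ h)

# "no a priori information": a very large prior variance
Sigma_A = 1e9
a_mmse = Sigma_A * h @ np.linalg.solve(
    Sigma_A * np.outer(h, h) + sigma2 * np.eye(2), r)
```

As Σ_A grows, the MMSE estimate converges to the LS estimate.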
so that
\[
b = \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} \Sigma_A h.
\]
Hence
\[
\hat q_{LMMSE}(s) = \Sigma_A h^T \left(h \Sigma_A h^T + \Sigma_N\right)^{-1} s.
\]
The estimate of A is then given by
5.3.1 Problem
We are interested in the following problem: we transmit a vector of nT iid data symbols a ∈ A = ΩnT over
an AWGN channel. Here nT is the number of transmit antennas. The receiver is equipped with nR receive
antennas. We can write the observation as
r = Ha + n
where H is a (known) n_R × n_T channel matrix and N ∼ 𝒩(0, σ²I_{n_R}). In any practical communication
scheme, Ω is a finite set of size M (e.g., BPSK signaling with Ω = {−1, +1}). The symbols are iid, uniformly
distributed over Ω.
1. Determine the ML and MAP estimates of a. Determine the complexity of the receiver (number of
operations to estimate a) as a function of nT .
2. How can LMMSE and LS be used to estimate a? We assume Σ_A = I_{n_T} and m_A = 0. Determine the complexity of the resulting estimators as a function of n_T.
5.3.2 Solution
Solution - part 1
The MAP and ML estimators are considered to be optimal in the case of estimating discrete parameters, in the sense of minimizing the error probability.
\[
\begin{aligned}
\hat a_{ML}(r) &= \arg\max_{a \in \Omega^{n_T}} \log p_{R|A}(r|a) \\
&= \arg\max_{a \in \Omega^{n_T}} -\frac{1}{\sigma^2} \left\| r - H a \right\|^2 \\
&= \arg\min_{a \in \Omega^{n_T}} \left\| r - H a \right\|^2.
\end{aligned}
\]
Since the symbols are uniformly distributed over Ω,
\[
\begin{aligned}
\hat a_{MAP}(r) &= \arg\max_{a \in \Omega^{n_T}} p_{R|A}(r|a) \frac{p_A(a)}{p_R(r)} \\
&= \arg\max_{a \in \Omega^{n_T}} p_{R|A}(r|a) \\
&= \hat a_{ML}(r).
\end{aligned}
\]
Complexity: for each a ∈ Ω^{n_T}, we need to compute ‖r − Ha‖². Hence, the complexity (M^{n_T} candidate vectors) is exponential in the number of transmit antennas.
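A brute-force ML detector following this derivation can be sketched as follows. The channel and symbol vector are illustrative assumptions, and the observation is taken noiseless so that the search provably returns the transmitted vector:

```python
import itertools
import numpy as np

# Exhaustive ML detection over Omega^{n_T} (M^{n_T} candidates).
rng = np.random.default_rng(2)
n_T, n_R = 3, 4                      # illustrative antenna counts
Omega = (-1.0, 1.0)                  # BPSK, M = 2
H = rng.standard_normal((n_R, n_T))  # illustrative channel matrix
a_true = np.array([1.0, -1.0, 1.0])
r = H @ a_true                       # noiseless observation for the check

candidates = [np.array(c) for c in itertools.product(Omega, repeat=n_T)]
a_ml = min(candidates, key=lambda a: float(np.sum((r - H @ a) ** 2)))
```

The candidate list has M^{n_T} = 2³ = 8 entries; doubling n_T squares the search space.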
Special case:
Let us assume Ω = {−1, +1} and n_T = 1, n_R > 1. Then
\[
\hat a_{ML}(r) = \arg\max_{a \in \Omega} \left(a\, h^T r\right).
\]
In this case we combine the observations from the multiple receive antennas, each weighted with its channel gain. This means that when the channel on a given antenna is unreliable (so that h_k is small), that antenna contributes only little to our decision statistic.
Solution - part 2
Let us temporarily forget that a lives in a discrete space. We can then introduce the LMMSE and LS estimates as follows:
\[
\hat a_{LMMSE}(r) = H^T \left(H H^T + \sigma^2 I_{n_R}\right)^{-1} r
\]
\[
\hat a_{LS}(r) = \left(H^T H\right)^{-1} H^T r.
\]
Note that â_LMMSE(r) and â_LS(r) may not belong to Ω^{n_T}. Hence, we need to quantize these estimates (mapping each estimate to the closest point in Ω^{n_T}); this can be done on a symbol-per-symbol basis. Generally the LS estimator will have poor performance when H^T H is close to singular. Note that the LMMSE estimator requires knowledge of σ², while the MAP and ML estimators do not.
Complexity: now the complexity is of the order nR × nT (since any matrix not depending on r can be
pre-computed by the receiver), which is linear in the number of transmit antennas.
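The quantized-LMMSE receiver described above can be sketched as follows (channel, noise level, and symbols are illustrative assumptions):

```python
import numpy as np

# LMMSE detection followed by symbol-per-symbol quantization to Omega = {-1,+1}.
rng = np.random.default_rng(3)
n_T, n_R = 2, 4                      # illustrative antenna counts
H = rng.standard_normal((n_R, n_T))  # illustrative channel matrix
sigma2 = 0.05
a_true = np.array([1.0, -1.0])       # BPSK symbols
r = H @ a_true + np.sqrt(sigma2) * rng.standard_normal(n_R)

# a_LMMSE = H^T (H H^T + sigma^2 I)^{-1} r   (Sigma_A = I, m_A = 0)
a_lmmse = H.T @ np.linalg.solve(H @ H.T + sigma2 * np.eye(n_R), r)
a_hat = np.where(a_lmmse >= 0, 1.0, -1.0)   # closest point of Omega^{n_T}
```

The matrix (H Hᵀ + σ²I)⁻¹ does not depend on r and can be pre-computed, so the per-observation cost is the matrix-vector products above.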
Special case (note: there was a mistake in the lecture; these are the correct results):
Let us assume that we use more transmit than receive antennas: n_T > n_R. In that case the n_T × n_T matrix H^T H is necessarily singular (its rank is at most n_R < n_T), so that LS estimation will not work. The LMMSE estimator requires the inversion of H H^T + σ² I_{n_R}; when σ² ≠ 0, this matrix is always invertible. This is a strange situation, where noise is actually helpful in the estimation process.
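A quick numeric check of this special case (illustrative dimensions): H^T H loses rank once n_T > n_R, while the LMMSE matrix stays invertible thanks to the σ² term.

```python
import numpy as np

# With n_T > n_R, H^T H (n_T x n_T) cannot have full rank, but the LMMSE
# matrix H H^T + sigma^2 I remains invertible.
rng = np.random.default_rng(4)
n_T, n_R = 4, 2                      # more transmit than receive antennas
H = rng.standard_normal((n_R, n_T))  # illustrative channel matrix
sigma2 = 0.1

rank_HtH = np.linalg.matrix_rank(H.T @ H)   # at most n_R < n_T
M = H @ H.T + sigma2 * np.eye(n_R)
eigvals = np.linalg.eigvalsh(M)             # all eigenvalues >= sigma2 > 0
```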