
2. INTRODUCTION

2.1 Definition of the Problem
Modern estimation theory is applied in many areas such as

• Radar
• Sonar
• Speech
• Image analysis
• Biomedicine and biomedical engineering
• Communications
• Control
• Seismology, etc.
The problem is to estimate the values of a group of parameters. For example, in a radar system we are interested in determining the range 𝑅 of an aircraft. To determine the range, we transmit an electromagnetic pulse that is reflected by the aircraft, causing an echo to be received by the antenna 𝜏0 seconds later. The range is determined from the equation 𝜏0 = 2𝑅/𝑐, where 𝑐 is the speed of electromagnetic propagation. The received echo is corrupted by environmental, electromagnetic, and thermal noise; moreover, the electronics of the system introduce an additional time delay. The radar system feeds the continuous waveform into a digital computer by sampling the received data, and it processes the resulting time series to estimate the value of the range.
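As a quick numerical illustration of the range equation, here is a minimal Python sketch; the echo delay 𝜏0 = 100 μs is a made-up value chosen only for illustration:

c = 3e8           # speed of electromagnetic propagation (m/s)
tau0 = 100e-6     # hypothetical measured echo delay (s)
R = c * tau0 / 2  # solve tau0 = 2R/c for the range R
print(R)          # 15000.0, i.e. a range of 15 km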
Like the radar system, all of the systems mentioned above are faced with the problem of extracting the values of parameters:

1. We sample and store the values of the continuous-time waveform.

2. We then face the equivalent problem of extracting the parameter values from a discrete-time waveform.
Mathematically, we have an 𝑁-point (generally noisy or corrupted) data set (or random process) {𝑥[0], 𝑥[1], ⋯ , 𝑥[𝑁 − 1]} that depends on an unknown parameter 𝜃. We wish to determine 𝜃 based on the data; the function that does so is called an estimator:

𝜃̂ = 𝑔(𝑥[0], 𝑥[1], ⋯ , 𝑥[𝑁 − 1])

where 𝑔 is some function.

• This is the problem of parameter estimation, which is the subject of this course.
• We need to determine the estimator 𝑔 (a minimal example is sketched below).
• We need to determine the length of the data, 𝑁.
• How close is 𝜃̂ to 𝜃?
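To make the notation concrete, here is a minimal Python sketch of an estimator 𝜃̂ = 𝑔(𝑥[0], ⋯ , 𝑥[𝑁 − 1]). Choosing 𝑔 to be the sample mean is just one possible rule (it anticipates the DC-level example of Section 2.3), and the data values below are made up:

import numpy as np

def g(x):
    # One possible estimator: the sample mean of the N data points
    return np.mean(x)

x = np.array([1.2, 0.8, 1.1, 0.9])  # hypothetical noisy data set, N = 4
theta_hat = g(x)                    # theta-hat = 1.0
print(theta_hat)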

2.2 The Mathematical Estimation Problem


• The first step is to model the data.
• Since the data are inherently random, we describe them by their probability density function (pdf): 𝑝(𝑥[0], 𝑥[1], ⋯ , 𝑥[𝑁 − 1]; 𝜃).
• The pdf is parametrized by the unknown parameter 𝜃; the semicolon denotes this dependence.
• For example, if 𝑁 = 1 and 𝜃 denotes the mean, the pdf is

𝑝(𝑥[0]; 𝜃) = (1/√(2𝜋𝜎²)) exp[−(1/(2𝜎²)) (𝑥[0] − 𝜃)²]

• If 𝑁 > 1 and 𝜃 denotes the mean, the pdf is

𝑝(𝐱; 𝜃) = (1/(2𝜋𝜎²)^{𝑁/2}) exp[−(1/(2𝜎²)) ∑_{𝑛=0}^{𝑁−1} (𝑥[𝑛] − 𝜃)²]

under the assumption that the samples 𝑥[𝑛] are independent and identically distributed (iid); wide-sense stationarity alone would not be enough to factor the joint pdf this way. (A numerical evaluation of this pdf is sketched after this list.)
• Once the pdf has been specified, the problem becomes one of determining an optimal estimator (i.e., function) of the data.
• An estimator is a rule that assigns a value to 𝜃 for each realization of 𝐱.
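As a sketch of how such a parametrized pdf can be evaluated numerically, the function below implements 𝑝(𝐱; 𝜃) for the iid Gaussian model above; the data, 𝜃, and 𝜎² are arbitrary illustrative values:

import numpy as np

def gaussian_pdf(x, theta, sigma2):
    # p(x; theta) for N iid Gaussian samples with mean theta, variance sigma2
    N = len(x)
    return (2 * np.pi * sigma2) ** (-N / 2) * np.exp(
        -np.sum((x - theta) ** 2) / (2 * sigma2))

x = np.array([0.9, 1.3, 1.1])
print(gaussian_pdf(x, theta=1.0, sigma2=1.0))  # pdf value at theta = 1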

2.3 Assessing Estimator Performance


Consider a DC level 𝐴 embedded in noise,

𝑥[𝑛] = 𝐴 + 𝑤[𝑛], 𝑛 = 0, 1, ⋯ , 𝑁 − 1

where 𝑤[𝑛] is a zero-mean noise process.

• Based on the observations {𝑥[0], 𝑥[1], ⋯ , 𝑥[𝑁 − 1]} we would like to estimate 𝐴.
• Intuitively, since 𝐴 is the average level of 𝑥[𝑛], it would be reasonable to estimate 𝐴 by the sample mean

𝐴̂ = (1/𝑁) ∑_{𝑛=0}^{𝑁−1} 𝑥[𝑛]

• Several questions come to mind:

o How close will 𝐴̂ be to 𝐴?
o Are there better estimators than the sample mean?
o How do we measure, evaluate, and compare the performance of various estimators?

Figure 2. A DC signal embedded in noise.


For example, suppose that 𝐴̂ = 0.9 for the realization shown in Figure 2, where 𝐴 = 1 is the true value. Another estimator might be

𝐴̌ = 𝑥[0]

Intuitively, it should not perform well, since it does not make use of all the data; there is no averaging to reduce the noise effects. However, for this data set, 𝐴̌ = 0.95 turns out to be closer to the true value than 𝐴̂. Can we conclude that 𝐴̌ is a better estimator? The answer is, of course, no.
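This situation is easy to reproduce in simulation. The sketch below generates one realization of 𝑥[𝑛] = 𝐴 + 𝑤[𝑛] and compares the two estimators; the seed, 𝑁, and noise variance are arbitrary, and depending on the realization either estimator may happen to land closer to 𝐴:

import numpy as np

rng = np.random.default_rng(0)
N, A, sigma = 100, 1.0, 1.0
x = A + sigma * rng.standard_normal(N)  # one realization of x[n] = A + w[n]

A_hat = np.mean(x)   # sample-mean estimator
A_check = x[0]       # first-sample estimator
print(A_hat, A_check)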

• Since an estimator is a function of the data, which are random variables, it too is a random variable.
• Since an estimator is a random variable, it is subject to many possible outcomes.
• The performance of an estimator can be completely described statistically, i.e., by its pdf.
• 𝐴̌ is better only for this particular realization; most of the time it will be worse than 𝐴̂. To evaluate the performance of the estimators:

o We first look at the expected value of each estimator.
𝐸[𝐴̂] = 𝐸[(1/𝑁) ∑_{𝑛=0}^{𝑁−1} 𝑥[𝑛]] = (1/𝑁) ∑_{𝑛=0}^{𝑁−1} 𝐸[𝑥[𝑛]] = 𝐴

𝐸[𝐴̌] = 𝐸[𝑥[0]] = 𝐴

o Since both estimators have the same expected value, we measure their performance by comparing their variances. Assuming the noise samples are uncorrelated, each with variance 𝜎²,

var(𝐴̂) = var((1/𝑁) ∑_{𝑛=0}^{𝑁−1} 𝑥[𝑛]) = (1/𝑁²) ∑_{𝑛=0}^{𝑁−1} var(𝑥[𝑛]) = (1/𝑁²) · 𝑁𝜎² = 𝜎²/𝑁

var(𝐴̌) = var(𝑥[0]) = 𝜎²

var(𝐴̂) < var(𝐴̌)

o If an analytical performance analysis cannot be made, or is too hard to carry out, we perform a Monte Carlo analysis:

▪ We repeat the experiment many times (say, 1000) using computer simulations and compute an estimate of the variance of each estimator (see the sketch below).
▪ The number of experiments can be increased or decreased; the number of simulations used is usually larger than 100.
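A minimal Monte Carlo sketch along these lines (1000 runs; the values of 𝐴, 𝜎, and 𝑁 are arbitrary): the estimated variances should come out close to the theoretical values 𝜎²/𝑁 and 𝜎².

import numpy as np

rng = np.random.default_rng(1)
M, N, A, sigma = 1000, 100, 1.0, 1.0  # M Monte Carlo runs of N samples each

A_hat = np.empty(M)
A_check = np.empty(M)
for m in range(M):
    x = A + sigma * rng.standard_normal(N)  # fresh realization each run
    A_hat[m] = np.mean(x)                   # sample-mean estimator
    A_check[m] = x[0]                       # first-sample estimator

print(np.var(A_hat))    # approximately sigma^2 / N = 0.01
print(np.var(A_check))  # approximately sigma^2 = 1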

2.4 Review: Gaussian Probability Density Function


The most useful and most frequently encountered pdf is the Gaussian. The pdf of a Gaussian random variable with mean 𝜇𝑥 and variance 𝜎𝑥² is

𝑝(𝑥) = (1/√(2𝜋𝜎𝑥²)) exp[−(1/(2𝜎𝑥²)) (𝑥 − 𝜇𝑥)²],  −∞ < 𝑥 < ∞

The shorthand notation 𝑥 ~ 𝒩(𝜇𝑥, 𝜎𝑥²) is often used. Now consider an iid random process 𝑥(0), 𝑥(1), ⋯ , 𝑥(𝑁 − 1), each sample of which is distributed as 𝒩(𝜇𝑥, 𝜎𝑥²). Then the joint pdf of the process is

𝑝(𝑥(𝑛)) = (1/(2𝜋𝜎𝑥²)^{𝑁/2}) exp[−(1/(2𝜎𝑥²)) ∑_{𝑛=0}^{𝑁−1} (𝑥(𝑛) − 𝜇𝑥)²]

Let us assume that we have the data

𝑥(𝑛) = 𝑑(𝑛) + 𝑤(𝑛), 𝑛 = 0, ⋯ , 𝑁 − 1

where 𝑤(𝑛) ~ 𝒩(0, 𝜎²) is an iid random process. Then the pdf of the above 𝑥(𝑛) can be written as

𝑝(𝑥(𝑛)) = (1/(2𝜋𝜎²)^{𝑁/2}) exp[−(1/(2𝜎²)) ∑_{𝑛=0}^{𝑁−1} (𝑥(𝑛) − 𝑑(𝑛))²]
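In practice this pdf is usually evaluated through its logarithm for numerical stability. Here is a minimal sketch; the signal 𝑑(𝑛) and the noise variance are arbitrary example choices:

import numpy as np

def log_pdf(x, d, sigma2):
    # log p(x) for x(n) = d(n) + w(n), with w(n) iid N(0, sigma2)
    N = len(x)
    return (-N / 2) * np.log(2 * np.pi * sigma2) \
           - np.sum((x - d) ** 2) / (2 * sigma2)

rng = np.random.default_rng(2)
n = np.arange(8)
d = np.cos(0.2 * np.pi * n)                    # example known signal d(n)
x = d + np.sqrt(0.5) * rng.standard_normal(8)  # noisy observations
print(log_pdf(x, d, sigma2=0.5))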

The extension to a set of random variables 𝐱 = [𝑥1, 𝑥2, ⋯ , 𝑥𝑛]ᵀ with mean

𝐸[𝐱] = 𝛍𝐱

and covariance matrix

𝐸[(𝐱 − 𝛍𝐱)(𝐱 − 𝛍𝐱)ᵀ] = 𝐂𝐱

is the multivariate Gaussian pdf

𝑝(𝐱) = (1/((2𝜋)^{𝑛/2} det(𝐂𝐱)^{1/2})) exp[−(1/2) (𝐱 − 𝛍𝐱)ᵀ 𝐂𝐱⁻¹ (𝐱 − 𝛍𝐱)]

Notice that 𝐂𝐱 is an 𝑛 × 𝑛 symmetric matrix with [𝐂𝐱]𝑖𝑗 = 𝐸[(𝑥𝑖 − 𝐸[𝑥𝑖])(𝑥𝑗 − 𝐸[𝑥𝑗])] = cov(𝑥𝑖, 𝑥𝑗). 𝐂𝐱 is assumed to be positive definite, so that it is invertible. If 𝐂𝐱 is a diagonal matrix, then the random variables are uncorrelated; in that case 𝑝(𝐱) factors into a product of 𝑛 univariate Gaussian pdfs, and hence the random variables are also independent.
If 𝐱 is linearly transformed as

𝐲 = 𝐀𝐱 + 𝐛

where 𝐀 is 𝑚 × 𝑛 and 𝐛 is 𝑚 × 1 with 𝑚 ≤ 𝑛 and 𝐀 full rank (so that 𝐂𝐲 is nonsingular), then 𝐲 is also distributed according to a multivariate Gaussian distribution, with

𝐸[𝐲] = 𝛍𝐲 = 𝐀𝛍𝐱 + 𝐛

and

𝐸[(𝐲 − 𝛍𝐲)(𝐲 − 𝛍𝐲)ᵀ] = 𝐂𝐲 = 𝐀𝐂𝐱𝐀ᵀ
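These two moment relations are easy to check empirically. The sketch below draws many samples of 𝐱 ~ 𝒩(𝛍𝐱, 𝐂𝐱) for an arbitrary choice of 𝛍𝐱, 𝐂𝐱, 𝐀, and 𝐛; the sample mean and covariance of 𝐲 should approach 𝐀𝛍𝐱 + 𝐛 and 𝐀𝐂𝐱𝐀ᵀ as the number of draws grows:

import numpy as np

rng = np.random.default_rng(3)
mu_x = np.array([1.0, -1.0, 0.5])
C_x = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.5]])   # positive definite covariance matrix
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])    # m x n with m = 2 <= n = 3, full rank
b = np.array([0.5, -0.5])

X = rng.multivariate_normal(mu_x, C_x, size=200_000)  # rows are draws of x
Y = X @ A.T + b                                       # y = A x + b, row-wise

print(Y.mean(axis=0))           # approx A @ mu_x + b = [-0.5, -2.0]
print(np.cov(Y, rowvar=False))  # approx A @ C_x @ A.T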

2.5 HOMEWORK 2
Research homework:
1. What is an independent and identically distributed (iid) random process?
a. Definition
b. Properties
c. What is a Gaussian iid process?
2. What is a circularly symmetric iid Gaussian process? How does it differ from an ordinary iid Gaussian process?
