Lecture 4
In reality, the observed output is noisy and does not fit the model perfectly. In the deterministic approach we treat such discrepancies as "error".
[Block diagram: an exogenous input drives a process; the process is disturbed by process noise, and its output is observed through a measurement corrupted by measurement noise, yielding the observed output.]
Random Process
You may have already learned probability and stochastic processes in some subjects. The
following are fundamentals that will be used regularly for the rest of this course. Check if
you feel comfortable with each of the following definitions and terminology. (Check the
box of each item below.) If not, consult standard textbooks on the subject. See the
references at the end of this chapter.
1) Random Variable
A random variable X is a function that maps every point in the sample space to the real line. Its cumulative distribution function (CDF) and probability density function (PDF) are
$$F_X(x) = \mathrm{Prob}(X \le x), \qquad f_X(x) = \frac{d}{dx}\,F_X(x).$$
In the statistics and probability literature, the convention is that capital X represents
a random variable while lower-case x is used for an instantiation/realization of the random
variable.
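As a concrete illustration, here is a minimal Python sketch that estimates $F_X(x)$ and $f_X(x)$ from samples; the standard Gaussian, sample size, evaluation point, and bandwidth are arbitrary illustrative choices, not part of the lecture.

```python
import numpy as np

# Empirical CDF and density of X ~ N(0, 1); distribution, sample size,
# evaluation point x, and bandwidth h are arbitrary illustrative choices.
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

x = 1.0
F_hat = np.mean(samples <= x)                        # F_X(x) = Prob(X <= x)
h = 0.01
f_hat = np.mean(np.abs(samples - x) <= h) / (2 * h)  # local histogram estimate of f_X(x)

f_true = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)      # closed-form Gaussian density
print(F_hat, f_hat, f_true)                          # F_hat ~ 0.841, f_hat ~ f_true ~ 0.242
```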
If X and Y are independent, $f_{XY}(x, y) = f_X(x)\,f_Y(y)$, so
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{f_X(x)\,f_Y(y)}{f_Y(y)} = f_X(x).$$
Occurrence of Y = y does not influence the occurrence of X = x.
6) Bayes Rule
$$f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\,f_X(x)}{f_Y(y)} \quad\Longleftrightarrow\quad f_{X|Y}(x \mid y)\,f_Y(y) = f_{XY}(x, y) = f_{Y|X}(y \mid x)\,f_X(x)$$
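A minimal sketch verifying Bayes' rule on a small discrete joint distribution; the probability values here are arbitrary illustrations.

```python
import numpy as np

# Verify Bayes' rule on an arbitrary 2x2 discrete joint distribution.
p_xy = np.array([[0.10, 0.20],     # rows: x in {0, 1}
                 [0.30, 0.40]])    # cols: y in {0, 1}
p_x = p_xy.sum(axis=1)             # marginal of X
p_y = p_xy.sum(axis=0)             # marginal of Y

p_x_given_y = p_xy / p_y           # f_{X|Y}(x|y) = f_XY / f_Y
p_y_given_x = p_xy / p_x[:, None]  # f_{Y|X}(y|x) = f_XY / f_X

# Bayes: f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y)
assert np.allclose(p_x_given_y, p_y_given_x * p_x[:, None] / p_y)
```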
7) Expectation
Expected value of X (mean, average):
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$
8) Variance
$$\mathrm{Var}\,X = E[(X - E(X))^2] = E[X^2 - 2XE(X) + (E(X))^2] = E[X^2] - (E(X))^2$$
9) Moment
k-th moment of X
$$E[X^k] = \int_{-\infty}^{\infty} x^k\, f_X(x)\, dx$$
k = 1 gives the mean; k = 2, 3, ... give the higher moments.
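A quick Monte Carlo sketch of items 7)-9), estimating the mean, variance, and k-th moments; the standard Gaussian and sample size are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo estimates of E[X], Var(X), and E[X**k] for X ~ N(0, 1).
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

mean = x.mean()                                  # E[X], ~0 for N(0, 1)
var = np.mean(x**2) - mean**2                    # E[X^2] - (E[X])^2, ~1
moments = [np.mean(x**k) for k in (1, 2, 3, 4)]  # k-th moments
print(mean, var, moments)                        # moments ~ [0, 1, 0, 3] for N(0, 1)
```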
11) Correlation
The expectation of the product of two random variables X and Y:
$$E[XY] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,y\, f_{XY}(x, y)\, dx\, dy$$
where $f_{XY}(x, y)$ is the joint probability density.
If X and Y are independent,
$$E[XY] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,y\, f_X(x)\,f_Y(y)\, dx\, dy = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \cdot \int_{-\infty}^{\infty} y\, f_Y(y)\, dy = E[X]\,E[Y]$$
Note that the converse does not hold: even when $E[XY] = E[X]\,E[Y]$ (i.e. the two random variables are uncorrelated), they are not necessarily independent, as the example below shows.
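A minimal sketch of this remark, using the classic example X ~ N(0, 1), Y = X²: then E[XY] = E[X³] = 0 = E[X]E[Y], so X and Y are uncorrelated, yet Y is completely determined by X.

```python
import numpy as np

# X ~ N(0, 1) and Y = X**2 are uncorrelated but fully dependent.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x**2

print(np.mean(x * y) - np.mean(x) * np.mean(y))      # ~0: uncorrelated
# Not independent: Prob(Y > 1) is about 0.32 unconditionally,
# while Prob(Y > 1 | |X| > 1) = 1 by construction.
print(np.mean(y > 1), np.mean(y[np.abs(x) > 1] > 1))
```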
12) Orthogonality
X and Y are said to be orthogonal if $E[XY] = 0$.
13) Covariance
$$\mathrm{Cov}(X, Y) = E[(X - E[X])\,(Y - E[Y])] = E[XY] - E[X]\,E[Y]$$
[Figure: a waveform ensemble. Each point $S_1, S_2, \ldots, S_n$ of the sample space maps to a time function $X(t; S_i)$ in the waveform space, and the ensemble is examined at fixed times $t_1$ and $t_2$.]
A Random Process = a family (ensemble) of time functions having a probability measure
• First-order Densities
$f_{X(t_1)}(x),\ f_{X(t_2)}(x),\ f_{X(t_3)}(x),\ \ldots$
• Second-order densities
If the random process has some correlation between the two random variables X(t1) and X(t2), it can be captured by the following autocorrelation ("auto" means correlation within one and the same random process):
$$R_X(t_1, t_2) = E[X(t_1)\,X(t_2)]$$
(A numerical sketch of this ensemble average follows the list below.)
If the mean $E[X(t)]$ is constant and the autocorrelation $R_X(t_1, t_2)$ depends only on the time difference $\tau = t_1 - t_2$, then the process is called "wide-sense stationary" (WSS).
Auto-covariance:
$$C_X(t_1, t_2) = E[(X(t_1) - E[X(t_1)])\,(X(t_2) - E[X(t_2)])] = R_X(t_1, t_2) - E[X(t_1)]\,E[X(t_2)]$$
• Higher-order densities
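As promised above, here is a minimal sketch that estimates the autocorrelation as an ensemble average over sample paths; the random-phase sinusoid X(t) = A cos(t + Φ) is a standard illustrative WSS process (with R_X(τ) = (A²/2) cos τ), not one taken from the lecture.

```python
import numpy as np

# Ensemble estimate of R_X(t1, t2) = E[X(t1) X(t2)] for the random-phase
# sinusoid X(t) = A*cos(t + Phi), Phi ~ Uniform[0, 2*pi).
rng = np.random.default_rng(0)
A, n_paths = 1.0, 200_000
phi = rng.uniform(0.0, 2.0 * np.pi, size=n_paths)  # one phase per sample path

t1, t2 = 0.5, 1.3
x_t1 = A * np.cos(t1 + phi)                        # ensemble slice X(t1)
x_t2 = A * np.cos(t2 + phi)                        # ensemble slice X(t2)

R_hat = np.mean(x_t1 * x_t2)                       # average across the ensemble
R_true = (A**2 / 2) * np.cos(t1 - t2)              # depends only on t1 - t2 (WSS)
print(R_hat, R_true)                               # both approximately 0.348
```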
Lonnie Ludeman, "Random Processes: Filtering, Estimation, and Detection", Wiley, 2003, ISBN 0-471-25975-6.
Robert Brown and Patrick Hwang, "Introduction to Random Signals and Applied Kalman Filtering", Third Edition, Wiley, 1997, ISBN 0-471-12839-2, TK5102.9.B75.
Application: Adaptive Noise Cancellation
Let us consider a simple example of the above definitions and properties of random processes. Adaptive noise cancellation is a technique for canceling unwanted noise by measuring the noise source as well as the signal source. It originated in Widrow's research in the 1960s, and more recently the technique has been used in advanced noise-canceling headsets, such as those made by Bose. So people use it daily without knowing it. Here is how it works.
[Block diagram: Microphone 1 records the player plus interference from the audience noise; Microphone 2 measures the audience noise v(t), which feeds an adaptive filter; a parameter estimator tunes the filter so that its output cancels the interference in the recording.]
Process: the recorded signal is
$$y(t) = x(t) + w(t)$$
where $x(t)$ is the true signal and $w(t)$ is the interference caused by the audience noise.
Noise cancellation:
$$z(t; \hat{\theta}) = y(t) - \hat{w}(t; \hat{\theta}), \qquad \hat{w}(t; \hat{\theta}) = \varphi^T(t)\,\hat{\theta}, \qquad \varphi(t) = [v(t-1), \ldots, v(t-m)]^T$$
where $\hat{w}(t; \hat{\theta})$ is an FIR estimate of the interference built from the measured noise $v(t)$.
Problem: tune the FIR parameters $\hat{\theta}$ so that the recovered signal $z(t; \hat{\theta})$ is as close to the original true signal $x(t)$ as possible. Assume that the interference $w(t)$ is strongly correlated with the noise $v(t)$ and that the true signal $x(t)$ is uncorrelated with the noise $v(t)$. Consider the expectation of the squared output $z(t; \hat{\theta})$, i.e. the average power of the signal $z(t; \hat{\theta})$:
$$E[z(t; \hat{\theta})^2] = E[\{x(t) + w(t) - \varphi^T(t)\hat{\theta}\}^2] = E[x(t)^2] + 2\,E[x(t)\{w(t) - \varphi^T(t)\hat{\theta}\}] + E[\{w(t) - \varphi^T(t)\hat{\theta}\}^2]$$
Our objective is to find the parameter vector $\hat{\theta}$ that minimizes the mean squared error $E[\{w(t) - \varphi^T(t)\hat{\theta}\}^2]$. Examining the second (cross) term:
$$E[x(t)\,w(t)] = E[x(t)\,b_1 v(t-1)] + E[x(t)\,b_2 v(t-2)] + \cdots + E[x(t)\,b_m v(t-m)] = 0$$
$$E[x(t)\,\varphi^T(t)\,\hat{\theta}] = 0$$
Therefore, minimizing the average power of $z(t; \hat{\theta})$ with respect to the parameter vector $\hat{\theta}$ is equivalent to minimizing $E[\{w(t) - \varphi^T(t)\,\hat{\theta}\}^2]$, since $E[x(t)^2]$ is not a function of the parameter vector and is irrelevant to the minimization of the squared error.
We can use the Recursive Least Squares algorithm with forgetting factor α ( 0 < α ≤ 1 ):
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{P_{t-1}\,\varphi(t)}{\alpha + \varphi^T(t)\,P_{t-1}\,\varphi(t)}\,\{y(t) - \varphi^T(t)\,\hat{\theta}(t-1)\}$$
$$P_t = \frac{1}{\alpha}\left(P_{t-1} - \frac{P_{t-1}\,\varphi(t)\,\varphi^T(t)\,P_{t-1}}{\alpha + \varphi^T(t)\,P_{t-1}\,\varphi(t)}\right)$$
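To see the whole scheme in action, here is a minimal simulation sketch; the interference coefficients b_true, filter order m, forgetting factor, and signal shapes are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Minimal RLS noise-cancellation sketch. b_true, m, alpha, and the signal
# shapes below are illustrative assumptions, not values from the lecture.
rng = np.random.default_rng(0)
n, m, alpha = 5000, 4, 0.99
b_true = np.array([0.8, -0.5, 0.3, 0.1])        # unknown interference path

v = rng.standard_normal(n)                      # audience noise, microphone 2
x = np.sin(2 * np.pi * 0.01 * np.arange(n))     # true signal, uncorrelated with v
w = np.convolve(v, np.concatenate(([0.0], b_true)))[:n]  # w(t) = sum_k b_k v(t-k)
y = x + w                                       # recorded signal, microphone 1

theta = np.zeros(m)                             # parameter estimate theta_hat
P = 1e3 * np.eye(m)                             # P_t matrix, large initial value
z = np.zeros(n)                                 # recovered signal
for t in range(m, n):
    phi = v[t - m:t][::-1]                      # phi(t) = [v(t-1), ..., v(t-m)]
    e = y[t] - phi @ theta                      # z(t; theta_hat)
    k = P @ phi / (alpha + phi @ P @ phi)       # gain vector
    theta = theta + k * e                       # parameter update
    P = (P - np.outer(k, phi @ P)) / alpha      # covariance update with forgetting
    z[t] = e

print(theta)                                    # approaches b_true
print(np.mean((z[1000:] - x[1000:])**2))        # small residual: x(t) recovered
```

A forgetting factor slightly below 1 lets the estimator track an interference path that drifts over time, at the cost of somewhat noisier parameter estimates.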