Lecture 4
In reality, the observed output is noisy and does not fit the model perfectly. In the deterministic approach we treat such discrepancies as "error".
[Block diagram: an exogenous input drives a process; the process is disturbed by process noise, and its output is observed through a measurement corrupted by measurement noise, yielding the observed output.]
Random Process
You may have already learned probability and stochastic processes in some subjects. The
following are fundamentals that will be used regularly for the rest of this course. Check if
you feel comfortable with each of the following definitions and terminology. (Check the
box of each item below.) If not, consult standard textbooks on the subject. See the
references at the end of this chapter.
1) Random Variable
A random variable X is a function that maps every point in the sample space to the real line. Its cumulative distribution function (CDF) and probability density function (PDF) are
$$F_X(x) = \mathrm{Prob}(X \le x), \qquad f_X(x) = \frac{d}{dx}\,F_X(x).$$
In the statistics and probability literature, the convention is that capital X represents
a random variable while lower-case x is used for an instantiation/realization of the random
variable.
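As a concrete illustration, here is a minimal Python sketch that estimates $F_X(x)$ and $f_X(x)$ from samples; the standard Gaussian, sample size, evaluation point, and bandwidth are arbitrary illustrative choices, not part of the lecture.

```python
import numpy as np

# Empirical CDF and density of X ~ N(0, 1); distribution, sample size,
# evaluation point x, and bandwidth h are arbitrary illustrative choices.
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

x = 1.0
F_hat = np.mean(samples <= x)                        # F_X(x) = Prob(X <= x)
h = 0.01
f_hat = np.mean(np.abs(samples - x) <= h) / (2 * h)  # local histogram estimate of f_X(x)

f_true = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)      # closed-form Gaussian density
print(F_hat, f_hat, f_true)                          # F_hat ~ 0.841, f_hat ~ f_true ~ 0.242
```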
If X and Y are independent, $f_{XY}(x, y) = f_X(x)\,f_Y(y)$, so
$$f_{X|Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{f_X(x)\,f_Y(y)}{f_Y(y)} = f_X(x).$$
Occurrence of Y = y does not influence the occurrence of X = x.
6) Bayes Rule
$$f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\,f_X(x)}{f_Y(y)} \quad\Longleftrightarrow\quad f_{X|Y}(x \mid y)\,f_Y(y) = f_{XY}(x, y) = f_{Y|X}(y \mid x)\,f_X(x)$$
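A minimal sketch verifying Bayes' rule on a small discrete joint distribution; the probability values here are arbitrary illustrations.

```python
import numpy as np

# Verify Bayes' rule on an arbitrary 2x2 discrete joint distribution.
p_xy = np.array([[0.10, 0.20],     # rows: x in {0, 1}
                 [0.30, 0.40]])    # cols: y in {0, 1}
p_x = p_xy.sum(axis=1)             # marginal of X
p_y = p_xy.sum(axis=0)             # marginal of Y

p_x_given_y = p_xy / p_y           # f_{X|Y}(x|y) = f_XY / f_Y
p_y_given_x = p_xy / p_x[:, None]  # f_{Y|X}(y|x) = f_XY / f_X

# Bayes: f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y)
assert np.allclose(p_x_given_y, p_y_given_x * p_x[:, None] / p_y)
```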
7) Expectation
Expected value of X (mean, average):
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$
8) Variance
$$\mathrm{Var}\,X = E[(X - E(X))^2] = E[X^2 - 2XE(X) + (E(X))^2] = E[X^2] - (E(X))^2$$
9) Moment
k-th moment of X
$$E[X^k] = \int_{-\infty}^{\infty} x^k\, f_X(x)\, dx$$
k = 1 gives the mean; k = 2, 3, ... give the higher moments.
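A quick Monte Carlo sketch of items 7)-9), estimating the mean, variance, and k-th moments; the standard Gaussian and sample size are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo estimates of E[X], Var(X), and E[X**k] for X ~ N(0, 1).
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

mean = x.mean()                                  # E[X], ~0 for N(0, 1)
var = np.mean(x**2) - mean**2                    # E[X^2] - (E[X])^2, ~1
moments = [np.mean(x**k) for k in (1, 2, 3, 4)]  # k-th moments
print(mean, var, moments)                        # moments ~ [0, 1, 0, 3] for N(0, 1)
```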
11) Correlation
The expectation of the product of two random variables X and Y:
$$E[XY] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,y\, f_{XY}(x, y)\, dx\, dy$$
where $f_{XY}(x, y)$ is the joint probability density.
If X and Y are independent,
$$E[XY] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x\,y\, f_X(x)\,f_Y(y)\, dx\, dy = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \cdot \int_{-\infty}^{\infty} y\, f_Y(y)\, dy = E[X]\,E[Y]$$
Note that the converse does not hold: even when $E[XY] = E[X]\,E[Y]$ (i.e. the two random variables are uncorrelated), they are not necessarily independent, as the example below shows.
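A minimal sketch of this remark, using the classic example X ~ N(0, 1), Y = X²: then E[XY] = E[X³] = 0 = E[X]E[Y], so X and Y are uncorrelated, yet Y is completely determined by X.

```python
import numpy as np

# X ~ N(0, 1) and Y = X**2 are uncorrelated but fully dependent.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x**2

print(np.mean(x * y) - np.mean(x) * np.mean(y))      # ~0: uncorrelated
# Not independent: Prob(Y > 1) is about 0.32 unconditionally,
# while Prob(Y > 1 | |X| > 1) = 1 by construction.
print(np.mean(y > 1), np.mean(y[np.abs(x) > 1] > 1))
```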
12) Orthogonality
X and Y are said to be orthogonal if $E[XY] = 0$.
13) Covariance
$$\mathrm{Cov}(X, Y) = E[(X - E[X])\,(Y - E[Y])] = E[XY] - E[X]\,E[Y]$$
[Figure: a waveform ensemble. Each point $S_1, S_2, \ldots, S_n$ of the sample space maps to a time function $X(t; S_i)$ in the waveform space, and the ensemble is examined at fixed times $t_1$ and $t_2$.]
A Random Process = a family (ensemble) of time functions having a probability measure
• First-order Densities
$f_{X(t_1)}(x),\ f_{X(t_2)}(x),\ f_{X(t_3)}(x),\ \ldots$
• Second-order densities
If the random process has some correlation between the two random variables X(t1) and X(t2), it can be captured by the following autocorrelation ("auto" means correlation within one and the same random process):
$$R_X(t_1, t_2) = E[X(t_1)\,X(t_2)]$$
(A numerical sketch of this ensemble average follows the list below.)
If the mean $E[X(t)]$ is constant and the autocorrelation $R_X(t_1, t_2)$ depends only on the time difference $\tau = t_1 - t_2$, then the process is called "wide-sense stationary" (WSS).
Auto-covariance:
$$C_X(t_1, t_2) = E[(X(t_1) - E[X(t_1)])\,(X(t_2) - E[X(t_2)])] = R_X(t_1, t_2) - E[X(t_1)]\,E[X(t_2)]$$
• Higher-order densities
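As promised above, here is a minimal sketch that estimates the autocorrelation as an ensemble average over sample paths; the random-phase sinusoid X(t) = A cos(t + Φ) is a standard illustrative WSS process (with R_X(τ) = (A²/2) cos τ), not one taken from the lecture.

```python
import numpy as np

# Ensemble estimate of R_X(t1, t2) = E[X(t1) X(t2)] for the random-phase
# sinusoid X(t) = A*cos(t + Phi), Phi ~ Uniform[0, 2*pi).
rng = np.random.default_rng(0)
A, n_paths = 1.0, 200_000
phi = rng.uniform(0.0, 2.0 * np.pi, size=n_paths)  # one phase per sample path

t1, t2 = 0.5, 1.3
x_t1 = A * np.cos(t1 + phi)                        # ensemble slice X(t1)
x_t2 = A * np.cos(t2 + phi)                        # ensemble slice X(t2)

R_hat = np.mean(x_t1 * x_t2)                       # average across the ensemble
R_true = (A**2 / 2) * np.cos(t1 - t2)              # depends only on t1 - t2 (WSS)
print(R_hat, R_true)                               # both approximately 0.348
```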
Lonnie Ludeman, "Random Processes: Filtering, Estimation, and Detection", Wiley, 2003, ISBN 0-471-25975-6.
Robert Brown and Patrick Hwang, "Introduction to Random Signals and Applied Kalman Filtering", Third Edition, Wiley, 1997, ISBN 0-471-12839-2, TK5102.9.B75.
Application: Adaptive Noise Cancellation
Let us consider a simple example of the above definitions and properties of random processes. Adaptive noise cancellation is a technique for canceling unwanted noise by measuring the noise source as well as the signal source. It originated in Widrow's research in the 1960s, and more recently the technique has been used in advanced noise-canceling headsets, such as those made by Bose. So people use it daily without knowing it. Here is how it works.
[Block diagram: Microphone 1 records the player plus interference from the audience noise; Microphone 2 measures the audience noise v(t), which feeds an adaptive filter; a parameter estimator tunes the filter so that its output cancels the interference in the recording.]
Process: the recorded signal is
$$y(t) = x(t) + w(t)$$
where $x(t)$ is the true signal and $w(t)$ is the interference caused by the audience noise.
Noise cancellation:
$$z(t; \hat{\theta}) = y(t) - \hat{w}(t; \hat{\theta}), \qquad \hat{w}(t; \hat{\theta}) = \varphi^T(t)\,\hat{\theta}, \qquad \varphi(t) = [v(t-1), \ldots, v(t-m)]^T$$
where $\hat{w}(t; \hat{\theta})$ is an FIR estimate of the interference built from the measured noise $v(t)$.
Problem: tune the FIR parameters $\hat{\theta}$ so that the recovered signal $z(t; \hat{\theta})$ is as close to the original true signal $x(t)$ as possible. Assume that the interference $w(t)$ is strongly correlated with the noise $v(t)$ and that the true signal $x(t)$ is uncorrelated with the noise $v(t)$. Consider the expectation of the squared output $z(t; \hat{\theta})$, i.e. the average power of the signal $z(t; \hat{\theta})$:
$$E[z(t; \hat{\theta})^2] = E[\{x(t) + w(t) - \varphi^T(t)\hat{\theta}\}^2] = E[x(t)^2] + 2\,E[x(t)\{w(t) - \varphi^T(t)\hat{\theta}\}] + E[\{w(t) - \varphi^T(t)\hat{\theta}\}^2]$$
Our objective is to find the parameter vector $\hat{\theta}$ that minimizes the mean squared error $E[\{w(t) - \varphi^T(t)\hat{\theta}\}^2]$. Examining the second (cross) term:
$$E[x(t)\,w(t)] = E[x(t)\,b_1 v(t-1)] + E[x(t)\,b_2 v(t-2)] + \cdots + E[x(t)\,b_m v(t-m)] = 0$$
$$E[x(t)\,\varphi^T(t)\,\hat{\theta}] = 0$$
Therefore, minimizing the average power of $z(t; \hat{\theta})$ with respect to the parameter vector $\hat{\theta}$ is equivalent to minimizing $E[\{w(t) - \varphi^T(t)\,\hat{\theta}\}^2]$, since $E[x(t)^2]$ is not a function of the parameter vector and is irrelevant to the minimization of the squared error.
We can use the Recursive Least Squares algorithm with forgetting factor α ( 0 < α ≤ 1 ):
$$\hat{\theta}(t) = \hat{\theta}(t-1) + \frac{P_{t-1}\,\varphi(t)}{\alpha + \varphi^T(t)\,P_{t-1}\,\varphi(t)}\,\{y(t) - \varphi^T(t)\,\hat{\theta}(t-1)\}$$
$$P_t = \frac{1}{\alpha}\left(P_{t-1} - \frac{P_{t-1}\,\varphi(t)\,\varphi^T(t)\,P_{t-1}}{\alpha + \varphi^T(t)\,P_{t-1}\,\varphi(t)}\right)$$
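To see the whole scheme in action, here is a minimal simulation sketch; the interference coefficients b_true, filter order m, forgetting factor, and signal shapes are illustrative assumptions, not values from the lecture.

```python
import numpy as np

# Minimal RLS noise-cancellation sketch. b_true, m, alpha, and the signal
# shapes below are illustrative assumptions, not values from the lecture.
rng = np.random.default_rng(0)
n, m, alpha = 5000, 4, 0.99
b_true = np.array([0.8, -0.5, 0.3, 0.1])        # unknown interference path

v = rng.standard_normal(n)                      # audience noise, microphone 2
x = np.sin(2 * np.pi * 0.01 * np.arange(n))     # true signal, uncorrelated with v
w = np.convolve(v, np.concatenate(([0.0], b_true)))[:n]  # w(t) = sum_k b_k v(t-k)
y = x + w                                       # recorded signal, microphone 1

theta = np.zeros(m)                             # parameter estimate theta_hat
P = 1e3 * np.eye(m)                             # P_t matrix, large initial value
z = np.zeros(n)                                 # recovered signal
for t in range(m, n):
    phi = v[t - m:t][::-1]                      # phi(t) = [v(t-1), ..., v(t-m)]
    e = y[t] - phi @ theta                      # z(t; theta_hat)
    k = P @ phi / (alpha + phi @ P @ phi)       # gain vector
    theta = theta + k * e                       # parameter update
    P = (P - np.outer(k, phi @ P)) / alpha      # covariance update with forgetting
    z[t] = e

print(theta)                                    # approaches b_true
print(np.mean((z[1000:] - x[1000:])**2))        # small residual: x(t) recovered
```

A forgetting factor slightly below 1 lets the estimator track an interference path that drifts over time, at the cost of somewhat noisier parameter estimates.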