
Appendix A

Some probability and statistics

A.1 Probabilities, random variables and their distribution


We summarize a few of the basic concepts of random variables, usually denoted by capital letters, $X, Y, Z$, etc., and their probability distributions, defined by the cumulative distribution function (CDF) $F_X(x) = P(X \le x)$, etc.
To a random experiment we associate a sample space $\Omega$ that contains all the outcomes that can occur in the experiment. A random variable, $X$, is a function defined on the sample space $\Omega$. To each outcome $\omega \in \Omega$ it assigns a real number $X(\omega)$, representing the value of a numeric quantity that can be measured in the experiment. For $X$ to be called a random variable, the probability $P(X \le x)$ has to be defined for all real $x$.

Distributions and moments


A probability distribution with cumulative distribution function $F_X(x)$ can be discrete or continuous, with probability function $p_X(x)$ and probability density function $f_X(x)$, respectively, such that

$$F_X(x) = P(X \le x) = \begin{cases} \sum_{k \le x} p_X(k), & \text{if } X \text{ takes only integer values,} \\ \int_{-\infty}^{x} f_X(y)\,dy, & \text{if } X \text{ has a density.} \end{cases}$$

A distribution can be of mixed type. The distribution function is then an integral plus discrete jumps; see Appendix B.

The expectation of a random variable $X$ is defined as the center of gravity of the distribution,

$$E[X] = m_X = \begin{cases} \sum_{k} k\, p_X(k), \\ \int_{-\infty}^{\infty} x f_X(x)\,dx. \end{cases}$$

The variance is a simple measure of the spread of the distribution and is defined as

$$V[X] = E[(X - m_X)^2] = E[X^2] - m_X^2 = \begin{cases} \sum_{k} (k - m_X)^2 p_X(k), \\ \int_{-\infty}^{\infty} (x - m_X)^2 f_X(x)\,dx. \end{cases}$$

Chebyshev's inequality states that, for all $\varepsilon > 0$,

$$P(|X - m_X| > \varepsilon) \le \frac{E[(X - m_X)^2]}{\varepsilon^2}.$$
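A quick Monte Carlo sketch of the inequality (Python/NumPy; the exponential distribution is an arbitrary choice, and any distribution with finite variance would do):

```python
# Sketch: empirical tail probabilities vs. the Chebyshev bound.
# Assumed setup: X ~ Exp(1), so m_X = 1 and V[X] = 1.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)
m, v = x.mean(), x.var()

for eps in (0.5, 1.0, 2.0):
    p_tail = np.mean(np.abs(x - m) > eps)   # estimate of P(|X - m_X| > eps)
    bound = v / eps**2                      # Chebyshev upper bound
    print(f"eps={eps}: P={p_tail:.4f} <= bound={bound:.4f}")
```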
In order to describe the statistical properties of a random function one needs the notion of multivariate distributions. If the result of an experiment is described by two different quantities, denoted by $X$ and $Y$, e.g., the length and height of a randomly chosen individual in a population, or the values of a random function at two different time points, one has to deal with a two-dimensional random variable. This is described by its two-dimensional distribution function $F_{X,Y}(x,y) = P(X \le x, Y \le y)$, or the corresponding two-dimensional probability or density function,

$$f_{X,Y}(x,y) = \frac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}.$$
Two random variables $X, Y$ are independent if, for all $x, y$,

$$F_{X,Y}(x,y) = F_X(x) F_Y(y).$$

An important concept is the covariance between two random variables $X$ and $Y$, defined as

$$C[X,Y] = E[(X - m_X)(Y - m_Y)] = E[XY] - m_X m_Y.$$

The correlation coefficient is equal to the dimensionless, normalized covariance,¹

$$\rho[X,Y] = \frac{C[X,Y]}{\sqrt{V[X]\,V[Y]}}.$$
Two random variables with zero correlation, $\rho[X,Y] = 0$, are called uncorrelated. Note that if two random variables $X$ and $Y$ are independent, then they are also uncorrelated, but the reverse does not hold: two uncorrelated variables can still be dependent.
¹ Speaking about the "correlation between two random quantities," one often means the degree of covariation between the two. However, one has to remember that the correlation only measures the degree of linear covariation. Another meaning of the term correlation is used in connection with two data series, $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$. Then the sum of products, $\sum_{k=1}^{n} x_k y_k$, is sometimes called the "correlation," and a device that produces this sum is called a "correlator."
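As a concrete illustration of the last point, consider $X \sim N(0,1)$ and $Y = X^2$: then $C[X,Y] = E[X^3] = 0$, so the variables are uncorrelated, yet $Y$ is completely determined by $X$. A minimal simulation sketch (Python/NumPy):

```python
# Sketch: uncorrelated but strongly dependent variables.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)
y = x**2                                 # Y is a deterministic function of X

print(np.corrcoef(x, y)[0, 1])           # ~ 0: uncorrelated
print(np.corrcoef(x**2, y)[0, 1])        # = 1: clearly not independent
```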
Conditional distributions
If $X$ and $Y$ are two random variables with bivariate density function $f_{X,Y}(x,y)$, we can define the conditional distribution of $X$ given $Y = y$ by the conditional density,

$$f_{X|Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)},$$
for every $y$ where the marginal density $f_Y(y)$ is non-zero. The expectation in this distribution, the conditional expectation, is a function of $y$, denoted and defined as

$$E[X \mid Y = y] = \int_x x f_{X|Y=y}(x)\,dx = m(y).$$

The conditional variance is defined as

$$V[X \mid Y = y] = \int_x (x - m(y))^2 f_{X|Y=y}(x)\,dx = \sigma^2_{X|Y}(y).$$

The unconditional expectation of $X$ can be obtained from the law of total probability, and computed as

$$E[X] = \int_y \left( \int_x x f_{X|Y=y}(x)\,dx \right) f_Y(y)\,dy = E[E[X \mid Y]].$$

The unconditional variance of $X$ is given by

$$V[X] = E[V[X \mid Y]] + V[E[X \mid Y]].$$
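These two laws are easy to check by simulation. A sketch (Python/NumPy) under an assumed hierarchical model, $Y \sim N(0,1)$ and $X \mid Y = y \sim N(2y, 3)$, for which $E[V[X \mid Y]] + V[E[X \mid Y]] = 3 + 4 = 7$:

```python
# Sketch: law of total expectation/variance for X | Y=y ~ N(2y, 3), Y ~ N(0,1).
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(500_000)
x = 2.0 * y + rng.normal(scale=np.sqrt(3.0), size=y.size)

print(x.mean())                  # E[E[X|Y]] = E[2Y] = 0
print(x.var())                   # ~ 7.0
print(np.var(2.0 * y) + 3.0)     # E[V[X|Y]] + V[E[X|Y]] = 4 + 3 = 7
```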

A.2 Multidimensional normal distribution


A one-dimensional normal random variable $X$ with expectation $m$ and variance $\sigma^2$ has probability density function

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{x - m}{\sigma} \right)^2 \right\},$$

and we write $X \sim N(m, \sigma^2)$. If $m = 0$ and $\sigma = 1$, the normal distribution is standardized. If $X \sim N(0,1)$, then $\sigma X + m \sim N(m, \sigma^2)$, and if $X \sim N(m, \sigma^2)$, then $(X - m)/\sigma \sim N(0,1)$. We accept a constant random variable, $X \equiv m$, as a normal variable, $X \sim N(m, 0)$.
Now, let $X_1, \ldots, X_n$ have expectations $m_k = E[X_k]$ and covariances $\sigma_{jk} = C[X_j, X_k]$, and define (with $'$ for transpose)

$$\mu = (m_1, \ldots, m_n)', \qquad \Sigma = (\sigma_{jk}) = \text{the covariance matrix of } X_1, \ldots, X_n.$$

It is a characteristic property of the normal distribution that every linear combination of a multivariate normal variable also has a normal distribution. To formulate the definition, write $a = (a_1, \ldots, a_n)'$ and $X = (X_1, \ldots, X_n)'$, with $a'X = a_1 X_1 + \cdots + a_n X_n$. Then,

$$E[a'X] = a'\mu = \sum_{j=1}^{n} a_j m_j, \qquad V[a'X] = a'\Sigma a = \sum_{j,k} a_j a_k \sigma_{jk}. \tag{A.1}$$

Definition A.1. The random variables $X_1, \ldots, X_n$ are said to have an $n$-dimensional normal distribution if every linear combination $a_1 X_1 + \cdots + a_n X_n$ has a normal distribution. From (A.1) we have that $X = (X_1, \ldots, X_n)'$ is $n$-dimensional normal if and only if $a'X \sim N(a'\mu, a'\Sigma a)$ for all $a = (a_1, \ldots, a_n)'$.
Obviously, $X_k = 0 \cdot X_1 + \cdots + 1 \cdot X_k + \cdots + 0 \cdot X_n$ is normal, i.e., all marginal distributions in an $n$-dimensional normal distribution are one-dimensional normal. However, the reverse is not necessarily true: there are variables $X_1, \ldots, X_n$, each of which is one-dimensional normal, while the vector $(X_1, \ldots, X_n)'$ is not $n$-dimensional normal.
It is an important consequence of the definition that sums and differences
of n-dimensional normal variables have a normal distribution.
If the covariance matrix $\Sigma$ is non-singular, the $n$-dimensional normal distribution has a probability density function (with $x = (x_1, \ldots, x_n)'$)

$$\frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma}} \exp\left\{ -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}. \tag{A.2}$$

The distribution is then said to be non-singular. The density (A.2) is constant on every ellipsoid $(x - \mu)' \Sigma^{-1} (x - \mu) = C$ in $\mathbb{R}^n$.
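Numerically, (A.2) is usually evaluated through a Cholesky factorization $\Sigma = LL'$ rather than by forming $\Sigma^{-1}$ explicitly. A sketch (Python/NumPy; the helper name mvn_density and the test values are illustrative, not from the text):

```python
# Sketch: evaluate the non-singular normal density (A.2) via Cholesky.
import numpy as np

def mvn_density(x, mu, sigma):
    n = mu.size
    L = np.linalg.cholesky(sigma)        # sigma = L L'
    z = np.linalg.solve(L, x - mu)       # z'z = (x-mu)' sigma^{-1} (x-mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return np.exp(-0.5 * (n * np.log(2 * np.pi) + log_det + z @ z))

mu = np.zeros(2)
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(mvn_density(mu, mu, sigma))        # peak value 1/(2*pi*sqrt(0.75)) ~ 0.1838
```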
Note: the density function of an $n$-dimensional normal distribution is uniquely determined by the expectations and covariances.

Example A.1. Suppose $X_1, X_2$ have a two-dimensional normal distribution. If

$$\det \Sigma = \sigma_{11}\sigma_{22} - \sigma_{12}^2 > 0,$$

then $\Sigma$ is non-singular, and

$$\Sigma^{-1} = \frac{1}{\det \Sigma} \begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix}.$$

With $Q(x_1, x_2) = (x - \mu)' \Sigma^{-1} (x - \mu)$,

$$Q(x_1, x_2) = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \left( (x_1 - m_1)^2 \sigma_{22} - 2(x_1 - m_1)(x_2 - m_2)\sigma_{12} + (x_2 - m_2)^2 \sigma_{11} \right)$$
$$= \frac{1}{1 - \rho^2} \left( \left( \frac{x_1 - m_1}{\sqrt{\sigma_{11}}} \right)^2 - 2\rho \left( \frac{x_1 - m_1}{\sqrt{\sigma_{11}}} \right) \left( \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} \right) + \left( \frac{x_2 - m_2}{\sqrt{\sigma_{22}}} \right)^2 \right),$$

where we also used the correlation coefficient $\rho = \sigma_{12}/\sqrt{\sigma_{11}\sigma_{22}}$, and

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1 - \rho^2)}} \exp\left\{ -\frac{1}{2} Q(x_1, x_2) \right\}. \tag{A.3}$$

For variables with $m_1 = m_2 = 0$ and $\sigma_{11} = \sigma_{22} = \sigma^2$, the bivariate density is

$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma^2\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2\sigma^2(1 - \rho^2)} \left( x_1^2 - 2\rho x_1 x_2 + x_2^2 \right) \right\}.$$
We see that, if $\rho = 0$, this is the density of two independent normal variables. Figure A.1 shows the density function $f_{X_1,X_2}(x_1, x_2)$ and level curves for a bivariate normal distribution with expectation $\mu = (0, 0)'$ and covariance matrix

$$\Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}.$$

The correlation coefficient is $\rho = 0.5$.
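This setup is easy to reproduce by simulation; the following sketch (Python/NumPy, sample size arbitrary) draws from the same distribution and recovers the correlation coefficient:

```python
# Sketch: sample the bivariate normal of Figure A.1 and check rho.
import numpy as np

rng = np.random.default_rng(3)
mu = np.zeros(2)
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
xy = rng.multivariate_normal(mu, sigma, size=200_000)

print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])   # ~ 0.5
```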
Remark A.1. If the covariance matrix $\Sigma$ is singular, i.e., non-invertible, then there exists at least one set of constants $a_1, \ldots, a_n$, not all equal to 0, such that $a'\Sigma a = 0$. From (A.1) it follows that $V[a'X] = 0$, which means that $a'X$ is constant, equal to $a'\mu$. The distribution of $X$ is then concentrated on a hyperplane $a'x = \text{constant}$ in $\mathbb{R}^n$. The distribution is said to be singular, and it has no density function in $\mathbb{R}^n$.
Figure A.1 Two-dimensional normal density. Left: density function; Right: elliptic level curves at levels 0.01, 0.02, 0.05, 0.1, 0.15.

Remark A.2. Formula (A.1) implies that every covariance matrix $\Sigma$ is positive definite or positive semi-definite, i.e., $\sum_{j,k} a_j a_k \sigma_{jk} \ge 0$ for all $a_1, \ldots, a_n$. Conversely, if $\Sigma$ is a symmetric, positive definite matrix of size $n \times n$, i.e., if $\sum_{j,k} a_j a_k \sigma_{jk} > 0$ for all $(a_1, \ldots, a_n) \ne (0, \ldots, 0)$, then (A.2) defines the density function of an $n$-dimensional normal distribution with expectations $m_k$ and covariances $\sigma_{jk}$. Thus, every symmetric, positive definite matrix is the covariance matrix of a non-singular normal distribution.

Furthermore, for every symmetric, positive semi-definite matrix $\Sigma$, i.e., such that

$$\sum_{j,k} a_j a_k \sigma_{jk} \ge 0$$

for all $a_1, \ldots, a_n$, with equality holding for some $(a_1, \ldots, a_n) \ne (0, \ldots, 0)$, there exists an $n$-dimensional normal distribution that has $\Sigma$ as its covariance matrix.
For $n$-dimensional normal variables, "uncorrelated" and "independent" are equivalent.

Theorem A.1. If the random variables $X_1, \ldots, X_n$ are $n$-dimensional normal and uncorrelated, then they are independent.

Proof. We show the theorem only for non-singular variables with a density function; it is true also for singular normal variables.

If $X_1, \ldots, X_n$ are uncorrelated, $\sigma_{jk} = 0$ for $j \ne k$, then $\Sigma$, and also $\Sigma^{-1}$, are diagonal matrices, i.e. (note that $\sigma_{jj} = V[X_j]$),

$$\det \Sigma = \prod_j \sigma_{jj}, \qquad \Sigma^{-1} = \begin{pmatrix} \sigma_{11}^{-1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{nn}^{-1} \end{pmatrix}.$$

This means that $(x - \mu)' \Sigma^{-1} (x - \mu) = \sum_j (x_j - m_j)^2 / \sigma_{jj}$, and the density (A.2) is

$$\prod_j \frac{1}{\sqrt{2\pi\sigma_{jj}}} \exp\left\{ -\frac{(x_j - m_j)^2}{2\sigma_{jj}} \right\}.$$

Hence, the joint density function of $X_1, \ldots, X_n$ is a product of the marginal densities, which shows that the variables are independent. □

A.2.1 Conditional normal distribution


This section deals with partial observations in a multivariate normal distribution. It is a special property of this distribution that conditioning on observed values of a subset of the variables leads to a conditional distribution for the unobserved variables that is also normal. Furthermore, the expectation in the conditional distribution is linear in the observations, and the covariance matrix does not depend on the observed values. This property is particularly useful in the prediction of Gaussian time series, as formulated by the Kalman filter.

Conditioning in the bivariate normal distribution


Let $X$ and $Y$ have a bivariate normal distribution with expectations $m_X$ and $m_Y$, variances $\sigma_X^2$ and $\sigma_Y^2$, respectively, and with correlation coefficient $\rho = C[X,Y]/(\sigma_X \sigma_Y)$. The joint density function is given by (A.3).

The conditional density function for $X$, given that $Y = y$, is

$$f_{X|Y=y}(x) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{1}{\sigma_X \sqrt{1 - \rho^2}\,\sqrt{2\pi}} \exp\left\{ -\frac{\left( x - (m_X + \sigma_X \rho (y - m_Y)/\sigma_Y) \right)^2}{2\sigma_X^2 (1 - \rho^2)} \right\}.$$
Hence, the conditional distribution of $X$ given $Y = y$ is normal with expectation and variance

$$m_{X|Y=y} = m_X + \sigma_X \rho (y - m_Y)/\sigma_Y, \qquad \sigma^2_{X|Y=y} = \sigma_X^2 (1 - \rho^2).$$

Note: the conditional expectation depends linearly on the observed $y$-value, and the conditional variance is constant, independent of the observed value $y$.
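Both formulas can be checked by brute force: sample the pair, keep the samples with $Y$ close to a chosen value $y_0$, and compare the conditional sample mean and variance with the formulas. A sketch (Python/NumPy; all parameter values are illustrative):

```python
# Sketch: empirical check of the bivariate conditioning formulas.
import numpy as np

rng = np.random.default_rng(4)
mX, mY, sX, sY, rho = 1.0, -2.0, 2.0, 0.5, 0.6
cov = np.array([[sX**2, rho * sX * sY],
                [rho * sX * sY, sY**2]])
xy = rng.multivariate_normal([mX, mY], cov, size=1_000_000)

y0 = -1.5
x_cond = xy[np.abs(xy[:, 1] - y0) < 0.01, 0]           # condition on Y ~ y0
print(x_cond.mean(), mX + sX * rho * (y0 - mY) / sY)   # both ~ 2.2
print(x_cond.var(), sX**2 * (1 - rho**2))              # both ~ 2.56
```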
Conditioning in the multivariate normal distribution
Let $X = (X_1, \ldots, X_n)'$ and $Y = (Y_1, \ldots, Y_m)'$ be two multivariate normal variables, of dimension $n$ and $m$, respectively, such that $Z = (X_1, \ldots, X_n, Y_1, \ldots, Y_m)'$ is $(n+m)$-dimensional normal. Denote the expectations

$$E[X] = m_X, \qquad E[Y] = m_Y,$$

and partition the covariance matrix of $Z$ (with $\Sigma_{XY} = \Sigma_{YX}'$),

$$\Sigma = \operatorname{Cov}\begin{pmatrix} X \\ Y \end{pmatrix} = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}. \tag{A.4}$$

If the covariance matrix $\Sigma$ is positive definite, the distribution of $(X, Y)$ has the density function

$$f_{XY}(x, y) = \frac{1}{(2\pi)^{(n+m)/2} \sqrt{\det \Sigma}} \, e^{-\frac{1}{2} (x' - m_X',\; y' - m_Y')\, \Sigma^{-1} \begin{pmatrix} x - m_X \\ y - m_Y \end{pmatrix}},$$

while the $m$-dimensional density of $Y$ is

$$f_Y(y) = \frac{1}{(2\pi)^{m/2} \sqrt{\det \Sigma_{YY}}} \, e^{-\frac{1}{2} (y - m_Y)' \Sigma_{YY}^{-1} (y - m_Y)}.$$

To find the conditional density of $X$ given that $Y = y$,

$$f_{X|Y}(x \mid y) = \frac{f_{YX}(y, x)}{f_Y(y)}, \tag{A.5}$$

we need the following matrix property.
Theorem A.2 ("Matrix inversion lemma"). Let $B$ be a $p \times p$ matrix ($p = n + m$):

$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},$$

where the sub-matrices have dimensions $n \times n$, $n \times m$, etc. Suppose $B$, $B_{11}$, $B_{22}$ are non-singular, and partition the inverse in the same way as $B$,

$$A = B^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

Then

$$A = \begin{pmatrix} (B_{11} - B_{12} B_{22}^{-1} B_{21})^{-1} & -(B_{11} - B_{12} B_{22}^{-1} B_{21})^{-1} B_{12} B_{22}^{-1} \\ -(B_{22} - B_{21} B_{11}^{-1} B_{12})^{-1} B_{21} B_{11}^{-1} & (B_{22} - B_{21} B_{11}^{-1} B_{12})^{-1} \end{pmatrix}.$$
Proof. For the proof, see a matrix theory textbook, for example [22]. □
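The lemma is easy to sanity-check numerically on a random positive definite matrix; a sketch (Python/NumPy, block sizes illustrative):

```python
# Sketch: verify the block-inverse formula of Theorem A.2 numerically.
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2
M = rng.standard_normal((n + m, n + m))
B = M @ M.T + (n + m) * np.eye(n + m)    # symmetric positive definite
B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]

inv = np.linalg.inv
A11 = inv(B11 - B12 @ inv(B22) @ B21)
A12 = -A11 @ B12 @ inv(B22)
A22 = inv(B22 - B21 @ inv(B11) @ B12)
A21 = -A22 @ B21 @ inv(B11)

A = np.block([[A11, A12], [A21, A22]])
print(np.allclose(A, inv(B)))            # True
```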

Theorem A.3 ("Conditional normal distribution"). The conditional distribution of $X$, given that $Y = y$, is $n$-dimensional normal with expectation and covariance matrix

$$E[X \mid Y = y] = m_{X|Y=y} = m_X + \Sigma_{XY} \Sigma_{YY}^{-1} (y - m_Y), \tag{A.6}$$
$$C[X \mid Y = y] = \Sigma_{XX|Y} = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}. \tag{A.7}$$

These formulas are easy to remember: the dimensions of the sub-matrices, for example in the covariance matrix $\Sigma_{XX|Y}$, are the only ones possible for the matrix multiplications on the right hand side to be meaningful.

Proof. To simplify the calculations, we start with $m_X = m_Y = 0$ and add the expectations afterwards. The conditional distribution of $X$ given that $Y = y$ is, according to (A.5), the ratio of two multivariate normal densities, and hence it is of the form

$$c \exp\left\{ -\frac{1}{2} (x', y')\, \Sigma^{-1} \begin{pmatrix} x \\ y \end{pmatrix} + \frac{1}{2} y' \Sigma_{YY}^{-1} y \right\} = c \exp\left\{ -\frac{1}{2} Q(x, y) \right\},$$
where $c$ is a normalization constant, independent of $x$ and $y$. The matrix $\Sigma$ can be partitioned as in (A.4), and if we use the matrix inversion lemma, we find that

$$\Sigma^{-1} = A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} (\Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX})^{-1} & -(\Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX})^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \\ -(\Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY})^{-1} \Sigma_{YX} \Sigma_{XX}^{-1} & (\Sigma_{YY} - \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY})^{-1} \end{pmatrix}. \tag{A.8}$$

We also see that

$$Q(x, y) = x' A_{11} x + x' A_{12} y + y' A_{21} x + y' A_{22} y - y' \Sigma_{YY}^{-1} y$$
$$= (x' - y' C') A_{11} (x - Cy) + \widetilde{Q}(y)$$
$$= x' A_{11} x - x' A_{11} C y - y' C' A_{11} x + y' C' A_{11} C y + \widetilde{Q}(y),$$

for some matrix $C$ and some quadratic form $\widetilde{Q}(y)$ in $y$.
Here $A_{11} = \Sigma_{XX|Y}^{-1}$, according to (A.7) and (A.8), while we can find $C$ by solving

$$-A_{11} C = A_{12}, \quad \text{i.e.,} \quad C = -A_{11}^{-1} A_{12} = \Sigma_{XY} \Sigma_{YY}^{-1},$$

according to (A.8). This is precisely the matrix in (A.6).


If we reinstate the deleted $m_X$ and $m_Y$, we get the conditional density for $X$ given $Y = y$ to be of the form

$$c \exp\left\{ -\frac{1}{2} Q(x, y) \right\} = c \exp\left\{ -\frac{1}{2} (x' - m_{X|Y=y}') \Sigma_{XX|Y}^{-1} (x - m_{X|Y=y}) \right\},$$

which is the normal density we were looking for. □
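In computations, (A.6) and (A.7) amount to a few lines of linear algebra; this is exactly the measurement-update step of the Kalman filter mentioned above. A sketch (Python/NumPy; the helper name condition_gaussian and the numbers are illustrative, not from the text):

```python
# Sketch: conditional mean and covariance per (A.6)-(A.7).
import numpy as np

def condition_gaussian(mX, mY, Sxx, Sxy, Syy, y):
    """Return E[X | Y=y] and C[X | Y=y] for a jointly normal (X, Y)."""
    K = Sxy @ np.linalg.inv(Syy)     # Sigma_XY Sigma_YY^{-1}, the "gain"
    mean = mX + K @ (y - mY)         # (A.6)
    cov = Sxx - K @ Sxy.T            # (A.7), using Sigma_YX = Sigma_XY'
    return mean, cov

mX, mY = np.zeros(2), np.zeros(1)
Sxx = np.array([[2.0, 0.3], [0.3, 1.0]])
Sxy = np.array([[0.5], [0.2]])
Syy = np.array([[1.0]])
mean, cov = condition_gaussian(mX, mY, Sxx, Sxy, Syy, y=np.array([1.5]))
print(mean)   # [0.75, 0.3]
print(cov)    # Sxx reduced by the information carried by Y
```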

A.2.2 Complex normal variables

In most of the book, we have assumed all random variables to be real-valued. In many applications, and also in the mathematical background, it is advantageous to consider complex variables, simply defined as $Z = X + iY$, where $X$ and $Y$ have a bivariate distribution. The mean value of a complex random variable is simply

$$E[Z] = E[\Re Z] + i\,E[\Im Z],$$

while the variance and covariances are defined with a complex conjugate on the second variable,

$$C[Z_1, Z_2] = E[Z_1 \overline{Z_2}] - m_{Z_1} \overline{m_{Z_2}},$$
$$V[Z] = C[Z, Z] = E[|Z|^2] - |m_Z|^2.$$

Note that, for a complex $Z = X + iY$ with $V[X] = V[Y] = \sigma^2$,

$$C[Z, Z] = V[X] + V[Y] = 2\sigma^2,$$
$$C[Z, \overline{Z}] = V[X] - V[Y] + 2i\,C[X, Y] = 2i\,C[X, Y].$$

Hence, if the real and imaginary parts are uncorrelated with the same variance, then the complex variable $Z$ is uncorrelated with its own complex conjugate, $\overline{Z}$. For complex variables, one often uses the term orthogonal instead of uncorrelated.
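The two covariances are easy to estimate from simulated data; a sketch (Python/NumPy; independent real and imaginary parts with equal variance, values illustrative):

```python
# Sketch: complex covariance with conjugation on the second argument.
import numpy as np

rng = np.random.default_rng(6)
sigma = 1.5
z = rng.normal(scale=sigma, size=500_000) + 1j * rng.normal(scale=sigma, size=500_000)

def cov(z1, z2):
    """C[Z1, Z2] = E[Z1 conj(Z2)] - m_{Z1} conj(m_{Z2})."""
    return np.mean(z1 * np.conj(z2)) - np.mean(z1) * np.conj(np.mean(z2))

print(cov(z, z))            # ~ 2 sigma^2 = 4.5 (real)
print(cov(z, np.conj(z)))   # ~ 2i C[X,Y] ~ 0: Z orthogonal to its conjugate
```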
