Regression vs. Kalman Filter
Simo Särkkä
Lecture 2: From Linear Regression to Kalman Filter and Beyond
Contents
2 Multidimensional Models
3 Non-Linear Models
6 Dynamic Models
7 Summary
Introduction: Correlation of Time Series
[Figure: two time series plotted over t = 0, ..., 100; the lower panel shows y(t).]
Introduction: Correlation of Time Series (cont.)
[Figure: two scatter plots of y versus x for x = 1, ..., 10; the fitted line is y = 0.11 x + 0.17 with R² = 0.9731.]
Least Squares Solution to Linear Regression [1/3]
[Figure: scatter plot of the data points (x(i), y(i)) for x = 1, ..., 10.]
The data are modeled with the linear model
$$y^{(i)} = a\,x^{(i)} + b, \qquad i = 1, \ldots, n.$$
Least Squares Solution to Linear Regression [2/3]
The least squares solution minimizes the mean squared error
$$S(a,b) = \frac{1}{n} \sum_i \bigl(y^{(i)} - a\,x^{(i)} - b\bigr)^2 = \mathrm{E}\bigl[(y - a\,x - b)^2\bigr].$$
Setting the derivatives with respect to a and b to zero gives
$$a = \frac{\mathrm{E}[x\,y] - \mathrm{E}[x]\,\mathrm{E}[y]}{\mathrm{E}[x^2] - \mathrm{E}[x]^2}, \qquad b = \mathrm{E}[y] - a\,\mathrm{E}[x],$$
Least Squares Solution to Linear Regression [3/3]
where
$$\mathrm{E}[x\,y] = \frac{1}{n} \sum_i x^{(i)} y^{(i)}, \qquad \mathrm{E}[x^2] = \frac{1}{n} \sum_i x^{(i)} x^{(i)},$$
$$\mathrm{E}[x] = \frac{1}{n} \sum_i x^{(i)}, \qquad \mathrm{E}[y] = \frac{1}{n} \sum_i y^{(i)}.$$
Correlation coefficient R
Substituting b = E[y] − a E[x] into the model gives
$$y - \mathrm{E}[y] = a\,(x - \mathrm{E}[x]),$$
that is, the fitted line passes through the point (E[x], E[y]). The correlation coefficient of x and y is
$$R = \frac{\mathrm{E}[x\,y] - \mathrm{E}[x]\,\mathrm{E}[y]}{\sqrt{(\mathrm{E}[x^2] - \mathrm{E}[x]^2)\,(\mathrm{E}[y^2] - \mathrm{E}[y]^2)}}.$$
Correlation coefficient R (cont)
Coefficient of determination R²
At the least squares solution, the mean squared error becomes
$$S(a,b) = \frac{1}{n} \sum_i \bigl(y^{(i)} - a\,x^{(i)} - b\bigr)^2
= \mathrm{E}[y^2] - \mathrm{E}[y]^2 - \frac{(\mathrm{E}[x\,y] - \mathrm{E}[x]\,\mathrm{E}[y])^2}{\mathrm{E}[x^2] - \mathrm{E}[x]^2}.$$
Coefficient of determination R² (cont)
The proportion of the variance of y that the fit explains, that is, the coefficient of determination, can be computed as
$$\frac{\mathrm{Var}[y] - S(a,b)}{\mathrm{Var}[y]}
= 1 - \frac{S(a,b)}{\mathrm{E}[y^2] - \mathrm{E}[y]^2}
= \frac{(\mathrm{E}[x\,y] - \mathrm{E}[x]\,\mathrm{E}[y])^2}{(\mathrm{E}[x^2] - \mathrm{E}[x]^2)\,(\mathrm{E}[y^2] - \mathrm{E}[y]^2)}.$$
Comparing to the correlation coefficient expression reveals that
$$\frac{\mathrm{Var}[y] - S(a,b)}{\mathrm{Var}[y]} = R^2.$$
That is, the coefficient of determination is the square of the correlation coefficient.
This definition of the coefficient of determination also works with non-linear and multidimensional models.
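As a concrete illustration of the formulas above, the fit and R² can be computed directly from the sample moments. A minimal Python sketch; the data, noise level, and coefficients are illustrative assumptions rather than the lecture's data:

```python
import numpy as np

# Illustrative data (assumed): a noisy linear relationship.
rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)
y = 0.11 * x + 0.17 + 0.05 * rng.standard_normal(x.size)

# Empirical expectations E[.] (sample means) as defined above.
Ex, Ey = x.mean(), y.mean()
Exy, Exx, Eyy = (x * y).mean(), (x * x).mean(), (y * y).mean()

# Least squares slope and intercept.
a = (Exy - Ex * Ey) / (Exx - Ex ** 2)
b = Ey - a * Ex

# Mean squared error of the fit and coefficient of determination.
S = np.mean((y - a * x - b) ** 2)
var_y = Eyy - Ey ** 2
R2 = (var_y - S) / var_y   # equals the squared correlation coefficient
print(a, b, R2)
```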
Cautions on Interpretation of Correlations
Effect of Delay on Correlations [1/2]
[Figure: two time series plotted over t = 0, ..., 100, one delayed relative to the other; the lower panel shows y(t).]
Effect of Delay on Correlations [2/2]
[Figure: left, scatter plot of y versus x with fitted line y = 0.06 x + 0.41 and R² = 0.3016; right, the coefficient of determination R² as a function of the delay (0 to 20).]
Multidimensional linear models [1/3]
Multidimensional linear models [2/3]
The model and the mean squared error can now be written as
$$Y = H\,\theta,$$
$$S(\theta) = \frac{1}{n}\,(Y - H\theta)^T (Y - H\theta).$$
Multidimensional linear models [3/3]
$$\nabla S(\theta) = \frac{1}{n}\,\bigl[-2\,H^T Y + 2\,H^T H\,\theta\bigr] = 0.$$
The resulting least squares estimate is
$$\theta = (H^T H)^{-1} H^T Y.$$
The coefficient of determination is again
$$R^2 = \frac{\mathrm{Var}[y] - S(\theta)}{\mathrm{Var}[y]}.$$
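A minimal Python sketch of this matrix formulation; the two-input data set and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
Y = 0.10 * x1 + 0.30 * x2 + 0.01 * rng.standard_normal(n)

# Design matrix H: one row per measurement, one column per parameter.
H = np.column_stack([x1, x2])

# Least squares estimate; lstsq solves theta = (H^T H)^{-1} H^T Y stably.
theta, *_ = np.linalg.lstsq(H, Y, rcond=None)

# Coefficient of determination R^2 = (Var[y] - S(theta)) / Var[y].
S = np.mean((Y - H @ theta) ** 2)
R2 = (np.var(Y) - S) / np.var(Y)
print(theta, R2)
```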
Example: Dependence of Three Signals [1/3]
[Figure: three time series x1(t), x2(t), and y(t) plotted over t = 0, ..., 1500.]
Example: Dependence of Three Signals [2/3]
[Figure: the data y(t) and the fitted model y = 0.10 x1 + 0.30 x2, with R² = 0.9918.]
Example: Dependence of Three Signals [3/3]
The scatter plot of (x1, x2, y) reveals that the time series lie on a plane in 3D:
[Figure: 3D scatter plot of (x1, x2, y); the points lie on a plane.]
Linear-in-Parameters Models
$$y = a_1\,x + a_2\,x^2 + \cdots + a_d\,x^d + b,$$
which is still linear in the parameters (a_1, ..., a_d, b).
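Because the model is linear in the parameters, the same least squares machinery applies once the powers of x are collected into a design matrix. A short Python sketch, with the degree and the data chosen purely for illustration:

```python
import numpy as np

def polynomial_lsq(x, y, d):
    """Least squares fit of y = a_1 x + ... + a_d x^d + b."""
    # Columns x, x^2, ..., x^d plus a constant column for the intercept b.
    H = np.column_stack([x ** k for k in range(1, d + 1)] + [np.ones_like(x)])
    theta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return theta  # (a_1, ..., a_d, b)

x = np.linspace(0.0, 6.0, 50)
y = np.sin(x) + 0.1 * np.random.default_rng(0).standard_normal(x.size)
print(polynomial_lsq(x, y, d=6))
```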
Nonlinear Models [1/3]
The advantage of linearity is that it leads to easy mathematics.
But linearity can be a restriction from the modeling point of view.
We can also use general non-linear models of the form
$$y = f(x;\, \theta),$$
Nonlinear Models [2/3]
Nonlinear Models [3/3]
Example: Approximation of Sine
[Figure: polynomial fit to sine data on x = 0, ..., 6; fitted polynomial y = −0.00032 x⁶ + 0.00072 x⁵ + 0.04 x⁴ − 0.22 x³ − 0.046 x² + 1.1 x + 0.16 with R² = 0.9960.]
Over-fitting and Regularization [1/3]
[Figure: two polynomial fits to data on x = 0, ..., 10.]
Over-fitting and Regularization [2/3]
$$S_r(\theta) = \frac{1}{n}\,(Y - H\theta)^T (Y - H\theta) + \lambda\,|\theta|^2.$$
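Setting the gradient of S_r(θ) to zero gives the closed-form solution θ = (HᵀH + nλI)⁻¹HᵀY. A minimal Python sketch, assuming a degree-10 polynomial design matrix and an illustrative λ; as in the formula above, the constant term is penalized together with the other parameters:

```python
import numpy as np

def ridge_lsq(H, Y, lam):
    """Minimize (1/n)||Y - H theta||^2 + lam * |theta|^2 in closed form."""
    n, d = H.shape
    return np.linalg.solve(H.T @ H + n * lam * np.eye(d), H.T @ Y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)
# Degree-10 polynomial design matrix; scaling x to [0, 1] keeps it well conditioned.
H = np.column_stack([(x / 10.0) ** k for k in range(0, 11)])
theta_reg = ridge_lsq(H, y, lam=1e-3)   # lam tunes the effective polynomial order
```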
Over-fitting and Regularization [3/3]
The parameter λ can be used for tuning the effective order of the polynomial from, say, 10 down to 0 (λ = 0, ..., ∞).
It is also possible to optimize the parameter λ using information criteria (AIC, BIC, DIC, ...) or by cross-validation.
The polynomial order itself can also be used as a regularization parameter and estimated by information criteria or cross-validation.
In the case of MLPs, the number of hidden units (and parameters) can similarly be used as a regularization parameter.
A general class of cost terms is given by the Tikhonov regularizers:
$$C(\theta) = \int \bigl|L\,f(x;\, \theta)\bigr|^2\, dx,$$
where L is a linear operator (for example, a differential operator).
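As an illustration (not spelled out above), for a linear-in-parameters model f(x; θ) = Σ_j θ_j φ_j(x) the Tikhonov cost is again a quadratic penalty in θ:
$$C(\theta) = \int \Bigl|\, L \sum_j \theta_j\, \varphi_j(x) \Bigr|^2 dx = \theta^T R\, \theta, \qquad R_{ij} = \int (L\varphi_i)(x)\,(L\varphi_j)(x)\, dx,$$
so the penalty λ|θ|² used above corresponds to the special case R = λI.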
Example: Regularization of Polynomial Fit
[Figure: regularized polynomial fit to the data on x = 0, ..., 10, R² = 0.8039.]
Example: Regularization of Polynomial Fit (cont)
[Figure: a second regularized polynomial fit, R² = 0.8077.]
Cautions on Practical Use of Non-Linear Models
Input Selection in Multi-Linear Models
Model Selection [1/2]
Model Selection [2/2]
Maximum Likelihood
The linear regression can be equivalently formulated as the stochastic model
$$y^{(i)} \sim \mathrm{N}\bigl(a\,x^{(i)} + b,\, \sigma^2\bigr),$$
where
$$\mathrm{N}(y \mid m, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y-m)^2/(2\sigma^2)}.$$
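To connect this formulation with the least squares fit, note that the negative log-likelihood of the n independent measurements is
$$-\log p(Y \mid a, b) = \frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_i \bigl(y^{(i)} - a\,x^{(i)} - b\bigr)^2 = \mathrm{const} + \frac{n}{2\sigma^2}\, S(a, b),$$
so maximizing the likelihood is equivalent to minimizing the least squares cost S(a, b).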
Maximum Likelihood (cont)
Bayesian Data Analysis
In Bayesian analysis, the parameters are also considered as random variables with a prior distribution:
$$\theta \sim p(\theta).$$
This prior distribution can be, for example, a multidimensional Gaussian:
$$p(\theta) = \frac{1}{\sqrt{|2\pi\Sigma|}} \exp\Bigl(-\frac{1}{2}\,(\theta - \mu)^T \Sigma^{-1} (\theta - \mu)\Bigr).$$
The measurements are modeled in the same manner as in ML estimation, e.g.:
$$p(y^{(i)} \mid \theta) = \mathrm{N}\bigl(y^{(i)} \mid f(x^{(i)};\, \theta),\, \sigma^2\bigr).$$
The joint distribution of all the measurements is now
$$p(Y \mid \theta) = \prod_i p(y^{(i)} \mid \theta).$$
In the linear Gaussian case the posterior distribution is also Gaussian, p(θ | Y) = N(θ | m, Σ), where:
The posterior mean m is the L2-regularized least squares solution.
The posterior covariance Σ is the covariance of the error in the mean.
Posterior Distribution [3/3]
Hierarchical Models
Hierarchical Models (cont)
Marginalization of Hyper-parameters
Watch out for the notation: this looks the same as the posterior with fixed variances, but it is not the same!
In the linear regression case, this marginal posterior distribution is a Student's t-distribution.
Typically Used Distribution Models
Gaussian, Monte Carlo and Other Approximations
Gaussian, Monte Carlo and Other Approximations (cont)
Batch Bayesian Estimation
1 Collect the measurement data y1, ..., yT.
2 Specify the measurement model (likelihood): p(yk | θ).
3 Specify the prior distribution: p(θ).
4 Compute the posterior distribution:
$$p(\theta \mid Y) = \frac{1}{Z}\, p(\theta) \prod_k p(y_k \mid \theta).$$
Bayesian Batch Linear Regression [1/4]
[Figure: the data as a time series y(t) over t = 0, ..., 100 and as a scatter plot of y versus x, x = 1, ..., 10.]
Bayesian Batch Linear Regression [2/4]
Measurement data: (xk, yk) for k = 1, ..., T.
Likelihood (σ² given):
$$p(y_k \mid \theta) = \mathrm{N}\bigl(y_k \mid a\,x_k + b,\, \sigma^2\bigr), \qquad \theta = (a, b).$$
Prior:
$$p(\theta) = \mathrm{N}(\theta \mid m_0, P_0).$$
Bayesian Batch Linear Regression [3/4]
Because the prior p(θ) and the likelihood p(Y | θ) are Gaussian, the posterior is also Gaussian:
$$p(\theta \mid Y) = \mathrm{N}(\theta \mid m_T, P_T).$$
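The expressions for m_T and P_T follow from the standard Gaussian conjugate update; a Python sketch of that standard result, where the data, noise level, and prior values are illustrative assumptions:

```python
import numpy as np

def bayes_linreg_batch(x, y, sigma2, m0, P0):
    """Posterior N(theta | mT, PT) for y_k = a x_k + b + noise, theta = (a, b)."""
    H = np.column_stack([x, np.ones_like(x)])          # rows H_k = [x_k, 1]
    P0_inv = np.linalg.inv(P0)
    # Standard Gaussian conjugate update:
    #   PT = (P0^{-1} + H^T H / sigma2)^{-1}
    #   mT = PT (P0^{-1} m0 + H^T y / sigma2)
    PT = np.linalg.inv(P0_inv + H.T @ H / sigma2)
    mT = PT @ (P0_inv @ m0 + H.T @ y / sigma2)
    return mT, PT

rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)
y = 0.11 * x + 0.17 + 0.05 * rng.standard_normal(x.size)
mT, PT = bayes_linreg_batch(x, y, sigma2=0.05 ** 2,
                            m0=np.zeros(2), P0=np.eye(2))
```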
Bayesian Batch Linear Regression [4/4]
[Figure: the resulting linear fit y = 0.11 x + 0.17 plotted against the data, with R² = 0.9731.]
Bayesian Recursive Linear Regression [1/4]
Bayesian Recursive Linear Regression [2/4]
Bayesian Recursive Linear Regression [3/4]
[Figure: estimated parameters a and b as a function of step number, recursive estimates compared with the batch estimates.]
Bayesian Recursive Linear Regression [4/4]
[Figure: log-scale plot (values 10⁻³ to 10⁰) as a function of step number, 0, ..., 100.]
Kalman Filtering [1/7]
If the regression parameters θk = (ak, bk) are allowed to change in time according to a Gaussian random walk, the model becomes
$$\theta_k = \theta_{k-1} + w_k$$
$$p(y_k \mid \theta_k) = \mathrm{N}(y_k \mid a_k\,x_k + b_k,\, \sigma^2)$$
$$p(\theta_k \mid \theta_{k-1}) = \mathrm{N}(\theta_k \mid \theta_{k-1},\, Q)$$
$$p(\theta_0) = \mathrm{N}(\theta_0 \mid m_0, P_0).$$
Kalman Filtering [2/7]
Kalman Filtering [3/7]
Prediction:
$$m_k^- = m_{k-1}$$
$$P_k^- = P_{k-1} + Q.$$
Update:
$$S_k = H_k\, P_k^-\, H_k^T + \sigma^2$$
$$K_k = P_k^-\, H_k^T\, S_k^{-1}$$
$$m_k = m_k^- + K_k\,\bigl[y_k - H_k\, m_k^-\bigr]$$
$$P_k = P_k^- - K_k\, S_k\, K_k^T.$$
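A compact Python sketch of these prediction and update steps for the drifting regression parameters θk = (ak, bk); the simulated data, Q, σ², and prior values are illustrative assumptions:

```python
import numpy as np

def kalman_regression(xs, ys, sigma2, Q, m0, P0):
    """Kalman filter for theta_k = theta_{k-1} + w_k, y_k = a_k x_k + b_k + noise."""
    m, P, means = m0.copy(), P0.copy(), []
    for xk, yk in zip(xs, ys):
        # Prediction step (random walk dynamics).
        m_pred, P_pred = m, P + Q
        # Update step with time-varying measurement matrix H_k = [x_k, 1].
        Hk = np.array([[xk, 1.0]])
        Sk = Hk @ P_pred @ Hk.T + sigma2          # innovation variance (1x1)
        Kk = P_pred @ Hk.T / Sk                   # Kalman gain (2x1)
        m = m_pred + (Kk * (yk - Hk @ m_pred)).ravel()
        P = P_pred - Kk @ Sk @ Kk.T
        means.append(m)
    return np.array(means)

rng = np.random.default_rng(0)
n = 200
xs = rng.uniform(0.0, 20.0, n)
a_true = 0.1 + 0.002 * np.arange(n)               # slowly drifting slope
ys = a_true * xs + 0.2 + 0.5 * rng.standard_normal(n)
theta_hist = kalman_regression(xs, ys, sigma2=0.5 ** 2, Q=1e-4 * np.eye(2),
                               m0=np.zeros(2), P0=np.eye(2))
```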
Kalman Filtering [4/7]
[Figure: the data as a time series y(t) over t = 0, ..., 200 and as a scatter plot of y versus x, x = 0, ..., 20.]
Kalman Filtering [5/7]
[Figure: estimated parameters a and b as a function of step number, Kalman filter estimates compared with the batch estimates.]
Kalman Filtering [6/7]
[Figure: two signal panels plotted over steps 0, ..., 200.]
Kalman Filtering [7/7]
The general linear Gaussian state space model is
$$x_k = A\,x_{k-1} + q_k, \qquad q_k \sim \mathrm{N}(0, Q)$$
$$y_k = H\,x_k + r_k, \qquad r_k \sim \mathrm{N}(0, R)$$
$$x_0 \sim \mathrm{N}(m_0, P_0),$$
or, in probabilistic notation,
$$p(y_k \mid x_k) = \mathrm{N}(y_k \mid H\,x_k,\, R)$$
$$p(x_k \mid x_{k-1}) = \mathrm{N}(x_k \mid A\,x_{k-1},\, Q).$$
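To connect back to the regression example above: the drifting linear regression is a special case of this form, with the state taken to be θk = (ak, bk), transition matrix A = I, process noise covariance Q, a time-varying measurement matrix Hk = (xk  1), and measurement noise variance R = σ².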
Probabilistic Non-Linear Filtering [1/2]
$$x_k = f(x_{k-1}, q_k)$$
$$y_k = h(x_k, r_k).$$
Probabilistic Non-Linear Filtering [2/2]
$$\frac{dx}{dt} = f(x, t) + w(t),$$
where w(t) is a continuous-time Gaussian white noise process.
Approximation methods: Extended Kalman filters,
Unscented Kalman filters, sequential Monte Carlo, particle
filters.
Summary