Lecture 3: Kernel Method
Spring 2020
Stanley Chan
Mathematical Background
Lecture 1: Linear regression: A basic data analytic tool
Lecture 2: Regularization: Constraining the solution
Lecture 3: Kernel Method: Enabling nonlinearity
Predicted Value
The predicted value of a new sample x is
$$\hat{y} = \hat{\theta}^T x = \sum_{n=1}^{N} \alpha_n \langle x, x_n \rangle.$$
We want this model to encapsulate nonlinearity. (How? Question 2)
Dual Form of Linear Regression
Goal: Address Question 1: Express $\hat{\theta}$ as
$$\hat{\theta} = \sum_{n=1}^{N} \alpha_n x_n.$$
Lemma (matrix inversion identity): for $A \in \mathbb{R}^{N \times d}$ and $\lambda > 0$,
$$(A^T A + \lambda I)^{-1} A^T = A^T (A A^T + \lambda I)^{-1}.$$
Remark:
The dimension of $I$ on the left is $d \times d$; on the right it is $N \times N$.
If $\lambda = 0$, then the above is true only when $A$ is invertible.
Using the Lemma, we can show that
$$\begin{aligned}
\hat{\theta} &= (A^T A + \lambda I)^{-1} A^T y && \text{(Primal Form)} \\
&= A^T \underbrace{(A A^T + \lambda I)^{-1} y}_{\stackrel{\text{def}}{=}\,\alpha} && \text{(Dual Form)} \\
&= \begin{bmatrix} \text{---}\,(x_1)^T\,\text{---} \\ \text{---}\,(x_2)^T\,\text{---} \\ \vdots \\ \text{---}\,(x_N)^T\,\text{---} \end{bmatrix}^T
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix}
= \sum_{n=1}^{N} \alpha_n x_n, \qquad \alpha_n \stackrel{\text{def}}{=} \left[(A A^T + \lambda I)^{-1} y\right]_n.
\end{aligned}$$
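A quick numerical sanity check (not part of the original slides): the primal and dual forms should produce the same $\hat{\theta}$. The sketch below uses arbitrary data sizes and an arbitrary $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 5                      # N samples, d features (arbitrary sizes)
A = rng.standard_normal((N, d))   # data matrix with rows x_n^T
y = rng.standard_normal(N)
lam = 0.1                         # regularization parameter lambda

# Primal form: theta = (A^T A + lambda I_d)^{-1} A^T y
theta_primal = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

# Dual form: alpha = (A A^T + lambda I_N)^{-1} y, then theta = A^T alpha
alpha = np.linalg.solve(A @ A.T + lam * np.eye(N), y)
theta_dual = A.T @ alpha

print(np.allclose(theta_primal, theta_dual))  # True
```

Note that the dual form inverts an $N \times N$ matrix instead of a $d \times d$ one, which is advantageous when $d \gg N$.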
The Idea:
Replace the inner product $\langle x, x_n \rangle$ by $k(x, x_n)$:
$$\hat{y} = \hat{\theta}^T x = \sum_{n=1}^{N} \alpha_n k(x, x_n).$$
Example. For $u, v \in \mathbb{R}^2$, consider the kernel $k(u, v) = (u^T v)^2$. Then
$$(u^T v)^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} u_i v_i u_j v_j = \sum_{i=1}^{2}\sum_{j=1}^{2} (u_i u_j)(v_i v_j) = \begin{bmatrix} u_1^2 & u_1 u_2 & u_2 u_1 & u_2^2 \end{bmatrix} \begin{bmatrix} v_1^2 \\ v_1 v_2 \\ v_2 v_1 \\ v_2^2 \end{bmatrix}.$$
So if we define $\phi$ as
$$\phi(u) = \phi\left(\begin{bmatrix} u_1 \\ u_2 \end{bmatrix}\right) = \begin{bmatrix} u_1^2 \\ u_1 u_2 \\ u_2 u_1 \\ u_2^2 \end{bmatrix},$$
then $(u^T v)^2 = \langle \phi(u), \phi(v) \rangle$.
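A small numerical check (an illustration, not from the slides) that the feature map above reproduces the polynomial kernel; the test vectors are arbitrary:

```python
import numpy as np

def phi(u):
    """Feature map for the degree-2 polynomial kernel on R^2:
    phi(u) = [u1^2, u1*u2, u2*u1, u2^2]."""
    return np.array([u[0]**2, u[0]*u[1], u[1]*u[0], u[1]**2])

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

lhs = (u @ v) ** 2          # kernel evaluated directly: (u^T v)^2
rhs = phi(u) @ phi(v)       # inner product in feature space
print(lhs, rhs)             # both equal 1.0
```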
Radial Basis Function
A useful kernel is the radial basis function (RBF) kernel:
$$k(u, v) = \exp\left(-\frac{\|u - v\|^2}{2\sigma^2}\right).$$
The corresponding nonlinear transform of RBF is infinite-dimensional. See Appendix.
$\|u - v\|^2$ measures the distance between two data points $u$ and $v$.
$\sigma$ is the standard deviation, defining what is "far" and what is "close".
RBF enforces local structure; only a few samples are used.
The prediction is $\hat{y} = \sum_{n=1}^{N} \alpha_n k(x, x_n)$, with coefficients
$$\alpha_n = \left[(K + \lambda I)^{-1} y\right]_n,$$
where $K$ is the $N \times N$ kernel matrix with entries $K_{ij} = k(x_i, x_j)$.
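Putting the pieces together, here is a minimal kernel ridge regression sketch assuming the formulas above; the helper name `rbf`, the toy sine data, and the choices $\sigma = 0.5$ and $\lambda = 10^{-3}$ are illustrative assumptions:

```python
import numpy as np

def rbf(u, v, sigma=0.5):
    """RBF kernel k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

# Toy 1-D training data: a noisy sine (arbitrary illustrative choice)
rng = np.random.default_rng(0)
x_train = np.linspace(0, 2 * np.pi, 30)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(30)

lam = 1e-3
N = len(x_train)

# Kernel matrix K_ij = k(x_i, x_j), then alpha = (K + lambda I)^{-1} y
K = np.array([[rbf(xi, xj) for xj in x_train] for xi in x_train])
alpha = np.linalg.solve(K + lam * np.eye(N), y_train)

# Predict at a new point: y_hat = sum_n alpha_n k(x, x_n)
x_new = 1.0
y_hat = sum(a * rbf(x_new, xn) for a, xn in zip(alpha, x_train))
print(y_hat)  # roughly sin(1.0) ≈ 0.84
```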
[Figure: kernel method illustration. Image source: https://towardsdatascience.com/understanding-the-kernel-trick-e0bc6112ef78]
Kernels in Support Vector Machines
Example. RBF for SVM (We will discuss SVM later in the semester.)
The radial basis function kernel is often used in support vector machines.
A poor choice of parameter can lead to low training loss, but with the risk of over-fitting.
An under-fitted model can sometimes give better generalization.
Derivation of the Dual Form
Writing $\theta = A^T \alpha$, the optimality condition gives
$$\alpha = \frac{1}{\lambda}\left(y - A A^T \alpha\right).$$
Rearranging terms gives $(A A^T + \lambda I)\alpha = y$, which yields
$$\alpha = (A A^T + \lambda I)^{-1} y.$$
Substituting into $\theta = A^T \alpha$ gives $\theta = A^T (A A^T + \lambda I)^{-1} y$.
Non-Linear Transform for RBF
Let us consider a scalar $u \in \mathbb{R}$.
$$\begin{aligned}
k(u, v) &= \exp\{-(u - v)^2\} \\
&= \exp\{-u^2\} \exp\{2uv\} \exp\{-v^2\} \\
&= \exp\{-u^2\} \left(\sum_{k=0}^{\infty} \frac{2^k u^k v^k}{k!}\right) \exp\{-v^2\} \\
&= \exp\{-u^2\} \left(1,\ \sqrt{\tfrac{2^1}{1!}}\, u,\ \sqrt{\tfrac{2^2}{2!}}\, u^2,\ \sqrt{\tfrac{2^3}{3!}}\, u^3,\ \ldots \right)^T \\
&\qquad \times \left(1,\ \sqrt{\tfrac{2^1}{1!}}\, v,\ \sqrt{\tfrac{2^2}{2!}}\, v^2,\ \sqrt{\tfrac{2^3}{3!}}\, v^3,\ \ldots \right) \exp\{-v^2\}.
\end{aligned}$$
So $\phi$ is
$$\phi(x) = \exp\{-x^2\} \left(1,\ \sqrt{\tfrac{2^1}{1!}}\, x,\ \sqrt{\tfrac{2^2}{2!}}\, x^2,\ \sqrt{\tfrac{2^3}{3!}}\, x^3,\ \ldots \right).$$
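A quick numerical check (not in the slides): truncating the infinite feature map after enough terms should approximately reproduce the kernel value. The truncation order $K = 30$ and the test points are arbitrary choices:

```python
import numpy as np
from math import factorial

def phi(x, K=30):
    """Truncated RBF feature map: exp(-x^2) * sqrt(2^k / k!) * x^k, k = 0..K-1."""
    return np.exp(-x**2) * np.array(
        [np.sqrt(2.0**k / factorial(k)) * x**k for k in range(K)]
    )

u, v = 0.7, -0.3
print(np.exp(-(u - v) ** 2))   # exact kernel value
print(phi(u) @ phi(v))         # inner product of truncated feature maps (≈ equal)
```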
Kernels are Positive Semi-Definite
Given $\{x_j\}_{j=1}^{N}$, construct an $N \times N$ matrix $K$ such that
$$K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle = \sum_{k} [\Phi(x_i)]_k\, [\Phi(x_j)]_k,$$
where $[\Phi(x_i)]_k$ denotes the $k$-th element of the vector $\Phi(x_i)$.
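As a numerical illustration (not from the slides), any Gram matrix of this form is positive semi-definite, so its eigenvalues are non-negative up to round-off; here with the RBF kernel on random points:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))   # 20 random points in R^3 (arbitrary)
sigma = 1.0

# Kernel (Gram) matrix with the RBF kernel
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2 * sigma ** 2))

# Positive semi-definite: all eigenvalues >= 0 (up to round-off)
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)     # True
```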
Existence of Nonlinear Transform