Three Derivations of Principal Component Analysis
j.p.lewis
Why are the PCA basis vectors the eigenvectors of the correlation matrix?
From Ballard & Brown, Computer Vision: The (random) data vector is x; its component along a proposed axis u is (x · u).
The variance of this component is E[(x · u − E(x · u))²] (the variance is the expectation of the square of the data with its mean removed).
Expanding, this is E[u^T (x − Ex)(x − Ex)^T u] = u^T C u, where C is the covariance or 'correlation' matrix. The u that gives the maximum value of u^T C u (subject to the constraint that u is a unit vector) is the eigenvector of C with the largest eigenvalue. The second and subsequent principal component axes are the other eigenvectors, sorted by decreasing eigenvalue.
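As a sanity check on this first derivation, here is a small numpy sketch (not part of the original note; the data are synthetic) comparing the variance u^T C u along the top eigenvector of C with the variance along random unit directions:

    # Sketch: the unit vector maximizing u^T C u is the top eigenvector of C.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 5))  # synthetic correlated data, rows = samples
    Xc = X - X.mean(axis=0)                      # remove the mean
    C = Xc.T @ Xc / len(Xc)                      # covariance / 'correlation' matrix

    evals, evecs = np.linalg.eigh(C)             # symmetric eigendecomposition, eigenvalues ascending
    u_top = evecs[:, -1]                         # eigenvector with the largest eigenvalue

    var_top = u_top @ C @ u_top                  # variance along the top eigenvector
    for _ in range(5):
        u = rng.standard_normal(5)
        u /= np.linalg.norm(u)                   # a random unit vector
        assert u @ C @ u <= var_top + 1e-12      # no unit direction beats the top eigenvector
    print(var_top, evals[-1])                    # the maximum variance is the largest eigenvalue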
Find PCA basis vectors u_k that minimize E||x − x̂||² for a partial expansion out to P components:

    x̂ = Σ_{k=1}^{P} (x · u_k) u_k

    x − x̂ = Σ_{k=P+1}^{N} (x · u_k) u_k

For orthonormal u_k the cross terms vanish, so E||x − x̂||² = Σ_{k=P+1}^{N} E[(x · u_k)²] = Σ_{k=P+1}^{N} u_k^T C u_k. This is minimized by assigning the discarded directions u_{P+1}, ..., u_N to the eigenvectors of C with the smallest eigenvalues; equivalently, the retained basis vectors are the P eigenvectors with the largest eigenvalues.
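A minimal numpy sketch of this second derivation (again synthetic data, not from the note): reconstruct x from its first P eigenvector components and check that the mean squared residual equals the sum of the discarded eigenvalues:

    import numpy as np

    rng = np.random.default_rng(1)
    N, P = 6, 3                                  # dimension N, keep P components
    X = rng.standard_normal((2000, N)) @ rng.standard_normal((N, N))
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(Xc)

    evals, U = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]              # sort eigenvectors by decreasing eigenvalue
    evals, U = evals[order], U[:, order]

    coeffs = Xc @ U[:, :P]                       # (x . u_k) for k = 1..P
    Xhat = coeffs @ U[:, :P].T                   # xhat = sum_k (x . u_k) u_k
    mse = np.mean(np.sum((Xc - Xhat) ** 2, axis=1))
    print(mse, evals[P:].sum())                  # E||x - xhat||^2 equals the sum of discarded eigenvalues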
The correlation matrix of some data: C = E[xx^T]. The correlation matrix of the data x transformed by some transform T:
C′ = E[(T x)(T x)^T] = E[T xx^T T^T] = T C T^T. The inner xx^T is the correlation matrix of the original data. Now suppose that the rows of T are chosen to be the eigenvectors of this correlation matrix: then, because C u_k = λ_k u_k and the eigenvectors are orthonormal, the resulting matrix C′ is diagonal, with the eigenvalues on the diagonal. Thus the transformed data are uncorrelated (their correlation matrix C′ has no off-diagonal terms). So the basis that diagonalizes the correlation matrix consists of the eigenvectors of the (original) correlation matrix.
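The diagonalization argument is also easy to check numerically; the sketch below (synthetic data, illustrative only) builds T from the eigenvectors of C and verifies that C′ = T C T^T has negligible off-diagonal entries:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((5000, 4)) @ rng.standard_normal((4, 4))
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(Xc)

    _, evecs = np.linalg.eigh(C)
    T = evecs.T                                  # rows of T are the eigenvectors of C

    Cprime = T @ C @ T.T                         # correlation matrix of the transformed data T x
    off_diag = Cprime - np.diag(np.diag(Cprime))
    print(np.max(np.abs(off_diag)))              # ~0: C' is diagonal, the transformed data are uncorrelated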
Correlation matrices
If the x_k are sliding windows through a signal, i.e. x_0 contains samples 0..10, x_1 samples 1..11, etc., then averaging x_k x_k^T over k corresponds to estimating the autocovariance of the signal. If the x_k are images scanned into vectors, summing x_k x_k^T and dividing by N gives, in entry (i, j), the average correlation of pixel i with pixel j.
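A rough numpy illustration of the sliding-window case (the signal and the window length are invented for the example): average the outer products x_k x_k^T and compare the diagonals of the result with a direct autocovariance estimate:

    import numpy as np

    rng = np.random.default_rng(3)
    signal = np.convolve(rng.standard_normal(5000), np.ones(5) / 5, mode="same")  # a correlated test signal
    signal -= signal.mean()
    W = 11                                        # window length: samples k..k+10

    windows = np.lib.stride_tricks.sliding_window_view(signal, W)  # row k is x_k
    C = windows.T @ windows / len(windows)        # average of x_k x_k^T

    # each diagonal of C estimates the autocovariance at the corresponding lag
    for lag in range(3):
        direct = np.mean(signal[:-lag or None] * signal[lag:])
        print(lag, np.mean(np.diag(C, k=lag)), direct)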
The (i, j) entry of M^T M is the dot product of data vector i with data vector j. If a column of M contains the various measurements for a particular person, then (M^T M)_{i,j} gives the correlation, averaged across tests, of person i with person j, while (M M^T)_{i,j} gives the correlation, averaged across people, of test i versus test j.
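A small sketch of the M^T M versus M M^T distinction, using a hypothetical tests-by-people matrix (names and sizes are made up):

    import numpy as np

    rng = np.random.default_rng(4)
    n_tests, n_people = 8, 3
    M = rng.standard_normal((n_tests, n_people))   # column j holds the measurements for person j

    person_corr = M.T @ M / n_tests                # averaged across tests: person vs. person
    test_corr = M @ M.T / n_people                 # averaged across people: test vs. test

    assert np.isclose(person_corr[0, 1] * n_tests, M[:, 0] @ M[:, 1])  # entry (0,1) is a dot of two columns
    print(person_corr.shape, test_corr.shape)      # (3, 3) and (8, 8)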
SVD decomposes a possibly non-square matrix M as M = U S V^T, where U and V are square rotation-like (orthogonal) matrices and S is a diagonal matrix of singular values. The columns of U are the eigenvectors of M M^T, and the columns of V are the eigenvectors of M^T M.
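This is easy to verify numerically; the sketch below (a random non-square M) checks both eigenvector claims, with the squared singular values appearing as the eigenvalues:

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.standard_normal((6, 4))                   # a non-square matrix

    U, s, Vt = np.linalg.svd(M, full_matrices=False)  # M = U diag(s) V^T
    V = Vt.T

    print(np.allclose(M @ M.T @ U, U * s**2))         # (M M^T) u_i = s_i^2 u_i
    print(np.allclose(M.T @ M @ V, V * s**2))         # (M^T M) v_i = s_i^2 v_i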
Computation Trick
If we are computing PCA on images, M will be (e.g.) a million by N (N images), and M M^T will be a million by a million. Instead, first find the eigenvectors of the much smaller N × N matrix M^T M: M^T M x = λx. Then premultiply by M and regroup as (M M^T)(M x) = λ(M x); i.e., M x is an eigenvector of the desired system, now given as a linear combination of the original data vectors with weights that are the components of the eigenvector of the smaller system.
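A sketch of the trick (with a small D standing in for the million-pixel case, so the direct D × D eigendecomposition stays feasible for comparison; sizes and data are arbitrary):

    import numpy as np

    rng = np.random.default_rng(6)
    D, N = 500, 10                                  # D "pixels" per image, N images
    M = rng.standard_normal((D, N))                 # columns are the (already centered) image vectors

    lam, x = np.linalg.eigh(M.T @ M)                # small N x N eigenproblem: (M^T M) x = lam x
    U = M @ x                                       # premultiply by M: (M M^T)(M x) = lam (M x)
    U /= np.linalg.norm(U, axis=0)                  # normalize each column M x to unit length

    print(np.allclose(M @ (M.T @ U), U * lam))      # the columns of U are eigenvectors of M M^T
    lam_big = np.linalg.eigvalsh(M @ M.T)           # direct D x D computation, only feasible because D is small here
    print(np.allclose(lam, lam_big[-N:]))           # the nonzero eigenvalues agree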