Matrix Norms
Steven L. Carnie
Here we summarize the important properties of the vector and matrix
norms we need for 620-381.
In doing error analysis for problems involving vectors and matrices (e.g.
solving linear systems), we need to measure the size of the errors in a vector
or matrix. There are two common kinds of error analysis in numerical linear
algebra: componentwise and normwise. In general, normwise error analysis
is easier (but less precise). For that we need to introduce vector norms and
matrix norms.
Vector norms
From 620122/142/211 you should have seen the idea of a vector norm. Here we work with real vector spaces ℝⁿ, but everything goes over to complex spaces (replace transpose by Hermitian transpose, symmetric matrix by Hermitian matrix, orthogonal matrix by unitary matrix, etc. where necessary).
Definition 1 A norm on a vector space V is a function ‖·‖ : V → ℝ with the properties:
1. ‖x‖ ≥ 0 ∀ x ∈ V, where ‖x‖ = 0 ⟺ x = 0 (norms are positive for nonzero vectors)
2. ‖αx‖ = |α| ‖x‖ (scaling a vector uniformly scales the norm by the same amount)
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ ∀ x, y ∈ V (triangle inequality)
Example The 3 most common vector norms are:
1. ‖x‖_1 = Σ_i |x_i| (the 1-norm)
2. ‖x‖_2 = (Σ_i x_i²)^{1/2} = √(xᵀx) (the 2-norm)
3. ‖x‖_∞ = max_i |x_i| (the ∞-norm)
The 2-norm is invariant under orthogonal transformations Q:
    ‖Qx‖_2 = √((Qx)ᵀ(Qx)) = √(xᵀQᵀQx) = √(xᵀx) = ‖x‖_2
since QᵀQ = I for an orthogonal matrix. It's also the only norm with a close connection to the inner product (dot product) between vectors.
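All three norms are straightforward to compute directly from their definitions. A minimal pure-Python sketch (the function names are mine, not part of the notes):

```python
import math

def norm1(x):
    # 1-norm: sum of absolute values
    return sum(abs(xi) for xi in x)

def norm2(x):
    # 2-norm: square root of the sum of squares
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):
    # infinity-norm: largest component in absolute value
    return max(abs(xi) for xi in x)

x = [3.0, -4.0]
print(norm1(x), norm2(x), norm_inf(x))  # 7.0 5.0 4.0
```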
In finite-dimensional spaces, it doesn't matter much precisely which norm you use, since any two of them can't differ by more than a factor of n from each other. This property is called norm equivalence.
Theorem 1
    ‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2
    ‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞
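These equivalence inequalities are easy to spot-check numerically on random vectors; a pure-Python sketch (helper names are mine):

```python
import math
import random

def norm1(x): return sum(abs(t) for t in x)
def norm2(x): return math.sqrt(sum(t * t for t in x))
def norm_inf(x): return max(abs(t) for t in x)

random.seed(0)
n = 10
for _ in range(100):
    x = [random.uniform(-1.0, 1.0) for _ in range(n)]
    # ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2
    assert norm2(x) <= norm1(x) <= math.sqrt(n) * norm2(x) + 1e-12
    # ||x||_inf <= ||x||_2 <= sqrt(n) ||x||_inf
    assert norm_inf(x) <= norm2(x) <= math.sqrt(n) * norm_inf(x) + 1e-12
```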
A norm is called monotone if |x| ≤ |y| componentwise implies ‖x‖ ≤ ‖y‖, and absolute if ‖ |x| ‖ = ‖x‖. It is not obvious, but true, that these two properties are equivalent, i.e. a norm that is monotone is absolute and vice versa.
Not every vector norm has these properties, but the p-norms do.
We use vector norms to measure the size of errors in, for example, the
RHS b of a linear system.
Matrix norms
Similarly we will want to measure the size of errors in the coefficient matrix A of a linear system. For that we need matrix norms. A matrix norm is just a vector norm on the mn-dimensional vector space M_{m×n} of m × n matrices. Hence
Definition 2 A matrix norm is a function ‖·‖ : M_{m×n} → ℝ with the properties:
1. ‖A‖ ≥ 0 ∀ A ∈ M_{m×n}, where ‖A‖ = 0 ⟺ A = 0 (norms are positive for nonzero matrices)
2. ‖αA‖ = |α| ‖A‖ (scaling a matrix uniformly scales the norm by the same amount)
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ ∀ A, B ∈ M_{m×n} (triangle inequality)
Example The following are matrix norms:
1. ‖A‖_F = (Σ_{i,j} a_ij²)^{1/2} (the Frobenius norm)
2. ‖A‖_M = max_{i,j} |a_ij| (the max-norm)
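Both are computed entry by entry; a pure-Python sketch (names mine):

```python
import math

def frobenius(A):
    # Frobenius norm: the 2-norm of the matrix flattened into a vector
    return math.sqrt(sum(a * a for row in A for a in row))

def max_norm(A):
    # max-norm: largest entry in absolute value
    return max(abs(a) for row in A for a in row)

A = [[3.0, 0.0], [4.0, 0.0]]
print(frobenius(A), max_norm(A))  # 5.0 4.0
```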
Exercise 2 Are these norms monotone?
Hint: are they absolute?
We'll mostly use not these norms but a common class of matrix norms, called subordinate matrix norms or induced matrix norms or (as in Demmel) operator norms:
    ‖A‖ = max_{x ≠ 0} ‖Ax‖ / ‖x‖ = max_{‖x‖ = 1} ‖Ax‖
In words, the (subordinate) norm of a matrix is the norm of the largest image under the map A of a unit vector.
Exercise 3 Prove that any subordinate norm is a matrix norm.
Example The subordinate norms we'll use are those corresponding to the vector norms listed above:
1. ‖A‖_1 = max_j Σ_i |a_ij| (the maximum column sum)
2. ‖A‖_2 (the 2-norm)
3. ‖A‖_∞ = max_i Σ_j |a_ij| (the maximum row sum)
These are the only subordinate p-norms that are easy to compute.
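The 1- and ∞-norm formulas translate directly into code; a pure-Python sketch (names mine):

```python
def norm1_mat(A):
    # subordinate 1-norm: maximum absolute column sum
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def norm_inf_mat(A):
    # subordinate infinity-norm: maximum absolute row sum
    return max(sum(abs(a) for a in row) for row in A)

A = [[1.0, -2.0], [3.0, 4.0]]
print(norm1_mat(A), norm_inf_mat(A))  # 6.0 7.0
```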
Exercise 4 Prove that ‖A‖_∞ = max_i Σ_j |a_ij|.
The subordinate norms have some useful submultiplicative properties:
    ‖Ax‖ ≤ ‖A‖ ‖x‖
by definition; and
    ‖AB‖ ≤ ‖A‖ ‖B‖
Any matrix norm with the latter property is called consistent.
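Both submultiplicative properties can be checked numerically, e.g. in the 1-norm. A pure-Python sketch (helper names mine):

```python
import random

def norm1_vec(x): return sum(abs(t) for t in x)
def norm1_mat(A): return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))
def matvec(A, x): return [sum(a * t for a, t in zip(row, x)) for row in A]
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

random.seed(1)
for _ in range(50):
    A = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    B = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
    x = [random.uniform(-1, 1) for _ in range(3)]
    # ||Ax|| <= ||A|| ||x|| and ||AB|| <= ||A|| ||B|| in the 1-norm
    assert norm1_vec(matvec(A, x)) <= norm1_mat(A) * norm1_vec(x) + 1e-12
    assert norm1_mat(matmul(A, B)) <= norm1_mat(A) * norm1_mat(B) + 1e-12
```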
Similarly, we have norm equivalence for matrix norms, since they are within a factor of n of each other:
Theorem 2
    (1/√n) ‖A‖_2 ≤ ‖A‖_1 ≤ √n ‖A‖_2
    (1/√n) ‖A‖_∞ ≤ ‖A‖_2 ≤ √n ‖A‖_∞
    (1/n) ‖A‖_∞ ≤ ‖A‖_1 ≤ n ‖A‖_∞
So you may as well use whichever one is convenient.
2.1 The 2-norm
The matrix 2-norm can be written as
    ‖A‖_2 = max_{x ≠ 0} ‖Ax‖_2 / ‖x‖_2 = max_{x ≠ 0} (xᵀAᵀAx)^{1/2} / ‖x‖_2 = √(λ_max(AᵀA))
where λ_max(AᵀA) is the largest eigenvalue of AᵀA.
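For a 2×2 matrix, λ_max(AᵀA) is a root of a quadratic, so the 2-norm can be computed by hand. A pure-Python sketch (my own helper, for illustration only):

```python
import math

def two_norm_2x2(A):
    # ||A||_2 = sqrt(lambda_max(A^T A)); for 2x2, use trace and determinant
    (a, b), (c, d) = A
    B11 = a * a + c * c          # A^T A is a symmetric 2x2 matrix
    B12 = a * b + c * d
    B22 = b * b + d * d
    tr = B11 + B22
    det = B11 * B22 - B12 * B12
    lam_max = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0
    return math.sqrt(lam_max)

# for a diagonal matrix the 2-norm is the largest |diagonal entry|
print(two_norm_2x2([[3.0, 0.0], [0.0, -4.0]]))  # 4.0
```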
Sensitivity
Now that we know how to measure size in our various vector spaces, we ask: how do small changes in the coefficient matrix and RHS of a linear system affect the solution? Since we can change a matrix in many different ways, we just ask that the change be arbitrary but that the norm of the difference be small relative to the norm of the matrix, i.e. we consider small normwise relative changes to the matrix. This means we are doing a worst-case analysis, since we're looking for the worst that can happen.
Writing the perturbed system as
    (A + δA)(x + δx) = b + δb    (1)
and defining the condition number
    κ(A) = ‖A‖ ‖A⁻¹‖    (2)
one can show that, provided κ(A) ‖δA‖/‖A‖ < 1,
    ‖δx‖/‖x‖ ≤ [κ(A) / (1 − κ(A) ‖δA‖/‖A‖)] (‖δA‖/‖A‖ + ‖δb‖/‖b‖)    (3)
In particular, if only the matrix is perturbed (δb = 0),
    ‖δx‖/‖x + δx‖ ≤ κ(A) ‖δA‖/‖A‖    (4)
so the condition number κ(A) measures how much normwise relative changes in the data can be amplified in the solution.
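When only the RHS is perturbed (δA = 0), the amplification bound reduces to ‖δx‖/‖x‖ ≤ κ(A) ‖δb‖/‖b‖. A pure-Python check on a hypothetical 2×2 system (the matrix, data, and helper names are my own, chosen to be ill-conditioned):

```python
def norm1(x): return sum(abs(t) for t in x)
def mat_norm1(A): return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))
def matvec(A, x): return [sum(a * t for a, t in zip(row, x)) for row in A]

def inv2(A):
    # explicit inverse of a 2x2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1001.0, 1000.0], [1.0, 1.0]]       # det = 1, but badly conditioned
Ainv = inv2(A)
kappa1 = mat_norm1(A) * mat_norm1(Ainv)  # about 2e6

b = [1.0, 1.0]
db = [1e-6, 0.0]                         # small change in the RHS
x = matvec(Ainv, b)
dx = matvec(Ainv, db)
lhs = norm1(dx) / norm1(x)               # relative change in the solution
rhs = kappa1 * norm1(db) / norm1(b)      # the condition-number bound
assert lhs <= rhs
```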
So far we haven't specified the matrix norm we're working in. If that's important we specify it by a subscript, e.g. κ_2(A) if we're using the matrix 2-norm.
3.1 Properties of κ(A)
1. κ(A) ≥ 1 for any square matrix.
Proof: Since I = A⁻¹A, for any consistent norm 1 ≤ ‖I‖ ≤ ‖A⁻¹‖ ‖A‖ = κ(A).
This means relative errors in the solution can never be smaller than those of the inputs, unlike the situation in evaluating a function, with relative condition number |x f′(x)/f(x)|.
We call a matrix with κ(A) = 1 perfectly conditioned. If κ(A) ≫ 1, we call the problem ill-conditioned.
Rule of Thumb: if κ(A) ≈ 10^k, expect to lose k digits of accuracy in the answer (compared to the error in the inputs).
2. κ(A) = max_{‖x‖=1} ‖Ax‖ / min_{‖x‖=1} ‖Ax‖, which in the 2-norm gives
    κ_2(A) = √( λ_max(AᵀA) / λ_min(AᵀA) )
So if κ(A) is small (near 1), A cannot be close to a singular matrix. Conversely, if κ(A) is large, A could be close to a singular matrix. In fact, in any p-norm one can prove that
    1/κ_p(A) = min { ‖δA‖_p / ‖A‖_p : A + δA singular }
i.e. if κ(A) is large, A is close to a singular matrix.
Geometrically, a matrix being ill-conditioned corresponds to a system of hyperplanes that are almost parallel, i.e. you wouldn't have to tilt them much to make them parallel and hence have no or infinitely many solutions. In statistics, this is called collinearity.
κ(A) is computed in Matlab with the cond command. See help cond to check which norms you can ask for. It's not cheap to compute κ(A), since it's more expensive to find A⁻¹ than to solve your system. So instead one usually estimates its value. Typically you just want to know if it's getting too big. In Matlab one can use either condest, which estimates κ_1(A), or rcond, which estimates its reciprocal.
3.2 Examples
Example The matrix
    A = [ 1001  1000 ]
        [    1     1 ]
has determinant 1 but condition number κ_1(A) = 1002 × 2001 ≈ 2 × 10⁶, so it is ill-conditioned.
On the other hand, consider the 20 × 20 matrix A = 0.1 I_20:
    A = [ 0.1   0  ...   0  ]
        [  0   0.1 ...   0  ]
        [  :    :   ..   :  ]
        [  0    0  ...  0.1 ]
Its determinant is 10⁻²⁰ but its condition number κ(A) = 1, since max ‖Ax‖ = min ‖Ax‖ = 0.1, i.e. it's perfectly conditioned.
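This is easy to verify numerically: a matrix of the form 0.1 I stretches every vector by exactly the same factor, so its condition number is 1 even though its determinant is tiny. A pure-Python sketch (helper names mine):

```python
import math
import random

def norm2(x): return math.sqrt(sum(t * t for t in x))

n = 20
random.seed(2)
ratios = []
for _ in range(100):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    Ax = [0.1 * t for t in x]            # A = 0.1*I just scales x
    ratios.append(norm2(Ax) / norm2(x))

# every direction is stretched by 0.1, so max||Ax|| / min||Ax|| = 1
assert max(ratios) - min(ratios) < 1e-12
# ... even though the determinant is tiny
print(0.1 ** n)  # about 1e-20
```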
It is the condition number, not the determinant, that measures how close a linear system is to singularity.
If you try to solve a system in Matlab using \ and the estimated condition number is too large, you get a warning:
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 9.144066e-19.
You then take your life in your hands ...
Now that we understand the sensitivity of linear systems, we need to determine how to ensure a small backward error i.e. how do we make a stable
algorithm?
4.1 Backward error
The first question to ask is: how do I measure the backward error?
Recall that in root-finding the residual f(x_n) was a measure of the (absolute) backward error, since x_n was the exact root of the nearby function f̃(x) ≡ f(x) − f(x_n). A similar thing happens here.
From a computed solution x̂ we can compute the residual
    r = A x̂ − b
Then, assuming δb = 0, we have
    (A + δA) x̂ = b  ⟹  δA x̂ = −r
so that
    ‖r‖ ≤ ‖δA‖ ‖x̂‖
and hence
    ‖δA‖ / ‖A‖ ≥ ‖r‖ / (‖A‖ ‖x̂‖) ≡ ρ
where we have defined the relative residual ρ. Therefore a large relative residual guarantees a large (normwise) relative backward error. In fact, the bound can be attained, so a small relative residual guarantees a small backward error.
A stable method produces small relative residuals.
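Computing the relative residual of a computed solution is cheap. A pure-Python sketch on a hypothetical 2×2 system (the matrix, data, and names are mine):

```python
def norm1(x): return sum(abs(t) for t in x)
def mat_norm1(A): return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))
def matvec(A, x): return [sum(a * t for a, t in zip(row, x)) for row in A]

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]                       # exact solution is x = (1, 1)
x_hat = [1.0 + 1e-8, 1.0 - 1e-8]     # a slightly wrong "computed" solution

r = [ai - bi for ai, bi in zip(matvec(A, x_hat), b)]  # r = A x_hat - b
rho = norm1(r) / (mat_norm1(A) * norm1(x_hat))        # relative residual
print(rho)  # tiny, so x_hat has a small backward error
```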
4.2 Error analysis
At this stage, we quote the result of bounding the backward error of common algorithms using the standard model of floating point operations.
Notation: the quantity nu/(1 − nu) crops up so often we give it a symbol:
    γ_n ≡ nu / (1 − nu)
Recall that in double precision u is so small that for any reasonable value of n, γ_n ≈ nu.
For Gauss elimination, the computed factors L̂, Û satisfy
    L̂ Û = A + δA,  |δA| ≤ γ_n |L̂| |Û|    (5)
which is not what we're after. We would like a bound for δA involving |A|. The problem with Gauss elimination is that |L̂| |Û| can be much bigger than |A|.
If for some reason L̂, Û had only positive components, there would be no problem, since then
    |L̂| |Û| = |L̂ Û| = |A| + O(γ_n)
but in general we don't know that is true (and it's not).
4.3 Pivoting
The purpose of a pivoting strategy is to choose pivots that make |L̂| |Û| some modest multiple of |A|. Partial pivoting ensures that
    |L̂| ≤ 1
componentwise, but what can we say about |Û|?
Going to a normwise analysis, we get
    ‖δA‖ ≤ 2 γ_n ‖L̂‖ ‖Û‖
for a monotone norm (such as the 1, F or ∞ norms). Following Demmel, let us define the growth factor using the max-norm:
    g ≡ ‖L̂‖_M ‖Û‖_M / ‖A‖_M
Then, for the 1 or ∞ norms, we get
    ‖δA‖ ≤ 2 n² γ_n g ‖A‖
where we have used
    ‖A‖_M ≤ ‖A‖_1 ≤ n ‖A‖_M
    ‖A‖_M ≤ ‖A‖_∞ ≤ n ‖A‖_M
Hence the normwise backward error is determined by the growth factor.
For Gauss elimination without pivoting, there is no bound for the growth
factor.
Example The matrix
    A = [ ε  1 ]
        [ 1  1 ]
for ε ≪ 1 has g ≈ 1/ε² (without pivoting), which can be very large.
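The growth factor for this example can be computed directly by performing the single elimination step without pivoting. A pure-Python sketch (the code and names are mine, with ε = 10⁻⁸ as an illustration):

```python
def max_norm(M):
    return max(abs(a) for row in M for a in row)

eps = 1e-8
A = [[eps, 1.0], [1.0, 1.0]]

# one step of Gauss elimination WITHOUT pivoting: multiplier m = 1/eps
m = A[1][0] / A[0][0]
L = [[1.0, 0.0], [m, 1.0]]
U = [[A[0][0], A[0][1]], [0.0, A[1][1] - m * A[0][1]]]

# growth factor using the max-norm definition
g = max_norm(L) * max_norm(U) / max_norm(A)
print(g)  # about 1/eps**2 = 1e16

# with partial pivoting the rows would be swapped first, the multiplier
# would be eps, and both factors would stay of modest size
```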
For Gauss elimination with partial pivoting, the situation is rather complicated. There are matrices that have g = 2^{n−1}; for them GEPP is not backward stable. These matrices appear to be very rare in practice. For most matrices, g is very modest in size, e.g. g < 10.
Beyond GEPP
What if youre a nervous sort of person, who wants more reassurance about
backward stability than GEPP gives you? There are 3 options open to you:
1. Gauss elimination with complete pivoting
2. iterative refinement
3. QR factorization
5.1 Complete pivoting
Instead of just looking for the largest (in magnitude) element in the pivot column, look for the largest element in the remaining submatrix to the right of and below the pivot. Then do row and column swaps to bring that element into the pivot position. This is called complete pivoting and costs about n³ operations, i.e. it's expensive compared to partial pivoting.
Then we have the following result:
    g ≤ n^{1/2 + log(n)/4}
A 1968 conjecture that g ≤ n was disproved in 1991, but it's still thought that
    g ≈ n
for complete pivoting, i.e. GE with complete pivoting is backward stable.
5.2 Iterative refinement

5.3 QR factorization
QR factorization (via Householder reflections) is backward stable: the computed factors satisfy Q̂ R̂ = A + δA, where
    ‖δA‖_F ≤ n² γ_cn ‖A‖_F    (6)
for a small constant c.
References
1. J. W. Demmel, Applied Numerical Linear Algebra, SIAM, 1997.
2. G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins University Press, 1989.
3. N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, 1996.